pancake.io :: blog

Goodbye, Heroku

Ping Wong

Two weeks ago, Pancake.io was moved from Heroku to Rackspace. This decision was partly financially motivated: the ease of deployment on Heroku comes at a premium. More importantly, however, moving to our own server enables us to work on features that were previously impossible in Heroku's sandboxed environment.

What follows are some things we had to learn on the way there.

1. Process Management

Moving to your own server means having to worry about keeping your services alive. There are many very similar tools that can help with this; this and this are both excellent overviews.

We ended up going with Upstart, partly because it was already installed as part of Ubuntu, and partly because it comes with extensive instructions in the Upstart cookbook. Most Upstart config files, which go in /etc/init/, are as simple as this:

description "my-app"

start on runlevel [2345]
stop on runlevel [016]

# upstart can track processes that fork or daemonize
expect fork

# restart service if it goes down
# if respawned > 3 times in 5 seconds, stop trying

respawn
respawn limit 3 5

exec /usr/local/bin/my-app

One common requirement is to run a service as a different user (for security) and to put configuration settings in the environment. The way I accomplish this is to place environment variables in a file that gets sourced in the ~/.profile file of the user I want to use, and then have something similar to the following in the init script:

exec su -l appuser -c "/usr/local/bin/my-app"

The -l tells su to run the command as appuser in login mode, which sources that user's ~/.profile file if it exists (though, with bash, only if ~/.bash_profile and ~/.bash_login don't exist). Alternatively, you could load the environment from a service instead of reading it from disk, for example if you have a centralized source for managing these variables.
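To make the environment-file idea concrete, here is a minimal Python sketch of reading KEY=VALUE pairs from a file and merging them into the environment before starting the app. The file format and the function names parse_env_file and build_environment are assumptions for illustration, not what Pancake.io actually runs.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#'):
            continue
        # tolerate the "export KEY=VALUE" form used in shell profiles
        if line.startswith('export '):
            line = line[len('export '):].lstrip()
        key, sep, value = line.partition('=')
        if not sep:
            continue  # not an assignment; ignore it
        # crude quote handling: drop any surrounding quote characters
        env[key.strip()] = value.strip().strip('"\'')
    return env


def build_environment(text, base=None):
    """Return a copy of base (default: os.environ) updated from text."""
    merged = dict(os.environ if base is None else base)
    merged.update(parse_env_file(text))
    return merged
```

A launcher could then call os.execve('/usr/local/bin/my-app', ['my-app'], build_environment(contents)), where contents is the text of the settings file. os.execve replaces the launcher process with the app rather than forking, so the service still runs in the foreground.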

Another thing I found helpful was to run commands in foreground mode where possible, as processes which fork or daemonize require a bit more finessing to get right. If the expect stanza doesn't match what the process actually does, Upstart can end up tracking the wrong PID, which is a situation difficult to extricate oneself from.

2. Directory Conventions

Having conventions for where to put things makes it easier to manage multiple servers. Pancake.io uses the following:

path                              contents
/usr/local/lib/<package name>     downloaded source code
/usr/local/etc/<package name>     configuration files
/usr/local/bin                    binaries symlinked from lib into here
/var/log/<package name>           logs
/usr/local/<package name>         service-specific folders

This is also helpful when using a release other than the one provided by the operating system's package manager.
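As a rough illustration of the "symlinked from lib" convention, the following Python sketch links one executable from a package's source directory into a bin directory. The helper name link_binary and the paths in the usage note are hypothetical.

```python
import os


def link_binary(lib_dir, bin_dir, relative_path):
    """Symlink lib_dir/relative_path into bin_dir under its base name."""
    target = os.path.join(lib_dir, relative_path)
    link = os.path.join(bin_dir, os.path.basename(relative_path))
    if os.path.islink(link):
        os.remove(link)  # replace a stale link, e.g. after an upgrade
    # a regular file already at this path is left alone; os.symlink raises
    os.symlink(target, link)
    return link
```

For example, link_binary('/usr/local/lib/my-app', '/usr/local/bin', 'bin/my-app') would make /usr/local/bin/my-app point at the copy under lib. Upgrading then amounts to unpacking the new release and running the same call again.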

3. SSH keys

Using SSH key authentication to avoid typing your password over and over is common practice. GitHub has good instructions for generating an SSH key (step 1 and step 2). Copy the ~/.ssh/id_rsa.pub you generated on your local machine and append its contents to the ~/.ssh/authorized_keys file (note: a file, not a folder) on the remote server. If you have multiple ssh keys on your local computer, you can use ssh -A when you ssh to the remote server. This enables agent forwarding, which allows ssh connections made from the remote host to use the keys held by the ssh-agent on your local computer.
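The append step is easy to get subtly wrong (duplicate entries, loose permissions), so here is a small Python sketch of doing it idempotently on the remote server. The function name authorize_key is an assumption for illustration; where it is installed, the ssh-copy-id utility automates the same step.

```python
import os


def authorize_key(public_key, ssh_dir):
    """Append public_key to ssh_dir/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    key = public_key.strip()
    if not os.path.isdir(ssh_dir):
        os.makedirs(ssh_dir)
    # sshd's StrictModes check can reject keys if these are writable by others
    os.chmod(ssh_dir, 0o700)
    path = os.path.join(ssh_dir, 'authorized_keys')

    contents = ''
    if os.path.exists(path):
        with open(path) as handle:
            contents = handle.read()
    if key in [line.strip() for line in contents.splitlines()]:
        return False

    with open(path, 'a') as handle:
        # make sure the new key starts on its own line
        if contents and not contents.endswith('\n'):
            handle.write('\n')
        handle.write(key + '\n')
    os.chmod(path, 0o600)
    return True
```

Calling authorize_key(key_text, os.path.expanduser('~/.ssh')) twice with the same key leaves a single entry in the file.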

Also, setting up aliases in your local ~/.ssh/config file saves the trouble of having to remember IP addresses, e.g.:

# in ~/.ssh/config
Host mywebserver
  User web
  HostName 111.111.111.111

4. Uncomplicated Firewall

I use ufw, an uncomplicated firewall, for a little extra dose of security. Pancake.io's servers are mostly closed to incoming traffic. You can also set it up to only allow connections from certain hosts, which is useful for backend servers.

$ ufw default deny
$ ufw allow http
$ ufw allow https
$ ufw allow ssh
$ ufw --force enable
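For the host-restricted case mentioned above, ufw accepts rules of the form "ufw allow from <address> to any port <port>". The Python sketch below builds such a rule set as argument lists. The function name ufw_rules and the addresses and ports in the usage note are hypothetical.

```python
def ufw_rules(public_services, restricted_ports):
    """Build ufw command lines as argument lists.

    public_services: service names open to everyone, e.g. ['http', 'ssh']
    restricted_ports: dict mapping a port number to the list of source
                      addresses allowed to reach it
    """
    commands = [['ufw', 'default', 'deny']]
    for service in public_services:
        commands.append(['ufw', 'allow', service])
    for port in sorted(restricted_ports):
        for source in restricted_ports[port]:
            commands.append(['ufw', 'allow', 'from', source,
                             'to', 'any', 'port', str(port)])
    # enable last, so the ssh rule is in place before the firewall is active
    commands.append(['ufw', '--force', 'enable'])
    return commands
```

A backend database server might use ufw_rules(['ssh'], {5432: ['10.0.0.5']}) and pass each list to subprocess.check_call as root, or simply print the joined lines for review first.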

5. logrotate

logrotate is essential for keeping logs manageable. It turns your logging into a sort of circular buffer, and you get to determine whether logs are rotated on a fixed interval or based on their file size. logrotate is also happy to gzip files for you (which you can search with zgrep). For bonus points, archive them to S3.

This is what a basic logrotate conf file looks like:

/var/log/myservice/*.log {
  daily
  missingok
  rotate 60
  compress
  notifempty
  nocreate
}
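If you want to search rotated logs from a script rather than with zgrep, something like the following Python sketch works across both the live log and its gzipped rotations. The function name search_logs is an assumption, and the glob in the usage note simply mirrors the example path above.

```python
import glob
import gzip
import re


def search_logs(pattern, log_glob):
    """Yield (filename, line) for lines matching pattern in plain or .gz logs."""
    regex = re.compile(pattern)
    for filename in sorted(glob.glob(log_glob)):
        # rotated files are gzip-compressed when "compress" is set
        if filename.endswith('.gz'):
            handle = gzip.open(filename, 'rt', errors='replace')
        else:
            handle = open(filename, 'r', errors='replace')
        with handle:
            for line in handle:
                if regex.search(line):
                    yield filename, line.rstrip('\n')
```

For instance, search_logs('ERROR', '/var/log/myservice/*.log*') would cover the current log file and every compressed rotation that is still on disk.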

6. Repeatable Deploys

Deploying new servers and deploying new code should both be easy and repeatable. The last thing you want is to have a server fail and not know how to set up a new one.

As with process management, there are a lot of options, but a homemade Fabric script has done the job so far. Here's a neat little snippet that shows what changes will be applied in an app deployment:

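# imports this excerpt needs (Fabric 1.x); everything from the
# first "with" onward is the body of a task function
import re

from fabric.api import cd, hide, run
from fabric.colors import cyan, white, yellow
from fabric.contrib import console

horizontal_line = '\n' + '-'*78

# fmt interpolates some shared variables; opts is a dict of deploy
# settings (prefix, git_dir, app_branch, release_dir) defined elsewhere
fmt = lambda s: s.format(**opts)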

with cd(fmt('{prefix}{git_dir}')):
  with hide('output'):
    current_ref = run(fmt(
      'git show-ref | grep refs\/heads\/{app_branch}$ | cut -c1-12'))

  # default so the comparison below works when the branch has no local ref yet
  remote_ref = ''
  if len(current_ref) > 0:
    run(fmt('git symbolic-ref HEAD refs/heads/{app_branch}'))
    run(fmt('git fetch -q origin {app_branch}'))
    remote_ref = run(fmt(
      'git ls-remote origin {app_branch} | cut -c1-12'))
    remote_ref = re.compile(r'[\r\n]').split(remote_ref)[-1]

  print horizontal_line
  print 'Current ref:', cyan(current_ref)
  print 'Remote ref: ', yellow(remote_ref)

  if current_ref == remote_ref:
    if not console.confirm('No updates found, continue?'):
      return

  else:
    print horizontal_line
    print white('Changelog', bold=True)
    run(
      'git log %s..%s --no-merges --color --oneline --abbrev=12'
      % (current_ref, remote_ref))

    if not console.confirm('Continue?'):
      return

  run(fmt('git fetch -q origin {app_branch}:{app_branch} -f'))
  run(fmt('git clone . {prefix}{release_dir} -b {app_branch}'))