Building, packaging & deploying Python using versioned artifacts in Debian packages.
We love Python at Nylas. The syntax is simple and expressive, there are tons of open source modules and frameworks available, and the community is welcoming and diverse. Our backend is written exclusively in Python, and our team frequently gives talks at PyCon and meetups. You could say we are super fans.
However, one of Python’s big drawbacks is a lack of clear tools for deploying Python server apps. The state of the art seems to be “run git pull and pray”, which is not an option when users depend on your app. Python deployment becomes even more complicated when your app has a lot of dependencies that are also moving. This HN comment sums up the deplorable state of deploying Python.
Why, after so many years, there is no way for me to ship software written in python, in deb format? — Frustrated HN User
At Nylas, we’ve developed a better way to deploy Python code along with its dependencies, resulting in lightweight packages that can be easily installed, upgraded, or removed. And we’ve done it without transitioning our entire stack to a system like Docker, CoreOS, or fully-baked AMIs.
Baby’s first python deployment: git & pip
Python offers a rich ecosystem of modules. Whether you’re building a web server or a machine learning classifier, there’s probably a module to help you get started. Today’s standard way of getting these modules is via pip, which downloads and installs them from the Python Package Index (aka PyPI). This is just like apt, yum, RubyGems, etc.
Most people set up their development environment by first cloning the code using git, and then installing dependencies via pip. So it makes sense that this is also how most people first try to deploy their code. A deploy script might look something like this:
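As a minimal sketch of the git+pip approach, assuming the app lives in a checked-out repository with a requirements.txt, the two steps could be driven from Python. The `APP_DIR` path and the function names are hypothetical and only serve the illustration.

```python
import subprocess

# Hypothetical checkout location on the server; not taken from any real setup.
APP_DIR = "/opt/myapp"


def naive_deploy_commands(requirements="requirements.txt"):
    """Return the commands a git+pip deploy runs, in order."""
    return [
        ["git", "pull"],                         # fetch the latest code
        ["pip", "install", "-r", requirements],  # install or upgrade dependencies
    ]


def naive_deploy(app_dir=APP_DIR, run=subprocess.check_call):
    """Run each command inside the checkout, stopping at the first failure."""
    for command in naive_deploy_commands():
        run(command, cwd=app_dir)
```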
But when deploying large production services, this strategy breaks down for several reasons:
pip does not offer a “revert deploy” strategy
Running pip uninstall doesn’t always work properly, and there’s no way to roll back to a previous state. Virtualenv could help with this, but it’s really not built for managing a history of environments.
Installing dependencies with pip can make deploys painfully slow
Calling pip install for a module with C extensions will often build it from source, which can take minutes to complete for a new virtualenv. Deploys should be a fast, lightweight process that takes seconds.
Building your code separately on each host will cause consistency issues
When you deploy with pip, the version of your app running is not guaranteed to be the same from server to server. Errors in the build process or existing dependencies result in inconsistencies that are difficult to debug.
Deploys will fail if PyPI or your git server is down
pip install and git pull often depend on external servers. You can choose to use third-party systems (e.g. GitHub, PyPI) or set up your own servers. Either way, it is important to make sure that the services your deploy process relies on meet the same expectations of uptime and scale as your app. External services are often the first to fail when you scale your own infrastructure, especially with large deployments.
If you’re running an app that people depend on, and running it across many servers, then the git+pip strategy will only cause headaches. What we need is a deploy strategy that’s fast, consistent and reliable. More specifically:
Capability to build code into a single, versioned artifact
Unit and system tests that can test the versioned artifact
A simple mechanism to cleanly install/uninstall artifacts from remote hosts
Having these three things would let us spend more time building features, and less time shipping our code in a consistent way.
“Just use Docker”
At first glance, this might seem like a perfect job for Docker, the popular container management tool. Within a Dockerfile, one simply adds a reference to the code repository and installs the necessary libraries and dependencies. Then we build a Docker image, and ship it as the versioned artifact to remote hosts.
However, we ran into a couple of issues when we tried to implement this:
Distributing Docker images within a private network requires a separate service which we would need to configure, test, and maintain.
Converting our Ansible setup automation to a Dockerfile would be painful and require a lot of ugly hacks with our logging configuration, user permissions, secrets management, etc.
Even if we succeeded in fixing these issues, our engineering team would have to learn how to interface with Docker in order to debug production issues. We don’t think shipping code faster should involve reimplementing our entire infrastructure automation and orchestration layer. So we searched on.
We then tried PEX, which bundles Python code and its dependencies into an executable zip file. Setting up PEX is simpler than Docker as it only involves running the resultant executable zip file, but building PEX files turned out to be a huge struggle. We ran into several issues building third-party library requirements, especially when including static files. We were also confronted with confusing stack traces produced from within PEX’s source code, making it harder to debug builds. This was a dealbreaker, as our primary goal was to improve engineering productivity and make things easier to understand.
Using Docker would have added complexity to our runtime. Using PEX would have added complexity to our builds. We needed a solution that would minimize overall complexity, while giving us reliable deploys, so our search continued.
Packages: the original “containers”
A couple of years ago, Spotify quietly released a tool called dh-virtualenv, which you can use to build a Debian package that contains a virtualenv. We thought this was interesting, and already had lots of experience using Debian and running it in production. (One of our co-founders, Christine, is a Debian developer.)
Building with dh-virtualenv simply creates a Debian package that includes a virtualenv, along with any dependencies listed in the requirements.txt file. When this Debian package is installed on a host, it places the virtualenv at /usr/share/python/. That’s it.
This is the core of how we deploy code at Nylas. Our continuous integration server (Jenkins) runs dh-virtualenv to build the package, and uses Python’s wheel cache to avoid rebuilding dependencies. This creates a single bundled artifact (a Debian package), which is then run through extensive unit and system tests. If the artifact passes, it is certified as safe for prod and uploaded to S3.
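The gate described above, where the exact artifact that was built is the one that gets tested and uploaded, can be sketched in a few lines. This is only an illustration of the idea, not our Jenkins configuration: `build`, `test`, and `upload` are hypothetical callables standing in for the dh-virtualenv build, the test suites, and the S3 upload.

```python
def certify(build, test, upload):
    """Build one artifact, test that exact artifact, upload it only if tests pass.

    All three arguments are hypothetical callables supplied by the caller.
    Returns the artifact on success and None when the tests reject it.
    """
    artifact = build()
    if not test(artifact):
        return None      # a failing artifact never reaches the artifact store
    upload(artifact)
    return artifact
```

Because the function never rebuilds between testing and uploading, the package that reaches production is byte-for-byte the one that passed the tests.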
A key part of this process is that we can minimize the complexity of our deploy script by leveraging Debian’s built-in package manager, dpkg. A deploy script might look something like this:
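As a rough sketch, assuming the certified artifact is reachable over HTTPS (the URL passed in is hypothetical, as are the `fetch` and `run` parameters, which exist so the steps can be swapped out), a deploy boils down to a download followed by dpkg -i:

```python
import os
import subprocess
import tempfile
import urllib.request


def deploy(artifact_url, fetch=urllib.request.urlretrieve, run=subprocess.check_call):
    """Download one versioned .deb and install it with dpkg."""
    fd, path = tempfile.mkstemp(suffix=".deb")
    os.close(fd)  # the fetch step reopens the file by name
    try:
        fetch(artifact_url, path)    # copy the artifact to local disk
        run(["dpkg", "-i", path])    # install or upgrade the package (needs root)
    finally:
        os.remove(path)              # the temporary download is no longer needed
    return path
```

There is no build step on the host: everything the app needs is already inside the package.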
To roll back, we simply deploy the previous versioned artifact. The dpkg utility handles cleaning up the old code for free.
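Picking the rollback target is just a lookup in the deploy history. The sketch below assumes you keep an ordered list of the artifacts that have been deployed, oldest first; the function name and the artifact names in the usage line are hypothetical.

```python
def rollback_target(history, current):
    """Return the artifact that was deployed immediately before `current`.

    `history` is an ordered list of deployed artifact names, oldest first.
    """
    index = history.index(current)  # raises ValueError for an unknown artifact
    if index == 0:
        raise ValueError("no earlier artifact to roll back to")
    return history[index - 1]
```

For example, `rollback_target(["app_1.0.deb", "app_1.1.deb"], "app_1.1.deb")` selects `app_1.0.deb`, which is then installed like any other deploy.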
One of the most important aspects of this strategy is that it achieves consistency and reliability, but still matches our development environment. Our engineers already use virtualenvs, and dh-virtualenv is really just a way to ship them to remote hosts. If we had chosen Docker or PEX, we would have had to dramatically change the way we develop locally and introduce a lot of complexity. We also didn’t want to introduce that complexity burden on the developers using our open source code.
Today, we ship all of our Python code with Debian packages. Our entire codebase (with dozens of dependencies) takes fewer than 2 minutes to build, and seconds to deploy.
Getting started with dh-virtualenv
If you are experiencing painful Python deployments, then ask your doctor about dh-virtualenv. It might be right for you!
Configuring Debian packages can be tricky for newcomers, so we’ve built a utility to help you get started called make-deb. It generates a Debian configuration based on the setup.py file in your Python project.
First install the make-deb tool, then run it from the root of your project:
pip install make-deb
make-deb
If information is missing from your setup.py file, make-deb will ask you to add it. Once it has all the needed details, make-deb creates a debian directory at the root of your project that contains all the configuration you’ll need for dh-virtualenv.
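For reference, a small setup.py with the usual metadata filled in might look like the sketch below. Every value is a placeholder for a hypothetical project called myproject, and which of these fields make-deb actually insists on is an assumption here, so let the tool tell you what it still needs.

```python
import sys

# All values are placeholders for a hypothetical project.
METADATA = {
    "name": "myproject",
    "version": "0.1.0",
    "description": "Example service packaged with dh-virtualenv",
    "maintainer": "Jane Developer",
    "maintainer_email": "jane@example.org",
    "packages": ["myproject"],
    # Console scripts are installed into the virtualenv's bin directory.
    "entry_points": {
        "console_scripts": ["myproject-server = myproject.server:main"],
    },
}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only call setuptools when a command or query option was supplied.
    from setuptools import setup

    setup(**METADATA)
```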
Building a Debian package requires you to be running Debian with dh-virtualenv installed. If you’re not running Debian, we recommend Vagrant and VirtualBox to set up a Debian VM on Mac or Windows. You can see an example of this configuration by looking at the Vagrantfile in our sync engine Git repository.
Finally, running dpkg-buildpackage -us -uc will create the Debian package. You don’t need to call dh-virtualenv directly, because it’s already specified in the configuration rules that make-deb created for you. Once this command is finished, you should have a shiny build artifact ready for deployment!
A simple deploy script might look like this:
scp my-package.deb remote-host.example.org:
# Run the next commands on remote-host.example.org
dpkg -i my-package.deb
# Then, in the Python interpreter from the installed virtualenv:
>>> import myproject # it works!
To deploy, you need to upload this artifact to your production machine. To install it, just run dpkg -i my-package.deb. Your virtualenv will be placed at /usr/share/python/ and any script files defined in your setup.py will be available in the accompanying bin directory. And that’s it! You’re on your way to simpler deploys.
When building large systems, the engineering dilemma is often to find a balance between creating proper tooling and not constantly rearchitecting a new system from scratch. We think using Debian package-based deploys is a great solution for deploying Python apps, and most importantly it lets us ship code faster with fewer issues.