Learn how Nylas improved our continuous delivery pipeline by migrating from Jenkins to GitHub Actions.
At Nylas, we use Python for the vast majority of our production services to sync billions of emails, so naturally, we need a reliable, consistent way to deploy Python to our production servers. In our previous post, we covered our process of turning Python code into an installable Debian package. In this post, we’ll look at how we go from that Debian package to a service that’s running in production, and why we recently chose to migrate from Jenkins to GitHub Actions to improve our build systems.
The process begins with a GitHub pull request; all code changes at Nylas start off as a PR. Once code review is complete and all tests pass, we merge it and kick off a chain of test, build, and deploy events. First, our CI system runs a full test suite and a build process to create Debian packages in parallel. Once both of those steps complete successfully, we’re ready to deploy the new build to our production systems.
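To make this concrete, here is a minimal sketch of what a merge-triggered pipeline like this might look like as a GitHub Actions workflow. The job names, scripts, and version pins are hypothetical, not Nylas's actual configuration; the key idea is that `test` and `package` run in parallel, and the deploy step waits on both via `needs`:

```yaml
# Hypothetical workflow: test and package run in parallel on merge to main;
# the deploy trigger only runs once both succeed.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest
  package:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: dpkg-buildpackage -us -uc   # build the Debian package
  deploy:
    needs: [test, package]               # waits for both parallel jobs
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/trigger-deploy.sh # hypothetical deploy trigger
```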
Jenkins Growing Pains
Previously, we used Jenkins to manage our entire continuous integration pipeline; it was set up to run unit tests for every single commit our developers made. The centralized service Jenkins offered sped up development: it had faster build times than employees' laptops and provided a standard environment tuned more closely to our production environment. It also made it easy to build deployable artifacts whenever commits were merged into the main branch.
The primary downsides of Jenkins stem from the fact that it was designed before the modern cloud era of microservices and containerization, which makes it difficult to automate. It's also notorious for the amount of regular care and maintenance it requires, and we experienced this in full as a growing number of our build and test failures turned out to be caused by problems with the Jenkins environment itself. So we set out to find a better alternative.
Why We Chose GitHub Actions
Security is a top priority at Nylas, and we always need to take our SOC 2 certification into consideration. We aren't willing to open up our code to third-party CI/CD service providers, which rules out many modern solutions. However, we host our code on GitHub, so we decided to replace Jenkins entirely with GitHub Actions. Our primary goal with this migration was to reduce the amount of time our engineering team spends maintaining the build environment so they can spend more time making our products better. We also wanted to increase developer productivity by reducing the friction involved in deploying code to our staging and production environments. GitHub Actions was the clear solution to our problem.
GitHub Actions also allows anyone on the team to build their own workflows using clear, well-documented YAML configuration files. By enabling the entire development team to build workflows easily, we've seen a few useful automations sprout up organically as developers solve the problems they face. In addition to running our automated tests, we're now using GitHub Actions to check our code quality, type annotations, and progress towards Python 3, and even to verify that our developers give their pull requests meaningful descriptions. GitHub Actions powers a custom-built Slack integration that intelligently routes notifications to the correct team channels, and a Clubhouse integration that opens and manages tickets for small pull requests (like typo fixes) that aren't worth the overhead of manually entering into our ticket tracking system. We're excited to see what else our team builds with GitHub Actions!
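As an example of how small these workflows can be, here is a hedged sketch of a pull-request-description check; the workflow name and message are made up, but the pattern (react to `pull_request` events, fail the job to fail the check) is how such automations typically work:

```yaml
# Hypothetical workflow: fail the check if a pull request body is empty.
name: pr-description-check
on:
  pull_request:
    types: [opened, edited]
jobs:
  check-description:
    runs-on: ubuntu-latest
    steps:
      - name: Require a meaningful description
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          if [ -z "$(echo "$PR_BODY" | tr -d '[:space:]')" ]; then
            echo "Please add a description to this pull request."
            exit 1
          fi
```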
Automated Continuous Delivery at Nylas
With GitHub Actions, our continuous delivery process is mostly automated and only requires human intervention at a few specific steps. Every time a build completes, our CD system gets a notification to push out the new version to all applicable services. When the CD system is ready to deploy the build artifact (the Debian package), it takes the following steps:
It acquires a lock on the servers to which it intends to deploy. This ensures that only one deployment can happen at a time and prevents resource contention between multiple queued deployments.
It rolls out the deployment to the staging environment. This step happens in batches: first 10% of the service's staging nodes, then gradually increasing batches until the service is fully deployed.
At each stage of the rollout, the deployment system queries our monitoring and logging tools to detect any anomalies. If any are detected, such as a spike in errors or degraded application performance, the team is alerted to the potential issue and prompted to proceed or roll back. If a rollback is initiated, the entire environment is rolled back to the last known-good version.
Assuming all goes well, the successful staging deployment is promoted to production. This follows the same rollout pattern and continues to automatically verify against logs and metrics to catch any regressions before they get fully deployed.
This whole process is repeated for every new build and weekly at a minimum to ensure that we deploy regularly and often.
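The staged rollout described above can be sketched in Python. Every helper below (the lock, the deploy and monitoring calls) is a stub standing in for real infrastructure, and the batch fractions are illustrative, not our actual schedule:

```python
from contextlib import contextmanager

# Illustrative batch schedule: fraction of nodes deployed after each stage.
ROLLOUT_BATCHES = [0.10, 0.25, 0.50, 1.00]

# --- Stubs standing in for real infrastructure (all hypothetical) ---

@contextmanager
def deployment_lock():
    # Real system: a distributed lock so only one deployment runs at a time.
    yield

def deploy_to(nodes, package):
    for node in nodes:
        node["version"] = package          # real system: install the .deb

def anomalies_detected(nodes):
    # Real system: query monitoring/logging for error spikes or latency.
    return any(node.get("errors", 0) > 0 for node in nodes)

def rollback(nodes, last_good):
    # Roll the entire environment back to the last known-good version.
    for node in nodes:
        node["version"] = last_good

# --- The staged rollout itself ---

def staged_rollout(package, nodes, last_good):
    """Deploy in increasing batches, verifying health after each stage."""
    done = 0
    for fraction in ROLLOUT_BATCHES:
        target = int(len(nodes) * fraction)
        deploy_to(nodes[done:target], package)
        done = max(done, target)
        if anomalies_detected(nodes):
            rollback(nodes, last_good)
            return False
    return True
```

A successful run would look like `staged_rollout("1.1", staging_nodes, last_good="1.0")` inside a `with deployment_lock():` block, followed by the same call against production nodes only if staging returned `True`.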
We finished this migration only a few weeks ago, and we've already realized a 30% decrease in the amount of time it takes to run tests for each PR. Even better, we're now consistently seeing weeks with zero build failures, representing additional time our engineers now have to focus on other ways to make our infrastructure better. With this new system in place, we're able to ship changes more quickly and more often, with less developer involvement and with more confidence.
Learn More About Python
Our deployment strategy is constantly evolving, and we’re excited about the future opportunities this new foundation will enable as we grow our containerization and automation efforts! Learn more about how we use Python at Nylas: