Over sixteen years of building web applications at startups, I’ve seen it all: editing PHP code live on production servers, hand-rolled SCP and FTP-based deployment workflows, year-long waterfall development cycles with code freezes, and No Deploys On Fridays.
At previous startups, Continuous Deployment looked like an achievable goal, but it always seemed a month or a quarter away from reality.
That’s why, when we began building the codebases that eventually became Rhino in late 2016 and early 2017, it was important to me that we baked Continuous Delivery into our development processes as seamlessly as possible.
In Rhino’s early days, I read Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, which left a lasting impression on me. In it, the authors describe what they call the Four Keys metrics, which can be used to assess an engineering team’s performance.
Those four metrics are:
Change Failure Rate - how often a release introduces a regression
Time To Restore Service - how quickly service can be restored after a failure, for example by rolling back a regression
Lead Time For Changes - how quickly a code change can move through the development cycle to production
Deployment Frequency - how often new changes are shipped to customers in production
According to the book, elite-performing engineering teams are those that can deploy to production on demand, multiple times per day.
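To make the four definitions concrete, here is a minimal Python sketch of how they could be computed from a log of production deploys. The `Deploy` record and its field names are hypothetical, invented for illustration. The book does not prescribe this data model, and this is not a description of Rhino's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    restored_at: Optional[datetime] = None  # set only if this deploy caused a regression


def four_key_metrics(deploys: list[Deploy], days: int) -> dict:
    """Summarize the four metrics over a window of `days` days."""
    if not deploys or days <= 0:
        raise ValueError("need at least one deploy and a positive window")
    failures = [d for d in deploys if d.restored_at is not None]
    return {
        "deployment_frequency_per_day": len(deploys) / days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        # None when nothing failed in the window
        "median_time_to_restore": (
            median(d.restored_at - d.deployed_at for d in failures) if failures else None
        ),
    }
```

Medians are used here instead of means so that one unusually slow change does not dominate the summary. That is a design choice of this sketch, not something the metric definitions require.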
At Rhino, our small engineering team is able to ship about 15 releases to production every day. Here’s how we do it.
When an engineer opens a new Pull Request on GitHub, our suite of nearly 100,000 automated tests runs against the new change, along with static analysis, linters (such as RuboCop), security scanners (such as Brakeman), and code coverage analysis (we use the awesome UndercoverCI).
If every one of these checks passes, another engineer performs code review, leaving suggestions and feedback on how to improve the code. After code review is completed, the feature is ready for testing in QA.
Since we are hosted on Heroku, we are able to take advantage of Review Apps, which are automatically spun up and deployed with a production-like environment running the code in the pull request.
We have written some small custom tools to seed these environments with test data and coordinate requisite companion services.
Once the code is approved by an engineer or stakeholder, the pull request is merged to the main branch.
After the feature is merged to the main branch, a new build is triggered on Semaphore to test that no regressions were introduced when we integrated the feature branch with main. If those tests pass again, our Heroku Pipelines are triggered, and the code is automatically deployed to our staging environment (which we use as a long-standing internal environment for testing and demos) and then immediately to production.
This requires us to think of the main branch as always deployable or, one step further, as always either already live on production or about to be released.
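The sequence of gates described so far can be modeled in a few lines of Python. This is only an illustrative sketch: `failed`, `ship`, and the callables passed in are hypothetical names. In practice the gating is handled by the CI service and Heroku Pipelines, not by a script like this.

```python
from typing import Callable

Check = Callable[[], bool]  # returns True when the check passes


def failed(checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    return [name for name, check in checks.items() if not check()]


def ship(pr_checks: dict[str, Check], approved: bool,
         main_checks: dict[str, Check], deploy: Callable[[str], None]) -> str:
    """Walk one change through the gates and report where it stopped."""
    if failed(pr_checks):
        return "blocked: pull request checks failed"
    if not approved:
        return "waiting: needs review and QA approval"
    # The merge happens here; the suite then runs again on the integrated main branch.
    if failed(main_checks):
        return "blocked: main build failed"
    deploy("staging")
    deploy("production")  # no separate release step between staging and production
    return "deployed"
```

The point the sketch tries to capture is that there is no manual step between a green build on main and production: once the second test run passes, both deploys follow automatically.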
This gives us many advantages:
We don’t need to spend additional QA cycles once again ensuring that main is “ready to deploy.”
We also don’t need to worry about the situation in which many code changes, which were not tested together, all get shipped to production at once, leading to unexpected issues.
Instead, each individual commit gets deployed straight to production serially, which means that when something does go wrong, finding the problem commit is extremely easy.
Usually it’s the commit that was just deployed to production, and mitigating the issue is as simple as clicking the “Rollback” link in the Heroku web dashboard or rolling back with the Heroku CLI.
This means our Time To Restore metric is also well within the Elite criterion of restoring service in under one hour.
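Because each release carries exactly one commit, triage reduces to looking at the newest release. The Python sketch below models that reasoning. The `Release` record and the `triage` helper are hypothetical; the actual rollback is performed by Heroku, not by code like this.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    version: int  # hypothetical sequential release number
    commit: str   # the single commit shipped in this release


def triage(releases: list[Release]) -> tuple[str, Optional[int]]:
    """Return (suspect commit, version to roll back to) for the newest release."""
    if not releases:
        raise ValueError("no releases to inspect")
    history = sorted(releases)  # oldest first, ordered by version
    latest = history[-1]
    previous = history[-2].version if len(history) > 1 else None
    return latest.commit, previous
```

With batched releases the first element would be a list of candidate commits to search through. With one commit per release there is a single suspect.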
Ever since adopting this workflow:
Our systems have been more stable than ever
We’re able to release working, high quality software to our partners and customers more quickly than ever
We spend almost no internal time synchronizing releases or issuing code freezes
…and, most importantly, we safely and reliably deploy on each and every Friday.