If you can’t ‘shift-left’ any further, try ‘shift-right’ software deployment practices
Why are “shift-right” practices of monitoring, testing and remediation of software and system issues starting to appear? Because they’re the next stage in the ongoing quest to improve software — not just before deployment but after it’s in production.
Shift-right is a byproduct of two trends: the DevOps movement, in which software developers and information technology staff collaborate closely to produce better code faster, and the rapid delivery of software into elastic cloud structures. It is a newer set of practices for observing and testing the behavior and performance of systems already delivered in production, in order to improve the customer experience of the applications they support.
That’s in contrast to shift-left, an approach in which software and system testing are done earlier in the development lifecycle. The shift-left mantra is almost sacrosanct for agile dev shops and collaborative information technology organizations espousing the modern DevOps movement, and with good reason: It reliably returns value and speeds successful feature delivery.
If you want to deliver software faster, with better quality, then automate and shift-left as much of the configuration, build, testing and security screening as you can. Getting it all done as close to requirements — and as early as possible in the software delivery lifecycle — makes issues quicker to find and easier to fix, well before they appear in front of customers.
Shift-left just makes sense — yet an increasing number of enterprises and vendors are taking heretical steps toward shift-right. Should shift-left move over for shift-right now?
Has shift-left become passé?
Since I first editorialized about the shift-right versus shift-left testing divide, I’ve taken in quite a lot of feedback (and some criticism) from experts with a stake in the broader argument about these approaches for software delivery and operation.
“We’ve noticed that leading digital businesses are releasing faster and with more confidence because they’ve instrumented everything involved in the delivery chain of their application, and can react quickly to any missteps,” said Buddy Brewer, global vice president and field chief technology officer for Americas at New Relic Inc. “They are shifting their testing practices to the right, and increasingly doing it in production with progressive rollouts.” (* Disclosure below.)
Much of the popularity of shift-right stems from an attempt to emulate the leading cloud-first development shops of high-volume consumer apps such as Netflix, Uber, Airbnb and Facebook.
“As cloud giants started opening up how they do their software development practices, people were surprised to find they didn’t have a long defined testing phase,” said Ken Ahrens, founder and chief executive at Speedscale Inc. “If you remove the time between the code being written by a developer and putting it in production in front of customers, the idea of shift-left no longer makes sense. Shift-right is an alternative for companies trying to move really fast. You still want to know if the code is any good, so the reasoning is, you can at least try to discover it in production.”
Obviously, many companies don’t have a deep enough bench of site reliability engineering resources to shift-right to production like the big cloud vendors do, so this is still an emerging practice. According to a 2021 chaos engineering study, only 34% of respondents claimed to conduct chaos tests in production.
“We absolutely still see customers shifting left and automating chaos tests into the CI/CD pipeline as a requirement before pushing code to production — but it’s nearly impossible to simulate a production environment in staging and very costly,” said Aileen Horgan, vice president of marketing for Gremlin Inc. “So we advocate for customers testing against what is in production. If shift-right is uncomfortable, start in lower-end environments, chaos test in staging, and then graduate into production during low traffic times — whatever it takes to get more comfortable with that concept.”
“I’m not against shifting left, but if you believe that you’re not already testing in production, you’re basically fooling yourself because the real test always happens in production, whether you like it or not,” said Olaf Molenveld, co-founder and CTO at Vamp.io. “Allowing data to flow from production into your tests and CD pipeline is a no-go for security and privacy regulations like GDPR. Yes, test as much as you can in a limited time window, but make sure you have safety nets in place for production.”
The shift-right manifest destiny
Agile software pipelines now move from code to production at such breakneck speed that shift-right is an inevitability on the horizon.
Shift-right can include fanatical production monitoring, real user monitoring, observability and issue resolution, tools often considered in the realm of IT Ops and SRE experts. It also includes various forms of progressive software delivery: feature flags, canary deployments, blue/green releases, dark launches and A/B testing favored by modern DevOps practitioners.
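To make one of those progressive delivery techniques concrete, here is a minimal sketch of a percentage-based feature flag. It is an illustration only and is not taken from any vendor mentioned in this article: the function names, the flag name and the choice of a SHA-256 hash for bucketing are all assumptions.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing the pair keeps a given user in the same bucket on every
    request, so their experience does not flip between old and new.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    if rollout_percent <= 0:
        return False
    if rollout_percent >= 100:
        return True
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical flag name and user IDs, for illustration only.
    for user in ("alice", "bob", "carol"):
        print(user, is_enabled("new-checkout", user, rollout_percent=10))
```

Because the bucket depends only on the flag and the user, raising the percentage from 10 to 50 keeps every early user enabled and only adds new ones, which is what makes a gradual rollout (and a quick rollback to 0) predictable.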
“When I talk to developers and technical people, they often think the software pipeline stops post-deployment,” said Molenveld. “The build is green, a deployment happened and it’s running somewhere — but business people feel the headaches when something doesn’t work right in production. They may not know the intricacies of K8s ingress or whatever, but they want to be involved, see a feature released to a user segment and see how it works.”
There are two styles of shift-right practices happening here — one where you are basically poking and monitoring production, then responding to remediate observed issues, and another where you gather telemetry and data from production to better inform pre-production phases of shift-left testing and development.
“Ultimately, we’re shooting for testing all along our lifecycle — it doesn’t start and end, it goes on and on,” Coty Rosenblath, CTO of Katalon Inc., said at the Kobiton Odyssey 2021 conference. “It gives us continuous learning throughout the product lifecycle. A canary release isn’t just about finding bad things, it is about getting a new release into production safely. You’re not just doing testing, you are putting real load on the system in a controlled fashion. You are also testing new infrastructure.”
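The canary idea can be sketched as a simple comparison between the existing release and the new one. This is a hedged illustration rather than a description of how any tool named here works: the `WindowStats` type, the `canary_decision` function and both thresholds are hypothetical, and a real rollout would weigh more signals than error rate.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request and error counts observed over one monitoring window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero when no traffic has arrived yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: WindowStats,
    canary: WindowStats,
    min_requests: int = 500,          # assumed minimum sample size
    max_rate_increase: float = 0.01,  # assumed tolerated error-rate increase
) -> str:
    """Return 'wait', 'rollback' or 'promote' for the canary release."""
    if canary.requests < min_requests:
        return "wait"  # not enough real traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_rate_increase:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = WindowStats(requests=10_000, errors=50)
    print(canary_decision(baseline, WindowStats(requests=1_000, errors=30)))
```

The "wait" outcome matters as much as the other two: the canary only earns a verdict after it has carried a meaningful amount of real load.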
Production, with all of its integrations and intricacies, can still be thought of as a single environment that can never be duplicated entirely, one that delivers a stream of features to an end-user application.
“There’s still a reliance on really good test coverage, but for an executive, they want to reduce all the points of friction that can delay a feature release,” said Ben Rometsch, founder and CEO of Flagsmith. “It’s a bad tradeoff if either everything has to be perfect and nothing gets released, or you have a catastrophe every 6 months. Or, you can agree to have a different set of challenges that are more of a net benefit, so even if something slips into production you can track, remediate and redeploy. We’re finding people using feature flags in different ways, even dynamically configuring the application design.”
Driving right with microservices and cloud
The widespread use of cloud computing, along with declarative infrastructure-as-code and the advent of containerization, microservices and APIs, and now the adoption of Kubernetes as a cloud-native reference architecture, has become a major enabler for shift-right.
In another sense, Kubernetes is by design a fast-changing open-source project among many other interdependent projects that are also subject to the chaos of contribution.
“We would never tell anyone that monitoring Kubernetes alone is good enough — it’s far from a perfect measure of success,” said Ofer Idan, CTO of StormForge. “Let’s say Google comes out with a pilot of their new K8s HPA and a few weeks later, boom — it’s in GA and developers are using it. We believe you should shift-left testing on the Kubernetes components themselves, and find out what it works well on and what its scaling limits are. Then you can deploy and compare how it works in production.”
What has changed because of elastic cloud storage and compute is that we can now gather and process far more telemetry and event data to feed observability and automate testing and remediation activities than ever before.
“Containerization and APIs have made the application a lot more modular — we no longer have to take down the entire system to make a change — so you’ll notice companies are seldom keeping maintenance windows anymore,” said Speedscale’s Ahrens. “This is one of the drivers for shift-right, since it’s very hard to build a production environment in pre-production. But we can still bring in, store and process all of this telemetry and API data in ephemeral cloud instances.”
“Leading businesses are using telemetry data in the plan and build phases in order to inform where to focus their efforts on customer experience optimization,” added New Relic’s Brewer. “In this sense, they are pulling telemetry to the left. Where observability is concerned, we have found that the companies doing the best job at digital are increasingly blurring the lines between what is on the left and what is on the right.”
Learning to live with a shifting surface
One of the key stumbling blocks to shifting right is the intimidation that comes from comparing one’s own IT delivery function’s rate of innovation with that of best-in-class industry peers.
“This is a team effort — I’d feel sorry for whomever owns both shift-left and shift-right,” said Idan. “Allowing teams to collaborate in the right way means the developer doesn’t just provide input to the performance test, and SREs do more than just monitoring the infrastructure.”
“A lot of enterprise organizations look at a company like Netflix that can deploy 50 times a day or more, and they are doing 4 releases a year and are taking those first baby steps with feature flags, which are one part of a whole catalog of capabilities,” said Flagsmith’s Rometsch. “It’s more a philosophy about ‘this is the desired customer state we should be aiming toward’ when 99.5% of a consumer app’s code is benign. If .5% of the code teams put in goes crazy when it starts to appear in production, you can spot it and think about that.”
There are interesting things to take back to shift-left from the idea of shift-right. One is using observability data and alerts from production to inform design and testing. Another is assuming that code and configuration errors will always escape into production, which argues for faster, more graduated delivery along with faster reaction time.
“It’s really obvious, start small with shift-right,” said Gremlin’s Horgan. “You can experiment running one small chaos experiment and control your blast radius, and then scale it out as you gain confidence in how your systems react.”
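Controlling the blast radius can be as simple as limiting how many hosts an experiment is allowed to touch. The sketch below is a generic illustration and is not tied to any vendor’s tooling: `pick_targets`, the host names and the percentage parameter are all hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def pick_targets(
    hosts: Sequence[str],
    blast_radius_percent: float,
    seed: Optional[int] = None,
) -> list[str]:
    """Choose a random subset of hosts for a chaos experiment.

    The subset size is the given percentage of the fleet, rounded down,
    but never fewer than one host so the experiment still runs.
    """
    if not 0 < blast_radius_percent <= 100:
        raise ValueError("blast_radius_percent must be in the range (0, 100]")
    if not hosts:
        return []
    count = max(1, math.floor(len(hosts) * blast_radius_percent / 100))
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return sorted(rng.sample(list(hosts), count))


if __name__ == "__main__":
    fleet = [f"host-{i:02d}" for i in range(20)]
    # Start small, then widen the radius as confidence grows.
    for percent in (5, 10, 50):
        print(percent, pick_targets(fleet, percent, seed=42))
```

Starting at a low percentage and widening it step by step mirrors the advice above: one small experiment first, then scale out.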
“The reality is, to be resilient in today’s environments, you had better be really good at observability, and knowing exactly what’s going on out there in production,” said Jacob Smith, vice president of bare metal strategy at Equinix Inc.
The Intellyx take
There was never really a contest here. Shift-left, the incumbent practice of well-run software delivery organizations everywhere, is here to stay.
That said, shift-right will continue to grow, like the yin to the yang of shift-left. The ability to observe what real users are actually doing with software in production, and understand how the whole system operates under pressure, is priceless, and it can actually feed valuable data and customer requirements back to shift-left practices in a virtuous cycle of improvement.
Yes, shift-right can still teach our once-hardened software delivery processes and systems a few new stretches.
Jason English is principal analyst and chief marketing officer at Intellyx LLC, an analyst firm that advises enterprises on their digital transformation initiatives, and publishes the weekly Cortex and BrainCandy newsletters. He wrote this article for SiliconANGLE. (* Disclosure: New Relic is an Intellyx customer, Gremlin and Kobiton are former Intellyx customers and the author advises Speedscale. No other parties mentioned in this story are Intellyx customers.)
Image: Sayam Gearspec, Erik F. Brandsborg/Flickr