Introduction to DevOps and Software Delivery Performance

Published on March 22, 2021

To outcompete in the market, technology organizations must accelerate, with faster delivery of delightful features, faster feedback loops with customers, and faster responses to the competition.

At the same time, organizations must continue to maintain a high bar for quality, ensuring their software is efficient, performant, and reliable.

How can technology organizations increase their throughput without sacrificing quality?

Leading technology researchers Dr. Nicole Forsgren, Jez Humble, and Gene Kim sought to answer this question by investigating what sets high-performing technology organizations apart.

Over the past six years, their organization DevOps Research & Assessment (DORA) has collected in-depth survey data from over 31,000 professionals and 2,000 organizations worldwide.

Their research has identified concrete performance measurements for quantifying software delivery performance. From these, a statistical cluster analysis was performed to identify key capabilities of organizations that made a significant impact upon these key metrics. These capabilities are enumerated and described in Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations.

The results of their work are illuminating: there are clear correlations between key software delivery capabilities and overall organizational performance.

Introducing software delivery and operational performance

The researchers identified four key software delivery metrics which positively impacted both company performance and employee engagement.

By focusing and improving upon these metrics, organizations are shown to achieve:

Larger market share and greater profitability relative to competitors
Higher rates of productivity
Greater employee satisfaction, stronger employee NPS, and lower rates of burnout

The four key software delivery metrics are:

Delivery Lead Time - Time it takes to go from committed code to successfully running in production
Deployment Frequency - How frequently a team deploys working software to production
Time to Restore Service - The average time it takes to restore service during a service outage
Change Fail Rate - How often deployment failures occur that require immediate remedy (rollbacks)

Firms that invested in improving these four key software delivery metrics became more effective organizations by increasing throughput and improving product stability.
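To make the four metrics concrete, here is a minimal sketch of how a team might compute them from its own deployment records. The `Deployment` record and its field names are hypothetical, not something defined by the research. The sketch also assumes an outage starts at the moment of the failed deployment and uses simple averages, which may not match how a given team chooses to report these numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt the fields to your own tooling
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # needed an immediate remedy (e.g. rollback)
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _average(deltas: list) -> Optional[timedelta]:
    """Average a list of timedeltas, or return None for an empty list."""
    if not deltas:
        return None
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def delivery_metrics(deployments: list, period_days: int) -> dict:
    """Summarize the four key metrics over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the outage begins when the failed deployment goes live
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "delivery_lead_time": _average(lead_times),
        "deployment_frequency_per_day": len(deployments) / period_days,
        "time_to_restore_service": _average(restore_times),
        "change_fail_rate": len(failures) / len(deployments),
    }
```

Tracking these numbers over successive periods, rather than as a one-off snapshot, is what shows whether throughput and stability are actually improving.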

Investing in capabilities that drive improvement

Over the course of their research, the team identified more than 30 capabilities that correlate with these four key software delivery metrics. Improvements in any of these capabilities have a measurable impact on organizations.

These capabilities are multidimensional and dynamic. This means they can be tackled in any order, and companies can customize a unique improvement journey based on the context of their organization.

At Stride, we help clients develop many of these key capabilities. Below, you can read a quick introduction on each capability identified by the research, as well as resources for digging deeper into particular topics.

If you’d like to learn more about how we can help, please reach out. We would love to talk!

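Technical Capabilities

Version Control
Trunk-Based Development
Continuous Integration
Deployment Automation
Continuous Delivery
Loosely-Coupled Architecture
Shift Left on Security
Database Change Management
Empower Teams to Choose Tools
Test Data Management
Code Maintainability
Cloud Infrastructure

Process Capabilities

Team Experimentation
Streamlining Change Approval
Gather Customer Feedback
Make Work Visible
Work in Small Batches

Measurement Capabilities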

Monitor Systems to Inform Business Decisions
Proactive Failure Notifications
Visual Management
Limit Work in Process (WIP)
Monitoring & Observability

Cultural Capabilities

Job Satisfaction
Generative Organizational Culture
Learning Culture
Transformational Leadership

Technical Capabilities

Version Control

Version control tools like git help teams track, manage, and change source code and other technical assets. High-performing technical organizations are skilled at using version control to integrate code, isolate changesets that cause defects, and trace dependencies.

To improve version control practices, organizations can:

Choose a single hosted version control provider, like GitHub or Bitbucket
Commit all major development assets for each application/service to a shared version control repository, including: source code, dependencies, scripts, environment variables, and test scripts
Make sure that developers can set up a production-like development environment using only assets checked into version control

Trunk-Based Development

In trunk-based development, engineers divide feature work into small batches and merge those small batches into their version control system’s trunk (in git, usually the main branch) multiple times per day. This contrasts with long-lived feature branch development, where multiple developers work for days or weeks on a separate feature branch and then merge their completed work all at once. With trunk-based development, developers can work faster, with less complexity when merging code and no need for “code freezes”.

To implement trunk-based development, organizations can:

Maintain a test suite that executes quickly and continuously as developers work on new features on their local machines
Implement lightweight code-review processes that are performed synchronously to ensure small batches are committed quickly. Changes should take minutes to get merged into trunk, not hours or days
Require automated tests, static analysis, and builds to succeed on a continuous integration system before allowing code to be integrated into the trunk

Continuous Integration

Continuous Integration is the practice of ensuring that software is always in a working state. This is done by running automated tests, static analysis, and build scripts on every commit and requiring that these checks pass.

To empower teams to continuously integrate, organizations can:

Encourage teams to check in code regularly
Maintain an automated test suite and build scripts that execute quickly
Run automated checks before and after code is integrated to the mainline
Make broken builds visible to the team, so they can be quickly fixed
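As an illustration of the "every check must pass" rule, the sketch below runs a set of check commands and reports which ones failed. The function name `run_checks` and the check names are made up for this example. A hosted CI system would normally provide this behavior through its own pipeline configuration, so treat this as a picture of the idea rather than a replacement for one.

```python
import subprocess
import sys


def run_checks(checks: dict) -> list:
    """Run every check command and return the names of the checks that failed.

    `checks` maps a check name to the command (a list of arguments) that runs it.
    All checks are run, so the team sees every broken check at once.
    """
    failed = []
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Placeholder commands; a real project would invoke its own test runner,
# static analysis tool, and build script here
example_checks = {
    "tests": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "build": [sys.executable, "-c", "print('build ok')"],
}
```

A commit would be considered integrable only when `run_checks(example_checks)` returns an empty list; a non-empty list is the "broken build" that should be made visible to the team.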

Deployment Automation

Deployment automation is the practice of combining all manual deployment steps into a single deployment script capable of deploying to any environment, including production. Deployment automation makes for safer, faster, and more standardized deployments.

To empower teams to practice deployment automation, organizations can:

Standardize deployment processes across environments
Maintain a test suite with unit, integration, and end-to-end tests
Use version control to keep track of deployment configurations, databases, and dependencies
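One way to picture a single script that deploys to any environment is a function that produces the same ordered steps everywhere and varies only the parameters. The environment names, hosts, and step wording below are hypothetical placeholders, and the sketch only builds the plan rather than executing anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    host: str
    replicas: int


# Hypothetical environment definitions, kept in version control with the script
ENVIRONMENTS = {
    "staging": Environment("staging", "staging.example.internal", replicas=1),
    "production": Environment("production", "prod.example.internal", replicas=3),
}


def deployment_plan(env_name: str, version: str) -> list:
    """Return the ordered deployment steps for one environment.

    Every environment goes through the same steps; only the parameters differ.
    """
    env = ENVIRONMENTS[env_name]  # raises KeyError for an unknown environment
    return [
        f"build artifact {version}",
        f"run test suite against {version}",
        f"upload {version} to {env.host}",
        f"start {env.replicas} replica(s) of {version} on {env.name}",
        f"run smoke tests on {env.name}",
    ]
```

Because staging and production share one code path, a deployment that succeeds in staging has already exercised the steps that will run in production.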

Continuous Delivery

Continuous delivery ensures that any checked in code could be deployed successfully at a moment’s notice. Continuous delivery is a team-wide mindset that must be actively maintained. When trouble in a non-production environment occurs, teams should focus efforts to fix the issue so that the mainline can be returned to a releasable state.

To empower teams to practice continuous delivery, organizations can:

Value stream map (VSM) the deployment process and look for bottlenecks to address
Maintain an automated test suite with unit, integration, and end-to-end tests
Ensure security is built into every feature
Utilize monitoring and notifications to identify errors and deficiencies
Use version control to keep track of deployment configurations, databases, and dependencies
Reduce dependencies between applications through loosely coupled architecture

Loosely-Coupled Architecture

A loosely coupled architecture can be defined as a system of components separated by responsibilities, in which each component has minimal dependencies on other components. Loosely coupled architectures yield components that can be tested and deployed independently of one another. These characteristics make it easy to validate and change a component without depending on other components. The outcomes of this independence include reduced or eliminated cross-team communication bottlenecks, team autonomy, fast feature delivery, short feedback loops, and a scalable engineering organization.

To implement loosely coupled architecture, organizations can:

Invest in finding the most appropriate bounded contexts for the domain
Align team and organization structure to fit bounded contexts
Empower teams to test and deploy their components/services independently of each other

Shift Left on Security

Information security is often reviewed towards the end of the development process, when required changes are more expensive and time-consuming to incorporate. High-performing teams integrate security into the development process and are aware of common security risks. This concept is called “shifting left”, as the security concerns are addressed earlier in the software development lifecycle.

To shift left on security, organizations can:

Fail builds when automated security scans detect vulnerabilities in third party software
Encourage developers to “build in security” by writing tests that address common security risks
Collaborate with InfoSec teams early in the product development process so they can identify security gaps before code is written
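Failing a build on scan results can be as simple as turning the scanner's findings into an exit code. The report format below (a list of dictionaries with `package` and `severity` keys) and the severity scale are assumptions made for illustration; real scanners each have their own output format and severity levels.

```python
# Assumed severity scale, ordered from least to most severe
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def blocking_findings(findings: list, fail_on: str = "high") -> list:
    """Return the findings at or above the `fail_on` severity."""
    threshold = SEVERITY_ORDER.index(fail_on)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= threshold]


def build_exit_code(findings: list, fail_on: str = "high") -> int:
    """Return 1 (fail the build) if any finding meets the threshold, else 0."""
    return 1 if blocking_findings(findings, fail_on) else 0


# Hypothetical scan report for two third-party packages
example_report = [
    {"package": "example-logging-lib", "severity": "low"},
    {"package": "example-tls-lib", "severity": "critical"},
]
```

Choosing the threshold is a team decision: failing on every low-severity finding can generate enough noise that developers start ignoring the gate.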

Database Change Management

Database change management ensures that database changes don’t slow down or prevent deployments. This practice contributes to continuous delivery, as it enables code changes that interact with the database to be deployed on demand. Organizations often need to manage databases across multiple environments, such as development, staging, testing, and production. Migrating databases in each of these environments should be easy to automate and painless for developers so that developers can focus on feature delivery.

To manage database changes, organizations can:

Use version control to track database schema changes in the same repository as application code
Record which environments have changed and the results of those changes
Collaborate with database administrators (DBAs) when implementing changes, if applicable to your organization
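A common way to automate this is an ordered list of migrations plus a bookkeeping table that records which ones a given database has already received. The sketch below uses SQLite from the Python standard library; the table name `schema_migrations` and the two example migrations are illustrative choices, not part of the research.

```python
import sqlite3

# Ordered schema changes; in practice each would be a file checked into
# the same repository as the application code
MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    ("002_add_user_name", "ALTER TABLE users ADD COLUMN name TEXT"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply pending migrations in order and return the names that were applied."""
    # Bookkeeping table: records what has been applied to this database and when
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied
```

Running `migrate` against each environment's database brings it to the same schema, and running it a second time is a no-op, which is what makes it safe to call on every deployment.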

Empower Teams to Choose Tools

Teams are most effective when they are given the freedom to choose the tools they are most comfortable with and believe are best for the job. At the same time, organizations must balance the complexity and technical debt that accrues from supporting multiple sets of tools for a given job. To strike this balance, high-performing organizations manage an approved set of tools and empower teams to choose from the basket of options.

To empower teams to choose tools, organizations can:

Develop a list of preferred and approved tools, including programming languages, testing and deployment tools, and backend systems
The preferred tools list should have a variety of options for each use case to provide flexibility and optionality for practitioners
Clearly define a process for selecting a tool outside the preferred list, and periodically review the set of tools within the preferred list

Test Data Management

Test data management describes the process of establishing high-quality non-production data to rigorously test an application. This allows a team to automatically test behavior and prevent regressions. There are several considerations to take into account when managing test data: reducing dependencies on external data, eliminating data defined outside of the test scope, and avoiding copying sensitive production data into tests. Having a suite of unit, integration, and end-to-end tests helps make good use of well-established test data.

To empower teams to manage test data, organizations can:

Promote unit testing
Eliminate external and excess data from unit tests
Make test data easy to access and understand
Isolate test data so tests run in their well-defined environments

Code Maintainability

Over time, as a project’s scope and team change, maintainability helps ensure quick feature delivery and high product availability. A maintainable codebase exists when it is easy for team members to:

Keep the product running smoothly
Understand the system
Extend the system with future features

To empower teams to write maintainable code, organizations can:

Leverage tools like Dependabot that help keep dependencies up to date, with the latest security and features
Write tests to improve code design, prevent regressions, and document user experiences
Push for elegance and simplicity in both code and product design
Invest in making the right abstractions by refactoring and removing unused code
Create a deterministic build process so that builds behave the same across machines and any bugs that arise are easy to find and fix

Cloud Infrastructure

The five essential characteristics of cloud computing, according to the US National Institute of Standards and Technology (NIST), are:

On demand self-service: consumers provision computing resources as needed
Broad network access: capabilities can be accessed from a variety of platforms (mobile phones, tablets, laptops, etc.)
Resource pooling: provider resources are pooled in a multi-tenant model
Rapid elasticity: capabilities can be elastically provisioned and released to rapidly scale on demand
Measured service: cloud systems can automatically control, optimize, and report resource usages

Adopting all of these characteristics enables teams to have faster throughput, more stability, better service availability, and improved cost visibility.

To empower teams to use cloud infrastructure, organizations can:

Choose a cloud infrastructure provider that is appropriate for the organization's size and product complexity
Provision cloud infrastructure using code that is checked into version control
Ideally, use a declarative language that provides clear change sets and which can be automated
Align teams and individuals who will be implementing cloud infrastructure so that there is an understanding of the end goal and how to reach it
Invest in infrastructure monitoring and build capabilities to respond to outages
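The appeal of a declarative approach is that the tool can compare what exists with what the code says should exist and show the difference before anything is changed. The sketch below imitates that comparison with plain dictionaries. The resource names and attributes are invented, and real infrastructure-as-code tools do far more, such as ordering dependent changes.

```python
def change_set(current: dict, desired: dict) -> dict:
    """Compare current and desired resources and list what would change.

    Both arguments map a resource name to a dictionary of its settings.
    """
    return {
        "create": sorted(desired.keys() - current.keys()),
        "delete": sorted(current.keys() - desired.keys()),
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }


# Hypothetical state: what is running now versus what version control declares
current_state = {
    "web": {"size": "small", "count": 2},
    "cache": {"size": "small", "count": 1},
}
desired_state = {
    "web": {"size": "small", "count": 4},
    "queue": {"size": "small", "count": 1},
}
```

Here `change_set(current_state, desired_state)` reports that `queue` would be created, `cache` deleted, and `web` updated, which is the kind of reviewable summary a team can attach to a pull request.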

Process Capabilities

Team Experimentation

High-performing technical organizations empower teams to work independently. Each team is given an agreed-upon business outcome to achieve and works collaboratively to achieve it.

The team is free to design, define, and adjust their own specifications without having to get permission from outside the team. The team then experiments and tests their solutions and determines if their work will achieve the outcome or solve the problem.

To improve team experimentation, organizations can:

Encourage teams to iterate and continually improve solutions
Empower teams to regularly talk to customers
Hold hackathons and other cross-functional events to inspire divergent thinking

Streamlining Change Approval

Traditionally, change approvals and security reviews occur at the end of the development process, when code is feature complete. However, research indicates this is a costly and time-consuming process, and incorporating change approvals during the development process increases throughput without sacrificing quality. Additionally, when teams have a clear understanding of the change approval process, they are more likely to follow it.

To streamline change approvals, organizations can:

Move to just-in-time peer-review approval processes that happen during code check-in
Support just-in-time reviews with automated tests. Invest in a suite of highly-performant and reliable tests
Move repeated security and compliance checks to the platform and infrastructure layer

Gather Customer Feedback

High-performing organizations understand the importance of “getting out of the building” and interacting with their customers. Teams achieve higher performance and develop higher-quality products when they regularly collect customer satisfaction information, listen to customer feedback on products and features, and incorporate this feedback into the product development process.

To successfully incorporate customer feedback, organizations can:

Conduct customer discovery interviews early in the software development process to better understand the customer’s problem
Encourage teams to regularly schedule time to collect feedback from customers and provide ample budgets to adequately compensate customers who spend time providing feedback
Empower teams to act quickly on customer feedback, giving them control to make changes to designs or specifications in response to what they learn from customers

Make Work Visible

Work visibility represents how well teams understand the flow of work throughout the business, from ideation to prioritization to production. Teams with high work visibility both understand the workflows and have visibility into the current status of work through the system.

To make work visible, organizations can:

Ensure work is shown on visual displays and dashboards and that these visualizations are available broadly throughout the organization
Use value stream mapping to understand how work requests flow through the organization
Value stream maps highlight lead time (time through each step) and process time (time to complete work if focused and uninterrupted). Use these metrics to identify bottlenecks in the process
When creating value stream maps, make sure to gather individuals from across the organization to get a comprehensive picture
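Once lead time and process time have been gathered for each step, finding a bottleneck is mostly arithmetic. The sketch below is one possible way to do it: it treats the step with the most waiting time (lead time minus process time) as the bottleneck, and reports the ratio of process time to lead time under the name "flow efficiency". Both of those choices, and the example step names, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    lead_time_hours: float     # total elapsed time through the step
    process_time_hours: float  # time needed if the work were focused and uninterrupted

    @property
    def wait_time_hours(self) -> float:
        """Time the work spent waiting rather than being worked on."""
        return self.lead_time_hours - self.process_time_hours


def summarize(steps: list) -> dict:
    """Total the value stream and pick out the step with the most waiting time."""
    total_lead = sum(s.lead_time_hours for s in steps)
    total_process = sum(s.process_time_hours for s in steps)
    bottleneck = max(steps, key=lambda s: s.wait_time_hours)
    return {
        "total_lead_time_hours": total_lead,
        "total_process_time_hours": total_process,
        "flow_efficiency": total_process / total_lead,
        "bottleneck": bottleneck.name,
    }


# Hypothetical measurements from a mapping session
example_steps = [Step("design", 16, 8), Step("code review", 24, 1), Step("deploy", 4, 2)]
```

In this example, code review takes an hour of focused work but a full day of elapsed time, so it is where the team would look first.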

Work in Small Batches

Working in small batches is an organizing principle from lean product development in which work is organized into the smallest possible units of value and delivered incrementally. This allows for tighter feedback loops and allows the team to course-correct and revisit assumptions made during the development process. High-performing teams build this mindset into their planning processes, slicing up work into small, independent, vertical slices that can be developed quickly.

To begin working in small batches, organizations can:

Focus on breaking up work into the smallest possible pieces
Divide the slices of work into sections that are independent of the overall status of the feature and can be completed in a few days. A good rule of thumb: code is checked in daily (see trunk-based development)
When building out new features, ensure the work is independent, negotiable, valuable, estimable, small, and testable
Use feature toggles to activate completed components when ready
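A feature toggle lets unfinished work be merged and deployed while staying invisible to users until it is switched on. The sketch below is a deliberately small in-memory version; the class name, the `bulk_discount` toggle, and the pricing rule are all invented for the example, and a production system would more likely read its toggles from configuration or a dedicated service.

```python
from typing import Optional


class FeatureToggles:
    """In-memory toggle store; a stand-in for config files or a toggle service."""

    def __init__(self, flags: Optional[dict] = None):
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished work stays hidden
        return self._flags.get(name, False)

    def enable(self, name: str) -> None:
        self._flags[name] = True


def checkout_total(prices: list, toggles: FeatureToggles) -> float:
    """Total an order, applying the new discount only when its toggle is on."""
    total = sum(prices)
    # Hypothetical feature: 10% off orders of three or more items
    if toggles.is_enabled("bulk_discount") and len(prices) >= 3:
        total *= 0.9
    return round(total, 2)
```

The discount code can be checked in daily in small pieces, and the behavior users see changes only when someone calls `enable("bulk_discount")`.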

Measurement Capabilities

Monitor Systems to Inform Business Decisions

Organizations create monitoring systems to display relevant business information and reference this information when making key decisions. When practiced well, monitoring gives you rapid feedback and allows you to quickly fix issues early in the software development process.

To apply monitoring systems, organizations can:

Ensure applications collect and record business data in each major step in the value chain.
Make business decisions informed by the data. Organizations should ensure the data is available and distributed to individuals.

Proactive Failure Notifications

High-performing organizations don’t rely on customers or staff to notify them when an application or service is down. Instead, they configure proactive alerting rules with performance thresholds that notify developers as performance starts to degrade, but before the system goes down entirely. This allows the team to catch and resolve issues ahead of time, before they ultimately impact end users.

To implement proactive failure notifications, organizations can:

Set up alerting rules that specify the conditions under which an alert is created and the channels through which the notification is sent (email, text, Slack, etc.)
Configure alerts to trigger when a value begins to rise toward a level that will ultimately cause user-facing impact
Create alerts for important problems. If a team is inundated with too many alarms or alerts, they will become desensitized and be less likely to respond appropriately
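The idea of alerting on degradation before an outage can be expressed as a rule with two thresholds: an early warning level and the level at which users are expected to feel the problem. The rule shape, the metric name, and the threshold values below are hypothetical; monitoring products each have their own rule syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    metric: str
    warn_at: float      # degradation threshold: notify developers early
    critical_at: float  # level at which users are expected to be affected
    channel: str        # where the notification goes, e.g. "email" or "slack"


def evaluate(rule: AlertRule, value: float) -> Optional[dict]:
    """Return a notification for the given reading, or None if it is healthy."""
    if value >= rule.critical_at:
        severity = "critical"
    elif value >= rule.warn_at:
        severity = "warning"
    else:
        return None  # healthy readings produce no alert, which keeps noise down
    return {
        "metric": rule.metric,
        "severity": severity,
        "value": value,
        "channel": rule.channel,
    }


# Hypothetical rule: warn well before latency reaches a user-visible level
latency_rule = AlertRule("p95_latency_ms", warn_at=400, critical_at=1000, channel="slack")
```

The gap between `warn_at` and `critical_at` is the team's window to act, so it is worth tuning against real traffic rather than guessing.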

Limit Work in Process (WIP)

Work in process (WIP) limits set the maximum amount of work that can exist in each step of a workflow. High-performing teams focus on completing a small number of high-priority tasks and use limits to ensure they are not scattering their focus. With WIP limits, throughput through the system is increased and the amount of work “nearly done” is decreased, fostering a culture of things getting done.

To limit work in process, organizations can:

Create a storyboard that displays all the work the team is doing on a board, along with columns that display the status of the work (ready, in progress, completed)
Empower teams to set a reasonably low WIP limit and wait for a card to move into the next column before pulling a card from the previous column
Make sure that all work is represented (no “invisible work”).
When teammates become idle due to a WIP limit, have them help out in other areas of the value stream instead of exceeding the WIP limit. As a rule of thumb, individuals shouldn’t be splitting their time between multiple tasks
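The pull rule described above can be sketched as a board that refuses to accept a card into a column that is already at its limit. The class and method names here are invented for the example, and most teams would get this behavior from their project-tracking tool rather than writing it themselves.

```python
class WipLimitExceeded(Exception):
    """Raised when a card is pulled into a column that is already full."""


class Board:
    def __init__(self, columns: list, wip_limits: dict):
        self.columns = {name: [] for name in columns}
        self.wip_limits = wip_limits  # columns without an entry are unlimited

    def can_accept(self, column: str) -> bool:
        """True if the column has room for one more card."""
        limit = self.wip_limits.get(column)
        return limit is None or len(self.columns[column]) < limit

    def add(self, column: str, card: str) -> None:
        if not self.can_accept(column):
            raise WipLimitExceeded(f"{column!r} is at its WIP limit")
        self.columns[column].append(card)

    def move(self, card: str, source: str, target: str) -> None:
        if card not in self.columns[source]:
            raise ValueError(f"{card!r} is not in {source!r}")
        self.add(target, card)  # raises before the card leaves its column
        self.columns[source].remove(card)


# Hypothetical board with a limit of two cards in progress
board = Board(["ready", "in progress", "completed"], {"in progress": 2})
```

With two cards already in progress, a third cannot be pulled until one of them moves to completed, which nudges an idle teammate toward helping finish existing work.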

Visual Management

A cornerstone lean development practice, Visual Management ensures that organizations have key information readily available to develop a shared understanding of the current state of operations. Teams visualize their work in progress with storyboards, create dashboards to monitor systems, and create burn-up or burn-down charts to document the cumulative status of all work being done. By doing so, information flows more freely through the organization, and teams become more aligned with what needs to be done.

To apply visual management, organizations can:

Encourage teams to co-create easy-to-understand, digestible information displays. The information presented should include details that are important or relevant to the team
The best visual displays are ones that are regularly referred to during the course of a working day. If a metric is not regularly used by the team, it should not be there
Teams should evolve and modify their displays to address the issues they are facing in the present, tailoring them to the unique challenges of the moment

Monitoring and Observability

Monitoring and observing technical systems are integral capabilities for high-performing technology organizations.

Monitoring - tooling which enables teams to view and comprehend the status of a system through the collection and creation of metrics and logs.
Observability - ability for teams to debug their system by exploring patterns in system data.

Effective monitoring and observability solutions report on the overall health of the system, display the customer-facing status, monitor key business metrics, and provide tooling systems to enable teams to debug and uncover issues.

To implement monitoring & observability, organizations can:

Track internal metrics to understand the effectiveness and usability of your monitoring system. How often are changes made to the monitoring configuration? What percentage of alerts are false positives? What is the mean time to recover when an issue arises?
Develop a retrospective and postmortem culture to review outages. What caused the outage? How might we improve our monitoring systems to catch the outage ahead of time?
Invest in monitoring system tooling. Check the tooling into version control, and treat it as much of a first-class object as your application code

Cultural Capabilities

Job Satisfaction

Employees who are engaged, who find their work challenging, meaningful, and fulfilling, drive positive operational performance. High-functioning organizations focus on employee job satisfaction, even though it is hard to quantify and measure, by having conversations with employees and discovering what they find meaningful.

To improve job satisfaction, organizations can:

Empower employees to choose their own tools and develop their own workflows to accomplish organizational outcomes
Encourage managers to help employees find meaningful work that taps into their special skills and expertise. Meaningful work is very personal and unique to each individual, so in-depth conversations and discussions are necessary to find meaning and purpose in the workplace

Generative Organizational Culture

Organizational success relies on fast-flowing streams of “good information”: timely, relevant, and effectively presented data. Sociologist Dr. Ron Westrum researched complex organizations, including those in the airline and healthcare industries, and found that good information flows more freely in high-trust, psychologically safe learning environments.

Westrum classifies organizational cultures as:

Pathological (power-oriented) - Low cooperation; large amounts of fear; information is hoarded
Bureaucratic (rule-oriented) - Moderate cooperation; moderate amounts of fear; information sharing is tolerated
Generative (performance-oriented) - High cooperation; low amounts of fear; information flows freely

To foster a generative culture, organizations can:

Foster high cooperation within cross-functional teams that are responsible for fixed business outcomes
Treat failures as learning opportunities, not opportunities to assign blame. Hold blameless postmortems, uncover the fundamental causes of the failures, and develop an environment that makes it safe to fail
Encourage experimentation and give employees the freedom to explore transformational ideas

Learning Culture

A learning culture is “a culture that supports an open mindset, an independent quest for knowledge, and shared learning directed toward the mission and goals of the organization.” Workers inside a learning culture have a “growth mindset”, are willing and able to learn, and are excited to share their knowledge with others. High-performing organizations treat the creation of a learning culture as a strategic and competitive advantage.

To develop a learning culture, organizations can:

Reward continuous learning by creating training budgets and advocating the use of these budgets. When individuals attend an external training, have them share their new knowledge by presenting what they learned to the team
Build psychological safety and make it safe for employees to fail. If failures are seen as learning opportunities, they’ll be more comfortable innovating and exploring their curiosity
Set up regular opportunities for knowledge sharing, including lunch-and-learns, cross-team socials, and lightning talks

Transformational Leadership

Transformational leadership is defined as the ability to “motivate followers to achieve performance beyond expectations by transforming followers’ attitudes, beliefs, and values as opposed to simply gaining compliance.” Transformational leaders have vision, use inspirational communication, nurture intellectual stimulation, practice supportive leadership, and administer personal recognition. And while leadership characteristics alone cannot drive organizational performance and outcomes, they are a force multiplier that helps teams amplify their impact and provide teams with emotional “cover” to implement other capabilities.

To implement transformational leadership, organizations can:

Ask intellectually-stimulating questions, fostering a safe environment to have dynamic, innovative conversations
Live by the golden rule. Supportive leadership considers the personal feelings of others before taking action, incorporating the interests and needs of team members
Set audacious goals and empower the team to move on them, working to remove obstacles to achieve these goals by implementing other capabilities on this list

Stride helps organizations improve their ability to deliver, react faster to feedback, and gain greater visibility into their teams’ performance. If you’d like to learn more about how we can help make an impact across any one of these 30+ capabilities, drop us a line. We'd be glad to discuss.

And for more details about DevOps, we recommend exploring DORA's State of DevOps research program.

Stride Staff