Canary Testing Tutorial: Benefits, Process, Examples

This tutorial focuses on canary testing, its approaches, the process of running canary tests, challenges, and their solutions.

OVERVIEW

Canary testing is an approach to validate a newly added feature or a new version of a software application in a production (live) environment. Its main aim is to lower the risk of a release and limit its impact on users.

In the Software Development Life Cycle, various types of software testing approaches are executed by the DevOps team to ensure the reliability and quality of software applications. All organizations recognize the utmost importance of running tests on their software products before deploying them to end users.

Such testing is done because it demonstrates that the software meets the specified requirements of developers and end-users while also ensuring the absence of coding defects. The canary test complements the testing process and allows organizations to further validate the stability and quality of their software before a full-scale release.

DevOps teams use this approach to identify performance bottlenecks in software applications. This test observes how the software application behaves for a small chunk of end-users, similar to statistical sampling. Further, by analyzing this sample, DevOps teams can gain insights and estimates regarding the overall response.
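The sampling analogy can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a hypothetical canary sample of 2,000 requests with 12 failures and uses the Wilson score interval (one common choice, not something the tutorial prescribes) to estimate the range in which the error rate for the whole user base plausibly lies.

```python
import math


def wilson_interval(errors: int, requests: int, z: float = 1.96) -> tuple:
    """Wilson score interval for the true error rate (z = 1.96 is roughly 95%)."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    p = errors / requests
    denom = 1 + z * z / requests
    center = (p + z * z / (2 * requests)) / denom
    half = z * math.sqrt(p * (1 - p) / requests + z * z / (4 * requests ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical canary sample: 12 failed requests out of 2,000.
low, high = wilson_interval(errors=12, requests=2000)
print(f"observed {12 / 2000:.2%}, plausible range {low:.2%} to {high:.2%}")
```

A wider interval signals that the sample is too small to draw conclusions, which is one reason the canary group has to be large enough to be statistically meaningful.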

What is Canary Testing?

Canary testing serves as a risk reduction and validation method for a new software application by introducing it to a limited number of real users. The team implements canary testing to test minor changes to the software application for a specific set of end-users. This group is typically a small percentage of the larger user base. By deploying the code to this sample group, the DevOps team can identify issues in the code before they reach everyone.



It is a valuable and practical method because developers deploy the code gradually. It allows them to test new features and functionalities in production while minimizing the impact on the application's users. By limiting the exposure of the new feature, teams can validate the changes without significantly affecting the overall user experience.

Canary testing is often used interchangeably with canary release and canary deployment. However, canary testing specifically refers to releasing code in order to evaluate and test new features or versions with real users in the live production environment.

Origin of Canary Testing

The word "canary" is used in software development to describe a way of testing new software applications before releasing them to the public. The term comes from coal mining, where they used canary birds to keep miners safe. These birds were more sensitive to dangerous gases than people, so if the air became dangerous, the canaries would show signs of distress or even die, warning the miners to get out.

In canary testing of a software application, a small group of users tries out the new version first. These users don't know they are the canaries, helping to find any issues early on. If there are issues with the new code, developers fix them before letting more users try it. This way, they can make sure the software product works well for everyone and doesn't cause any major issues.


Why use Canary Testing?

Assuming you already have a process in place to test your software upgrades, you most likely utilize various techniques from the DevOps realm, such as A/B testing and blue-green deployments.

Developers create automated tests for their software's new features and modifications. The changes are deployed to a testing environment where others can explore and interact with the new features. If everything goes smoothly, the new software update is rolled out to the production environment, allowing end users to benefit from the newly added feature.

However, given the nature of software, bugs tend to slip into production. It is impossible for humans to anticipate every potential edge case. Moreover, deadlines and budget constraints add to the pressure.

Canary testing is a technique that aims to limit the impact of these production bugs to a small subset of users. Traditionally, it involves having two identical production environments, although they need not be separate servers. For instance, two web applications could run on the same server.

Once you have a new release ready, you can deploy it to one of the environments. Then, you can direct a small portion of your users (around 5% is recommended) to this canary release. These users will experience the new features, while the other group will not encounter any changes.

You can then closely monitor this canary release and address any bugs. The objective is not to eliminate production bugs but to minimize their impact. If a bug exists in the new version, only 5% of your users are affected. While the bug still requires fixing, the pressure on you may be less than if all users were affected.
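One way to picture the 5% split is a deterministic assignment of users to environments. The following is a minimal sketch rather than a prescribed implementation: the function names, the user ID format, and the use of a SHA-256 hash are all assumptions made for illustration. Hashing the user ID keeps each user on the same version across requests.

```python
import hashlib

CANARY_PERCENT = 5  # share of users sent to the canary, as suggested in the text


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def environment_for(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return which of the two production environments serves this user."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


# Hypothetical user base of 10,000 IDs.
users = [f"user-{i}" for i in range(10_000)]
share = sum(environment_for(u) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")
```

Because the assignment depends only on the ID, a bug in the canary affects the same small group every time, which makes reports easier to correlate with the release.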

Benefits of Canary Testing

Testing helps to identify and address software application issues that affect user experience. Canary tests go a step further by introducing changes to the production environment to minimize or eliminate any negative impact on usability. Here are the advantages that make canary testing a valuable process:

  • Canary tests are easy to deploy. Since the new features are minor and initially targeted at a limited number of users, running canary tests is a quick and straightforward process.
  • If canary tests fail or users do not like the new software changes, it is easy to revert to the previous state.
  • The overall test process requires minimal maintenance. Developers focus on individual small functionalities, push them into production, and then move on to the next one after analyzing and running the tests.
  • Canary tests do not require system downtime, ensuring the application's continuous availability.
  • Running most canary tests only requires a simple and small infrastructure. Everything is handled internally, and the cost of fixing any issues is also low.
  • Developers have the freedom to innovate and experiment with their code. Since the new code affects only a small group of selected users, developers can explore different ideas and possibilities.
  • Users can be involved in the beta version of the software application, providing valuable feedback to address issues and take the code to production.

What Does a Canary Test Do?

A canary test addresses the following aspects to keep software applications as free of bugs as possible:

  • Network connectivity: It verifies that firewall rules are in compliance, ports are accessible, and the proxy is functioning correctly. It also monitors the ping to ensure it remains within an acceptable threshold.
  • Database and middleware: It confirms that the code meets the performance criteria established by developers and functions correctly with the available databases and middleware. Additionally, it validates the database schema and conducts data auditing to identify and address system bugs.
  • Disk quota: It checks for excessive space usage caused by logs.
  • Validity: It validates the legitimacy of logins and passwords.
  • Integrity of data references: It ensures that the reference data provided by developers is consistent and functions without issues.
  • Licenses: Canary tests verify user licenses' availability and active status, ensuring they are not expired.

If any issue arises, rolling back the changes and returning users to the original infrastructure is always possible.
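A few of the checks listed above can be sketched with the Python standard library. This is an illustrative sketch only: the thresholds, the host, and the port number (5432, commonly used by PostgreSQL) are assumptions, and a real canary would cover far more than these two checks.

```python
import shutil
import socket
from typing import Callable, Dict


def disk_usage_ok(path: str = ".", max_used_fraction: float = 0.9) -> bool:
    """Disk quota check: fail when the volume is fuller than the threshold."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return False
    return usage.used / usage.total <= max_used_fraction


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Network connectivity check: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every named check and collect the results."""
    return {name: check() for name, check in checks.items()}


results = run_checks({
    "disk quota": lambda: disk_usage_ok("."),
    "database port": lambda: port_is_open("127.0.0.1", 5432),  # assumed port
})
healthy = all(results.values())
print(results, "-> keep canary" if healthy else "-> roll back")
```

Keeping each check as a small function that returns a boolean makes it easy to add further checks, such as license or login validation, without changing the rollback decision.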

When to Use Canary Testing?

Now that you know exactly what a canary test does, it is important to understand when to execute one. As we learned in the previous section, the canary test is performed by the development team to check the functionality of a new version of the software application.

Nonetheless, it remains crucial to test the code thoroughly before deployment to prevent future issues. To this end, canary testing is performed to understand the code's capabilities comprehensively before updating the entire environment. Here are some other scenarios where you should perform canary tests:

  • Given the prevalence of regular software updates and the rapid advancements in technology, canary tests are important in minimizing downtime and ensuring optimal application performance.
  • Testing in live production mode becomes essential for applications built on legacy or third-party systems and infrastructure, as replicating such systems can be complex and expensive. Here, you can go for a canary test.
  • Applications that employ multiple independent microservices often necessitate running canary tests in live production mode.

Implementing canary tests is straightforward and helpful in the scenarios mentioned above. By running them, you can conduct canary tests and deployments successfully and gain valuable insights to improve your software application and foster innovation.

Limiting the number of users affected by software changes makes identifying and addressing any software-related errors simpler. However, subtle distinctions exist between canary deployment and canary release that may cause confusion. Let's explore these differences in the following section.

...

What Are Canary Deployment and Canary Release?

Using a canary release is an effective method for gradually introducing incremental code changes associated with adding new features or developing a new software version. This approach involves releasing the code to real users in the production environment, allowing the development team to quickly assess whether the changes yield the intended or expected results.

Furthermore, canary deployment permits developers to migrate a small portion of users to the new functionality offered in a new release. Exposing only a subset of the overall user base to the new code minimizes potential issues related to the new software. Additionally, this approach facilitates an easier rollback of a faulty release, preventing it from impacting the entire user base.

Canary Testing Approaches

To execute the canary test, two approaches are mainly implemented to achieve reliable outcomes. Here are those two approaches:

Blue-Green Deployments

Blue-green deployment is one of the most used approaches to implement the canary test. In this approach, two identical environments are set up: "blue" and "green." The existing version of the software application runs in the blue environment, while the new version is deployed to the green environment. However, a canary test uses blue-green deployment in a slightly different way.

Instead of keeping a separate environment waiting until all traffic is switched over once the deployment is done, a canary test using blue-green deployment initially switches over only a small subset of servers or nodes before proceeding with the rest.

This is how it is done:

  • A traffic router is used at the server level to split the traffic between these environments.
  • The traffic router directs a fraction of the incoming requests to the green environment while the rest continue to be routed to the blue environment.
  • This traffic division can be done based on various criteria, such as random selection, specific user groups, or predefined rules.
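The routing rules above can be sketched as a small class. This is a simplified model, not how any particular load balancer works: the class name, the `internal-staff` group, and the 5% fraction are hypothetical, and the random selection is seeded only to make the demonstration repeatable.

```python
import random
from typing import Optional


class TrafficRouter:
    """Sends a fraction of requests to green and the rest to blue."""

    def __init__(self, green_fraction: float, green_groups=(), seed: Optional[int] = None):
        if not 0.0 <= green_fraction <= 1.0:
            raise ValueError("green_fraction must be between 0 and 1")
        self.green_fraction = green_fraction
        self.green_groups = set(green_groups)  # predefined rule: always green
        self._rng = random.Random(seed)

    def route(self, user_group: Optional[str] = None) -> str:
        # Specific user groups take priority over random selection.
        if user_group in self.green_groups:
            return "green"
        return "green" if self._rng.random() < self.green_fraction else "blue"


router = TrafficRouter(green_fraction=0.05, green_groups={"internal-staff"}, seed=42)
counts = {"blue": 0, "green": 0}
for _ in range(10_000):
    counts[router.route()] += 1
print(counts)
print(router.route("internal-staff"))  # green
```

Purely random selection can move one user between versions from request to request, so session-based or user-based rules are often preferred when a consistent experience matters.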

Various configurations can be implemented for canary deployments. The easiest method involves setting up your environment behind a load balancer as usual but with a few spare nodes or servers (depending on your application's size) that are not in use. These spare nodes or servers are designated as the deployment targets for your CI/CD pipeline.

Once you build, deploy, and test these nodes, you reintroduce them to the load balancer for a limited duration and a restricted group of users. This enables you to ensure the success of the changes before repeating the process with the remaining nodes in your cluster.
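The node-by-node procedure described above can be modeled as a loop over batches. The sketch below is an assumption-laden illustration: `deploy` and `healthy` stand in for whatever your CI/CD pipeline and monitoring actually provide, and the node names are hypothetical.

```python
from typing import Callable, List, Sequence


def rolling_canary(nodes: Sequence[str], batch_size: int,
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Upgrade nodes batch by batch and stop at the first unhealthy batch.

    Returns the nodes that were upgraded and passed verification.
    """
    upgraded: List[str] = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            deploy(node)  # node is out of the load balancer while this runs
        if not all(healthy(node) for node in batch):
            break  # keep the remaining nodes on the old version
        upgraded.extend(batch)  # reintroduce the batch to the load balancer
    return upgraded


# Hypothetical cluster in which node-4 misbehaves after the upgrade.
cluster = [f"node-{i}" for i in range(1, 7)]
deployed_log: List[str] = []
done = rolling_canary(cluster, batch_size=2,
                      deploy=deployed_log.append,
                      healthy=lambda node: node != "node-4")
print("verified:", done)
print("touched:", deployed_log)
```

In this run the second batch fails verification, so only the first batch is reported as verified and the last two nodes are never touched.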

Feature Flags

Feature flags are a popular method of conducting canary tests that focuses on specific features. Instead of relying on releases, feature flags utilize code to allow development teams to activate or deactivate particular features for specific users. With the feature flag, you can limit the release to 1% of the users and monitor the key metrics, such as error rate and business metrics.



This helps to ensure that new features added to the software application do not have any negative impact. This approach is handy for business stakeholders who need to test new features before implementing them for everyone. Moreover, if any issue is detected during a canary test, you can easily disable the new feature by turning its flag off.
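A percentage-based feature flag with an off switch might look like the sketch below. It is a minimal illustration and not the API of any feature flag product: the `FeatureFlag` class, the `new-checkout` name, and the hashing scheme are all assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    name: str
    rollout_percent: float = 0.0  # 1.0 means 1% of users
    enabled: bool = True          # turning this off disables the feature for all

    def is_on(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        key = f"{self.name}:{user_id}".encode("utf-8")
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
        return bucket < self.rollout_percent * 100


flag = FeatureFlag("new-checkout", rollout_percent=1.0)
users = [f"user-{i}" for i in range(50_000)]
exposed = sum(flag.is_on(u) for u in users)
print(f"{exposed} of {len(users)} users see the feature")

flag.enabled = False  # an issue was detected: switch the feature off
print(any(flag.is_on(u) for u in users))  # False
```

Because the flag is evaluated at run time, disabling it takes effect without a new deployment, which is what makes this approach attractive for canary tests.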

Here are some common scenarios where feature flags are applied:

  • Early access programs: Stakeholders seeking feedback on early features can utilize feature flags to gather valuable insights and make necessary refinements before releasing them to all users.
  • Blocking features: In some instances, applications may need to block specific features for certain users, for example, due to regulatory requirements. Feature flags simplify this process by providing a simple toggle switch instead of building complex user-facing options.
  • A/B testing features: Features can be easily compared through A/B testing without exposing either option to the entire user base. For instance, assessing the performance of different versions of a specific feature becomes possible.
  • New vs. power users: Some features might be too intricate for new users but essential for power users. Feature flags enable the deactivation of advanced features for new users, thereby avoiding confusion while simultaneously providing these features to users who can fully leverage them.
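Several of these scenarios come down to evaluating rules against user attributes. The sketch below assumes a hypothetical `User` record with a region, a session count, and an early-access marker; real feature management platforms offer richer targeting, so treat this only as an outline of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    region: str
    sessions: int
    early_access: bool = False


@dataclass
class TargetedFlag:
    blocked_regions: set = field(default_factory=set)  # blocking features
    min_sessions: int = 0                              # new vs. power users
    early_access_only: bool = False                    # early access programs

    def is_on(self, user: User) -> bool:
        if user.region in self.blocked_regions:
            return False
        if self.early_access_only and not user.early_access:
            return False
        return user.sessions >= self.min_sessions


advanced_editor = TargetedFlag(blocked_regions={"EU"}, min_sessions=50)
print(advanced_editor.is_on(User("u1", "US", sessions=120)))  # True
print(advanced_editor.is_on(User("u2", "EU", sessions=500)))  # False
print(advanced_editor.is_on(User("u3", "US", sessions=3)))    # False
```

The region code and the session threshold are placeholders; the point is that each scenario becomes one more rule rather than a separate build of the application.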

Canary or Automated Testing First?

Prior to conducting canary tests, it is important to execute your automated tests. This step ensures that the code intended for release to your selected users is free from bugs and meets the initial quality standards. Typically, organizations already have established processes for testing software updates. Many utilize techniques such as A/B testing and leverage DevOps practices to automate the development, testing, and deployment of code modifications in the form of builds.

Once you have completed the automated testing phase for your new code and it has passed the necessary checks, you can push it to the production environment for user access. Following this, the canary test process can begin.

It is advisable to perform automated testing using a suitable test automation framework or tool. This will provide a significant level of code control while accurately documenting and presenting the test results. Selecting an automation tool that allows you to execute test cases on web and mobile applications is important, as canary tests encompass these devices and environments. For this, you can opt for cloud-based testing platforms to test on vast combinations of browsers, devices, and platforms.

LambdaTest is one of the most used AI-powered test orchestration and execution platforms that allow testing across large farms of 3000+ browser and OS combinations. With its real device cloud infrastructure, you can perform both manual and automated tests. Deploy and scale faster with its cross browser testing capabilities.

With LambdaTest's cloud-based grid, you can efficiently execute tests using frameworks such as Selenium, Cypress, Playwright, and more. Check documentation to get started with automation testing on LambdaTest.


...

Canary Testing Frameworks and Libraries

When the canary test has to be performed, you will need libraries and frameworks to streamline the test process and provide useful features. Here are some testing frameworks that you can use:

  • Kayenta: It is an open-source automated canary analysis tool developed by Netflix and Google. Kayenta reads from different metrics sources and integrates with Spinnaker as part of the continuous delivery pipeline.
  • Istio: This tool provides canary deployment functionality and helps control traffic. Istio offers fine-grained traffic control, so you can enable a canary release by leveraging its traffic routing and load balancing abilities.
  • Traffic Split: It is part of the Kubernetes service mesh ecosystem. It lets you deploy and test a canary release by splitting traffic between different versions of the software application.
  • Prometheus: It is an open-source monitoring and alerting toolkit that helps you to collect and store metrics while you perform canary tests.
  • Grafana: It is a visualization tool that will help you perform the canary test by integrating with Prometheus. Using this tool, you can create dashboards and alerts to observe the behavior of the canary deployment.

Monitoring and Observability Tools

Specialized tools are indispensable for overseeing canary releases effectively and ensuring observability throughout testing. Below are a few commonly used monitoring and observability tools:

  • New Relic: It is a comprehensive platform for monitoring that provides real-time insights into application performance, user experience, and infrastructure metrics. New Relic encompasses distributed tracing, anomaly detection, and synthetic monitoring functionalities.
  • Datadog: It is a cloud-based platform for monitoring and analytics that aids in tracking the performance of applications, servers, and other components. Datadog provides customizable dashboards, alerts, and integrations with diverse data sources.
  • Splunk: It is a widely adopted platform for managing and analyzing logs. Splunk enables centralized log collection, monitoring, and troubleshooting during canary tests, facilitating prompt detection and resolution of issues.
  • Dynatrace: It is an AI-powered observability platform that grants comprehensive visibility into the performance and dependencies of applications, microservices, and infrastructure. Dynatrace offers automated anomaly detection, root cause analysis, and intelligent alerts.
  • Elastic Stack: It is a collection of open-source tools, including Elasticsearch, Logstash, and Kibana, commonly referred to as ELK Stack. Elastic Stack enables log collection, storage, search, and visualization, thereby aiding in monitoring and analysis during canary testing.

Bear in mind that the selection of canary testing frameworks and monitoring tools depends on the specific requirements of your project, the technology stack being utilized, and the desired level of complexity in your canary deployment and monitoring processes.

How Does Canary Testing Work?

The process of canary testing operates in a structured manner, similar to other software testing methods. The steps involved are as follows:

Step 1: The development team carefully selects a group of users who will serve as testers. This group represents a small subset of the overall user base, yet it is large enough to yield meaningful statistical analysis. Importantly, these users are unaware that they are participating in the testing process.

Step 2: A dedicated testing environment is established, running alongside the existing live environment. The system load balancer is configured to direct user requests from the designated canary testers to the new environment.

Step 3: The canary test begins as developers route test user requests to the new environment. Throughout this period, the developers closely monitor the testers to ensure that the new version operates as expected.

Step 4: If the new version meets the predetermined deployment criteria, the new software feature or version can be released to all users. However, if the new version has numerous bugs, diminishes application performance, or introduces any other issues for users, the testers are redirected back to the original software version.

Step 5: The development team addresses the identified bugs and subsequently releases the software to a broader audience.

By following these steps, canary tests foster thorough evaluation and validation of software changes before deployment to a large user base.
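The steps above can be condensed into a single function for illustration. This sketch makes several simplifying assumptions: testers are simply the first users in the list, monitoring is reduced to one error rate supplied by a hypothetical `observe_error_rate` callback, and the deployment criterion is a single threshold.

```python
from typing import Callable, Dict, List


def run_canary(users: List[str], canary_fraction: float,
               observe_error_rate: Callable[[List[str]], float],
               max_error_rate: float) -> Dict[str, str]:
    """Return the environment each user ends up on after one canary round."""
    # Step 1: select a small group of testers.
    count = max(1, round(len(users) * canary_fraction))
    testers = users[:count]
    # Steps 2 and 3: route the testers to the new environment and monitor them.
    error_rate = observe_error_rate(testers)
    # Step 4: release to everyone, or send the testers back to the original.
    target = "new" if error_rate <= max_error_rate else "original"
    return {user: target for user in users}


users = [f"user-{i}" for i in range(200)]
good = run_canary(users, 0.05, lambda testers: 0.002, max_error_rate=0.01)
bad = run_canary(users, 0.05, lambda testers: 0.08, max_error_rate=0.01)
print(set(good.values()), set(bad.values()))
```

Step 5 is not modeled here: after a failed round the team fixes the bugs and calls the function again with the corrected build.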


Three Phases of Canary Testing

The canary test process involves three main phases, described below:

Planning a Canary Test Deployment

This phase can be the longest and most challenging of all. Proper planning is crucial during this initial step of canary testing. In a canary deployment, a small group of users receives the updated code before a full release. Several factors need to be considered when planning a canary test, including:

  • Number of users and stages: It is important to decide how many users will be in the canary deployment and how many stages will be involved. The canary test code is typically directed to 5% or 10% of the total user base. The test users may also be selected based on their geographic location.
  • Timing/duration: During the planning phase, it is also important to establish boundaries for the test duration. Typically, canary tests run for a duration ranging from minutes to hours, necessitating close monitoring.
  • Evaluation criteria: Like any other software testing, the success or failure of a canary test can only be determined if the evaluation criteria have been defined beforehand.
  • Performance metrics: Metrics need to be gathered to track the progress of the test, evaluate the application's performance (e.g., latency), measure CPU and memory usage, and monitor errors.
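The planning decisions listed above can be captured in a small configuration object so that they are written down before the test starts. The sketch below is one possible shape, with hypothetical field names and example values; the thresholds are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CanaryPlan:
    stages_percent: tuple        # share of users exposed at each stage
    stage_duration: timedelta    # how long each stage is observed
    max_error_rate: float        # evaluation criterion
    max_p95_latency_ms: float    # evaluation criterion
    region: str = ""             # optional geographic selection

    def __post_init__(self):
        stages = list(self.stages_percent)
        if not stages or stages != sorted(set(stages)):
            raise ValueError("stages must be non-empty and strictly increasing")
        if not all(0 < pct <= 100 for pct in stages):
            raise ValueError("each stage must be a percentage in (0, 100]")

    def total_duration(self) -> timedelta:
        return self.stage_duration * len(self.stages_percent)


plan = CanaryPlan(stages_percent=(5, 10, 100),
                  stage_duration=timedelta(hours=1),
                  max_error_rate=0.01,
                  max_p95_latency_ms=300.0)
print(plan.total_duration())  # 3:00:00
```

Validating the plan up front catches mistakes such as stages that shrink instead of grow before any user is affected.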

Once you have finalized these decisions, you and your team can start working on establishing the canary infrastructure. This involves the following steps:

  • Creating the canary and dividing your user base.
  • Routing the new code to the selected user base.
  • Preparing everything for the deployment on a staging server, including deployment manifests, configuration files, build artifacts, and testing scripts.

Following this, you need to generate a canary node using load balancing. You will replicate your production environment, creating a similar infrastructure to the currently active software environment. One of the clones will serve as the original or baseline, which you can rely on if the new code fails. If necessary, you can roll back to this baseline clone. The number of clones you create depends on the number of features you intend to test, with a minimum requirement of two.

Implementing a Canary Test Deployment

After finishing the planning phase, the development team proceeds with the actual deployment of the canary test by directing the updated code to the chosen test group. The team will prepare deployment manifests and configuration files, build artifacts, and create testing scripts.

The team will then establish a canary node by balancing the load and duplicating the existing production environment. At least two production environments are needed for canary testing, with one serving as the original application without any code modifications (baseline). Also, the team will evaluate the new version by collecting data for the designated metrics determined in the previous stage.

The aim is to assess the latest version's performance consistency and system health. It is crucial to examine metrics such as latency, memory usage, error count, and volume. Detailed logs help identify any bottlenecks.
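Comparing the canary against the baseline can be sketched as follows. The metric names follow the paragraph above, while the sample values and the 10% tolerance are assumptions chosen for the example. Error counts are converted to rates because the canary usually handles far fewer requests than the baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    latency_ms: float
    memory_mb: float
    error_count: int
    request_count: int  # volume

    @property
    def error_rate(self) -> float:
        return self.error_count / self.request_count if self.request_count else 0.0


def regressions(baseline: Metrics, canary: Metrics, tolerance: float = 0.10) -> List[str]:
    """Names of metrics where the canary is more than `tolerance` worse."""
    pairs = {
        "latency_ms": (baseline.latency_ms, canary.latency_ms),
        "memory_mb": (baseline.memory_mb, canary.memory_mb),
        "error_rate": (baseline.error_rate, canary.error_rate),
    }
    return [name for name, (base, new) in pairs.items() if new > base * (1 + tolerance)]


baseline = Metrics(latency_ms=120.0, memory_mb=512.0, error_count=20, request_count=10_000)
canary = Metrics(latency_ms=125.0, memory_mb=700.0, error_count=2, request_count=500)
print(regressions(baseline, canary))  # ['memory_mb', 'error_rate']
```

Here the canary has fewer errors in absolute terms but a higher error rate, which a comparison of raw counts would have missed.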

Analyzing a Canary Test Deployment

In this phase of canary testing, the canary code is routed to the selected users, resulting in traffic on both the baseline and test nodes. This makes it easy to compare the application's performance across the two and check whether the test version meets the evaluation criteria.

If any issue is identified, the information is shared with the team so it can be fixed early. If no issues are found, you can deploy the version to the entire baseline or conduct another test with a different subset of users.

At this point, there are three possible outcomes:

  • If the test succeeds, the code can be released to the entire infrastructure.
  • If the test succeeds but more information is needed, run another canary test with a larger subset, increasing it from 5% to 10%, or from 10% to 20%.
  • If the test fails, you have to roll back to the previous version and restore it. In this case, the team has to fix the issues before running another test.
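These three outcomes map naturally onto a small decision function. The sketch below assumes the subset doubles at each expansion, which matches the 5% to 10% and 10% to 20% examples; the function and action names are hypothetical.

```python
from typing import Tuple


def next_step(passed: bool, needs_more_data: bool, current_percent: int) -> Tuple[str, int]:
    """Return the next action and the share of users on the new version."""
    if not passed:
        return ("rollback", 0)       # restore the previous version
    if needs_more_data:
        return ("expand", min(current_percent * 2, 100))
    return ("release", 100)          # roll out to the entire infrastructure


print(next_step(passed=True, needs_more_data=False, current_percent=5))   # ('release', 100)
print(next_step(passed=True, needs_more_data=True, current_percent=5))    # ('expand', 10)
print(next_step(passed=True, needs_more_data=True, current_percent=10))   # ('expand', 20)
print(next_step(passed=False, needs_more_data=True, current_percent=10))  # ('rollback', 0)
```

A failed test always wins over a request for more data, so a broken build is never exposed to a larger group.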

Challenges and Solutions of Canary Testing

Every approach has its own challenges, and canary releases are no exception. However, rather than being treated as "disadvantages," these challenges can be addressed effectively with feature management solutions.

  • Mobile applications: When dealing with mobile apps, the situation becomes more problematic as there is typically only one environment available: the user's device. Since app distribution occurs through app stores, it's not possible to selectively provide newer versions of the app to specific users. Fortunately, feature flags can come to the rescue once again. By incorporating feature flags into the new version of the app, you can enable the feature for a small group of users while keeping it disabled for others. Thus, feature flags allow you to conduct canary deployments even within a single production instance of your application.
  • Canary release management: Managing canary releases becomes more complex when rapidly introducing multiple new features. In scenarios where you need to test two new features simultaneously, you would require three environments: one for most users and one for each of the two new features. It might even be necessary to have a fourth environment to assess the combined impact of the two new features. This complexity can become challenging to handle effectively. Again, feature flags can alleviate these difficulties. By leveraging a robust feature flag management platform, you can easily enable one or more features for specific groups of users. As things progress according to plan, you can gradually increase the percentage of users who experience the new version of your software until it reaches all users.
  • Getting started with canary tests using feature management: Implementing canary releases in a single production environment is significantly easier with feature flags. Canary testing reduces the risk associated with software releases, enhances flexibility and confidence, and accelerates the rollout of new features.
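The point about testing two features without three or four environments can be illustrated with independent flags in a single environment. In this sketch the feature names and the 10% rollouts are hypothetical; hashing the flag name together with the user ID gives each flag its own subset of users, so every combination of the two features occurs naturally.

```python
import hashlib
from collections import Counter

# Hypothetical rollout percentages for two features tested at the same time.
ROLLOUT = {"feature-a": 10, "feature-b": 10}


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Hash flag and user together so each flag picks its own user subset."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def active_features(user_id: str) -> tuple:
    return tuple(sorted(f for f, pct in ROLLOUT.items() if in_rollout(f, user_id, pct)))


combos = Counter(active_features(f"user-{i}") for i in range(20_000))
for combo, count in sorted(combos.items()):
    print(combo or ("no new features",), count)
```

Raising a percentage in `ROLLOUT` only adds users to a feature, since anyone below the old threshold is also below the new one, which is what allows a gradual increase until the feature reaches all users.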

Conclusion

Throughout this comprehensive guide, we have delved into the fundamental concepts and optimal methodologies associated with canary testing. By gradually introducing alterations to a small subset of users or systems, canary testing empowers teams to carefully monitor the impact of these changes in a controlled environment before deploying features to the broader audience.

One of the key advantages of canary tests is their capacity to mitigate the risks linked to software updates or feature releases. Beyond the technical aspects, successful implementation of canary tests necessitates meticulous planning, transparent communication, and collaboration among development, operations, and other pertinent teams.

As organizations strive for continuous delivery and rapid innovation, canary testing remains an indispensable approach in their arsenal, ensuring that software updates are rolled out seamlessly and reliably.

Frequently asked questions

How is canary testing different from A/B testing?

While A/B testing focuses on comparing different versions to discern user preferences or behavior, canary testing takes a different approach by implementing changes gradually. This method allows for meticulous monitoring of the impact and early detection of any issues.

Can canary testing be used for infrastructure changes?

Canary testing extends its utility beyond software alterations and can effectively be applied to infrastructure modifications, such as network configurations or server upgrades. By employing this approach, the repercussions of such changes can be thoroughly evaluated to ensure a seamless transition.

How long should a canary testing phase last?

The length of a canary testing phase depends on the complexity of the changes being implemented and the size of the user base. Generally, it spans from several hours to a few days to gather ample data and effectively assess the impact of the modifications.
