Curing the Flaky Test Headache with AI-powered Testing Tools

Ken Hardin

Posted On: March 11, 2024


Nothing can ruin a developer’s day as quickly as learning their latest code push has resulted in a flaky test that they now have to troubleshoot. The test ran successfully on its first pass, but then reported a critical issue on a second iteration, or when it ran on another system or an integrated testing environment.

Nothing’s changed with the code, so there’s no good reason for the failure. But there’s a big red X on the testing dashboard, and developers and QA engineers are faced with a costly, frustrating bug hunt.

Flaky tests undermine every aspect of software development and testing, contributing to everything from poor-quality releases to high staff turnover. And they are a headache that virtually every software team faces weekly, if not daily.

In this post, we’ll look at the common types and causes of flaky tests, as well as ways AI testing tools can finally help resolve this pernicious and costly issue.

Everybody Hates Flaky Tests

Flaky tests – also often referred to as unreliable or non-deterministic tests – are so commonplace that smart enterprise development teams have put in protocols on how to handle them. (More on this in a bit.) Even so, many developers and testers treat flaky tests as an unavoidable cost of doing business.

In fact, flaky tests were one of the major concerns cited in a survey our team at LambdaTest recently conducted of more than 1,600 QA professionals from around the globe. All told, respondents said they spend about 8 percent of their time on flaky tests. That’s almost as large a time commitment as the 10.4 percent they devote to setting up and maintaining test environments.

Separate research has found that large corporations, including Microsoft and Google, see flakiness in as much as 41 percent of their tests. These tests were extremely flaky – about half of the failed jobs ran successfully on a second, manually restarted attempt. Our survey showed that more than 24% of large organizations get non-deterministic results on more than 5% of their tests.

And that doesn’t take into account the toll that delays and frustration can have on your team dynamics and overall efficiency.

The Types of Flaky Tests

To develop a plan for tackling flaky tests, you first need to understand the general categories in which they occur.

Random: Just like it sounds. A test fails; you manually restart it; and it completes successfully, with no changes in code or environment. Or it passes on the first run and then fails on the second. This is probably the most common kind of test flakiness, and it can drive you crazy as you hunt down possible causes.

Environmental: A test works on your machine but fails on another system. Or, perhaps more troubling, code that passes locally fails in a continuous integration (CI) environment.

Branch: The test succeeds within the feature branch, but then fails when merged into the main branch. These are slightly less maddening than purely random failures – at least you have a starting point for hunting down inconsistencies and conflicts.
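If you suspect a test falls into the random category, the most direct check is brute force: run it repeatedly with nothing changed and see whether the outcomes diverge. Here’s a minimal Python sketch of that idea; the test command and file path are just placeholders, not a reference to any particular suite.

```python
import subprocess
from collections import Counter

def flakiness_check(cmd, runs=10):
    """Run the same test command repeatedly, with no code or environment
    changes in between. A mix of outcomes indicates a flaky test."""
    outcomes = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        outcomes["pass" if result.returncode == 0 else "fail"] += 1
    return dict(outcomes)

# Hypothetical usage:
#   flakiness_check(["pytest", "tests/test_login.py"])
# {'pass': 10} or {'fail': 10} means deterministic; any mix means flaky.
```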

Flaky Tests Hurt Every Aspect of Software Development

We can’t stress this enough – everybody hates flaky tests. This article at InfoWorld reflects the industry-wide push to find a solution to what has become just part of the daily grind for most developers.

Why are flaky tests so awful?

They drain resources: Writing, executing, and re-running tests consume a huge chunk of developers’ time – in some cases, testing consumes more of their time than actually writing code. In the benchmark survey we cited earlier, 77 percent of developers said flaky tests are a time-consuming part of their work cycle that draws them away from feature development and ideation.

They delay releases: Resolving unpredictable test results can push back milestones and completely derail a project timeline. That’s good for no one.

They can really stress out developers: It’s hard to retain talent as it is. The turnover rate for software developers hovers around 60% annually. It’s incredibly demoralizing to have to go back and hunt bugs after a “successful” rollout. In fact, flaky tests have been cited as the number-one issue driving mobile developers nuts.

They can undermine the perceived value of the test suite and team: Like everybody else in IT, testing units have to justify the expense of their tools and talent. Compiling a lot of false “fails” in your testing program is a sure way to damage your reputation. Shopify famously lifted the pass rate on their extremely complex development channels to well over 90% by just re-running initially failed tests that had flaked out.

They can lead to buggy releases: If developers and testers don’t trust test results, they may get in the habit of simply pushing out code that has legitimate issues.

Why Do They Keep Happening?

So, why do carefully written tests flake out? These are the most common issues that cause non-deterministic tests.

Process timing / timeouts: Testers often write sleep statements into their scripts to allow time for the application to complete a request. If the application takes longer than expected, the test fails. This is particularly tricky if the app calls an outside data store or resource that doesn’t always perform as the tester expected. And if the test environment is under heavy load, responses can slow to below the anticipated level.
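The usual fix is to replace a fixed sleep with a polling wait that gives the app a generous ceiling but proceeds as soon as the condition is met. Here’s a minimal Python sketch; the order_status helper and order_id are hypothetical stand-ins for whatever your application exposes.

```python
import time

def wait_for(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Flaky: a fixed sleep guesses how long the request takes.
#   time.sleep(2); assert order_status(order_id) == "confirmed"
# Stable: wait up to a generous ceiling, but return as soon as it's ready.
#   wait_for(lambda: order_status(order_id) == "confirmed", timeout=15)
```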

Concurrency: Tests often expect an exact sequence of events, even though the code is written to allow multiple execution orders, particularly when different threads handle the actions. If the test accepts only one sequence, it will fail whenever the scheduler picks another.
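For example, here’s a sketch of a thread-based test that would flake if it asserted on completion order, alongside a stable version that asserts on contents instead:

```python
import threading

def run_workers():
    """Five threads each append a result; append order depends on scheduling."""
    results = []
    threads = [threading.Thread(target=lambda i=i: results.append(i * 2))
               for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Flaky: assumes the threads finish in launch order.
#   assert run_workers() == [0, 2, 4, 6, 8]

def test_workers_order_independent():
    # Stable: asserts on the contents, not on the arrival order.
    assert sorted(run_workers()) == [0, 2, 4, 6, 8]
```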

Test order dependency: If a test changes data stores, memory, or other aspects of the environment, running subsequent tests out of sequence will cause failures. Your tests must be able to run independently in any order, and they must clean up after themselves to ensure a stable starting state for the next test.
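Test frameworks generally solve this with per-test setup and teardown. A minimal pytest sketch, using a throwaway SQLite database so each test starts from a known state:

```python
import sqlite3
import pytest

@pytest.fixture
def fresh_db(tmp_path):
    """Give each test its own database so tests pass in any order."""
    conn = sqlite3.connect(str(tmp_path / "test.db"))
    conn.execute("CREATE TABLE users (name TEXT)")
    yield conn
    conn.close()  # teardown runs even if the test fails

def test_insert(fresh_db):
    fresh_db.execute("INSERT INTO users VALUES ('alice')")
    assert fresh_db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

def test_starts_empty(fresh_db):
    # Passes whether or not test_insert ran first -- no shared state.
    assert fresh_db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```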

Exotic code/environment conditions: Sometimes coders or environment admins create conditions (intentionally or not) that are hard to anticipate in test design but can cause a test to fail under anomalous circumstances. Everyone’s heard of the 500-mile email problem at this point – a server OS change introduced an extremely tight connection timeout, which caused emails to distant destinations to fail. This is rare, but it can happen.

Bad tests: Human error is always a real risk. Solid tests should include assumptions that cover all operational bases, as well as measures to enforce those assumptions. Edge cases do happen, and you need to test for them.

As we’ve seen, flaky tests can be caused by various factors, including process timing, concurrency, test order dependency, and exotic code or environment conditions. These issues lead to non-deterministic behavior in tests, making them unreliable and frustrating for developers to deal with.

To understand engineers’ perspectives on flaky tests, watch the complete video tutorial and see how those perspectives can help you handle and resolve flaky test issues effectively.

How to Tackle Flaky Tests

As we mentioned earlier, flaky tests are such a big problem for software developers and testers that most enterprise teams have developed some protocols for handling them.

Mythili Raju and Harshit Paul have laid out a framework for flaky test mitigation in our LambdaTest Learning Hub, which we strongly recommend you read.

In summary, we suggest that your team:

  • Implement consistent test retry mechanisms to ensure that a “fail” is a fail (see the retry sketch after this list)
  • Regularly maintain and update your tests
  • Encourage open communication between developers and testers
  • Set and monitor test performance metrics and KPIs
  • Create a weighting scale to prioritize the resolution of flaky tests based on business value
  • Constantly gather and analyze test performance data
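On the first point, a retry mechanism can be as simple as a wrapper that re-runs a failing test command and flags tests that only pass on a later attempt as flaky rather than fixed. (Plugins such as pytest-rerunfailures offer similar behavior out of the box.) A minimal sketch, assuming your tests run as a shell command:

```python
import subprocess

def run_with_retries(cmd, max_retries=2):
    """Re-run a failing test command; report failure only if every attempt
    fails. A pass on a later attempt marks the test as flaky, not fixed."""
    for attempt in range(1, max_retries + 2):
        if subprocess.run(cmd).returncode == 0:
            if attempt > 1:
                print(f"passed on attempt {attempt}: quarantine as flaky")
            return True
    print(f"failed all {max_retries + 1} attempts: treat as a real failure")
    return False

# Hypothetical usage: run_with_retries(["pytest", "tests/test_checkout.py"])
```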

It’s the last item on that list – gathering and analyzing testing data – where the emerging category of AI and machine learning testing tools can have an enormous impact, not only resolving flaky tests but also preventing them from happening in the first place.

AI-Based Tools and Flaky Test Detection

AI tools are great at identifying flaky tests. That lets developers and project managers prioritize legitimate code errors for remediation and release, while routing flaky test results down a separate path.

AI and machine learning can also analyze test results to find the underlying issues, such as environmental factors, that can contribute to ongoing flakiness.

The major strengths of AI are:

Root Cause Analysis: In addition to flagging flaky tests, AI-based tools can parse your testing logs to recognize patterns in non-deterministic results, most often with environmental factors. With these insights, you can resolve timing and resource issues that are derailing your testing efforts. (Of course, AI pattern recognition can also find recurring code errors, but that’s not really “flaky” – it’s just another way that AI tools can improve your software pipeline.)
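As a rough illustration of the kind of pattern recognition involved, here’s a small Python sketch that groups failures by a normalized error signature so recurring environmental patterns, like timeouts, stand out. The log format is an assumption made up for the example, not any particular tool’s output.

```python
import re
from collections import Counter

# Assumed log line format: "<test_name> FAILED: <error message>"
FAILURE = re.compile(r"^(?P<test>\S+) FAILED: (?P<error>.+)$")

def group_failures(log_lines):
    """Cluster failures by a rough error signature (error text with numbers
    collapsed), so recurring patterns surface above one-off code defects."""
    signatures = Counter()
    for line in log_lines:
        m = FAILURE.match(line)
        if m:
            signature = re.sub(r"\d+", "N", m.group("error"))
            signatures[signature] += 1
    return signatures.most_common()

# A spike in a signature like "TimeoutError after N ms" points to an
# environmental cause rather than a defect in any one test.
```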

Adaptive Test Maintenance: AI tools can weed out outdated or unnecessary test cases, which are major contributors to overall “flakiness” in your testing suite.

Predictive Analytics: AI tools can help development teams avoid flaky environmental factors, based on historical data and the tolerances you provide the system. No flaky errors, no time wasted.

Continuous Improvement: AI-based tools can support your team as they refine testing strategies for both the near and long term. Software testing should be both comprehensive and efficient, and Big Data analysis of your testing logs can keep you on that path.

AI Helps Tackle Flaky Tests at the Root

Flaky tests are a serious drain on your development and testing teams’ resources and morale. A solid remediation plan, including AI-powered testing suite tools to diagnose and resolve underlying issues that cause flaky tests, is essential in keeping your projects on schedule and up to the quality standards you demand.

To conquer your flaky tests, tools like LambdaTest’s Test Intelligence help your teams take data-driven action in identifying, resolving, and preventing flaky tests. By leveraging machine learning and intelligent analysis, LambdaTest aims to enhance the reliability and effectiveness of automated testing, resulting in more robust software delivery.


Author’s Profile

Ken Hardin

Ken Hardin is an experienced business analyst and executive team leader with a demonstrated history of success in the internet industry. Ken was a key member of the startup teams for both TechRepublic.com and ITBusinessEdge.com. Since 2010, he has served as the Principal Analyst for Clarity Answers LLC, which provides business guidance and project management services.

