Essential Skills for Modern Testers [Testμ 2024]
LambdaTest
Posted On: August 22, 2024
As software grows more complex and AI becomes a standard part of coding, testers need to evolve their test automation skills.
In this session of Testμ 2024, James Massa, Senior Executive Director of Software Engineering and Architecture at JPMorgan Chase & Co., dives into four key areas every tester should focus on: testing AI, using AI tools for testing, grasping FinOps, and maintaining data quality.
If you couldn’t catch all the sessions live, don’t worry! You can access the recordings at your convenience by visiting the LambdaTest YouTube Channel.
How to Embrace AI?
James started the session by highlighting the need to be careful with cloud costs, reminding us that even seasoned pros can get hit with unexpected charges. Managing resources wisely is crucial to avoiding hefty bills.
He also brought some humor to AI testing, noting how AI testers are great with edge cases because they’re trained for them, but AI agents might give you vague bug reports. It’s a light-hearted way to show that while AI has its strengths, it’s not perfect.
James encouraged testers to embrace AI tools that automate routine tasks and spot patterns. This can make testing more efficient, freeing up time to focus on trickier, more important scenarios.
He also stressed the importance of data quality, pointing out that good data leads to better AI results. So, it’s essential to prioritize data validation and governance to ensure the AI is working with reliable information.
Then, he touched on the importance of responsible AI, emphasizing that trust is key. By adopting responsible practices, we can make sure AI tools like chatbots and co-pilots are reliable and effective. He wrapped up by saying that AI isn’t here to replace testers but to help them be more productive and handle more complex tasks with greater ease.
When it comes to AI automation, AI-powered test assistants like KaneAI can help streamline the test automation process by offering natural language-driven test authoring and management. This enables even non-technical team members to create, debug, and evolve test cases with ease.
KaneAI is a smart AI test assistant for test authoring, management, and debugging, designed specifically for high-speed quality engineering teams. It empowers users to create and refine complex test cases using natural language, significantly lowering the time and expertise needed to get started with test automation.
Concept of an AI Lingo Level Set
James introduced the concept of an AI Lingo Level Set to help ensure that everyone is on the same page regarding key AI terms and concepts.
He explained:
- Responsible AI: AI that builds trust, which is crucial for business success. Just as trust is vital in human interactions, it is equally important in AI systems, especially those performing tasks previously handled by humans. Responsible AI practices are essential to ensure that tools like chatbots and AI agents are trustworthy and effective.
- Large Language Models (LLMs): AI models capable of reading and analyzing vast amounts of data, such as text drawn from across the Internet. LLMs are at the core of many AI applications, including ChatGPT.
- Supervised Learning: A traditional machine learning approach where data is labeled by humans, such as categorizing bugs as high, medium, or low priority. This method has been widely used in industries like finance for several years (see the sketch after this list).
- Unsupervised Learning: Unlike supervised learning, this method involves machine learning from data without human-provided labels, often identifying outliers or patterns that could indicate bugs.
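To make the supervised-learning idea concrete, here is a minimal sketch using scikit-learn: human-labeled bug reports train a classifier to predict priority. The reports, labels, and model choice are illustrative assumptions, not anything from the session.

```python
# Supervised learning in miniature: humans supply the priority labels,
# and the model learns to map bug-report text to a priority class.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented, human-labeled training data.
reports = [
    "App crashes on login",
    "Typo on the settings page",
    "Payment fails for all users",
    "Button color is slightly off",
]
labels = ["high", "low", "high", "low"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reports, labels)

# Predict the priority of a brand-new report.
print(model.predict(["Checkout crashes when the cart is empty"]))
```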
James also touched on the transition from traditional, deterministic algorithms—where outcomes are predictable and can be tested against expected results—to the more complex, non-deterministic nature of AI, where results can vary and aren’t always predictable. This shift underscores the need for a deeper understanding of AI behaviors and the importance of responsible AI practices to maintain trust and reliability in AI-driven systems.
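One way to picture that shift is to contrast the two test styles directly. This is a minimal sketch, assuming hypothetical functions sort_orders (a deterministic routine) and summarize_bug (an LLM-backed wrapper):

```python
def test_deterministic():
    # Classic QA: the same input always yields the same output,
    # so an exact-match assertion is appropriate.
    assert sort_orders([3, 1, 2]) == [1, 2, 3]  # hypothetical deterministic function

def test_non_deterministic():
    # AI QA: the wording varies from run to run, so we assert invariant
    # properties (non-empty, on-topic, bounded length) instead of exact text.
    summary = summarize_bug("App crashes on login")  # hypothetical LLM wrapper
    assert summary
    assert len(summary) < 500
    assert "login" in summary.lower()
```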
General Future QA Predictions
James highlighted several general QA predictions for the future:
- More of Everything: He anticipated an increase in the number of releases, products, and customizations, which will naturally lead to more testing. As development cycles accelerate, testing will need to keep pace, potentially leading to multiple rounds of the Software Development Life Cycle (SDLC) being completed in a single release night.
- Quality Shifting Left: He predicted that AI would be integrated earlier in the development process, particularly in verifying the validity of training data. This shift, often referred to as shifting left, involves incorporating quality assurance activities as early as possible, even before development begins, to ensure that the data used in AI models is of high quality.
- Challenges with Data Drift: James pointed out the challenge of data drift, describing it as a silent killer that causes the gradual degradation of AI models. Data drift occurs when the underlying data changes over time, such as when new data sources are merged, leading to shifts in the model’s behavior. This is particularly relevant in industries like finance, where data characteristics can vary significantly after events like mergers, potentially impacting model performance.
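As one illustration of how such drift might be caught early, the sketch below uses scipy's two-sample Kolmogorov-Smirnov test to compare a feature's training-time distribution against fresh production data; the data is synthetic and the significance threshold is an assumption.

```python
# Flag data drift by comparing the distribution a model was trained on
# against the distribution it now sees in production.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # distribution at training time
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)   # shifted post-merger data

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift detected (KS statistic = {stat:.3f}); consider retraining.")
```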
How to Test an AI Project?
James outlined several key points about how to effectively test an AI project:
James Massa emphasizes the importance of equipping QA with the right tools to identify bad results, acknowledging that AI isn't 100% accurate, which makes setting clear KPI targets crucial. He also highlights the need to integrate QA with data scientists to ensure seamless… pic.twitter.com/hDcV7N6AzW— LambdaTest (@lambdatesting) August 21, 2024
- Data Strategy and Cloud Tools: James stressed the importance of a strong data strategy, noting that the best tools for handling AI and data are cloud-based. He warned against rushing into AI without first establishing a solid foundation, suggesting that doing so could lead to challenges if the infrastructure isn’t ready.
- Non-Deterministic Results: He explained that AI models, particularly LLMs, produce non-deterministic results, meaning the same input may yield different outputs each time. This unpredictability, along with potential “hallucinations” (incorrect or unrealistic outputs), presents a significant shift for QA testers, who traditionally expect consistent and predictable outcomes.
- Test Coverage and Explainability: James pointed out that the traditional concept of 100% test coverage becomes nearly impossible with AI. Unlike traditional applications where paths through the code are known and testable, LLMs are vast and lack explainability, requiring more governance and a different approach to testing.
- Democratizing Data and Collaboration with Data Scientists: He suggested that QA teams should be involved in recognizing flawed results from data scientists and stressed the importance of collaboration between QA and data scientists. He also mentioned that AI accuracy is rarely 100%, so realistic KPIs (Key Performance Indicators) are essential (see the KPI-style sketch after this list).
- Back to the Future in QA: James predicted a return to more negotiation between QA and engineering teams, as the clarity that once existed in traditional software testing may diminish with the complexity of AI. This will require more discussion and agreement on when it’s appropriate to release updates.
- Reviewing Test Results: He acknowledged that reviewing test results in AI projects can be overwhelming and may require new approaches, such as using AI to test AI. However, this will still necessitate significant human oversight to ensure accuracy and reliability.
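One hedged way to reconcile non-deterministic outputs with realistic KPIs is to sample the model repeatedly and assert an agreed pass rate rather than exact answers. In this sketch, ask_model is a hypothetical wrapper around the system under test, and the 90% threshold stands in for a KPI the team would negotiate:

```python
def passes(answer: str) -> bool:
    # Property checks stand in for exact-match assertions.
    return bool(answer) and "refund" in answer.lower()

def test_refund_policy_kpi():
    runs = 20
    # ask_model() is a hypothetical call to the LLM-backed feature under test.
    passed = sum(passes(ask_model("What is the refund policy?")) for _ in range(runs))
    # KPI agreed with the team: at least 90% of sampled answers must satisfy the checks.
    assert passed / runs >= 0.90
```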
Why Do We Need Responsible AI, and How?
James explained that responsible AI is essential for ensuring that artificial intelligence systems are fair, transparent, and accountable. He highlighted the necessity of addressing biases in AI models to prevent perpetuating historical inequalities. AI systems trained on biased data can inadvertently reinforce existing disparities, making fairness a critical aspect of responsible AI. Implementing regular bias audits and ensuring diverse and representative data sets are key practices to mitigate these issues.
James Massa discusses the importance of responsible AI, focusing on reducing bias, protecting privacy, securing data from harmful inputs, and ensuring accountability. With powerful AI comes the responsibility to use it ethically and safely. pic.twitter.com/70vrVhheU0— LambdaTest (@lambdatesting) August 21, 2024
In terms of privacy, James emphasized the importance of protecting sensitive data handled by AI systems. With the increasing volume of data being processed, safeguarding this information from unauthorized access and breaches is paramount. He advocated for robust data management practices and adherence to privacy regulations to maintain confidentiality and build trust with users.
To achieve transparency and accountability, James suggested using explainability techniques such as LIME and SHAP to clarify how AI systems make decisions. These methods provide insights into the decision-making processes, enhancing understanding and trust. Additionally, maintaining comprehensive documentation and conducting exploratory testing can help ensure AI systems are reliable and their functioning is clear and accountable.
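As a minimal illustration of the SHAP technique James named (not his implementation), the sketch below explains a tree model's predictions on a synthetic dataset:

```python
# SHAP assigns each feature a contribution score for each prediction,
# giving a window into why the model decided what it did.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])  # per-feature contributions for 10 samples
print(shap_values)
```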
Testing Large Language Models
James provided insights on testing Large Language Models as follows:
- Testing Approach: Emphasized the need for comprehensive testing of LLMs due to their complexity and potential impact on various applications. Traditional testing methods may not be sufficient, so a tailored approach is necessary.
- Data Quality: Highlighted the importance of high-quality and diverse training data to improve LLM performance and reduce biases. Testing should include various data scenarios to ensure robustness.
- Synthetic Data: Discussed the use of synthetic data for testing LLMs, noting that while it can help in balancing test datasets, there are concerns about its realism and potential impact on model accuracy.
- Performance Metrics: Suggested focusing on specific performance metrics relevant to the application of the LLM. This includes evaluating accuracy, relevance, and the ability to handle diverse inputs effectively.
- Continuous Evaluation: Advocated for ongoing evaluation of LLMs throughout their lifecycle to adapt to new data and evolving requirements. Continuous feedback loops are essential for maintaining model quality (see the sketch after this list).
- Ethical Considerations: Underlined the need for ethical testing practices to identify and mitigate any harmful or biased outputs generated by LLMs.
- Human Oversight: Stressed the importance of human oversight in the testing process to ensure that AI-generated outputs align with ethical standards and real-world applicability.
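One way such continuous, metric-driven evaluation might look in practice is a small "golden set" harness that scores model answers on every release. In this sketch, ask_model, the dataset, and the relevance heuristic are all hypothetical:

```python
# Score model answers against a curated golden set and track the
# aggregate metric from release to release.
golden_set = [
    {"prompt": "How do I reset my password?", "must_contain": ["reset", "link"]},
    {"prompt": "How do I cancel my order?", "must_contain": ["cancel"]},
]

def relevance(answer: str, keywords: list[str]) -> float:
    # Crude keyword-coverage heuristic standing in for a real relevance metric.
    hits = sum(k in answer.lower() for k in keywords)
    return hits / len(keywords)

# ask_model() is a hypothetical call to the LLM under test.
scores = [relevance(ask_model(case["prompt"]), case["must_contain"]) for case in golden_set]
print(f"Release relevance score: {sum(scores) / len(scores):.2f}")  # alert if below the agreed KPI
```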
Testing Machine Learning
James offered insights on testing Machine Learning (ML) systems as follows:
- Testing Strategy: Advocated for a structured testing strategy tailored to the specific ML model. Traditional software testing methods may not fully address the complexities of ML systems, necessitating specialized approaches.
- Data Quality and Diversity: Emphasized the critical role of high-quality, diverse data in training ML models. Testing should cover a wide range of data scenarios to assess model robustness and generalization.
- Synthetic Data: Acknowledged the use of synthetic data for augmenting test datasets. While synthetic data can help address data scarcity, there are concerns about its effectiveness in representing real-world scenarios.
- Performance Metrics: Recommended focusing on relevant performance metrics such as accuracy, precision, recall, and F1 score (computed in the sketch after this list). Metrics should align with the specific objectives and application of the ML model.
- Continuous Testing: Highlighted the importance of continuous testing and monitoring of ML models as they evolve. Regular updates and re-evaluations are necessary to ensure models remain accurate and effective over time.
- Ethical and Bias Considerations: Stressed the need to test for ethical implications and potential biases in ML outputs. Identifying and mitigating biased outcomes is crucial for responsible AI deployment.
- Human Oversight: Underlined the importance of human oversight throughout the testing process to ensure that ML systems operate within ethical guidelines and real-world requirements.
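For reference, the metrics listed above are one import away in scikit-learn; the labels below are purely illustrative:

```python
# Standard classification metrics over predicted vs. ground-truth labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```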
Predicting the Future of QA for Deterministic Systems
James shared the following predictions for the future of QA for deterministic systems:
- Role of AI: AI will significantly impact QA for deterministic systems by automating many testing processes. AI can assist in writing test cases, conducting code reviews, and performing testing, potentially reducing manual effort.
- Enhanced Testing Efficiency: AI-driven tools will improve the efficiency of QA by executing more tests in less time and providing more accurate results. This can lead to a reduction in the amount of manual testing required.
- Integration into SDLC: AI will be integral to the Software Development Life Cycle, affecting all stages from writing requirements to deploying the application. The involvement of AI can streamline various aspects of QA and reduce the workload.
- Quality Assurance Agents: He foresaw the rise of QA agents capable of running tests, reviewing results, fixing minor issues, and escalating complex ones to human engineers. These agents will be essential in managing the increased output from AI-driven development processes.
- Focus on Oversight: While AI will handle much of the testing, human oversight will remain crucial. QA will transition toward an oversight role, ensuring AI’s outputs are accurate and addressing any issues that arise.
- Impact on Legacy Systems: As AI becomes more prevalent, the need for extensive testing of legacy systems may decrease. The ability of AI to generate high-quality outputs could result in fewer issues with new systems compared to older, rules-based systems.
FinOps is a QA Problem
James explained that FinOps, or financial operations, is indeed a QA issue due to the potential for costly bugs that can arise in financial systems. He emphasized that every development change carries the risk of introducing bugs that can lead to significant financial repercussions.
Therefore, it’s crucial to implement robust quality checks to identify and prevent these issues before they impact the bottom line. The process involves creating and running test cases specific to financial operations, conducting static and dynamic tests, logging and tracking bugs, and addressing root causes swiftly to avoid financial losses.
James further highlighted that FinOps issues should be treated like any other bugs, requiring rigorous testing and quick resolution. He suggested that, just as with different types of bugs, identifying and fixing financial bugs involves a systematic approach, including raising production incidents when issues are found and continuously monitoring for related bugs.
Testing FinOps
James highlighted the significance of integrating FinOps into the quality assurance process. He explained that every development change has the potential to introduce financial bugs that can lead to substantial costs.
To mitigate this risk, he recommended a robust QA approach specifically tailored for financial operations. This included creating and executing targeted test cases, conducting static and dynamic tests, and thoroughly logging and tracking financial issues. By doing so, organizations can identify and resolve financial discrepancies early, minimizing costly errors.
James advocated for shifting FinOps left in the SDLC. This approach involves incorporating financial operations and cost control measures early in the development process rather than addressing them after deployment. By integrating automated cost defense mechanisms and monitoring cloud spending from the planning stage, organizations can better manage and optimize their expenses. This proactive approach helps prevent financial issues and ensures more efficient use of resources, ultimately leading to cost savings and improved financial oversight.
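A shifted-left cost control could be as simple as a CI test that fails when a change's projected spend exceeds the agreed budget. This is a sketch under stated assumptions: estimate_monthly_cost is a hypothetical helper that would wrap a cloud pricing or billing API.

```python
# Automated "cost defense": run in CI so a costly change is caught
# before deployment, not on next month's invoice.
MONTHLY_BUDGET_USD = 10_000  # illustrative budget agreed with FinOps

def test_change_stays_within_budget():
    # estimate_monthly_cost() is hypothetical; in practice it might call a
    # cloud provider's pricing API with the proposed infrastructure plan.
    projected = estimate_monthly_cost("deploy-plan.json")
    assert projected <= MONTHLY_BUDGET_USD, (
        f"Projected spend ${projected:,.0f} exceeds budget ${MONTHLY_BUDGET_USD:,}"
    )
```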
Q&A Session!
Here are some of the questions that James answered at the end of the session:
- Can data drift be overcome by increasing the samples in your dataset? That is, train the bank AI on data from 5+ banks so it is better adapted to outliers and differences across multiple data sources, rather than learning them later on.

James: Increasing the dataset size can help mitigate data drift by providing a broader range of data for training, making the model more adaptable to variations and outliers. However, data drift can still be unpredictable and may not be entirely resolved through this approach alone. Continuous monitoring and testing against real-world data are crucial for effectively managing data drift and maintaining model performance. So, while expanding the dataset is beneficial, ongoing evaluation remains essential.

- Can we use AI to generate synthetic data for testing real-world applications where real data is not available for various reasons?

James: Yes, we can use AI to generate synthetic data to test real-world applications when real data is unavailable. Synthetic data can help fill gaps by simulating a range of scenarios that might not be present in the real dataset. It can be useful for creating diverse and comprehensive test cases, enhancing model training, and identifying edge cases. However, it’s important to consider the limitations and ensure that synthetic data accurately reflects the characteristics of real-world data to avoid potential biases or inaccuracies in testing outcomes.
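As a simple illustration of the synthetic-data idea (not James's approach), test records can be generated with a library like Faker; the transaction schema below is invented:

```python
# Generate reproducible synthetic customer transactions for testing
# when real data is unavailable or too sensitive to use.
from faker import Faker

fake = Faker()
Faker.seed(0)  # fixed seed so the test data is reproducible

transactions = [
    {
        "customer": fake.name(),
        "email": fake.email(),
        "amount": round(fake.pyfloat(min_value=1, max_value=5_000), 2),
        "timestamp": fake.iso8601(),
    }
    for _ in range(100)
]
print(transactions[0])
```

As James noted, the catch is making sure such generated data actually reflects the distributions and edge cases of the real data it stands in for.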
If you have more questions, please feel free to drop them off at the LambdaTest Community.