Show HN: Harnessing LLMs for automated UI testing

2 years ago

At testup.io, we have been working for a while to bring artificial intelligence to test automation. Just a few years ago, the primary challenge lay in reliably identifying UI elements after minor structural changes, such as updated IDs or paths. The emergence of Large Language Models (LLMs) raised the bar for what it means for a test tool to be smart. Now we expect the robot to do many things autonomously, such as retrying when a page is unresponsive or handling minor error reports. A more challenging feature, and one users will soon expect, is a test robot that navigates your web shop to verify that certain items are available, can be purchased, and are billed correctly.

Oh, wait a minute, did I mention "correctly billed"? Would we trust an LLM-based robot to inspect your finances and respond with a reassuring "Everything is OK"? Clearly, we are not there yet. Classical test automation techniques may seem inflexible, even overly rigid, but they are reliable, and precisely programmed queries give tests greater sensitivity. The challenge now is to find the best way to combine the strengths of both worlds and improve the accuracy of language models. Judging by several recent announcements here on Hacker News, the future looks exciting.

With this announcement (video here), we are open-sourcing the LLM component that testup.io uses for natural language queries. The source repository is a standalone snapshot of our LLM microservice that handles these requests, and it includes several standalone tests that use the Selenium interface. Whenever a free-text query needs to be executed, our service steps in: it condenses the DOM to the relevant entries and asks the language model which interactions to perform. Currently, only OpenAI is supported. Based on the response, a user interaction is triggered. The cycle then repeats until the AI determines that the entire user request has been processed. If the request cannot be completed, an error is returned.
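The condense-ask-act cycle described above could be sketched roughly as follows. This is not the code from the testup.io repository; it is a minimal, dependency-free illustration under stated assumptions. The names `condense_dom` and `run_query`, the set of "interactive" tags, and the action format (`{"type": "click", "index": 0}`, `{"type": "done"}`, `{"type": "error"}`) are all hypothetical. In a real setup, `get_html` might return Selenium's `driver.page_source`, `ask_model` would wrap the OpenAI API call, and `perform` would translate an action into a Selenium interaction.

```python
from html.parser import HTMLParser
from typing import Callable

# Assumption: which tags and attributes count as "relevant" is illustrative only.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements without a closing tag or text content
KEPT_ATTRS = {"id", "name", "type", "href", "placeholder", "aria-label"}


class DomCondenser(HTMLParser):
    """Collects interactive elements, a few identifying attributes, and their text."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: list[dict] = []
        self._current = None  # index of the element whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            kept = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v is not None)
            self.entries.append({"tag": tag, "attrs": kept, "text": ""})
            # Void elements never get an end tag, so don't collect text for them.
            self._current = None if tag in VOID_TAGS else len(self.entries) - 1

    def handle_endtag(self, tag):
        if self._current is not None and self.entries[self._current]["tag"] == tag:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            entry = self.entries[self._current]
            entry["text"] = f"{entry['text']} {text}".strip()


def condense_dom(html: str) -> str:
    """Render the page as a short numbered list the model can refer to by index."""
    parser = DomCondenser()
    parser.feed(html)
    parser.close()
    lines = []
    for i, e in enumerate(parser.entries):
        attrs = f" {e['attrs']}" if e["attrs"] else ""
        lines.append(f"[{i}] <{e['tag']}{attrs}> {e['text']}".rstrip())
    return "\n".join(lines)


def run_query(
    prompt: str,
    get_html: Callable[[], str],
    ask_model: Callable[[str, str, list], dict],  # hypothetical wrapper around the LLM call
    perform: Callable[[dict], None],  # hypothetical executor, e.g. backed by Selenium
    max_steps: int = 10,
) -> list:
    """Condense, ask, act, and repeat until the model reports the request as done."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = ask_model(prompt, condense_dom(get_html()), history)
        if action.get("type") == "done":
            return history
        if action.get("type") == "error":
            raise RuntimeError(action.get("reason", "model could not complete the request"))
        perform(action)
        history.append(action)
    raise RuntimeError(f"request not completed within {max_steps} steps")
```

Capping the loop with `max_steps` keeps a confused model from cycling forever, and returning the action history makes each run inspectable afterwards. Sending only the condensed element list instead of the full DOM keeps the prompt small, which matters for both cost and context limits.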

The repository is licensed under MIT and contains all the code needed to replicate these steps on your local computer. All you need is a working Selenium driver and an OpenAI API key. Alternatively, you can register on testup.io and generate the automation step through our user interface.

Our vision for automation is to make tests visually inspectable. Wherever feasible, we want to replace complex error messages with before-and-after visual comparisons. Instead of crafting intricate selectors, users simply capture a region of their screen and use it as a reference image. Computer vision keeps this reference robust against rendering artefacts such as kerning differences and color blending. You can try this firsthand on our website, testup.io.
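To make the reference-image idea concrete, here is a deliberately naive sketch of tolerant template matching. It is not testup.io's computer vision; the function names, the grayscale list-of-rows image format, and the default `tolerance` of 8 are assumptions for illustration. It slides the reference over the screenshot and accepts the best position if the mean absolute pixel difference stays under the tolerance, which absorbs small uniform shifts such as slight color blending. It would not cope with kerning changes that move glyphs around; that needs something more robust, such as feature-based or multi-scale matching.

```python
from typing import Optional, Sequence

Image = Sequence[Sequence[int]]  # grayscale rows, values 0-255 (assumed format)


def mean_abs_diff(screen: Image, ref: Image, top: int, left: int) -> float:
    """Average per-pixel difference between ref and the screen window at (top, left)."""
    total = 0
    for r, row in enumerate(ref):
        screen_row = screen[top + r]
        for c, value in enumerate(row):
            total += abs(screen_row[left + c] - value)
    return total / (len(ref) * len(ref[0]))


def find_reference(screen: Image, ref: Image, tolerance: float = 8.0) -> Optional[tuple[int, int]]:
    """Return (top, left) of the best match if it is within tolerance, else None."""
    if not ref or not ref[0]:
        raise ValueError("reference image must not be empty")
    rh, rw = len(ref), len(ref[0])
    best, best_score = None, float("inf")
    # Exhaustive search over every position where the reference fits.
    for top in range(len(screen) - rh + 1):
        for left in range(len(screen[0]) - rw + 1):
            score = mean_abs_diff(screen, ref, top, left)
            if score < best_score:
                best, best_score = (top, left), score
    return best if best_score <= tolerance else None
```

A mean difference rather than an exact comparison is what lets a slightly re-rendered element still match; real screenshots would call for a vectorized implementation, since this brute-force search is slow on large images.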

What is your favorite way to automate user interactions? Will LLMs become clever enough to navigate a complex web application and perform all required tasks from a single prompt? Or do you expect a mix of classical checkpoints and simpler user prompts to coexist for the foreseeable future? Please let us know in the comments. We look forward to hearing your opinions.