Oh, wait a minute, did I mention "correctly billed"? Would you trust an LLM-based robot to inspect your finances and respond with a reassuring "Everything is OK"? Clearly, we won’t be there anytime soon. Despite their apparent inflexibility, classical test automation techniques have their benefits. They might be overly rigid, but they are reliable, and precisely programmed queries still give us greater test sensitivity. The challenge now lies in combining the strengths of both worlds and improving the accuracy of language models. With several recent announcements here on HackerNews, the future looks promising.
With this announcement (video here), we are opening up the LLM component that we use in testup.io for natural language queries. The source repository is a standalone snapshot of our LLM microservice, which handles these requests. It includes several standalone tests that use the Selenium interface. Whenever a free-text query needs to be executed, our service steps in. It condenses the DOM to the relevant entries and asks the language model for the resulting interactions. Currently, only OpenAI is supported. Based on the response, a user interaction is triggered. The cycle then repeats until the AI determines that the entire user request has been processed. If the request cannot be completed, an error is returned.
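To make the cycle above more concrete, here is a minimal sketch of such a loop. It is not the code from our repository: the names `condense_dom`, `run_query` and `Action`, the list of "relevant" tags, and the action vocabulary are all assumptions made for illustration. The browser and the language model are passed in as plain callables, so the sketch runs without Selenium or an API key.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional

# Assumption: "relevant entries" are approximated by interactive tags only.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Action:
    """One step proposed by the model (hypothetical vocabulary)."""
    kind: str                     # "click", "type", "done" or "fail"
    target: Optional[int] = None  # index into the condensed entry list
    text: str = ""                # text to enter for "type" actions


class _Condenser(HTMLParser):
    """Collects a numbered one-line summary of every interactive element."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            attr_text = " ".join(f'{k}="{v}"' for k, v in attrs if v is not None)
            summary = f"{tag} {attr_text}".strip()
            self.entries.append(f"[{len(self.entries)}] <{summary}>")


def condense_dom(html: str) -> List[str]:
    """Reduce a page to the entries the model needs to choose from."""
    parser = _Condenser()
    parser.feed(html)
    parser.close()
    return parser.entries


def run_query(
    query: str,
    get_dom: Callable[[], str],
    ask_model: Callable[[str, List[str], List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Repeat condense -> ask -> act until the model reports completion.

    Returns the number of interactions performed. Raises RuntimeError if
    the model gives up or the step limit is reached.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        entries = condense_dom(get_dom())
        action = ask_model(query, entries, history)
        if action.kind == "done":
            return len(history)
        if action.kind == "fail":
            raise RuntimeError(f"model could not complete: {query!r}")
        perform(action)
        history.append(action)
    raise RuntimeError(f"step limit of {max_steps} reached for: {query!r}")


if __name__ == "__main__":
    # Stand-ins for a real browser and a real model call.
    page = '<div><input name="q"><button id="go">Search</button></div>'
    script = [Action("type", 0, "testup"), Action("click", 1), Action("done")]
    performed: List[Action] = []
    steps = run_query(
        "Search for testup",
        get_dom=lambda: page,
        ask_model=lambda q, entries, history: script[len(history)],
        perform=performed.append,
    )
    print(steps, [a.kind for a in performed])
```

In a real setup, `get_dom` would return the page source from the Selenium driver, `perform` would translate an `Action` into a click or keystroke, and `ask_model` would wrap the OpenAI request. The step limit is one simple way to turn a model that never finishes into an error instead of an endless loop.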
The repository is licensed under MIT and contains all the code needed to replicate these steps on your local computer. All you need is a working Selenium driver and an OpenAI API key. Alternatively, you can register on testup.io and generate the automation step through our user interface. Our vision for automation is to make tests visually inspectable. We aim to replace complex error messages with before-and-after visual comparisons whenever feasible. Instead of crafting intricate selectors, users simply capture a region of their screen and use it as a reference image. Intelligent computer vision ensures that this reference remains resilient to rendering artefacts such as kerning and color mixing. You can experience this firsthand on our website, testup.io.
What is your favorite way to automate user interactions? Will LLMs become clever enough to navigate through a complex web application and perform all required tasks from a single prompt? Or do you expect classical checkpoints and simpler user prompts to coexist for the foreseeable future? Please let us know in the comments. We are excited to hear your opinion.