Photo by Possessed Photography on Unsplash

In this article, I try to anticipate how End-to-End (E2E) testing will evolve in the near future.

There is plenty of general information about E2E testing out there, for example, this one.

The plan for the article:

1. What do we want from E2E testing?
2. E2E testing outcome
3. Current situation with E2E testing
4. Problems we have now with E2E testing
5. One ideal solution
6. The ideal solution outcome
7. Virtual users solution
8. Would the outcome be the same as in the ideal solution?
9. Some words about the design and implementation of the virtual users
10. Summary

What do we want from E2E testing?

With E2E testing we want to make sure that, from a real user’s perspective, all flows in our application work correctly and the user can successfully reach their goal using the application.

E2E testing outcome

As a result of testing we would have the following artifacts:

  1. All the testing scenarios, each with the percentage of testers/runs that finished it successfully.
  2. Bug reports with enough info to reproduce the bugs.

With this outcome we can decide whether the new version of our app is ready to be released to production, and we have enough information to understand which system contains the bugs and how to reproduce them.
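To make the first artifact concrete, here is a minimal Python sketch that turns raw run results into a per-scenario success percentage and applies a release rule. The run records, the scenario names, and the 95% threshold are assumptions for illustration, not part of any specific tool.

```python
from collections import defaultdict

# Hypothetical run record: (scenario name, whether the run finished successfully).
RunResult = tuple[str, bool]

def pass_rates(runs: list[RunResult]) -> dict[str, float]:
    """Percentage of runs that finished each scenario successfully."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for scenario, ok in runs:
        totals[scenario] += 1
        if ok:
            passed[scenario] += 1
    return {s: 100.0 * passed[s] / totals[s] for s in totals}

def ready_for_release(rates: dict[str, float], threshold: float = 95.0) -> bool:
    """Assumed release rule: every scenario must reach the threshold."""
    return all(rate >= threshold for rate in rates.values())

runs = [("checkout", True), ("checkout", True), ("checkout", False), ("checkout", True),
        ("search", True), ("search", True)]
rates = pass_rates(runs)
print(rates)                     # {'checkout': 75.0, 'search': 100.0}
print(ready_for_release(rates))  # False
```

A single threshold for all scenarios is the simplest possible rule; a real team might require 100% for critical flows such as payment and accept less for secondary ones.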

Current situation with E2E testing

Currently, to implement E2E testing, we need to prepare or update the testing scenarios, prepare test data (test accounts, test cards, etc.), and then either ask manual testers to test the application following those scenarios or automate them using various tools and methods.

Problems we have now with E2E testing

There are quite a few problems with E2E testing. Here are some of them:

  1. Considerable effort is needed to keep the testing scenarios up to date every time a new version is released or the code changes.
  2. In manual testing, we have to skip some combinations of user parameters such as country and language to keep the number of testing hours manageable, which can make the test results less reliable.
  3. In automated testing, every platform, and mobile in particular, brings many technical challenges, mostly related to finding the right UI elements and interacting reliably with them and with the device or browser.
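A quick back-of-the-envelope calculation shows why manual testing has to drop combinations: the full matrix grows multiplicatively with every parameter. The parameter lists, the 20 scenarios, and the 2 hours per run below are hypothetical numbers chosen only to illustrate the arithmetic.

```python
from itertools import product

# Hypothetical user parameters.
countries = ["UK", "DE", "FR", "ES", "IT"]
languages = ["en", "de", "fr", "es"]
platforms = ["web", "android", "ios"]

combinations = list(product(countries, languages, platforms))
print(len(combinations))  # 60 (5 * 4 * 3)

# Assumed workload: 20 scenarios, 2 hours of manual testing per scenario run.
scenarios, hours_per_run = 20, 2
total_hours = len(combinations) * scenarios * hours_per_run
print(total_hours)  # 2400
```

Adding just one more two-valued parameter, such as a light or dark theme, would double the total to 4,800 hours.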

One ideal solution

So what could the ideal solution look like? What would perfect E2E testing be?

What if all our real users could use our application, do whatever they need, and report to us whenever they found a bug or an inconvenience? They would not stop using the app, leave a bad rating in the stores, or call our support; instead, they would just send us well-prepared reports with enough information to reproduce and fix the bugs.

This approach clearly satisfies the definition of E2E testing and avoids the problems from the previous section. At least from my point of view, it sounds like perfect E2E testing.

The ideal solution outcome

With the ideal solution we would get the same artifacts as from usual E2E testing, plus some more:

  1. Using the business analytics and metrics that our application should have anyway, we would know how many users participated, how many of them successfully did what they wanted, how many errors were caught, and many other product-specific metrics that usual E2E testing does not provide.
  2. Bug reports with enough info to reproduce the bugs.
  3. Reports about the load on our backend services.
  4. Reports about any errors in our systems, especially at the boundaries between systems.
  5. We could even run some A/B tests before officially releasing the app :)

Sounds rather cool. Of course, this approach has its own difficulties around using special environments and organizing the process, but they seem solvable, since we have already solved them for usual E2E testing.

The next question is how we can make that dream come true without bothering our real users.

Virtual users

What if we could create an AI-driven virtual user who installs our application or opens our website and uses it like a real user, with a purpose in mind (or, more precisely, in its neural network)? Then we could summon as many such users as we need, and they would do exactly what we described in the ideal solution.

However, there are two requirements that make the task rather hard:

  1. These users should behave differently from each other to imitate the diversity of real users. At the same time, they shouldn’t behave chaotically: each of them should be trying to reach its goal (buy a chair, search for a movie, or anything else).
  2. To reduce the load on our environment and on the machinery that emulates the virtual users, we’d like to keep their number as small as possible while still covering all the functionality.

One way to satisfy both requirements is to define different types of virtual users that match different groups of real users. For example, if our app is a huge online store, we could probably define the following groups of users by their intentions:

  1. Search a lot for one specific thing, add many options to the basket, then choose one and buy it (intent to choose one thing from one category and buy it ASAP)
  2. Search for a group of similar things, rarely add something to the basket, almost never buy (intent to explore a broad category of goods)
  3. Arrive via a direct link from outside and immediately buy something (intent to buy a specific thing)
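These groups can be written down as data rather than as scripts. The sketch below assumes a hypothetical `UserGroup` type with a few behavior knobs; the numbers are made-up placeholders that a real project would have to derive from its own analytics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserGroup:
    """A group of real users described by intent, not by hardcoded steps."""
    name: str
    intent: str
    # Hypothetical behavior knobs: how much the group searches, and how likely
    # it is to add items to the basket and to complete a purchase.
    searches_per_session: int
    add_to_basket_prob: float
    purchase_prob: float

GROUPS = [
    UserGroup("focused_buyer", "choose one thing from one category and buy it ASAP", 10, 0.8, 0.9),
    UserGroup("explorer", "explore a broad category of goods", 15, 0.1, 0.02),
    UserGroup("direct_buyer", "buy a specific thing from a direct link", 0, 1.0, 0.95),
]

for group in GROUPS:
    print(f"{group.name}: {group.intent}")
```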

I’m not an expert in user behavior; these are just the first groups that come to mind.

The main idea is that there are some common behavior patterns that our virtual users can reproduce. However, these patterns are not testing scenarios. Testing scenarios are hardcoded and give the tester or the testing pipeline clear instructions: tap this, input that, check this, and so on. In contrast, behaviors or patterns are very general and sit on a higher level of abstraction: “search for a black office chair under 100 pounds with an overall rating above 4.5, check all the reviews to make sure no injuries are mentioned, then buy it using {card} and {address}”.
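The difference between a scenario and a pattern can be shown side by side. In this sketch, the scripted scenario is a list of concrete UI actions bound to hypothetical element ids, while the pattern is just the goal text with `{card}` and `{address}` filled in from test data; the test values are placeholders.

```python
# Classic scripted scenario: every step names a concrete (hypothetical) element id,
# so renaming a button or reordering screens breaks the script.
SCRIPTED_SCENARIO = [
    ("tap", "search_button"),
    ("input", "search_field", "black office chair"),
    ("tap", "filter_price_under_100"),
    ("tap", "result_item_0"),
    ("check", "rating_label", "> 4.5"),
    ("tap", "buy_button"),
]

# Behavior pattern: only the goal, on a higher level of abstraction.
# How to reach it is left to the virtual user.
PATTERN = (
    "search for a black office chair under 100 pounds with an overall rating above 4.5, "
    "check all the reviews to make sure no injuries are mentioned, "
    "then buy it using {card} and {address}"
)

# Placeholder test data (hypothetical values).
goal_text = PATTERN.format(card="TEST_CARD_1", address="TEST_ADDRESS_1")
print(goal_text)
```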

A virtual user then imitates real users: it tries to understand the UI, find the needed elements on the screen, and interact with them the same way real users do.

With this approach we can summon a few virtual users per group and “ask” them to use our app the same way real users do.
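A minimal sketch of the summoning step, assuming each virtual user differs from the others in its group only through a couple of hypothetical knobs (`patience`, `typo_rate`) drawn within fixed bounds. The bounds keep the users varied but not chaotic, and the fixed seed makes a test run reproducible.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualUser:
    group: str
    user_id: int
    patience: int     # hypothetical: max steps before giving up
    typo_rate: float  # hypothetical: chance to mistype a search query

def summon(groups: list[str], per_group: int, seed: int = 42) -> list[VirtualUser]:
    """Create a few varied users per group; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    users = []
    for group in groups:
        for i in range(per_group):
            users.append(VirtualUser(
                group=group,
                user_id=i,
                patience=rng.randint(10, 30),
                typo_rate=round(rng.uniform(0.0, 0.1), 3),
            ))
    return users

users = summon(["focused_buyer", "explorer", "direct_buyer"], per_group=3)
print(len(users))  # 9
```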

Would the outcome from the virtual users solution be the same as in the ideal solution?

The outcome would be pretty much the same: we would get the same stats about users who participated and successfully reached their goals, the same reports about any bugs that prevent our virtual users from reaching their goals, reports about backend errors that didn’t affect the user experience, and so on.

What will be missing:

  1. Load testing. This actually depends on the capacity of our environment and testing pipeline: in theory we could summon thousands or millions of virtual users, but the cost of such testing would be very high.
  2. Proper business metrics about the real users’ behavior. Many factors influence real users’ behavior, from the political situation in the world to the weather and current trends, but none of them affect our virtual users: they will behave almost the same way every time we summon them. We can, of course, re-train them if something really important changes in the world. So the data about the virtual users’ behavior can’t help product owners evolve the product, but it can help them measure how easy our application is to use, by measuring how long the virtual users take to reach their goals. Using virtual users for UX research of our apps is actually a really interesting idea: if an image on a button is not easily understandable for a virtual user, it will be the same for a real one.
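As an illustration of the UX idea, this sketch computes the median time-to-goal per user group from virtual-user runs, ignoring runs that failed. The run data is invented for the example; in practice the same numbers would be compared between app versions, and a noticeable increase would point at a usability regression.

```python
from statistics import median

# Hypothetical results: (group, reached_goal, seconds spent).
runs = [
    ("focused_buyer", True, 95.0),
    ("focused_buyer", True, 120.0),
    ("focused_buyer", False, 300.0),
    ("direct_buyer", True, 30.0),
    ("direct_buyer", True, 40.0),
]

def median_time_to_goal(runs: list[tuple[str, bool, float]]) -> dict[str, float]:
    """Median time of successful runs per group; failed runs are excluded."""
    times: dict[str, list[float]] = {}
    for group, ok, seconds in runs:
        if ok:
            times.setdefault(group, []).append(seconds)
    return {group: median(values) for group, values in times.items()}

print(median_time_to_goal(runs))  # {'focused_buyer': 107.5, 'direct_buyer': 35.0}
```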

Some words about the design and implementation

How would we implement the virtual users?

Let’s see what we would need to make the virtual users do what we want.

There could be the following layers:

  1. [UI] The UI that we already have in the app
  2. [UI Parser] A system that understands the UI. It can extract all the UI elements, interact with them, and understand their purpose from the text or images on them. Ideally, it shouldn’t rely on any knowledge of the UI internals: it should analyse a plain picture of the interface the same way a real user does.
  3. [Solver] A system that can expand the goal into a series of steps to reach it. If you want to buy a black chair, you first need to search for one using filters and keywords, then check it out, and finally pay for it.
  4. [Goal setter] The highest-level system, which translates the desired behavior pattern, given in text form, into the virtual user’s goal.

We can think of layer 3 as the core of the system, layer 4 as its input, and layer 2 as an adapter from the core to any UI (web, mobile, PC apps, or even something special like medical equipment with touch screens or industrial monitoring systems with hardware buttons).
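The layering can be expressed as four small interfaces plus a loop that wires them together. This is a sketch only: the `Protocol` names, the method signatures, and the convention that the solver returns the meaning of the next element to activate (or `None` when the goal is reached) are assumptions, and the toy classes at the end merely stand in for real layers.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class UIElement:
    label: str  # meaning recognized by the parser, e.g. "search"
    x: int      # center of the element on the screen
    y: int

class UI(Protocol):  # Layer 1: the app's UI plus a way to see and touch it
    def screenshot(self) -> bytes: ...
    def tap(self, x: int, y: int) -> None: ...

class UIParser(Protocol):  # Layer 2: turns a plain picture into elements
    def elements(self, screenshot: bytes) -> list[UIElement]: ...

class Solver(Protocol):  # Layer 3: meaning of the next element to activate, None when done
    def next_step(self, goal: dict, screen: list[UIElement]) -> Optional[str]: ...

class GoalSetter(Protocol):  # Layer 4: turns a text pattern into a goal
    def goal_from_text(self, text: str) -> dict: ...

def run_virtual_user(text: str, goal_setter: GoalSetter, solver: Solver,
                     parser: UIParser, ui: UI, max_steps: int = 50) -> bool:
    """Drive one virtual user; True if the solver reports the goal as reached."""
    goal = goal_setter.goal_from_text(text)
    for _ in range(max_steps):
        screen = parser.elements(ui.screenshot())
        wanted = solver.next_step(goal, screen)
        if wanted is None:
            return True
        target = next((e for e in screen if e.label == wanted), None)
        if target is None:
            return False  # the UI offers nothing with the needed meaning
        ui.tap(target.x, target.y)
    return False

# Toy stand-ins (hypothetical): a two-screen app where any tap moves forward.
class ToyApp:
    def __init__(self) -> None:
        self.screen = "home"
    def screenshot(self) -> bytes:
        return self.screen.encode()
    def tap(self, x: int, y: int) -> None:
        self.screen = {"home": "results", "results": "paid"}.get(self.screen, self.screen)

class ToyParser:
    LAYOUT = {"home": [UIElement("search", 10, 10)], "results": [UIElement("buy", 20, 20)], "paid": []}
    def elements(self, screenshot: bytes) -> list[UIElement]:
        return self.LAYOUT[screenshot.decode()]

class ToySolver:
    def next_step(self, goal: dict, screen: list[UIElement]) -> Optional[str]:
        labels = [e.label for e in screen]
        return next((step for step in ("search", "buy") if step in labels), None)

class ToyGoalSetter:
    def goal_from_text(self, text: str) -> dict:
        return {"action": "buy"}

print(run_virtual_user("buy a chair", ToyGoalSetter(), ToySolver(), ToyParser(), ToyApp()))  # True
```

Keeping Layer 2 behind its own interface is what makes the same solver reusable across web, mobile, and other UIs.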

High-level design

How would the example of buying a black office chair work here?

Layer 4

Layer 4 receives the text input, analyzes it, and extracts all the important features. The result could look like this:
* Main goal: buy
* Subject: chair
* Subject properties: black, office
* Limitations: price (under 100 pounds), rating (better than 4.5), reviews (must not mention injuries).
* Account: …
* Card: …

I suppose the best way to do this is to train a machine learning model to extract the meaning from the text; there are probably some suitable solutions on the market already.
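Until such a model is in place, a crude keyword and regular-expression extractor can show what Layer 4’s output might look like. This is explicitly a swapped-in technique, not machine learning: the `Goal` fields mirror the list above, and the vocabularies and phrasing patterns are hypothetical, so it only handles text written in exactly this style. Account and card are left out, as they would come from test data.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Goal:
    action: str
    subject: str
    properties: list[str] = field(default_factory=list)
    max_price: Optional[float] = None
    min_rating: Optional[float] = None
    forbidden_in_reviews: list[str] = field(default_factory=list)

# Hypothetical vocabularies; a trained model would replace them.
SUBJECTS = ("chair", "table", "movie")
PROPERTIES = ("black", "white", "office", "gaming")

def extract_goal(text: str) -> Goal:
    """Naive rule-based stand-in for ML extraction of the goal features."""
    lower = text.lower()
    words = re.findall(r"[a-z]+", lower)
    price = re.search(r"under (\d+(?:\.\d+)?) pounds", lower)
    rating = re.search(r"rating (?:above|better than|bigger than) (\d+(?:\.\d+)?)", lower)
    forbidden = re.findall(r"no (\w+) (?:is|are) mentioned", lower)
    return Goal(
        # "buy" wins over "search": searching is only a step towards buying.
        action="buy" if "buy" in words else "search",
        subject=next((w for w in words if w in SUBJECTS), ""),
        properties=[p for p in PROPERTIES if p in words],
        max_price=float(price.group(1)) if price else None,
        min_rating=float(rating.group(1)) if rating else None,
        forbidden_in_reviews=forbidden,
    )

goal = extract_goal(
    "search for a black office chair under 100 pounds with an overall rating above 4.5, "
    "check all the reviews to make sure no injuries are mentioned, then buy it using {card} and {address}"
)
print(goal)
```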

Layer 3

This one is a bit tricky. It should act as a problem solver that, for each task, provides an algorithm or a sequence of steps to solve it. In our example, the steps could look like this:

  1. Open the search page
  2. Search for a subject “chair” with keywords: black and office.
  3. Check the list of options; if it is empty, return an error
  4. Keep only the items priced under 100 pounds with a rating above 4.5
  5. Iterate over the list and choose the first item whose reviews don’t mention injuries
  6. Proceed to checkout with the chosen item
  7. If it’s available, buy it using the {card} credentials
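Steps 3 to 5 are plain data processing once the search results are available. The sketch below assumes a hypothetical `Item` type and a made-up catalog; it matches the stem `injur` so that “injury”, “injuries”, and “injured” are all caught.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    price: float  # pounds
    rating: float
    reviews: list[str] = field(default_factory=list)

def choose_item(options: list[Item], max_price: float, min_rating: float,
                forbidden_stem: str) -> Item:
    """Steps 3-5 of the plan above; raises LookupError when the goal can't be met."""
    if not options:  # step 3: nothing found
        raise LookupError("no options found")
    # step 4: keep items under the price limit with a rating above the minimum
    candidates = [i for i in options if i.price < max_price and i.rating > min_rating]
    # step 5: first candidate whose reviews never mention the forbidden stem
    for item in candidates:
        if not any(forbidden_stem in review.lower() for review in item.reviews):
            return item
    raise LookupError("no item satisfies the goal")

catalog = [  # hypothetical search results
    Item("Chair A", 120.0, 4.8, ["Great chair"]),
    Item("Chair B", 80.0, 4.7, ["Caused a back injury"]),
    Item("Chair C", 90.0, 4.4, ["Fine"]),
    Item("Chair D", 95.0, 4.6, ["Comfortable", "Solid build"]),
]
print(choose_item(catalog, max_price=100, min_rating=4.5, forbidden_stem="injur").name)  # Chair D
```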

I can see two ways of implementing this:

  1. Take the real users’ analytics data, label many different funnels and user journeys with goals and properties, and use this data to train a neural network (NN) that does the opposite: given a goal and its properties, it suggests a sequence of events that leads to the goal.
  2. Once layers 1 and 2 are ready, use reinforcement learning to train an NN to reach different goals in the real application.
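To give a feel for the second option, here is tabular Q-learning on a toy model of the shop, used as a small stand-in for training a neural network with reinforcement learning. The screens, actions, rewards, and hyperparameters are all assumptions. The “wrong” action is listed first on each screen, so an untrained greedy agent would wander off; only training makes it follow the purchase path.

```python
import random

# Hypothetical app model: screen -> {action: next screen}.
APP = {
    "home": {"open_profile": "home", "open_search": "search"},
    "search": {"back": "home", "submit_query": "results"},
    "results": {"back": "search", "open_item": "item"},
    "item": {"back": "results", "checkout": "checkout"},
    "checkout": {"back": "item", "pay": "purchased"},
    "purchased": {},
}
GOAL = "purchased"

def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> dict:
    """Tabular Q-learning: +1 for reaching the goal, -0.01 for every other step."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in APP.items() for a in acts}
    for _ in range(episodes):
        state = "home"
        for _ in range(50):  # cap episode length
            actions = list(APP[state])
            if rng.random() < epsilon:  # explore
                action = rng.choice(actions)
            else:  # exploit the current estimates
                action = max(actions, key=lambda a: q[(state, a)])
            nxt = APP[state][action]
            reward = 1.0 if nxt == GOAL else -0.01
            future = max((q[(nxt, a)] for a in APP[nxt]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q

def greedy_path(q: dict, start: str = "home", max_steps: int = 10) -> list[str]:
    """Follow the highest-valued action on each screen."""
    path, state = [], start
    while state != GOAL and len(path) < max_steps:
        action = max(APP[state], key=lambda a: q[(state, a)])
        path.append(action)
        state = APP[state][action]
    return path

print(greedy_path(train()))  # ['open_search', 'submit_query', 'open_item', 'checkout', 'pay']
```

In the real setting, the state would be what Layer 2 extracts from the screen rather than a screen name, which is why a neural network, not a table, would be needed.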

I admit that I don’t have much knowledge of machine learning, and there may be many other ways to implement this layer, but I’m sure the task is very interesting, challenging, and, more importantly, promising, as the results could be reused in many other fields such as robotics, game AI, and more.

Layer 2

This layer is very important, as it gives us a real interface to any application. UIs differ a lot between web, mobile, and PC apps, and so do the ways we can interact with them. Layer 2 abstracts these differences away and provides a higher-level interface.

In our example, Layer 2 constantly analyzes the screen it sees (using screenshots, special tools like ‘adb’ for Android, or something else), extracts all UI elements, and “understands” them. When it receives the command “Open search page” from Layer 3, it looks for something that means search, such as a button with the text “Search”, a magnifying glass icon, or a menu item with the text “Search”, and then activates it by tapping on it (using special tools or simply tapping on the screen at the element’s coordinates).
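Here is a minimal sketch of the lookup by meaning, assuming the parser has already turned the screenshot into elements with visible text or an icon caption. The `Element` type and the synonym table are hypothetical; a real implementation would replace the table with a trained model. Once an element is found, tapping its center on Android could be done with `adb shell input tap 300 60`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    kind: str  # "button", "icon", "menu_item", ...
    text: str  # visible text, or a caption produced for an icon
    x: int     # center coordinates on the screen
    y: int

# Hypothetical meaning table; a real parser would use a trained model.
MEANINGS = {
    "search": {"search", "find", "magnifying glass"},
    "basket": {"basket", "cart", "shopping cart"},
}

def find_by_meaning(elements: list[Element], meaning: str) -> Optional[Element]:
    """Return the first element whose text expresses the requested meaning."""
    synonyms = MEANINGS.get(meaning, {meaning})
    for element in elements:
        if element.text.strip().lower() in synonyms:
            return element
    return None

screen = [
    Element("button", "Sign in", 40, 60),
    Element("icon", "magnifying glass", 300, 60),
    Element("menu_item", "Cart", 340, 60),
]
target = find_by_meaning(screen, "search")
print(target.x, target.y)  # 300 60
```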

This layer alone can be quite useful for UI testing, as it lets us interact with any UI universally, without depending on special element ‘ids’ or special tooling for accessing the internal UI representation.

We have already built a PoC for this layer, and it looks very promising. Hopefully, we can reveal more about it in future articles.

Layer 1

This is just the UI of our app plus a bidirectional way of interacting with it. We already mentioned some options in the previous section, like taking screenshots and tapping on the screen using the coordinates of the UI elements.
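For Android, both directions can be covered by two standard adb commands: `adb exec-out screencap -p` to read the screen as a PNG and `adb shell input tap X Y` to touch it. The `AndroidUI` wrapper below is a hypothetical sketch built on them; the command builders are kept separate so they can be checked without a device.

```python
import subprocess
from typing import Optional

def _adb(serial: Optional[str]) -> list[str]:
    """Base adb invocation, optionally targeting one device by serial."""
    return ["adb"] + (["-s", serial] if serial else [])

def screenshot_cmd(serial: Optional[str] = None) -> list[str]:
    """adb command that writes a PNG screenshot of the device to stdout."""
    return _adb(serial) + ["exec-out", "screencap", "-p"]

def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> list[str]:
    """adb command that taps the screen at pixel coordinates (x, y)."""
    return _adb(serial) + ["shell", "input", "tap", str(x), str(y)]

class AndroidUI:
    """Layer 1 for Android: see the screen and touch it, nothing more."""

    def __init__(self, serial: Optional[str] = None) -> None:
        self.serial = serial

    def screenshot(self) -> bytes:
        result = subprocess.run(screenshot_cmd(self.serial), check=True, capture_output=True)
        return result.stdout  # raw PNG bytes for the UI parser

    def tap(self, x: int, y: int) -> None:
        subprocess.run(tap_cmd(x, y, self.serial), check=True)
```

Web and desktop apps would need their own Layer 1 with the same two methods, which is exactly what lets Layer 2 stay platform-independent.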

Summary

In this article, we discussed one possible direction for the evolution of E2E testing and suggested some ways to get there.

We think that the future of E2E testing is in the hands of virtual users (AI) who can test our apps the same way our real users use them every day.

Making that AI work is definitely a challenge, but each step toward the end state can bring its own value and reveal new, unknown ways of testing, of using apps, and even of thinking about apps.

First of all, I’m super happy that you’ve read this far, and I’ll be even happier to discuss this topic in the comments or in any other appropriate place.

Feel free to share your ideas, criticism, and comments.