Unmoderated Remote Usability Testing

Different from my previous writings, where I have mastered the skills being discussed for years, I actually get involved in Usability Testing just recently, about a year or so.

Early 2020 was actually the first time I saw Remote Usability Testing (UT) in practice, when I joined Tokopedia. About a week after onboarding, I saw a message posted in Product Manager Slack channel with Maze link attached to it. Out of curiosity I tapped the link and got redirected to some kind of "App Prototype" in which I was being asked to perform some tasks by interacting with the prototype.

I found out later that even before the Covid-19 Pandemic, Remote UT is the common exercise done by their product team to gather early feedback for specific feature before committing any effort and time to build it.

During my stay at the unicorn, I participated (as respondent) in around 10 similar UT activities. I found this exercise very insightful for us Product Manager, especially true if you deal with product's feature that is user-facing; to save you times from building something that turns out causing confusion to your users. Therefore, later when I moved to Qlip, I bring along this Remote UT practice to my new team.

In this writing, I will focus mainly about how to conduct Remote UT using Maze, simply because this is the only tools I really familiar with. As usual, I will show you real-life examples so that you can have better context and visualisation on this topic.

Types of Usability Testing

Before continuing on, as usual, let's talk a bit about definition. According to Interaction Design Foundation (IxDF), usability testing is:

The practice of testing how easy a design is to use with a group of representative users. It usually involves observing users as they attempt to complete tasks and can be done for different types of designs. It is often conducted repeatedly, from early development until a product’s release.

From the definition stated above, I can conclude that there are at least 3 things we should deeply understand before conducting any UT session, namely:

The objective of UT is to measure how easy-to-use the design is. In other words, by how far our intended product design is intuitive enough, so that it is easy to understand and to navigate by our users.
The participant of UT must represent our actual users, otherwise the findings we will gather from the testing will become unusable, as it might show us misleading signals that differs from what the actual users will experience.
The process of UT will include a specific set of tasks that we ask participants to perform by interacting either with a prototype or actual product. We then observe them by measuring whether the tasks are completed and by how long it's taken.

Based on my own experience, there are two types of Remote UT that are commonly used in the industry: moderated and unmoderated.

Remote Moderated UT

Remote Moderated UT is frequently done via a video call session (with screen-sharing as a mean to observe) between you or you team as the researchers and the participants when conducting the testing. In short, with this method of testing you can interact directly with the testers, help them setting up the context, and ask them to perform the given tasks. I would say, moderated UT can be called Qualitative testing as well, as you can probe the WHY of any particular interesting steps or issues encountered by the testers.

You can probe deeper by asking "why" questions to the testers.
You can take note on non-verbals signals during the session.

You couldn't quantify the conclusion as a reflection of the larger truth.
You will likely pay more for the participants to compensate their times.

Unmoderated Remote UT

As for Unmoderated Remote UT, almost all the elements and processes are quite similar with the moderated one. The major different is, as the name suggested, in this method researcher and participant do not interact directly with each others. We as researchers could rely on the available online tools in the market in which we can setup the UT tasks, and then participants can easily work on the tasks on the chosen test platform, anytime. By nature, I would say unmoderated testing as a more Quantitative approach as with this method we can reach broader audiences and the results of the testing are thus more quantifiable.

You can involve many participants, this will allow us to quantify the end results.
You can save more money, participants are likely joining the test voluntarily.

You would get the quantified "what", but it's rather hard to assume the "why".
You would also miss the non-verbal signals you can obtain from observation.

That being said, just like any other research study, combining the Qualitative and Quantitive approach to prove hypothesis or simply mining insights from users, would be your most ideal case. This is true, so that you can get the "Why, How, and What" questions answered and make all the dots be connected. However, in this writing I will focus on the Unmoderated Remote UT first and let's save the former for next occasion.

Prepare for the Remote UT

Suppose you are hired as a Product Manager in a growing startup company and your team is currently responsible to revamp the app registration module. All the stakeholders believe that by improving the onboarding process, it will lead to more users get onboarded successfully and thus will be able to experience the product core values. After having discussion with several stakeholders and brainstorming with your team, your design team has finally come up with a proposed solution.

However, you and your time are not quite sure whether this solution will really improve the user experience that will lead to higher registration success rate. Therefore, as PM, you decided to conduct a UT session first using the proposed design to see how it performs before passing it to the engineering team.

You might notice this is the case of "feature or funnel improvement". Ideally, you would compare the alternative design versus the status quo to see whether there is any significant difference in performances between the two, before releasing the change to the rest of the users. You could achieve this by running an experiment such as A/B Testing.

However, for the context of this writing, the objective of Usability Testing is to get early feedback on the design even before we build it. Because, in the case of many of participants failed to complete the UT Tasks, there's almost no meaning in implementing this alternative design without revision.

Define Objective and Tasks

First of all, you must define what objective you expect to achieve out of this UT session, for example:

Users can complete the registration process with 95% success rate and within 30s.

After that, create a simple task script that contains several activities which need to be done by the participants, for example:

Task

Instruction

Create Account

Ask participant to complete the registration process until succeed

Resend OTP Code

Ask participant what they will do in the case of OTP wasn't received

Facebook Login

Ask participant to register using Facebook Login method (alt)

Prepare Prototype and Tools

As mentioned earlier, UT is usually conducted during early stage of product development. In most cases, you need a clickable prototype ready before conducting a UT session. This is to ensure participants can mimic the user experience as close and realistic as possible to the actual product. It will give you a better proxy on how users will actually react to your product and its features.

I often use Marvel, InVision, or Figma to produce the prototype. All these prototyping tools are compatible with Maze, the Remote UT platform that we will discuss shortly. Even though I am not going to walk you through on how to create a prototype here, I'd like to let you know that it is one of many important skills you need to master as PM. Because, in the case you don't have enough designer to help you create a prototype, you can just make one yourself.

Here I use Figma Prototyping tools to create one, the prototype can be accessed via this link. Later, I will insert it into Maze setup.

Once you have the test objective, tasks, and prototype ready, setting up the UT platform is actually one of the easiest parts. In case you don't have a Maze account, just go visit their website at www.maze.co and sign-up for a new one, it has a free version!

Setup the Maze Blocks

First of all, you need to copy-paste the prototype url into Maze dashboard. You might be asked to authorise the access to your prototype source, this is to make sure that you are eligible to use it for the testing purpose. In Maze, the task script we have prepared will be inserted to one of the Blocks or sections that build up Maze project. Block can be a task or mission, multiple choice, open question, etc. Below are the anatomy of Maze project:

Default Screens: Maze will automatically create a default Welcome and Thank You screens for you. These are the two screens that participants will see in the beginning and at the end of the UT session. For the free version, these default screens are not customisable.
Mission Block: This is the heart of UT; the task you want participant to complete. There are three things you need to fill in for this type of block:
- Task: A straightforward sentence that summarise the task (e.g. Create an Account)
- Description: A more detail instruction for participants to complete the tasks.
- Expected Path: The User Flow you have defined previously in the prototype, will ideally act as the expected path. To map the path into Maze Block, you can simply click around the prototype preview on the right side, and Maze will record the order of the screens you tap as the expected path it assumes participants will take.
Opinion Block: Beside Task or Mission, I often use Opinion Block to ask participants to rate on "how easy it is for them to use the product". This will be the indicator whether the user experience is easy enough for users.
Preview and Start: You can preview the remote UT project to review it first in case you missed any step before launching the test. Once we are ok with the preview, it's time for us get it Live simply by tapping the "Start testing" button.

I have made the UT example in this writing Live, so that you guys can try it, please visit this link if you are interested https://t.maze.co/40843039. As for the step-by-step guidance on how to "Get started with Maze", you can consider to pay this link a visit.

Analyse the UT Results

Now is the time for the analysis part! Once you are getting results from participants, you could monitor it real-time from the Maze dashboard. But before we discuss the metrics we got from this UT simulation, I need to mention the sample size quoted directly from Maze to achieve confidence level you desire:

Level 0 (<5): Test your maze with at least five people to start learning how your designs perform with real users.
Level 1 (5-20): At this level, you uncover the most common issues and learn how your designs perform with sufficient participants for accurate results.
Level 2 (21-100): Keep up the pace to discover undetected issues in your designs—increasing the number of participants can uncover all usability problems.
Level 3 (>100): More testers means greater confidence that all problems have been found. Well done you!

According to confidence levels above, by the time you reach Level 1, you might already identify the most visible problems that occurred. But, I would suggest you to reach out to more people to participate so that you have better confidence when it comes to decision making, considering that it is a quantitative study. Especially true, if you have more time to wait for larger pool of participants to come in.

Mission Block Results

For every task or mission in Maze, you should take a look first on the paths taken by participants, which are divided into three outcomes:

Direct Success: Participants who completed the mission via the expected path(s).
Indirect Success: Participants who completed the mission via unexpected paths.
Give-up / Bounce: Participants who left or gave up in the middle of the mission.

As we can see, out of 13 participants, 92.3% of them are able to complete the task. If we consider this fact alone, we might decide it is a failure output because it performs below our target metrics of 95% success rate 👎🏽. And for those who completed the tasks according to the Expected Path, we have average duration of 23.6s which is below the 30s that we targeted, let's give this one a passed ✅.

When we see the dataset like this, it's always interesting to find out more about something that is not working according to what we expect, thus we have Indirect Success and Give-up / Bounce outcomes here to check. Let's take a look into data of participant who gave up the mission first, since indirect success is still a success nonetheless.

Inside Maze mission's result, you can zoom in to individual screens users have taken in order to complete their mission, this is including the last screen opened before giving up the task. For this 1 participant who left the task uncompleted, we could see from the heatmap that he/she tried to click the Privacy Policy (PP) hyperlink 7 times before leaving. I would then assume this is an outlier who is very concern with data privacy before committing to any digital account creation, therefore he/she choose to leave when found out this feature was not working.

Considering this finding, you may come up with two alternative follow-up actions:

Simply ignore this one because it's an outlier, and you do now the actual feature must be incorporating the working PP anyway;
Continue gathering new data to prove whether it is indeed an outlier that represents a very small chunk of population, thus ok to disregard;

You decide... 😌

As for the Indirect Success outcome, we do now from the prototype that the only alternative path to complete the task is by tapping the "Facebook Login Button". Therefore, we could say from the current dataset that there are 23.1% of sample users who expect this function available. So, pay attention to this path and you may consider this feature to be included in the upcoming sprint backlog.

Opinion Block Results

The result from opinion block is very straightforward. In our case, we got the average score of 8.3/10, which a decent value with almost half of the participants choose 9 point. And if you take a look to individual response, you will see that the only one choosing 5 point is our friend who gave up the task before. This made me further assume that he/she is an outlier in this group of samples who didn't happy with the design which didn't include a working PP.

Looking at these findings, I would suggest myself to continue gathering more participants to join the UT session before concluding any decisive action item. However, at this stage, you pretty much have a hunch about where it leads us to. You just need more data to really confirm the patterns that occurred. If you have time though, it will be better if you can conduct the moderated UT as well, to confirm your other assumptions (asking the "Why").

Conclusion

In some other cases, you might see many participants are constantly failing in completing a task or getting stuck in particular screen. If this happens, I would suggest you take a look on that part, and if there is more than 3 participants who experience this problem, analyse and discuss this issue with your team immediately. Even though the UT session is still running, you and your team may want to get started solving this usability issue in parallel.

Ideally, you would want to always have the UT session before building any product feature. But in reality, that is always a case where you simply don't have the time to go through all the process. In this case, use your gut feeling as PM and the expertise of your team, as well as all the information you have at hands to come up with the best solution worth to implement. Always consider the overall impact versus effort matrix though, and once built don't forget to measure success metrics and make adjustment from there.

PreviousUser Flow: A Process Analysis NextStorytelling with Data (coming soon)

Last updated 4 years ago

Was this helpful?