User Studies and UX Metrics Flashcards
Evaluation
- Evaluation is a process of systematically assessing a system, guided by questions about the system and evaluation criteria
- Does the system work as it is supposed to work?
- Does it comply with relevant standards?
- Is the performance improved over a previous version?
- You need to have clear objectives in order to plan your evaluation
- What questions should the evaluation answer?
- If you have developed a system, what claims would you want to be able to make about it?
Evaluation of Usability and UX
- For systems that are used interactively, a main concern is their usability and user experience
- How well can users learn and use a system to achieve their goals
- How satisfying is the use of the system
- Any evaluation should be guided by clear objectives and questions
- Can users complete the tasks the system is meant to support?
- Would a first-time user be able to figure out how to use the system?
- Has a change made in the interface had the desired impact? …
- Can users achieve their goals faster or with less effort, compared to earlier versions or competing products?
Forms of usability evaluation
- Analytical evaluation
- Informal or formal review, for example using scenarios, guidelines, checklists or models
- By the design team and/or external reviewers (usability experts)
- Empirical evaluation
- Evaluation with users (“User studies”)
- Assessment based on observation
User Studies
- Why evaluate with users?
- Designers are experts in using their own systems
- That does not make them experts in usability and UX
- Analytical evaluation is limited by the ability of the reviewer to test a system from a user perspective
- Analytical evaluation can answer some questions but not others
- Why evaluate without users, first?
- Many problems can be found analytically
- Rigorous testing of interactive workflows by the design team
- Respect users and their time
Types of User Studies
- Usability tests
- Focus on identifying usability issues
- Problems that users encounter when they use a system
- Lab Studies
- Focus on user performance and user experience
- Controlled experiments, often to compare interfaces or systems
- Field studies
- Focus on use in the real world
- Little or no control over the interaction, but observing use in context
Usability Tests
- Users are given typical tasks to perform with a prototype, part of a system, or finished product
- Identifying usability issues for improvement (formative evaluation)
- Validating the design against project goals (summative evaluation)
- Qualitative focus on issues, i.e. problems users encounter when they try to complete tasks
- But usability issues can also be quantified by measuring frequency of issues
Usability issues
- Usability issues are problems users encounter toward achieving their goals
- Something they are not able to do, or find difficult to do
- Something they do that leads to problems
- Examples
- User actions that prevent task completion
- Performing an action that leads away from task success
- Not seeing something that should be noticed
- Participant says the task is completed when it isn't
- User misinterprets some information presented to them
Issue-based Metrics
- Using metrics to prioritise improvements
- Pareto Principle (80/20 rule): 20% of the effort will generate 80% of the results
- Example: problem frequency in a usability study
“What one thing would you improve?”
- Asking users at the end of the usability test, what one problem to fix
- Coding responses to identify categories
- Example: top five cover 75% of suggested improvements
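This kind of Pareto analysis is a simple frequency count. A minimal sketch, assuming a hypothetical issue log (the issue labels and counts are invented):

```python
from collections import Counter

# Hypothetical issue log from a usability test: one label per observed problem
observed_issues = [
    "search not found", "label unclear", "search not found", "no feedback",
    "label unclear", "search not found", "date format", "label unclear",
]

counts = Counter(observed_issues)
total = sum(counts.values())

# Rank issues by frequency and report cumulative coverage:
# the top few issues typically account for most observed problems (80/20 rule)
cumulative = 0
for issue, n in counts.most_common():
    cumulative += n
    print(f"{issue:<16} {n} occurrences, cumulative {cumulative / total:.0%}")
```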
How many users for a test?
“Five Participants is Enough”
- It is widely believed that >75% of issues are found by the first five users (Nielsen’s model)
- “Testing one user is 100 percent better than testing none”
- “You can find more problems in half a day than you can fix in a month” (Steve Krug)
- Do not expect to find and fix all issues
- Some issues can only be discovered after other issues have been fixed
- What works for most people might remain an issue for some people
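The “>75% with five users” figure comes from Nielsen and Landauer’s model, which estimates the proportion of problems found with n users as 1 - (1 - p)^n, where p is the probability that a single user uncovers a given problem (Nielsen’s often-cited average is p ≈ 0.31; your own study may differ). A small sketch of the calculation:

```python
# Nielsen & Landauer's model of problem discovery in usability testing.
# p is the probability that one user reveals a given problem; 0.31 is the
# commonly cited average, not a property of any particular product.
def proportion_found(n: int, p: float = 0.31) -> float:
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users -> {proportion_found(n):.0%} of problems found")
# With p = 0.31, five users find roughly 84% of the problems,
# which is where the "five participants is enough" heuristic comes from.
```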
Lab Studies
- Lab studies focus on performance and user experience
- The purpose of design is to achieve an improvement of some kind
- Develop a prototype, system, app or product that in some respect is better than what we had before
- e.g., more efficient, easier to use, faster to learn, less error-prone, …
- Users are given tasks to perform under controlled conditions
- Observing the effect of specific designs on performance (e.g., completion time, error rate) and/or user experience (user-reported ratings)
Comparative Evaluation
- Lab studies are usually comparative
- Comparing a new user interface with a previous version
- Is there an improvement?
- Benchmarking of a new interactive system against the best existing solution (comparison against a “baseline”)
- Important in research and innovation
- Comparing alternative designs to see which one works best
- Formative studies
Controlled Experiments
- Lab studies are conducted as controlled experiments
- Experiments are an empirical research method for answering questions of comparison and causality
- “Does the new feature added to the UI cause a lower error rate?”
- “Is search engine A more effective in finding what users are looking for than search engine B?”
- The aim of an experiment is to determine cause-effect relationship between variables
Principles of Experiment Design
- Reduction to observation of specific variables
- Reducing a question about cause and effect to specific variables that can be manipulated and specific variables that can be observed
- Repetition: repeated runs/trials to gain sufficient evidence
- Experiments study a relationship between variables; Repetition is necessary to build up evidence of the relationship
- Control to limit confounding factors
- Experiments are controlled to minimize the influence of other variables on the observed effect
Variables in Experiments
- Independent variables
- Something that is manipulated or systematically controlled
- In HCI experiments, we call an independent variable a factor
- Factors are manipulated across multiple levels (at least two)
- Each combination of factor and level defines a test condition
- e.g. factor Search Engine with levels [Google, Bing]
- Dependent variables
- Something we measure in the experiment, as an effect
- In HCI lab studies: a human behaviour or response
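As a small illustration (names and units are made up), the Search Engine example could be written out as one factor, its levels/conditions, and the dependent variables to be measured:

```python
# One factor ("Search Engine") with two levels; with a single factor,
# each level is also a test condition.
factor_name = "Search Engine"
levels = ["Google", "Bing"]
conditions = [{factor_name: level} for level in levels]

# Dependent variables: the behaviours/responses we measure, with units stated
dependent_variables = {"task_success": "completed (yes/no)", "time_on_task": "seconds"}

print(conditions)          # [{'Search Engine': 'Google'}, {'Search Engine': 'Bing'}]
print(dependent_variables)
```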
Example
- Webcomic xkcd ran a study to see what men and women call different colours
- Factors:
- Gender
- Colour they were shown (RGB)
- Gender is controlled
- Colour is manipulated
- Dependent variable
- The colour name they typed in
Planning User Studies
- Five practical steps to follow for a lab study / experiment
- Step 1: Define your study objectives
- Step 2: Identify your variables
- Step 3: Design the experiment: tasks, procedure, setup
- Step 4: Recruit participants and run the study
- Step 5: Evaluate and report the outcome
Define your study objective
- A clear objective is essential for deciding on your study approach
- Is the study formative or summative?
- What question(s) should the study answer?
- What will the results be used for after the study?
- If you conduct an evaluation of something you designed …
- What do you want to be able to say/claim about your design?
- What defines “better performance” or “better user experience” for your design?
- What should it be compared against?
Reflect user goals in your study objectives
- What are the assumptions about the users’ goals?
- Are the users required to use the system regularly? Or will they only use it occasionally?
- What alternatives do they have to using the system?
- In what kind of situations will they use the system?
- When they are busy? When they are bored? When they are under extreme stress?
- What matters most to the user?
- Complete tasks as quickly as possible? Feel in control? Not making mistakes? Have fun interacting? Feeling immersed?
Identify your variables - Factors
- What are the factors and conditions that you want to study and compare?
- Examples:
- Comparing three products – one factor with three levels (1x3)
- Interface with new feature v. prior version – one factor, two levels (1x2)
- Two calendar apps, on small v. large screen – four conditions (2x2)
- Two input devices, left- v. right-handed people – four conditions (2x2)
- Focus on one factor if possible (keep it simple)
- More factors make it harder to determine cause-effect relationships
- Aim for small number of conditions, large number of repetitions
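To see how conditions follow from factors and levels, here is a sketch (factor and level names are illustrative) that enumerates the four conditions of the 2x2 calendar-app example above:

```python
from itertools import product

# Two factors, each with two levels
factors = {
    "App": ["Calendar A", "Calendar B"],
    "Screen size": ["small", "large"],
}

# Crossing the levels of all factors yields the 2 x 2 = 4 test conditions
for combination in product(*factors.values()):
    print(dict(zip(factors.keys(), combination)))
```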
Identify your variables - Data collection
- What measurements do you take? What data do you collect?
- What aspect of usability or user experience do you want to evaluate?
- Effectiveness: ability to complete a task accurately
- Efficiency: the amount of effort required to complete a task successfully
- Satisfaction and other aspects of user experience
- What type of measurement? What metrics?
- Performance measurement: task success, time, error rate, …
- Self-reported metrics: user ratings / questionnaire scores
Types of measurement in user studies
- Performance measures
- Measuring user performance on tasks they are given
- Observation or automated logging of performance data
- Self-reported metrics
- Measuring user experience and their perception of the interaction
- Using rating scales and questionnaires as instrument
- Behavioural and Physiological metrics
- Measuring the response of the body during interaction with a system
- e.g. eye-tracking to measure what users look at
Example: Usability metrics in ISO 9241-11:1998
Performance Measures
- Performance measures assess
- Effectiveness: ability to complete a task accurately
- Efficiency: the amount of effort required to complete a task successfully
- Measuring task success, time, errors
- Performance evaluation relies on clearly defined tasks and goals
- Users are given tasks to accomplish
- Task success has to be clearly defined
- Performance evaluation can focus on different usability aspects
- e.g. learnability: how long it takes to reach proficiency
Task Success
- Task success is a fundamental measure of effectiveness
- Task success rate: percentage of users who succeed on a task
- Requires clear definition of a task and of an end state to reach
- Requires clear criteria for pass/fail
- Giving up – users indicate they would give up if they were doing this for real
- Moderator calls it – when the user makes no progress, or becomes too frustrated
- Too long – certain tasks are only considered successful if done within a time limit
- Wrong – user thinks they completed successfully but they did not
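A minimal sketch of computing a task success rate from pass/fail outcomes (the outcomes are invented; “Wrong” outcomes are coded as failures):

```python
# Binary outcomes for one task across participants (True = success)
outcomes = [True, True, False, True, True, False, True, True]

success_rate = sum(outcomes) / len(outcomes)
print(f"Task success rate: {success_rate:.0%}")  # 75% in this made-up example
```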
Example: AED
- Usability evaluation of Automated External Defibrillators (AED)
- Are lay people able to use defibrillators successfully?
- Comparison of 4 Devices
- 64 participants, 35-55 years, none from a medical background
- Each device tested by a subgroup of 16 (“between-subject”)
- Task: rush into a room where they find a fully-dressed manikin and an AED nearby
- Task success: successfully deliver a shock
Example: AED #2
Time on Task (Task completion time)
- Time on task is a basic measure for efficiency
- Requires that there is clearly defined start and end of a task, for starting and stopping the clock
- Great for comparative evaluation
- The more often the same task is performed by the same user, the more important efficiency becomes
- e.g. frequent data-entry, or information look-up
- reduced time on task saves costs
- Faster is not always most important for user experience
Time on Task (Task completion time) #2
- Time on task can vary, and improves with repetition
- Repeated measures
- Variance in performance has a larger effect with shorter tasks
- Using multiple trials of the same (type of) task to determine mean performance
- Training effects
- Is the goal to determine time of a first-time user or trained user?
- How much training are users given before the evaluation starts?
- Using blocks of trials to measure learning effects
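A sketch, with invented timings, of how mean time on task per block of trials can show a training effect:

```python
from statistics import mean

# Completion times (seconds) for one participant, grouped into blocks of trials
blocks = {
    "block 1": [48.2, 41.5, 39.9, 37.0],
    "block 2": [33.4, 31.8, 30.1, 29.6],
    "block 3": [28.9, 28.4, 28.7, 28.2],
}

# Decreasing block means suggest learning; a plateau suggests the participant
# has reached stable (trained) performance
for name, times in blocks.items():
    print(f"{name}: mean time on task = {mean(times):.1f} s")
```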
Error rate
- The rate at which errors occur affects both effectiveness and efficiency
- Speed-accuracy trade-off
- Errors also affect user experience / satisfaction
- Issues are the cause of a problem, errors are the outcome
- Error rate: average number of errors per task
- Requires clear definition of what counts as an error
- Based on what users do (actions) or fail to do
- e.g. data-entry errors; wrong choices; key actions not taken
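A minimal sketch of an error-rate calculation (the counts are hypothetical; what counts as an error must be defined for the task beforehand):

```python
# Number of errors each participant made on the same task
errors_per_participant = [0, 2, 1, 0, 3, 1, 0, 1]

error_rate = sum(errors_per_participant) / len(errors_per_participant)
print(f"Error rate: {error_rate:.2f} errors per task attempt")  # 1.00 here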
Efficiency
- Efficiency is the amount of effort required to complete a task successfully
- Time is a good indicator but does not show whether a task was completed with the least effort required (Users don’t always take the shortest path)
- Can also measure number of actions the user performs to complete a task, relative to optimum number of actions
- e.g. number of clicks, menus opened, pages visited
- Relevant for assessing the usability of transactions, navigation and information architecture
Example: Lostness
- How lost do users get on web sites?
- N: Number of different web pages visited while performing a task
- S: Total number of pages visited, counting revisits to same page
- R: Minimum (optimum) number of pages that must be visited to accomplish the task
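These three quantities are usually combined into the lostness measure L = sqrt((N/S - 1)^2 + (R/N - 1)^2), where 0 means a perfectly efficient path and values above roughly 0.5 indicate users who appear clearly lost. A sketch with made-up page counts:

```python
from math import sqrt

def lostness(N: int, S: int, R: int) -> float:
    """Lostness: 0 = optimal navigation; higher values = more lost."""
    return sqrt((N / S - 1) ** 2 + (R / N - 1) ** 2)

# Hypothetical navigation log: 12 distinct pages visited, 18 page visits in total,
# while the optimal path for the task needs only 5 pages
print(f"Lostness = {lostness(N=12, S=18, R=5):.2f}")  # ~0.67, user appears lost
```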
Choice of metrics
- Choose metrics and collect data that reflect the study objectives
- For example
- Study completion of transactions (bookings etc) – task success, user satisfaction (perceived usability)
- Study frequent use of the same product – ease of use, efficiency
- Usability for a critical product – fast learnability, no errors
- Comparing products that offer the same service – can be different criteria but satisfaction is important
User Studies and Metrics - Key Points
- Choose appropriate metrics based on your study goals
- Clearly identify your variables
- Assign a name to the factors you study and to the levels or test conditions and use these consistently
- Assign a name to the dependent variables, and report the units in which they are measured
- Pilot test how measurements are taken to ensure that data is recorded consistently and correctly (also when the data collection is manual or by questionnaire)