Evaluation Flashcards
What is the high-level overview of evaluation?
A detailed discussion of specific topics:
- analytical evaluation
- usability evaluation
- experiments
Why evaluate?

What are the evaluation considerations?
There are a range of evaluation methods and settings
Your choice of method should be informed by the following:
- Why
- What
- Where
- When
Evaluation considerations: Why? Why evaluate?

Evaluation Considerations: What

Evaluation Considerations: Where

In general, when should evaluation occur?
Throughout the design process, from the first descriptions and sketches through to the final product.
What are the other evaluation considerations?
Stakeholders:
- Management
- Actual Users
- Maintainers
- Others impacted

What are the general evaluation types?

How do evaluation types complement one another?

What are the participants' rights?

Once we’ve completed the research, what do we do with it?

Why do data analysis and interpretation get tricky?
If you don’t plan appropriately, your data can be meaningless, or worse, you may interpret it incorrectly, e.g., that your interface works when it doesn’t

What are the Evaluation overview key points?

In general, what are Analytical Evaluations?
Evaluations without involving users
Experts assess the system using structured techniques known to be effective at uncovering usability flaws
theoretical models of human performance can predict actual use
What are the Analytical Evaluation Types?

What are the Analytical Evaluation Inspections?

What are Cognitive Walkthroughs?
“Cognitive walkthroughs simulate a user’s problem-solving process at each step in the human-computer dialog, checking to see if the user’s goals and memory for actions can be assumed to lead to the next correct action”
What is the process for Cognitive walkthroughs?

Why is the process of Cognitive Walkthroughs tedious?
force you to consider every component, every step, from the user’s perspective
avoid “glossing over” the details just because you “get it”
tried-and-true process for overcoming “designer blindness” – you can’t see the problems with things you’ve made
How do you get results from cognitive walkthroughs?
Through the process, information is recorded on:
Assumptions about what would cause problems and why (involves explaining why users would face difficulties)
Notes about side issues and design changes
Results are summarized
design is revised to fix problems identified

What is a Heuristic Evaluation?
define: heuristic
a rule of thumb—a principle that is a shortcut for solving a problem or making decisions
not always right/true, but cognitive shortcuts
design heuristics
broad usability guidelines that can guide a developer’s design efforts
derived from common design problems across many systems
several researchers and practitioners have developed different sets of heuristics (e.g. domain specific)
What is the general process for Heuristic Evaluation?
Systematic inspection of an interface design to see if an interface complies with a set of usability heuristics, or usability guidelines.
General process:
» 3-5 inspectors (usability engineers, experts…)
» inspect interface in isolation (~1-2 hr for simple interfaces)
» results are aggregated afterwards
A single evaluator catches ~35% of usability problems; 5 evaluators catch ~75%
How does the proportion of usability problems found work?
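One common way to model this diminishing-returns curve, sketched in Python. The single-evaluator discovery rate of 0.35 is an assumption taken from the "~35% per evaluator" figure; empirical rates vary by study and interface:

```python
# Problem-discovery model: with n independent evaluators, each finding
# a fraction `found_rate` of the usability problems, the expected
# proportion of problems found at least once is 1 - (1 - found_rate)^n.
# found_rate=0.35 is an illustrative assumption, not a universal constant.
def proportion_found(n, found_rate=0.35):
    return 1 - (1 - found_rate) ** n

# Diminishing returns: each added evaluator contributes less than the last.
one = proportion_found(1)   # 0.35
five = proportion_found(5)  # close to 0.9 with this rate
```

This is why 3-5 inspectors is the usual recommendation: beyond that, each extra evaluator mostly rediscovers known problems.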

In general what is Nielsen’s Heuristics?
Nielsen’s Heuristics: Background
Developed by Jakob Nielsen in the early 1990s
Based on heuristics distilled from an empirical analysis of 249 usability problems
These heuristics have been revised for current technology
Heuristics are still needed for some emerging technologies (e.g., mobile devices, AR, etc.)
Design guidelines form a basis for developing heuristics
What are the actual Nielsen’s heuristics?

What are the heuristics recommended for corporate web site evaluation?

What are the Heuristic Evaluation Advantages?
A few guidelines identify many common usability problems
Fewer practical and ethical issues to deal with – no participants
Cheap and fast
Provides common evaluation template (to compare approaches, systems)
What are the Heuristic Evaluation Problems?
- principles may be too general; subtleties are involved in actual use
- designers may be defensive; experts may disagree
- you may actually have the wrong design altogether
- it can be hard to find experts
- false positives: does the rule always apply?
- not complete: will miss problems
- not a replacement for user testing
In general, What is performance modelling?
Using models of human behavior to generate quantitative predictions of certain interface actions or sequences of actions
What is the formula for Fitts' Law, and what does it describe?

What does Fitts' Law tell us?
Fitts’ Law predicts how long it will take users to acquire targets once they know which target to select
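A minimal sketch of the Shannon formulation of Fitts' Law in Python. The coefficients a and b are illustrative placeholders; in practice they are fit empirically for a given device and user population:

```python
import math

# Fitts' Law (Shannon formulation): time to acquire a target of width W
# at distance D is MT = a + b * log2(D / W + 1).
# a and b below are illustrative assumptions, not measured coefficients.
def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds."""
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Nearer and larger targets are predicted to be faster to acquire:
near_big = fitts_movement_time(distance=100, width=50)
far_small = fitts_movement_time(distance=800, width=10)
```

This is why edge and corner targets (effectively very large W) are fast to hit, and why tiny, distant controls are slow.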
How do we measure decision time?
Hick-Hyman Law models the time it takes users to decide between n familiar alternatives
What is the Hick-Hyman Law when items are equi-probable?

What is Hick-Hyman Law when certain items are more likely to be chosen than others?

What is the application for the Hick-Hyman Law?
Deciding on menu depth vs. breadth
Does the Hick-Hyman Law model searching time as well as decision time?
Models decision time, not searching time
If the user is not familiar with the interface elements, they need to investigate each one. It’s not a pure search task:
- Time to search through n items is linear, not logarithmic
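The decision-vs-search distinction above can be sketched in Python. The coefficients and the per-item search time are illustrative assumptions:

```python
import math

# Hick-Hyman Law: decision time among n equally likely, familiar
# alternatives grows logarithmically: T = a + b * log2(n + 1).
# a, b, and per_item below are illustrative placeholders.
def decision_time(n, a=0.2, b=0.15):
    return a + b * math.log2(n + 1)

# Unfamiliar items must be inspected one by one: linear, not logarithmic.
def search_time(n, per_item=0.25):
    return n * per_item

# With familiar items, one broad menu of 64 choices beats traversing
# six nested 2-item menus, because log2 grows slowly:
one_broad_menu = decision_time(64)
six_binary_menus = 6 * decision_time(2)
```

Hence the breadth-over-depth advice for expert users, and why the advantage disappears when users must search unfamiliar labels instead of deciding among known ones.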
What is the Keystroke Level Model?
Given a task consisting of a sequence of steps
- How long will it take the user to perform those steps given a specific interface?
Keystroke Level Model (KLM)
- Models performance given a sequence of steps for an expert user
In KLM how is performance calculated?
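A minimal KLM sketch in Python. The operator times are the commonly cited approximate values, and the example operator sequence is an assumption for illustration:

```python
# Keystroke-Level Model: expert task time is the sum of primitive
# operator times. Approximate, commonly cited values (seconds):
#   K = keystroke, P = point with a mouse, H = home hands between
#   keyboard and mouse, M = mental preparation.
OPERATOR_TIME = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_time(operators):
    """Total predicted time for a sequence of KLM operators."""
    return sum(OPERATOR_TIME[op] for op in operators)

# Example: mentally prepare, point at a text field, home hands to the
# keyboard, then type two characters:
total = klm_time(["M", "P", "H", "K", "K"])  # 1.35 + 1.1 + 0.4 + 0.2 + 0.2
```

Summing operator times like this lets you compare two candidate interfaces for the same task before building either one.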



What are performance modelling advantages?
Can evaluate components of interface prior to building it
Good for comparing different interface possibilities
Can get the kinks out of interface prior to full user testing/experimentation
What are the performance modelling problems?
Difficult to model complex tasks
- E.g., consider designing a KLM for your complete project
Most models consider only expert behaviour
For really accurate predictions, the coefficients (those a's and b's) need to be determined empirically
What is the analytical evaluation summary?
Analytical evaluations are a set of structured techniques for evaluating interfaces without (necessarily) involving end users
Often used to address major usability issues and inefficiencies prior to, and in parallel to involving end users
Should always complement rather than replace evaluations with end users
In general, What is a Usability Test?
A usability test is a “formal” method for evaluating whether a design is learnable, efficient, memorable, can reduce errors, meets users’ expectations, etc.
- users are not being evaluated
- the design is being evaluated
How do you get users for a usability test?
Bring in real users
Have them complete tasks with your design, while you watch (ideally with your entire team)
Measure and record things
- task completion, task time, error rates
- satisfaction, problem points, etc.
- use a think-aloud protocol, so you can “hear what they are thinking”
How is the data used from a usability test?
Use the data to
- Identify problems (major ones | minor ones)
- Provide design suggestions to design/engineering team
- Iterate on the design, repeat
What are the important considerations for a usability test?
Usually takes place in a usability lab or other controlled space
Major emphasis is on
- selecting representative users
- developing representative tasks
5-10 users typically selected
Tasks usually last no more than 30 minutes
The test conditions should be the same for every participant
An informed consent form explains ethical issues
What are user testing environments?
Best environment depends on pragmatic considerations, as well as what you’re looking for
- Do you want your whole team to be able to view? Do you want to be able to review a test?
- How important are interruptions?
- What are your resources?
What are pilot studies and why are they challenging?
Especially important for usability testing
Make sure your plan is viable
All the corners are checked (your script, questionnaires, tasks, etc., all work)
It is worth doing several to iron out problems before doing the main study
Ask colleagues if you can’t spare real users
Challenging
EVERYONE thinks they don’t need to do them. It all fits in your head, what can go wrong?
From nearly 15 years' experience of running studies, my advice is:
- just pilot. OK? You need to. Something will go wrong
How should you use tasks with a usability test?
A task is designed to probe a problem
Tasks should be straightforward and require the user to find certain items, or do certain operations
They can be more complex such as solving particular problems
Sample tasks for a weather network web site:
- What is the forecasted weather for Winnipeg?
- What is air quality in Los Angeles today?
- What is the level of humidity in Winnipeg?
- What is the forecast for Ottawa for the upcoming weekend?
You are developing a user test for a new CS web page. Identify 6 tasks for the test:
Task 1: Identify the instructor for Comp 3020
Task 2: Find the e-mail address of the Comp 3020 prof
Task 3: Find the admission requirements for the M.Sc. Program
Task 4: Find out the first day of classes next term
Task 5: Locate the requirements for being a Co-op student
Task 6: Identify whether the graduate Graphics course is a “fundamentals” course
How many participants is enough for usability testing?

How can questionnaires be used for usability testing?

How can one use Likert scales for usability testing?

How do you observe people in usability testing?

How do you indirectly track activities?

What are unobtrusive vs. obtrusive observations?

ASK vs LOOK for being obtrusive?

What is the observation recap?
main point: minimize and think about the impact of your observations and any interventions!
unobtrusive – minimize interactions
obtrusive – try to maximize learning (at what cost)
What is qualitative vs. quantitative data?

What should you analyze and report?
Report on times to complete task, number of errors
Provide simple statistical measures: mean, median, std dev.
Describe interaction patterns
e.g., four ways that people may use the interface
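The simple statistical measures mentioned above can be computed with Python's standard library; the task times here are made-up sample data, not results from any study:

```python
import statistics

# Made-up task-completion times (seconds) for five participants.
task_times = [42.0, 55.5, 38.2, 61.0, 47.3]

mean_time = statistics.mean(task_times)      # central tendency
median_time = statistics.median(task_times)  # robust to outliers
stdev_time = statistics.stdev(task_times)    # sample standard deviation
```

Reporting the median alongside the mean helps when one or two participants take unusually long and would otherwise skew the summary.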
How do you present results?

Give the usability testing summary
Usability testing is a form of applied experimentation
Users are brought into controlled environments to complete focused tasks
Focus of the testing is on having representative users and tasks
Typically both objective (e.g., task completion times, errors) and subjective (e.g., questionnaire) data are collected
Important end goal: draw meaningful conclusions about your system’s current strengths and limitations
What are the two main types of evaluation and how do they differ?
Formative Evaluation: done at different stages of development to check that the product meets users' needs
Summative Evaluation: assesses the quality of a finished product