Final Flashcards
2 different metric types
observable + quantifiable
3 common questions with metrics
- “Which metric shall I (we) use?”
- “How shall I (we) obtain the components needed to calculate it?”
- “Is this metric reliable enough to give a realistic picture of the degree to which my (our) system is usable (or not)?”
3 general metric names
Performance Metrics
Issue-Based Metrics
Self-Reported Metrics
9 performance metric names
- Completion rate (task success, effectiveness)
- Task time
- Errors
- Efficiency (Page views/clicks)
- Lostness
- Conversion rate
- Learnability
- Eye-tracking
- Biometric data
1 combined metric name
Single usability metric
3 self-reported metric names
- Task-level satisfaction (self-report)
- Expectations
- Test-level satisfaction (self-report)
1 issue-based metric name
Usability problems
Performance metrics (there are six of them)
- Task Success (Completion Rates)
- Binary Success
- Levels of Success
- Time on Task
- Errors
- Effectiveness
- Efficiency
- Learnability
Effectiveness formula explanation
- Measures completion rate.
- Completion rate is a fundamental usability metric. It is calculated by assigning a binary value of ‘1’ if the test participant manages to complete a task and ‘0’ if he/she does not.
- The average task completion rate is 78%.
Effectiveness formula
(Number of tasks completed/Total number of tasks) X 100%
Effectiveness example. 5 tasks and a user completes 3 of them.
3/5 X 100% = 60%
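As a rough sketch of the effectiveness formula above, the function below takes a list of binary task outcomes (1 = completed, 0 = not completed) and returns the completion rate as a percentage. The function name `effectiveness` and the list-based input are illustrative assumptions, not part of the cards.

```python
def effectiveness(task_results):
    """Return the completion rate as a percentage.

    task_results: sequence of 1 (task completed) or 0 (task not completed).
    The name and input shape are hypothetical, chosen for illustration.
    """
    if not task_results:
        raise ValueError("at least one task result is required")
    # (Number of tasks completed / Total number of tasks) x 100
    return 100 * sum(task_results) / len(task_results)


# The example from the card: 5 tasks, the user completes 3 of them.
print(effectiveness([1, 1, 1, 0, 0]))  # 60.0
```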
3 Levels of success
Complete Success
Partial Success
Failure
How to resolve a task when a user is not successful
Tell the users at the beginning of the session that they should continue to work on each task until they either complete it or reach the point at which, in the real world, they would give up or seek assistance (from technical support, a colleague, etc.).
Time on task explanation
Time on task (sometimes referred to as task completion time or simply task time).
Caveat: Sometimes, slower is better
Ex: They are truly engaging with the website
4 steps to make efficiency quantifiable
- Identify the action(s) to be measured
- Define the Start and end of an action
- Count the actions
- Actions must be meaningful
Lostness formula
N: # of DIFFERENT webpages visited
S: total # of webpages visited (duplicates included)
R: minimum # of pages required to visit to complete the task
L = sqrt[(N/S - 1)^2 + (R/N - 1)^2]
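A minimal sketch of the lostness calculation, assuming the visit log is available as a list of page identifiers in the order they were visited. The function name `lostness` and the sample page names are hypothetical. N is derived as the number of distinct pages in the log, S as the length of the log, and R is passed in.

```python
import math


def lostness(visited_pages, minimum_pages):
    """Compute L = sqrt((N/S - 1)^2 + (R/N - 1)^2).

    visited_pages: page identifiers in visit order, revisits included.
    minimum_pages: R, the minimum number of pages needed for the task.
    """
    if not visited_pages:
        raise ValueError("the visit log must contain at least one page")
    n = len(set(visited_pages))  # N: different pages visited
    s = len(visited_pages)       # S: total pages visited, duplicates included
    r = minimum_pages            # R: minimum pages required
    return math.sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)


# A user who follows the shortest path exactly is not lost at all.
print(lostness(["home", "products", "checkout"], 3))  # 0.0

# A user who wanders and revisits pages gets a higher score.
wandering = ["home", "about", "home", "products", "help",
             "products", "cart", "checkout"]
print(lostness(wandering, 3))
```

A score of 0 means the participant visited only the required pages, each once. The score rises as revisits and detours increase.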
2 types of rating scales
Likert and semantic differential scales
4 types of self-reported metrics
- Post-task ratings
- Post-session ratings
- Using SUS to compare designs
- Online services
Likert scale definition
A 5-point rating scale with the following options:
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree
Semantic Differential scales definition
Uses a scale anchored by two values, one on each side. The two values are opposites, such as weak and strong, ugly and beautiful, or cool and warm. Example:
weak o o o o o o o strong
SUS meaning and definition
System Usability Scale - an easy 10-item questionnaire with five response options for respondents, from Strongly agree to Strongly disagree
10 questions used in SUS
- I think that I would like to use this system frequently.
- I found the system unnecessarily complex.
- I thought the system was easy to use.
- I think that I would need the support of a technical person to be able to use this system.
- I found the various functions in this system were well integrated.
- I thought there was too much inconsistency in this system.
- I would imagine that most people would learn to use this system very quickly.
- I found the system very cumbersome to use.
- I felt very confident using the system.
- I needed to learn a lot of things before I could get going with this system.
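The cards list the SUS items but not how a score is derived from them. The sketch below follows the standard SUS scoring procedure, which is an addition here rather than something stated in the cards. It assumes responses are coded 1 (Strongly disagree) to 5 (Strongly agree) and given in the order of the ten items above. The function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Return a SUS score between 0 and 100.

    responses: ten integers from 1 (Strongly disagree) to 5 (Strongly agree),
    in questionnaire order. This coding is an assumption of the sketch.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            # Odd items are positively worded: contribution is response - 1.
            total += response - 1
        else:
            # Even items are negatively worded: contribution is 5 - response.
            total += 5 - response
    # The summed contributions (0 to 40) are scaled to 0 to 100.
    return total * 2.5


# A respondent who answers "Neither agree nor disagree" (3) to every item.
print(sus_score([3] * 10))  # 50.0
```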
CSUQ meaning and definition
Computer System Usability Questionnaire - has 19 questions about usability, each rated from strongly disagree to strongly agree
3 severity ratings
low, medium, high
low severity rating
Any issue that annoys or frustrates participants but does not play a role in task failure. These are the types of issues that may lead someone off course, but he or she still recovers and completes the task. This issue may only reduce efficiency and/or satisfaction a small amount, if any.
medium severity rating
Any issue that contributes to significant task difficulty but does not cause task failure. Participants often develop workarounds to get to what they need. These issues have an impact on effectiveness and most likely efficiency and satisfaction.
high severity rating
Any issue that leads directly to task failure. Basically, there is no way to encounter this issue and still complete the task. This type of issue has a significant impact on effectiveness, efficiency, and satisfaction.
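The three severity definitions above amount to a simple decision rule, sketched below. The function name `severity` and the two boolean inputs are illustrative assumptions. In practice the judgment of whether an issue caused failure or significant difficulty is made by the evaluator.

```python
def severity(caused_task_failure, caused_significant_difficulty):
    """Map an observed usability issue to a low/medium/high severity rating.

    Both arguments are booleans supplied by the evaluator (hypothetical inputs).
    """
    if caused_task_failure:
        # The issue leads directly to task failure.
        return "high"
    if caused_significant_difficulty:
        # Significant difficulty, but participants still finish (workarounds).
        return "medium"
    # Annoying or frustrating, but plays no role in task failure.
    return "low"


print(severity(False, True))  # medium
```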
Self-reported metrics definition
Self-reported data give you the most important information about users’ perception of the system and their interaction with it. At an emotional level, the data may tell you something about how the users feel about the system.
tasks that can be measured
Time on task, errors, efficiency, lostness, conversion rate, learnability, eye tracking, emotion, stress, psychological measures
3 usability statistics
- effectiveness (can users successfully achieve their objectives)
- efficiency (how much effort and resource is expended in achieving those objectives)
- satisfaction (was the experience satisfactory)
what do sensor data streams test on?
a person, or an environment such as an office or the home
questions in sensor data streaming
Where do people travel over the course of a day?
With whom do they normally communicate or collaborate?
What tools or information resources do they use at various points during the day? When, where, and with whom?
What routines help to define a “typical” or “atypical” day?
How healthy are a person’s daily behaviors? Is he or she making good health choices?
One advantage of using streams of data as a means of understanding people’s activities and behaviors is
that the technique can be used to answer research questions across a range of units of analysis
The 3 data streams are
egocentric, group-centric, and space-centric
Egocentric Sensor Data Streams
Sensors focused on monitoring the movements, activities, and interactions of a single individual can answer questions at an egocentric unit of analysis.
Egocentric data stream questions
How do people allocate their time or attention?
How is a person’s mental state or mood affected by real-world stimuli?
How do electronic communications or mobile computing interactions affect daily routines?
How do people’s own understanding or interpretations of their activities, colleagues, or environment differ from what a ubiquitous computing application or tool is able to automatically sense?
Group-Centric Sensor Data Streams
This group-centric approach can involve simply capturing the same signals as for a single person, but across a group over the same window of time, or it might involve deploying a broader set of environmental or infrastructural sensors in a shared/community space or collecting data about more interpersonal types of interactions.
Group-centric data stream questions
How often do members of this group interact with one another?
What do these interactions entail?
How do power relations manifest in different kinds of work environments or work teams?
Space-Centric Sensor Data Streams
Answer questions about how spaces are used, irrespective of their particular inhabitants, given appropriate instrumentation of a space.
High information density sensors
Low density sensors
Space-centric data stream questions
How are the occupants of a home spending their time throughout the day and night?
Is a senior adult living by herself continuing to maintain healthy levels of physical activity?
What is the impact of ambient feedback promoting environmental awareness on cooking, cleaning, and hygiene activities within different types of families?
Sensor Data Streams and Context-Aware Computing
Context-aware computing is a form of interactive computing in which a user’s implicit behavior—that is, their location, their physical activity, or their interactions with other people—or the environment in which a system is being used can both serve as alternative or auxiliary inputs to the system.
limitations of sensor data streaming
- Does a poor job of explaining why things have happened in the real world
- The phenomena must be well understood
- Quality of data is limited by the sensors’ capabilities
- Sensors must be selected to capture quality data effectively while minimizing discomfort
- Large streams of data accumulate over even a moderate-length deployment
- Sensors are technologically complex