User Feedback Flashcards
Qualitative vs Quantitative
Qualitative:
- Qualities
- Studying the dynamic and negotiable
- Open-ended
- Nuances
- Analysis: Thematic patterns -> “Findings”
Quantitative:
- Numerical quantities
- Studying the fixed and measurable
- Categorical
- Analysis: Statistics -> “Results”
Qualitative != Assessing quality
Qualitative Methods
- Are helpful for understanding the user’s perspective
- Can be used to evaluate why parts of a design (do not) work
- But are often used for understanding the use domain before creating a system
- Domain = The users’ area(s) of expertise (e.g. firefighting, teaching etc.)
What can be evaluated qualitatively?
- Experience
- Usability issues
- Contextual fit
Evaluating Experience
- Hedonic experience
- How do users experience the system?
  Pleasant/stressful/helpful/chaotic/…
- Do people want to use the system (for the things it was designed for)?
Evaluating Usability
- Does the system help people or do they experience a need to do workarounds?
- Are the included features appropriate and sufficient?
- Does the structure of information make sense to users?
- What problems/issues do users encounter when using the system?
- What about the system works well for users?
Evaluating Contextual Fit
- How well does the system work in the intended use situations?
- How well does the system fit into the domain practice?
- Does the system disrupt tasks or routines?
- How does the system indirectly impact people that users work with, or who are otherwise impacted by users’ work?
Qualitative Evaluation with Users
Qualitative methods can center around…
- Users reporting on their experience
- You observing the user
- … or a combination
We will get into three examples of methods:
1. Interviews
2. The Think Aloud Protocol
3. Diary studies
User Feedback
- Introduction to Qualitative Methods
- Types of qualitative methods
- Interviews
- Focus Groups
- Think Aloud
- Focus Shift Analysis
- Diary Studies
- Metrics & Measures
- Self/User Reported
- Questionnaires
- Rating Scales
Interviews #1
- A purposefully one-sided conversation
- An interviewer has an agenda and directs the conversation
- Three types:
- Unstructured: Open-ended (“Tell me about…”), good for exploring topics and reactions to new design ideas
- Structured: Similar to questionnaires (“Which of the following…”), good for getting feedback about a particular aspect of a design
- Semi-structured: A mix of open-ended and closed, good for in-depth coverage of the same topics with each participant
Interviews #2
Focus Groups #1
- Interviews are often one-on-one
- Focus group: A group of users are interviewed together
- Each group usually consists of similar kinds of users
- E.g. for BB: students in one group, administration in another
- Good for collecting multiple viewpoints
- Common form of interview in product design
Focus Groups #2
Interviews - Data
- Created during the interview:
- Audio/video recording
- Notes
- After the interview:
- Reflection notes
- Transcripts of the audio
Interviews - Analysis
Coding: Grouping parts of the data into themes
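A minimal sketch of what coding produces: excerpts from the transcript are tagged with codes, and the codes are tallied into themes. The excerpts and code names below are invented examples, not from a real study.

```python
# Coding interview data: each transcript excerpt is tagged with one or
# more codes; tallying the codes reveals recurring themes.
from collections import Counter

coded_excerpts = [
    ("I never found the export button", ["navigation"]),
    ("The colours made it hard to read", ["visual-design"]),
    ("I wasn't sure where to click next", ["navigation", "feedback"]),
]

# Count how often each code was applied across all excerpts
theme_counts = Counter(code for _, codes in coded_excerpts for code in codes)
print(theme_counts.most_common())  # "navigation" is the most frequent theme
```

In practice this tagging is done iteratively (often in dedicated qualitative-analysis software), but the end product is the same: themes grounded in the data, with evidence for how often they occur.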
Think Aloud #1
The participant carries out a pre-defined task using the system
- During the task, the participant explains what they are thinking and doing, e.g.
- “I’m pressing the search field and typing in…”
- “I can’t seem to find the menu. Maybe up here…” [moves cursor]
Think Aloud #2
Think Aloud - Data
Created during the study:
- Video recording
- …and/or combined audio and screen recording
- Notes
After the study:
- Transcripts of audio, aligned with video/screen recording
- Potentially, only critical incidents are transcribed
Think Aloud - Analysis
- Coding (e.g. clustering errors into types)
- Looking for critical incidents
- E.g., points of error/confusion/silence
- In some cases it may be particularly interesting if a user makes an error without noticing
- Codes or critical incidents may be distilled into a list of issues.
- Results are often similar to what can be discovered with an expert walkthrough…
  …but can include surprising insights related to the user’s domain knowledge
Think Aloud - Focus Shift Analysis
- Analyzing what the user is focusing on can help you identify breakdowns
- Breakdown: When the user’s actions are directed at the system rather than at the task
- Example: Needing to figure out which paper tray to tell the printer to use, rather than specifying that you want the print to be A3
- In a focus shift analysis, the transcript is mapped to the objects the participant’s focus was directed at
- Objects = files, windows, UI elements, hardware, …
- Which objects are relevant depends on what you are evaluating
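One simple way to operationalise this: map each transcript segment to the object the participant's focus was directed at, then count how often the focus object changes. The segments and object names below are illustrative assumptions, not a prescribed format.

```python
# Focus shift analysis sketch: (transcript segment, focus object) pairs.
segments = [
    ("I'm typing my search term", "search field"),
    ("Where did the results go?", "results list"),
    ("Maybe this menu...", "menu bar"),
    ("Ah, there it is", "results list"),
]

# A shift occurs whenever the focus object differs between two
# consecutive segments.
shifts = sum(
    1 for (_, prev), (_, cur) in zip(segments, segments[1:]) if prev != cur
)
print(f"{shifts} focus shifts across {len(segments)} segments")
```

Frequent shifts toward system objects (menus, dialogs) rather than task objects (the document, the search results) are a hint that breakdowns are occurring.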
Diary Studies #1
- Participants fill out information during their day-to-day activities over a period of time.
- Feedback activities: Filling in a form
- Elicitation activities: Capturing media, such as photos
- Participants can be reminded to participate with automated e-mail or text messages
- The diaries can be used as a starting point for interviews
Diary Studies #2
Diary Studies - Data
Data from diary studies varies a lot:
- Text
- Photos
- Audio
- Annotations on images, maps, etc.
- …
Diary Studies - Analysis
- Open/closed coding (like interviews)
- Identifying elements of particular interest to follow up on
- …
Pitfalls #1
Conversations are social situations -> People will behave like in any other social situation
Participants in a user study
- … want to be polite and may feel like you know more about the system than they do
Ways to mitigate:
- Explain the participant’s role of domain expert (you want to learn from them, not the other way around)
- Let someone other than the person who made the system conduct the study (and let the participant know this)
- Use mock-ups, make it clear that system is not finished
Pitfalls #2
Self-Reported Metrics
- Data reported by the users themselves is important as it provides information on their satisfaction with a system, and perception of the interaction with it
- Self-reported data can be
- Qualitative: for example by asking users open-ended questions about their experience
- Quantitative: By using questionnaires as instrument for collecting quantitative data from users
- The general format for self-reported metrics is to give users a question (or statement) and ask them to select an answer on a scale
- Self-reported metrics are rarely useful just by themselves
- Combined with performance data, such as task success and times
- Combined with qualitative feedback, which can provide explanations for ratings
User-reported data
- Data reported by the users themselves is important as it provides information on their satisfaction with a system, and perception of the interaction with it
- Subjective feedback by users
- Self-reported data collected with questionnaires
- Asking users a pre-defined set of questions
- Similar to structured interview, but on paper or on a computer
- Use of rating scales for quantitative analysis
- Also qualitative data that can be coded for quantitative analysis
Questionnaires #1
- Questionnaires are a method for collecting data from study participants
- In general, the term refers to collection of data by giving users a set of pre-defined questions similar to a structured interview but on paper or on a computer
- Useful for gathering data from a larger number of people
- Can only gather data on what you already know to ask about (unlike observation and interviews, which can uncover the unexpected)
Questionnaires #2
In quantitative research, the term refers to instruments that measure specific phenomena (perceptions, attitudes, …) by asking people questions that are carefully designed to meet three criteria:
- Validity: the question measures what is intended to be measured
- Reliability: users will consistently answer the question in the same way
- Sensitivity: the question detects meaningful differences
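Reliability of a multi-item questionnaire is commonly checked with Cronbach's alpha (my addition; the slide only names the criterion). Rows are respondents, columns are items, all answered on the same scale:

```python
# Cronbach's alpha: internal-consistency reliability of a set of items.
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
def cronbach_alpha(scores):
    k = len(scores[0])  # number of items

    def variance(values):  # sample variance (n-1 denominator)
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = sum(variance([row[i] for row in scores]) for i in range(k))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented example: 4 respondents x 3 items on a 5-point scale
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 4],
    [3, 3, 2],
]
print(round(cronbach_alpha(ratings), 2))  # ~0.88
```

Values above roughly 0.7 are conventionally taken as acceptable internal consistency, though the threshold depends on the purpose of the instrument.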
Questionnaires #4
Igroup Presence Questionnaire (IPQ)
- Developed for virtual reality experiences
- Measuring the user’s sense of presence in the virtual environment
- 14 items on 3 factors:
- Spatial presence: Sense of being physically present in the VE
- Involvement: Measuring attention devoted to the VE
- Experienced Realism: measuring subjective experience of realism
Open & Closed Questions
- Open-ended questions (“can you suggest any improvements”)
- Good for general subjective information
- Difficult to analyse
- Closed questions - single or multiple choice
- Restrict responses by supplying alternatives
- Easy to analyse
- Watch out for ‘hard-to-interpret’ responses
- Alternative responses should be
- Mutually exclusive
- Exhaustive
Other Data Collection
- Collection of demographic information on users, and any information about users that is relevant for a study and analysis of the results
- Age, gender
- e.g., prior experience with the type of interface or application
- e.g., handedness
- Collection of qualitative feedback
- For example using SEQ (Single Ease Question) combined with asking users to give a reason for their rating
- Post-test feedback / comments
- e.g., “Can you suggest any improvements”
Questionnaire Guidelines
- Always collect demographic data: age and gender
- Concise: keep questions simple and as short as possible.
- Relevance: each question must be relevant to your study goal.
- Precision: don’t use vague terms.
- Avoid ‘loaded’ or ‘leading’ questions that hint at the answer you want to hear
- Avoid ‘and’ questions -> split.
- Avoid negative questions (and double-negatives!)
- Avoid jargon and abbreviations
Rating Scales
- The most common rating scales are Likert scales, composed of statements with which respondents rate their agreement.
- Developed by Rensis Likert, 1932, as a general psychometric scale.
- A Likert item can be a positive (“The labels used in the interface are clear”) or a negative statement (“I found the navigation options confusing”)
- Respondents specify their level of agreement with a statement on a symmetrical agree-disagree scale.
- The original Likert scale has 5 points, each with a response anchor:
  1 - Strongly disagree; 2 - Disagree; 3 - Neither agree nor disagree; 4 - Agree; 5 - Strongly agree
- The range captures the intensity of the subjects’ feeling for a given item
Likert Scales
- Users judge a specific statement on a numeric scale
- Usually agreement or disagreement with a statement
- Provides quantitative data
- Typically 5-point or 7-point scales
- Other types of scales also exist, e.g., the semantic differential
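When a questionnaire mixes positive and negative statements, the negative items are typically reverse-scored (6 - x on a 5-point scale) before averaging, so that higher always means a more favourable rating. The items and responses below are invented examples:

```python
# Summarising 5-point Likert responses, reverse-scoring negative items.
responses = {
    "The labels used in the interface are clear": [4, 5, 3, 4],  # positive item
    "I found the navigation options confusing": [2, 1, 2, 3],    # negative item
}
negative_items = {"I found the navigation options confusing"}

means = {}
for item, scores in responses.items():
    if item in negative_items:
        scores = [6 - s for s in scores]  # reverse-score: 1<->5, 2<->4
    means[item] = sum(scores) / len(scores)
    print(item, "->", means[item])
```

After reverse-scoring, both items above yield the same favourable mean, as intended: disagreement with a negative statement counts the same as agreement with a positive one.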
Rating Scales #1
Rating Scales #2
Statements for Likert scales need to be worded carefully, using unmodified adjectives.
- Modifiers such as “very”, “extremely”, “absolutely” bias the response
- e.g., “the UI is extremely easy to use” makes strong agreement less likely than “the UI is easy to use”
Post-task / Post-test rating
- In usability evaluation, questionnaires and ratings are categorized as post-task versus post-test
- Post-task ratings are completed immediately after finishing a task, to capture impression of the tasks, and are often just a single task-difficulty question: e.g. 7-point “Single Ease Question” (SEQ)
- Post-test questionnaires are administered at the end of a session, after completion of all tasks with an interface, to capture how users perceive the usability of the interface as a whole
- Post-task and post-test ratings can be complementary
System Usability Scale (SUS)
- Widely used scale
- Developed by John Brooke, 1986
- 10 statements
- 5 worded positively, 5 negatively
- Responses converted to score 0…4
- Added up
- Multiplied by 2.5
- Total out of 100
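The scoring steps above can be sketched directly. This assumes the standard SUS ordering, in which odd-numbered items are worded positively and even-numbered items negatively (the slide says 5 of each without giving the order):

```python
# SUS scoring: convert each 1-5 response to a 0-4 contribution,
# sum the ten contributions, multiply by 2.5 -> score out of 100.
def sus_score(responses):
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # positive (odd) items: response - 1; negative (even) items: 5 - response
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible -> 100.0
print(sus_score([4, 2, 4, 2, 3, 2, 4, 1, 4, 2]))  # -> 75.0
```

Note that a SUS score is not a percentage: 75 does not mean "75% usable", and scores are usually interpreted against published benchmarks (around 68 is often cited as average).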
Usefulness, Satisfaction and Ease-of-Use Questionnaire (USE)
NASA-TLX
- NASA-TLX (Task Load Index) is a post-task questionnaire for complex interfaces
- Developed by NASA for measuring the perceived workload of highly technical tasks of aerospace crews
- 6 Questions on an un-labelled 21-point scale, from Very Low to Very High
- Complex to score
- In HCI it is common to just adopt the mental demand and physical demand questions into custom questionnaires
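The "complex to score" part refers to the full NASA-TLX procedure, which weights the six subscales via pairwise comparisons. A widely used simplification ("Raw TLX") just averages the six ratings; the values below are invented:

```python
# Raw TLX: unweighted mean of the six NASA-TLX subscale ratings
# (here kept on the 21-point response scale).
subscales = {
    "mental demand": 15,
    "physical demand": 4,
    "temporal demand": 12,
    "performance": 8,
    "effort": 14,
    "frustration": 10,
}
raw_tlx = sum(subscales.values()) / len(subscales)
print(f"Raw TLX: {raw_tlx:.1f} (on the 21-point scale)")  # -> 10.5
```

Raw TLX correlates highly with the weighted score in many studies, which is why the pairwise-comparison step is often dropped in HCI practice.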
PSSUQ / CSUQ
- Post Study System Usability Questionnaire (PSSUQ) and Computer System Usability Questionnaire (CSUQ)
- Like the SUS for post-test rating of any type of interface. Originally PSSUQ, minor changes in CSUQ
- 16 items, 7-point scale, positively worded
- Provides overall usability score, but also scores subfactors: System Usefulness; Information Quality; Interface Quality
- High sensitivity: able to detect differences across a large number of variables (different user groups, types of systems used, years of experience, etc.)
- Effective at smaller sample sizes (because of higher sensitivity)
- Strong correlation with SUS
User Feedback - Key Points
- Some aspects of people’s interaction with technology cannot be measured
- Qualitative studies are useful for obtaining rich data from a smaller number of subjects
- Qualitative methods can be used for evaluation
- … but are often also used to understand the domain and the people who will be using a system before design and construction of the system begins
- Qualitative approaches are not by definition better than quantitative approaches (or vice-versa) – What is appropriate depends on what you want to find out…
- Qualitative and quantitative data can complement each other