Study Design and Analysis Flashcards
1
Q
Planning User Studies
A
- Five practical steps to follow for a lab study / experiment
- Step 1: Define your study objectives
- Step 2: Identify your variables
- Step 3: Design the experiment: tasks, procedure, setup
- Step 4: Recruit participants and run the study
- Step 5: Evaluate and report the outcome
2
Q
Study Design = Research Design
A
3
Q
Types of measurement in user studies
A
- Performance measures
- Measuring user performance on tasks they are given
- Observation or automated logging of performance data
- Self-reported metrics
- Measuring user experience and their perception of the interaction
- Using rating scales and questionnaires as instrument
- Behavioural and Physiological metrics
- Measuring the response of the body during interaction with a system
- e.g. eye-tracking to measure what users look at
4
Q
Performance measures
A
- Performance measures assess
- Effectiveness: ability to complete a task accurately
- Efficiency: the amount of effort required to complete a task successfully
- Measuring task success, time, errors
- Performance evaluation relies on clearly defined tasks and goals
- Users are given tasks to accomplish
- Task success has to be clearly defined
- Performance evaluation can focus on different usability aspects
- e.g. learnability: how long it takes to reach proficiency
5
Q
Task Success
A
- Task success is a fundamental measure of effectiveness
- Task success rate: percentage of users who succeed on a task
- Requires clear definition of a task and of an end state to reach
- Requires clear criteria for pass/fail
- Giving up – users indicate they would give up if they were doing this for real
- Moderator calls it – when user makes no progress, or becomes too frustrated
- Too long – certain tasks are only considered successful if done in time limit
- Wrong – user thinks they completed successfully but they did not
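- A minimal sketch of computing a task success rate from binary pass/fail outcomes; the data and variable names below are invented for illustration, not taken from the slides.

```python
# Invented pass/fail outcomes for one task (True = success), one entry per participant.
outcomes = [True, True, False, True, False, True, True, True]

# Task success rate: percentage of users who succeed on the task.
success_rate = 100 * sum(outcomes) / len(outcomes)
print(f"Task success rate: {success_rate:.1f}%")  # -> 75.0%
```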
6
Q
Learning Curve
A
- Power Law of Practice
- Describes how task performance improves with practice
- Task time on the nth trial: T_n = T_1 · n^(−a) + c
- T_1 is the time for the first trial
- a is a constant capturing the steepness of learning, c is a limiting constant
- Holds for skilled behaviour
- Does not hold for gaining knowledge!
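- A small sketch of fitting the power-law form above (assumed here as T_n = T_1 · n^(−a) + c) to measured trial times with SciPy; the trial times are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, t1, a, c):
    # Assumed form of the Power Law of Practice: time on the n-th trial.
    return t1 * n ** (-a) + c

# Invented task times (seconds) over 8 practice trials.
trials = np.arange(1, 9, dtype=float)
times = np.array([12.0, 9.1, 7.8, 7.0, 6.5, 6.2, 6.0, 5.9])

# Fit the curve parameters (initial guesses are arbitrary but plausible).
(t1, a, c), _ = curve_fit(power_law, trials, times, p0=[10.0, 0.5, 5.0])
print(f"T1 ~ {t1:.2f} s, a ~ {a:.2f}, c ~ {c:.2f} s")
```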
7
Q
Example: Hierarchical Menus
A
- Comparative evaluation
- Pie Menu and Square Menu
- Traditional Pull-Down Menu as baseline
- 24 participants, 19-49 years
- For each interface, participants completed 10 trials for familiarisation
- Then they performed 8 blocks of 6 tasks (randomized)
8
Q
Error rate / Accuracy
A
- Error rate: average number of errors for each task
- The rate at which errors occur affects both effectiveness and efficiency
- Speed-accuracy trade-off
- Errors also affect user experience / satisfaction
- Requires clear definition of what counts as an error
- Based on what users do (actions) or fail to do
- e.g. data-entry errors; wrong choices; key actions not taken
- Note: Issues are the cause of a problem, errors the outcome
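- A minimal sketch of computing an error rate as the average number of errors per task, assuming errors have already been counted per participant and task (all values invented).

```python
# Invented error counts: one row per participant, one column per task.
errors = [
    [0, 2, 1],  # participant 1
    [1, 0, 0],  # participant 2
    [0, 1, 3],  # participant 3
]

# Error rate: average number of errors per task, across participants and tasks.
counts = [e for row in errors for e in row]
error_rate = sum(counts) / len(counts)
print(f"Average errors per task: {error_rate:.2f}")  # -> 0.89
```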
9
Q
Counting Clicks / Actions
A
- Efficiency is the amount of effort required to complete a task successfully
- Time on task is a good indicator but it does not show whether a task was completed with the least effort required
- A different measure for efficiency is to count number of actions the user performs to complete a task
- e.g., number of clicks, menus opened, web pages visited
- Compare number of clicks (or other actions) performed to the minimum number of clicks required
- evaluating usability of navigation and information architecture
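- A small sketch comparing the clicks participants actually performed against the minimum number required, expressed as an efficiency ratio; the numbers are invented for illustration.

```python
# Invented click counts for one navigation task.
min_clicks = 4                      # minimum number of clicks required
observed_clicks = [4, 6, 5, 9, 4]   # clicks each participant actually performed

# Efficiency ratio per participant: 1.0 means the task was done with the least effort required.
ratios = [min_clicks / c for c in observed_clicks]
mean_efficiency = sum(ratios) / len(ratios)
print(f"Mean click efficiency: {mean_efficiency:.2f}")
```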
10
Q
Design
A
- Once we know what we want to measure in a study, we can consider the design of the study. This includes:
- Experimental setup, referred to as apparatus
- Hardware and software, spatial arrangement of participant and devices
- Tasks and Procedure
- What tasks are the participants asked to complete
- Sequence of events in the study
- Design (the structure of the experiment, usually just referred to as the design)
- Factorial design – how the experiment is structured by factors and levels
- Participant grouping – one or more groups
11
Q
Choice of Tasks
A
- Tasks are central to usability tests and user studies. (If there is no task, then your study is not a usability study!)
- In usability tests, participants are given typical tasks that users would perform with the interface, to find out whether they encounter problems
- In user studies that measure performance, we have a trade-off:
- Use typical tasks -> representative of real application
- Use abstract tasks -> more control, for observation of how performance depends on conditions
12
Q
Example: Evaluate Pointing Devices
A
13
Q
Skills- versus Knowledge-based Tasks
A
- Tasks that are skill-based lend themselves to repetition, and to be performed with different test conditions
- Examples: reaction time, selection from menus, text entry
- Knowledge-based tasks are more problematic as users gain knowledge when they perform the task, and need careful variation
- looking up information on a web site (vary what needs to be looked up)
- finding a train connection (vary the task)
- extract information from visualisations (vary data visualized)
- …
14
Q
Within-subjects or between-subjects
A
- Within-subjects design
- Each participant performs the same tasks with each of the test conditions
- Between-subjects design
- Participants are put into groups that each use different test conditions
- Some variables (factors) require a between-subjects design
- e.g., if we want to study expert versus novice performance
15
Q
Order Effects
A
- In a within-subjects design, there can be order effects on the results
- Participants test with one condition, then another
- Learning effects
- Participants may perform better on a second condition because they benefitted from practice on the first
- Fatigue effects
- Participants might get tired if the task is demanding
- They might get bored and less attentive if the task is repetitive
16
Q
Counterbalancing
A
- Counterbalancing is used to compensate for any order effect or sequence effect
- Divide participants into groups that are each given the test conditions in a different order
- If we have two conditions A and B:
- Half of the users first use A, then B
- The other half first use B, then A
- If we have more conditions, use Latin squares instead of all permutations (see the sketch below)
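- A short sketch generating condition orders from a balanced Latin square (a common Latin-square variant that also balances immediate carry-over effects); this construction is a standard one and is not taken from the slides.

```python
def balanced_latin_square(conditions):
    """Rows of condition orders forming a balanced Latin square.

    Each condition appears once per position across the rows; for an odd
    number of conditions the reversed rows are appended as well.
    """
    n = len(conditions)
    # Standard first row: 1, 2, n, 3, n-1, 4, ... (written 0-indexed here);
    # every following row shifts each entry by one.
    first = [0] + [(t + 1) // 2 if t % 2 else n - t // 2 for t in range(1, n)]
    rows = [[conditions[(x + i) % n] for x in first] for i in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(r)) for r in rows]
    return rows

# Example: four test conditions, one row (order) per participant group.
for order in balanced_latin_square(["A", "B", "C", "D"]):
    print(order)
```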