Study Design and Analysis Flashcards

1
Q

Planning User Studies

A
  • Five practical steps to follow for a lab study / experiment
  • Step 1: Define your study objectives
  • Step 2: Identify your variables
  • Step 3: Design the experiment: tasks, procedure, setup
  • Step 4: Recruit participants and run the study
  • Step 5: Evaluate and report the outcome
2
Q

Study Design = Research Design

A
3
Q

Types of measurement in user studies

A
  • Performance measures
    • Measuring user performance on tasks they are given
    • Observation or automated logging of performance data
  • Self-reported metrics
    • Measuring user experience and their perception of the interaction
    • Using rating scales and questionnaires as instruments
  • Behavioural and Physiological metrics
    • Measuring the response of the body during interaction with a system
    • e.g. eye-tracking to measure what users look at
4
Q

Performance measures

A
  • Performance measures assess
    • Effectiveness: ability to complete a task accurately
    • Efficiency: the amount of effort required to complete a task successfully
  • Measuring task success, time, errors
  • Performance evaluation relies on clearly defined tasks and goals
    • Users are given tasks to accomplish
    • Task success has to be clearly defined
  • Performance evaluation can focus on different usability aspects
    • e.g. learnability: how long it takes to reach proficiency
5
Q

Task Success

A
  • Task success is a fundamental measure of effectiveness
  • Task success rate: percentage of users who succeed on a task
  • Requires clear definition of a task and of an end state to reach
  • Requires clear criteria for pass/fail
    • Giving up – users indicate they would give up if they were doing this for real
    • Moderator calls it – when user makes no progress, or becomes too frustrated
    • Too long – certain tasks are only considered successful if done in time limit
    • Wrong – user thinks they completed successfully but they did not
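A minimal sketch of how a task success rate could be computed from recorded outcomes. The outcome labels and the sample data are hypothetical, chosen only to mirror the pass/fail criteria listed above.

```python
from collections import Counter

# Hypothetical outcomes for one task, one entry per participant.
outcomes = ["success", "success", "gave_up", "success", "too_long",
            "wrong", "success", "moderator_called", "success", "success"]

def task_success_rate(outcomes):
    """Return the fraction of participants whose outcome is 'success'."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(1 for o in outcomes if o == "success") / len(outcomes)

rate = task_success_rate(outcomes)
# Tally why the remaining participants failed.
failure_reasons = Counter(o for o in outcomes if o != "success")

print(f"Task success rate: {rate:.0%}")
print(dict(failure_reasons))
```

Recording the failure reason alongside the pass/fail result keeps the criteria explicit and makes the rate easier to interpret later.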
6
Q

Learning Curve

A
  • Power Law of Practice
  • Describes how task performance improves with practice
  • Task time on the nth trial: $T_n = T_1 \cdot n^{-a} + c$
  • $T_1$ is the time for the first trial
  • a is a constant capturing the steepness of learning, c is a limiting constant
  • Holds for skilled behaviour
  • Does not hold for gaining knowledge!
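The learning curve can be sketched numerically, assuming the commonly quoted form of the Power Law of Practice, T_n = T_1 * n^(-a) + c. The parameter values below (10 s first trial, a = 0.4, c = 0) are illustrative assumptions, not measured data.

```python
def trial_time(n, t1, a, c=0.0):
    """Predicted task time on the n-th trial: T_n = T_1 * n**(-a) + c."""
    if n < 1:
        raise ValueError("trial numbers start at 1")
    return t1 * n ** (-a) + c

# Assumed example values: 10 s on the first trial, a = 0.4, no limiting constant.
trials = (1, 2, 4, 8, 16)
times = [trial_time(n, t1=10.0, a=0.4) for n in trials]

for n, t in zip(trials, times):
    print(f"trial {n:2d}: {t:.2f} s")
```

Each doubling of the trial number reduces the time by the same factor, so gains are large early on and flatten with further practice.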
7
Q

Example: Hierarchical Menus

A
  • Comparative evaluation
  • Pie Menu and Square Menu
  • Traditional Pull-Down Menu as baseline
  • 24 participants, 19-49 years
  • For each interface, participants completed 10 trials for familiarisation
  • Then they performed 8 blocks of 6 tasks (randomized)
8
Q

Error rate / Accuracy

A
  • Error rate: average number of errors for each task
  • The rate at which errors occur affects both effectiveness and efficiency
    • Speed-accuracy trade-off
    • Errors also affect user experience / satisfaction
  • Requires clear definition of what counts as an error
    • Based on what users do (actions) or fail to do
    • e.g. data-entry errors; wrong choices; key actions not taken
  • Note: Issues are the cause of a problem, errors the outcome
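As a rough illustration, the error rate per task can be computed as the mean error count across participants. The task names and counts here are hypothetical.

```python
from statistics import mean

# Hypothetical error counts: one list per task, one entry per participant.
errors_by_task = {
    "enter_address": [0, 2, 1, 0, 3],
    "choose_tariff": [1, 0, 0, 0, 1],
}

def error_rates(errors_by_task):
    """Average number of errors per participant for each task."""
    return {task: mean(counts) for task, counts in errors_by_task.items()}

rates = error_rates(errors_by_task)
for task, rate in rates.items():
    print(f"{task}: {rate:.1f} errors on average")
```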
9
Q

Counting Clicks / Actions

A
  • Efficiency is the amount of effort required to complete a task successfully
  • Time on task is a good indicator but it does not show whether a task was completed with the least effort required
  • A different measure of efficiency is to count the number of actions the user performs to complete a task
    • e.g., number of clicks, menus opened, web pages visited
  • Compare number of clicks (or other actions) performed to the minimum number of clicks required
    • evaluating usability of navigation and information architecture
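One possible way to express the comparison is a ratio of the minimum required clicks to the clicks actually performed. This ratio and the sample numbers are assumptions for illustration; other formulations are equally valid.

```python
def click_efficiency(actual_clicks, minimum_clicks):
    """Ratio of minimum required clicks to clicks actually used (1.0 = optimal path)."""
    if minimum_clicks <= 0:
        raise ValueError("minimum_clicks must be positive")
    if actual_clicks < minimum_clicks:
        raise ValueError("actual_clicks below the stated minimum; check the minimum")
    return minimum_clicks / actual_clicks

# Hypothetical data: clicks per participant for a task that needs at least 4 clicks.
clicks_per_participant = [4, 6, 8, 4]
efficiencies = [click_efficiency(c, minimum_clicks=4) for c in clicks_per_participant]

print([round(e, 2) for e in efficiencies])
print(f"mean efficiency: {sum(efficiencies) / len(efficiencies):.2f}")
```

A value well below 1.0 suggests that participants took detours, which points at navigation or information-architecture problems rather than slow execution.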
10
Q

Design

A
  • Once we know what we want to measure in a study, we can consider the design of the study. This includes:
  • Experimental setup, referred to as apparatus
    • Hardware and software, spatial arrangement of participant and devices
  • Tasks and Procedure
    • What tasks are the participants asked to complete
    • Sequence of events in the study
  • Design (= structure of experiment, but just referred to as the design)
    • Factorial design – how the experiment is structured by factors and levels
    • Participant grouping – one or more groups
11
Q

Choice of Tasks

A
  • Tasks are central to usability tests and user studies. (If there is no task then your study is not a usability study!)
  • In usability tests, users are given typical tasks that users would perform with the user interface, to find out whether they encounter problems
  • In user studies that measure performance, we have a trade-off:
  • Use typical tasks -> representative of real application
  • Use abstract tasks -> more control, for observation of how performance depends on conditions
12
Q

Example: Evaluate Pointing Devices

A
13
Q

Skills- versus Knowledge-based Tasks

A
  • Skill-based tasks lend themselves to repetition, and to being performed under different test conditions
    • Examples: reaction time, selection from menus, text entry
  • Knowledge-based tasks are more problematic as users gain knowledge when they perform the task, and need careful variation
    • looking up information on a web site (vary what needs to be looked up)
    • finding a train connection (vary the task)
    • extract information from visualisations (vary data visualized)
14
Q

Within-subjects or between-subjects

A
  • Within-subjects design
    • Each participant performs the same tasks with each of the test conditions
  • Between-subjects design
    • Participants are put into groups that each use different test conditions
  • Some variables (factors) require a between-subjects design
    • e.g., if we want to study expert versus novice performance
15
Q

Order Effects

A
  • In a within-subjects design, there can be order effects on the results
  • Participants test with one condition, then another
  • Learning effects
    • Participants may perform better on a second condition because they benefitted from practice on the first
  • Fatigue effects
    • Participants might get tired if the task is demanding
    • They might get bored and less attentive if the task is repetitive
16
Q

Counterbalancing

A
  • Counterbalancing is used to compensate
    for any order effect or sequence effect
  • Divide participants into groups that each are
    given test conditions in a different order
  • If we have two conditions A and B:
    • Half of the users first use A, then B
    • The other half first use B, then A
  • If we have more conditions, use Latin squares
    instead of all permutations
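A minimal sketch of counterbalancing with a Latin square. It uses a simple cyclic construction (each row is the previous one rotated by one position), which balances the position of each condition but not which condition precedes which. The participant IDs and condition names are hypothetical.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i positions."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

def assign_orders(participants, conditions):
    """Give each participant one row of the square, cycling through the rows."""
    square = latin_square(conditions)
    return {p: square[k % len(square)] for k, p in enumerate(participants)}

square = latin_square(["A", "B", "C"])
# Rows: A B C / B C A / C A B

# 12 participants and 3 conditions: each order is used exactly 4 times.
orders = assign_orders([f"P{k}" for k in range(1, 13)], ["A", "B", "C"])
for participant, order in orders.items():
    print(participant, " -> ".join(order))
```

With n conditions this needs only n orders instead of n! permutations, which is why the number of participants should be a multiple of n.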
17
Q

Design: Procedure

A
  • The procedure encompasses everything that the participant does or is exposed to, from the moment they arrive for the study until they leave
  • Includes the tasks that participants perform and the specific instructions, demonstration or practice they are given for their task
  • The order in which test conditions are administered, and how many repetitions/trials of the task in each condition
  • Includes consent procedure, and questionnaires that participants are given
    before and/or after testing
  • Time for breaks between tasks/conditions, and total time for a session
18
Q

Design - Key Points

A
  • Task design is critical, for tasks
    (i) to be representative of real use and
    (ii) to bring out differences between test conditions
  • Tasks may need to be varied, to avoid effects of anticipation and learning. Variations need to be carefully designed so that conditions remain equivalent.
  • Most HCI studies are within-subjects, as this requires fewer users and reduces variability between conditions, but order effects need to be controlled
  • Some studies have to be between-subjects as they compare user groups
  • Procedures need to be planned, followed and reported in detail to support interpretation (and reproducibility) of the results
19
Q

Sampling

A
  • Sampling refers to the selection of participants, as a sample from a target population
  • Ideally, the results of a study should hold for people in a target population who were not tested
  • Sampling is a major concern for survey research (e.g. opinion polls)
  • Experiments can produce statistically valid conclusions with relatively small samples
21
Q

Sampling #2

A
  • Within-subject comparative evaluation is often conducted with small samples of 10-12 participants
    • The sample size must be a multiple of the number of conditions, for counterbalancing!
  • Representativeness
    • If the participants you measure are not part of the target group, there is no logical basis for generalizing the result
  • Randomness
    • Ideally the selection from the target population is random
    • In practice, sampling is based on convenience and volunteering
22
Q

Reporting of participants

A
  • Studies collect and report demographic data that describes the study population
  • Age, gender, and any other characteristics relevant to the study
  • This helps in understanding how representative the sample is
23
Q

Statistical Analysis

A
  • Statistical analysis is used for learning something about a larger group of people (the target population) than we can measure in a study.
  • If we cannot study the entire target population then we rely on a sample to make an estimate about all users
    • What we want is (e.g.) mean task completion time of the population
    • What we get is mean task completion time of our sample
  • How good an estimate is the measured performance/experience for the performance/experience of all users who will use the system?
  • How much can we trust that effects we observed hold with other users?
24
Q

Types of data / Levels of Measurement

A
  • Variables have a “level of measurement” that determines how the data can be analysed, and which statistics and data visualization are appropriate
25
Q

Descriptive Statistics

A
  • Descriptive statistics – describing one data set (one group or condition)
  • Essential for any interval and ratio-level data collected from participants
    • How did participants rate ease-of-use, on average?
    • How much did performance vary from participant to participant?
    • How good an estimate is the data for the whole population?
  • Report mean and standard deviation to show the distribution of the data points collected from across participants
  • Report the confidence interval to show in which range we expect the mean value for all potential users
26
Q

Confidence Intervals

A
  • Confidence intervals (CI) represent the range in which we expect the average value for all possible users
  • Excel function:
    CONFIDENCE(alpha, standard deviation, sample size)
  • alpha = 0.05 if we want 95% confidence
  • The CI widens with the variance in the data
  • The CI narrows when we test more users
  • Inverse square root: 4x users -> half the error
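The following sketch reproduces in Python what the Excel CONFIDENCE function computes: a normal-distribution margin of error, z * sd / sqrt(n). The task times are hypothetical sample data. For small samples a t-based margin (Excel's CONFIDENCE.T) would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_margin(alpha, standard_deviation, sample_size):
    """Normal-based margin of error: z * sd / sqrt(n), as in Excel's CONFIDENCE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_deviation / sqrt(sample_size)

# Hypothetical task completion times in seconds, one per participant.
times = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.0, 9.5, 12.3]

m = mean(times)
sd = stdev(times)  # sample standard deviation
margin = confidence_margin(0.05, sd, len(times))
print(f"mean = {m:.2f} s, sd = {sd:.2f} s")
print(f"95% CI: [{m - margin:.2f}, {m + margin:.2f}]")

# Quadrupling the sample size halves the margin (same sd assumed).
margin_16 = confidence_margin(0.05, 2.0, 16)
margin_64 = confidence_margin(0.05, 2.0, 64)
print(f"n=16: {margin_16:.3f}, n=64: {margin_64:.3f}")
```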
27
Q

Inferential Statistics

A
  • We need inferential statistics when we want to compare two or more groups, data sets or conditions
  • If the mean performance with interface A is better than with interface B, can we conclude that A is better than B?
  • How much does the observed difference depend on who we selected to participate in our study?
  • What are the chances that the result would be different with a different sample drawn from the population?
  • Test for statistical significance of the observed difference
28
Q

Statistical significance

A
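  • If we observe a difference between A and B, how likely is it that a difference this large would arise just by chance (choice of sample)?
  • p-value: the probability of seeing a difference at least as large as the observed one if there were no real difference; if p<0.05 you can report a statistically significant difference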
  • For example:
    • In a study comparing two booking systems A and B, task completion time was faster for A than for B, on average across study participants
    • If a statistical test finds p<0.05 then we can conclude that users are faster with A than with B, on average across all potential users
    • If p>0.05 then we cannot claim that A is better for users than B, even though we saw a difference within our sample
29
Q

Statistical Tests

A
  • Standard tests for different types of data
  • t-Test for comparing two groups
    • Comparing unpaired data (between groups): unpaired t-Test
    • Comparing paired groups (within-subjects): paired t-Test
    • Excel: T.TEST(array1,array2,tails,type)
  • Comparing more than two groups: ANOVA
  • Comparing nominal- and ordinal-scale data: χ² (chi-square)
  • Statistical tests all come with assumptions that need to be checked
    • For example, is the data normally distributed?
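To make the paired t-Test concrete, here is a minimal sketch that computes only the t statistic and degrees of freedom for within-subjects data, using the standard library. The completion times for interfaces A and B are hypothetical. Obtaining the p-value requires the t distribution, for example via Excel's T.TEST with type 1 (paired) or SciPy's `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1

# Hypothetical task times (s): each position is the same participant on A and B.
times_a = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
times_b = [11.0, 12.4, 10.1, 13.2, 11.0, 12.3]

t, df = paired_t_statistic(times_a, times_b)
print(f"t({df}) = {t:.2f}")
```

The test works on per-participant differences, which is why it suits within-subjects designs: variation between participants cancels out.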
30
Q

Sampling and Analysis - Key Points

A
  • People are the biggest random factor in any evaluation of user performance and user experience
  • User studies rely on participation of a sufficiently large number of users to get
    a good idea of average behaviour and experience
  • For any variable, the level of measurement determines how we can analyse it
  • Descriptive statistics are essential for describing the distribution of the data
    across the study participants (mean and standard deviation)
  • Confidence intervals and statistical significance are key concepts for drawing
    conclusions beyond the set of users that participated in the study
31
Q

Intelligent User Interfaces

A
  • Fusion of AI and HCI fields
  • Goal: To improve the user experience and usability of user interfaces by making them easier, faster, more accessible and more efficient to use with the help of artificial intelligence
  • IUIs are designed to provide a more natural and intuitive way of interacting with computers
  • Opened up new possibilities for HCI and has the potential to revolutionise the way we interact with computers
32
Q

IUI: Intelligent Communication Tools

A
  • Do text suggestions help us communicate more efficiently?
33
Q

IUI: Intelligent Input

A
  • Why are we so precise with our fingers on a screen?
34
Q

IUI: Recommender Systems

A
  • How do recommender systems impact the user experience?
35
Q

IUI: Voice User Interfaces

A
  • How do we design voice prompts and responses?
36
Q

IUI: Adaptive UI and Context Awareness

A
  • How do devices know when to adapt to our context?
37
Q

Intelligent User Interfaces: In the UI

A
  • In the UI: Intelligent processing is found in the user interface(s) of the system, and its purpose is to enable effective, natural, or otherwise appropriate interactions with users of the system
    • The user interfaces are automatically adapted to the inferred capabilities and needs of the user, or to the context of the interaction (e.g. Gmail Smart Compose)
    • Multimodal and sensing-based systems that aim to enable more natural, human-like forms of input and output (e.g. Speech, Body language)
38
Q

Intelligent User Interfaces: Behind the UI

A
  • Behind the UI: Intelligent processing is found in the backend of the system, and its purpose is to serve some beneficial functions, such as performing actions on behalf of the user or modelling the user for personalisation
    • Recommender systems that provide us with personalised options (e.g. Netflix, Spotify)
    • Agents that perform complex or repetitive tasks with some guidance from the user
    • Situated assistance systems that monitor and support user activities (e.g. Health and activity monitoring)
39
Q

Intelligent User Interfaces

A
  • IUIs employ a variety of AI technologies to help users achieve their goals and meet their requirements
  • IUIs assist users by …
    • using AI to optimise the interaction by adapting to the user and their context
    • using AI to enable users to interact with the interface in a way similar to how humans interact with each other, making it more intuitive and user-friendly
    • using AI to intelligently provide personalised and relevant suggestions, recommendations, and responses based on their individual preferences and needs