Block 4 - Unit 2: Usability testing and field studies Flashcards

1
Q

Usability testing. (4)

A

Involves users.

Emphasises ‘usable’ property - product is being tested rather than user.

(Often) controlled environment; performance of users on pre-planned tasks is repeatedly measured.

Goal - to test whether the product is usable by the intended user population to achieve the tasks it was designed for.

2
Q

Key components of usability testing. (2)

A

User test.

User satisfaction questionnaire.

3
Q

User testing. (Include examples)

A

Measures human performance on specific tasks.

Eg. reading different typefaces, navigation of menu types, info searching.

Mouse / keyboard logging and video are used to record performance.
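
A minimal sketch (Python, standard library only) of the kind of timestamped event logging this involves; the event names and log format are illustrative assumptions, not any real logging tool's API.

```python
import time

# Hypothetical event log: each interaction event is written with a timestamp
# so task times and error counts can be reconstructed during analysis.
def log_event(logfile, participant, event, detail=""):
    logfile.write(f"{time.time():.3f}\t{participant}\t{event}\t{detail}\n")

with open("session.log", "w") as f:
    log_event(f, "P01", "task_start", "task1")
    log_event(f, "P01", "click", "menu:File")
    log_event(f, "P01", "error", "wrong menu item selected")
    log_event(f, "P01", "task_end", "task1")
```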

4
Q

User satisfaction questionnaire.

A

How do users feel about using the product - rating along different scales, after interaction.

5
Q

Measures used in usability testing.

A

Time and number:

  • Time to complete a task (after a specified time away);
  • Number of errors per task or per time unit;
  • Number of views of online help / manuals;
  • Number of users making a particular error;
  • Number of users completing a task successfully.
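
A minimal sketch of computing these measures from per-task records; the record fields and figures are illustrative assumptions.

```python
# Assumed per-task records gathered during user tests.
results = [
    {"user": "P01", "seconds": 74, "errors": 2, "help_views": 1, "completed": True},
    {"user": "P02", "seconds": 112, "errors": 5, "help_views": 3, "completed": False},
    {"user": "P03", "seconds": 58, "errors": 0, "help_views": 0, "completed": True},
]

completed = [r for r in results if r["completed"]]
print("Mean time to complete:", sum(r["seconds"] for r in completed) / len(completed))
print("Errors per task:", [r["errors"] for r in results])
print("Users viewing help:", sum(1 for r in results if r["help_views"] > 0))
print("Users completing successfully:", len(completed), "of", len(results))
```
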
6
Q

Number of users needed in usability testing.

A

5 - 12 considered acceptable.

Can use fewer if budget / schedule is tight, or for quick feedback on eg. logo placement - 2 or 3 users.

7
Q

Remote usability testing (including advantage).

A

Users perform tasks in their own setting, logged remotely.

Advantage - many users can be tested at once, and logged data is automatically compiled into statistical packages for analysis. Eg. number of clicks per page.
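
A sketch of the kind of automatic compilation this enables, counting clicks per page from an assumed remote click log:

```python
from collections import Counter

# Remotely logged (user, page) click events - the format is an assumption.
click_log = [
    ("P01", "/home"), ("P01", "/search"), ("P02", "/home"),
    ("P02", "/home"), ("P03", "/search"), ("P03", "/results"),
]

clicks_per_page = Counter(page for _, page in click_log)
for page, n in clicks_per_page.most_common():
    print(page, n)
```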

8
Q

Usability testing (UT)

A

Doesn’t have to result in accurate measures of user performance, or be under controlled conditions, but does involve some form of performance assessment and a level of evaluator control, most commonly the use of pre-set tasks.

Several methods may be used - eg. indirect observation, user test to measure performance, questionnaire.

User tests often conducted along experimental lines, involving hypothesis testing and statistics.

9
Q

Doing user testing. (Intro)

A

Of the evaluation techniques, user testing is peculiar to usability testing.

Controlling test conditions is central - plan carefully; ensure all material is prepared, conditions are the same for each user, what is measured is indicative of what is being tested, and assumptions are made explicit in the test design.

10
Q

Elements of doing user testing. (4)

A

Design typical tasks.

Select typical users.

Prepare testing conditions.

Plan how to run the tests.

11
Q

Design typical tasks (user testing).

A

Setting appropriate tasks to test users' performance is critical.

'Completion' tasks are set, eg. find a website, create a spreadsheet.

Quantitative performance measures obtained.

User tests are most suitable for hi-fi prototypes, simulations and working products.

Task type depends on system type and evaluation goals / questions.
Eg. whether a paper prototype, a simulation or a limited part of the system's functionality is being tested will influence the breadth and complexity of the tasks set.

Generally, tasks are 5 - 10 mins and designed to probe a problem.
Often straightforward, but occasionally more complex, eg. join an online community or solve a problem.

12
Q

Select typical users (user testing)

A

What characteristics?

Some products aimed at eg. old / young, novice / expert.

Equal male / female split, unless aimed at one.

Previous experience with similar systems?

If user population is large - short questionnaire can help identify testers.

13
Q

Prepare testing conditions (user testing)

A

Environment controlled to prevent unwanted influences and noise that distorts results.

14
Q

Plan how to run the tests (user testing).

A

Schedule and script the sessions, test the equipment, run a pilot test.

Start with easy task to build confidence and familiarisation.

Avoid long tasks; keep session under 1 hour.

Don’t create too much data to analyse.

15
Q

Potential sources of bias (user testing)

A

Users:
If they don't match the profile, results may be misleading.

Evaluation tasks:
If users get unintentional clues from tasks, this may prevent usability problems (UPs) being exposed.

Test setting:
A lab is artificial, and may need conditions added to simulate the actual environment of use more closely.

Evaluator / observer bias:
Behaviour may be altered when being watched. The observer may forget they shouldn't ask leading questions, or help if the user gets stuck.

Methodology:
'Think-aloud' alters behaviour.
Avoid 'order effects' - trying one design may give 'practice' for trying another. (So, counterbalance - see the sketch after this list.)

Reporting / analysis:
Data analysis and interpretation are undertaken based on the evaluator's knowledge and experience - so biased to some extent.
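
A minimal counterbalancing sketch, assuming two designs (A and B) and rotating the order across participants:

```python
from itertools import permutations

# Counterbalancing: cycle participants through every possible design order
# so no single order dominates and practice effects cancel out.
designs = ["A", "B"]
orders = list(permutations(designs))  # [('A', 'B'), ('B', 'A')]

participants = ["P01", "P02", "P03", "P04"]
for i, p in enumerate(participants):
    print(p, "order:", orders[i % len(orders)])
```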

16
Q

Field studies (intro)

A

Typically used to find out how a product / prototype is adopted and used in people's working / everyday lives.

Such settings are 'messy' - activities overlap and constantly get interrupted.

The way people interact with products is often different from in a laboratory setting; a better sense of a product's success is gained in the real world.

17
Q

Trade-off of field studies.

A

Can't test a specific hypothesis about an interface, or account for how people react to or use a product, with the same degree of certainty as in a lab.

So, harder to determine what causes certain behaviour or what is problematic about the usability of a product.

Instead, qualitative accounts and descriptions of people’s behaviour and activities are obtained to reveal how they used the product and reacted to the design.

18
Q

Some characteristics of field studies. (3)

A

Can be minutes, months or years.

Primary data collection - observing and interviewing people; video, audio and field notes.

May ask for paper or electronic diaries to be filled in at points in the day - when interrupted, encounter a problem or are in a particular location.

19
Q

Things to do for field studies. (4)

A

Important to inform people of the length of study / session.

Need to agree part of site to be recorded and how.

Need to set up cameras unobtrusively and, if eg. in someone's home, agree how to turn them on / off.

What happens if product breaks?

20
Q

Planning an observational evaluation (intro).

A

Expand DECIDE to present a procedural approach to ‘doing’ observational evaluation.

Primary focus - evaluation of low-fi prototypes (especially card-based and interface sketches).

21
Q

Kinds of task to consider (‘I’)

A

Core tasks (frequent)

Those very important to users or the business.

Those that have some new design features / functionality.

Critical tasks.

Ones you feel have to be validated with users, to give the design team greater clarity and understanding.

Those that scenarios were based on, which you used for developing the conceptual design of the UI.

22
Q

How to choose tasks.

A

Choose tasks that help in validating usability or UX goals, or that focus on any particular design features you want to assess - metaphor suitability, task flow, icon design, etc.
But, the prototype determines which tasks you can evaluate.

23
Q

Other materials for evaluation. (9)

A
  1. Briefings for evaluators.
  2. Introductory material for participant.
  3. Background info for participant.
  4. Pre-session questionnaire or an interview plan.
  5. Permission to gather and keep data.
  6. Task cards.
  7. Data collection forms.
  8. Post-session interview plan or questionnaire.
  9. Data analysis, interpretation and recommendation forms.
24
Q

First steps before evaluation session.

A

Pilot study.

Revising any procedures and materials as required.

25
Q

Steps in an observational evaluation session.

A

Greet user, brief them about the study / equipment and obtain informed consent.

Conduct pre-session questionnaire / interview.

Show prototypes. If low-fi, explain their rough nature.

Give first task; check understanding.

Give each of the other tasks.

Observe while each task is attempted - what they do, or say if ‘think-aloud’. Record observations on paper / data collection forms.

Explore as appropriate other concerns / aspects of design - suitability of metaphor, task flow, icon design, structure, layout, tools, etc.

Conduct post-session questionnaires / interviews (if appropriate).

Thank user, and de-brief with post-evaluation discussion.
If others present, have group discussion - gives people chance to share their impressions with designers.

If evaluation is done with colleagues, can be useful to compare notes.

26
Q

What are paper-prototypes good for exploring? (6)

A

Concepts and terminology.

Navigation, work and task flow.

Content.

Documentation / help.

Requirements / functionality.

Interface layout.

27
Q

Comments on script for evaluating paper-prototypes.

A

Add any questions you think of to the script.

Where possible, cluster questions around the relevant task, so that at one time you explore as many of the issues related to the task being attempted as you can.

For aspects (eg. UX goals) that may not cluster - ask separately.

Take care not to ask leading questions - we want users’ own opinions.

28
Q

Quantitative and qualitative data (analysis)

A

Quantitative - data in the form of numbers, or easily translated to numbers.

Qualitative - data is difficult to measure, count or express in numerical terms in a sensible fashion.

Fallacy that certain forms of data gathering will only result in quantitative data and others in qualitative - all forms previously discussed can result in either.

Quantitative analysis uses numerical methods to ascertain magnitude, amount or size of something.

Qualitative analysis - nature of something, represented by themes, patterns and stories.

29
Q

Use and abuse of numbers.

A

Often unwarranted to turn qualitative data into numbers to manipulate and interpret with respect to goals, although people tend to believe numbers offer strong / clear conclusions.

Use percentages only for samples of 10+; even then, raw numbers often make understanding clearer.
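
A small worked example of why raw numbers should accompany percentages for small samples:

```python
# The same "75%" can hide very different amounts of evidence.
for agreed, asked in [(3, 4), (75, 100)]:
    print(f"{agreed} of {asked} users = {100 * agreed / asked:.0f}%")
# 3 of 4 users = 75%    <- weak evidence despite the identical figure
# 75 of 100 users = 75%
```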

30
Q

Initial processing of interview data.

A

Raw data - audio or notes.

Notes - write up and expand while memory fresh.
Audio can help, or may be transcribed (large effort - maybe just relevant parts).

Closed questions - quantitative analysis, eg. % in age range.

Open questions - qualitative analysis.

31
Q

Initial processing of questionnaires.

A

Raw data - written responses, or electronic records.

May need to ‘clean’ eg. misunderstood question responses.

Data can be filtered by sub-populations, or by questions (eg. to understand reactions to colour).
Allows analyses to be conducted on subsets of data - can draw detailed conclusions for more specific goals. (May use tools, eg. spreadsheets).

Closed questions - quantitative, open - qualitative.
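
A sketch of filtering responses by sub-population; the field names and data are illustrative assumptions:

```python
# Hypothetical questionnaire records.
responses = [
    {"id": 1, "age": 23, "colour_rating": 4},
    {"id": 2, "age": 61, "colour_rating": 2},
    {"id": 3, "age": 35, "colour_rating": 5},
]

# Subset analysis: reactions to colour among respondents under 40.
under_40 = [r for r in responses if r["age"] < 40]
print("Mean colour rating, under-40s:",
      sum(r["colour_rating"] for r in under_40) / len(under_40))
```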

32
Q

Initial processing of observations.

A

Raw data - wide variety: notes, photos, data logs, think-aloud, video / audio.

Rich picture, but hard to analyse without structured framework.

Initial processing - write / expand notes, transcription.
In controlled setting - sync different data recordings.

Transcriptions / notes - qualitative.
Photos - contextual info.
Logs and some elements of notes - quantitative.

Patterns / themes emerge - note initial impressions for further analysis, but don’t rely on these - you may be biased by them.

33
Q

Simple quantitative analysis.

A

Type of average chosen can change meaning of results.

Before analysis - data needs collating into data sets.
Can usually be put in rows and columns - eg. a spreadsheet for easy manipulation and filtering of data sets.
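
A sketch of collating raw results into a rows-and-columns data set (the shape a spreadsheet would use); the file name and columns are assumptions:

```python
import csv

# Collate raw results into rows and columns for later manipulation.
rows = [
    {"user": "P01", "task": "t1", "seconds": 74, "errors": 2},
    {"user": "P02", "task": "t1", "seconds": 112, "errors": 5},
]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user", "task", "seconds", "errors"])
    writer.writeheader()
    writer.writerows(rows)
```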

34
Q

Example of how the question affects data analysis.

A

“How do you feel about Cybelle?” - responses will be treated qualitatively, which is hard with many responses.

Instead, could use a closed question, eg. “Is Cybelle amusing or irritating?” (or neither).
Reduces options and can give % for each.

Or, use Likert scale: “Cybelle is amusing” - strongly agree, etc.
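
A sketch of tallying such a closed question into percentages, using the response options above (data illustrative):

```python
from collections import Counter

# Responses to the closed question about Cybelle.
answers = ["amusing", "irritating", "amusing", "neither", "amusing"]

for option, n in Counter(answers).items():
    print(f"{option}: {n} ({100 * n / len(answers):.0f}%)")
```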

35
Q

Advantages of spreadsheets.

A

Commonly available and understood.

Offers a variety of numerical manipulations and graphs - good for overall view.

36
Q

How graphs are helpful.

A

Can help identify outliers in error rate data - these may be removed to avoid distortion, but it is worth investigating why they occur.

Fairly straightforward to compare 2 sets of results using graphs.
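
A simple sketch of flagging outliers in error-rate data; the two-standard-deviations threshold is a common rule of thumb, not one prescribed by the course text:

```python
from statistics import mean, stdev

# Errors per user on one task (illustrative). Flag anything more than two
# standard deviations from the mean as worth a closer look.
error_rates = [2, 3, 1, 2, 14, 3, 2]

m, s = mean(error_rates), stdev(error_rates)
print("Outliers worth investigating:",
      [x for x in error_rates if abs(x - m) > 2 * s])  # [14]
```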

37
Q

3 main groups of methods for summarising quantitative data.

A

Tabulations, charts and rankings - visual representations.

Descriptive statistics - eg. mean, median and mode.

Inferential statistics - results based on tests of statistical significance that give the probability that a claim arising from data can be applied to your user population as a whole.
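
A short example of the descriptive statistics, showing how the choice of average changes the apparent result (task times illustrative):

```python
from statistics import mean, median, mode

# Task-completion times in seconds. One slow user drags the mean well above
# the median - the choice of average changes the story the data tells.
times = [40, 45, 45, 50, 55, 180]
print(mean(times), median(times), mode(times))  # ~69.2, 47.5, 45
```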

38
Q

Comment on size and frequency of evaluation tests.

A

Mostly looking for info to help decide what to do next with the design - small, frequent tests are more helpful than a single large-scale test.

But, small samples don't support the maths of inferential statistics - not enough data for a statistical analysis that supports claims - so avoid eg. claiming a 'significant result'.

39
Q

Outcome of data analysis?

A

List of any usability problems (defects) found during evaluation.

The presence of usability defects indicates an interactive product's usability goals and UX goals are not met.
They can lead to confusion, error, delay or failure to do a task.

40
Q

Characteristics of usability defects.

A

Irritates / confuses.

System hard to install, learn or use.

Mental overload.

Poor user performance.

Violates design standards or guidelines.

Reduces trust / credibility of system.

Tend to cause repeated errors.

41
Q

Simple qualitative analysis - first steps

A

(As with quantitative) first step is to gain an overall impression of the data and look for patterns.

Some patterns emerge during data gathering, but need to confirm and re-confirm findings to make sure initial impressions are not biasing analysis.

Using a framework for observational data helps notice patterns, by asking eg. “Who is present?”, “What is happening?”, “Where?”.

42
Q

3 types of qualitative analysis (M364).

A

Identify recurring patterns / themes.

Categorising data.

Analysing critical incidents.

43
Q

What aspects do patterns / themes relate to?

A

Behaviour, user group, places / situations where certain events happen, etc.

Each may be relevant to goals (ie. a main theme).

44
Q

How can problems found in analysis of qualitative data be grouped?

A

Categorising according to incidents - eg. people looking puzzled, or not knowing what to do.

According to patterns - commonalities and differences in data from users, eg. 'All liked ……'.

Ordering according to severity of problem.

Dividing into lists - eg. whether you know the cause.

Order according to the level of difficulty in fixing a defect.

45
Q

Identifying causes of usability defects.

A

To identify causes of defects, need to look in more depth at the various sources of evaluation data collected (video, notes, etc.).

Process of reviewing and summarising evaluation data usually makes it obvious which problems require most urgent attention.

46
Q

Severity rating.

A

A measure given to a usability defect to indicate the criticality of its impact on the usability of the UI design.
Eg. 0 - 4

Prioritisation can be used to distribute resources - time, money, design effort - to required changes.
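
A sketch of prioritising a defect list by severity rating (0 - 4 as above); the defect records are illustrative:

```python
# Sorting by severity shows where time, money and design effort go first.
defects = [
    {"desc": "label wording unclear", "severity": 1},
    {"desc": "data lost on save", "severity": 4},
    {"desc": "icon hard to recognise", "severity": 2},
]

for d in sorted(defects, key=lambda d: d["severity"], reverse=True):
    print(d["severity"], d["desc"])
```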

47
Q

Points recommendations for change may cover. (3)

A

Successes to build on.

Defects to fix.

Possible defects or successes not proven - evaluation data doesn’t have enough evidence to decide. Requires further evaluation.

48
Q

Categorising data - transcripts

A

Transcripts can be analysed at a high level (stories, themes), or at a fine level of detail (each word, gesture, etc. analysed).

Either way elements are usually categorised first using a categorisation scheme.

49
Q

Categorisation challenges.

A

Categories used are largely determined by study goal.
Challenges:
- determine meaningful categories that are orthogonal (don't overlap).
- decide appropriate granularity (word, phrase, paragraph) - depends on goal and data.

50
Q

Content analysis.

A

Typically involves categorisation of data, then studying frequency of category occurrences.
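
A minimal content-analysis sketch: once transcript elements are assigned to categories, count category occurrences (category names illustrative):

```python
from collections import Counter

# Transcript elements already assigned to categories.
categorised = ["navigation", "terminology", "navigation",
               "layout", "navigation", "terminology"]

print(Counter(categorised).most_common())
# [('navigation', 3), ('terminology', 2), ('layout', 1)]
```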

51
Q

Discourse analysis (for transcripts).

A

Focuses on dialog - meaning of what is said, and how words convey meaning.

Strongly interpretive; attention to context.

How people use language to construct versions of their worlds.

52
Q

Conversational analysis.

A

Very fine-grained form of discourse analysis.

Semantics of the discourse are examined in fine detail, and focus is on how conversations are conducted.

53
Q

Critical incident technique?

A

A flexible set of principles, of which 2 are basic:

a) Reporting facts regarding behaviour is preferable to the collection of interpretations, ratings and opinions based on general impressions.
b) Reporting should be limited to those behaviours which, according to competent observers, make a significant contribution to the activity.

54
Q

ID and critical incident technique.

A

In ID context, use of well-planned observation sessions satisfies principle (a);
(b) refers to critical incidents - significant or pivotal to the activity being observed, in either a desirable or undesirable way.

In ID, the main focus is to identify specific significant incidents, then analyse these in detail, using the rest of the data as context to inform their interpretation.
May be identified by:
- User - by retrospective discussion of recent event.
- Observer - by studying video; observation in real time.
Eg. when the user is obviously stuck.