L7- Evaluating interactive systems Flashcards
What is the difference between formative and summative evaluation?
Formative evaluation is used in the early stages of a project to compare, assess and refine design ideas.
Formative evaluation often involves OPEN research questions where the researcher is interested in learning further information that may inform the design
Summative evaluation is more likely to be used in the later stages of a project and involves CLOSED research questions, with the purpose of testing and evaluating systems against predefined criteria
What’s the difference between analytical and empirical evaluation methods?
Analytical: based on applying a theory to analysis and discussion of the design, in the absence of real-world users
Empirical: making observations and measurements of users
What’s the difference between quantitative and qualitative evaluation?
Quantitative evaluation produces numbers; qualitative evaluation produces words, pictures, audio or video
Why are analytical methods useful for formative evaluation?
Analytical methods are useful for formative evaluation because, if the system design is not yet complete, it may be difficult to observe how it is used (although low-fidelity prototypes can be helpful here)
Give some examples of qualitative analytic methods
Qualitative analytic methods include the cognitive walkthrough (useful for closed research questions) and the Cognitive Dimensions of Notations framework (useful for open research questions)
Give examples of quantitative analytic methods
The Keystroke Level Model (KLM) is a quantitative analytic method that can be used to create numerical comparisons for closed research questions
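As an illustration, a KLM prediction is just the sum of standard operator times (values from Card, Moran & Newell); the task breakdowns below are hypothetical examples, not from the notes. A minimal sketch:

```python
# Standard KLM operator times in seconds (Card, Moran & Newell);
# the two task encodings below are hypothetical examples.
OPERATORS = {
    "K": 0.2,   # press a key (average skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "B": 0.1,   # press or release a mouse button
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_time(sequence: str) -> float:
    """Predicted expert completion time for a sequence of KLM operators."""
    return sum(OPERATORS[op] for op in sequence)

# Compare two hypothetical designs for issuing the same command:
menu_design = "MPBPB"    # think, point at menu, click, point at item, click
shortcut_design = "MKK"  # think, press a two-key shortcut
print(klm_time(menu_design), klm_time(shortcut_design))
```

Such predictions give numerical answers to closed questions like "which design is faster for an expert user?" without needing empirical trials.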
Give examples of qualitative empirical methods
Think-aloud studies, interviews, and field observation (ethnographic approaches)
They are usually associated with open research questions, where the objective is to learn new information relevant to system design or use
Give examples of quantitative empirical methods
Quantitative empirical methods generally require a working system, so are most often summative
Examples include the use of analytics and metrics in A/B experiments, and also controlled laboratory trials
Explain how to run randomised controlled trials (RCTs)
Decide on a performance measure
Find a representative sample of the target population (who have given informed consent to participate)
Randomly assign participants to the control and treatment conditions
Find an experimental task that can be used to collect performance data
How might we measure the results of an RCT?
Effect size: the impact of the treatment on mean performance
Measure correlation with factors that might improve performance
Report significance measures to check whether the observed effects might have resulted from random variation, or from factors other than the treatment
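For example, Cohen's d is one common effect-size measure: the difference between group means divided by the pooled standard deviation. A minimal sketch with made-up task-time data (a significance test, e.g. a t-test, would then check whether the difference could plausibly be due to random variation):

```python
# Sketch of RCT effect-size analysis; the performance scores
# (e.g. task completion times) are made up for illustration.
from statistics import mean, stdev
from math import sqrt

control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.4, 13.2, 12.0]
treatment = [10.8, 11.2, 10.5, 11.6, 10.9, 11.1, 10.4, 11.3]

# Cohen's d: difference in means relative to pooled standard deviation.
n1, n2 = len(control), len(treatment)
pooled_sd = sqrt(((n1 - 1) * stdev(control) ** 2 +
                  (n2 - 1) * stdev(treatment) ** 2) / (n1 + n2 - 2))
d = (mean(control) - mean(treatment)) / pooled_sd
print(round(d, 2))  # a large effect by conventional thresholds
```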
What problems are associated with RCTs?
Large samples are needed to overcome natural variation
RCTs don’t provide understanding of why a change occurred
This means that it is hard to know whether the effect will generalise (for example to commercial contexts)
If there are many relevant variables that are orthogonal to each other, such as different product features or design options, many separate experiments might therefore be required to distinguish between their effects and interactions
Thus RCTs aren’t often used for design research in commercial products
A more justifiable performance measure is profit maximisation, but sales/profit are often hard to measure with useful latency
Companies therefore tend to use PROXY MEASURES such as the number of days that customers continue actively to use the product
What is internal validity?
Was the study done right?
Reproducibility
Scientific integrity
Refutability
What is external validity?
Does the study tell us useful things?
External validity focuses on whether the results generalise to real-world situations, including factors such as the representativeness of the sample population, the experimental task and the application context
Describe two ways of analysing qualitative data
While we can use statistical comparison of quantitative measures from controlled experiments, interviews and field studies require analysis of qualitative data
Qualitative data is often recorded and transcribed as written text, so the analysis can proceed using a reproducible scientific method
What is categorical coding, and how do you do it?
Categorical coding is a qualitative data analysis method that can be used to answer ‘closed’ questions, for example, comparing different groups of people or users of different products
The first step is to create a “coding frame” of expected categories of interest
The text data is then segmented (for example on phrase boundaries)
Each segment is assigned to one category, so that frequency and correspondence can be compared
In a scientific context, categorical coding should incorporate some assessment of inter-rater reliability, where two or more people make the coding decisions independently to avoid systematic bias or misinterpretation
Compare how many decisions agree, relative to chance, using a statistical measure such as Cohen’s kappa for two raters or Fleiss’ kappa for more, and compare the result to typical levels (0.6 - 0.8 is considered substantial agreement)
Inter-rater reliability may take account of how many decisions still disagreed after discussion, which may involve refining and iterating the coding frame to resolve decision criteria
It is often useful to ‘prototype’ the coding frame by having the independent raters discuss a sample before proceeding to code the main corpus
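As an illustration, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's category frequencies. A minimal pure-Python sketch, using a hypothetical coding frame and six made-up transcript segments:

```python
# Cohen's kappa for two raters' independent categorical codings.
# The coding frame ("nav", "error", "help") and the two codings
# below are hypothetical examples.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of segments coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

a = ["nav", "nav", "error", "help", "nav", "error"]
b = ["nav", "error", "error", "help", "nav", "nav"]
print(round(cohens_kappa(a, b), 3))
```

A value this far below 0.6 would suggest the raters should discuss their disagreements and refine the coding frame before coding the main corpus.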
What is grounded theory?
Qualitative data analysis method that can be used to explore open questions where there is no prior expectation or theoretical assumption of the insights that the researcher is looking for
First step: read the data closely, looking for interesting categories (‘open coding’)
The researcher then collects fragments, writing ‘memos’ to capture insights as they occur
Emerging themes are organised using ‘axial coding’ across different sources of evidence
It is important to constantly compare memos, themes and findings to the original data in order to ensure that these can be objectively justified
The process ends when the theoretical description has reached ‘saturation’ in relation to the original data, with the main themes complete and accounted for
Explain how to get ethical clearance
Inform the ethics committee before you collect any data or recruit any participants
Describe the study, who will participate, what you will ask them to do, what data you will collect
What precautions are being taken, as appropriate to the nature of the research, including the approach taken to informed consent, and whether participants will be anonymous
What are three analytical evaluation options?
Cognitive walkthrough
KLM/GOMS
Cognitive Dimensions
When would you use cognitive walkthrough?
Cognitive Walkthrough is normally used in formative contexts: if you do have a working system, then why aren’t you observing a real user, which is far more informative than simulating or imagining one? However, Cognitive Walkthrough can be a valuable time-saving precaution before user studies start, to fix blatant usability bugs.
When would you use KLM/GOMS?
KLM/GOMS: It is unlikely that you’ll have alternative detailed UI designs in advance, so there is not much to be learned from using these methods in the context of a Part II project. If you do have a working system, a controlled observation is superior
When would you use Cognitive Dimensions?
Cognitive Dimensions is better suited to less structured tasks than Cognitive Walkthrough and KLM/GOMS, which rely on predefined user goals and task structure
What empirical approaches could you choose from?
Interviews/ethnography
Think-aloud / Wizard of Oz
Controlled experiments
When would you collect data using interviews/ethnography?
Useful in formative/preparation phase where an open research method is helpful in developing design ideas or capturing user requirements
When would you use think-aloud/wizard of oz?
Valuable for both paper prototypes and working systems
Highly effective at uncovering usability bugs as long as the verbal protocol is analysed rigorously using qualitative methods
When would you use controlled experiments?
Can help to establish the engineering aspects of the work
Important to ensure you can measure the important attributes in a meaningful way (with both internal and external validity)
You need to test significance and report confidence intervals for the observed means and effect sizes
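For instance, a 95% confidence interval for an observed mean can be reported as mean ± t × standard error. A minimal sketch with made-up scores, assuming approximately normal data (2.365 is the standard two-sided t critical value for 7 degrees of freedom):

```python
# 95% confidence interval for an observed mean; the sample of
# performance scores is made up for illustration.
from statistics import mean, stdev
from math import sqrt

scores = [7.2, 6.8, 7.9, 7.1, 6.5, 7.4, 7.0, 7.6]
n = len(scores)
se = stdev(scores) / sqrt(n)  # standard error of the mean
t_crit = 2.365                # t critical value, df = 7, 95% two-sided
lo, hi = mean(scores) - t_crit * se, mean(scores) + t_crit * se
print(round(lo, 2), round(hi, 2))
```

Reporting the interval, rather than the mean alone, shows readers how much the estimate could vary under random sampling.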
When would you use surveys and informal questionnaires?
Be clear what you are measuring
Is self-reporting likely to be accurate?
Use a mix of open questions, which capture richer qualitative information, and closed questions that make it easier to aggregate and test hypotheses
Open questions require a coding frame to structure and compare data, or grounded theory methods (if you have a broader research question)
Collecting survey data via interviews is likely to give more insight, but questionnaires are faster, so you can collect data from a larger sample
Remember to test questionnaires with a pilot study: they are easier to get wrong than interviews
When would you use field testing?
If a working product exists it may be possible to make a controlled release and collect data on how it is used
Make a risk assessment
Seek ethics approval before proceeding
When would you use standardised survey instruments?
These are standard psychometric instruments that evaluate mental states such as fatigue, stress, confusion and emotion
There are also standard methods to assess individual differences (e.g. personality, intelligence)
Use standardised approaches wherever possible, so your results can be compared to existing scientific literature
Making changes to these standardised surveys generally invalidates the results
What are some bad evaluation techniques?
Don’t use purely affective reports
Don’t ask a biased group (e.g. your friends): this introduces experimental demand effects
Don’t make claims that sound as though they result from a formative analytic process but are actually subjective
Don’t use introspective reports made by a single subject – might be biased and subjective
How would you evaluate a non-HCI project?
Approach testing as a scientific exercise
Define goals and hypotheses and understand the boundaries and performance limits of your system by exploring them
Keep in mind that it’s often necessary to test to the point of failure so that you can make comparisons or explain limits
For non-interactive projects, still necessary to decide whether evaluation should be analytic (proceeding by reasoning and argument, in which case you should ask how consistent and well-structured is your analytic framework) OR empirical (proceeding by measurement/observation, in which case you should ask what you are measuring and why, and ensure that you have achieved scientific validity, where the measurements are compatible with your claims).
All projects can include a mix of formative and summative evaluation
If you only evaluate formatively – did you finish your project?
If carrying out summative evaluation, be clear whether the evaluation criteria are internal (derived from some theory) or external (addressing some problem)
Need to establish objectivity of qualitative data (i.e. that it isn’t simply your own opinion).