Introduction to statistics Flashcards

1
Q

What is statistics?

A
Exploring.
Analysing.
Summarising data.
Designing methods.
Collecting data.
Drawing conclusions from data.
Making decisions.
2
Q

Where will we use statistics?

A
At university:
Research.
Communication.
Design.
Analysis of laboratory experiments.
Surveys.
Career:
Evaluating experimental results.
Epidemiology.
Pharmaceutical.
Food industry.
Clinical trials.
Marketing studies.
Sales.
Data informing policies.
3
Q

From where can the data come?

A

Laboratory experiments.
Questionnaires.
Observations.

4
Q

What is a variable?

A
A characteristic of interest.
Measured or observed.
A factor for grouping data.
Examples: height, cholesterol levels, colour.
5
Q

What can data be?

A
Numerical = measurements.
Categorical = groups.
6
Q
Are the variables numerical or categorical?
Height of males.
Cakes produced.
Gender.
Voting.
Education.
Cholesterol levels.
Salt concentration.
A
N.
N.
C.
C.
C.
N.
N.
7
Q

When is a variable continuous or discrete?

A

When the variable is a measurement, it is continuous.
When the variable is a count, it is discrete.

8
Q

Is the variable discrete or continuous?

Weight.
Participants.
Height.
Blood cholesterol concentration.
Cell count.
Enzyme activity.
Live births.
Reaction time (msecs).
A
C.
D.
C.
C.
D.
C.
D.
C.
9
Q

When is a variable nominal and when ordinal?

A

When the categories can be ordered, the variable is ordinal.
When the categories have no natural order, the variable is nominal.

10
Q

Is the variable ordinal or nominal?

Gender.
Army rank.
Favourite restaurant.
Voting.
Education levels.
Marital status.
Exam grade.
Council tax band.
A
N.
O.
N.
N.
O.
N.
O.
O.
11
Q

What do statistics summarise?

A

Centre.
Position.
Spread.
Shape of data.

12
Q

Which are the measures that characterise the centre of a dataset?

A

Mean.
Mode.
Median.

13
Q

What is the mode?

A

The most frequently occurring value in the data set.

14
Q

What is a modal class in a histogram?

A

The most frequently occurring value range.

15
Q

How many modal classes can we have in a histogram?

A

Usually one (unimodal) or two (bimodal); more are possible (multimodal).

16
Q

How can we calculate the mean?

A

Sum of all values/number of values (n).

17
Q

How can we calculate the median?

A

Put the values in order.
Find the value in the middle.
Formally: position = (n + 1)/2 in the ordered dataset.
Read off the value at that position.
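
The position rule can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the helper name `median_by_position` and the sample values are assumptions made for the example. When (n + 1)/2 falls halfway between two positions (even n), the sketch averages the two neighbouring values, which is the usual convention.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (hypothetical helper)."""
    ordered = sorted(values)          # step 1: put the values in order
    n = len(ordered)
    position = (n + 1) / 2            # step 2: 1-based position of the middle
    lower = ordered[int(position) - 1]
    if position.is_integer():         # odd n: a single middle value
        return lower
    upper = ordered[int(position)]    # even n: average the two middle values
    return (lower + upper) / 2


sample = [9, 18, 7, 5, 11]            # illustrative values, n = 5
print(median_by_position(sample))     # position (5 + 1) / 2 = 3 -> 9
```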

18
Q

Where can we find the mode?

A

In all types of variables.

19
Q

Where can we find the median?

A

Only for ordinal or numerical variables.

20
Q

Where can we find the mean?

A

Only for numerical variables.

21
Q

How can we find the mode from an age group dataset?

A

Find the age group with the highest number of students.

= the group that occurs most often.

22
Q

How can we find the mode of a gender dataset?

A

The value with the highest frequency = the one that occurs most often.
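
Because the mode only needs frequencies, it can be found for categorical as well as numerical data. A minimal sketch, assuming made-up responses and a hypothetical helper named `modes`; it returns a list because several values can tie for the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (hypothetical helper)."""
    counts = Counter(values)                  # frequency of each distinct value
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Made-up categorical responses, for illustration only
gender = ["F", "M", "F", "F", "M"]
print(modes(gender))   # ['F']
```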

23
Q

Which are the measures of position?

A

Quartiles.
Q1: lower quartile.
Q2: median.
Q3: upper quartile.

24
Q

What do quartiles do?

A

Divide an ordered dataset into four equal parts.

Characterise the shape of the dataset.

25
Q

What does the median Q2 do?

A

Splits the ordered data series into two halves.

26
Q

What do the quartiles Q1 and Q3 do?

A

Q1 splits the lower half and Q3 splits the upper half of the ordered data series.

27
Q

What is Q1?

A

The lower (first) quartile.

25% of values lie below Q1.

28
Q

What is Q2?

A

The median (second quartile).

50% of values lie below Q2.

29
Q

What is Q3?

A

The upper (third) quartile.

75% of values lie below Q3.

30
Q

Which are the quartile positions?

A
0.25 x (n+1): lower quartile.
0.5 x (n+1): median.
0.75 x (n+1): upper quartile.
31
Q

What do the formulae of quartiles calculate?

A

The position of the value in an ordered data series.

32
Q

Data:
60 70 82 90 68 68 76 76 62 74 76 70 80 62 78 76 68 60 74 60 80

Work out quartiles.

A

Order data:
60 60 60 62 62 68 68 68 70 70 74 74 76 76 76 76 78 80 80 82 90

n=21
n+1 = 22

Q1 position = 0.25 x (n+1) = 0.25 x 22 = 5.5
Q1 = (62 + 68)/2 = 65

Q2 position = 0.5 x (n+1) = 0.5 x 22 = 11
Q2 = 74

Q3 position = 0.75 x (n+1) = 0.75 x 22 = 16.5
Q3 = (76 + 78)/2 = 77
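
The working above can be checked with a short Python sketch. The helper names `value_at_position` and `quartiles` are assumptions made for this example, and the code assumes at least three values so that every position falls inside the ordered list. Fractional positions are resolved by linear interpolation between the two neighbouring values.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    whole = int(position)              # position of the value at or just below
    fraction = position - whole        # e.g. 0.5 means halfway to the next value
    value = ordered[whole - 1]
    if fraction == 0:
        return value
    next_value = ordered[whole]
    return value + fraction * (next_value - value)


def quartiles(values):
    """Q1, Q2, Q3 using positions 0.25, 0.5 and 0.75 x (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, p * (n + 1)) for p in (0.25, 0.5, 0.75))


data = [60, 70, 82, 90, 68, 68, 76, 76, 62, 74, 76,
        70, 80, 62, 78, 76, 68, 60, 74, 60, 80]
print(quartiles(data))   # (65.0, 74, 77.0)
```

The same interpolation step also covers positions whose fractional part is .25 or .75.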

33
Q

How does SPSS Output present the quartiles?

A

As the 25th, 50th and 75th percentiles.

34
Q

What if the position we find for Q1 ends in .25?

A

Q1 = value + (next value - value)/4, where "value" is the data value at the whole-number position and "next value" is the one after it.

35
Q

What if the position we find for Q3 ends in .75?

A

Q3 = value + 3 x (next value - value)/4, where "value" is the data value at the whole-number position and "next value" is the one after it.

36
Q

What does a percentile do?

A

Divides an ordered data set into one hundred equal parts.

1% of values lie below P1.
2% below P2.
99% below P99.

37
Q

Which are the percentile positions?

A

0.01 x (n+1), 0.02 x (n+1), ..., 0.99 x (n+1)

38
Q

What do percentiles represent?

A

Median (Q2) = P50.
Q1 = P25.
Q3 = P75.
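
The link between quartiles and percentiles can be checked with Python's standard library. As far as its documentation describes, `statistics.quantiles` with `method="exclusive"` interpolates at (n + 1)-based positions, which matches the position formulas on these cards. The sketch below is a hedged check on the 21-value dataset used in the quartile example, not a statement about how SPSS computes its output.

```python
from statistics import quantiles

data = [60, 60, 60, 62, 62, 68, 68, 68, 70, 70, 74,
        74, 76, 76, 76, 76, 78, 80, 80, 82, 90]

# n=100 asks for the 99 cut points P1 ... P99 (index 0 holds P1)
percentiles = quantiles(data, n=100, method="exclusive")

p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]
print(p25, p50, p75)   # 65.0 74.0 77.0
```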

39
Q

Where are the percentiles useful?

A

In analysis of large data sets.

40
Q

How can we find the values of percentiles from an SPSS output?

A

Read the values shown against the relevant percentiles in the table.

41
Q

Which are the measures of dispersion (= spread)?

A

Range.
Interquartile range.
Standard deviation.

42
Q

What are the consequences of a lack of precision in a dataset?

A

Variability.

Uncertainty.

43
Q

How important is the precision of a dataset for measurements?

A

Important.

44
Q

What do we need to know to make comparisons?

A

Spread of datasets.

45
Q

Which comparisons can we make about the spread of datasets?

A

Whether two datasets overlap.

Whether two datasets are different = hypothesis testing.

46
Q

What does accuracy mean?

A

How close the values are to the correct value.

47
Q

What does precision mean?

A

Reproducibility: how close repeated values are to each other.

48
Q

How are the values characterised when they are near the centre and away from each other?

A

Accurate but not precise.

49
Q

How are the values characterised when they are all together but away from the middle?

A

Precise but not accurate.

50
Q

How are the values characterised when they are away from each other and from the centre?

A

Neither precise nor accurate.

51
Q

How are the values characterised when they are in the middle and all together?

A

Precise and accurate.
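
The four cases above can be illustrated numerically: accuracy is judged by how far the mean of repeated readings is from the true value, and precision by how tightly the readings cluster. This is only a sketch. The function `describe`, the thresholds `bias_tol` and `spread_tol`, and all readings are hypothetical and chosen for illustration.

```python
from statistics import mean, stdev


def describe(measurements, true_value, bias_tol, spread_tol):
    """Label repeated measurements as (accurate, precise).

    bias_tol and spread_tol are hypothetical thresholds chosen by the analyst.
    """
    accurate = abs(mean(measurements) - true_value) <= bias_tol   # centred on truth?
    precise = stdev(measurements) <= spread_tol                   # tightly clustered?
    return accurate, precise


# Made-up readings of a quantity whose true value is taken to be 10.0
print(describe([9.9, 10.1, 10.0, 10.0], 10.0, bias_tol=0.2, spread_tol=0.2))   # (True, True)
print(describe([12.0, 12.1, 11.9, 12.0], 10.0, bias_tol=0.2, spread_tol=0.2))  # (False, True)
print(describe([8.0, 12.0, 9.0, 11.0], 10.0, bias_tol=0.2, spread_tol=0.2))    # (True, False)
```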

52
Q

What do we identify when using hypothesis testing?

A
Spread.
Whether the two datasets are completely different.
Whether they have different mean values.
Whether they have large variability.
Whether there is a large signal.
53
Q

What do the range measures help us identify?

A

How spread out the values in the dataset are.

54
Q

What is the simplest measure of spread?

A

Range = Maximum value - Minimum value.

55
Q

60 60 60 62 62 68 68 68 70 70 74 74 76 76 76 76 78 80 80 82 90

Range?

A

Range = 90-60 = 30.

56
Q

What is the interquartile range?

A

The difference between the first and the third quartiles.

57
Q

How is the interquartile range calculated?

A

IQR = Q3 - Q1.

Q3 = third quartile.
Q1 = first quartile.
58
Q

Dataset:
60 60 60 62 62 68 68 68 70 70 74 74 76 76 76 76 78 80 80 82 90

Interquartile range?

A

Q1 position = 0.25 x (n+1) = 5.5
Q1 = (62 + 68)/2 = 65

Q3 position = 0.75 x (n+1) = 16.5
Q3 = (76 + 78)/2 = 77

Interquartile range: IQR = 77 - 65 = 12

Range of values = 30 units.
Interquartile range of values = 12 units.
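
Both measures can be reproduced in a few lines of Python. This is a sketch under the assumption that `statistics.quantiles` with `method="exclusive"` follows the same (n + 1) position rule as the cards; the variable names are illustrative.

```python
from statistics import quantiles

data = [60, 60, 60, 62, 62, 68, 68, 68, 70, 70, 74,
        74, 76, 76, 76, 76, 78, 80, 80, 82, 90]

value_range = max(data) - min(data)      # uses only the two extreme values

q1, _median, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                            # spread of the middle 50% of the data

print(value_range, iqr)   # 30 12.0
```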

59
Q

What is the standard deviation?

A

The most widely used measure of dispersion.

A measure of dispersion or spread that indicates how closely the data values in a dataset cluster around the mean.

It is built from the residuals: the deviation of each value from the mean.

60
Q

How do we calculate the standard deviation of a dataset?

A

Mean of the variable = x̄.

Deviation from the mean = value (xi) - mean (x̄) = xi - x̄.

Squared deviation of each value = (xi - x̄)².

Sum of squared deviations = Σ(xi - x̄)².

Variance: s² = sum of squared deviations / (sample size - 1) = Σ(xi - x̄)² / (n - 1).

Standard deviation: s = square root of s².

Formula: s = √( Σ(xi - x̄)² / (n - 1) ).

61
Q

Small data set: 9, 18, 7, 5, 11

Standard deviation?

A
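n = 5
Mean x̄ = 50/5 = 10
Deviations from the mean: -1, 8, -3, -5, 1
Sum of squared deviations = 1 + 64 + 9 + 25 + 1 = 100
s = square root of (100/(n - 1)) = square root of 25 = 5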
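
The same steps can be traced in Python and compared with the standard library's sample standard deviation, `statistics.stdev`. The variable names are illustrative; this is a sketch of the hand calculation, not a recommended replacement for a library routine.

```python
from math import sqrt
from statistics import stdev

values = [9, 18, 7, 5, 11]

n = len(values)
mean = sum(values) / n                        # 50 / 5 = 10
deviations = [x - mean for x in values]       # xi - mean
squared = [d ** 2 for d in deviations]        # (xi - mean) squared
variance = sum(squared) / (n - 1)             # s squared = 100 / 4 = 25
s = sqrt(variance)                            # s = 5

print(s)                # 5.0
print(stdev(values))    # library check, also 5.0
```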
62
Q

How can we check the standard deviation?

A

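It is the square root of the variance, so it is smaller than the variance whenever the variance is greater than 1.
It is never negative.
It is a sensible value for the data.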

63
Q

How is a large standard deviation of values characterised?

A

Not precise.

64
Q

How is a small standard deviation of values characterised?

A

Precise.

65
Q

What does the range provide?

A

A quick dispersion measure.

66
Q

What does the range not use?

A

All data values.

67
Q

By what is the range affected?

A

By outliers.

68
Q

What does the interquartile range provide?

A

A quick measure of dispersion.

69
Q

What does the interquartile range use?

A

The middle 50% of the data.

70
Q

Why is the interquartile range not affected by outliers?

A

Because the upper and lower 25% of the data are ignored.

71
Q

How is the standard deviation characterised as a dispersion measure?

A

Most commonly used.

Most important.

72
Q

What does the standard deviation use?

A

The mean as a reference.

All data values.

73
Q

Which questions do we ask when we notice an unusual observation in the dataset?

A
  1. Is it a reasonable value?
  2. Why is it so large/small?
  3. Will it bias the analysis?
  4. Is it an error we need to remove?
  5. Is it an outlier?
74
Q

What is an outlier?

A

An observation numerically distant from the rest of the data.

75
Q

What does an outlier indicate?

A

False data.
Procedural errors.
Unusual circumstances.
Invalid theory.

76
Q

How can we identify the outliers in data?

A

Data points that fall below the lower inner fence (LIF) or above the upper inner fence (UIF).

LIF = Q1 - (1.5 x IQR).
UIF = Q3 + (1.5 x IQR).
77
Q

How can we identify extreme outliers in a dataset?

A

Data points that fall below the lower extreme fence (LEF) or above the upper extreme fence (UEF).

LEF = Q1 - (3 x IQR).
UEF = Q3 + (3 x IQR).
78
Q

DBP: 60 60 60 62 62 68 68 68 70 70 74 74 76 76 76 76 78 80 80 82 90

Outliers?

A
Q1 = 65
Q3 = 77
IQR = 12

LIF = Q1 - (1.5 x IQR)
= 65 - (1.5 x 12)
= 47

UIF = Q3 + (1.5 x IQR)
= 77 + (1.5 x 12)
= 95

All values lie between 47 and 95, so there are no outliers in this dataset.
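
The fence check can be sketched as a small Python function. The helper names `fences` and `outliers` and the multiplier parameter `k` are assumptions for this example (k = 1.5 for the inner fences, k = 3 for the extreme fences), and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which is assumed to match the (n + 1) position rule used on these cards.

```python
from statistics import quantiles


def fences(values, k=1.5):
    """Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, k=1.5):
    """Values lying below the lower fence or above the upper fence."""
    low, high = fences(values, k)
    return [v for v in values if v < low or v > high]


dbp = [60, 60, 60, 62, 62, 68, 68, 68, 70, 70, 74,
       74, 76, 76, 76, 76, 78, 80, 80, 82, 90]
print(fences(dbp))     # (47.0, 95.0)
print(outliers(dbp))   # []
```

Calling `fences(dbp, k=3)` gives the extreme fences for the same data.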

79
Q

What does the shape of the dataset show?

A

How the data are distributed.

80
Q

What happens to the shape of the data when more and more observations are added to the dataset?

A

It changes.

81
Q

Which is an important distribution in science and medicine?

A

The normal distribution.

82
Q

Which are some of the types of distribution that data could have?

A
Unimodal.
Bernoulli (two possible outcomes).
Multimodal.
Uniform.
Binomial.
Poisson.
Exponential.
83
Q

How can we compare datasets?

A

By checking whether they are symmetrical in shape.

84
Q

What shapes can datasets have?

A

Positively skewed data.
Symmetric and bell-shaped = normally distributed data.
Negatively skewed data.

85
Q

What happens in a dataset when its data are positively skewed?

A

Most of the values are at the lower end of the data set.
Only a few values are at the upper end.

Mean > Median > Mode.

86
Q

What happens in a dataset when its data are negatively skewed?

A

Most of the values are at the upper end of the data set.
Only a few values are at the lower end.

Mean < Median < Mode.

87
Q

What happens in a dataset when its data are symmetric and bell-shaped?

A

Most of the values are at the centre of the data set.
Roughly equal number of values are at either side of the centre.

Mean = Median = Mode.
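
Comparing the mean with the median gives a quick, informal indication of skew: a mean above the median suggests positive skew, and a mean below it suggests negative skew. The sketch below applies that rule of thumb to the DBP values used in the outlier example. The helper `skew_direction` is hypothetical, and this is not a formal skewness statistic; with real data an exact tie is rare, so the symmetric case is best judged by eye or with a tolerance.

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Rough skew label from comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "roughly symmetric"


dbp = [60, 60, 60, 62, 62, 68, 68, 68, 70, 70, 74,
       74, 76, 76, 76, 76, 78, 80, 80, 82, 90]
print(mean(dbp) < median(dbp) < mode(dbp))   # True
print(skew_direction(dbp))                   # negatively skewed
```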

88
Q

How should we compare datasets in a lab report?

A

Be objective, not subjective.
Quote and compare calculated measures of central tendency and dispersion.
Comment on outliers.
Comment on skewness = compare the values of the mean, median and mode.
Draw boxplots and histograms.

Example:
Dataset ‘no’ has a higher mean (value) and median (value) of SBP than dataset ‘yes’ (values). The range of dataset ‘no’ is greater (values), but this is due to the presence of outliers. In the absence of the outliers, the range of the ‘no’ data is slightly smaller (values), and both the IQR (values) and standard deviation (values) of the ‘no’ data are much smaller than those of the ‘yes’ data. The ‘no’ data are negatively skewed (values) but the ‘yes’ data are positively skewed (values). There may be a significant difference between the groups.

Quote values with units, or present them in a table.