Statistics Flashcards

1
Q

Define Population:

A

Full set of units that we are interested in

2
Q

Define Sample:

A

A subset of units from the population that we experiment on or observe

3
Q

Why do we use a sample?

A

To draw inferences about the population

4
Q

Why won’t a sample necessarily give us the right answer about the population?

A

The role of chance: different samples from the same population give different results

5
Q

What is hypothesis testing?

A

Rather than proving something is true, we show that something (the null hypothesis) is unlikely to be true, which is rather easier

6
Q

What are the steps of hypothesis testing?

A
  • Formulate a hypothesis
  • Formulate a null hypothesis
  • Calculate the probability of seeing data at least as extreme as yours if the null hypothesis is true (the p-value)
7
Q

What is a p-value?

A

The probability of seeing a result as extreme as, or more extreme than, the one observed if the null hypothesis is true
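To make the steps and the p-value definition concrete, here is a minimal exact binomial test in Python (stdlib only). The scenario (a coin tossed 20 times, landing heads 15 times) and the helper name `binomial_p_value` are hypothetical. The null hypothesis is that the coin is fair, and the two-sided p-value adds up the null probabilities of every outcome that is no more likely than the observed one.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value (hypothetical helper).

    Sums the null probabilities of all outcomes as extreme as, or more
    extreme than (i.e. no more likely than), the observed outcome.
    """
    def prob(k: int) -> float:
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = prob(successes)
    # Small relative tolerance so equally likely outcomes are not lost to rounding
    return sum(prob(k) for k in range(trials + 1) if prob(k) <= observed * (1 + 1e-9))

# Null hypothesis: the coin is fair. Data: 15 heads in 20 tosses.
print(round(binomial_p_value(15, 20), 4))  # 0.0414
```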

8
Q

What do you do if p<0.05 in the old school approach?

A
  • Significant result
  • Reject null hypothesis
  • Accept alternative hypothesis
9
Q

How do we interpret p-values in the modern "continuum of evidence" approach?

  • 0.1 =
  • 0.05 =
  • 0.01 =
  • 0.001 =
A
  • 0.1 = Weak evidence
  • 0.05 = Moderate evidence
  • 0.01 = Strong evidence
  • 0.001 = Very strong evidence
10
Q

What is wrong with the old school approach?

A

Using a strict cut-off effectively treats p<0.05 as statistical proof
- It doesn’t represent how strong the evidence is

11
Q

What are the basic types of data?

A
  • Numerical
  • Categorical

12
Q

What is numerical data?

A

Data recorded as numbers that have a real numerical meaning (measured or counted quantities)

13
Q

What are the two main sub-types of numerical data?

A
  • Continuous
  • Count

14
Q

What is continuous data?

A

Can take any value within a range (not just whole numbers)

15
Q

What are examples of continuous data?

A

Height
Blood pressure
Time

16
Q

What is count data?

A

Takes only integer values and represents a count of discrete things

17
Q

What are examples of count data?

A

Number of visits to A&E

Number of children

18
Q

What is categorical data?

A

Things that do not have an inherent numerical value

19
Q

What are the main subtypes of categorical data?

A
  • Nominal
  • Ordinal

20
Q

What is nominal data?

A

Categories with no inherent order

21
Q

What are examples of nominal data?

A

Eye colour

Blood type

22
Q

What is ordinal data?

A

Things with an inherent order

23
Q

What are examples of ordinal data?

A

Large/Small
Education level
Age group

24
Q

What is descriptive statistics used for?

A

To describe the data in your sample

25
Q

What is inferential statistics used for?

A

To draw inferences about the population from the sample

26
Q

Summarising categorical data:

A
  • Data can take one of a number of categories
  • Number of categories is small
  • Use a frequency table
27
Q

What do frequency tables allow?

A

To see which categories are most and least common, and how frequently each category occurs

28
Q

What is a problem with frequency tables?

A

You cannot immediately see what share of the sample is contained in each category

29
Q

What can you do to see what share of the sample is contained in each category?

A

Percentages
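A frequency table with percentages can be sketched in a few lines of Python using `collections.Counter`. The blood-type sample below is made-up illustrative data, not from any real study.

```python
from collections import Counter

# Hypothetical sample of blood types (illustrative data only)
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

counts = Counter(blood_types)
n = len(blood_types)
# Share of the sample in each category, as a percentage
percentages = {category: 100 * freq / n for category, freq in counts.items()}

print(f"{'Category':<10}{'Count':>6}{'Percent':>9}")
for category, freq in counts.most_common():  # most common category first
    print(f"{category:<10}{freq:>6}{percentages[category]:>8.1f}%")
```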

30
Q

What are the types of graphical summary of categorical data?

A
  • Bar charts
  • Pie charts

31
Q

What does the height of a bar in a bar chart represent?

A

Number of occurrences (frequency) in each category

32
Q

What does grouping turn continuous data into?

A

Categorical data

33
Q

What can you do instead of grouping data?

A

Plot histograms

34
Q

What is the total area of a density histogram?

A

1

35
Q

What equation is used to calculate the density of histograms?

A

Density = proportion in bin/bin width
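The density formula, and why the total area comes out as 1, can be checked with a short Python sketch. The measurements and the deliberately unequal bin edges are hypothetical; bins are treated as half-open intervals [left, right), so a value exactly equal to the last edge would not be counted.

```python
# Hypothetical continuous measurements (e.g. heights in cm)
data = [151.2, 158.4, 160.1, 162.7, 163.0, 165.5, 166.2, 168.9, 171.3, 180.4]

# Unequal bin widths; using density keeps the bars comparable
edges = [150, 160, 165, 170, 185]

n = len(data)
densities = []
for left, right in zip(edges[:-1], edges[1:]):
    count = sum(left <= x < right for x in data)  # half-open bin [left, right)
    proportion = count / n
    densities.append(proportion / (right - left))  # density = proportion / width

# Total area = sum over bins of height x width
area = sum(d * (r - l) for d, l, r in zip(densities, edges[:-1], edges[1:]))
print(densities)
print(area)  # 1 up to floating-point rounding
```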

36
Q

Are there gaps between bins in histograms?

A

No (adjacent bins touch, unlike the bars of a bar chart)

37
Q

What does the height of a bin in a histogram indicate?

A

Density: the relative frequency of observations per unit of bin width (the bin's area gives its relative frequency)

38
Q

What does using density allow histograms to compare?

A

Different bin widths

39
Q

If a histogram has heavy tails, does it have high or low kurtosis?

A

High

40
Q

If a histogram has light tails, does it have high or low kurtosis?

A

Low

41
Q

What does location mean?

A

Defines where data are located in the range of possible values

42
Q

What are the three common measures of average?

A
  • Mean
  • Mode
  • Median
43
Q

What is a mean?

A

The sum of the values divided by the number of values

44
Q

What is a median?

A
  • Rank data in order
  • Median is the middle number
  • If there is an even number of data points, there is no single middle value, so take the mean of the two middle values
45
Q

What is a mode?

A

Most commonly occurring value
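Python's standard `statistics` module computes all three averages directly; the small dataset below is hypothetical and has an even number of points, so the median is the mean of the two middle values.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data, already sorted, even count

print(statistics.mean(values))    # sum / count = 30 / 6 = 5
print(statistics.median(values))  # mean of the middle values 3 and 5 = 4.0
print(statistics.mode(values))    # most common value = 3
```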

46
Q

What is dispersion?

A

Technical name for the spread or variability of the data

47
Q

What are the three common measures of spread?

A
  • Standard deviation
  • Interquartile range
  • Range
48
Q

What is standard deviation?

A

The square root of the mean of the squared differences between each value and the mean (for a sample, the sum of squared differences is usually divided by n − 1 rather than n)
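The definition translates almost word for word into Python. This sketch computes both the population version (divide by n) and the sample version (divide by n − 1) by hand for a hypothetical dataset and compares them with `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical sample

mean = sum(values) / len(values)
squared_diffs = [(x - mean) ** 2 for x in values]

# Population SD: square root of the mean squared difference (divide by n)
pop_sd = math.sqrt(sum(squared_diffs) / len(values))
# Sample SD: divide by n - 1 instead, as many statistics packages do by default
sample_sd = math.sqrt(sum(squared_diffs) / (len(values) - 1))

print(pop_sd, statistics.pstdev(values))
print(sample_sd, statistics.stdev(values))
```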

49
Q

What is the interquartile range?

A

The range of the middle 50% of the data: the difference between the upper quartile (25% of data above it) and the lower quartile (25% of data below it)

50
Q

What is the range?

A

The difference between the largest and smallest values (often reported simply as the smallest and largest values)
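A minimal Python sketch of the interquartile range and the range, on hypothetical data. Note that several conventions exist for calculating quartiles; this uses the default ("exclusive") method of `statistics.quantiles`, so other software may give slightly different quartiles for the same data.

```python
import statistics

values = [1, 3, 4, 5, 6, 7, 8, 9, 12, 20]  # hypothetical data

# Lower quartile, median, upper quartile (Python's default quartile method)
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

data_range = max(values) - min(values)

print(q1, q3, iqr)
print(min(values), max(values), data_range)
```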

51
Q

What is an alternative to histograms?

A

Box and whiskers plots

52
Q

What is good about box and whiskers plots?

A

Can be easier to compare between groups

53
Q

What are outliers on a box and whisker plot?

A

More than 1.5 IQRs above the upper quartile, or more than 1.5 IQRs below the lower quartile
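The 1.5 × IQR rule can be sketched in Python as follows. The data are hypothetical, and the quartiles again come from the default method of `statistics.quantiles`; plotting software may use a different quartile convention and flag slightly different points.

```python
import statistics

values = [1, 3, 4, 5, 6, 7, 8, 9, 12, 30]  # hypothetical data with one suspicious value

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # points below this are flagged
upper_fence = q3 + 1.5 * iqr  # points above this are flagged

outliers = [x for x in values if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```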

54
Q

What level of skewness does symmetrical data have?

A

0

55
Q

What is the kurtosis of normally distributed data?

A

3 (an excess kurtosis of 0)
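Both shape measures can be computed from moments about the mean. This is a minimal sketch using the simple moment-based definitions; statistics packages often apply small-sample corrections and may report excess kurtosis (kurtosis minus 3), so their numbers can differ slightly. The function name `moments_shape` and the simulated data are hypothetical.

```python
import random

def moments_shape(values):
    """Moment-based skewness and (non-excess) kurtosis of a dataset."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # subtract 3 to get "excess" kurtosis
    return skewness, kurtosis

random.seed(1)  # reproducible simulated data
normal_sample = [random.gauss(0, 1) for _ in range(100_000)]

print(moments_shape([1, 2, 3, 4, 5]))  # symmetric: skewness 0.0, kurtosis 1.7
print(moments_shape(normal_sample))    # roughly (0, 3)
```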

56
Q

Do continuous and count data come under numerical or categorical data?

A

Numerical

57
Q

What are examples of continuous data?

A
  • Blood pressure
  • BMI
  • Size of an orange
58
Q

What are examples of count data?

A
  • Number of headaches
  • Number of people with diabetes
  • Number of oranges
59
Q

Do nominal and ordinal data come under numerical or categorical data?

A

Categorical

60
Q

What are examples of nominal data?

A
  • Ethnicity
  • Blood types
  • Variety of orange
61
Q

What are examples of ordinal data?

A
  • Disease severity
  • Satisfaction rating
  • Orange quality rating
62
Q

What is the standard error on the mean?

A

Equal to the standard deviation of the sample divided by the square root of the sample size
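The formula is a one-liner in Python. The measurements below are hypothetical; the sketch also shows that quadrupling the sample size (with the same standard deviation) halves the standard error.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 4.7, 5.5]  # hypothetical measurements

sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
se = sd / math.sqrt(len(sample))   # standard error on the mean

print(f"mean={statistics.mean(sample):.3f}  sd={sd:.3f}  se={se:.3f}")

# Same SD but four times the sample size: the standard error halves
print(sd / math.sqrt(4 * len(sample)) / se)
```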

63
Q

When does the standard error on the mean increase?

A

When the standard deviation increases

64
Q

Why does standard error on the mean increase with increasing standard deviation?

A

The more variability there is in the population, the more uncertainty there is in our estimate

65
Q

When does the standard error on the mean decrease?

A

With increasing sample size

66
Q

Why does the standard error on the mean decrease with increasing the sample size?

A

The bigger the sample size, the more information we have and the more precise our estimate

67
Q

What is a t-test?

A

T-tests are used to test whether the means of two groups differ from each other
- Used for continuous data
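A minimal sketch of the standard (equal-variance) two-sample t statistic in Python, showing how it is built from the difference in means divided by its standard error. The helper `pooled_t` and the blood pressure data are hypothetical, and turning t into a p-value needs the t distribution (e.g. from a statistics package), which this sketch leaves out.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Student's two-sample t statistic assuming equal variances (sketch).

    Returns (t, degrees of freedom); the p-value step is omitted.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference in means
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, n_a + n_b - 2

# Hypothetical systolic blood pressure readings in two groups
control = [128, 135, 122, 140, 131, 127]
treated = [118, 124, 120, 129, 115, 121]
print(pooled_t(control, treated))
```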

68
Q

What are inferential tests?

A

Testing whether the difference in our sample reflects a difference in the population

69
Q

Are t-tests weakly or strongly related to the standard error on the mean?

A

Strongly

70
Q

What does it mean that t-tests are strongly related to the standard error on the mean?

A

As the standard deviation of the population increases, the precision of our estimate gets worse; as our sample size goes up, the precision of our estimate gets better

71
Q

What are the assumptions of t-tests?

A
  • Data in each group are normally distributed in population
  • Variance (SD) is constant across groups
  • Data points are independent of each other
72
Q

What does "data points are independent of each other" mean?

A

The data points are unrelated: knowing one value tells you nothing about another

73
Q

How can the independence assumption of t-tests be broken?

A
  • Before and after data on the same people
  • Small samples from the same area
  • Using the same piece of equipment when collecting subsets of our data
74
Q

What is a t-test with unequal variances?

A

A version of the t-test that assumes unequal, rather than equal, variances (Welch’s t-test)

75
Q

When would you use a t-test with unequal variances?

A

When the standard deviations of the two groups are quite different
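Welch's version keeps each group's own variance instead of pooling them, and uses the Welch-Satterthwaite formula for the degrees of freedom (usually not a whole number). This Python sketch uses a hypothetical helper `welch_t` and made-up data; as with the pooled version, the p-value step is left to a statistics package.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (sketch)."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = statistics.variance(group_a) / n_a  # squared SE of mean A
    se2_b = statistics.variance(group_b) / n_b  # squared SE of mean B
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical groups with clearly different spreads
narrow = [10.1, 10.3, 9.9, 10.0, 10.2]
wide = [8.0, 12.5, 9.1, 13.8, 7.2, 11.9, 10.6]
print(welch_t(narrow, wide))
```

With equal group sizes and equal sample variances, Welch's t and degrees of freedom match the pooled version.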

76
Q

What are the assumptions for unpaired t-test assuming unequal variance?

A
  • Normally distributed data in each group
  • Independent data points

77
Q

What are the assumptions for paired t-tests?

A
  • Normally distributed data in each group (strictly, the differences between pairs should be normally distributed)
  • Constant variance across groups

78
Q

When do we use ANOVA instead of t-tests?

A

When there are more than two groups

79
Q

Why use ANOVA?

A

To look at the data overall and see whether there are any differences between groups, rather than comparing individual groups with each other

80
Q

What is ANOVA?

A

Analysis of Variance

- Partitions the variance in the data into the part that can be attributed to differences between groups and the part that is left over (within groups)
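The partitioning idea can be sketched directly in Python: split the total sum of squares into a between-group part and a within-group (left-over) part, then compare their mean squares as an F statistic. The helper `one_way_anova_f` is hypothetical, and converting F to a p-value (F distribution) is left to a statistics package.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic by partitioning the sum of squares (sketch)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group SS: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: what is left over around each group's own mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, (2, 6))
```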

81
Q

How do you interpret an ANOVA?

A

- The p-value tells us how much evidence there is of some variability between groups (it does not say which groups differ)

82
Q

What are post-hoc pairwise comparisons?

A

Comparisons between pairs of groups made after an overall assessment, e.g. after an ANOVA

83
Q

Are post-hoc pairwise comparisons needed after a t-test?

A

No: a t-test compares only two groups, so there is just one comparison

84
Q

What are ANOVA assumptions?

A
  • Data in each group are normally distributed in the population
  • Variance (SD) is constant across groups
  • Data points are independent of each other
85
Q

If your data meet the assumption of equal variance, which test do you use?

A

T-test (the standard, equal-variance version)

86
Q

If the data are not independent (e.g. paired measurements), which t-test do you use?

A

Paired t-test

87
Q

If the data within groups are not normally distributed (two groups), which test do you use?

A

Mann-Whitney test

88
Q

What is the Mann-Whitney test?

A

An alternative to a t-test when we have non-normally distributed data in each group
Compares continuous data between two groups

89
Q

What is the null hypothesis of a t-test, stated in general terms?

A

The two groups have the same mean in the population

90
Q

What is the null hypothesis of the Mann-Whitney test, stated in general terms?

A

If we select one value at random from each group, the value from the first group will be larger than the value from the second group 50% of the time
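The quantity in this null hypothesis can be estimated by comparing every value in one group with every value in the other. This Python sketch (hypothetical helper `prob_first_larger`, ties counted as half) returns the Mann-Whitney U statistic for the first group divided by the number of pairs; under the null hypothesis it should be near 0.5. It does not compute a p-value.

```python
def prob_first_larger(group_a, group_b):
    """Share of (a, b) pairs where a > b, counting ties as 1/2.

    Equals the Mann-Whitney U statistic for group A divided by
    len(group_a) * len(group_b).
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))

# Hypothetical scores in two groups
print(prob_first_larger([3, 5, 7, 9], [2, 4, 6, 8]))  # 10 of 16 pairs: 0.625
```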

91
Q

What are non-parametric tests?

A

Tests that make no assumptions about the distributional form of the data (e.g. normality)

92
Q

What test, if applied where a t-test is appropriate, yields a larger p-value?

A

Wilcoxon test

93
Q

When do you use a Wilcoxon signed-rank test?

A

Paired data

94
Q

When can you use a Kruskal-Wallis test?

A

More than 2 groups

95
Q

What are the alternatives to the Mann-Whitney test for non-normally distributed data that are analogous to the paired t-test and ANOVA?

A
  • Wilcoxon signed-rank test
  • Kruskal-Wallis test

96
Q

Which test is used for categorical data?

A

Chi-squared test

97
Q

What are the assumptions of the chi-squared test?

A
  • Data points are independent
  • Data are described by the binomial distribution
  • Expected count of at least 5 in each cell
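For a 2×2 table the chi-squared test can be sketched in plain Python: expected counts come from the row and column totals, and the statistic sums (observed − expected)² / expected. This version uses no continuity correction, and gets the p-value from the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal. The helper name and table are hypothetical.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 table, no continuity correction (sketch).

    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    # 1 degree of freedom: P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected

# Hypothetical data: rows = exposed / unexposed, columns = disease / no disease
table = [[30, 70], [15, 85]]
stat, p, expected = chi_squared_2x2(table)
print(round(stat, 3), round(p, 4))
print(all(e >= 5 for row in expected for e in row))  # expected-count assumption met?
```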
98
Q

What is used instead of the chi-squared test if an expected count is small?

A

Fisher’s exact test
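Fisher's exact test works out exact probabilities instead of relying on the chi-squared approximation. Holding the row and column totals fixed, the top-left count follows a hypergeometric distribution; this Python sketch (hypothetical helper `fisher_exact_2x2`) gives a two-sided p-value by summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]] (sketch)."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)  # feasible top-left counts
    # Small relative tolerance so equally likely tables are not lost to rounding
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))

# Hypothetical small table where every expected count is 2 (below 5)
print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857
```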

99
Q

Which statistic is used for correlation analysis?

A

Pearson’s correlation coefficient (r; the population value is often written ρ, rho)

100
Q

What range of values can a correlation coefficient take?

A

-1 to 1

101
Q

What do the different correlation values mean?
1 =
0 =
-1 =

A
1 = Perfect positive correlation
0 = No (linear) correlation
-1 = Perfect negative correlation
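A minimal Python sketch of Pearson's coefficient (hypothetical helper `pearson_r`), showing the two perfect cases and an in-between one on made-up data. Python 3.10+ also provides `statistics.correlation`, which should give the same values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data (sketch)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative
print(pearson_r(x, [3, 1, 4, 1, 5]))   # about 0.35
```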
102
Q

What are the assumptions of correlation analysis?

A
  • Data points are independent
  • One set of data is normally distributed for any given value of the other, with constant variance
  • Relationship is linear
103
Q

What is linear regression?

A

Similar to correlation: looks at the relationship between two continuous variables
Fits a functional form
y = bx + c
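A least-squares fit of y = bx + c, together with R-squared, can be sketched in plain Python. The helper `fit_line` and the dose/response data are hypothetical; the p-value for the slope needs the t distribution and is left to a statistics package.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b*x + c, plus R-squared (sketch)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    c = mean_y - b * mean_x  # intercept

    # R-squared: share of the variance in y explained by the fitted line
    ss_res = sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b, c, 1 - ss_res / ss_tot

# Hypothetical data: dose (x) and response (y)
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
print(fit_line(dose, response))
```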

104
Q

What can you get from linear regression?

A
  • p-value
  • R-squared value

105
Q

What is the r-squared value?

A

Proportion of the variance in the outcome that is explained by the model

106
Q

Assumptions of linear regression:

A
  • Data points are independent
  • Outcome data is normally distributed for any given value of the exposure
  • Outcome data has a constant variance for all values of the exposure
  • Relationship is linear
107
Q

Choose a statistical test - 2 continuous variables:

If the assumptions of normality, constant variance or a linear relationship are not met

A

Spearman’s correlation
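Spearman's correlation is Pearson's correlation applied to the ranks of the data, which is why it only needs a monotonic relationship. This Python sketch uses hypothetical helpers `average_ranks` (tied values share the average of their ranks) and `spearman_rho`; on a curved but always-increasing relationship it gives exactly 1.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's correlation: Pearson's correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

# Monotonic but non-linear relationship
x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0
```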

108
Q

Choose a statistical test - 2 categorical variables:

If the assumption of at least 5 expected counts in each cell is not met

A

Fisher’s exact test

109
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
- Binary categorical variable

A

T-test

110
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
- Categorical variable with more than 2 categories

A

ANOVA

111
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
- Assumption of normality not met for a binary categorical variable

A

Mann-Whitney test

112
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
- Assumption of normality not met for a categorical variable with more than 2 categories

A

Kruskal-Wallis test

113
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
If constant variance is not met for a binary categorical variable

A

T-test for unequal variances

114
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
If independence is not met for a binary categorical variable

A

Paired t-test

115
Q

Choose a statistical test - 1 continuous and 1 categorical variable:
If independence is not met for categorical with more than 2 categories

A

Repeated measures ANOVA

116
Q

What do confidence intervals indicate?

A

Range of plausible values for the thing we are trying to estimate
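A 95% confidence interval for a mean can be sketched as mean ± critical value × standard error. This Python version uses hypothetical data and, as a simplifying assumption, the normal critical value (about 1.96) from `statistics.NormalDist`; for small samples a t critical value is more appropriate and gives a slightly wider interval.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 4.7, 5.5, 5.3, 5.0]  # hypothetical data

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))

# Normal-approximation critical value for 95% (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = mean - z * se, mean + z * se
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```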

117
Q

If p>0.05, does the 95% confidence interval for a difference include zero or not?

A

It includes zero

118
Q

If p<0.05, does the 95% confidence interval for a difference include zero or not?

A

It does not include zero

119
Q

What can error bars show?

A

Standard errors
Standard deviations
Confidence intervals

120
Q

What key information should figure legends for graphs include?

A
  • Meaning of the different symbols
  • What the error bars represent
  • p-values
  • Which statistics were used
  • Descriptions of everything on the graph
  • Sample size