Statistics Flashcards

(120 cards)

1
Q

Define Population:

A

Full set of units that we are interested in

2
Q

Define Sample:

A

A subset of units that we experiment on or observe

3
Q

Why do we use a sample?

A

To draw inferences about the population

4
Q

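Why won’t a sample give us exactly the right answer about the population?

A

The role of chance: the units that happen to be sampled vary, so the sample estimate varies too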

5
Q

What is hypothesis testing?

A

Assessing the evidence against a null hypothesis, because suggesting something is unlikely to be true is rather easier than proving that something is true

6
Q

What are the steps of hypothesis testing?

A
  • Formulate a hypothesis
  • Formulate a null hypothesis
  • Calculate the chance that you might see data as extreme as yours, or more extreme, if the null hypothesis is true (p-value)
7
Q

What is a p-value?

A

Probability of seeing something as extreme or more extreme than what was observed, if the null hypothesis is true
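
A minimal Python sketch of the p-value idea, assuming a made-up example that is not part of the cards: a coin is flipped 10 times and lands heads 9 times, and the null hypothesis is that the coin is fair. The function name and the numbers are illustrative only.

```python
from math import comb

def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Probability, under a fair-coin null hypothesis, of a result at least
    as far from flips/2 as the observed number of heads."""
    observed_distance = abs(heads - flips / 2)
    as_extreme = 0
    for k in range(flips + 1):
        # Count every outcome that is as extreme or more extreme than observed.
        if abs(k - flips / 2) >= observed_distance:
            as_extreme += comb(flips, k)
    return as_extreme / 2 ** flips

p_value = binomial_two_sided_p(heads=9, flips=10)
```

Only 0, 1, 9 or 10 heads are as extreme as the observed result here, so the p-value is 22/1024, roughly 0.02.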

8
Q

What do you do if p<0.05 in the old school approach?

A
  • Significant result
  • Reject null hypothesis
  • Accept alternative hypothesis
9
Q

How do we interpret the p-value in the modern approach, as a continuum of evidence?

  • 0.1 =
  • 0.05 =
  • 0.01 =
  • 0.001 =
A
  • 0.1 = Weak evidence
  • 0.05 = Moderate evidence
  • 0.01 = Strong evidence
  • 0.001 = Very strong evidence
10
Q

What is wrong with the old school approach?

A

By using a strict cut-off, we effectively interpret p<0.05 as statistical proof
- Doesn’t represent how strong the evidence is

11
Q

What are the basic types of data?

A
  • Numerical
  • Categorical

12
Q

What is numerical data?

A

Any data that can be expressed with numbers

13
Q

What are the two main subtypes of numerical data?

A
  • Continuous
  • Count

14
Q

What is continuous data?

A

Can take any value within a range

15
Q

What is an example of continuous data?

A

Height
Blood pressure
Time

16
Q

What is count data?

A

Takes only integer values and represents a count of discrete things

17
Q

What is an example of count data?

A

Number of visits to A&E

Number of children

18
Q

What is categorical data?

A

Things that do not have an inherent numerical value

19
Q

What are the main subtypes of categorical data?

A
  • Nominal
  • Ordinal

20
Q

What is nominal data?

A

Things with no inherent order

21
Q

What are examples of nominal data?

A

Eye colour

Blood type

22
Q

What is ordinal data?

A

Things with an inherent order

23
Q

What are examples of ordinal data?

A

Large/Small
Education level
Age group

24
Q

What are descriptive statistics used for?

A

To describe the data in your sample

25
What are inferential statistics used for?
To draw inferences about the population from the sample
26
Summarising categorical data:
- Data can take on 1 of a number of categories - Number of categories is small - Use a frequency table
27
What do frequency tables allow?
To see which category is most common, least common and which categories occur more frequently
28
What is a problem with frequency tables?
Cannot see immediately what share of the sample is contained in each category
29
What can you do to see what share of sample is contained in each category?
Percentages
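As an illustrative sketch (the blood-type data and the function name are invented for the example), a frequency table with percentages could be built in Python like this:

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percentage of the sample),
    ordered from most common to least common."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n)
            for category, count in counts.most_common()}

# Hypothetical sample of 8 blood types.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]
table = frequency_table(blood_types)
```

The counts show which category is most common; the percentages show what share of the sample each category holds.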
30
What are types of graphical summary of categorical data?
- Bar charts | - Pie charts
31
What does the height of a bar chart represent?
Number of occurrences in that category
32
What does grouping data turn continuous data into?
Categorical data
33
What can you do instead of grouping data?
Plot histograms
34
What is the total area of histogram?
1
35
What equation is used to calculate the density of histograms?
Density = proportion in bin/bin width
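A small Python sketch of this density formula, assuming made-up ages and unequal bin widths (the data, bin edges and function name are illustrative, not from the cards). It also checks that the total area comes out as 1.

```python
def histogram_density(values, bin_edges):
    """Density per bin = (proportion of values in bin) / (bin width).
    Bins are [left, right), except the last bin, which includes its right edge."""
    n = len(values)
    densities = []
    last_bin = len(bin_edges) - 2
    for i, (left, right) in enumerate(zip(bin_edges, bin_edges[1:])):
        count = sum(1 for v in values
                    if left <= v < right or (i == last_bin and v == right))
        densities.append((count / n) / (right - left))
    return densities

ages = [1, 2, 3, 4, 12, 15, 30, 35]   # hypothetical data
edges = [0, 5, 20, 40]                # bins of width 5, 15 and 20
densities = histogram_density(ages, edges)
# Area of each bar = density * width; the areas should sum to 1.
total_area = sum(d * (right - left)
                 for d, left, right in zip(densities, edges, edges[1:]))
```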
36
Are there gaps between bins in histograms?
No
37
What does the height of a bin in a histogram indicate?
Density of observations in that bin, so that the area of the bin gives the relative frequency of observations
38
What does using density allow histograms to compare?
Different bin widths
39
If a histogram has a heavy tail does it have a high or low kurtosis?
High
40
If a histogram has light tails does it have a high or low kurtosis?
Low
41
What does location mean?
Defines where data are located in the range of possible values
42
What are the three common measures of the averages used?
- Mean - Mode - Median
43
What is a mean?
Equal to the sum of values divided by the number of values
44
What is a median?
- Rank data in order - Median is the middle number - If even number of data points, no single point so take mean of 2 middle values
45
What is a mode?
Most commonly occurring value
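The three averages above can be sketched with Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data, already in rank order

mean_value = statistics.mean(values)      # sum of values / number of values = 30 / 6
median_value = statistics.median(values)  # even number of points: mean of 3 and 5
mode_value = statistics.mode(values)      # 3 is the most commonly occurring value
```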
46
What is dispersion?
Technical name for the spread or variability of the data
47
What are the three common measures of spread?
- Standard deviation - Interquartile range - Range
48
What is standard deviation?
Equal to the square root of the mean of the squared differences between each value and the mean
49
What is the interquartile range?
Range covering the middle 50% of the data: from the lower quartile (25% of data below) to the upper quartile (25% of data above)
50
What is the range?
Simply the span from the smallest to the largest value
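A hedged Python sketch of the three measures of spread, using made-up data. Two assumptions to note: `statistics.pstdev` divides by n, matching the "mean of squared differences" wording above, whereas `statistics.stdev` divides by n - 1; and quartile conventions differ between textbooks and software, so the interquartile range from `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Square root of the mean squared difference from the mean.
sd = statistics.pstdev(values)

# Quartiles split the ranked data into four parts; IQR = upper - lower quartile.
lower_q, _median, upper_q = statistics.quantiles(values, n=4)
iqr = upper_q - lower_q

# Range expressed as a single number: largest minus smallest value.
data_range = max(values) - min(values)
```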
51
What is an alternative to histograms?
Box and whiskers plots
52
What is good about box and whiskers plots?
Can be easier to compare between groups
53
What are outliers on a box and whisker plot?
More than 1.5 IQRs above the upper quartile or below the lower quartile
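A minimal sketch of the 1.5 IQR rule in Python, assuming made-up data and the default quartile method of `statistics.quantiles` (other quartile conventions can move the fences slightly). The function name is illustrative.

```python
import statistics

def find_outliers(values):
    """Return values more than 1.5 IQRs beyond the lower or upper quartile."""
    lower_q, _median, upper_q = statistics.quantiles(values, n=4)
    iqr = upper_q - lower_q
    lower_fence = lower_q - 1.5 * iqr
    upper_fence = upper_q + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

outliers = find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # hypothetical data
```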
54
What level of skewness does symmetrical data have?
0
55
What level of kurtosis does normally distributed data have?
3
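A sketch of these two values in Python, assuming the simple moment-based definitions (skewness = third central moment / SD cubed, kurtosis = fourth central moment / SD to the fourth power). Statistical packages may apply small-sample corrections or report "excess" kurtosis (kurtosis minus 3), so their numbers can differ. The data are simulated for illustration.

```python
import random

def central_moment(values, order):
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]            # symmetrical data: skewness of 0
rng = random.Random(0)                 # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(100_000)]
# kurtosis(normal_sample) should come out close to 3.
```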
56
Do continuous and count data come under numerical or categorical data?
Numerical
57
What are different types of continuous data?
- Blood pressure - BMI - Size of an orange
58
What are different types of count data?
- Number of headaches - Number of people with diabetes - Number of oranges
59
Do nominal and ordinal data come under numerical or categorical data?
Categorical
60
What are different types of nominal data?
- Ethnicity - Blood types - Variety of orange
61
What are different types of ordinal data?
- Disease severity - Satisfaction rating - Orange quality rating
62
What is the standard error on the mean?
Equal to the standard deviation of the sample divided by the square root of the sample size
63
When does the standard error on the mean increase?
With increasing standard deviation
64
Why does standard error on the mean increase with increasing standard deviation?
The more variability there is in the population, the more uncertainty there is in our estimate
65
When does the standard error on the mean decrease?
With increasing sample size
66
Why does the standard error on the mean decrease with increasing the sample size?
The bigger the sample size, the more information we have and the more precise our estimate
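The standard error cards above can be checked with a few lines of Python. The numbers are made up; the function simply applies standard deviation / square root of sample size.

```python
from math import sqrt

def standard_error(sample_sd, n):
    """Standard error on the mean = sample SD / sqrt(sample size)."""
    return sample_sd / sqrt(n)

baseline = standard_error(10, 25)      # 10 / 5
larger_sd = standard_error(20, 25)     # doubling the SD doubles the standard error
larger_n = standard_error(10, 100)     # quadrupling n halves the standard error
```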
67
What is a t-test?
T-tests are used to test whether the means in two groups are different from each other | - Used for continuous data
68
What are inferential tests?
Testing whether the difference in our sample reflects a difference in the population
69
Are t-tests weakly or strongly related to the standard error on the mean?
Strongly
70
What does it mean that t-tests are strongly related to the standard error on the mean?
As the standard deviation of the population increases, the precision of our estimate gets worse; as our sample sizes go up, the precision of our estimates gets better
71
What are the assumptions of t-tests?
- Data in each group are normally distributed in population - Variance (SD) is constant across groups - Data points are independent of each other
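A hedged sketch of the t statistic for two independent groups with equal variances, written with the Python standard library only. The data and function name are made up. The standard library has no t distribution, so this stops at the statistic and degrees of freedom; a statistics package (for example `scipy.stats.ttest_ind`) would normally be used to get the p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Difference in means divided by its standard error,
    assuming constant variance across the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df

t, df = pooled_t_statistic([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])  # hypothetical data
```

The standard error sits in the denominator, which is why the test is so closely tied to the standard error on the mean.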
72
What does "data points are independent of each other" mean?
The data points are unrelated to each other
73
How can the assumptions of t-tests be broken?
- Before and after data on the same people - Small sample from the same area - Using same piece of equipment when collecting subsets of our data
74
What is a t-test with unequal variances?
Version of the t-test which assumes unequal, rather than equal, variances
75
When would you use a t-test with unequal variances?
Where standard deviations from two different groups are quite different
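A sketch of the unequal-variance version, assuming the commonly used Welch form with Welch-Satterthwaite degrees of freedom (the cards do not name a specific formula, so this choice is an assumption). Data and function name are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(group_a, group_b):
    """t statistic and approximate degrees of freedom without
    assuming that the two groups share the same variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_term_a = variance(group_a) / n_a
    var_term_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(var_term_a + var_term_b)
    df = (var_term_a + var_term_b) ** 2 / (
        var_term_a ** 2 / (n_a - 1) + var_term_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical groups with very different standard deviations.
t, df = welch_t_statistic([5, 6, 7, 8, 9], [0, 10, 20])
```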
76
What are the assumptions for unpaired t-test assuming unequal variance?
- Normally distributed data in each group | - Independent data points
77
What are the assumptions for paired t-tests?
- Differences within each pair are normally distributed | - Pairs are independent of each other
78
Why do we use ANOVA instead of t-tests?
- When there are more than two groups
79
Why use ANOVA?
Look overall at the data and see if there are any differences by groups, rather than comparing individual groups to each other
80
What is ANOVA?
Analysis of Variance | - Partitions the variance in the data into that which can be attributed to differences between groups and that which is left over
81
How do you interpret an ANOVA?
- The p-value tells us how much evidence there is that there is some variability between groups
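A minimal sketch of how one-way ANOVA partitions the variance, using made-up data and the standard library only. It returns the F statistic (between-group mean square over within-group mean square); turning F into a p-value needs the F distribution, which a statistics package would supply.

```python
from statistics import mean

def anova_f_statistic(groups):
    """Ratio of between-group variance to left-over (within-group) variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variability attributed to differences between group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability left over inside each group.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = anova_f_statistic([[1, 2, 3], [2, 3, 4], [6, 7, 8]])  # hypothetical data
```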
82
What are post-hoc pairwise comparisons?
Comparisons between individual pairs of groups that occur after an overall assessment | Occur after ANOVA
83
Can you use post-hoc pairwise comparison with t-tests?
No
84
What are ANOVA assumptions?
- Data in each group are normally distributed in the population - Variance (SD) is constant across groups - Data points are independent of each other
85
If the data meet the assumption of equal variance, which test do you use?
T-tests
86
If the data points are not independent, which test do you use instead of a t-test?
Paired t-test
87
If the data within groups are not normally distributed, which test do you use?
Mann-Whitney test
88
What is the Mann-Whitney test?
An alternative to a t-test when we have non-normally distributed data in each group | Compares continuous data between two groups
89
What is the general form of the null hypothesis of a t-test?
Two groups have the same mean in the population
90
What is the general form of the null hypothesis of the Mann-Whitney test?
If we select one value at random from each group, the value from the first group will be larger than the value from the second group 50% of the time
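The quantity in this null hypothesis can be estimated directly from two samples. The sketch below uses made-up data and counts ties as half a "win", which is one common convention and an assumption here; the function name is illustrative.

```python
def probability_first_larger(group_a, group_b):
    """Share of all (a, b) pairs in which the value from group_a is larger,
    counting tied pairs as half."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))

estimate = probability_first_larger([3, 5, 7], [1, 4, 6])   # hypothetical data
no_difference = probability_first_larger([1, 2, 3], [1, 2, 3])
```

A value near 0.5 is what the null hypothesis predicts; values far from 0.5 suggest one group tends to give larger values.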
91
What are non-parametric tests?
Tests that make no assumptions about the distribution of the data
92
Which test tends to yield a larger p-value when applied where a t-test is appropriate?
Wilcoxon test
93
When do you use a Wilcoxon signed-rank test?
Paired data
94
When can you use a Kruskal-Wallis test?
More than 2 groups
95
What are alternative forms of the Mann-Whitney test for use with non-normally distributed data that are analogous to paired t-tests and ANOVA?
- Wilcoxon signed-rank test | - Kruskal-Wallis test
96
Which test is used for categorical data?
Chi-squared test
97
What are the assumptions for the chi-squared test?
- Data points are independent - Data are described by the binomial distribution - Expected count of at least 5 in each cell
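A sketch of how the expected counts and the chi-squared statistic are obtained from a table of observed counts. The 2x2 table is made up, and the p-value (which needs the chi-squared distribution) is left to a statistics package.

```python
def chi_squared(table):
    """Return the chi-squared statistic and the table of expected counts.
    Expected count = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return statistic, expected

statistic, expected = chi_squared([[10, 20], [20, 10]])  # hypothetical counts
# Check the "at least 5 expected counts" assumption in every cell.
enough_expected = all(e >= 5 for row in expected for e in row)
```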
98
What is used instead of the chi-squared test if an expected count is small?
Fisher's exact test
99
Which statistic is used for correlation analysis?
Pearson's correlation coefficient aka rho
100
What range of values can the correlation coefficient take?
-1 to 1
101
What do the different correlation values mean? 1 = | 0 = | -1 =
1 = Perfectly positively correlated | 0 = No correlation | -1 = Perfectly negatively correlated
102
What are the assumptions of correlation analysis?
- Data points are independent - One set of data is normally distributed for any given value of the other with constant variance - Relationship is linear
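A small Python sketch of Pearson's correlation coefficient computed by hand, with made-up data chosen to hit the two extreme values. The function name is illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariation of x and y divided by the product of their spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4]
r_positive = pearson_r(x, [2, 4, 6, 8])   # y rises in a straight line with x
r_negative = pearson_r(x, [8, 6, 4, 2])   # y falls in a straight line with x
```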
103
What is linear regression?
Similar to correlation: looks at the relationship between two continuous variables | Fits a functional form y = bx + c
104
What can you get from linear regression?
- p-value | - R-squared value
105
What is the r-squared value?
Proportion of variance in the outcome that is explained by the fitted line
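A hedged sketch of fitting y = bx + c by ordinary least squares and computing R-squared, using made-up data and the standard library only. It does not compute the p-value, which a statistics package would report alongside the fit.

```python
def fit_line(x, y):
    """Least-squares slope b, intercept c and R-squared for y = b*x + c."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (v - mean_y) for a, v in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b = sxy / sxx
    c = mean_y - b * mean_x
    # R-squared = 1 - (variance left over around the line / total variance).
    ss_residual = sum((v - (b * a + c)) ** 2 for a, v in zip(x, y))
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return b, c, 1 - ss_residual / ss_total

b, c, r_squared = fit_line([0, 1, 2, 3], [1, 3, 4, 8])  # hypothetical data
```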
106
Assumptions of linear regression:
- Data points are independent - Outcome data is normally distributed for any given value of the exposure - Outcome data has a constant variance for all values of the exposure - Relationship is linear
107
Choose a statistical test - 2 continuous variables: | If the assumptions of normality, constant variance or a linear relationship are not met
Spearman's correlation
108
Choose a statistical test - 2 categorical variables: | If the assumption of at least 5 expected counts in each cell is not met
Fisher's exact test
109
Choose a statistical test - 1 continuous and 1 categorical variable: - Binary categorical variable
T-tests
110
Choose a statistical test - 1 continuous and 1 categorical variable: - Categorical with more than 2 categories
ANOVA
111
Choose a statistical test - 1 continuous and 1 categorical variable: - Assumption of normality not met for a binary categorical variable
Mann-Whitney test
112
Choose a statistical test - 1 continuous and 1 categorical variable: - Assumption of normality not met for categorical with more than 2 categories
Kruskal-Wallis test
113
Choose a statistical test - 1 continuous and 1 categorical variable: - If constant variance is not met for a binary categorical variable
T-test for unequal variance
114
Choose a statistical test - 1 continuous and 1 categorical variable: - If independence is not met for a binary categorical variable
Paired t-test
115
Choose a statistical test - 1 continuous and 1 categorical variable: - If independence is not met for categorical with more than 2 categories
Repeated measures ANOVA
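The "choose a statistical test" cards above can be summarised as a small decision function. This is only a sketch: the function name, argument names and string labels are invented, and the order in which violated assumptions are checked (independence, then normality, then constant variance) is an assumption, since the cards treat one violated assumption at a time.

```python
def choose_test(variables, n_categories=2, normal=True, constant_variance=True,
                independent=True, linear=True, expected_counts_ok=True):
    """Suggest a test from the variable types and which assumptions hold."""
    if variables == "continuous-continuous":
        if normal and constant_variance and linear:
            return "Pearson's correlation"
        return "Spearman's correlation"
    if variables == "categorical-categorical":
        return "Chi-squared test" if expected_counts_ok else "Fisher's exact test"
    if variables == "continuous-categorical":
        if n_categories == 2:
            if not independent:
                return "Paired t-test"
            if not normal:
                return "Mann-Whitney test"
            if not constant_variance:
                return "T-test for unequal variance"
            return "T-test"
        if not independent:
            return "Repeated measures ANOVA"
        if not normal:
            return "Kruskal-Wallis test"
        return "ANOVA"
    raise ValueError(f"Unknown variable combination: {variables}")

suggestion = choose_test("continuous-categorical", n_categories=3, normal=False)
```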
116
What do confidence intervals indicate?
Range of plausible values for the thing we are trying to estimate
117
If p>0.05, does the 95% confidence interval for a difference include zero or not?
Includes zero
118
If p<0.05, does the 95% confidence interval for a difference include zero or not?
Does not include zero
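A sketch of the link between a 95% confidence interval and p<0.05, assuming a normal approximation (for small samples a t distribution would normally be used instead) and made-up values for the estimated difference and its standard error. The function name is illustrative.

```python
from statistics import NormalDist

def ci_and_p(difference, standard_error, level=0.95):
    """Confidence interval and two-sided p-value for a difference,
    using a normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    lower = difference - z_crit * standard_error
    upper = difference + z_crit * standard_error
    z = abs(difference / standard_error)
    p_value = 2 * (1 - NormalDist().cdf(z))
    return (lower, upper), p_value

# Hypothetical estimates with the same standard error.
weak_ci, weak_p = ci_and_p(difference=2.0, standard_error=1.5)
strong_ci, strong_p = ci_and_p(difference=4.0, standard_error=1.5)
```

With these numbers the first interval spans zero and its p-value is above 0.05, while the second interval excludes zero and its p-value is below 0.05.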
119
What can error bars show?
- Standard errors - Standard deviations - Confidence intervals
120
What key information should be included in figure legends for graphs?
- Meaning of different symbols - What the error bars represent - The p-value - Which statistical tests were used - Description of everything on the graph - Sample size