STAT Notes Flashcards

1
Q

Define descriptive statistics

A

Methods used to summarize or describe our observations

2
Q

Describe inferential statistics

A

Using observations as a basis for making estimates or predictions

3
Q

What two methods can be used to ensure random sampling is truly random?

A

Mechanical
Blind

4
Q

Define mechanical sampling

A

Assigning every individual in the population a number, then randomly generating numbers to select the sample

5
Q

Define stratified random sampling

A

Divides the population into subgroups (strata) by a shared characteristic, then randomly samples from each stratum in proportion to its share of the population

6
Q

Define dispersion of data

A

How widely the values are spread around a central value such as the mean

7
Q

How is sample variance calculated?

A

Σ(xi - x̄)^2 ÷ (n-1), where xi is each value, x̄ is the sample mean and n is the number of observations

8
Q

How is standard deviation calculated?

A

√variance (the square root of the variance)

9
Q

How is standard error calculated?

A

sx ÷ √n where n is the number of observations and sx is standard deviation of a sample
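
As a worked illustration of the variance, standard deviation and standard error cards, here is a minimal Python sketch. The sample values are made up for the example and are not from the notes.

```python
import math

# Hypothetical sample of five observations (illustrative only).
sample = [4.0, 7.0, 6.0, 5.0, 8.0]

n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Standard error of the mean: standard deviation divided by the square root of n.
std_error = std_dev / math.sqrt(n)

print(variance, std_dev, std_error)
```

For this sample the mean is 6 and the variance is 2.5, so the standard deviation is about 1.58 and the standard error about 0.71.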

10
Q

Define confidence interval

A

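A range of values, calculated from the sample, that is expected to contain the true population mean with a stated level of confidence (e.g. 95%), assuming a normal distribution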

11
Q

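What proportion of a normally distributed population lies within one standard deviation of the mean? (Likewise, about this proportion of sample means lies within one standard error of the population mean.)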

A

68%

12
Q

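What proportion of a normally distributed population lies within two standard deviations of the mean? (Likewise, about this proportion of sample means lies within two standard errors of the population mean.)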

A

95%

13
Q

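What proportion of a normally distributed population lies within three standard deviations of the mean? (Likewise, about this proportion of sample means lies within three standard errors of the population mean.)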

A

99.7%
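
The 68%, 95% and 99.7% figures are the areas under a normal curve within 1, 2 and 3 standard deviations of its centre (for a distribution of sample means, that spread is the standard error). A minimal sketch using Python's standard library; the helper name `proportion_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The printed proportions are approximately 0.6827, 0.9545 and 0.9973, which the cards round to 68%, 95% and 99.7%.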

14
Q

What function shows perfect normal distribution?

A

Gaussian function

15
Q

Define nominal data

A

Categories identified by name only, with no inherent order

16
Q

Define ordinal data

A

Categories that have a meaningful order or rank

17
Q

What are the two types of variables?

A

Categorical
Numeric

18
Q

What are the two subcategories of categorical data?

A

Ordinal
Nominal

19
Q

How is categorical data referred to in R?

A

factor()

20
Q

What are the two subcategories of numeric data?

A

Discrete
Continuous

21
Q

How is discrete data referred to in R?

A

integer

22
Q

How is continuous data referred to in R?

A

numeric

23
Q

Define skewed distribution

A

An asymmetrical distribution in which one tail is longer than the other (skewness is the measure of this asymmetry)

24
Q

Define bimodal distribution

A

There are two modes (can be symmetrical or asymmetrical)

25
Q

Define a bin

A

An interval (range of values) into which data points are grouped, such as one bar of a histogram

26
Q

Define central tendency

A

The central or typical value of a dataset, summarised by the mean, median or mode

27
Q

Define probability

A

Proportion of times a particular outcome will occur from a large sample of trials or the likelihood of a particular outcome of an event

28
Q

What does a P=0 (probability=0) suggest?

A

Impossible

29
Q

What does a P=1 (probability=1) suggest?

A

Certainty

30
Q

What does it mean if trials are independent?

A

The outcome of one trial has no effect on the outcome of any other trial

31
Q

What is the probability of a OR b where they are both mutually exclusive?

A

P(a)+P(b)

32
Q

What is the sum of the probabilities of all possible mutually exclusive outcomes?

A

1

33
Q

When we use OR when describing mutually exclusive probabilities how do we combine these values?

A

Add

34
Q

What is a probability distribution?

A

Graphical distribution of theoretical relative probabilities
y=probability, x=potential outcomes

35
Q

What is true about the area of any sections of a probability distribution graph?

A

Equal to the probability of an outcome falling within that section

36
Q

How can we draw probabilities with multiple trials but limited outcomes?

A

Table
Probability tree

37
Q

How do we combine independent events using AND?

A

Multiply probabilities together
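
A small sketch of the OR and AND rules from the cards above, using a fair six-sided die as an assumed example. Note that adding applies to mutually exclusive outcomes, while multiplying applies to independent events (mutually exclusive events cannot both happen in the same trial).

```python
from fractions import Fraction

# One fair six-sided die: each face has probability 1/6 (illustrative assumption).
p_face = Fraction(1, 6)

# OR with mutually exclusive outcomes: add. P(roll a 1 or a 2 in one roll)
p_one_or_two = p_face + p_face

# AND with independent trials: multiply. P(6 on the first roll and 6 on the second)
p_two_sixes = p_face * p_face

# The probabilities of all mutually exclusive outcomes of one roll sum to 1.
p_any_face = sum([p_face] * 6)

print(p_one_or_two, p_two_sixes, p_any_face)
```

Here P(1 or 2) = 1/3, P(6 and 6) = 1/36, and the six faces together sum to 1.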

38
Q

Define probability distribution

A

Theoretical probability of each outcome

39
Q

Define frequency distribution

A

Observed frequency of each outcome

40
Q

After more trials what becomes true about the frequency distribution and probability distribution?

A

Frequency distribution approaches probability distribution

41
Q

When can we use binomial statistics?

A

Can be used when there are two groups (such as A and B or pass and fail)
NOTE: we can create these groups if we define some outcomes as “success” and the others as “failure” and classify other outcomes beneath these banners

42
Q

Give examples of the types of question the binomial distribution may be used for

A

Predict the probability of success in a single trial
Predict the proportion of successes in n trials

43
Q

What are requirements for binomial statistics?

A
  • 2 outcomes (P(success)=p and P(failure)=q) and p+q=1
  • Each trial is independent with equal p
  • Fixed no. trials
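
Under the requirements listed above, the probability of exactly k successes in n trials follows the standard binomial formula C(n, k) × p^k × q^(n-k). A minimal sketch; the function name `binomial_pmf` and the values n = 10, p = 0.5 are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    q = 1 - p  # probability of failure, so that p + q = 1
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: 10 fair coin flips, counting heads as "success".
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])    # probability of exactly 5 heads
print(sum(pmf))  # the probabilities of all possible counts sum to 1
```

With these values the probability of exactly 5 heads is 252/1024, roughly 0.246.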
44
Q

As number of trials increases what becomes true of discrete data?

A

Begins to resemble continuous data

45
Q

How can we approximate binomial distribution?

A

With the normal distribution (when the number of trials is large)

46
Q

How can we find the probability up to any point (normal distribution)?

A

Area under the graph up until that point

47
Q

Rules for hypothesis testing

A
  • Understand the certainty of a hypothesis test
  • Don’t base scientific decisions on hypothesis tests alone
  • Consider the wider picture and plausibility of results
48
Q

Which letter denotes significance level?

A

Alpha

49
Q

Which two hypotheses are needed for a hypothesis test?

A

H0: null hypothesis (no change)
HA: alternative hypothesis (covers all other possibilities)
These hypotheses must be mutually exclusive

50
Q

What do we assume about H0 in a hypothesis test?

A

H0=true

51
Q

What is referred to as the critical region?

A

The region of the distribution beyond the critical value(s), with total area equal to alpha; a result falling in it leads to rejecting H0

52
Q

When is the null hypothesis rejected in hypothesis testing?

A

p<alpha

53
Q

What is a tail?

A

Area at the end of the distribution

54
Q

How do we test both tails?

A

Two-tailed test

55
Q

How many critical regions are present in a two tailed test?

A

2

56
Q

If alpha=0.05 and a two-tailed test is performed, what % of values lie outside the critical region?

A

95%

57
Q

What is a p value?

A

The probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed

58
Q

What is a contingency table?

A

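In this context, a table that cross-tabulates the true state (H0 true or HA true) against the test decision (reject or fail to reject H0), showing all four possible outcomes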

59
Q

If H0 is true and we reject it, what is true?

A

False Positive
Type I error
We do not know what is true

60
Q

If H0 is true and we fail to reject H0 what is true?

A

There is a true negative
H0 is true

61
Q

If HA is true and we fail to reject H0 what is true?

A

False negative
Type II error
HA was true

62
Q

If HA is true and we reject H0 what is true?

A

True positive
H0 is untrue, this does not confirm HA

63
Q

If H0 is true, what are the possible outcomes/errors?

A

True negative (H0 is true and we fail to reject H0)
Type I error (H0 is true and we reject H0)

64
Q

At an alpha value of 0.05, how often would we expect a Type I error, if H0 is true?

A

5% Type I error
(95% true negative)

65
Q

What does statistical power tell us about a test?

A

How likely the test is to detect a true positive (correctly reject H0) when there really is a difference to detect

66
Q

When the HA is true, when do we fail to reject the null hypothesis? What type of error is this?

A

When the result falls outside the critical region (on the H0 side of the critical value)
This is a type II error and is shown where the HA curve overlaps the H0 curve on that side

67
Q

How is beta defined graphically?

A

The area under the HA curve that lies on the H0 side of the critical value, i.e. where the HA and H0 curves overlap (where HA is true)

68
Q

How is power of a test calculated?

A

Power=1-beta
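
To connect the graphical picture of beta with Power = 1 - beta, here is a rough sketch for one simple case: an upper one-tailed z test with a known standard deviation. The function name `z_test_power` and all parameter names are assumptions for illustration; other tests need their own sampling distributions.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper one-tailed z test (assumes known sigma and mu1 >= mu0)."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    # Critical value: the point on the H0 curve with area alpha above it.
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Beta: area of the HA curve below the critical value (type II error rate).
    beta = NormalDist(mu1, se).cdf(critical)
    return 1 - beta

# Hypothetical numbers: H0 mean 0, HA mean 1, sigma 2.
print(z_test_power(0, 1, 2, n=10))
print(z_test_power(0, 1, 2, n=30))  # more observations -> narrower curves -> more power
```

In this sketch, power rises when the sample size grows or when the HA mean moves further from the H0 mean, matching the cards on effect size and curve dispersion.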

69
Q

If the power of a test is 0.979, how often do you get type II error?

A

2.1% of the time

70
Q

If HA is closer to H0, the power is greater or smaller?

A

Smaller
It is more difficult to detect a true effect

71
Q

If a test has high power, what is true of the likely error rate?

A

There will be a lower rate of false negatives (type II error)

72
Q

How can the power of a test be increased graphically?

A

Reduce the overlap between the H0 and HA curves:
Make the curves narrower (less dispersion)
Increase the distance between the peaks (larger effect size)

73
Q

What is true of power if effect size is increased?

A

Power increases (less type II error)

74
Q

How is effect size increased?

A

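Increased trials
(this decreases curve dispersion; strictly, it separates the curves relative to their width rather than changing the true difference between the means)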

75
Q

What must be present for hypothesis testing?

A

There must be two hypotheses:
H0 - null hypothesis (no change/ effect)
HA - alternative hypothesis (mutually exclusive and covers all other options (different for one and two-tailed tests))

76
Q

Why is the p value not the probability of a false positive?

A

The p value is calculated assuming the null hypothesis is true, and a false positive can only occur when the null hypothesis is true. We cannot know whether the null hypothesis is true; we can only judge based on evidence.

77
Q

Define power

A

The probability of a true positive (correctly rejecting H0) for a particular HA

78
Q

Define Multiple Testing

A

Comparing and testing several conditions or treatments

79
Q

When is a two-sample t test performed?

A

When comparing the means of two independent samples with each other (e.g. control and drug)

80
Q

When is a one-sample t-test performed?

A

When comparing a sample mean to a known or hypothesised mean

81
Q

When may a paired t test be performed?

A

When samples are closely related to one another (such as measurements on the same subjects before and after a treatment)

82
Q

What are the assumptions of a t test?

A

The outcome (dependent) variable is continuous and the experimental (independent) variable is bivariate
Normal distribution
Equal Variance

83
Q

What is a bivariate variable?

A

A variable that contains two groups (also called a binary or dichotomous variable)

84
Q

What is a Q-Q plot?

A

A normal quantile-quantile plot compares quantiles of your data to theoretical quantiles for a normal distribution (if these match closely the data is normally distributed)

85
Q

What is the danger of performing many tests?

A

There is an increase in the probability of false positives
(FWER (family-wise error rate))

86
Q

Define FWER

A

Family-wise error rate is the probability of getting at least one false positive across a set (family) of tests when the null hypothesis is true in every test

87
Q

What calculation gives the probability of not getting a false positive in n tests?

A

(1-alpha)^n
in n tests

88
Q

What calculation gives the probability of at least one false positive in n tests?

A

1-(1-alpha)^n
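
A short sketch of how quickly this probability grows with the number of tests. It assumes the tests are independent and all use the same alpha; the function name `family_wise_error_rate` is made up for this example.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive in n_tests independent tests."""
    p_no_false_positive = (1 - alpha) ** n_tests
    return 1 - p_no_false_positive

for n_tests in (1, 5, 10, 20):
    print(n_tests, round(family_wise_error_rate(0.05, n_tests), 3))
```

At alpha = 0.05, a single test has a 5% chance of a false positive, but ten tests together have roughly a 40% chance of at least one.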

89
Q

Define F-test

A

Compares several samples with each other and compares variance within samples with that between samples

90
Q

What are other names for an F-test?

A

Analysis of variance (ANOVA)

91
Q

Why is an ANOVA done?

A

Compare means with one another to find statistical difference

92
Q

Define overall mean

A

Mean of sample means
(Add all means and divide by number of groups; this equals the mean of all observations when the groups are the same size)

93
Q

What are the two types of studies?

A

Observational
Experimental

94
Q

Define observational study

A

Makes observations without intervention

95
Q

Define interventional (experimental) study

A

A study where an intervention is made to test a hypothesis

96
Q

Define statistical or scientific variable

A

Any relevant condition, characteristic, number or quantity that can be measured, assessed or counted

97
Q

What is another name for independent variable?

A

Explanatory variable

98
Q

What is another name for dependent variable?

A

Response variable

99
Q

Define a confounding variable

A

One that could impact the measurement from your dependent variable in addition to your independent variable

100
Q

Define error

A

The difference between the result for a whole population and the result from our sample or experiment.

101
Q

What are the two main types of error we can control for?

A

Sampling error
Bias

102
Q

Define sampling error

A

The possibility that the sample is not a perfect representation of the population

103
Q

What distribution is shown by sampling error?

A

Normal (allowing for statistical testing)

104
Q

What are the main techniques for controlling error?

A

Replication
Balance
Blocking

105
Q

Why is replication a method of decreasing sampling error?

A

The more data we collect, the less influence random errors have on the result

106
Q

What are the two types of replicates?

A

Technical
Biological

107
Q

Define technical replicates

A

These are additional measurements or analyses taken from the same sample. They help account for variability introduced by the measurement process itself.

108
Q

Define biological replicates

A

These involve separate samples that are independently manipulated or tested under identical conditions

109
Q

Define Blocking

A

Grouping experimental units with similar properties

110
Q

Define Balance

A

This is the process of comparing groups of similar sizes

111
Q

Define bias

A

Error caused by a systematic difference between the estimate from the sample and the true value for the whole population

112
Q

In what stages of an investigation may bias occur?

A

Any
(Design, data collection, analysis, publication etc…)

113
Q

How can bias be controlled for?

A

Simultaneous control groups
Blinding
Randomisation

114
Q

Define a simultaneous control group

A

A group of subjects that is not exposed to the experimental treatment but is treated the same in all other ways

115
Q

What are the two types of control treatments?

A

Untreated control
Vehicle control

116
Q

What is an untreated control?

A

Subject in its native state with no treatment

117
Q

What is a vehicle control?

A

Subject undergoes treatment with everything but the exact thing being tested (e.g.: the drug)

118
Q

What is a best-available therapy control?

A

Testing against a pre-existing drug as opposed to a vehicle control

119
Q

Define a positive control

A

A control which defines what a positive result looks like

120
Q

Define a negative control

A

Result which defines what a negative result looks like

121
Q

Describe blinding

A

The process of concealing who has which treatment to limit the placebo effect

122
Q

Define randomisation

A

Assigning individuals to treatment groups at random so as not to introduce further bias

123
Q

What methods are used to investigate the relationships between 2 continuous variables?

A

Correlation
Regression

124
Q

What may correlation tell us about a relationship?

A

Its strength and direction

125
Q

What is denoted by “r”?

A

Correlation coefficient

126
Q

What is the range of “r”?

A

-1 to +1

127
Q

What would be the “r” value of a perfect positive linear correlation?

A

+1

128
Q

What would be the “r” value of a perfect negative linear correlation?

A

-1

129
Q

What does an “r” value between +/- 0-0.2 suggest?

A

Very weak or negligible correlation between the two variables

130
Q

What does an “r” value between +/- 0.2-0.4 suggest?

A

Weak or low correlation between the two variables

131
Q

What does an “r” value between +/- 0.4-0.7 suggest?

A

Moderate correlation between the two variables

132
Q

What does an “r” value between +/- 0.7-0.9 suggest?

A

Strong, high and marked correlation between the two variables

133
Q

What does an “r” value between +/- 0.9-1.0 suggest?

A

Very strong and very high correlation between the two variables

134
Q

What does the r^2 value tell us?

A

How much of the variation in one variable can be explained by the other
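
For reference, a minimal sketch of how r and r^2 can be computed from paired data using the usual Pearson formula. The function name `pearson_r` and the data are assumptions for illustration, not taken from the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of the covariance).
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical data: y rises with x but not perfectly.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(r, r ** 2)  # r gives strength and direction; r^2 gives the proportion of variation explained
```

A perfectly linear rising dataset gives r = +1 and a perfectly linear falling one gives r = -1.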

135
Q

In which types of experiments do we compare continuous variables?

A
  1. Looking for an association between variables where neither is experimentally manipulated
  2. Experimentally manipulating one variable and looking to see whether the other variable changes too
136
Q

What can we use to predict the value of a variable when we know its correlation to another?

A

Regression

137
Q

What makes a regression prediction more confident?

A

A stronger correlation (a correlation coefficient closer to +1 or -1)

138
Q

What is true of values around the line of best fit when there is a strong correlation?

A

There is little variability about the line of best fit

139
Q

When can a y=mx+c regression line be drawn?

A

When there is a linear correlation

140
Q

What is goodness of fit?

A

Assessment of how well a linear regression line fits data

141
Q

How can we judge how well a regression equation fits data?

A

Using the r^2 value
Looking at the residuals

142
Q

How is a linear regression drawn?

A

As a straight line through the data points

143
Q

Define fitted value

A

The value of y that the regression line predicts for a given value of x

144
Q

Define the residual

A

The vertical distance between a given point and its fitted value (observed y minus fitted y)

145
Q

How can we use the residual to check the goodness of fit of a linear regression?

A

Plot a residual plot - residual against fitted value - and observe if there are any patterns
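
To make fitted values and residuals concrete, here is a minimal least-squares sketch for a y = mx + c line. The function name `fit_line` and the data points are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data that is roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = fit_line(xs, ys)
fitted = [m * x + c for x in xs]                 # fitted value for each x
residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted

# Pairs to plot on a residual plot: fitted value on x, residual on y.
for f, res in zip(fitted, residuals):
    print(round(f, 2), round(res, 2))
```

For a good linear fit these pairs scatter around zero with no visible pattern. The residuals of a least-squares line with an intercept always sum to zero, so it is the pattern, not the sum, that is informative.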

146
Q

What does a pattern on a residual plot suggest?

A

A linear equation may not be appropriate for the data presented

147
Q

What does a residual plot look like where the linear relationship was the best possible fit?

A

Points are evenly scattered on either side of the zero line, with no pattern

148
Q

If a is explained by b, with a known value of b, can we predict a?

A

Yes using the linear regression

149
Q

If a is explained by b, with a known value of a, can we predict b?

A

No, we need to create a regression in the other direction to describe b in terms of a

150
Q

Define questionable research practices (QRPs)

A

Refers to a number of activities, often related to the misinterpretation of statistics, that occur in published scientific work

151
Q

Define cherry picking as a QRP

A

The practice of cherry picking refers broadly to only presenting one side of the story. Specifically in relation to statistics, this translates as choosing not to report parts of your analysis which do not agree with the story you are trying to tell.

This is often used to “tidy up” or create a “convincing” story

152
Q

Define P-hacking as a QRP

A

Manipulating your data or analysis until it results in a significant p value

153
Q

Give examples of P-hacking

A
  • checking the statistical significance before deciding whether to collect more data
  • stopping data collection as soon as results reflect those desired
  • excluding data after checking the impact on significance
  • adjusting models, without proper justification, on the basis of whether or not a significant result is obtained
  • rounding a p-value down to the threshold
  • hidden multiple testing, and therefore no p value adjustments
154
Q

Define HARKing as a QRP

A

Hypothesising After the Results are Known: presenting results that have been discovered as if they were expected or as if they were the main study aim (overstating prior knowledge of the study).
Presenting ad hoc or unexpected results in this way is misleading

155
Q

Define ad hoc

A

An unplanned or supplementary analysis conducted to explore specific aspects of data that weren’t the primary focus of the study. This is done on an as-needed basis to investigate particular comparisons or relationships not initially accounted for in the main analysis.

156
Q

Are QRPs evidence of academic misconduct?

A

No, they are questionable but not misconduct

157
Q

What are the two main forms of research misconduct?

A

Fabrication and falsification

158
Q

Define fabrication

A

Making up data or results

159
Q

Define falsification

A

The manipulation of research materials, data or results

162
Q

What are the assumptions of an ANOVA?

A

Data needs to be normally distributed

Data should be from independent observations, which means that there is no relationship between the observations in each group or between the groups themselves.

Equal variances between groups (Homogeneity of variances, Homoscedasticity)

163
Q

Define homoscedastic

A

The fundamental assumption that the variance of the errors (or residuals) should be constant across all levels of the independent variable(s)

(Violated homoscedasticity is known as heteroscedasticity)

164
Q

Define homogeneity

A

Refers to the similarity or uniformity of certain characteristics within a group or between groups.

165
Q

When doing an ANOVA how do you find the degrees of freedom (DF) between groups?

A

K-1
Where K is the number of groups being compared

166
Q

When doing an ANOVA how do you find the degrees of freedom (DF) within groups?

A

N-K
Where K is the number of groups being compared and N is the total number of observations/data points collected.

167
Q

Define the sum of squares

A

Quantifies variability between the groups of interest and within groups of interest in separate rows

168
Q

What is the overall sum of squares?

A

The sum of the squared differences between each datapoint and the overall mean, also called SST, for sum of squares (total).

169
Q

Define SSW

A

The sum of squares within the groups is the sum of the squared differences between each datapoint and the mean of the group it belongs to. This shows the variation within each individual group.

170
Q

Define SSB

A

The sum of squares between the groups is the sum, over every datapoint, of the squared difference between the mean of its group and the overall mean. This shows the variation between the groups.

171
Q

What is the maximum value of a datapoint before it is considered an outlier?

A

Q3+1.5 IQR

172
Q

What is the minimum value of a datapoint before it is considered an outlier?

A

Q1-1.5 IQR
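
The two outlier limits can be sketched with Python's standard library. The function name `iqr_fences` and the data are assumptions for illustration, and quartile conventions differ between tools: this uses the default method of `statistics.quantiles`, so R or a calculator may give slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

def iqr_fences(data):
    """Return (lower, upper) outlier limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(data, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
lower, upper = iqr_fences(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```

With this data only the value 50 falls outside the limits.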

173
Q

What is the nature of a binomial distribution?

A

The binomial distribution is discrete, dealing with the number of successes in a fixed number of trials.

174
Q

What is the nature of a normal distribution?

A

The normal distribution is continuous and is often associated with the distribution of measurements in a population.

175
Q

Which parameters characterise a binomial distribution?

A

The binomial distribution is characterized by the number of trials (n) and the probability of success (p).

176
Q

Which parameters characterise a normal distribution?

A

The normal distribution is characterized by the mean (μ) and standard deviation (σ).

177
Q

What must be borne in mind when calculating the critical value for a 2 tailed test?

A

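Alpha is split across the two tails: use alpha/2 in each tail, so the upper critical value is at 1-(alpha/2) and the lower critical value is at alpha/2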
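
As an illustration of splitting alpha across both tails, the sketch below looks up critical values on a standard normal curve. This assumes a z statistic; a t test would use the t distribution with the appropriate degrees of freedom instead.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha/2 in each tail.
upper_critical = standard_normal.inv_cdf(1 - alpha / 2)
lower_critical = standard_normal.inv_cdf(alpha / 2)

# One-tailed (upper) test for comparison: all of alpha in one tail.
one_tailed_critical = standard_normal.inv_cdf(1 - alpha)

print(lower_critical, upper_critical, one_tailed_critical)
```

For alpha = 0.05 the two-tailed critical values are about -1.96 and +1.96, further out than the one-tailed value of about 1.64.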

178
Q

What is the difference in the analysis of variance between a boxplot and an ANOVA?

A

A boxplot is a qualitative analysis whilst an ANOVA is quantitative

179
Q

What is the Mean sq. and how is it calculated (ANOVA)?

A

A column of the ANOVA output table.
The mean square is a variance estimate and is used to calculate the F-statistic in the next column.
Calculated by dividing the Sum of Squares by the DF on the same row.

180
Q

What is an F-statistic (ANOVA) and how is it calculated?

A

This is defined as the ratio between the Mean Squares between and within.
Calculated by Mean squares of row 1/mean squares of row 2.
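
The ANOVA cards above (degrees of freedom, SSB, SSW, mean squares and F) can be tied together in one small sketch. The function name `one_way_anova`, the dictionary keys and the three groups are assumptions for illustration; in practice a statistics package would also report the p value.

```python
def one_way_anova(groups):
    """Build the main entries of a one-way ANOVA table from a list of groups."""
    values = [v for group in groups for v in group]
    n_total = len(values)    # N: total number of observations
    n_groups = len(groups)   # K: number of groups
    overall_mean = sum(values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # SSB: squared distance of each group mean from the overall mean, counted once per datapoint.
    ssb = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means))
    # SSW: squared distance of each datapoint from its own group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    ms_between = ssb / df_between  # mean square = sum of squares / DF on the same row
    ms_within = ssw / df_within
    return {
        "SSB": ssb, "SSW": ssw,
        "df_between": df_between, "df_within": df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }

# Hypothetical data: three groups of three observations.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

For these made-up groups, SSB = 26 and SSW = 6, so the mean squares are 13 and 1 and the result would be written F(2, 6) = 13.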

181
Q

How do we use an F-statistic (ANOVA)?

A

If it is above a threshold (critical) value, the null hypothesis can be rejected

182
Q

What does a high F-statistic mean (ANOVA)?

A

More likely to be a statistically significant difference between groups.

183
Q

How do we report an ANOVA?

A

F(df between, df within) = F statistic, p = p value

184
Q

What test is used to determine where the differences between groups lie?

A

Post-hoc tests such as the Tukey Honest Significant Difference (HSD) test