All stats Flashcards

1
Q

Name the 2 broad categories data can be split into

A
  1. Categorical
  2. Quantitative

2
Q

What can categorical data be split into?

A
  1. Binary
  2. Nominal
  3. Ordinal
3
Q

What can quantitative data be split into?

A
  1. Discrete
  2. Continuous

4
Q

What is binary data?

A

Categorical data that can take only 2 possible values

5
Q

Give an example of binary data

A

Success/failure

Yes/No

6
Q

What is nominal data?

A

Categorical data with more than 2 categories that have no natural order

7
Q

Give an example of nominal data

A

Eye colour
Hair colour
Hair type

8
Q

What is ordinal data?

A

Categorical data whose categories have a natural order

9
Q

Give an example of ordinal data

A

Happiness rating on a scale of 1-10

Customer service rating of 1-5

10
Q

What is discrete data?

A

Numerical data that can only take distinct, countable values (usually whole numbers)

11
Q

Give examples of discrete data

A
  1. Number of kids
  2. Movie rating in stars

12
Q

What is continuous data?

A

Numerical data that can take any value within a range (measured rather than counted)

13
Q

Give examples of continuous data

A

Height
Time
Weight

14
Q

Name the best way to represent categorical data

A

In a bar chart

15
Q

Name the best way to represent continuous data

A

Histogram or box plot

16
Q

Define skewness

A

Skewness is a measure of the asymmetry of a probability distribution around its mean

17
Q

Name the 3 ways to describe skewness

A
  1. Left skew
  2. Symmetrical
  3. Right skew
18
Q

Describe the relationship between median and mean in a data set that is left skewed

A

Mean < median

19
Q

Describe the relationship between median and mean in a data set that is right skewed

A

Mean > median
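
A minimal sketch in Python of the mean/median relationship under skew. The data sets are made up for illustration: a few very large values pull the mean above the median (right skew), and mirroring the data reverses the relationship (left skew).

```python
from statistics import mean, median

# Hypothetical right-skewed sample: mostly small values, a few very large ones.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20, 40]
# Mirror image of the same sample, which gives a left-skewed data set.
left_skewed = [-x for x in right_skewed]

# Right skew: the long right tail drags the mean above the median.
print("right skew:", mean(right_skewed), ">", median(right_skewed))
# Left skew: the long left tail drags the mean below the median.
print("left skew:", mean(left_skewed), "<", median(left_skewed))
```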

20
Q

What is central tendency?

A

Measures that summarise the centre (typical value) of a data set

21
Q

Give examples of central tendency measures

A

Mean
Median
Mode

22
Q

What are variation measures?

A

Measures of spread or variability

23
Q

Give examples of variation measures

A
  1. Variance
  2. Standard deviation

24
Q

What is the standard deviation?

A

A measure of the average scatter around the mean

The greater the spread of the data, the greater the SD
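
A short Python sketch of how the variance and SD relate to spread. The two data sets are hypothetical; both have a mean of 10, but one is much more spread out. The function name `sample_sd` is only illustrative, and it uses the sample (n - 1) form of the variance.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    m = sum(values) / n
    # Variance: average squared distance from the mean (n - 1 for a sample).
    variance = sum((x - m) ** 2 for x in values) / (n - 1)
    return sqrt(variance)

tight = [9, 10, 10, 10, 11]   # values close to the mean
wide = [2, 6, 10, 14, 18]     # same mean, much more spread

print("tight SD:", round(sample_sd(tight), 2))
print("wide SD:", round(sample_sd(wide), 2))
```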

25
Q

What is normal distribution used to describe?

A

Used to describe continuous data that forms a bell shaped symmetrical curve

26
Q

What is a key characteristic of normally distributed data?

A

Mean, median and mode are all equal

27
Q

What symbol do we use to represent the mean?

A

μ

28
Q

What symbol do we use to represent the SD?

A

σ

29
Q

Give examples of data that could be normally distributed

A
Height
Age
Weight
Bone density
Exam scores
Blood pressure (BP)
30
Q

How do we check for normality?

A
  1. Look at the histogram: does it appear bell shaped?
  2. Are the mean, median and mode similar?
  3. Do about two-thirds (68%) of the data lie within 1 SD of the mean?
  4. Run numerical tests of normality
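
The informal checks above can be sketched in Python using only the standard library. The data here are simulated heights (an assumption for illustration, not real measurements). Formal tests such as Shapiro-Wilk are available in third-party packages, for example `scipy.stats.shapiro` in SciPy, and are not shown here.

```python
import random
from statistics import mean, median, stdev

random.seed(0)  # fixed seed so the sketch is reproducible
# Hypothetical sample: 5000 simulated heights (cm) from a normal distribution.
heights = [random.gauss(170, 10) for _ in range(5000)]

m, med, sd = mean(heights), median(heights), stdev(heights)
# Check 2: mean and median should be close for roughly normal data.
print(f"mean={m:.1f}  median={med:.1f}  sd={sd:.1f}")

# Check 3: about two-thirds (68%) of values should lie within 1 SD of the mean.
within_1sd = sum(1 for h in heights if abs(h - m) <= sd) / len(heights)
print(f"fraction within 1 SD of the mean: {within_1sd:.2f}")
```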
31
Q

Describe a Q-Q plot for normally distributed data

A
The points follow a straight (diagonal) line
32
Q

Give examples of numerical tests we can use to assess normality

A
  1. Kolmogorov-Smirnov
  2. Shapiro-Wilk

33
Q

What requirement must a quantitative data set fulfil before we can apply the central limit theorem to it?

A

Sample size must be larger than 30 (a common rule of thumb)

34
Q

What do μ and σ mean, and what do they determine on a curve for normally distributed data?

A

The mean and the standard deviation

Together they determine the position and shape of the curve

35
Q

What does μ mean and what does it determine on a curve for normally distributed data?

A

μ is the mean and it determines the line of symmetry on a bell curve

36
Q

What does σ mean and what does it determine on a curve for normally distributed data?

A

σ is the standard deviation and it determines the spread of data around the mean

37
Q

What does the empirical rule state?

A

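For normally distributed data, about 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean
This applies to every normal curve, as all can be standardised so that:
μ = 0
σ = 1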

38
Q

How much of the population lies within 1 standard deviation of the mean (mean +/- 1 SD)?

A

68%

39
Q

How much of the population lies within 2 standard deviations of the mean (mean +/- 2 SD)?

A

95%

40
Q

How much of the population lies within 3 standard deviations of the mean (mean +/- 3 SD)?

A

99.7%
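
The 68%, 95% and 99.7% figures can be checked with a short Python sketch using the standard normal distribution (μ = 0, σ = 1) from the standard library. The helper name `share_within` is only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal population within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent.
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```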

41
Q

Define population

A

The group of all items of interest

42
Q

Define sample

A

A set of data drawn from the population

43
Q

Define parameter

A

A descriptive measure of a population

44
Q

Define statistic

A

A descriptive measure of a sample

45
Q

What is inferential statistics?

A

Drawing conclusions or inferences about the characteristics of a population based on SAMPLE data

46
Q

What is descriptive statistics?

A

Using numerical calculations, graphs or tables to summarise and describe a data set

47
Q

What is a statistical inference?

A

The process of making an estimate, prediction or decision about a population based on the data from a sample

48
Q

What is standard error?

A

The standard deviation of the sampling distribution of the sample mean, estimated as SD / √n

49
Q

How do we calculate a confidence interval?

A

Sample statistic +/- (multiplier for how confident we want to be) * SE, where the multiplier is 1.96 for a 95% interval
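
A minimal Python sketch of a 95% confidence interval for a mean, using SE = SD / √n and the 1.96 multiplier. The blood pressure readings and the function name `confidence_interval_95` are hypothetical. The 1.96 multiplier comes from the normal distribution, so for a sample this small it is only an approximation.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_95(sample):
    """Approximate 95% CI for the mean: sample mean +/- 1.96 * SE."""
    n = len(sample)
    sample_mean = mean(sample)
    se = stdev(sample) / sqrt(n)   # standard error of the mean
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical systolic blood pressure readings (mmHg).
readings = [118, 122, 125, 130, 121, 119, 127, 124, 123, 126]
low, high = confidence_interval_95(readings)
print(f"mean={mean(readings):.1f}, 95% CI=({low:.1f}, {high:.1f})")
```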

50
Q

What does the sample statistic equal?

A

The sample mean (when we are estimating a population mean)

51
Q

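What do we mean when we say we are 95% confident?

A

We are 95% confident that the true population mean lies within this interval (if the study were repeated many times, about 95% of the intervals calculated would contain the true mean)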

52
Q

What is hypothesis testing?

A

Testing whether an observed difference is statistically significant or could plausibly be due to chance

53
Q

Talk through the steps of hypothesis testing

A
  1. Decide on the statistical question
  2. Assume the null hypothesis
  3. Predict the sampling variability assuming the null hypothesis
  4. Do the experiment
  5. Calculate the p value
  6. Carry out the hypothesis test: compare the p value with the significance level
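
The steps above can be sketched in Python with a two-sided z-test. This is one simple choice of test, assuming the population SD is known. The scores, the null mean of 100, the SD of 15 and the function name `two_sided_z_test` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, null_mean, known_sd):
    """Return (z, p) for H0: population mean equals null_mean."""
    # Sampling variability of the mean if the null hypothesis is true.
    se = known_sd / sqrt(len(sample))
    z = (mean(sample) - null_mean) / se
    # Two-sided p value: chance of a result at least this extreme under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical test scores; H0 says the population mean is 100 (SD 15).
scores = [104, 112, 98, 120, 109, 115, 101, 107,
          118, 95, 111, 106, 113, 102, 117, 108]
z, p = two_sided_z_test(scores, null_mean=100, known_sd=15)
reject_null = p < 0.05
print(f"z={z:.2f}, p={p:.3f}, reject null: {reject_null}")
```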
54
Q

When do we accept (fail to reject) our null hypothesis?

A

If the p value is greater than 0.05 (p>0.05)

There is no evidence of an association between the 2 factors

55
Q

When do we reject our null hypothesis?

A

If the p value is LESS than 0.05 (p<0.05)

There IS evidence of an association between the 2 factors

56
Q

Which gives us more information: a hypothesis test or a confidence interval?

A

Confidence interval

57
Q

What does a confidence interval for a difference overlapping with zero indicate?

A

There is no statistically significant difference and therefore we fail to reject the null hypothesis

58
Q

What is a type I error?

A

When you reject the null hypothesis when it is true (false positive)

59
Q

What is a type II error?

A

When you accept (fail to reject) the null hypothesis when it is false (false negative)

60
Q

What does power mean in terms of statistics?

A

The probability of finding a difference between 2 groups if one truly exists (the probability of NOT making a type II error)

61
Q

Do we want our study to have a high or low power?

A

High power (at least 0.8/80%)

62
Q

List some factors that affect the power

A
  1. Size of effect
  2. Standard deviation
  3. Sample size
  4. Significance level
63
Q

How does size of effect affect the power of our study?

A

A larger effect size increases the power, as the observed difference lies further from 0 and is easier to detect

64
Q

How does standard deviation affect the power of our study?

A

A larger SD decreases the power, as more variability gives a wider, flatter curve

65
Q

How does sample size affect the power of our study?

A

A larger sample size increases the power, as it narrows the curves (smaller standard error), so a true difference is more likely to fall within the “rejection” region

66
Q

How does significance level affect the power of our study?

A

A stricter (smaller) significance level, e.g. 0.01 instead of 0.05, decreases the power; a larger significance level increases it
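
A Python sketch of how the four factors change power, assuming a two-sided z-test comparing the means of two equal-sized groups with a common, known SD. The function name `power_two_group_z` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_group_z(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)       # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # edge of the rejection region
    shift = abs(effect) / se              # true difference in SE units
    # Chance that the test statistic lands in either rejection tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

baseline = power_two_group_z(effect=5, sd=10, n_per_group=50)
bigger_effect = power_two_group_z(effect=8, sd=10, n_per_group=50)
bigger_sd = power_two_group_z(effect=5, sd=15, n_per_group=50)
bigger_sample = power_two_group_z(effect=5, sd=10, n_per_group=100)
stricter_alpha = power_two_group_z(effect=5, sd=10, n_per_group=50, alpha=0.01)

print(f"baseline power: {baseline:.2f}")
print(f"larger effect: {bigger_effect:.2f}, larger SD: {bigger_sd:.2f}")
print(f"larger sample: {bigger_sample:.2f}, stricter alpha: {stricter_alpha:.2f}")
```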

67
Q

What is correlation?

A

Describes the strength and direction of the relationship between two variables

68
Q

What is regression?

A

Regards one variable as the predictor and one as the outcome

69
Q

What is the ‘predictor variable’?

A

Independent variable

70
Q

What is the ‘outcome variable’?

A

Dependent variable

71
Q

What assumptions do we make when looking at regression?

A
  1. Y is normally distributed at each value of X
  2. The variance of Y is the same at every value of X

72
Q

How do we calculate the residual for a data point?

A

Observed value - predicted value

73
Q

How do we obtain the predicted value when calculating regression?

A

We read it off the fitted regression line (a linear graph)

74
Q

What formula does a linear graph follow?

A

y = mx + c, where m is the gradient (slope) and c is the y-intercept
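
A minimal Python sketch of fitting y = mx + c by least squares and computing residuals as observed value minus predicted value. The data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares gradient m and intercept c for y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c

# Hypothetical predictor (x) and outcome (y) values.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

m, c = fit_line(xs, ys)
predicted = [m * x + c for x in xs]                  # values on the fitted line
residuals = [y - p for y, p in zip(ys, predicted)]   # observed - predicted

print(f"y = {m:.2f}x + {c:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```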

75
Q

List some functions of multivariate analysis

A
  1. Control for confounders
  2. Test for interactions between predictors
  3. Improve predictions
76
Q

Define risk ratio

A

Rate (risk) of the condition in the exposed group : rate of the condition in the unexposed group

77
Q

When are risk ratios used?

A

Used for categorical data

78
Q

What is an odds ratio?

A

Odds of the event occurring in the treatment group : odds of the event occurring in the control group

79
Q

What does an odds ratio of 1 mean?

A

No difference between control and treatment group

80
Q

What does an odds ratio other than 1 mean?

A

There is an association between the groups: above 1 means higher odds of the event in the treatment group, below 1 means lower odds
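
A short Python sketch of the risk ratio and odds ratio from a 2x2 table. The counts are hypothetical, and the function and argument names are only illustrative: `a` and `b` are events and non-events in the exposed (treatment) group, `c` and `d` are events and non-events in the unexposed (control) group.

```python
def risk_ratio(a, b, c, d):
    """Risk in group 1 (a events, b non-events) over risk in group 2 (c, d)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds in group 1 (a events to b non-events) over odds in group 2."""
    return (a / b) / (c / d)

# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed had the event.
rr = risk_ratio(20, 80, 10, 90)
odds = odds_ratio(20, 80, 10, 90)
print(f"risk ratio={rr:.2f}, odds ratio={odds:.2f}")

# Identical groups give a ratio of 1: no difference between them.
print(risk_ratio(10, 90, 10, 90), odds_ratio(10, 90, 10, 90))
```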

81
Q

What is survival analysis?

A

A statistical method for analysing longitudinal data on occurrence of events

82
Q

Name the curve commonly used to describe survivorship of study populations

A

The Kaplan-Meier curve
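
A minimal Python sketch of how the points on such a curve can be estimated (the product-limit method). The follow-up times and event flags are hypothetical, and the function name `kaplan_meier` is only illustrative. An event flag of 1 means the event was observed and 0 means the participant was censored (left the study without the event).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the proportion of those at risk who survive time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored cases both leave the risk set
    return curve

# Hypothetical follow-up times (months) and event flags.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```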

83
Q

What does a correlation coefficient of -1 mean?

A

Perfect negative relationship: as the x variable increases, y decreases

84
Q

What does a correlation coefficient of +1 mean?

A

Perfect positive relationship: as the x variable increases, y increases

85
Q

What does a correlation coefficient of 0 mean?

A

No linear association: as x increases, y shows no consistent tendency to increase or decrease
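
A short Python sketch of the Pearson correlation coefficient for the three cases covered by the cards above (+1, -1 and 0). The data sets and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls as x rises
r_zero = pearson_r(xs, [1, 3, 2, 3, 1])        # no linear trend

print(round(r_positive, 2), round(r_negative, 2), round(r_zero, 2))
```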