Statistics Flashcards

1
Q

Describe the different types of data

A
  1. Quantitative - A measurement that can be either discrete or continuous (discrete data take whole-number values, e.g. counts, whereas continuous measurements can take any value within a range, e.g. height).
  2. Qualitative - Objects are classified into groups, and the data can be either ordinal or nominal (in ordinal data the groups have a natural order, whereas in nominal data there is no order to the groups). Categorical data with only two values is binary.
2
Q

What is a stratified sample?

A

Used in a study where certain categories need to be represented. The population is divided into strata and a random sample is chosen from each of these.

3
Q

What type of data do pie charts and bar graphs usually represent?

A

Categorical variables

4
Q

Which graphs are used to visualise the distribution of continuous data?

A

Histograms
Stem and leaf plots
Box and whisker plots

5
Q

What are scatter plots used to visualise?

A

The relationship between two variables

6
Q

Why are there no gaps between the bars on a histogram?

A

The data they represent are continuous, whereas in bar charts the data are categorical or discrete.

7
Q

What is the total area of the columns equal to in a relative frequency histogram?

A

1

8
Q

What do scatter plots represent?

A

The relationship between two quantitative variables

9
Q

How do you calculate the strength of the relationship between the two variables in a scatter plot?

A

By calculating the correlation coefficient

10
Q

What is the line fitted to a scatter plot called?

A

Regression line

11
Q

Why is the mean not useful in skewed data? What would be a better estimate of this?

A

The mean is very sensitive to outliers, so in skewed data it is pulled towards the tail. The median is a better estimate because it is not sensitive to outliers.

12
Q

What is the adjustment that must be made when calculating sample variance to make an unbiased estimate of population value?

A

Denominator must be n-1, not n
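
As a small illustration of the n - 1 adjustment, the sketch below compares the two denominators using Python's standard statistics module. The data values are made up for the example and are not part of the flashcards.

```python
from statistics import pvariance, variance

# Illustrative data (an assumption, not from the flashcards)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Denominator n: describes this data set only and tends to
# underestimate the population variance when used on a sample
var_n = pvariance(sample)

# Denominator n - 1: the unbiased estimate of the population variance
var_n_minus_1 = variance(sample)

print(var_n, var_n_minus_1)
```

The n - 1 version is always slightly larger, and the gap shrinks as the sample size grows.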

13
Q

What does a larger standard deviation tell you about the spread of the data?

A

Large SD = Wide spread of data

14
Q

What does standard deviation measure?

A

The spread of the data around the mean

15
Q

What does positively skewed mean?

A

Most values lie towards the bottom end of the range with a tail to the right (larger end of the range)

16
Q

Give two measurements in healthcare that are most often positively skewed?

A

Units of alcohol drunk or number of cigarettes smoked.

17
Q

What does negatively skewed mean?

A

Most values lie towards the upper end of the range with a tail to the left.

18
Q

If you get a coefficient of skewness of 0 what does this mean?

A

Data distribution is symmetrical

19
Q

If you get a coefficient of skewness of 1 what does this mean?

A

Positive skew

20
Q

If you get a coefficient of skewness of -1 what does this mean?

A

Negative skew

21
Q

When do you use the normal distribution?

A

Continuous variables such as lengths, heights and weights.

22
Q

When do you use the binomial distribution?

A

Binary data such as alive and dead, male and female

23
Q

When do you use the Poisson distribution?

A

Counts of rare events, and events occurring at random in time or space.

24
Q

What are the characteristics of the normal distribution?

A
  • Bell shaped
  • Single central peak
  • Symmetrical
  • Equal mean, median and mode
  • Continuous
  • Takes values between -infinity and +infinity
25
Q

What are the mean and standard deviation of the standard normal distribution?

A

Mean 0

Standard Deviation 1

26
Q

Describe how you would standardise any normally distributed variable?

A

Subtracting the mean and dividing by the standard deviation:

((Any of the data values) - Mean) / Standard deviation
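
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the height values are assumptions made for the example, and the standard deviation is the sample version (n - 1 denominator).

```python
from statistics import mean, stdev

def standardise(values):
    """Return z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator
    return [(v - m) / s for v in values]

# Illustrative heights in cm (made up for the example)
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardise(heights_cm)
print(z_scores)
```

After standardising, the values have mean 0 and standard deviation 1, which is what allows a single table of normal probabilities to be used.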

27
Q

Why do we standardise normal data?

A
  • To allow us to compare data
  • To perform more advanced statistical tests
  • If 0 is in the centre the centiles are easier to calculate
  • There is only one table of probabilities for normal data
28
Q

What is the area under the normal probability density function curve?

A

1

29
Q

How do you calculate the 95% reference ranges of a set of normally distributed data?

A

mean +/- 1.96 x Standard Deviation
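
A minimal sketch of that calculation, assuming the data are held in a plain Python list. The function name and the example values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import mean, stdev

def reference_range_95(values):
    """Return (lower, upper) = mean -/+ 1.96 x standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return m - 1.96 * s, m + 1.96 * s

# Illustrative values (mean 5, standard deviation 1)
lower, upper = reference_range_95([4.0, 5.0, 6.0])
print(lower, upper)
```

The range is only meaningful when the data are plausibly normally distributed.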

30
Q

If a population is believed to have a normal distribution with a mean (μ) and a KNOWN standard deviation (σ) then where are 95% of the data values expected to lie?

A

Mean +/- 1.96 x Standard Deviation

31
Q

What formal statistical test measures how close the data are to a normal distribution?

A

Shapiro-Wilk statistic

32
Q

Give examples of ways of transforming data and in what circumstance you would use each one?

A
  1. Logarithmic: Variances are proportional to the mean, fairly skewed data
  2. Square root: Fairly skewed, counts
  3. Reciprocal: Highly skewed data
  4. Cube transformation: Data relating to volumes
  5. Logit: Proportions
33
Q

How do you make the sample standard deviation an unbiased estimate of the population standard deviation?

A

Calculate it with denominator n - 1, not n

34
Q

What is a confidence interval?

A

The range we would expect, given a certain level of confidence, to include the population parameter

35
Q

Which is wider: a 95% or a 99% confidence interval?

A

99%

36
Q

What is standard error?

A

The standard deviation of the sample mean, i.e. how much the sample mean would vary from one sample to another

37
Q

How do you calculate standard error?

A

Standard deviation / √n, where n is the number of items in the sample
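
As a quick illustration, the standard error of the mean can be computed as below. The data are made up for the example and the function name is hypothetical.

```python
from math import sqrt
from statistics import stdev

def standard_error(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Illustrative data (an assumption, not from the flashcards)
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
print(round(se, 3))
```

Because n is in the denominator, the standard error shrinks as the sample size increases, even if the standard deviation stays the same.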

38
Q

If you assume that the sample mean is approx normally distributed then where would you expect 95% of samples in the population to lie?

A

Sample mean +/- 1.96 x standard error of the sample mean

39
Q

When do you use the T distribution?

A

When estimating the mean in normally distributed populations when the sample size is small and the population standard deviation is unknown.

40
Q

Why do we do a hypothesis test?

A

To assess the validity of a claim about a population parameter.

41
Q

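In the t distribution at what value of the test statistic do you reject the null hypothesis?

A

When its absolute value exceeds the critical value for the chosen significance level and degrees of freedom (n - 1). At the 5% level this critical value approaches 1.96 (+ve or -ve) as the sample size becomes large.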

42
Q

What is a type 1 error?

A

Rejecting a true null hypothesis

43
Q

What is a type 2 error?

A

Accepting (failing to reject) a false null hypothesis

44
Q

What does the level of significance mean in a study in relation to errors?

A

The level of significance = the probability of making a Type 1 error. Usually this is set at 5% (95% confidence level)

45
Q

What is the generally accepted risk of making a type 2 error?

A

0.2 (20%)

46
Q

How do you reduce the risk of a type 2 error?

A

Increasing sample size

47
Q

What is meant by a test statistic?

A

A measure of the difference between what is expected if the null hypothesis were true and what is observed.

48
Q

What is the general formula of a test statistic?

A

(Observed value - Expected value) / Standard Error

49
Q

What does the z statistic measure?

A

Test of a mean using the normal distribution

50
Q

What does the t statistic measure?

A

T test for mean

51
Q

What does the F statistic measure?

A

Variance (it compares the variances of two samples)

52
Q

What does the χ² (chi-squared) statistic measure?

A

Chi squared test for proportions

53
Q

What is a decision rule?

A

A statement of the conditions (value of the test statistic) for which the null hypothesis will be rejected.

54
Q

If the Z statistic is 2.3 what do we conclude?

A

Reject the null hypothesis at the 5% significance level, as 2.3 is higher than the critical value of 1.96
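
The decision above can be checked with Python's standard library. This is a sketch assuming a two-tailed test at the 5% level; the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.3       # observed test statistic
alpha = 0.05  # significance level (two-tailed)

standard_normal = NormalDist(mu=0, sigma=1)

# Critical value that leaves alpha / 2 in each tail (about 1.96)
critical_value = standard_normal.inv_cdf(1 - alpha / 2)

# Two-tailed p value for the observed z
p_value = 2 * (1 - standard_normal.cdf(abs(z)))

reject_null = abs(z) > critical_value
print(round(critical_value, 2), round(p_value, 4), reject_null)
```

Comparing |z| with the critical value and comparing the p value with the significance level always give the same decision.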

55
Q

What is the difference between inferences made in the normal distribution as opposed to the t distribution?

A

In the t distribution the critical values depend on the sample size (degrees of freedom), whereas in the normal distribution they are fixed.

56
Q

What happens as the sample size increases in a t distribution?

A

The critical value approaches that of the normal distribution

57
Q

What is Welch’s t test?

A

An approximation to a T test when samples are known to have arisen from normal distributions with unequal variances.

58
Q

When is Welch’s t test used?

A

To compare samples with unequal variances

59
Q

What is the F statistic and what does it measure?

A

It is the ratio of the variances of the two samples and is used as part of the independent t test to check whether the variances are equal.

60
Q

What is s squared (used in the independent T test) a measure of?

A

An estimation of the population variance

61
Q

When can inferences about the number of successes in a binomial trial, and the population proportion be based on normal distribution?

A

When np and n(1-p) are greater than 5

62
Q

What equation would you use to get the confidence interval for % success in a binomial trial? E.g. “A new drug claims a 70% success rate. 200 patients take part in a study to test this. What is the 99% confidence interval for the % success?” Carry out this calculation.

A
  1. The confidence limits are p +/- 2.58 x standard error
  2. p = 0.7 (70% success rate)
  3. Standard error = √(p(1-p)/n) = √(0.7 x 0.3 / 200) = 0.0324
  4. Answer = 0.7 +/- 2.58 x 0.0324 = 0.62 to 0.78
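
The calculation in this card can be reproduced with a short Python sketch. The function name is hypothetical; the inputs (p = 0.7, n = 200, multiplier 2.58 for 99% confidence) are the ones given in the card.

```python
from math import sqrt

def proportion_confidence_interval(p, n, z):
    """Return (lower, upper) = p -/+ z x sqrt(p(1 - p) / n)."""
    standard_error = sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin

# 99% confidence interval for a 70% success rate in 200 patients
lower, upper = proportion_confidence_interval(p=0.7, n=200, z=2.58)
print(round(lower, 2), round(upper, 2))
```

Swapping 2.58 for 1.96 gives the narrower 95% interval.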
63
Q

A new treatment claims a 70% success rate. In a group of 200 patients 128 show improvement. Does this data support the claim of 70% effectiveness? (Use the number of successes formula)

A

Steps involved:
1. Binomial situation with n = 200, p = 0.64 (128/200) and 1 - p = 0.36.
2. np = 128 and n(1 - p) = 72. Since both of these are over 5 we can use the normal distribution.
3. To test the hypothesis that the treatment is 70% effective, we test whether the mean number of successes is 140 (0.7 x 200).
4. Null hypothesis: mean number of successes = 140.
Alternative hypothesis: mean number of successes is not equal to 140.
5. 5% significance level.
6. Test statistic z = (np - X) / √(np(1 - p)), where X is the population/reference number of successes.
7. z = (128 - 140) / √46.08 = -12 / 6.79 = -1.77
8. The absolute value (1.77) is less than 1.96, therefore we accept (do not reject) the null hypothesis that the population mean = 140, and so the data are consistent with a success rate of 70%.
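
The steps above can be sketched in Python as follows. The sketch follows the card's arithmetic, which uses the sample proportion (0.64) in the standard error; some textbooks use the hypothesised proportion (0.7) there instead, which gives a slightly different z. The function name is hypothetical.

```python
from math import sqrt

def successes_z_test(successes, n, claimed_p):
    """z for the number of successes, with the SE based on the sample proportion."""
    p = successes / n                       # sample proportion, 0.64 here
    expected = n * claimed_p                # reference number of successes, 140
    standard_error = sqrt(n * p * (1 - p))  # sqrt(46.08), about 6.79
    return (successes - expected) / standard_error

z = successes_z_test(successes=128, n=200, claimed_p=0.7)
reject_null = abs(z) > 1.96  # two-tailed test at the 5% level
print(round(z, 2), reject_null)
```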

64
Q

A chiropractor claims that 80% of his clients show relief from back pain. A random sample of 40 patients shows that 76% obtained pain relief. Is the chiropractor’s claim false at the 5% significance level? (Use the proportion of successes formula)

A
  1. Use the formula z = (p - p0) / √(p(1-p)/n), with p = 0.76, p0 = 0.8 and n = 40.
  2. Answer: z = -0.04 / 0.0675 = -0.59.
  3. The absolute value is less than 1.96, so accept (do not reject) the null hypothesis.
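
A minimal Python sketch of this calculation is shown below. As in the card's answer, the standard error is based on the sample proportion (0.76); that choice and the function name are assumptions of the sketch.

```python
from math import sqrt

def proportion_z_test(p, p0, n):
    """z = (p - p0) / sqrt(p(1 - p) / n), SE based on the sample proportion p."""
    standard_error = sqrt(p * (1 - p) / n)
    return (p - p0) / standard_error

z = proportion_z_test(p=0.76, p0=0.80, n=40)
reject_null = abs(z) > 1.96  # two-tailed test at the 5% level
print(round(z, 2), reject_null)
```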
65
Q

What is meant by regression analysis?

A

It describes the nature of the relationship between variables (e.g. linear), which enables predictions to be made.

66
Q

What is correlation?

A

The extent of the association between two variables

67
Q

What do we use to measure the strength of the linear relationship between two variables?

A

Correlation coefficient

68
Q

What is the most commonly used measure of association between variables?

A

Pearson’s correlation coefficient (r)

69
Q

If Pearson’s correlation coefficient is 0.96 what can you say about the relationship between the two variables?

A

Very strong positive correlation

70
Q

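What will happen to the value of r as the sample size increases?

A

r itself does not necessarily increase, but it is estimated more precisely, so smaller values of r become statistically significant.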

71
Q

What conditions must be met in order to use the r statistic?

A
  1. At least one variable must be normally distributed.
  2. They have been measured on a random sample.
  3. The pairs of variables are independent.
72
Q

What does r squared tell you?

A

It measures the proportion of the variation in the dependent variable (y) which is attributable to its linear relationship with variable x.
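
To make r and r squared concrete, the sketch below computes Pearson’s r from its definition and then squares it. The data and function name are made up for the example.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's r: sum of cross-products / sqrt(Sxx * Syy)."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative paired data (an assumption, not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 2))
```

Here r squared is 0.6, so about 60% of the variation in y is attributable to its linear relationship with x.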

73
Q

If a scatter plot is approximately elliptical shaped what does it mean?

A

That both variables are approximately normally distributed

74
Q

What is the general equation for the regression line? What do the letters mean?

A
y = a + bx
a = Intercept (value of y when x = 0) 
b = Slope (Change in y when x increases by one unit)
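
As an illustration of y = a + bx, the sketch below estimates a and b by ordinary least squares, which is the usual way of fitting the line although the card does not name a method. The data and function name are made up.

```python
from statistics import mean

def fit_regression_line(x, y):
    """Return (a, b) for y = a + bx using least squares."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: change in y per unit increase in x
    a = mean_y - b * mean_x  # intercept: value of y when x = 0
    return a, b

# Illustrative data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, b = fit_regression_line(x, y)
predicted_y = a + b * 6  # prediction for a new x value
print(a, b, predicted_y)
```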
75
Q

What assumptions must be met for regression methods to be used?

A
  1. Correlation between x and y is significant
  2. For each value of the x variable the values of the y variable have a normal distribution
  3. The variances of the normal distributions are equal
76
Q

What is the null hypothesis for the Shapiro-Wilk statistic?

A

That the data is plausibly normally distributed

77
Q

What exactly does the Shapiro-Wilk statistic measure?

A

A measure of linearity between points in a normal plot

78
Q

If the Shapiro-Wilk statistic is closer to 1 what does this mean?

A

Likely to be normally distributed.

79
Q

Which is more powerful: Observational or experimental studies?

A

Experimental

80
Q

What kind of study is it if each subject is only observed once?

A

Cross sectional

81
Q

What is a parallel group design study?

A

Treatment and control groups are being measured at the same time.

82
Q

What is the name given to a parallel group trial which continues until a difference between the treatment and control groups becomes apparent?

A

Sequential trial

83
Q

What is a crossover design trial?

A

A study where each subject acts as their own control

84
Q

What must be taken into consideration in a crossover trial?

A

That there is no carry-over effect from one treatment to another.

85
Q

What does case controlled mean in a study?

A

When each individual subject who is receiving the treatment is matched for factors critical to the outcome with a subject in the control group.

86
Q

What is a cohort trial?

A

A group of subjects who are initially disease free are followed over time. They are likely to be exposed to a range of factors, which will be noted and the information used to establish risk factors for the disease

87
Q

List some problems with cohort trials?

A
  • Large, costly and take many years to deliver results
  • Unsuitable for rare diseases

88
Q

What is a case control study?

A

A retrospective study of diseased subjects. The range of factors to which they have been exposed is reviewed to establish risk factors for a disease.

89
Q

List some advantages of a case control study?

A

Cheap

Quick

90
Q

List some problems with case control studies?

A

Prone to bias eg patients who have the disease are more likely to have thought about risk factors and hence remember them better.

91
Q

What is a quasi experimental study?

A

A non-randomised study, and therefore very susceptible to bias. The researcher decides which groups to put people in. Conclusions drawn are usually limited.

92
Q

If crossover trials are more powerful than parallel group trials what does this mean about the number of subjects required?

A

Fewer subjects are required in a crossover trial.

93
Q

The main weakness of a crossover trial is the danger of one treatment having a carry over effect. Give two ways in which this can be overcome?

A
  1. Having a wash-out period between successive treatments.
  2. Randomising the order of allocation of treatments.

94
Q

Describe the difference between point and interval estimation testing?

A
  1. Point estimates of population parameters are derived from the corresponding sample statistics, e.g. the mean, SD or proportion of successes. So if the sample found 75% of people to be in favour of something, this is the point estimate for the proportion of the population who are in favour.
  2. An interval estimate is the interval in which we would expect, given a certain level of confidence, the population parameter to lie. The upper and lower values of the confidence interval are the confidence limits.
95
Q

In relation to skew when would mean and median be appropriate?

A

Normal distribution = Mean is best

Skewed = Median is best

96
Q

When is non parametric testing of the average of a single sample indicated?

A
  1. The sample is small and the data are not plausibly normal.
  2. Transformations to address the problem cannot be found.

97
Q

What two tests are used to test hypotheses about non-parametric data?

A

Sign Test

Wilcoxon signed rank test

98
Q

What is the sign test based on?

A

The number of observations above and below the hypothesised median. If the null hypothesis is true then these should be equal because the median is a middle observation.
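
The idea can be sketched in Python: count the observations above and below the hypothesised median, then ask how surprising that split is under a binomial distribution with p = 0.5. Dropping observations equal to the hypothesised median is a common convention and an assumption of this sketch, as are the data and function name.

```python
from math import comb

def sign_test(values, hypothesised_median):
    """Return (above, below, two-sided p value) for the sign test."""
    above = sum(v > hypothesised_median for v in values)
    below = sum(v < hypothesised_median for v in values)
    n = above + below  # ties with the median are dropped
    k = min(above, below)
    # Probability of a split at least this uneven in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return above, below, p_value

# Illustrative data with a hypothesised median of 10
observations = [12, 14, 9, 15, 13, 11, 16, 8, 17, 18]
above, below, p_value = sign_test(observations, hypothesised_median=10)
print(above, below, p_value)
```

With 8 values above and 2 below, the p value is about 0.11, so the null hypothesis would not be rejected at the 5% level.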

99
Q

What is the Wilcoxon signed rank test based on?

A

Based on ranks and therefore incorporates some measure of the actual values

100
Q

What is meant by the p value?

A

The probability, assuming the null hypothesis is true, of obtaining a statistical summary (such as the sample mean difference between two compared groups) equal to or more extreme than the actual observed results. When the p value is less than or equal to the significance level, you reject the null hypothesis.

101
Q

What is the non-parametric equivalent of Pearson’s product moment correlation?

A

Spearman Rank correlation

102
Q

What is the non-parametric equivalent of the paired sample t test?

A

Wilcoxon paired test

103
Q

What is the non-parametric equivalent of the single sample t test?

A

Sign test

104
Q

What is the non-parametric equivalent of the independent samples t test?

A

Mann Whitney Test

105
Q

If a question asks you to summarise the data in a normally distributed variable, what summary measures would you use?

A

Mean

Standard Deviation

106
Q

If a question asks you to summarise non-parametric data, what summary measures would you use?

A

Median (better for skewed data)

Inter quartile range (tells you of the variation around the median)

107
Q

If a question asks you to compare the sample against a population value and the data are normally distributed, what test would you do?

A

z/ One sample T test

108
Q

If a question asks you to compare the sample against a population value and the data are not normally distributed, what test would you do?

A

Wilcoxon/Sign test

109
Q

If a question asks you to compare two independent groups and the data are normally distributed, what test would you do?

A

z/ Unpaired T test

110
Q

If a question asks you to compare two dependent groups and the data are normally distributed, what test would you do?

A

Paired z/T test

111
Q

If a question asks you to compare two independent groups and the data are not normally distributed, what test would you do?

A

Mann Whitney

112
Q

If a question asks you to compare two dependent groups and the data are not normally distributed, what test would you do?

A

Wilcoxon paired test

113
Q

If the question asks you to examine the relationship between categorical variables, what test would you use?

A

Chi squared

114
Q

What test would you use to compare the variances of two groups?

A

Fisher’s F test

115
Q

What are the assumptions that must be met in order for the chi squared statistic to be used?

A

80% or more of the cells have an EXPECTED VALUE of greater than 5
All expected frequencies are greater than 1
The total sample size is greater than 20
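
These three checks can be sketched in Python for a table of observed counts. Expected counts are computed in the usual way (row total x column total / grand total). The function names and example tables are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared_assumptions_met(table):
    """Check the three rules of thumb listed in the card above."""
    expected = [e for row in expected_counts(table) for e in row]
    total = sum(sum(row) for row in table)
    enough_large_cells = sum(e > 5 for e in expected) / len(expected) >= 0.8
    no_tiny_cells = all(e > 1 for e in expected)
    enough_subjects = total > 20
    return enough_large_cells and no_tiny_cells and enough_subjects

# Illustrative 2 x 2 tables of observed counts (made up)
large_table = [[10, 20], [30, 40]]
small_table = [[1, 2], [3, 4]]

print(chi_squared_assumptions_met(large_table))
print(chi_squared_assumptions_met(small_table))
```

The first table passes all three checks; the second fails because the sample is too small and no expected count exceeds 5.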