Statistics Flashcards

1
Q

What is descriptive statistics?

A
  • Describes the range of values
  • Identifies central tendency e.g. mean (average), median
  • Describes the distribution of the whole set e.g. varied, similar
  • Identifies outliers
  • Describes percentages

You must know the type of data to do this

2
Q

What is categorical data?

A

Nominal
- Discrete categories that are mutually exclusive and unordered e.g. sex, blood group

Ordinal
- Discrete categories that are mutually exclusive and ordered (ranked) e.g. disease stage –> cannot be in more than one category

Used in quantitative research

3
Q

What is continuous data?

A

‘Scale’ variables e.g. counts and measures

Numerical and discrete
- e.g. counts of days

Numerical and continuous
- e.g. age, height

Used in quantitative research

4
Q

How can data be summarised?

A

Bar charts
Box + whisker plot –> e.g. when presenting median values
Line graph –> continuous data changing over time
Scatter plot –> 2 sets of continuous data e.g. grip strength vs arm strength
Pie chart
Histogram –> a development of the bar chart that shows the distribution of continuous data by grouping values into intervals (bins)

5
Q

How do you describe central tendency?

A

Mean
- Sum of all values divided by the sample size
- A poor measure of central tendency when the data are not normally distributed, as it is pulled towards outliers

Median
- The middle value when the data are ordered, i.e. the 1/2(n+1)th value
- Can be used when the data are not normally distributed

Mode
- The most frequently occurring value
- Can be used with ordinal (and nominal) data

If data are normally distributed, use the mean and SD
If data are skewed, use the median and interquartile range, or the mode and range
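
As a rough illustration of why the median is preferred for skewed data, the sketch below uses Python's standard `statistics` module on a made-up sample. The values and variable names are hypothetical and not taken from any study.

```python
import statistics

# Hypothetical length-of-stay data in days; 30 is a deliberate outlier.
stays = [2, 3, 3, 4, 5, 6, 30]

mean_stay = statistics.mean(stays)      # sum of all values / sample size
median_stay = statistics.median(stays)  # the 1/2(n+1)th ordered value
mode_stay = statistics.mode(stays)      # most frequently occurring value

print(mean_stay, median_stay, mode_stay)
```

The single outlier drags the mean well above the median, while the median and mode stay close to where most of the values sit.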

6
Q

Describe standard deviation

A

Used for normally distributed data to describe the spread of the values
Describes how far the values of the whole group typically lie from the mean
A small SD indicates that most values are close to the mean

A z-score is the number of SDs a value lies away from the overall mean
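
A minimal sketch of the SD and z-score calculation, assuming a small made-up sample (the scores below are hypothetical). It uses the sample SD, which divides by n - 1.

```python
import statistics

# Hypothetical grip-strength scores (kg).
scores = [30, 32, 34, 36, 38]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD (divides by n - 1)

# z-score: how many SDs each value lies above (+) or below (-) the mean
z_scores = [(x - mean) / sd for x in scores]
print(mean, sd, z_scores)
```

A value equal to the mean has a z-score of 0, and the z-scores of the whole sample sum to 0.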

7
Q

What central tendency is used for different distributions of data?

A

If data are normally distributed, use the mean and SD
If data are skewed, use the median and interquartile range, or the mode and range

8
Q

What are confidence intervals?

A

Identify a range in which we can be confident that the ‘true’ population value (e.g. the population mean) will lie
A 95% CI is a range that would contain the true population value in 95% of repeated samples; it is not the range within which 95% of individuals lie
95% CI = mean ± 1.96 × standard error
A wide 95% CI indicates a high degree of uncertainty in the results
Confidence limits define the lower and upper values of a confidence interval
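
The formula can be sketched in a few lines of Python. The sample is hypothetical, and the standard error is taken as SD divided by the square root of n, which is the usual definition but is not spelled out on the card.

```python
import math
import statistics

# Hypothetical sample of resting heart rates (beats per minute).
sample = [62, 65, 68, 70, 71, 73, 75, 78, 80, 88]

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 1.96 is the normal-distribution multiplier for 95%; with a sample this
# small a t-distribution multiplier would give a slightly wider interval.
lower = mean - 1.96 * se  # lower confidence limit
upper = mean + 1.96 * se  # upper confidence limit
print(f"mean={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

Because the standard error shrinks as n grows, a larger sample with the same SD gives a narrower interval.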

9
Q

What is inferential statistics?

A

The process of using data obtained from a small group of elements (sample) to make estimates and test hypotheses about the characteristics of a larger group of elements (population)

Sample must accurately represent the population
Used with quantitative data, given an appropriate research question, an appropriate research design and an adequate sample size

10
Q

How can a study be underpowered?

A

If the sample size is too small, or confounding data undermine whether you can support/refute the null hypothesis –> the stats will be underpowered
Statistical methods can still be run, but the results must be reported as trends only

11
Q

What can you interpret from inferential statistics?

A
  • The relationship/association between variables e.g. correlation coefficients
  • The difference between two or more groups
  • How likely it is that a result this extreme would occur by chance alone (p-values)
12
Q

What are P-values?
What do they show?

A

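The probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true (often loosely described as the likelihood that the result occurred by chance)

p=0.5 (a 50% chance)
p=0.05 (a 5% chance)

The lower the p-value, the less likely it is that the observed effect is due to chance alone

The p-value is compared against the significance level, known as the alpha value (usually 0.05). A p-value larger than 0.05 is not significant
p<0.05 ‘significant’
p<0.01 ‘highly significant’
p<0.001 ‘very highly significant’

The smaller the p-value, the stronger the evidence against the null hypothesis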
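
One way to see what a p-value measures is an exact permutation test, sketched below on hypothetical scores. This technique is not described on the card; it is swapped in here only because it computes the p-value directly by counting how often chance relabelling of the groups gives a difference at least as large as the one observed.

```python
from itertools import combinations

# Hypothetical pain scores for two small groups.
treatment = [3, 4, 2, 5]
control = [6, 7, 5, 8]

def mean(values):
    return sum(values) / len(values)

observed = abs(mean(treatment) - mean(control))
pooled = treatment + control
n = len(treatment)

# Under the null hypothesis the group labels are arbitrary, so try every
# way of splitting the pooled scores into two groups of the same sizes.
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
        extreme += 1

p_value = extreme / total  # two-sided exact permutation p-value
print(p_value)
```

With these numbers only 4 of the 70 possible splits are as extreme as the observed one, so the p-value lands just above the conventional 0.05 threshold.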

13
Q

What is the difference between parametric and non-parametric data?

A

Parametric tests have more power than non-parametric i.e. you are less likely to make a type II error

Parametric data:
- Assumed normal distribution
- Assumed homogeneity of variance across groups (the spread of scores around the mean is equal)
- Data sets are independent
- Data are numerical and scale
- Data sets are… interval, continuous with an equal distance between values OR ratio, continuous with an equal distance between values and a true zero

Parametric tests are for data with a normal distribution
Non-parametric tests are for data with non-normal distributions

Non-parametric tests suit data that are:
- Skewed
- Bimodal
- From a small sample
- Flat or very peaked in distribution

14
Q

How do you find out if you have normal data?

A

Use the Shapiro-Wilk test for fewer than 2000 cases
For more than 2000 cases use the Kolmogorov-Smirnov test
Levene’s test compares the spread of two groups: it checks homogeneity of variance rather than normality

15
Q

What is a T-test?

A

Used with parametric data
Compares two sample groups by comparing their means relative to the spread of their distributions
Tests the probability that the samples come from the same population

Can be
Independent - two groups made up of different people
Paired - same people measured twice

Two-tailed testing means a difference between the groups is tested for in either direction
Takes the two means and SDs, looks at the distributions and assesses how much the two groups overlap
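
A minimal sketch of both versions of the t statistic, using only the standard library. The function names and data are hypothetical, the independent version assumes equal variances (Student's t), and the p-value is left out because it needs the t-distribution, which would normally come from tables or a statistics package.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t for two independent groups (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-person differences."""
    diffs = [y - x for x, y in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical scores for five people measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 15]

t_ind, df_ind = independent_t(after, before)  # treats the lists as unrelated
t_pair, df_pair = paired_t(before, after)     # uses the pairing
print(t_ind, df_ind, t_pair, df_pair)
```

On the same numbers the paired statistic is much larger, because pairing removes the person-to-person variation from the comparison.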

16
Q

What is an ANOVA test?

A

Analysis of variance
Used to compare more than two groups
Made up of factors: the different categories being compared
Outputs are effects
Interactions between factors can also be measured, alongside the main effects

A significant p-value doesn’t tell you which group is different from another, only that at least one of the groups is different
A post hoc test is needed to identify which; it gives a p-value for each pair of groups
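
The F statistic behind a one-way ANOVA can be sketched as the ratio of between-group to within-group variation. The function name and the three groups below are hypothetical, and the sketch covers a single factor only, with no interactions or post hoc tests.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group variation: spread of values around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three exercise programmes.
f_stat, df_b, df_w = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f_stat, df_b, df_w)
```

A large F says the group means differ by more than the within-group spread would explain, but not which groups differ.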

17
Q

What are some nonparametric tests for inferences between groups?

A

Mann-Whitney U test
- For comparison of two groups with different subjects in each group

Wilcoxon signed-rank test
- Comparison of data where the same subject has been tested twice - giving two groups

Kruskal-Wallis test
- Comparison of more than two groups with different subjects in each group

Friedman ANOVA
- Comparison of data where the same subject has been tested more than twice
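
As an example of how these rank-based tests work, here is a rough sketch of the Mann-Whitney U statistic, counted directly from pairwise comparisons. The function name and ratings are hypothetical, and turning U into a p-value (from tables or a normal approximation) is not shown.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: pairs where a beats b (ties count 0.5)."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical ordinal pain ratings from two different groups of people.
u_a, u_b = mann_whitney_u([1, 2, 2, 3], [3, 4, 4, 5])
print(min(u_a, u_b))  # the smaller U is the one usually reported
```

Only the ordering of the values matters, which is why the test suits ordinal or skewed data.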

18
Q

What is the Chi-Squared test?

A

For non-parametric categorical data

A measure of the difference between observed (actual) and expected frequencies
Tests the association or difference between two categorical variables
The expected frequencies are those you would see if there were no difference between the sets of results; if observed and expected match exactly, the chi-squared value = 0
The larger the difference, the greater the chi-squared value

Calculated from frequencies (counts); percentages can be used to describe the comparison
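
The observed-versus-expected calculation can be sketched for a 2x2 table as follows. The counts are hypothetical, no continuity correction is applied, and the closed-form p-value via `math.erfc` holds only for 1 degree of freedom.

```python
import math

# Hypothetical 2x2 table of observed counts:
#            improved  not improved
observed = [[30, 20],   # treatment
            [20, 30]]   # control

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were unrelated
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

# For a 2x2 table (1 degree of freedom) the p-value has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, p_value)
```

Here every expected count is 25, so each cell contributes 1 and the statistic comes to 4.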

19
Q

How do you test for association/ correlation?

A

Test relationship between two variables
Usually done with scale data
Where there is a linear relationship there is said to be a correlation

Non-parametric - Spearman’s correlation
Parametric - Pearson’s correlation

r = 1: there is a perfect positive linear relationship (r = -1 if a perfect negative relationship)
r = 0.6-0.8: a high correlation
r = 0.2-0.4: a low correlation
Significance depends on the sample size - for the same r, the more people, the smaller the p-value
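
Both coefficients can be sketched with the standard library, treating Spearman's correlation as Pearson's correlation applied to ranks. The helper names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank from 1 upward, giving tied values their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    # Spearman's correlation is Pearson's correlation of the ranks
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rho(x, y))
```

Spearman's coefficient is a perfect 1 here because the ordering is perfectly preserved, while Pearson's falls slightly short of 1 because the relationship is curved rather than linear.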

20
Q

What are some errors in inferential statistics?

A

Type 1 error (α)
- rejecting a true null hypothesis, a false positive
- the probability that you will accept something as statistically significant when it’s actually not
The significance level (the p-value threshold) reflects the chance of making this error

Type 2 error (β)
- failing to reject a false null hypothesis, a false negative
- the effect is real, but the results have not recognised it
- the probability of retaining the null hypothesis when it is in fact false

1-β is the power
0.8 = 80% chance of detecting an effect if one does in fact exist

β = 0.2
The probability of making a type 2 error is 20%
There is a 20% chance of not identifying an effect when there is one

21
Q

What are power calculations?

A

1-β is the power of a test to correctly reject the null hypothesis, often set at 0.8

Power is determined by
Significance level (alpha, the p-value threshold)
Effect size - a clinically meaningful difference in means of the outcome measure
Sample size
Standard deviation

The equation can be rearranged to find sample size for a study if you already have other values
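
The card does not give the equation itself. As an assumption, the sketch below uses the common normal-approximation formula for comparing two means, n per group = 2 × ((z for alpha + z for power) × SD / effect size)². The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate n per group to compare two means (two-tailed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return math.ceil(n)  # round up to whole participants

# Hypothetical: detect a 5-point difference when the SD is 10 points.
n_needed = sample_size_per_group(effect=5, sd=10)
print(n_needed)
```

Asking for more power or a smaller detectable difference pushes the required sample size up, while a larger effect size brings it down.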

22
Q

When is power calculated?

A

Ideally before setting up the study as it informs sample size

You need data from previous similar studies as the basis of the calculation
Can do a post hoc calculation

23
Q

How do you increase power?

A

Increase sample size of the study
Using a less stringent significance level e.g. p<0.05 rather than p<0.01 - this could increase the chance of a type 1 error
Replication of the study and its findings by independent researchers

Increasing power reduces the chance of type 2 errors

24
Q

What are the different types of clinical importance?

A

Statistically significant but clinically unimportant
- if a difference is found to be statistically significant then it may well be real but not necessarily clinically important

Not statistically significant but clinically important
- if a difference is found not to be statistically significant then it may still be real (due to a type 2 error) and clinically important