Statistics Flashcards
What is descriptive statistics?
- Describes the range of values
- Identifies central tendency e.g. mean, median
- Describes the distribution of the whole set e.g. varied, similar
- Identifies outliers
- Describes percentages
Must know type of data to do this
What is categorical data?
Nominal
- Discrete categories that are mutually exclusive and unordered e.g. sex, blood group
Ordinal
- Discrete categories that are mutually exclusive and ordered (ranked) e.g. disease stage –> cannot be in more than one category
Used in quantitative research
What is numerical (scale) data?
‘Scale’ variables e.g. counts and measures
Numerical and discrete
- e.g. counts of days
Numerical and continuous
- e.g. age, height
Used in quantitative research
How can data be summarised?
Bar charts
Box + whisker plot –> e.g. when presenting median values
Line graph –> continuous data changing over time
Scatter plot –> 2 sets of continuous data e.g. grip strength vs arm strength
Pie chart
Histogram –> similar to a bar chart but for continuous data, showing the distribution of values across adjacent intervals (bins)
How do you describe central tendency?
Mean
- Sum of all values divided by the sample size
- Misleading as a measure of central tendency when the data are not normally distributed (skew and outliers pull the mean away from the centre)
Median
- The middle value of the ordered data, i.e. the (n+1)/2-th value
- Can be used where there isn’t normal distribution
Mode
- The most frequently occurring value
- Can be used in ordinal data
If data is normally distributed use mean and SD
If data is skewed use median and interquartile range or mode and range
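A minimal sketch of the three measures, using Python's standard `statistics` module. The length-of-stay numbers are hypothetical and chosen so that one long stay skews the data; they show why the median and interquartile range describe skewed data better than the mean.

```python
import statistics

# Hypothetical length-of-stay data in days; the single long stay skews the set.
stays = [2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(stays)      # sum of all values divided by the sample size
median = statistics.median(stays)  # middle value of the ordered data
mode = statistics.mode(stays)      # most frequently occurring value

# Quartiles for the interquartile range (Q3 - Q1).
q1, q2, q3 = statistics.quantiles(stays, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode} IQR={iqr}")
```

The mean is pulled well above most of the values by the outlier, while the median stays in the middle of the bulk of the data.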
Describe standard deviation
Used for normally distributed data to describe the spread of the values
Describes how far the values of the whole group typically lie from the mean
A small SD indicates that most values are close to the mean
A z-score is the number of SDs a value lies above or below the mean
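A short sketch of SD and z-scores with the standard `statistics` module. The grip-strength readings are hypothetical, and `stdev` is the sample SD (it divides by n - 1), which is an assumption about which SD is wanted.

```python
import statistics

# Hypothetical grip-strength readings in kg.
values = [30, 32, 34, 36, 38]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD (divides by n - 1)

# z-score: how many SDs each value lies above (+) or below (-) the mean.
z_scores = [(v - mean) / sd for v in values]

for v, z in zip(values, z_scores):
    print(f"value={v} z={z:+.2f}")
```

A value equal to the mean has a z-score of 0, and the z-scores of the whole set sum to 0.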
What central tendency is used for different distributions of data?
If data is normally distributed use mean and SD
If data is skewed use median and interquartile range or mode and range
What are confidence intervals?
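Identify a range in which we can be confident that the ‘true’ population value (e.g. the population mean) will lie
A 95% CI is a range calculated so that, across repeated samples, 95% of such intervals would contain the true population value; it is not the range within which 95% of individuals lie
95% CI = mean ± 1.96 × standard error
A wide 95% CI indicates a high degree of uncertainty in the estimate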
Confidence limits define the lower and upper values of a confidence interval
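The formula above can be sketched as follows. The function name `ci95` and the sample are hypothetical, and the multiplier 1.96 assumes the normal approximation; for small samples a t-distribution multiplier would normally be used instead.

```python
import math
import statistics

def ci95(sample):
    """Return the (lower, upper) confidence limits for the mean: mean ± 1.96 × SE."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample SD divided by the square root of n.
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical grip-strength readings in kg.
lower, upper = ci95([30, 32, 34, 36, 38])
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because the standard error shrinks as the sample size grows, a larger sample gives a narrower interval.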
What is inferential statistics?
The process of using data obtained from a small group of elements (sample) to make estimates and test hypotheses about the characteristics of a larger group of elements (population)
Sample must accurately represent the population
Used in quantitative data with an appropriate research question in an appropriate research design with an adequate sample size
How can a study be underpowered?
If the sample size is too small (or confounding variables add noise), the study has a low chance of detecting a real effect, so a non-significant result cannot reliably support/refute the null hypothesis –> stats will be underpowered
Statistical tests can still be run but the results must be reported as trends only
What can you interpret from inferential statistics?
- The relationship/association between variables e.g. correlation coefficients
- The difference between two or more groups
- The likelihood that the result has occurred by chance (p-values)
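As an illustration of a correlation coefficient, here is a minimal sketch of Pearson's r written out by hand. The function name `pearson_r` and the paired readings are hypothetical; Pearson's r is one common choice and assumes a linear relationship between two scale variables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical paired readings in kg for the same five people.
grip = [20, 25, 30, 35, 40]
arm = [22, 24, 33, 34, 42]
r = pearson_r(grip, arm)
print(f"r = {r:.2f}")
```

Values of r near +1 or -1 indicate a strong linear association, and values near 0 indicate little or none.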
What are P-values?
What do they show?
The probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true (i.e. by chance alone)
p=0.5 (a 50% chance)
p=0.05 (a 5% chance)
The lower the p-value, the less compatible the observed effect is with chance alone
Compared against the alpha value (significance level), usually 0.05. A p-value larger than 0.05 is not significant
p<0.05 ‘significant’
p<0.01 ‘highly significant’
p<0.001 ‘very highly significant’
The p-value represents the strength of evidence against the null hypothesis: the smaller the p-value, the stronger the evidence against it
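A small worked example of what a p-value measures, assuming a hypothetical coin-flip experiment where the null hypothesis is "the coin is fair". The function name `one_sided_p` is illustrative only; it gives the exact probability of seeing at least the observed number of heads by chance.

```python
from math import comb

def one_sided_p(heads, flips):
    """P(at least `heads` heads in `flips` flips) if the coin is fair (null hypothesis)."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

p_strong = one_sided_p(9, 10)  # 9 or more heads out of 10
p_weak = one_sided_p(6, 10)    # 6 or more heads out of 10

alpha = 0.05  # conventional significance level
print(f"9/10 heads: p={p_strong:.4f} significant={p_strong < alpha}")
print(f"6/10 heads: p={p_weak:.4f} significant={p_weak < alpha}")
```

Nine heads out of ten is unlikely under a fair coin, so its p-value falls below 0.05; six out of ten is easily explained by chance, so it does not.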
What is the difference between parametric and non-parametric data?
Parametric tests have more power than non-parametric i.e. you are less likely to make a type II error
Parametric data:
- Assumed normal distribution
- Assumed homogeneity of variance across groups (the spread of scores around the mean is equal)
- Data sets are independent
- Data are numerical and scale
- Data sets are… interval, continuous with an equal distance between values OR ratio, continuous with an equal distance between values and a true zero
Parametric tests are for those with a normal distribution
Non parametric tests are for non-normal distributions
Non-parametric data:
- Skewed
- Bimodal
- Small sample size
- Flat or very peaked distribution
How do you find out if you have normal data?
Use the Shapiro-Wilk test for fewer than 2000 cases
For more than 2000 cases use the Kolmogorov-Smirnov test
Levene’s test checks homogeneity of variance, i.e. whether the spread of the groups being compared is equal; it does not test the shape of the distribution
What is a T-test?
Used in parametric data
Compares two sample groups by comparing their means relative to the spread of their distributions
Tests the probability that the samples come from the same population
Can be
Independent - two groups made up of different people
Paired - same people measured twice
Two-tailed testing means the differences between the groups are tested for in either direction
Uses the two means and SDs to assess how much the distributions of the two groups overlap: the less overlap, the less likely the samples come from the same population
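A minimal sketch of the t statistic for both designs, using only the standard library. The function names and data are hypothetical, and the independent version assumes equal variances (pooled SD). Turning t into a p-value needs the t distribution with the appropriate degrees of freedom, which is left to a statistics package or table.

```python
import math
import statistics

def independent_t(a, b):
    """t statistic for two independent groups, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # SE of the difference in means
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(first, second):
    """t statistic for the same people measured twice (works on the differences)."""
    diffs = [y - x for x, y in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical data: two different groups, then one group measured twice.
t_ind = independent_t([30, 32, 34, 36, 38], [34, 36, 38, 40, 42])
t_pair = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(f"independent t={t_ind:.2f} (df=8), paired t={t_pair:.2f} (df=3)")
```

The further t is from 0 in either direction, the less the two distributions overlap relative to their spread.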