statistics Flashcards
What is an operational definition of variables?
- a specific statement about how a variable will be measured to represent the concept under study
Makes study more replicable
What is measurement?
A way to describe real-life factors using numbers
What are the 4 types of measurement scale?
Nominal scales
Ordinal scales
Interval scales
Ratio scales
What is a nominal scale
A measurement scale, in which numbers serve as “tags” or “labels” only, to identify or classify an object.
E.g. Bus 19, 242, 3
What is an ordinal scale
- Data are put in rank order, but the distances between scores vary
What is an interval scale
- A measurement scale with order, where the differences between adjacent values are equal
- Zero is arbitrary: it does not mean the absence of the quantity (e.g. 0 °C)
What is a ratio scale
- An interval scale where 0 is meaningful (a true zero)
- No negative numbers
What are the measures of central tendency
-Mean, median, mode
Define measures of spread
They describe how much the scores vary
What are the 3 measures of spread
Range
Interquartile range
Standard deviation
What is interquartile range
The spread between the first and third quartiles (the 25th and 75th percentiles), i.e. the range of the middle 50% of scores
What is standard deviation
- How far, on average, the data points lie from the mean
- The larger the SD the larger the spread of scores
What is the 6 step calculation for standard deviation
Step 1: Find the mean.
Step 2: Subtract the mean from each score.
Step 3: Square each deviation.
Step 4: Add the squared deviations.
Step 5: Divide the sum by the number of scores (or by the number of scores minus 1 when estimating from a sample).
Step 6: Take the square root of the result from step 5
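The six steps can be sketched in Python as below. This is a minimal illustration, not a definitive implementation: the function name `standard_deviation` and the example scores are made up, and Step 5 divides by the number of scores exactly as the card states (the population form).

```python
import math

def standard_deviation(scores):
    """Population standard deviation, following the six steps on the card."""
    n = len(scores)
    mean = sum(scores) / n                        # Step 1: find the mean
    deviations = [x - mean for x in scores]       # Step 2: subtract the mean from each score
    squared = [d ** 2 for d in deviations]        # Step 3: square each deviation
    total = sum(squared)                          # Step 4: add the squared deviations
    variance = total / n                          # Step 5: divide by the number of scores
    return math.sqrt(variance)                    # Step 6: take the square root

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
sd = standard_deviation(scores)
print(sd)  # 2.0
```

For these scores the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, so the square root is 2.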
What 3 things are graphs for?
representing data
Indicating patterns within the data (e.g. central tendency, spread of data, correlations)
Deciding how to analyse data (e.g. outliers = use the median rather than the mean)
What kind of data are bar graphs for?
Ordinal data
Nominal data
What are the 3 types of bar graphs?
Horizontal
Stacked
Histograms (however, here the area of each bar represents the frequency)
What are the properties of stem and leaf plots?
Data in a compact form
Shows the size of data subsets
Stems = leading digits (e.g. 0s, 10s, 20s)
Leaves = units (each leaf can only be a single digit)
What do box plots do?
Summarise data and show the:
Lower and upper quartile
Median
Minimum
Maximum
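How does a box plot identify outliers?
Any point more than 1.5 x the interquartile range below the lower quartile or above the upper quartile is treated as an outlier
(the interquartile range is shown by the length of the box)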
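A minimal sketch of the box plot summary and the 1.5 x IQR outlier rule follows. The function name `box_plot_summary` and the data are made up for illustration. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so other tools may give slightly different quartiles.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Five-number summary plus outliers under the 1.5 x IQR rule."""
    ordered = sorted(data)
    # Assumption: "inclusive" quartile method; other conventions exist.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1                          # length of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "minimum": ordered[0],
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "maximum": ordered[-1],
        "iqr": iqr,
        "outliers": outliers,
    }

summary = box_plot_summary([1, 3, 4, 5, 5, 6, 7, 8, 9, 30])  # made-up data
print(summary["iqr"], summary["outliers"])  # 3.5 [30]
```

Here the quartiles are 4.25 and 7.75, so the fences sit at -1.0 and 13.0 and only the score of 30 falls outside them.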
What are the properties of scatterplots
- Shows the relationship between variables
- Needs two pieces of data per case (one for each variable) = bivariate data
- Can work out correlations from it
Describe correlations on scatterplots
-Direction = positive or negative relationship
-Strong relationship = Points lie close to a line
-Weak relationship = Points are widely scattered
-Variables that are related are correlated
-Correlation makes no distinction between dependent and independent variables (it does not show cause)
What is the purpose of correlational analysis?
- Whether there is a linear (straight line) relationship between two variables
-The direction of the relationship
-Strength of the relationship
Define correlation coefficient
the specific measure that quantifies the strength of the linear relationship between two variables in a correlation analysis.
- Correlation coefficients do not change if we change the unit of measurement (e.g. gallons instead of litres)
What are the two types of correlation coefficients
-Pearson r
-Spearman r
-Values lie between -1 and 1.
-Positive values = positive relationship
-Negative values = negative relationship
-A larger sample size gives more certainty that the relationship is real
Why does it matter whether a scatterplot is linear or non-linear when quantifying correlation?
-Linear relationship = Can measure correlations
-Non linear relationship = Measuring correlation does not make sense, might need to transform data
Describe the Pearson r correlation coefficient
- Calculated directly from the raw scores
- Requires interval or ratio data
- Highly affected by outliers
- Not suitable for skewed data
Describe the Spearman r correlation coefficient
- Calculated from the rankings of the raw scores
- Suitable for ordinal data
- Minimally affected by outliers
- Suitable for skewed data
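The contrast between the two coefficients can be sketched as follows. This is an illustration under stated assumptions: the helper names (`pearson_r`, `ranks`, `spearman_r`) and the data are made up, Pearson r is computed with the standard product-moment formula, and Spearman r is taken as Pearson r applied to the ranks (tied scores share their average rank).

```python
import math

def pearson_r(xs, ys):
    """Pearson r, calculated directly from the raw scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_r(xs, ys):
    """Spearman r, calculated from the rankings of the raw scores."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 8, 100]  # made-up data; the last score is an outlier

r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
print(round(r_pearson, 2), round(r_spearman, 2))
```

Because y always increases with x, the ranks agree perfectly and Spearman r is 1, while the single outlier pulls Pearson r down to roughly 0.7.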
What shows distributions?
Density curves
What are density curves useful for?
- generalising results to the population.
- A density curve is a smoothed, idealised version of a histogram
- Displays overall pattern (shape) of a distribution
- always on or above the horizontal axis
What does the area under a curve of a distribution represent and what can you do with this area?
- Curves are scaled so that the total area underneath them is exactly 1 (a probability of 1)
- 100% of the scores lie under the curve
- If you know certain values of the model (e.g. mean or SD) you can make predictions about the overall population
- E.g. if the area above the mean = 0.6, then 60% of the scores will be above the mean
What is the median in relation to density curves?
point that divides area into two equal parts
What are quartiles in relation to density curves?
points that divide area under curve into quarters
What is the mode in relation to density curves?
the point at the peak of the curve
What is the mean in relation to density curves?
the balancing point of the curve
What are the properties of a normal distribution?
Symmetrical
Single peaked
Tails approach the x-axis but only meet it at infinity
what is the shape of a normal distribution determined by
its Standard deviation
What is the location of a normal distribution determined by
Its mean
What are z scores?
- Standard scores: a score expressed as the number of standard deviations it lies above or below the mean
- Allow us to compare values from two different data sets by putting them on a single common scale
What is the calculation for a z score
z = (score - mean) / standard deviation
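A short sketch of the z-score calculation, using it to compare scores from two different data sets. The function name `z_score`, the subjects, and all the numbers are made up for illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations the score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical example: which exam result is better relative to its class?
maths_z = z_score(score=70, mean=60, sd=5)      # 2.0
english_z = z_score(score=80, mean=70, sd=10)   # 1.0
print(maths_z, english_z)
```

Although the English mark is higher as a raw score, the maths mark sits 2 standard deviations above its mean compared with 1 for English, so it is the stronger result relative to its own distribution.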
What is a standard normal distribution?
- A normal distribution with a mean of 0 and a standard deviation of 1
- To compare data from two different normal distributions, convert the scores into z-scores, which places both data sets on the standard normal distribution
How are details of a z distribution (standard normal) worked out?
Using entries from a z table
A table entry always gives the area to the left of the z-score
From this we can work out the percentage of the population above or below our point of interest
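As a rough stand-in for the printed table, the area to the left of a z-score can be computed with `NormalDist` from Python's standard `statistics` module. The z value of 1.0 is just an example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # the z distribution: mean 0, SD 1

z = 1.0                                  # example point of interest
area_left = standard_normal.cdf(z)       # what a table entry gives: area to the left of z
area_right = 1 - area_left               # total area is 1, so the rest lies to the right

print(round(area_left, 4), round(area_right, 4))  # 0.8413 0.1587
```

So about 84% of the population scores below z = 1.0 and about 16% scores above it.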
What is always the standard deviation for a z distribution
1
What is always mean for a z distribution
0
What is the total area under the curve for a z distribution
1 (representing 100% of the participants)
What is a chi-squared test?
- Non-parametric (makes no assumptions about population parameters, so it is distribution free)
What are the two types of chi-squared test?
The goodness of fit test
The test of independence
- Both types test whether observed frequencies differ significantly from the frequencies expected under the null hypothesis
What is the chi squared goodness of fit test?
- Used on unrelated categorical data, where each person can only be in one category
- Used to look at the proportions of a population
- Looks at the categories of one variable
What are observed and expected frequencies in the chi squared goodness of fit test?
- The observed frequencies are the numbers of participants measured in individual categories e.g. number of men vs number of women
- These frequencies are then compared to frequencies predicted by the null hypothesis (the expected frequencies)
How do you calculate expected frequencies?
Sample size x the proportion predicted by the null hypothesis for that category
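A minimal sketch of the expected-frequency calculation for a goodness of fit test. The chi-squared statistic itself is not given on these cards; the sketch assumes the standard form, the sum over categories of (observed - expected) squared divided by expected. The function names and the counts are made up.

```python
def expected_frequencies(sample_size, proportions):
    """Expected frequency per category = sample size x proportion under the null."""
    return [sample_size * p for p in proportions]

def chi_squared(observed, expected):
    """Assumed standard statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [60, 40]  # made-up counts, e.g. number of men vs number of women
# Null hypothesis in this example: both categories are equally likely.
expected = expected_frequencies(sum(observed), [0.5, 0.5])
chi2 = chi_squared(observed, expected)

print(expected, chi2)  # [50.0, 50.0] 4.0
```

With 100 participants and a predicted 50/50 split, each expected frequency is 50, and the statistic is 100/50 + 100/50 = 4.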
What is the chi squared test of independence?
- Looks at the categories of two variables
- uses data in the form of frequencies in different categories which is compared to expected frequencies predicted by the null hypothesis
- E.g. with two variables of 2 categories each, there are 4 cells instead of 2 categories
- Data are presented in the form of a matrix (contingency table) displaying all combinations of categories
How do you calculate the degrees of freedom for chi squared test of independence
(number of rows R minus 1) x (number of columns C minus 1)
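The degrees-of-freedom rule can be sketched on a small matrix of observed frequencies. The expected-frequency step is not spelled out on the cards; the sketch assumes the usual rule of row total x column total / grand total for each cell. The function names and the table are made up.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1)."""
    rows = len(table)
    columns = len(table[0])
    return (rows - 1) * (columns - 1)

def expected_table(table):
    """Assumed rule: expected cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Made-up 2 x 2 matrix: rows are one variable, columns are the other.
observed_table = [[30, 20],
                  [20, 30]]

df = degrees_of_freedom(observed_table)
expected_cells = expected_table(observed_table)
print(df, expected_cells)  # 1 [[25.0, 25.0], [25.0, 25.0]]
```

A 2 x 2 matrix gives (2 - 1) x (2 - 1) = 1 degree of freedom, and a 3 x 4 matrix would give 2 x 3 = 6.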
What is probability?
A measure of how likely it is that some event will occur
Probability can vary from 0 (never) to 1 (always)
Summarise testing the null hypothesis
- Assume there is no difference or no relationship between the two conditions (the null hypothesis)
- Calculate how probable it is to get a score as extreme as, or more extreme than, the one we obtained if the null hypothesis were true (the p-value)
- If this probability is very small (p-value less than 0.05, i.e. 5%), reject the null hypothesis (thus accepting the alternative hypothesis)
What are critical values?
- A score that marks the cut-off: anyone scoring beyond it (lower or higher, depending on the tail) falls in the most extreme 5% of the distribution
- Essentially the cut-off points for statistical significance (e.g. the top 5%)
How do you calculate critical values?
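- To cut off the most extreme 5% in one tail you need a z-score of 1.645
- That 5% cut-off for significance is roughly 1.6 standard deviations below or above the mean
- For a two-tailed test at 5%, the cut-offs are z = -1.96 and z = +1.96 (2.5% in each tail)
- In z-score units, the same for every normal distribution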
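These cut-offs can be recovered from the standard normal distribution with `NormalDist.inv_cdf` from Python's standard `statistics` module, as a sketch. The raw-score example (mean 100, SD 15) is a made-up distribution used only to show that the z cut-off carries over to any normal distribution.

```python
from statistics import NormalDist

z_distribution = NormalDist(mu=0, sigma=1)

# z-score with 95% of the area to its left, i.e. the top 5% beyond it (one-tailed).
one_tailed_cutoff = z_distribution.inv_cdf(0.95)

# Two-tailed at 5%: 2.5% in each tail, so 97.5% of the area lies to the left.
two_tailed_cutoff = z_distribution.inv_cdf(0.975)

# Hypothetical raw-score scale: convert the z cut-off back with mean + z x SD.
raw_cutoff = 100 + one_tailed_cutoff * 15

print(round(one_tailed_cutoff, 3), round(two_tailed_cutoff, 3), round(raw_cutoff, 1))
# 1.645 1.96 124.7
```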
What is a type I error?
Rejecting the null hypothesis (and accepting the alternative) when we shouldn't (a false positive)
- Deciding the result is statistically significant when it's not
What is a type II error?
Retaining the null hypothesis (rejecting the alternative) when we shouldn't (a false negative)
How can you decrease the likelihood of a type I error?
By reducing the threshold of significance from 5% to 1% (0.01)
- However this could increase the possibility of a type II error.
- And decreasing the likelihood of a type II error could also increase the likelihood of a type I error
What is an alpha level?
The threshold of significance that the p-value is compared against (e.g. 0.05)
What is a non directional (two- tailed) alternative hypothesis?
Does not state the direction of the difference, just states that the conditions will differ
What is a directional (one tailed) alternative hypothesis?
States which direction the difference will go in (e.g. higher or lower, better or worse)
What is statistical inference from samples?
Using probability theory to make inferences about a population from sample data
Why do we do it? To generalise from a sample to the population
What is the calculation for statistical inference
(Estimated mean) divided by (standard deviation from the sample)