Preliminaries and Descriptive Statistics Flashcards
What is a variable?
A characteristic or quantity that can take different values (for example, across people, items, or occasions)
What are the three types of variable?
- Categorical: non-numerical
- Continuous: numerical but doesn’t have to be a whole number
- Discrete: numerical and a whole number
What does a population consist of?
People or items that share a particular characteristic (or set of characteristics)
What does a sample refer to?
A selection of individual people or items from a population
In statistics, what are we trying to do with regard to populations and samples?
Draw inferences about a population from a sample
What is a population parameter?
A quantity that describes some characteristic of a population with respect to a specific variable (working it out exactly requires data from the entire population)
What is a sample statistic?
A quantity that describes some characteristic of a sample with respect to a specific variable
Why is it usually hard to calculate population parameters?
We don’t have access to the entire group
Why is it important to summarise data?
It can be very complex and there can be lots of it
What should a measure of central tendency provide?
An indication of a ‘typical’ score in the data set
What are the three measures of central tendency?
- Mean
- Median
- Mode
How do you work out the mean?
Add all scores together and divide by number of scores (N)
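A minimal Python sketch of this calculation. The function name `mean` and the example scores are made up for illustration.

```python
def mean(scores):
    """Add all scores together and divide by the number of scores (N)."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


scores = [2, 4, 4, 5, 10]  # hypothetical data
print(mean(scores))  # 5.0
```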
What are the pros and cons of using the mean?
- Pro: provides an estimate of the average score of the data-set
- Con: is affected by extreme data points
What is the mean?
The average
What is the median?
The value that lies in the middle of the data once the scores are put in order
How do you find the median?
- order (rank) the data
- find the score in the middle
- if there is an even number of scores, find the average of the two scores in the middle
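The steps above can be sketched in Python as follows. The function name `median` and the example scores are illustrative assumptions.

```python
def median(scores):
    """Order the scores, then take the middle one (or average the middle two)."""
    ordered = sorted(scores)          # step 1: order (rank) the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:                    # odd number of scores: one middle score
        return ordered[mid]
    # even number of scores: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```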
What are the pros and cons of using the median?
- Pro: Insensitive to extreme scores in the data set
- Con: Ignores the actual values of most scores, so it doesn’t reflect the shape of the distribution
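To see how an extreme score affects the mean but not the median, here is a small sketch using Python's standard `statistics` module. Both data sets are invented for illustration.

```python
import statistics

typical = [2, 3, 4, 5, 6]        # hypothetical scores
with_outlier = [2, 3, 4, 5, 60]  # same scores, but the largest is now extreme

mean_typical = statistics.mean(typical)            # 4
mean_outlier = statistics.mean(with_outlier)       # 14.8
median_typical = statistics.median(typical)        # 4
median_outlier = statistics.median(with_outlier)   # 4

print(mean_typical, mean_outlier)      # the mean is pulled towards the outlier
print(median_typical, median_outlier)  # the median is unchanged
```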
What is the mode?
The most frequently occurring value in the data set
How do you find the mode?
Find the most frequently occurring value
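A possible Python sketch that returns every value tied for the highest frequency. The function name `modes` is hypothetical. Treating a data set in which every value occurs exactly once as having no mode is a convention assumed here, not something the card states.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the most frequently occurring value(s)."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        # Assumed convention: if every value occurs only once, report no mode.
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
print(modes([1, 2, 3]))        # []      (no mode)
```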
What’s the pro and con of using the mode?
- Pro: very easy to calculate from a histogram and easy to understand
- Con: data set might have more than 1 mode or no mode at all
What is the range and what is the problem with it?
- Difference between the maximum and minimum scores in your data
- The range depends only on the two most extreme scores, so distributions with very different shapes can have the same range
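A short sketch of the problem, using two invented data sets with different shapes but identical ranges. The name `value_range` is illustrative.

```python
def value_range(scores):
    """Difference between the maximum and minimum scores."""
    return max(scores) - min(scores)


clustered = [1, 5, 5, 5, 5, 5, 9]   # hypothetical: most scores sit in the middle
spread_out = [1, 2, 3, 5, 7, 8, 9]  # hypothetical: scores spread evenly

print(value_range(clustered), value_range(spread_out))  # 8 8
```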
What is the deviation?
The (signed) distance of a score from the mean
How do you calculate the average deviation?
- Calculate the mean
- Calculate the deviation of each score from the mean
- Calculate the average deviation (add up all the deviations and divide by the number of deviations)
Why don’t we usually use the average deviation as a measure of spread?
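Positive and negative deviations cancel each other out: deviations from the mean always sum to zero, so the average deviation is always zero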
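The cancelling can be checked with a few lines of Python. The scores are made up for illustration.

```python
scores = [2, 4, 4, 5, 10]  # hypothetical data

mean = sum(scores) / len(scores)             # 5.0
deviations = [x - mean for x in scores]      # signed distance of each score from the mean
average_deviation = sum(deviations) / len(deviations)

print(deviations)         # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(average_deviation)  # 0.0
```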
How do you calculate the average squared deviation? (We don’t usually use this.)
- Calculate the mean
- Calculate the deviation of each score from the mean
- Square each deviation (this removes the sign but doesn’t change which deviations are largest)
- Calculate the average squared deviation (by dividing by number of deviations)
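These steps might look like this in Python. The function name and the data are illustrative assumptions.

```python
def average_squared_deviation(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / n


scores = [2, 4, 4, 5, 10]  # hypothetical data
print(average_squared_deviation(scores))  # 7.2
```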
How do you work out the sample variance? Why don’t we usually use this?
- Work out the mean
- Calculate deviation of each score from mean
- Square deviation
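- Calculate a slightly adjusted average squared deviation (divide by N – 1, where N is the number of scores)
- When N is big, dividing by N – 1 rather than N won’t make much of a difference
- However, the units are squared (measure^2), which is hard to interpret; this is why we don’t usually use the variance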
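A sketch of the sample variance in Python, with a rough check that dividing by N – 1 instead of N matters less as N grows. The function names and data are hypothetical. The "large" sample is simply the small one repeated 100 times.

```python
def sum_of_squared_deviations(scores):
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sample_variance(scores):
    """Adjusted average squared deviation: divide by N - 1 rather than N."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    return sum_of_squared_deviations(scores) / (n - 1)


small = [2, 4, 4, 5, 10]  # hypothetical data, N = 5
large = small * 100       # the same scores repeated, N = 500

print(sample_variance(small))  # 9.0

# Gap between dividing by N - 1 and dividing by N, for each sample size
gap_small = sample_variance(small) - sum_of_squared_deviations(small) / len(small)
gap_large = sample_variance(large) - sum_of_squared_deviations(large) / len(large)
print(gap_small > gap_large)   # True: the adjustment shrinks as N grows
```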
How do we calculate the standard deviation?
- Calculate mean
- Calculate deviation of each score from mean
- Square deviation
- Calculate the sample variance (divide by N – 1)
- Calculate standard deviation by taking square root
- This number is in original units
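The full procedure can be sketched in Python and compared against the standard library's `statistics.stdev`, which also divides by N – 1. The function name `sample_sd` and the scores are made up for illustration.

```python
import math
import statistics


def sample_sd(scores):
    """Square root of the sample variance, so the result is in the original units."""
    n = len(scores)
    if n < 2:
        raise ValueError("standard deviation needs at least two scores")
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)


scores = [2, 4, 4, 5, 10]  # hypothetical data
print(sample_sd(scores))          # 3.0
print(statistics.stdev(scores))   # standard-library check, should agree
```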
Will more concentrated data have a larger or smaller standard deviation?
Smaller. More spread-out data will have a larger SD
What are the four different types of data plot?
- histogram
- box plot
- scatter plot
- data summary
Why is using a histogram useful?
- easy to spot mode
- easy to see outlying data
- easy to see range and shape of data
Why is using a box plot useful?
- easy to identify median
- easy to see lower and upper hinge
- easy to see hinge spread
- easy to see adjacent values (lowest and highest values falling within the inner fences)
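As a rough sketch, the quantities a box plot displays can be computed as below. Two conventions are assumed here and are not stated on the card: hinges are the medians of the lower and upper halves of the ordered data (with the middle score included in both halves when N is odd), and the inner fences sit 1.5 hinge spreads beyond each hinge. The function name and data are hypothetical.

```python
from statistics import median


def box_plot_summary(scores):
    ordered = sorted(scores)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two scores are required")
    half = (n + 1) // 2  # assumed convention: odd N shares the middle score
    lower_hinge = median(ordered[:half])
    upper_hinge = median(ordered[n - half:])
    hinge_spread = upper_hinge - lower_hinge
    # Assumed convention: inner fences are 1.5 hinge spreads beyond the hinges
    lower_fence = lower_hinge - 1.5 * hinge_spread
    upper_fence = upper_hinge + 1.5 * hinge_spread
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "median": median(ordered),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "hinge_spread": hinge_spread,
        "adjacent_values": (inside[0], inside[-1]),
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])  # hypothetical data
print(summary)
# median 5, hinges 3 and 7, hinge spread 4, adjacent values (1, 8);
# 30 lies beyond the upper inner fence (13.0), so it is not an adjacent value
```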
What is a scatter plot?
A graph that plots each case as a point, with one variable on each axis; it is typically used in correlational research designs
Where’s a data summary graph often found?
In experimental research where you have manipulated a variable and want to see what effect it has on the dependent variable (DV); the standard deviation is plotted (typically as error bars)
What’s the difference between a numerical and categorical data summary graph?
- numerical data summary graph has a line connecting the data points
- categorical usually uses bars