Data Analysis (Week 16) Flashcards
Descriptive Statistics
- Describe data
- Indicate trends
- Show how the IV affects the DV
- Doesn’t tell you which hypothesis to accept
**Two types of descriptive statistics:**
1. Measures of central tendency (averages)
2. Measures of spread (range, standard deviation)
Measures of central tendency
A mathematical way to find out the typical or average score from a data set, using the mode, median or mean.
Types of data
- Nominal data
- Ordinal data
- Interval data
Types of data
Nominal Data
- Data that is discrete
- Fits into named categories, e.g. counts in a tally chart
- Always use the mode when finding the average of this type of data.
Ordinal data
- Data that can be put into rank order
- Is subjective
- Intervals between the score may not be equal
- Usually from rating scales
- Always use the median when finding the average of this type of data.
E.g. in a school survey, participants (ppts) are asked to rate how hard they thought they worked, from 1 to 10.
Interval data
- Data can be put into rank order
- Is objective
- Intervals between scores are equal
- Usually an objective measure such as time, height or weight.
- Always use the mean when finding the average of this type of data.
E.g. how many minutes each student spends on revision.
The Mode
- Most frequent score in a data set
- Most suited to discrete data that is organised into categories (nominal data)
- If 2 or more values are equally common there will be 2 or more modes.
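A minimal sketch of finding the mode in Python, assuming the standard library's `statistics.multimode` (Python 3.8+). The eye-colour list is made-up illustrative data, not from the course.

```python
from statistics import multimode

# Hypothetical nominal data: eye colour recorded for eight ppts
eye_colours = ["brown", "blue", "brown", "green", "blue", "brown", "hazel", "blue"]

# multimode returns every value tied for most frequent,
# in the order each value first appears in the data
modes = multimode(eye_colours)
print(modes)
```

Here "brown" and "blue" both appear three times, so the data set has two modes.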
Evaluating the Mode
**Strengths:**
* Is not skewed by anomalies
* Useful to show the most popular value.
**Weaknesses:**
* Less sensitive as it only uses the most frequent score - ignoring all other data
* Should not be used if there is more than one mode.
The median
- Only used with numerical data on a linear scale
- Cannot be used with data in named categories (nominal data)
- Useful in psychology for subjective linear data on rating scales (ordinal data)
- To find the median all the scores in the data set are put into a list from smallest to largest.
- The middle number is the median
- If there is an even number of scores, the two middle values are added together and divided by 2 to find the median.
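The steps above can be sketched in Python as follows. The function name and the rating data are hypothetical, chosen only for illustration.

```python
def median(scores):
    """Return the middle score, or the average of the two middle scores."""
    ordered = sorted(scores)      # list the scores from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd number of scores: one middle value
        return ordered[middle]
    # even number of scores: add the two middle values and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2

ratings_odd = [7, 3, 9, 5, 6]        # ordered: 3, 5, 6, 7, 9
ratings_even = [7, 3, 9, 5, 6, 8]    # ordered: 3, 5, 6, 7, 8, 9

print(median(ratings_odd))   # 6
print(median(ratings_even))  # 6.5
```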
Evaluating the median
Strengths:
* Not skewed by anomalies
Weaknesses:
* Less sensitive than the mean, as it only uses the middle scores - ignoring data that is very low or high.
The Mean
- Usually called the ‘average’
- Can only be used with numerical data from linear scales.
- Uses data which is objective
- Uses data that has equal intervals between the scores eg. weight, height, time (known as interval data)
- Mean is worked out by adding up all the scores in the data set and dividing by the total number of scores.
- It’s the most informative measure of central tendency because it takes every score into account.
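A minimal Python sketch of the calculation described above. The function name and the times are made up for illustration.

```python
def mean(scores):
    """Add up all the scores and divide by the number of scores."""
    return sum(scores) / len(scores)

# Hypothetical interval data: time in seconds to complete a task
times_seconds = [12, 15, 11, 14, 13]

print(mean(times_seconds))  # 65 / 5 = 13.0
```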
Evaluating the mean
Strengths:
* More sensitive as it uses all the scores to provide an average.
Weaknesses:
* Can be skewed by anomalies - shouldn’t be used when there are extreme scores.
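To see how one anomaly can skew the mean while leaving the median unchanged, here is a small sketch using Python's standard `statistics` module. The revision times, including the extreme score of 400, are invented for illustration.

```python
from statistics import mean, median

# Hypothetical data: minutes each student spent on revision
revision_minutes = [30, 45, 60, 20, 45]
with_anomaly = revision_minutes + [400]   # one extreme score added

# Without the anomaly: mean is 40, median is 45
print(mean(revision_minutes), median(revision_minutes))

# With the anomaly: mean jumps to 100, median stays at 45
print(mean(with_anomaly), median(with_anomaly))
```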
Measures of Spread
They show how widespread the scores are across the sample, i.e. how varied the ppts were.
Includes:
* The range
* Standard deviation
The Range
- Simplest measure of spread
- Find the largest and smallest value in the data set.
- Subtract smallest value from largest
- Add 1
- A higher range shows more variation between ppts
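The steps above can be sketched in Python. The function name and scores are hypothetical. The `+ 1` follows the steps given in these notes; some textbooks define the range without it.

```python
def score_range(scores):
    """Largest score minus smallest score, plus 1 (as in these notes)."""
    return max(scores) - min(scores) + 1

# Hypothetical test scores for five ppts
test_scores = [12, 15, 9, 20, 14]

print(score_range(test_scores))  # 20 - 9 + 1 = 12
```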
Evaluating the range
Strengths:
* Quick and easy to calculate
Weaknesses:
* Less sensitive as it only uses the highest and lowest scores
* Can be skewed by extreme values, so it may suggest lots of variation when most scores are actually close together.
The Standard deviation
- Standard deviation considers the difference between each data point and the mean.
- Tells us the spread of the group
- Scores that are more spread out have larger standard deviations
- Closely clustered scores have smaller standard deviations
- When standard deviations of 2 groups are similar, it means they have a similar variation around the mean.
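A rough sketch of how standard deviation could be calculated in Python. The notes do not give a formula, so this assumes the sample version (dividing by n - 1). The two groups are invented so that they share the same mean but differ in spread.

```python
import math
import statistics

def standard_deviation(scores):
    """Sample standard deviation (divides by n - 1); an assumed formula."""
    n = len(scores)
    mean = sum(scores) / n
    # Difference between each data point and the mean, squared
    squared_diffs = [(score - mean) ** 2 for score in scores]
    variance = sum(squared_diffs) / (n - 1)
    return math.sqrt(variance)

group_a = [48, 50, 52, 49, 51]   # closely clustered around 50
group_b = [30, 70, 50, 20, 80]   # same mean of 50, far more spread out

sd_a = standard_deviation(group_a)
sd_b = standard_deviation(group_b)
print(round(sd_a, 2), round(sd_b, 2))

# Cross-check against the standard library's sample standard deviation
print(math.isclose(sd_a, statistics.stdev(group_a)))
```

Group B's standard deviation comes out much larger than group A's, even though both groups have a mean of 50.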
Evaluating standard deviation
Strengths:
* More sensitive than the range, as it uses all the scores to show how far a group of ppts' scores vary from the mean.
* Less distorted by a single extreme score than the range, although extreme scores still increase it
Weaknesses:
* Time consuming to calculate
Bar charts
- Used for data in discrete categories and total or average scores.
- Gaps between each bar because columns aren’t related in any linear way.
- IV goes on the x-axis
- DV goes on the y-axis
Histograms
- Show the pattern in a whole data set when the data is continuous
- Can illustrate the distribution of a set of scores
- DV plotted on the x-axis in groups
- Frequency of each score on the y-axis
- Bars touch (no gaps between them) because the scale is continuous
- If there are no scores in a group, its space is left empty, which shows as a gap
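A small sketch of how scores could be grouped and counted for a histogram, printed as text. The function name, the group width of 10 and the data are all assumptions for illustration. Note that the empty 30-39 group still keeps its place.

```python
def histogram_counts(scores, start, width, n_groups):
    """Count how many scores fall into each equal-width group."""
    counts = [0] * n_groups
    for score in scores:
        index = (score - start) // width
        if 0 <= index < n_groups:
            counts[index] += 1
    return counts

# Hypothetical data: minutes spent on revision by eight ppts
minutes = [5, 12, 18, 22, 25, 27, 41, 48]
counts = histogram_counts(minutes, start=0, width=10, n_groups=5)

# One row per group; a group with no scores is printed with no bar
for i, count in enumerate(counts):
    low = i * 10
    print(f"{low:>2}-{low + 9:>2} | {'#' * count}")
```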
Scatter graphs
- Results from a correlational study are displayed on a scatter graph.
- Each dot/cross represents one individual’s pair of scores.
- Sometimes a line of best fit is drawn - it should come as close to as many points as possible.
- Strong correlation = data points lie close to the line.
- Weak correlation = data points are more spread out.
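One common way to draw a line of best fit is least squares, and one common number for how closely points follow it is Pearson's r. Neither is named in these notes, so treat the sketch below as an illustration under those assumptions. The function name and data are hypothetical.

```python
from math import sqrt, isclose

def best_fit_and_r(xs, ys):
    """Return slope, intercept (least squares) and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)     # nearer 1 or -1 means points lie closer to the line
    return slope, intercept, r

hours = [1, 2, 3, 4, 5]
tight = [52, 54, 56, 58, 60]   # every point lies exactly on y = 50 + 2x
loose = [55, 50, 62, 53, 60]   # points are scattered more widely

slope_t, intercept_t, r_tight = best_fit_and_r(hours, tight)
_, _, r_loose = best_fit_and_r(hours, loose)
print(round(r_tight, 2), round(r_loose, 2))
```

The tightly clustered set gives a larger r than the scattered set, matching the strong versus weak correlation described above.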