Data Analysis Flashcards
Distribution =
How frequently different values are observed in the data
Frequency =
Number of times a value appears in the data
Frequency distribution =
Table or graph that shows values and their corresponding frequencies
Relative Frequency =
Frequency of a value / Total number of data entries
Relative Frequency Distribution =
Table or graph showing relative frequencies of each value
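Example (a minimal Python sketch; the data list and the function names are made up for illustration): counting frequencies and converting them to relative frequencies.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each distinct value to the number of times it appears."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Map each distinct value to frequency / total number of data entries."""
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}

data = [2, 3, 3, 5, 5, 5, 7, 7]  # hypothetical data, 8 entries
freq = frequency_distribution(data)               # {2: 1, 3: 2, 5: 3, 7: 2}
rel_freq = relative_frequency_distribution(data)  # {2: 0.125, 3: 0.25, 5: 0.375, 7: 0.25}
```

The relative frequencies always add up to 1.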
Make predictions with the slope of the trend line of a scatter plot
- Take or estimate 2 points on the trend line
- Work out the slope
- Slope = change in y / change in x (how much y changes for each 1-unit increase in x)
- Multiply the slope if needed to change the x-axis unit (for example, from per minute to per hour, or from per day to per week)
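Example (a rough Python sketch of the steps above; the two points, the units, and the function names are assumptions chosen for illustration): x is time in minutes and y is pages read.

```python
def slope(point_a, point_b):
    """Change in y divided by change in x between two points on the trend line."""
    (x1, y1), (x2, y2) = point_a, point_b
    return (y2 - y1) / (x2 - x1)

def predict(point, m, x):
    """Predict y at x by moving along the trend line from a known point."""
    x0, y0 = point
    return y0 + m * (x - x0)

# Two points estimated from a hypothetical trend line
a = (10, 5)
b = (50, 25)

pages_per_minute = slope(a, b)         # (25 - 5) / (50 - 10) = 0.5
pages_per_hour = pages_per_minute * 60 # change the x-axis unit: 30.0
pages_at_90_min = predict(a, pages_per_minute, 90)  # 5 + 0.5 * 80 = 45.0
```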
Arithmetic Mean =
Sum of all the values / No. of values
Weighted Mean =
Sum of (each unique value × its weight) / Sum of the weights
Weight of a value =
Number of times it appears in the data (its frequency)
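Example (a small Python sketch, assuming the weighted mean is computed as the sum of value × weight divided by the sum of the weights; the numbers and function names are made up): when the weights are frequencies, the weighted mean of the unique values equals the arithmetic mean of the full list.

```python
def arithmetic_mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

raw_data = [1, 1, 2, 3]      # hypothetical data
unique_values = [1, 2, 3]
weights = [2, 1, 1]          # 1 appears twice, 2 and 3 appear once

plain = arithmetic_mean(raw_data)                 # 7 / 4 = 1.75
weighted = weighted_mean(unique_values, weights)  # (2 + 2 + 3) / 4 = 1.75
```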
Median =
‘Middle Number’
- Order values from smallest to biggest
- If no. of values is odd, Median = number in the middle of this list
- If no. of values is even, there are 2 numbers in the middle. Median = mean of these 2 values
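Example (a minimal Python sketch of the odd and even cases; the function name and sample lists are illustrative):

```python
def median(values):
    """Middle number of the ordered list, or the mean of the 2 middle numbers."""
    ordered = sorted(values)  # order from smallest to biggest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the 2 middle values

odd_case = median([7, 1, 3])      # ordered: 1, 3, 7 -> 3
even_case = median([4, 1, 3, 2])  # ordered: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```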
Mode
‘Most frequent’
Value that occurs most frequently in the list
There can be more than one mode in a data set
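Example (a short Python sketch that returns every most frequent value, since there can be more than one; the function name and data are made up):

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

single = modes([4, 4, 1])         # [4]
double = modes([1, 2, 2, 3, 3])   # [2, 3]
```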
Positions of data
(Order data from least to greatest)
L = Least
M = Median
G = Greatest
Quartiles
Q1, Q2(M), Q3 split the data into 4 groups:
L - Q1
Q1 - Q2(M)
Q2(M) - Q3
Q3 - G
Percentiles
The 99 percentiles split the data into 100 groups
Group 1: L - 1st percentile
Group 100: 99th percentile - G
How to find Q1
Find the median of the first half of the data (the values before the median)
How to find Q3
Find the median of the second half of the data (the values after the median)
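Example (a Python sketch of the "median of each half" method; it assumes at least 2 values and leaves the median itself out of both halves when the count is odd. Other quartile conventions exist and can give slightly different answers. Function names and data are illustrative):

```python
def median(values):
    """Middle value of an ordered list, or the mean of the 2 middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the values before/after the median."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    first_half = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half
    second_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(first_half), median(ordered), median(second_half)

odd_count = quartiles([1, 2, 3, 4, 5, 6, 7])      # (2, 4, 6)
even_count = quartiles([1, 2, 3, 4, 5, 6, 7, 8])  # (2.5, 4.5, 6.5)
```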
Dispersion
Degree of spread of the data
Most common measures = range, interquartile range, standard deviation
Range =
G - L
Greatest - Least
(Shows the full spread of the data but can be affected by outliers)
Interquartile Range =
Q3 - Q1
Shows the spread of the middle half of the data. Is largely unaffected by outliers
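Example (a Python sketch comparing the two measures on made-up data with and without an outlier; it assumes the quartiles are the medians of the values before and after the median, and the function name is illustrative):

```python
import statistics

def spread_measures(values):
    """Return (range, interquartile range) for a list of at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    first_half = ordered[:half]
    # For an odd count, the median itself belongs to neither half
    second_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    data_range = ordered[-1] - ordered[0]  # G - L
    iqr = statistics.median(second_half) - statistics.median(first_half)  # Q3 - Q1
    return data_range, iqr

typical = spread_measures([1, 2, 3, 4, 5, 6, 7])        # (6, 4)
with_outlier = spread_measures([1, 2, 3, 4, 5, 6, 70])  # (69, 4)
```

The single outlier changes the range from 6 to 69, while the interquartile range stays at 4.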
Standard Deviation =
Measure of spread that depends on every value in the data set (unlike the range and interquartile range).
The more the data is spread away from the mean, the greater the standard deviation
Sometimes called Population Standard Deviation (to differentiate it from Sample Standard Deviation)
How to calculate standard deviation =
- Find the mean
- Find the difference between each value and the mean and square it
- Find the mean of these squared differences
- Square root this number (take only the positive answer)
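Example (a minimal Python sketch of the four steps above; the data and function name are made up for illustration):

```python
import math

def population_std(values):
    """Population standard deviation, following the steps one by one."""
    n = len(values)
    mean = sum(values) / n                             # find the mean
    squared_diffs = [(v - mean) ** 2 for v in values]  # squared differences from the mean
    mean_squared_diff = sum(squared_diffs) / n         # mean of the squared differences
    return math.sqrt(mean_squared_diff)                # positive square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
result = population_std(data)    # squared differences sum to 32; 32 / 8 = 4; sqrt(4) = 2.0
```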
How to find the SAMPLE Standard Deviation
- Find the mean
- Find the difference between each value and the mean and square it
- Divide the sum of these squared differences by (no. of values - 1)
- Square root this number (take only the positive answer)
(Usually preferred when the data is a sample taken from a larger ‘population’ (set) of data)
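Example (a Python sketch of the sample version, assuming at least 2 values; the data and function name are made up, and the result is cross-checked against the standard library's statistics.stdev):

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: divide by (n - 1) instead of n."""
    n = len(values)
    mean = sum(values) / n                                # find the mean
    squared_diffs = [(v - mean) ** 2 for v in values]     # squared differences from the mean
    return math.sqrt(sum(squared_diffs) / (n - 1))        # divide by n - 1, then square root

sample = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample, mean = 5
result = sample_std(sample)             # sqrt(32 / 7), a little over 2.138
reference = statistics.stdev(sample)    # library version of the same calculation
```

Dividing by (n - 1) rather than n makes the sample value slightly larger than the population value for the same data (here about 2.138 versus 2.0).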