Chapter 15 - Data processing, presentation and interpretation Flashcards
Variable, random variable
- denoted by capital letters e.g. X
- if its value varies, it is a variable
- if its value varies at random, it is a random variable e.g. value in a dice throw
Frequency
- number of times a particular value occurs in a data set
Categorical (qualitative) data
- come in classes e.g. types of bird
- can be described without using numbers
- displayed using pictograms, bar charts, dot plots and pie charts
Numerical (quantitative) data
- defined in some way by numbers e.g. time taken to run a race
Ranked data
- given by their position within a group e.g. placing 1st, 2nd or 3rd in a race
- displayed using box-and-whisker plot, cumulative frequency curve
- can deduce the median (an average, i.e. a measure of central tendency), the quartiles and the semi-interquartile range (half the IQR; indicates how far above or below the median a typical large or small value lies)
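As a rough illustration of the ranked-data measures above, the sketch below finds the median, quartiles and semi-interquartile range for a made-up set of race times. The data are hypothetical, and quartile conventions differ between textbooks: this uses the "inclusive" method of Python's statistics.quantiles, which may not match the method taught on a given course.

```python
import statistics

# Hypothetical race times in seconds (assumed data, for illustration only)
times = [12, 15, 11, 19, 13, 17, 14, 18, 16]

# Rank the data from smallest to largest
ranked = sorted(times)

# Middle value of the ranked data
median = statistics.median(ranked)

# Quartiles by the "inclusive" convention; other conventions can give
# slightly different values
q1, q2, q3 = statistics.quantiles(ranked, n=4, method="inclusive")

iqr = q3 - q1       # interquartile range
semi_iqr = iqr / 2  # semi-interquartile range (half the IQR)

print(median, q1, q3, iqr, semi_iqr)
```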
Discrete variables
- can take certain values but not those in between e.g. shoe size
- displayed using a tally, vertical line chart, stem-and-leaf diagram, grouped frequency table
- can deduce mode, mean, midrange, median, standard deviation, range, IQR
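The summary measures listed for a discrete variable can be sketched in a few lines. The shoe sizes below are invented for illustration, and midrange is taken here to mean the value halfway between the smallest and largest values.

```python
import statistics
from collections import Counter

# Hypothetical shoe sizes (assumed data, for illustration only)
shoe_sizes = [5, 6, 6, 7, 7, 7, 8, 9]

# Frequency of each value, as in a tally
frequencies = Counter(shoe_sizes)

mode = statistics.mode(shoe_sizes)      # most frequent value
mean = statistics.mean(shoe_sizes)      # sum of values / number of values
median = statistics.median(shoe_sizes)  # middle of the ranked values

data_range = max(shoe_sizes) - min(shoe_sizes)
midrange = (max(shoe_sizes) + min(shoe_sizes)) / 2

print(frequencies[7], mode, mean, median, data_range, midrange)
```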
Continuous variables
- if measured accurately, can take any appropriate value; cannot list all the possible values
- displayed using frequency charts, histograms, cumulative frequency charts
Distribution, unimodal and bimodal distributions, positive and negative skew
- pattern in which the values of a variable occur is called its distribution
- displayed in a diagram with the variable on the horizontal and frequency/probability on the vertical
- one peak = unimodal; two peaks = bimodal
- peak to the left (longer tail to the right) = positive skew; peak to the right (longer tail to the left) = negative skew
Grouped data
- easy to allocate data to groups when variable can take many values e.g. age groups
Bivariate and multivariate data
- two variables are assigned to each item in bivariate data e.g. age and mileage of second hand cars
- displayed using scatter diagrams: a linear association (relationship) going upwards = +ve correlation, i.e. the regression line (line of best fit) has a +ve gradient and the correlation coefficient r is +ve; a linear association going downwards = -ve correlation, i.e. the regression line has a -ve gradient and r is -ve
–> dependent variable (y-axis) is affected by the independent variable (x-axis)
–> a random variable (can take any value in a set) is affected by a controlled variable (only takes a set of predetermined values)
- more than two variables are involved in multivariate data; stored on spreadsheets
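To show why the sign of r matches the gradient of the regression line, here is a minimal sketch that computes both from the same sums. The function name correlation_and_line and the car data are assumptions made for illustration. It uses the usual least-squares line of y on x and the product-moment correlation coefficient.

```python
from math import sqrt

def correlation_and_line(xs, ys):
    """Return (r, gradient, intercept) for paired data xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and products of deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)              # correlation coefficient
    gradient = sxy / sxx                   # gradient of the line of y on x
    intercept = mean_y - gradient * mean_x
    return r, gradient, intercept

# Hypothetical second-hand cars: age in years, mileage in thousands of miles
ages = [1, 2, 3, 4, 5]
mileages = [12, 19, 33, 41, 50]

r, gradient, intercept = correlation_and_line(ages, mileages)
print(r, gradient, intercept)
```

Both r and the gradient have sxy as their numerator and a positive denominator, so they always share the same sign.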
Standard deviation
- mean absolute deviation: the mean of how far each value is from the mean, using the magnitudes (ignoring any -ve sign)
- standard deviation: average the squares of how far each value is from the mean (dividing by n-1), then take the square root at the end
- variance: the square of the standard deviation, i.e. the same calculation without the final square root
- outliers: values more than two standard deviations from the mean; for a normal distribution these are roughly the most extreme 5% of data values
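The four definitions above can be sketched as a single calculation. The function name spread_measures and the data are hypothetical. The n-1 divisor follows this card, and the mean absolute deviation is assumed to divide by n.

```python
from math import sqrt

def spread_measures(values):
    """Return (mean, mean absolute deviation, variance, sd, outliers)."""
    n = len(values)
    mean = sum(values) / n
    # Mean absolute deviation: average distance from the mean, ignoring sign
    mad = sum(abs(v - mean) for v in values) / n
    # Variance: squared deviations from the mean, divided by n - 1
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Standard deviation: square root of the variance
    sd = sqrt(variance)
    # Outliers: values more than two standard deviations from the mean
    outliers = [v for v in values if abs(v - mean) > 2 * sd]
    return mean, mad, variance, sd, outliers

# Hypothetical data set containing one unusually large value
data = [10, 12, 11, 13, 12, 11, 12, 30]
mean, mad, variance, sd, outliers = spread_measures(data)
print(mean, mad, variance, sd, outliers)
```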