Stats Flashcards
What does qualitative data mean
non-numerical data e.g. hair colour
What does quantitative data mean
numerical data e.g. number of children
discrete data meaning
data that can be counted e.g. number of children
continuous data meaning
data that can be measured e.g. height
how do you find the class width of a set of data that uses short hand e.g.
length | 1-20 | 21-30 |…
write each class in the form ‘a < x < b’ using the class boundaries
e.g. 1 - 20 -> 0.5 < x < 20.5
then subtract the lower boundary from the upper boundary: 20.5 - 0.5 = 20
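A rough Python sketch of the class width working above. The function name class_width and the gap parameter are made up for illustration; gap assumes the classes are written to the nearest whole number (1-20, 21-30, ...), so each boundary sits half a unit beyond the written limit.

```python
def class_width(lower_limit, upper_limit, gap=1):
    """Width of a class written in shorthand such as 1-20.

    gap is the assumed distance between one class's upper limit and the
    next class's lower limit (1 when classes run 1-20, 21-30, ...).
    """
    lower_boundary = lower_limit - gap / 2   # 1 -> 0.5
    upper_boundary = upper_limit + gap / 2   # 20 -> 20.5
    return upper_boundary - lower_boundary


print(class_width(1, 20))   # uses boundaries 0.5 and 20.5
print(class_width(21, 30))  # uses boundaries 20.5 and 30.5
```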
what is descriptive stats?
stats that collect, organise and summarise data
what is inferential stats
stats where sample data is analysed to infer conclusions about the population
what is a population
the whole set of items that are of interest
what is a sample
a selection of items taken as a subset from the population
what is a parameter + example
a numerical characteristic of a population, e.g. the population mean
what is a stat?
a numerical characteristic of a sample that can help to estimate a parameter
what tool can you use to remember what a stat and a parameter are used for
stat begins with ‘s’ -> sample
parameter begins with ‘p’ -> population
what is a census
a survey that observes or measures every item within the population
what is an adv and a dis.adv of a census
adv -
representative of the whole pop
d.adv-
expensive, time-consuming, and sometimes impossible to carry out
what can the size of a sample affect
the validity of any conclusions made. In general, the larger the sample, the more accurate the results
what is a sampling frame?
a list with all items of the population individually named
what is a sampling unit
an individual unit of the population
how do you carry out a simple random sample
- form a sampling frame
- allocate each item a specific number
- generate a random number, e.g. using a calculator, as many times as needed for your sample size (if you need a sample size of 30, generate 30 random numbers, ignoring any repeats)
- select the items with those numbers
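The simple random sample steps could be sketched in Python like this. It is only an illustration: the frame of 200 student names is invented, and random.sample stands in for the calculator's random number key (it never repeats a number).

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct items, each equally likely to be chosen."""
    rng = random.Random(seed)
    # Give every item in the frame its own number, 1..N.
    numbered = dict(enumerate(sampling_frame, start=1))
    # Draw the random numbers without replacement, so none repeats.
    chosen_numbers = rng.sample(range(1, len(numbered) + 1), sample_size)
    return [numbered[number] for number in chosen_numbers]


frame = [f"student_{i}" for i in range(1, 201)]  # hypothetical sampling frame
sample = simple_random_sample(frame, 30, seed=1)
print(len(sample))
```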
how do you carry out systematic sampling
- form a sampling frame
- allocate each item a unique number
- calculate the integer component (population size / sample size = x, rounded down to a whole number)
- using a calc, generate a random number between 1 and x (this is your starting unit)
- select every xth item after the first to be included in the sample
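A possible Python sketch of systematic sampling. It assumes the random start is drawn from the first x items so the sample never runs past the end of the list; the function name and the frame of 100 numbered items are made up for the example.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Take every x-th item after a random start, x = population // sample size."""
    rng = random.Random(seed)
    interval = len(sampling_frame) // sample_size   # the integer component x
    start = rng.randint(1, interval)                # assumed: start within the first x items
    positions = [start + i * interval for i in range(sample_size)]
    return [sampling_frame[p - 1] for p in positions]  # positions are 1-based


frame = list(range(1, 101))          # hypothetical frame: items numbered 1..100
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```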
how do you carry out stratified sampling
- divide the population into groups, called strata (e.g. year groups, age, sex)
- calculate the sample size for each stratum (xi) -> xi = (sample size/population size) x stratum size
- make sure the sum of all the xi’s equals the sample size (you may have to round as appropriate)
- conduct a simple random sample within each stratum
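The stratum sample sizes can be checked with a short Python sketch. The year-group sizes below are invented, and the function name is hypothetical. Plain rounding is used, so with awkward numbers the total can be one off and needs adjusting by hand.

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """xi = (sample size / population size) x stratum size, rounded to whole items."""
    population = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population)
        for name, size in strata_sizes.items()
    }


year_groups = {"Y10": 120, "Y11": 100, "Y12": 80}   # hypothetical strata
allocation = stratum_sample_sizes(year_groups, 30)
print(allocation)
print(sum(allocation.values()))  # should match the sample size; adjust if rounding breaks it
```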
adv and dis adv of
- simple random
adv:
- everyone has an equal chance of being selected - removes bias
- easy to conduct
d.adv:
- time consuming
adv and dis adv of
- systematic
adv:
- covers a wider study area
- less likely to introduce bias because the starting point is randomly generated
d.adv:
- need a randomly generated starting point
- need to know the total population size for it to work
adv and dis adv of
- stratified
adv:
- each group receives representation within the sample, as it is proportional to the group size = increased accuracy
d.adv:
- not all members of the pop may belong to a specific group
what are the two non-random techniques
quota sampling
convenience/opportunity sampling
what is quota sampling
- the population is split into groups (e.g. year, age, race)
- individuals who fit each group are chosen non-randomly until the quota for that group is filled
what is convenience sampling
- a sample is taken from people who are available for the study (e.g. the first 20 people I see)
adv and dis adv of
quota sampling
adv:
- representative
d.adv:
- non random sampling may introduce bias
adv and dis adv of
convenience/ opportunity sampling
adv:
- quick -> no need of a sampling frame
d.adv:
- not representative -> the first people seen may not effectively represent the whole pop
which are affected by extreme values and which aren't: mean, median, and mode
mode is not
median is not
mean is
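A quick check of this in Python, using made-up data where the largest value is swapped for an extreme one. Only the mean moves.

```python
import statistics

data = [2, 3, 3, 4, 5]            # hypothetical data
with_extreme = [2, 3, 3, 4, 50]   # same data, largest value replaced by an extreme one

for label, values in [("original", data), ("with extreme value", with_extreme)]:
    print(
        label,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "mode:", statistics.mode(values),
    )
```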
how do you use your GDC to find mean, median, mode, etc
stats -> enter data ->F2 (calc) -> F6 (SET) -> make sure ‘List1’ is the first line and ‘List2’ is the second ->EXIT -> F1 (1-VAR)
x̄ = mean
n = sample size
Med = median
Mod = Mode
what is the formula for the integer component
population/ sample size
P/n
what is the formula to find the sample size for strata
sample size for each stratum = (sample size / population size) x stratum size
xi = (n/P) x si
what is the formula for the mean from a raw list of data
x̄ = sum of the items / no. of items
what is the formula for the mean from grouped discrete distribution
x̄ = sum of (each item x its frequency) / the sum of the frequencies
what is the formula for the mean from grouped continuous distribution
x̄ = sum of (the midpoint of each class x its frequency) / the sum of the frequencies
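Both frequency-table means can be sketched with one small Python function. The tables below are invented. For the continuous case the midpoints stand in for the real values, so the answer is only an estimate of the mean.

```python
def mean_from_frequencies(values, frequencies):
    """Sum of (value x frequency) divided by the sum of the frequencies."""
    total = sum(value * freq for value, freq in zip(values, frequencies))
    return total / sum(frequencies)


# Grouped discrete: number of children (hypothetical table)
children = [0, 1, 2, 3]
children_freq = [4, 6, 8, 2]
print(mean_from_frequencies(children, children_freq))

# Grouped continuous: use the midpoint of each class (hypothetical table)
classes = [(0, 10), (10, 20), (20, 30)]
class_freq = [5, 10, 5]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
print(mean_from_frequencies(midpoints, class_freq))
```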
what is the formula for the median from very large or grouped distribution
the (n/2)th item
(n = sum of the frequencies)
what is the formula for the median of a raw list of data
(n+1)/2 = the position of the median item
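A small Python sketch of the raw-list rule, with made-up lists. When (n+1)/2 lands halfway between two positions, the two neighbouring items are averaged.

```python
def median_from_raw(data):
    """Median of a raw list using the (n + 1) / 2 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2                 # 1-based position of the median item
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the items either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_from_raw([7, 1, 3, 9, 5]))  # n = 5, position 3
print(median_from_raw([1, 3, 5, 7]))     # n = 4, position 2.5
```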
what is a quartile
a data point that lies 1/4, 1/2, or 3/4 through the data
define range
the difference between the largest and smallest values -> easily affected by extreme values
define interquartile range
the difference between the upper and lower quartiles. shows the spread of the central 50% of data so is unaffected by outliers
define standard deviation
a measure of how spread out the data is around the mean
what is the formula for variance
variance = (standard deviation)^2
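To see the link between the two, here is a Python sketch with made-up data. It assumes the population form (divide by n), which the cards do not spell out; the sample form divides by n - 1 instead.

```python
import math


def variance(data):
    """Mean of the squared distances from the mean (population form, divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean 5
var = variance(data)
sd = math.sqrt(var)                  # standard deviation is the square root of the variance
print(var, sd)
```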
if you add a constant to the mean and standard deviation, how will they be affected
new mean = x̄ + c
no change to standard deviation
if you multiply a constant (b) by the mean and standard deviation, how will they be affected
new mean = bx̄
new standard deviation = |b|σ (for positive b this is just bσ)
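Both rules can be checked numerically. A sketch with made-up data and made-up constants c = 10 and b = 3, using statistics.pstdev (the population standard deviation):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, standard deviation 2
c, b = 10, 3                      # hypothetical constants

shifted = [x + c for x in data]   # add a constant to every value
scaled = [x * b for x in data]    # multiply every value by a constant

print(statistics.mean(shifted), statistics.pstdev(shifted))  # mean moves by c, spread unchanged
print(statistics.mean(scaled), statistics.pstdev(scaled))    # mean and spread both scale by b
```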
what is relative frequency?
the proportion of the total frequency that lies within a class
what is the formula for relative frequency
class frequency/ sample size
this will be a decimal between 0 and 1
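A short Python sketch with an invented frequency table. The relative frequencies should add up to 1.

```python
def relative_frequencies(frequencies):
    """Each class frequency divided by the sample size (the total frequency)."""
    sample_size = sum(frequencies)
    return [freq / sample_size for freq in frequencies]


class_freq = [5, 10, 5]                 # hypothetical class frequencies
rel = relative_frequencies(class_freq)
print(rel)
print(sum(rel))
```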
what does a bell-shaped/ symmetrical distribution suggest
that data is grouped around the mean or median
what does a left/negatively skewed distribution suggest
data is largely grouped to the top of the range. median is usually higher than mean
what does a right/positively skewed distribution suggest
the data is largely grouped towards the bottom of the range. median is usually lower than mean
what does a left skewed histogram look like
the highest bars are found on the right hand side
what does a right skewed histogram look like
the highest bars are found on the left hand side
what does unimodal mean
having one peak
what does bimodal mean
having two peaks
what does uniform mean
where the frequency for all classes is approximately equal
how do u graph a histogram on the gdc
STATS -> enter data -> GRAPH -> SET -> GRAPH 2 = HIST (F6 -> F1) -> set width and starting point -> GRAPH
what values do u use when drawing a cumulative frequency graph
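start at the lower bound of the first class with a cumulative frequency of 0, then plot each class's upper bound against its cumulative frequency, i.e. if 1.40<x<1.45 is the first class then (1.40, 0) is the first coordinate and (1.45, cf for that class) is the next one. Every point after that is (upper bound, cf)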
if a question asks you to find the 80th percentile on a cumulative frequency graph, and it goes up to 160, what do you do?
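80/100 x 160 = 128, then read across from 128 on the cumulative frequency axis to the curve and down to the x-axis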
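A Python sketch of the plotting points and the percentile reading. The height classes and frequencies are invented (total 160), and it assumes the points are joined with straight lines, so a hand-drawn curve may give a slightly different reading.

```python
from itertools import accumulate


def cf_points(boundaries, frequencies):
    """Points to plot: (first lower bound, 0), then (upper bound, cf) for each class."""
    cumulative = [0, *accumulate(frequencies)]
    return list(zip(boundaries, cumulative))


def read_percentile(points, percent):
    """Estimate the x-value at a percentile by joining the points with straight lines."""
    total = points[-1][1]
    target = percent / 100 * total            # e.g. 80/100 x 160 = 128
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf1 > cf0 and cf0 <= target <= cf1:
            return x0 + (target - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("percent must be between 0 and 100")


boundaries = [140, 145, 150, 155, 160]   # hypothetical class boundaries (heights in cm)
frequencies = [20, 60, 60, 20]           # hypothetical frequencies, total 160
points = cf_points(boundaries, frequencies)
print(points)
print(read_percentile(points, 80))
```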
what are the 3 rules used to identify outliers
any value which is:
- greater than the upper quartile + 1.5 x interquartile range
- less than the lower quartile - 1.5 x interquartile range
- more than 2 standard deviations away from the mean (outside mean + or - 2 x standard deviation)
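The three rules as a Python sketch. The quartiles, mean and standard deviation are passed in rather than calculated, because calculators and textbooks find quartiles in slightly different ways; all numbers and function names below are made up.

```python
def iqr_fences(q1, q3):
    """Lower and upper limits: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier_iqr(value, q1, q3):
    lower, upper = iqr_fences(q1, q3)
    return value < lower or value > upper


def is_outlier_sd(value, mean, sd):
    """True when the value is more than 2 standard deviations from the mean."""
    return abs(value - mean) > 2 * sd


print(iqr_fences(10, 18))            # hypothetical quartiles Q1 = 10, Q3 = 18
print(is_outlier_iqr(31, 10, 18))
print(is_outlier_sd(10, mean=5, sd=2))
```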
what does the box in a box plot show
the middle 50% of data
what do the whiskers show
the minimum and maximum values (excluding any outliers)
What is bivariate data
data that tracks two characteristics of a population (x and y), called variables
what is an explanatory or independent variable
a variable set and controlled by the observer in a study
what is the response or dependent variable
a variable recorded to measure the outcome of a study
what does association mean
when the explanatory and response variables demonstrate a relationship
what does causation mean
when a change in the explanatory/independent variable directly causes the change in the response variable
what is correlation
a measure of how strongly two variables are related, and in which direction
what do points closest to a best-fit line mean
closer the points = stronger the correlation
what measure can we use to find a numerical value to represent linear correlation
Pearson’s Product-Moment Correlation Coefficient (PMCC)
what will the value of the PMCC (r) always be between
-1 ≤ r ≤ 1
if the value of the PMCC is closer to -1 or 1, is it stronger or weaker?
stronger
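The cards leave the calculation to the GDC, but r can also be worked out by hand. A Python sketch using the standard formula r = Sxy / sqrt(Sxx x Syy), which is not given on the cards; the data is invented.

```python
import math


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive linear relationship
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative linear relationship
print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # weaker positive relationship
```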
what does y on x mean and x on y mean
y on x => y=ax+b
x on y => x=ay+b
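A Python sketch of both regression lines, assuming the usual least-squares method (a = Sxy / Sxx, with the line passing through the mean point), which the cards do not state. Swapping the two lists gives x on y; the data is invented.

```python
def fit_line(explanatory, response):
    """Least-squares line response = a x explanatory + b; returns (a, b)."""
    n = len(explanatory)
    mean_e = sum(explanatory) / n
    mean_r = sum(response) / n
    s_er = sum((e - mean_e) * (r - mean_r) for e, r in zip(explanatory, response))
    s_ee = sum((e - mean_e) ** 2 for e in explanatory)
    a = s_er / s_ee
    b = mean_r - a * mean_e          # the line passes through the mean point
    return a, b


xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2, 4, 5, 4, 5]

print(fit_line(xs, ys))   # y on x: y = ax + b
print(fit_line(ys, xs))   # x on y: x = ay + b (lists swapped)
```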
how do u switch from y on x, to x on y
SET -> swap the 3rd and 4th list around
what is interpolation
making an estimate of the response variable for a given explanatory variable ( estimating y, given the x variable), WITHIN the range of data
what is extrapolation
making an estimate of the response variable for a given explanatory variable ( estimating y, given the x variable), OUTSIDE the range of data
is interpolation or extrapolation more reliable and why
interpolation is more reliable, because extrapolation estimates values outside of the given data, so there is no guarantee the linear trend will continue
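A small Python sketch that labels an estimate as interpolation or extrapolation. The x-values and the line y = 0.6x + 2.2 are made up for the example, and the function name is hypothetical.

```python
def estimate_y(x, xs, a, b):
    """Estimate y from the y on x line y = ax + b and say which kind of estimate it is."""
    inside_range = min(xs) <= x <= max(xs)
    kind = "interpolation" if inside_range else "extrapolation"
    return a * x + b, kind


xs = [1, 2, 3, 4, 5]      # hypothetical x-values the line was fitted to
a, b = 0.6, 2.2           # hypothetical y on x line

print(estimate_y(3.5, xs, a, b))   # inside the data range, more reliable
print(estimate_y(12, xs, a, b))    # outside the data range, treat with caution
```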