Stats Flashcards by Nikki Mahandru

What does qualitative data mean

non-numerical data e.g. hair colour

What does quantitative data mean

numerical data e.g. number of children

discrete data meaning

data that can be counted e.g. number of children

continuous data meaning

data that can be measured e.g. height

how do you find the class width of a set of data written in shorthand, e.g.
length | 1-20 | 21-30 |…

put each class into ‘a ≤ x < b’ form using its class boundaries
e.g. 1-20 -> 0.5 ≤ x < 20.5
then subtract the lower boundary from the upper: 20.5 - 0.5 = 20
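
A minimal Python sketch of this card, assuming the data are whole numbers (so each true class boundary sits 0.5 beyond the stated limit). `class_boundaries` is a hypothetical helper name, not part of any library.

```python
def class_boundaries(label):
    """Turn a shorthand class such as '1-20' into (lower, upper, width).

    Assumes whole-number data: the lower boundary is 0.5 below the stated
    lower limit and the upper boundary is 0.5 above the stated upper limit.
    """
    low_text, high_text = label.split("-")
    lower = int(low_text) - 0.5    # 1  -> 0.5
    upper = int(high_text) + 0.5   # 20 -> 20.5
    return lower, upper, upper - lower


print(class_boundaries("1-20"))    # (0.5, 20.5, 20.0)
print(class_boundaries("21-30"))   # (20.5, 30.5, 10.0)
```

The width is always "upper boundary minus lower boundary", so classes written with gaps (1-20, 21-30) still give the correct widths.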

what is descriptive stats?

stats that collect, organise, and summarise data (e.g. tables, graphs, averages)

what is inferential stats

stats where data from a sample is analysed so that conclusions (inferences) can be drawn about the population

what is a population

the whole set of items that are of interest

what is a sample

a selection of items taken as a subset from the population

what is a parameter + example

a numerical characteristic of a population, e.g. the population mean

what is a stat?

a numerical characteristic of a sample that can help to estimate a parameter

what tool can you use to remember what a stat and a parameter are used for

stat begins with ‘s’ -> sample
parameter begins with ‘p’ -> population

what is a census

a survey that observes or measures every item within the population

what is an adv and a dis.adv of a census

adv -
representative of the whole pop

d.adv-
expensive, time-consuming, and sometimes impossible (e.g. when testing destroys the item)

what can the size of a sample affect

the validity of any conclusions made. The larger (and more varied) the sample, the more accurate the results are likely to be

what is a sampling frame?

a list with all items of the population individually named

what is a sampling unit

an individual unit of the population

how do you carry out a simple random sample

form a sampling frame
allocate each item a unique number
generate a random number, e.g. using a calculator, as many times as needed for your sample size, ignoring any repeats (if you need a sample size of 30, you need 30 different random numbers)
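
The steps above can be sketched in Python as follows. This is an illustration only: `simple_random_sample` is a hypothetical helper, Python's `random.Random.randint` stands in for the calculator's random-integer key, and the frame of 100 students is made up.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Choose `sample_size` different items from a sampling frame at random.

    Items are numbered 1..N, random whole numbers are generated one at a
    time, and any number that has already come up is ignored.
    """
    population_size = len(frame)
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # a seed makes the run repeatable
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # random whole number 1..N
        if number not in chosen:                  # ignore repeats
            chosen.append(number)
    return [frame[number - 1] for number in chosen]  # item k is at index k - 1


students = [f"student {k}" for k in range(1, 101)]  # hypothetical frame of 100
print(simple_random_sample(students, 5, seed=1))
```

In everyday Python, `random.sample(frame, 5)` does the same job in a single call; the loop is written out here only to mirror the card's steps.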

how do you carry out systematic sampling

form a sampling frame
allocate each item a unique number
calculate the integer component k (population size / sample size, taking the whole-number part)
using a calc, generate a random number between 1 and k (this is your starting unit)
select every kth item after the first to be included in the sample
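
A minimal Python sketch of systematic sampling, assuming the frame is a list whose items are numbered from 1 and that the random start is taken between 1 and k. The name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Systematic sample: a random start, then every k-th item of the frame."""
    population_size = len(frame)
    k = population_size // sample_size   # integer component P/n
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random starting item, 1..k
    # Items numbered start, start + k, start + 2k, ... (n items in total).
    # The last number is at most k + (n - 1)k = nk <= P, so it stays in range.
    numbers = [start + i * k for i in range(sample_size)]
    return [frame[m - 1] for m in numbers]


frame = list(range(1, 101))                  # items numbered 1..100
print(systematic_sample(frame, 10, seed=4))  # 10 numbers, each 10 apart
```

Keeping the start between 1 and k is what guarantees the sample never runs past the end of the frame.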

how do you carry out stratified sampling

divide the population into groups, called strata (e.g. year group, age, sex)
calculate the sample size for each stratum (xi) -> xi = (sample size/population size) x stratum size
make sure the sum of all the xi’s equals the sample size (you may have to round as appropriate)
conduct a simple random sample within each stratum
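
A short Python sketch of stratified sampling. The rounding rule is an assumption (one of several used in practice): each xi is rounded down, then any leftover places go to the strata with the largest remainders, so the xi’s always add up to the sample size. The function names and the year-group sizes are hypothetical.

```python
import random


def stratum_sizes(strata_sizes, sample_size):
    """xi = (n / P) x si for each stratum, rounded so the total is n."""
    population_size = sum(strata_sizes.values())
    sizes, remainders = {}, {}
    for name, size in strata_sizes.items():
        # Exact integer arithmetic: whole part and remainder of n x si / P.
        sizes[name], remainders[name] = divmod(sample_size * size, population_size)
    leftover = sample_size - sum(sizes.values())
    # Give the remaining places to the strata with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        sizes[name] += 1
    return sizes


def stratified_sample(strata, sample_size, seed=None):
    """Simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = stratum_sizes({name: len(items) for name, items in strata.items()}, sample_size)
    return {name: rng.sample(items, sizes[name]) for name, items in strata.items()}


years = {"Year 7": 120, "Year 8": 100, "Year 9": 80}  # hypothetical strata
print(stratum_sizes(years, 20))  # {'Year 7': 8, 'Year 8': 7, 'Year 9': 5}
```

Here the exact values are 8, 6.67 and 5.33, so Year 8 receives the spare place, matching ordinary rounding.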

adv and dis adv of
- simple random

adv:
- everyone has an equal chance of being selected - removes bias
- easy to conduct

d.adv:
- time consuming

adv and dis adv of
- systematic

adv:
- covers a wider study area
- less likely to introduce bias because the starting point is randomly generated

d.adv:
- need a randomly generated starting point
- need to know the total population size for it to work

adv and dis adv of
- stratified

adv:
- each group is represented within the sample in proportion to its size = increased accuracy

d.adv:
- not every member of the population may fit clearly into a single group

what are the two non-random techniques

quota sampling
convenience/opportunity sampling

what is quota sampling

1. the population is split into groups (e.g. year, age, race)
2. individuals who best fit each group's requirements are chosen until each quota is filled

what is convenience sampling

a sample is taken from people who are available for the study (e.g. the first 20 people I see)

adv and dis adv of quota sampling

adv: representative
d.adv: non-random sampling may introduce bias

adv and dis adv of convenience/ opportunity sampling

adv: quick -> no need for a sampling frame
d.adv: not representative -> the first people seen may not effectively represent the whole population

which are affected by extreme values: mean, median, and mode?

mean is affected
median is not
mode is not
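
The card can be checked with Python's standard `statistics` module: turning one value into an extreme one moves the mean but leaves the median and mode unchanged. The data values are made up for illustration.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 4, 5]
with_extreme = [2, 3, 3, 4, 50]  # the 5 replaced by an extreme value

for values in (data, with_extreme):
    print(mean(values), median(values), mode(values))
# 3.4 3 3
# 12.4 3 3
```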

how do you use your GDC to find mean, median, mode, etc

stats -> enter data -> F2 (CALC) -> F6 (SET) -> make sure ‘List1’ is on the first line and ‘List2’ is on the second -> EXIT -> F1 (1-VAR)
x̄ = mean
n = sample size
Med = median
Mod = mode

what is the formula for the integer component

integer component k = population size / sample size = P/n (take the whole-number part)

what is the formula to find the sample size for strata

sample size for stratum = (sample size / population size) x stratum size
xi = (n/P) x si

what is the formula for the mean from a raw list of data

x̄ = sum of the items / number of items = Σx / n

what is the formula for the mean from grouped discrete distribution

x̄ = Σfx / Σf (the sum of each item times its frequency, divided by the sum of the frequencies)

what is the formula for the mean from grouped continuous distribution

x̄ = Σfx / Σf, where x is the midpoint of each class (the sum of each midpoint times its frequency, divided by the sum of the frequencies)
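
The three mean formulas above, written as a small Python sketch. The function names and data are hypothetical; for grouped continuous data the midpoints make the result an estimate, since the individual values are unknown.

```python
def mean_raw(values):
    """x̄ = Σx / n for a raw list of data."""
    return sum(values) / len(values)


def mean_discrete(values, freqs):
    """x̄ = Σfx / Σf for a grouped discrete distribution."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_continuous(boundaries, freqs):
    """Estimated x̄ = Σfx / Σf, using each class midpoint as x.

    `boundaries` holds (lower, upper) class boundaries for each class.
    """
    midpoints = [(low + high) / 2 for low, high in boundaries]
    return mean_discrete(midpoints, freqs)


print(mean_raw([4, 7, 7, 10]))                                    # 7.0
print(mean_discrete([0, 1, 2, 3], [5, 8, 4, 3]))                  # 1.25
print(mean_continuous([(0, 10), (10, 20), (20, 30)], [2, 5, 3]))  # 16.0
```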

what is the formula for the median from very large or grouped distribution

the median is the (n/2)th item, where n = the sum of the frequencies

what is the formula for the median of a raw list of data

the median is the ((n+1)/2)th item of the ordered data
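
A minimal Python sketch of the ((n+1)/2)th-item rule for a raw list. The name `median_raw` is hypothetical; when the position ends in .5, the two items either side are averaged.

```python
def median_raw(values):
    """Median of a raw list: the ((n + 1) / 2)th item of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                 # n = 7 -> 4th item, n = 6 -> 3.5th
    if position.is_integer():
        return ordered[int(position) - 1]  # positions count from 1
    # Halfway between two items: average the items either side.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


print(median_raw([9, 2, 5, 7, 1, 8, 3]))  # 5
print(median_raw([9, 2, 5, 7, 1, 8]))     # 6.0
```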

what is a quartile

a value that lies 1/4, 1/2, or 3/4 of the way through the ordered data (the lower quartile, median, and upper quartile)

define range

the difference between the largest and smallest values -> easily affected by extreme values

define interquartile range

the difference between the upper and lower quartiles. It shows the spread of the central 50% of the data, so it is not affected by outliers

define standard deviation

a measure of how spread out the data is around the mean

what is the formula for variance

variance = (standard deviation)^2
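
A short Python sketch linking variance and standard deviation. It assumes the population form, which divides by n (a GDC also shows a sample version that divides by n - 1); `variance` is a hypothetical helper, and `statistics.pstdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev


def variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = sqrt(var)         # standard deviation is the square root of the variance
print(var, sd)         # 4.0 2.0
print(sd ** 2 == var)  # True: variance = (standard deviation)^2
print(pstdev(data))    # standard-library population sd, for comparison
```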

if you add a constant (c) to every data value, how will the mean and standard deviation be affected

new mean = x̄ + c, no change to the standard deviation

if you multiply every data value by a constant (b), how will the mean and standard deviation be affected

new mean = b x̄, new standard deviation = |b|σ (simply bσ when b is positive)
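
The two transformation rules can be checked numerically with the `statistics` module. The data set (mean 5, standard deviation 2) and the constants c = 4 and b = 3 are made up for illustration.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
c, b = 4, 3                      # hypothetical constants (b > 0 here)

added = [x + c for x in data]    # add c to every value
scaled = [b * x for x in data]   # multiply every value by b

print(mean(added), pstdev(added))    # mean 9, sd still 2
print(mean(scaled), pstdev(scaled))  # mean 15, sd 6
```

Adding c shifts every value by the same amount, so the spread does not change; multiplying by b stretches every distance from the mean by a factor of b.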

what is relative frequency?

the proportion of the total frequency that lies within a class

what is the formula for relative frequency

class frequency / total frequency (the sample size); this will be a decimal between 0 and 1
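
A minimal Python sketch of relative frequency. The class frequencies are hypothetical and `relative_frequencies` is an illustrative name.

```python
def relative_frequencies(freqs):
    """Each class frequency divided by the total frequency."""
    total = sum(freqs)
    return [f / total for f in freqs]


rel = relative_frequencies([5, 12, 8, 15])  # hypothetical class frequencies
print(rel)       # [0.125, 0.3, 0.2, 0.375]
print(sum(rel))  # relative frequencies add up to 1 (up to rounding)
```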

what does a bell-shaped/ symmetrical distribution suggest

that data is grouped around the mean or median

what does a left/negatively skewed distribution suggest

the data is largely grouped towards the top of the range. The median is usually higher than the mean

what does a right/positively skewed distribution suggest

the data is largely grouped towards the bottom of the range. The median is usually lower than the mean

what does a left skewed histogram look like

the highest bars are found on the right-hand side

what does a right skewed histogram look like

the highest bars are found on the left-hand side

what does unimodal mean

having one peak

what does bimodal mean

having two peaks

what does uniform mean

where the frequency for all classes is approximately equal

how do you graph a histogram on the GDC

STATS -> enter data -> GRAPH -> SET -> GRAPH 2 = HIST (F6 -> F1) -> set width and starting point -> GRAPH

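what values do you use when drawing a cumulative frequency graph

start at the lower bound of the first class (e.g. 1.40) with a cumulative frequency of 0, then plot each cumulative frequency against the upper bound of its class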

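if a question asks you to find the 80th percentile on a cumulative frequency graph, and the cumulative frequency goes up to 160, what do you do?

80/100 x 160 = 128, then read across from 128 on the cumulative frequency axis to the curve and down to the x-axis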
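
A Python sketch of reading a percentile off a cumulative frequency graph. It assumes the plotted points are joined with straight lines (a cumulative frequency polygon), so the answer can differ slightly from reading a hand-drawn smooth curve. `percentile_from_cumulative` and the table are hypothetical.

```python
def percentile_from_cumulative(boundaries, cum_freqs, p):
    """Estimate the p-th percentile from a cumulative frequency table.

    boundaries: lower bound of the first class, then each class's upper bound
    cum_freqs:  cumulative frequency at each of those points (starting at 0)
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    target = p * cum_freqs[-1] / 100  # e.g. 80 x 160 / 100 = 128
    for i in range(1, len(cum_freqs)):
        if cum_freqs[i] >= target:    # first point at or above the target
            x0, x1 = boundaries[i - 1], boundaries[i]
            c0, c1 = cum_freqs[i - 1], cum_freqs[i]
            # Read across to the line segment, then down to the x-axis.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)


bounds = [0, 10, 20, 30, 40]   # hypothetical class boundaries
cum = [0, 20, 70, 130, 160]    # cumulative frequencies, total 160
print(round(percentile_from_cumulative(bounds, cum, 80), 2))  # 29.67
print(round(percentile_from_cumulative(bounds, cum, 50), 2))  # 21.67
```

With p = 50 the target is n/2, so the same function gives the median estimate for grouped data.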

what are the 3 tests used to identify outliers

any value which is:
- greater than the upper quartile + 1.5 x interquartile range
- less than the lower quartile - 1.5 x interquartile range
- outside the mean ± 2 x standard deviation
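
The three outlier tests, sketched in Python. The summary values (Q1 = 12, Q3 = 20, mean = 16, sd = 3) are hypothetical, and `outlier_checks` is an illustrative name; it returns True for each test the value fails.

```python
def outlier_checks(value, q1, q3, mean, sd):
    """Apply the three outlier tests; True means 'outlier by that test'."""
    iqr = q3 - q1
    return {
        "above_upper_fence": value > q3 + 1.5 * iqr,
        "below_lower_fence": value < q1 - 1.5 * iqr,
        "outside_2_sd": abs(value - mean) > 2 * sd,
    }


print(outlier_checks(33, q1=12, q3=20, mean=16, sd=3))
# {'above_upper_fence': True, 'below_lower_fence': False, 'outside_2_sd': True}
print(outlier_checks(25, q1=12, q3=20, mean=16, sd=3))
# {'above_upper_fence': False, 'below_lower_fence': False, 'outside_2_sd': True}
```

The tests need not agree: with this summary, 25 is an outlier by the mean ± 2 sd rule but not by the IQR rule, so use whichever test the question specifies.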

what does the box in a box plot show

the middle 50% of the data (from the lower quartile to the upper quartile)

what do the whiskers show

the minimum and maximum values

What is bivariate data

data that tracks two characteristics of a population (x and y), called variables

what is an explanatory or independent variable

a variable set and controlled by the observer in a study

what is the response or dependent variable

a variable recorded to measure the outcome of a study

what does association mean

when the explanatory and response variables demonstrate a relationship

what does causation mean

when a change in the explanatory/independent variable actually causes the change in the response variable

what is correlation

how strongly two variables are related, and in which direction

what does it mean when points lie close to the line of best fit

the closer the points are to the line, the stronger the correlation

what measure can we use to find a numerical value to represent linear correlation

Pearson’s Product Moment Correlation Coefficient (PMCC)

what will the value of the PMCC (r) always be between

-1 ≤ r ≤ 1
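
A minimal Python sketch of the PMCC using r = Sxy / √(Sxx × Syy). The data are made up for illustration; `pmcc` is a hypothetical name (Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient).

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient r = Sxy / √(Sxx·Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.775
print(pmcc([1, 2, 3], [6, 4, 2]))              # -1.0 (perfect negative)
```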

if the value of the PMCC is closer to -1 or 1, is it stronger or weaker?

stronger

what do ‘y on x’ and ‘x on y’ mean

y on x: y = ax + b (used to estimate y from a given x)
x on y: x = ay + b (used to estimate x from a given y)
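
A Python sketch of the two least-squares regression lines, assuming the usual least-squares fit. `least_squares` is a hypothetical helper and the data are made up. Swapping the arguments plays the same role as swapping the x and y lists on the GDC.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line ys = a * xs + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    return a, y_bar - a * x_bar


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(least_squares(xs, ys))  # y on x: y = 0.6x + 2.2 (approximately)
print(least_squares(ys, xs))  # x on y: x = 1.0y - 1.0 (approximately)
```

Rearranged, the x on y line here is y = x + 1, which is not the same as y = 0.6x + 2.2; the two lines only coincide when the correlation is perfect.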

how do you switch from y on x to x on y on the GDC

SET -> swap the lists on the 3rd and 4th lines (2Var XList and 2Var YList)

what is interpolation

making an estimate of the response variable for a given explanatory variable (estimating y, given the x variable), WITHIN the range of data

what is extrapolation

making an estimate of the response variable for a given explanatory variable (estimating y, given the x variable), OUTSIDE the range of data

is interpolation or extrapolation more reliable and why

interpolation is more reliable, because extrapolation estimates values outside the given data, so there is no guarantee the linear trend will continue
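
A small Python sketch that makes a prediction from a regression line and labels it as interpolation or extrapolation. The line y = 0.6x + 2.2 and the observed x-range of 1 to 5 are hypothetical values, and `predict_y` is an illustrative name.

```python
def predict_y(a, b, x, x_min, x_max):
    """Predict y = a*x + b and say whether it is interpolation or extrapolation."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a * x + b, kind


# Hypothetical line fitted to x-values from 1 to 5
print(predict_y(0.6, 2.2, 3.5, x_min=1, x_max=5)[1])  # interpolation
print(predict_y(0.6, 2.2, 20, x_min=1, x_max=5)[1])   # extrapolation
```

The label is the useful part: a prediction at x = 20 assumes the linear trend carries on far beyond the observed data, which nothing in the data guarantees.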

Stats Flashcards

(75 cards)