Stats Flashcards

1
Q

What does qualitative data mean

A

non-numerical data e.g. hair colour

2
Q

What does quantitative data mean

A

numerical data e.g. number of children

3
Q

discrete data meaning

A

data that can be counted e.g. number of children

4
Q

continuous data meaning

A

data that can be measured e.g. height

5
Q

how do you find the class width of a set of data that uses short hand e.g.
length | 1-20 | 21-30 |…

A

put it into ‘5 < x < 9’ form
e.g. 1 - 20 -> 0.5 < x < 20.5
then subtract the lower boundary from the upper boundary: 20.5 - 0.5 = 20
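
A minimal sketch of the same steps in Python, assuming the classes hold whole numbers written as 'low-high'. The helper name class_width and the precision argument are made up for this card, not part of any library.

```python
def class_width(label, precision=1):
    """Return (lower boundary, upper boundary, width) for a class like '1-20'.

    Assumes the data were rounded to the nearest `precision` (1 for whole
    numbers), so each class limit is pushed out by half of that amount.
    """
    low, high = (float(part) for part in label.split("-"))
    half = precision / 2
    lower_boundary = low - half
    upper_boundary = high + half
    return lower_boundary, upper_boundary, upper_boundary - lower_boundary


print(class_width("1-20"))   # (0.5, 20.5, 20.0)
print(class_width("21-30"))  # (20.5, 30.5, 10.0)
```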

6
Q

what is descriptive stats?

A

stats that describe data: collecting, organising and summarising it

7
Q

what is inferential stats

A

stats where sample data is analysed so that conclusions (inferences) can be made about the population

8
Q

what is a population

A

the whole set of items that are of interest

9
Q

what is a sample

A

a selection of items taken as a subset from the population

10
Q

what is a parameter + example

A

a numerical characteristic of a population, e.g. the population mean

11
Q

what is a stat?

A

a numerical characteristic of a sample that can help to estimate a parameter

12
Q

what tool can you use to remember what a stat and a parameter are used for

A

stat begins with ‘s’ -> sample
parameter begins with ‘p’ -> population

13
Q

what is a census

A

a survey that observes or measures every item within the population

14
Q

what is an adv and a dis.adv of a census

A

adv -
representative of the whole pop

d.adv-
expensive, time-consuming, sometimes impossible (e.g. for a very large population)

15
Q

what can the size of a sample affect

A

the validity of any conclusions made. The larger and more varied the sample, the more accurate the results

16
Q

what is a sampling frame?

A

a list with all items of the population individually named

17
Q

what is a sampling unit

A

an individual unit of the population

18
Q

how do you carry out a simple random sample

A
  1. form a sampling frame
  2. allocate each item a unique number
  3. generate a random number, e.g. using a calculator, as many times as needed for your sample size (if you need a sample size of 30, generate 30 different random numbers, ignoring repeats), then select the items with those numbers
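
The steps above can be sketched in Python. This is only an illustration: the function name simple_random_sample and the made-up frame of 200 students are assumptions, and the standard random module stands in for the calculator.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size different items from the frame, each equally likely."""
    rng = random.Random(seed)
    # Step 2: give every item in the frame a unique number 1..N.
    numbered = dict(enumerate(sampling_frame, start=1))
    # Step 3: generate sample_size different random numbers in 1..N.
    chosen_numbers = rng.sample(range(1, len(numbered) + 1), sample_size)
    return [numbered[number] for number in chosen_numbers]


# Hypothetical sampling frame of 200 named students.
frame = [f"student_{i}" for i in range(1, 201)]
sample = simple_random_sample(frame, 30, seed=1)
print(len(sample), "items chosen, no repeats:", len(set(sample)) == 30)
```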
19
Q

how do you carry out systematic sampling

A
  1. form a sampling frame
  2. allocate each item a unique number
  3. using a calc, generate a random number within your population size (this is your starting unit)
  4. calculate the integer component (population size / sample size = x, ignoring any remainder)
  5. select every xth item after the starting unit until you have the sample size you need
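
A rough Python sketch of these steps. The name systematic_sample is made up, and wrapping back to the start of the list when the end is reached is an assumption of this sketch (the card does not say what to do at the end of the frame).

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Random starting unit, then every x-th item, where x = N // n."""
    rng = random.Random(seed)
    population_size = len(sampling_frame)
    interval = population_size // sample_size  # the integer component x
    start = rng.randrange(population_size)     # random starting unit (0-based index)
    # Take every x-th item; the % wraps round to the start of the frame (assumption).
    return [
        sampling_frame[(start + step * interval) % population_size]
        for step in range(sample_size)
    ]


frame = list(range(1, 101))  # hypothetical population numbered 1..100
print(systematic_sample(frame, 10, seed=3))
```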
20
Q

how do you carry out stratified sampling

A
  1. divide the population into groups, called strata (e.g. year groups, age, sex)
  2. calculate the sample size for each stratum (xi) -> xi = (sample size/population size) x stratum size
  3. make sure the sum of all the xi’s equals the sample size (you may have to round if appropriate)
  4. conduct a simple random sample within each stratum
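
As a sketch, the same method in Python. The function names and the year-group sizes are invented for illustration. Python's round() sends exact halves to the nearest even number, so with awkward sizes the rounded values may still need adjusting by hand to hit the sample size.

```python
import random


def stratum_sample_sizes(strata_sizes, sample_size):
    """xi = (sample size / population size) x stratum size, rounded."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }


def stratified_sample(strata, sample_size, seed=None):
    """Simple random sample of the right size from within each stratum."""
    rng = random.Random(seed)
    sizes = stratum_sample_sizes(
        {name: len(items) for name, items in strata.items()}, sample_size
    )
    return {name: rng.sample(items, sizes[name]) for name, items in strata.items()}


# Hypothetical school: 120, 100 and 80 students in three year groups.
sizes = stratum_sample_sizes({"Y10": 120, "Y11": 100, "Y12": 80}, 30)
print(sizes)                # {'Y10': 12, 'Y11': 10, 'Y12': 8}
print(sum(sizes.values()))  # 30
```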
21
Q

adv and dis adv of
- simple random

A

adv:
- everyone has an equal chance of being selected - removes bias
- easy to conduct

d.adv:
- time consuming

22
Q

adv and dis adv of
- systematic

A

adv:
- covers a wider study area
- less likely to introduce bias because the starting point is randomly generated

d.adv:
- need a randomly generated starting point
- need to know the total population size for it to work

23
Q

adv and dis adv of
- stratified

A

adv:
- each group is represented within the sample in proportion to its size = increased accuracy

d.adv:
- not all members of the pop may belong to a specific group

24
Q

what are the two non-random techniques

A

quota sampling
convenience/opportunity sampling

25
Q

what is quota sampling

A
  1. the population is split into groups (year, age, race)
  2. individuals who fit each group are chosen (not at random) until a set quota for that group is filled
26
Q

what is convenience sampling

A
  1. a sample is taken from people who are available for the study (e.g. the first 20 people I see)
27
Q

adv and dis adv of
quota sampling

A

adv:
- representative

d.adv:
- non random sampling may introduce bias

28
Q

adv and dis adv of
convenience/ opportunity sampling

A

adv:
- quick -> no need of a sampling frame

d.adv:
- not representative -> the first people seen may not effectively represent the whole pop

29
Q

which are affected by extreme values and which aren't: mean, median, and mode

A

mode is not
median is not
mean is
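
A quick check of this in Python, using the standard statistics module and a small made-up data set where the largest value is swapped for an extreme one:

```python
import statistics

data = [2, 3, 3, 4, 5]
with_extreme = [2, 3, 3, 4, 50]  # same list, largest value replaced by an extreme one

for label, values in (("original", data), ("with extreme value", with_extreme)):
    print(
        label,
        "| mean =", statistics.mean(values),
        "| median =", statistics.median(values),
        "| mode =", statistics.mode(values),
    )
```

Only the mean moves; the median and mode stay at 3 in both lists.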

30
Q

how do you use your GDC to find mean, median, mode, etc

A

stats -> enter data ->F2 (calc) -> F6 (SET) -> make sure ‘List1’ is the first line and ‘List2’ is the second ->EXIT -> F1 (1-VAR)

x̄ = mean
n = sample size
Med = median
Mod = Mode

31
Q

what is the formula for the integer component

A

population/ sample size
P/n

32
Q

what is the formula to find the sample size for strata

A

sample size for a stratum = (sample size / population size) x stratum size

xi = (n/P) x si

33
Q

what is the formula for the mean from a raw list of data

A

x̄ = sum of the items / number of items

34
Q

what is the formula for the mean from grouped discrete distribution

A

x̄ = sum of (each item x its frequency) / sum of the frequencies

35
Q

what is the formula for the mean from grouped continuous distribution

A

x̄ = sum of (midpoint of each class x its frequency) / sum of the frequencies
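
The three mean formulas (raw list, grouped discrete, grouped continuous) can be sketched in Python as below. The function names and the small data sets are made up for illustration.

```python
def mean_raw(values):
    """Sum of the items divided by the number of items."""
    return sum(values) / len(values)


def mean_grouped(values, frequencies):
    """Sum of (item x frequency) divided by the sum of the frequencies."""
    total = sum(x * f for x, f in zip(values, frequencies))
    return total / sum(frequencies)


def mean_grouped_continuous(classes, frequencies):
    """Same as mean_grouped, using the midpoint of each (lower, upper) class."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return mean_grouped(midpoints, frequencies)


print(mean_raw([2, 4, 6, 8]))                                             # 5.0
print(mean_grouped([0, 1, 2, 3], [4, 7, 6, 3]))                           # 1.4
print(mean_grouped_continuous([(0, 10), (10, 20), (20, 30)], [2, 5, 3]))  # 16.0
```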

36
Q

what is the formula for the median from very large or grouped distribution

A

n/2
(n = sum of the frequencies)

36
Q

what is the formula for the median of a raw list of data

A

(n+1)/2 = the position of the median item in the ordered list
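
A small Python sketch of using the (n+1)/2 position on an ordered list. The name median_raw is made up, and averaging the two middle items when the position ends in .5 is the usual convention rather than something stated on the card.

```python
def median_raw(values):
    """Find the median using the (n + 1) / 2 position in the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position of the median item
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the items either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_raw([7, 1, 3]))     # 3    (position 2)
print(median_raw([4, 1, 3, 2]))  # 2.5  (position 2.5)
```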

37
Q

what is a quartile

A

a data point that lies 1/4, 1/2, or 3/4 through the data

38
Q

define range

A

the difference between the largest and smallest values -> easily affected by extreme values

39
Q

define interquartile range

A

the difference between the upper and lower quartiles. It shows the spread of the central 50% of the data, so it is unaffected by outliers

40
Q

define standard deviation

A

a measure of how spread out the data is around the mean

41
Q

what is the formula for variance

A

variance = (standard deviation)^2

42
Q

if you add a constant (c) to every data value, how will the mean and standard deviation be affected

A

new mean = x̄ + c
no change to standard deviation

43
Q

if you multiply every data value by a constant (b), how will the mean and standard deviation be affected

A

new mean = bx̄
new standard deviation = |b|σ (just bσ when b is positive)
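
Both rules (adding a constant and multiplying by a constant) can be checked numerically. A short Python sketch with a made-up data set, using the population standard deviation (statistics.pstdev):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
c, b = 10, 3  # hypothetical constants

shifted = [x + c for x in data]  # add c to every value
scaled = [b * x for x in data]   # multiply every value by b

print("original:", statistics.mean(data), statistics.pstdev(data))
print("add c:   ", statistics.mean(shifted), statistics.pstdev(shifted))
print("times b: ", statistics.mean(scaled), statistics.pstdev(scaled))
```

Adding c moves the mean by c and leaves the spread alone; multiplying by b scales both. For a negative b the standard deviation is scaled by the size of b, since a spread cannot be negative.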

44
Q

what is relative frequency?

A

the proportion of the total frequency that lies within a class

45
Q

what is the formula for relative frequency

A

class frequency/ sample size

this will be a decimal between 0 and 1
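
For example, in Python (the class labels and frequencies are made up, and relative_frequencies is a hypothetical helper):

```python
def relative_frequencies(class_frequencies):
    """Class frequency divided by the sample size, for every class."""
    sample_size = sum(class_frequencies.values())
    return {label: freq / sample_size for label, freq in class_frequencies.items()}


frequencies = {"0-9": 5, "10-19": 10, "20-29": 5}
relative = relative_frequencies(frequencies)
print(relative)  # {'0-9': 0.25, '10-19': 0.5, '20-29': 0.25}
```

The relative frequencies of all the classes add up to 1.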

46
Q

what does a bell-shaped/ symmetrical distribution suggest

A

that data is grouped around the mean or median

47
Q

what does a left/negative distribution suggest

A

data is largely grouped to the top of the range. median is usually higher than mean

48
Q

what does a right/positive distribution suggest

A

the data is largely grouped towards the bottom of the range. median is usually lower than mean

49
Q

what does a left skewed histogram look like

A

the highest bars are found on the right hand side

50
Q

what does a right skewed histogram look like

A

the highest bars are found on the left hand side

51
Q

what does unimodal mean

A

having one peak

52
Q

what does bimodal mean

A

having two peaks

53
Q

what does uniform mean

A

where the frequency for all classes is approximately equal

54
Q

how do u graph a histogram on the gdc

A

STATS -> enter data -> GRAPH -> SET -> GRAPH 2 = HIST (F6 -> F1) -> set width and starting point -> GRAPH

55
Q

what values do u use when drawing a cumulative frequency graph

A

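plot each point at (upper bound of the class, cumulative frequency), and start the graph at the lower bound of the first class with cf = 0, i.e. if 1.40<x<1.45 is the first class then (1.40, 0) is the first coordinate and (1.45, cf for that class) is the next one. After that, every point is (upper bound, cf)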

56
Q

if a question asks you to find the 80th percentile on a cumulative frequency graph, and it goes up to 160, what do you do?

A

80/100 x 160 = 128, then read across from a cumulative frequency of 128 to the curve and down to the x-axis
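
A sketch of the calculation in Python. The cumulative frequency points are invented, and joining them with straight lines is an assumption standing in for reading a value off a drawn curve.

```python
def percentile_position(percent, total_frequency):
    """Cumulative frequency value to read across from."""
    return percent * total_frequency / 100


def read_off(cf_points, target_cf):
    """Estimate x for a cumulative frequency, joining the points with straight lines."""
    for (x0, cf0), (x1, cf1) in zip(cf_points, cf_points[1:]):
        if cf0 <= target_cf <= cf1:
            if cf1 == cf0:
                return x0
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target is outside the graph")


# Hypothetical graph points (x, cumulative frequency), total frequency 160.
points = [(0, 0), (10, 20), (20, 60), (30, 120), (40, 160)]
position = percentile_position(80, 160)
print(position)                    # 128.0
print(read_off(points, position))  # 32.0
```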

57
Q

what are the 3 equations to calculate outliers

A

any value which is:
- greater than the upper quartile + 1.5 x interquartile range
- less than the lower quartile - 1.5 x interquartile range
- a value that falls outside the mean + or - 2(standard deviation)
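
The three rules can be sketched in Python. The function names and data are made up. The quartiles come from statistics.quantiles, whose default method may give slightly different quartiles from a GDC, and the standard deviation here is the population one (pstdev), which is an assumption.

```python
import statistics


def iqr_outliers(data):
    """Values beyond Q3 + 1.5 x IQR or below Q1 - 1.5 x IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


def sd_outliers(data):
    """Values more than 2 standard deviations away from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]


data = [1, 3, 5, 7, 9, 11, 100]
print(iqr_outliers(data))  # [100]
print(sd_outliers(data))   # [100]
```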

58
Q

what does the box in a box plot show

A

the middle 50% of data

59
Q

what do the whiskers show

A

the minimum and maximum values

60
Q

What is bivariate data

A

data that tracks two characteristics of a population (x and y), called variables

61
Q

what is an explanatory or independent variable

A

a variable set and controlled by the observer in a study

62
Q

what is the response or dependent variable

A

a variable recorded to measure the outcome of a study

63
Q

what does association mean

A

when the explanatory and response variables demonstrate a relationship

64
Q

what does causation mean

A

when a change in the explanatory/independent variable is the reason for the change in the response variable

65
Q

what is correlation

A

a measure of how strongly two variables are related

66
Q

what do points closest to a best-fit line mean

A

closer the points = stronger the correlation

67
Q

what measure can we use to find a numerical value to represent linear correlation

A

Pearson’s Product-Moment Correlation Coefficient (PMCC)

68
Q

what will the value of the PMCC (r) always be between

A

-1 ≤ r ≤ 1
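
The card does not give the formula for r, so this Python sketch assumes the standard one, r = Sxy / sqrt(Sxx x Syy). The function name pmcc and the data are made up.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive line, r = 1
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative line, r = -1
print(pmcc([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # scattered points, r is about 0.8
```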

69
Q

if the value of the PMCC is closer to -1 or 1, is it stronger or weaker?

A

stronger

70
Q

what does y on x mean and x on y mean

A

y on x => y=ax+b
x on y => x=ay+b
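
A Python sketch of fitting both lines, assuming the usual least-squares formulas a = Sxy / Sxx and b = ȳ - a x̄ (not given on the card). The function name and data are made up. Swapping the two lists gives the x on y line.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line ys = a * xs + b fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 8, 10]

a, b = least_squares_line(xs, ys)  # y on x: y = ax + b
c, d = least_squares_line(ys, xs)  # x on y: x = cy + d (lists swapped)
print(f"y on x: y = {a:.2f}x + {b:.2f}")
print(f"x on y: x = {c:.2f}y + {d:.2f}")
```

The two lines are generally different, so use y on x to estimate y and x on y to estimate x.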

71
Q

how do u switch from y on x, to x on y

A

SET -> swap the 3rd and 4th list around

72
Q

what is interpolation

A

making an estimate of the response variable for a given explanatory variable ( estimating y, given the x variable), WITHIN the range of data

73
Q

what is extrapolation

A

making an estimate of the response variable for a given explanatory variable ( estimating y, given the x variable), OUTSIDE the range of data

74
Q

is interpolation or extrapolation more reliable and why

A

interpolation is more reliable, because extrapolation estimates values outside the range of the given data, where there is no guarantee the linear trend will continue
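
A small Python sketch that labels an estimate as interpolation or extrapolation. The line y = 2x + 1, the observed x values and the function name estimate are all hypothetical.

```python
def estimate(a, b, x, observed_xs):
    """Estimate y = ax + b and say whether x lies inside the observed data range."""
    inside = min(observed_xs) <= x <= max(observed_xs)
    kind = "interpolation" if inside else "extrapolation"
    return a * x + b, kind


observed_xs = [1, 2, 3, 4, 5]          # x values the line was fitted to
print(estimate(2, 1, 3, observed_xs))   # (7, 'interpolation')
print(estimate(2, 1, 10, observed_xs))  # (21, 'extrapolation'), so treat with caution
```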