AS Stats Flashcards

1
Q

define population

A

the whole set of items that are of interest

2
Q

define census

A

observes or measures every member of a population

3
Q

what is the advantage of using the census?

A

it should give a completely accurate result

4
Q

what are the disadvantages of using the census?

A

time consuming & expensive
cannot be used if the testing process destroys the item
difficult to process a large quantity of data

5
Q

define sample

A

a selection of observations taken from a subset of the population, which is used to find out information about the whole population

6
Q

what are the advantages of using a sample?

A

less time consuming & expensive than the census
fewer people have to respond
less data to process than a census

7
Q

what are the disadvantages of using a sample?

A

data might not be as accurate
sample might not be large enough to give info about small subsets of the population

8
Q

define sampling units

A

individual units of a population

9
Q

define sampling frame

A

a list of individually named or numbered sampling units of a population

10
Q

how does sample size affect the validity of the conclusions?

A

sample size depends on required accuracy & resources
larger samples generally give more accurate results
a varied population requires a larger sample than a uniform population
different samples produce differing results due to natural variation within populations

11
Q

what are the 3 types of random sampling?

A

simple random
systematic
stratified

12
Q

define simple random sampling

A

every sample of size n has an equal chance of being selected
need a sampling frame
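
A minimal Python sketch of simple random sampling, assuming the sampling frame is just a numbered list. The function name, the frame of 100 units and the seed are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct sampling units; every sample of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: units numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```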

13
Q

what are advantages of simple random sampling?

A

no bias
easy & cheap for small sample
each sampling unit has a known & equal chance of selection

14
Q

what are disadvantages of simple random sampling?

A

not suitable for large samples bc time consuming, disruptive & expensive
need sampling frame

15
Q

define systematic (random) sampling?

A

the required elements are chosen at regular intervals from an ordered list
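
A rough Python sketch of systematic sampling, assuming the interval is population size divided by sample size (rounded down) and the starting point is picked at random within the first interval. The function name and the list of 100 units are illustrative only.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th element from an ordered list, starting at a random point."""
    k = len(ordered_list) // n      # interval between chosen elements
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start within the first interval
    return [ordered_list[start + i * k] for i in range(n)]

# Hypothetical ordered list: units numbered 1 to 100, sample of 10 -> every 10th unit
ordered = list(range(1, 101))
sample = systematic_sample(ordered, 10, seed=3)
print(sample)
```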

16
Q

what are advantages of systematic sampling?

A

simple & quick to use
suitable for large samples/populations

17
Q

what are disadvantages of systematic sampling?

A

need sampling frame
can be biased if sampling frame is not random

18
Q

define stratified (random) sampling

A

population is divided into mutually exclusive strata & a random sample is taken from each
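
A possible Python sketch of stratified sampling, assuming the number taken from each stratum is (stratum size / population size) x sample size, rounded to the nearest whole number. The strata names and sizes are invented, and rounding can make the total differ slightly from n for awkward sizes.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Take a simple random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        number_needed = round(len(units) / total * n)
        sample[name] = rng.sample(units, number_needed)
    return sample

# Hypothetical population: 60 students in year 12, 40 in year 13
strata = {
    "year 12": list(range(1, 61)),
    "year 13": list(range(61, 101)),
}
sample = stratified_sample(strata, 10, seed=2)
print({name: len(units) for name, units in sample.items()})
```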

19
Q

what are advantages of stratified sampling?

A

sample accurately reflects the population structure
guarantees proportional representation of groups within the population

20
Q

what are disadvantages of stratified sampling?

A

population must be clearly classified into distinct strata
selection within each stratum is random so same disadvantages as random

21
Q

what are the 2 types of non-random sampling?

A

quota
opportunity

22
Q

define quota sampling

A

researcher selects a sample that reflects the characteristics of the whole population

23
Q

what are advantages of quota sampling?

A

allows a small sample to be representative of the population
no sampling frame needed
quick, easy & cheap
easy comparison b/w different groups within population

24
Q

what are disadvantages of quota sampling?

A

non-random can introduce bias
population must be divided into groups - expensive or inaccurate
increasing the scope of the study increases # of groups, which increases time & cost
non-responses not recorded

25
Q

define opportunity/convenience sampling

A

take sample from people available at the time of study & who fit the criteria

26
Q

what are advantages of opportunity sampling?

A

easy
cheap

27
Q

what are disadvantages of opportunity sampling?

A

unlikely to be representative of the population
dependent on individual researcher

28
Q

define quantitative variables

A

variables/data associated with numerical observations

29
Q

define qualitative variables

A

variables/data associated with non-numerical observations

30
Q

define continuous variable

A

can take any value within a given range

31
Q

define discrete variable

A

can take only specific values within a given range

32
Q

grouped frequency table

A

data is grouped into classes
class boundaries show max. & min. values in each class
midpoint is the average of the upper & lower class boundaries
class width is the difference b/w the upper & lower class boundaries

33
Q

when is it best to use mean, median or mode?

A

mean: quantitative data with no extreme values
median: quantitative data with extreme values
mode: qualitative or quantitative data with 1 or 2 modes

34
Q

what is the formula for the mean & for mean of data in frequency table?

A

mean: x̄ = Σx / n

mean from frequency table: x̄ = Σxf / Σf, where n = Σf
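
A short Python sketch of the frequency-table mean, Σxf / Σf. The table of (x, f) pairs is made-up data.

```python
def mean_from_frequency_table(table):
    """table is a list of (x, f) pairs; returns Σxf / Σf."""
    total_f = sum(f for x, f in table)
    total_xf = sum(x * f for x, f in table)
    return total_xf / total_f

# Hypothetical table: x = 0 occurs 3 times, x = 1 occurs 5 times, x = 2 occurs twice
table = [(0, 3), (1, 5), (2, 2)]
mean = mean_from_frequency_table(table)  # Σxf = 9, Σf = 10
print(mean)
```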

35
Q

how do you calculate median from frequency table?

A

arrange data points in ascending order
add 1 to the # of data points then divide by 2
median is the (n + 1) / 2 th value, where n = Σf
use the cumulative frequency to find which x value this is
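
A small Python sketch of finding the median from an ungrouped frequency table, assuming the median is the (n + 1) / 2 th value and that a position ending in .5 means halfway between the two neighbouring values. The table is invented for illustration.

```python
import math

def median_from_frequency_table(table):
    """table is a list of (x, f) pairs; returns the (n + 1) / 2 th value."""
    values = []
    for x, f in sorted(table):          # ascending order of x
        values.extend([x] * f)          # write each x out f times
    n = len(values)                     # n = Σf
    position = (n + 1) / 2
    lower = values[math.floor(position) - 1]
    upper = values[math.ceil(position) - 1]
    return (lower + upper) / 2          # same value twice if position is whole

# Hypothetical table: values 1, 1, 2, 3, 3, 3 -> 3.5th value
table = [(1, 2), (2, 1), (3, 3)]
median = median_from_frequency_table(table)
print(median)
```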

36
Q

how do you calculate the mode from frequency table?

A

x value with the highest frequency
value that appears the most

37
Q

how do you calculate mean, median & mode from grouped frequency table?

A

mean: Σ(midpoint x f) / Σf

median: linear interpolation
or Σf / 2 is the number of the value & see what class it is in

mode: class with the highest f
linear interpolation if specified

38
Q

what are the other measures of location?

A

Q1 - lower quartile (25% of data lies below it)
Q2 - median (50% of data lies below it)
Q3 - upper quartile (75% of data lies below it)
P10 - 10th percentile (10% of data lies below it)

39
Q

how do you calculate the location of Q1, Q2 & Q3 for discrete data?

A

Q2: (Σf + 1) / 2

Q1: 1/4 x Σf
Q3: 3/4 x Σf
if whole number, Q1/Q3 is halfway b/w this data point & one above
if not whole number, round up & Q1/Q3 is this data point
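
A Python sketch of the whole number / round up rule for Q1 & Q3, assuming the data is already sorted. The function name and both data sets are illustrative; q = 1 gives Q1 and q = 3 gives Q3.

```python
def quartile(sorted_values, q):
    """Return Q1 (q=1) or Q3 (q=3) for discrete data using the q/4 x n rule."""
    n = len(sorted_values)
    whole, remainder = divmod(q * n, 4)   # position = q/4 x n
    if remainder == 0:
        # whole number: halfway between this data point and the one above
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # not a whole number: round up and take that data point
    return sorted_values[whole]

# Hypothetical data sets with n = 8 and n = 9
data_8 = [2, 4, 5, 7, 8, 10, 11, 13]
data_9 = [1, 3, 4, 6, 7, 9, 10, 12, 15]
q1_8, q3_8 = quartile(data_8, 1), quartile(data_8, 3)
q1_9, q3_9 = quartile(data_9, 1), quartile(data_9, 3)
print(q1_8, q3_8, q1_9, q3_9)
```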

40
Q

what is the assumption made by using linear interpolation?

A

that the data is evenly distributed within each class

41
Q

what is the formula for linear interpolation?

A

GLB + (PV/GF x CW)

lower bound of class + (place value/group frequency x class width)

place value - how much you have to count up to get into that class
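
A Python sketch of GLB + (PV/GF x CW), assuming each class is given as (lower boundary, upper boundary, frequency) and the position wanted (e.g. Σf / 2 for the median) is passed in. The classes are made-up data.

```python
def interpolate(classes, position):
    """Estimate the value at a given position in a grouped frequency table."""
    cumulative = 0
    for lower, upper, frequency in classes:
        if cumulative + frequency >= position:
            place_value = position - cumulative   # how far to count into this class
            class_width = upper - lower
            return lower + place_value / frequency * class_width
        cumulative += frequency
    raise ValueError("position is beyond the total frequency")

# Hypothetical grouped table: Σf = 20, so the median is the 10th value
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
median_estimate = interpolate(classes, 10)   # 10 + (5/10 x 10)
print(median_estimate)
```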

42
Q

what are 3 ways of measuring spread of data & define them?

A

range - difference b/w largest & smallest values in the data set

interquartile range (IQR) - Q3 - Q1, the difference b/w Q3 & Q1

interpercentile range (IPR) - difference b/w the values for 2 given percentiles

43
Q

what are the other ways of measuring spread, define & formulae?

A

variance - mean of the squared deviations from the mean (each point deviates from the mean by x - x̄)
variance = Σ(x - x̄)² / n = Sxx / n
Sxx is in FB

standard deviation - square root of variance
see FB
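
A Python sketch of variance & standard deviation, assuming Sxx = Σx² - (Σx)² / n and variance = Sxx / n. The data set is invented.

```python
import math

def variance(values):
    """Variance = Sxx / n, with Sxx = Σx² - (Σx)² / n."""
    n = len(values)
    sxx = sum(x ** 2 for x in values) - sum(values) ** 2 / n
    return sxx / n

# Hypothetical data: Σx = 40, Σx² = 232, so Sxx = 232 - 1600/8 = 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = math.sqrt(var)
print(var, sd)
```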

44
Q

what is the formula for coded data?

A

y = (x - a) / b

45
Q

what is the formula for the mean of coded data?

A

ȳ = (x̄ - a) / b

46
Q

what is the formula for standard deviation of coded data?

A

σy = σx / b

47
Q

how does coding affect the mean & sd?

A

the code is applied directly to the mean

sd is only impacted by b
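
A quick Python check of the coding rules, assuming the code y = (x - a) / b with a = 100 and b = 10 chosen only for illustration. pstdev is the population standard deviation (divides by n).

```python
import statistics

a, b = 100, 10                       # hypothetical coding constants
x = [110, 120, 130, 140, 150]        # made-up raw data
y = [(value - a) / b for value in x] # coded data: 1, 2, 3, 4, 5

mean_x, sd_x = statistics.mean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.mean(y), statistics.pstdev(y)

# the whole code is applied to the mean; only b affects the sd
print(mean_y, (mean_x - a) / b)
print(sd_y, sd_x / b)
```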

48
Q

how do you draw a box plot?

A

see notes sheet
needs scale
x = outlier

49
Q

how are box plots interpreted?

A

comparison of position of median

50
Q

what are the formulae for an outlier?

A

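outlier < Q1 - k x IQR
outlier > Q3 + k x IQR
(k is given in the question, often 1.5)

or: outside mean ± 2σ, i.e. more than 2 standard deviations from the mean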
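
A Python sketch of the quartile rule for outliers, assuming k = 1.5 (the value of k is normally given in the question). The data and the quartiles passed in are made up.

```python
def find_outliers(values, q1, q3, k=1.5):
    """Return values below Q1 - k x IQR or above Q3 + k x IQR."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]

# Hypothetical data with Q1 = 10 and Q3 = 16, so the limits are 1 and 25
values = [-2, 10, 12, 13, 15, 16, 40]
outliers = find_outliers(values, q1=10, q3=16)
print(outliers)
```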

51
Q

how do you compare measures of location & spread?

A

location:
1. compare the means or medians
2. e.g. so people in set A have to travel further than set B on average

spread:
1. compare the standard deviations, variance, range or IQR
2. so there is more/less variability in data set A than data set B

52
Q

define outlier

A

an extreme value that lies outside of the pattern of data
it is mathematically defined

53
Q

define anomalies

A

result caused by error
it is removed from the data set (= cleaning)

54
Q

what are the key aspects of a cumulative frequency graph?

A

start at frequency 0
continuous & CW doesn’t need to be equal
join w smooth curve through all points
points plotted at the upper class boundary of each class

55
Q

why is a CF graph better than linear interpolation when estimating quartiles & percentiles?

A

it doesn’t assume even distribution within class

56
Q

what are the key aspects of a histogram?

A

area of the bar is proportional to the frequency
x: class width (may not be =), continuous variable
y: f density
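
A Python sketch of frequency density, assuming frequency density = frequency / class width (so area equals frequency exactly, i.e. the constant of proportionality is 1). The classes are invented.

```python
def frequency_densities(classes):
    """classes is a list of (lower boundary, upper boundary, frequency)."""
    return [frequency / (upper - lower) for lower, upper, frequency in classes]

# Hypothetical classes with unequal widths 10, 20 and 5
classes = [(0, 10, 5), (10, 30, 10), (30, 35, 4)]
densities = frequency_densities(classes)
print(densities)

# area of each bar = class width x frequency density = frequency
areas = [(upper - lower) * d for (lower, upper, _), d in zip(classes, densities)]
print(areas)
```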

57
Q

what is a frequency polygon?

A

joining the middle of the top of each bar on a histogram with equal class widths

58
Q

mean & sd compared
median & IQR compared
cannot mix up bc…

A

mean & sd more affected by outliers than median & IQR
mixed up are not comparable

59
Q

what is the difference b/w correlation & causation?

A

correlation: pattern/trend b/w data sets

causation: one variable is directly impacted by the other variable

60
Q

define bivariate data

A

data that has pairs of values for 2 variables

61
Q

what relationship does correlation assume?

A

linear
always say ‘linear correlation’

62
Q

what are the types of correlation?

A

+ve
-ve
strong
weak
none

63
Q

what type of line of best fit is useful in stats?

A

least squares regression line
= straight line that minimises the sum of the squares of the vertical distances of each point from the line
y=a+bx
gradient of line will be +ve for +ve linear correlation & -ve for -ve linear correlation
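
A Python sketch of the least squares regression line y = a + bx, assuming the standard results b = Sxy / Sxx and a = ȳ - b x̄. The function name and the bivariate data are illustrative.

```python
def regression_line(x_values, y_values):
    """Return (a, b) for the least squares regression line y = a + bx."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
x_values = [1, 2, 3, 4, 5]
y_values = [3, 5, 7, 9, 11]
a, b = regression_line(x_values, y_values)
print(a, b)
```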

64
Q

how can you interpret correlation of the data?

A

r (product moment correlation coefficient, PMCC) measures how close the data is to a straight line
-1≤ r ≤ 1
r = 0: no linear correlation
r closer to -1: stronger -ve linear correlation
r closer to +1: stronger +ve linear correlation
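
A Python sketch of calculating r, assuming the standard formula r = Sxy / √(Sxx x Syy). In an exam r usually comes from the calculator; the data here is made up.

```python
import math

def correlation_coefficient(x_values, y_values):
    """Return r = Sxy / sqrt(Sxx x Syy) for bivariate data."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Sxy = 6, Sxx = 10, Syy = 6
r = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_perfect = correlation_coefficient([1, 2, 3], [10, 8, 6])  # exact -ve straight line
print(r, r_perfect)
```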

65
Q

interpolation vs extrapolation of linear regression line

A

interpolation - extracting/predicting value from inside range of data

extrapolation - predicting value from outside the range of data = do not do bc less reliable

66
Q

what variable can you predict using linear regression line?

A

dependent only

67
Q

how would you predict IV from linear regression line?

A

use regression line of x on y = map it the other way round

68
Q

define experiment

A

repeatable process that gives rise to a number of outcomes (results)

69
Q

define event

A

collection of one or more outcomes

70
Q

define sample space

A

set of all possible outcomes
venn diagrams
table
tree diagram

71
Q

define equally likely

A

outcomes that all have the same probability

P(event) = # outcomes in event / total # possible outcomes

72
Q

what 2 ways can probability be calculated?

A

sample space e.g. venn diagram, table, tree diagram

linear interpolation - for continuous data/grouped frequency table

73
Q

rules for venn diagrams

A

fill from middle outwards
assign a value to central intersection - if unknown, put x

74
Q

shade intersection, union & complement on venn diagram
what are the notations?

A

see notes

75
Q

define mutually exclusive & what is the formula?

A

2 events that cannot happen at the same time

P(A ∩ B) = 0

P(A ∪ B) = P(A) + P(B)

76
Q

define independent & what is the formula?

A

whether or not one event happens does not affect the probability of the other event
(probability of A happening is the same whether or not B happens)

P(A ∩ B) = P(A) x P(B)
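
A Python sketch that checks the multiplication rule for independence and the addition rule for mutually exclusive events, assuming a fair six-sided die with equally likely outcomes. The events A, B and C are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # fair die, equally likely outcomes

def prob(event):
    """P(event) = # outcomes in event / total # possible outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # even number
B = {1, 2}      # less than 3
C = {1, 3}      # cannot happen with A

independent = prob(A & B) == prob(A) * prob(B)    # 1/6 == 1/2 x 1/3
mutually_exclusive = prob(A & C) == 0
addition_rule_holds = prob(A | C) == prob(A) + prob(C)
print(independent, mutually_exclusive, addition_rule_holds)
```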

77
Q

what is the assumption for tree diagrams?

A

an object is not replaced

78
Q

define random variable

A

variable whose value depends on the outcome of a random event

79
Q

notation for random variable & outcome

A

X - random variable
x - a particular value (outcome) that X can take

80
Q

sum of the probabilities of all outcomes of an experiment

A

1

81
Q

what are the types of probability distribution?

A

probability mass function:
P(X=x) = 1/6, x = 1,2,3,4,5,6

table

diagram

82
Q

what is a uniform discrete probability distribution?

A

every outcome has the same probability
fixed numerical values

83
Q

what is a cumulative probability function?

A

tells you the sum of all individual probabilities up to & including x in the calculation for P(X≤x)

84
Q

binomial distribution

A

X ~ B(n,p)

B - 2 possible outcomes (success & failure)
n - fixed number of trials
p - fixed probability of success in each trial
trials are independent

85
Q

what is the probability mass function of random variable X, which has binomial distribution

A

P(X = r) = nCr p^r (1-p)^(n-r)
see notes booklet
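
A Python sketch of the binomial probability mass function, using math.comb for nCr. The distribution B(5, 0.5) is just an example.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) = nCr x p^r x (1 - p)^(n - r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Hypothetical example: X ~ B(5, 0.5), P(X = 2) = 10 x 0.25 x 0.125
probability = binomial_pmf(5, 0.5, 2)
total = sum(binomial_pmf(5, 0.5, r) for r in range(6))   # all outcomes sum to 1
print(probability, total)
```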

86
Q

how do you find constant k in random variable probability Qs?

A

use the formula they give & sum all the probabilities to 1
then solve for k
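
A Python sketch of finding k, assuming a made-up distribution P(X = x) = kx for x = 1, 2, 3, 4. Summing gives k + 2k + 3k + 4k = 10k = 1, so k = 1/10.

```python
from fractions import Fraction

# Hypothetical distribution: P(X = x) = kx for x = 1, 2, 3, 4
values = [1, 2, 3, 4]

# the probabilities are k x (1 + 2 + 3 + 4), and they must sum to 1
coefficient_sum = sum(Fraction(x) for x in values)
k = 1 / coefficient_sum

probabilities = {x: k * x for x in values}
print(k, probabilities)
```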

87
Q

define population parameter

A

the quantity describing the population (e.g. p in B(n,p)) that is being tested

88
Q

define test statistic

A

the actual result of doing the experiment

89
Q

define null & alternative hypothesis

A

null: H0
the hypothesis that you assume to be correct

alternative: H1
tells you your assumption about the population parameter is wrong

90
Q

one-tailed vs two-tailed tests

A

one-tailed - one direction
H1: p>… or H1: p<….

two-tailed - 2 directions
H1: p≠…

91
Q

define significance level

A

probability threshold decided before the experiment: H0 is rejected if the probability of a result at least as extreme as the test statistic (assuming H0 is true) is below it

92
Q

what is the critical region & what is the critical value:

A

critical region is the region of the probability distribution which, if the test statistic falls inside it, would cause you to reject the null hypothesis

critical value is the first value to be inside the critical region

93
Q

define actual significance level

A

probability of incorrectly rejecting H0

the actual probability of critical region
P(X≤CV) or P(X≥CV)
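
A Python sketch of finding the critical value & actual significance level for a one-tailed (lower tail) binomial test. The test itself is a made-up example: H0: p = 0.5, H1: p < 0.5, n = 10, 5% significance level.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def lower_tail_critical_region(n, p, significance_level):
    """Largest CV with P(X <= CV) <= significance level, and that probability."""
    critical_value, actual_level = None, 0.0
    cumulative = 0.0
    for r in range(n + 1):
        cumulative += binomial_pmf(n, p, r)
        if cumulative > significance_level:
            break                       # r is no longer inside the critical region
        critical_value, actual_level = r, cumulative
    return critical_value, actual_level

# P(X <= 1) = 11/1024 is below 0.05 but P(X <= 2) = 56/1024 is not
critical_value, actual_level = lower_tail_critical_region(10, 0.5, 0.05)
print(critical_value, actual_level)
```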

94
Q

what is the critical region for a two-tailed test?

A

2 parts - one at each end of the distribution, each with half the significance level

95
Q

what are the 2 methods for conducting a hypothesis test?

A

1. probability of test statistic
2. calculate critical region & compare test statistic

96
Q

structure

A

see notes sheet