AS Stats Flashcards

Question 1

Q

define population

Answer

A

the whole set of items that are of interest

Question 2

Q

define census

Answer

A

observes or measures every member of a population

Question 3

Q

what is the advantage of using the census?

Answer

A

it should give a completely accurate result

Question 4

Q

what are the disadvantages of using the census?

Answer

A

time consuming & expensive
cannot be used if the testing process destroys the item
difficult to process a large quantity of data

Question 5

Q

define sample

Answer

A

a selection of observations taken from a subset of the population, which is used to find out information about the whole population

Question 6

Q

what are the advantages of using a sample?

Answer

A

less time consuming & expensive than the census
fewer people have to respond
less data to process than a census

Question 7

Q

what are the disadvantages of using a sample?

Answer

A

data might not be as accurate
sample might not be large enough to give info about small subsets of the population

Question 8

Q

define sampling units

Answer

A

individual units of a population

Question 9

Q

define sampling frame

Answer

A

a list of individually named or numbered sampling units of a population

Question 10

Q

how does sample size affect the validity of the conclusions?

Answer

A

sample size depends on required accuracy & resources
larger sample sizes are more accurate
a varied population requires a larger sample than a uniform population
different samples produce differing results due to natural variation within populations

Question 11

Q

what are the 3 types of random sampling?

Answer

A

simple random
systematic
stratified

Question 12

Q

define simple random sampling

Answer

A

every sample of size n has an equal chance of being selected
need a sampling frame
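A minimal Python sketch of simple random sampling. The sampling frame of 50 numbered units, the sample size and the seed are all hypothetical; `random.sample` draws without replacement, so every sample of size n is equally likely.

```python
import random

# Hypothetical sampling frame: units numbered 1 to 50 (assumption for illustration).
sampling_frame = list(range(1, 51))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every sample of size n is equally likely."""
    rng = random.Random(seed)      # seed only makes the example repeatable
    return rng.sample(frame, n)    # sampling without replacement

sample = simple_random_sample(sampling_frame, 5, seed=1)
print(sample)
```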

Question 13

Q

what are advantages of simple random sampling?

Answer

A

no bias
easy & cheap for small sample
each sampling unit has a known & equal chance of selection

Question 14

Q

what are disadvantages of simple random sampling?

Answer

A

not suitable for large samples bc time consuming, disruptive & expensive
need sampling frame

Question 15

Q

define systematic (random) sampling

Answer

A

the required elements are chosen at regular intervals from an ordered list
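A rough sketch of systematic sampling in Python, assuming the interval is k = population size // sample size and that the starting point is chosen at random (the random start is an assumption, not stated on the card). The ordered list of 100 units is hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th element of an ordered list, starting at a random point."""
    k = len(ordered_list) // n        # sampling interval (assumes n <= list length)
    rng = random.Random(seed)
    start = rng.randrange(k)          # random starting index between 0 and k - 1
    return ordered_list[start::k][:n]

frame = list(range(1, 101))           # hypothetical ordered list of 100 units
chosen = systematic_sample(frame, 10, seed=0)
print(chosen)
```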

Question 16

Q

what are advantages of systematic sampling?

Answer

A

simple & quick to use
suitable for large samples/populations

Question 17

Q

what are disadvantages of systematic sampling?

Answer

A

need sampling frame
can be biased if sampling frame is not random

Question 18

Q

define stratified (random) sampling

Answer

A

population is divided into mutually exclusive strata & a random sample is taken from each
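One possible sketch of stratified sampling in Python, assuming each stratum's share of the sample is proportional to its share of the population. The two strata and their sizes are hypothetical, and rounding means the total may differ slightly from n for less tidy numbers.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Take a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)    # stratum's share of the sample
        sample[name] = rng.sample(members, size)  # random sample within the stratum
    return sample

# Hypothetical population: 60 units in one stratum and 40 in the other.
strata = {
    "year12": [f"Y12-{i}" for i in range(60)],
    "year13": [f"Y13-{i}" for i in range(40)],
}
result = stratified_sample(strata, 10, seed=2)
print({name: len(units) for name, units in result.items()})
```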

Question 19

Q

what are advantages of stratified sampling?

Answer

A

sample accurately reflects the population structure
guarantees proportional representation of groups within the population

Question 20

Q

what are disadvantages of stratified sampling?

Answer

A

population must be clearly classified into distinct strata
selection within each stratum is random so same disadvantages as random

Question 21

Q

what are the 2 types of non-random sampling?

Answer

A

quota
opportunity

Question 22

Q

define quota sampling

Answer

A

researcher selects a sample that reflects the characteristics of the whole population

Question 23

Q

what are advantages of quota sampling?

Answer

A

allows a small sample to be representative of the population
no sampling frame needed
quick, easy & cheap
easy comparison b/w different groups within population

Question 24

Q

what are disadvantages of quota sampling?

Answer

A

non-random can introduce bias
population must be divided into groups - expensive or inaccurate
increasing the scope of the study increases the # of groups, which increases time & cost
non-responses not recorded

Question 25

Q

define opportunity/convenience sampling

Answer

A

take sample from people available at the time of study & who fit the criteria

Question 26

Q

what are advantages of opportunity sampling?

Answer

A

easy
cheap

Question 27

Q

what are disadvantages of opportunity sampling?

Answer

A

unlikely to be representative of the population
dependent on individual researcher

Question 28

Q

define quantitative variables

Answer

A

variables/data associated with numerical observations

Question 29

Q

define qualitative variables

Answer

A

variables/data associated with non-numerical observations

Question 30

Q

define continuous variable

Answer

A

can take any value within a given range

Question 31

Q

define discrete variable

Answer

A

can take only specific values within a given range

Question 32

Q

define measure of location

Answer

A

single value that describes the position in a data set

Question 33

Q

define measure of central tendency

Answer

A

when the single value (of the measure of location) describes the centre of the data

Question 34

Q

what are the features of a grouped frequency table?

Answer

A

data is grouped into classes
class boundaries show max. & min. values in each class
midpoint is the average of the upper & lower class boundaries of each class
class width is the difference b/w the upper & lower class boundaries

Question 35

Q

when is it best to use mean, median or mode?

Answer

A

mean: quantitative data with no extreme values
median: quantitative data with extreme values
mode: qualitative or quantitative data with 1 or 2 modes

Question 36

Q

what is the formula for the mean & for mean of data in frequency table?

Answer

A

mean: x̄ = Σx / n

mean from frequency table: x̄ = Σxf / Σf
where n = Σf
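A short worked example of the frequency table formula in Python. The table values are hypothetical.

```python
# Hypothetical frequency table: value x -> frequency f.
table = {1: 3, 2: 5, 3: 2}

n = sum(table.values())                           # n = Σf
mean = sum(x * f for x, f in table.items()) / n   # Σxf / Σf
print(n, mean)
```

Here Σxf = 3 + 10 + 6 = 19 and Σf = 10, so the mean is 1.9.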

Question 37

Q

how do you calculate median from frequency table?

Answer

A

arrange data points in ascending order
add 1 to the # of data points then divide by 2
i.e. the median is the ((n + 1) / 2)th data point, where n = Σf (same as Σf / 2 + 0.5)
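A sketch of the median steps in Python, assuming discrete data stored as a hypothetical {value: frequency} table. When (n + 1) / 2 is not a whole number, the sketch averages the two data points either side of that position.

```python
def median_from_table(table):
    """Median of discrete data given as {value: frequency}."""
    values = []
    for x in sorted(table):                 # ascending order
        values.extend([x] * table[x])       # repeat each value f times
    n = len(values)                         # n = Σf
    position = (n + 1) / 2                  # the ((n + 1) / 2)th data point
    if position.is_integer():
        return values[int(position) - 1]    # positions count from 1
    lower = values[int(position) - 1]       # data point just below the position
    upper = values[int(position)]           # data point just above the position
    return (lower + upper) / 2

print(median_from_table({1: 3, 2: 5, 3: 2}))
```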

Question 38

Q

how do you calculate the mode from frequency table?

Answer

A

x value with the highest frequency
value that appears the most

Question 39

Q

how do you calculate mean, median & mode from grouped frequency table?

Answer

A

mean: Σ(midpoint x f) / Σf

median: linear interpolation
or find which class contains the (Σf / 2)th value

mode: class with the highest f
linear interpolation if specified

Question 40

Q

what are the other measures of location?

Answer

A

Q1 - lower quartile (25% of the data lie below it)
Q2 - median (50% of the data lie below it)
Q3 - upper quartile (75% of the data lie below it)
P10 - 10th percentile (10% of the data lie below it)

Question 41

Q

how do you calculate the location of Q1, Q2 & Q3 for discrete data?

Answer

A

Q2: (Σf + 1) / 2

Q1: 1/4 x Σf
Q3: 3/4 x Σf
if whole number, Q1/Q3 is halfway b/w this data point & one above
if not whole number, round up & Q1/Q3 is this data point
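The Q1/Q3 rule above can be sketched in Python as follows. The data set is hypothetical and is assumed to be already in ascending order.

```python
import math

def quartile(sorted_data, fraction):
    """Locate Q1 (fraction = 1/4) or Q3 (fraction = 3/4) for discrete data."""
    n = len(sorted_data)
    position = fraction * n
    if position == int(position):
        k = int(position)
        # whole number: halfway between the k-th data point and the one above
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # not a whole number: round up and take that data point
    return sorted_data[math.ceil(position) - 1]

data = [2, 4, 5, 7, 8, 10, 11, 13]    # hypothetical data, ascending order
q1 = quartile(data, 0.25)
q3 = quartile(data, 0.75)
print(q1, q3)
```

With 8 data points, 1/4 × 8 = 2 is a whole number, so Q1 is halfway between the 2nd and 3rd values (4.5); Q3 is halfway between the 6th and 7th values (10.5).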

Question 42

Q

what is the assumption made by using linear interpolation?

Answer

A

that the data is evenly distributed within each class

Question 43

Q

what is the formula for linear interpolation?

Answer

A

GLB + (PV/GF x CW)

lower bound of class + (place value/group frequency x class width)

place value - how much you have to count up to get into that class
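A minimal sketch of the interpolation formula in Python, used here to estimate a median. The grouped table is hypothetical, and the median position is taken as Σf / 2 on the assumption that the data is continuous.

```python
def interpolate(lower_bound, place_value, group_frequency, class_width):
    """lower bound of class + (place value / group frequency) x class width."""
    return lower_bound + (place_value / group_frequency) * class_width

# Hypothetical grouped table: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

def median_estimate(classes):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(f for _, _, f in classes)
    target = n / 2                               # position of the median
    cumulative = 0
    for low, high, f in classes:
        if cumulative + f >= target:
            place_value = target - cumulative    # how far to count into this class
            return interpolate(low, place_value, f, high - low)
        cumulative += f

print(median_estimate(classes))
```

Here Σf = 20, so the 10th value is needed; 4 values lie below the 10-20 class, so the place value is 6 and the estimate is 10 + (6/10 × 10) = 16.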

Question 44

Q

what are 3 ways of measuring spread of data & define them?

Answer

A

range - difference b/w largest & smallest values in the data set

interquartile range (IQR) - Q3 - Q1, the difference b/w Q3 & Q1

interpercentile range (IPR) - difference b/w the values for 2 given percentiles

Question 45

Q

what are the other ways of measuring spread, define & formulae?

Answer

A

variance - mean of the squared deviations, where each point deviates from the mean by: x - x̄
variance = Sxx / n, where Sxx = Σ(x - x̄)²
Sxx is in FB

standard deviation - square root of variance
see FB
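A small worked example of variance and standard deviation in Python, assuming Sxx = Σ(x - x̄)² and division by n as on the card. The data set is hypothetical.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data set

n = len(data)
mean = sum(data) / n
s_xx = sum((x - mean) ** 2 for x in data)   # Sxx = Σ(x - x̄)²
variance = s_xx / n                         # Sxx / n
sd = math.sqrt(variance)                    # square root of variance
print(mean, variance, sd)
```

For this data the mean is 5, Sxx is 32, the variance is 4 and the standard deviation is 2.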

Question 46

Q

what is the formula for coded data?

Answer

A

y = (x - a) / b

Question 47

Q

what is the formula for the mean of coded data?

Answer

A

ȳ = (x̄ - a) / b

Question 48

Q

what is the formula for standard deviation of coded data?

Answer

A

σy = σx / b

Question 49

Q

how does coding affect the mean & sd?

Answer

A

the code is applied directly to the mean

sd is only impacted by b
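A quick numerical check of the coding rules in Python. The raw data and the constants a and b are hypothetical; `statistics.pstdev` is the population standard deviation (divides by n).

```python
import statistics

x = [110, 120, 130, 140, 150]        # hypothetical raw data
a, b = 100, 10                       # hypothetical coding constants
y = [(xi - a) / b for xi in x]       # code: y = (x - a) / b

mean_x, sd_x = statistics.mean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.mean(y), statistics.pstdev(y)

print(mean_y, (mean_x - a) / b)      # the code is applied directly to the mean
print(sd_y, sd_x / b)                # sd is only divided by b
```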

Question 50

Q

how do you draw a box plot?

Answer

A

see notes sheet
needs scale
x = outlier

Question 51

Q

how are box plots interpreted?

Answer

A

comparison of position of median

Question 52

Q

what are the formulae for an outlier?

Answer

A

outlier < Q1 - k × IQR
outlier > Q3 + k × IQR

or: outlier < mean - 2σ, outlier > mean + 2σ
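A sketch of the quartile-based outlier test in Python. The data, the quartile values and k = 1.5 are all hypothetical; use whatever value of k the question specifies.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values below Q1 - k x IQR or above Q3 + k x IQR."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [3, 12, 14, 15, 16, 18, 19, 40]    # hypothetical data
outliers = find_outliers(data, q1=13, q3=18.5)
print(outliers)
```

Here IQR = 5.5, so the fences are 13 - 8.25 = 4.75 and 18.5 + 8.25 = 26.75, which makes 3 and 40 outliers.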

Question 53

Q

how do you compare measures of location & spread?

Answer

A

location:
1. compare the means or medians
2. e.g. so people in set A have to travel further than set B on average

spread:
1. compare the standard deviations, variance, range or IQR
2. so there is more/less variability in data set A than data set B

Question 54

Q

define outlier

Answer

A

an extreme value that lies outside of the pattern of data
it is mathematically defined

Question 55

Q

define anomalies

Answer

A

result caused by error
it is removed from the data set (= cleaning)

Question 56

Q

what are the key aspects of a cumulative frequency graph?

Answer

A

start at frequency 0
continuous & CW doesn’t need to be equal
join w smooth curve through all points
points plotted at the upper class boundary of each class

Question 57

Q

why is a CF graph better than linear interpolation when estimating quartiles & percentiles?

Answer

A

it doesn’t assume even distribution within class

Question 58

Q

what are the key aspects of a histogram?

Answer

A

area of the bar is proportional to the frequency
x: class width (may not be =), continuous variable
y: f density
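A short Python sketch of the heights of histogram bars, assuming the usual convention frequency density = frequency / class width (so that area equals frequency; the card itself only says area is proportional to frequency). The classes are hypothetical.

```python
# Hypothetical classes: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 15)]

# frequency density = frequency / class width
densities = [f / (high - low) for low, high, f in classes]

# area of each bar = class width x frequency density, which recovers the frequency
areas = [(high - low) * d for (low, high, _), d in zip(classes, densities)]
print(densities, areas)
```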

Question 59

Q

what is a frequency polygon?

Answer

A

joining the middle of the top of each bar on a histogram with equal class widths

Question 60

Q

mean & sd compared
median & IQR compared
cannot mix up bc…

Answer

A

mean & sd more affected by outliers than median & IQR
so a mixed pair (e.g. mean with IQR) is not comparable

Question 61

Q

what is the difference b/w correlation & causation?

Answer

A

correlation: pattern/trend b/w data sets

causation: one variable is directly impacted by the other variable

Question 62

Q

define bivariate data

Answer

A

data that has pairs of values for 2 variables

Question 63

Q

what relationship does correlation assume?

Answer

A

linear
always say ‘linear correlation’

Question 64

Q

what are the types of correlation?

Answer

A

+ve
-ve
strong
weak
none

Answer 65

A

least squares regression line b/w bivariate data
= straight line that minimises the sum of the squares of the distances of each point from the line
y=a+bx
gradient of line will be +ve for +ve linear correlation & -ve for -ve linear correlation
can only be used to find y from x not x from y
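A rough Python sketch of the regression line y = a + bx, assuming the standard least squares results b = Sxy / Sxx and a = ȳ - b x̄ (these formulae are not written on the card). The bivariate data is hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx               # gradient
    a = mean_y - b * mean_x       # intercept: the line passes through (x̄, ȳ)
    return a, b

xs = [1, 2, 3, 4, 5]              # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```

For this data Sxy = 6 and Sxx = 10, so b = 0.6 and a = 4 - 0.6 × 3 = 2.2; the gradient is +ve, matching the +ve linear correlation.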

Answer 66

A

r (product moment correlation coefficient) informs how close data is to linear regression line
-1≤ r ≤ 1
r = 0: no linear correlation
r closer to -1: stronger -ve linear correlation
r closer to +1: stronger +ve linear correlation
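A minimal Python sketch of calculating r, assuming the standard formula r = Sxy / √(Sxx × Syy), which is not written on the card. The data sets are hypothetical.

```python
import math

def pmcc(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

print(pmcc([1, 2, 3], [2, 4, 6]))             # points exactly on a rising line
print(pmcc([1, 2, 3], [6, 4, 2]))             # points exactly on a falling line
print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])) # scattered, +ve correlation
```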

Answer 67

A

interpolation - extracting/predicting value from inside range of data

extrapolation - predicting value from outside the range of data = do not do bc less reliable

Answer 68

A

dependent only

Answer 69

A

use regression line of x on y = map it the other way round

Answer 70

A

repeatable process that gives rise to a number of outcomes (results)

Answer 71

A

collection of one or more outcomes

Answer 72

A

set of all possible outcomes
venn diagrams
table
tree diagram

Answer 73

A

same probability of outcome

# of outcomes in the event / total # of possible outcomes

Answer 74

A

sample space e.g. venn diagram, table, tree diagram

linear interpolation - for continuous data/grouped frequency table

Answer 75

A

fill from middle outwards
assign a value to central intersection - if unknown, put x

Answer 76

A

union: A or B or both
see notes

Answer 77

A

events that cannot happen at the same time

P(A ∩ B) = 0

P(A ∪ B) = P(A) + P(B)

Answer 78

A

the outcome of one event does not affect the outcome of the others
the probability of one event is not impacted by the probability of another event
(probability of A happening is the same whether or not B happens)

P(A ∩ B) = P(A) × P(B)
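A small Python check of the independence and mutually exclusive rules, using one roll of a fair die as a hypothetical sample space. The events A, B and C are made up for illustration; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # one roll of a fair die (hypothetical)
A = {2, 4, 6}                        # even number
B = {1, 2, 3, 4}                     # at most 4
C = {1, 3, 5}                        # odd number

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# independent: P(A and B) equals P(A) x P(B)
independent = prob(A & B) == prob(A) * prob(B)

# mutually exclusive: P(A and C) is 0, so P(A or C) = P(A) + P(C)
mutually_exclusive = prob(A & C) == 0
print(independent, mutually_exclusive, prob(A | C))
```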

Answer 79

A

an object is not replaced

Answer 80

A

variable whose value depends on the outcome of a random event

Answer 81

A

X - random variable
x - random outcome

Answer 82

A

probability mass function:
P(X=x) = 1/6, x = 1,2,3,4,5,6

table

diagram

Answer 83

A

every outcome has the same probability
fixed numerical values

Answer 84

A

tells you the sum of all individual probabilities up to & including x in the calculation for P(X≤x)

Answer 85

A

X ~ B(n,p)

B - 2 possible outcomes (success & failure)
n - fixed number of trials
p - fixed probability of success
outcomes/trials are independent

Answer 86

A

P(X = r) = nCr p^r (1-p)^(n-r)
n = index
p = parameter
see notes booklet
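The formula above can be sketched in Python with `math.comb` for nCr. The values of n, p and r used in the example calls are hypothetical.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr x p^r x (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binomial_cdf(x, n, p):
    """P(X <= x): sum of the individual probabilities up to & including x."""
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

print(binomial_pmf(2, 4, 0.5))    # 4C2 x 0.5^2 x 0.5^2 = 6/16
print(binomial_cdf(4, 4, 0.5))    # all outcomes, so the total is 1
```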

Answer 87

A

use the formula given in the question & set the sum of all the probabilities equal to 1
then solve for k
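A worked example of this method in Python, assuming a hypothetical distribution P(X = x) = kx for x = 1, 2, 3, 4. Summing gives k(1 + 2 + 3 + 4) = 1, so k = 1/10.

```python
from fractions import Fraction

values = [1, 2, 3, 4]                  # hypothetical values of x

# P(X = x) = k * x, so k * (1 + 2 + 3 + 4) = 1
k = Fraction(1, sum(values))
probabilities = {x: k * x for x in values}

print(k, probabilities)
```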

Answer 88

A

condition of the distribution that is being tested

Answer 89

A

the actual result of doing the experiment

Answer 90

A

null: H0
the hypothesis that you assume to be correct

alternative: H1
tells you your assumption about the population parameter is wrong

Answer 91

A

one-tailed - one direction
H1: p>… or H1: p<….

two-tailed - 2 directions
H1: p≠…

Answer 92

A

boundary decided before experiment to decide whether the test fulfils H0 or H1

Answer 93

A

critical region is the region of probability which, if the test statistic falls inside it, would cause you to reject the null hypothesis

critical value is the first value to be inside the critical region

Answer 94

A

probability of incorrectly rejecting H0

the actual probability of critical region
P(X≤CV) or P(X≥CV)
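A sketch of finding a critical value and the actual probability of the critical region in Python, assuming a one-tailed test in the lower tail with hypothetical X ~ B(20, 0.3) under H0 and a 5% boundary.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def lower_critical_value(n, p, significance):
    """Largest c with P(X <= c) <= significance, or None if there is none."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p) <= significance:
            critical = c
        else:
            break                      # cumulative probability only increases
    return critical

n, p, alpha = 20, 0.3, 0.05            # hypothetical test set-up
cv = lower_critical_value(n, p, alpha)
actual_level = binomial_cdf(cv, n, p)  # P(X <= CV)
print(cv, round(actual_level, 4))
```

Here P(X ≤ 2) ≈ 0.0355 and P(X ≤ 3) ≈ 0.1071, so the critical region is X ≤ 2 and the actual probability of incorrectly rejecting H0 is about 3.55%, not the full 5%.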

Answer 95

A

2 parts - half at each end of the distribution

Answer 96

A

probability of test statistic
calculate critical region & compare test statistic

Answer 97

A

see notes sheet