Stats Flashcards
How does Quota sampling work
Name one advantage and one disadvantage
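Divide the population into categories, then take a set number (quota) from each category according to the size of each group in the population; who is picked within each category is not random
Adv: all categories are represented
Disadv: not random, so can lead to bias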
How does stratified sampling work
One advantage
One disadvantage
Take a random sample from each stratum, with each sample size proportional to the size of that stratum in the population
Adv : sample accurately reflects the population , selection is random
Disad : time consuming , depends on sampling frame available
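A minimal Python sketch of proportional stratified sampling. The year-group population, its sizes and the helper name `stratified_sample` are made up for illustration; in general, rounding each stratum's share can make the total differ slightly from the target sample size.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Randomly sample from each stratum in proportion to its size.

    strata: dict mapping stratum name -> list of members (hypothetical data).
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: (stratum size / population size) * sample size
        k = round(len(members) / population_size * sample_size)
        sample[name] = rng.sample(members, k)   # random selection within the stratum
    return sample

# Made-up population: 60 students in Year 12, 40 in Year 13
strata = {
    "Year 12": [f"Y12-{i}" for i in range(60)],
    "Year 13": [f"Y13-{i}" for i in range(40)],
}
sample = stratified_sample(strata, sample_size=10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

With these sizes the sample of 10 splits 6 to 4, matching the 60 : 40 split in the population.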
how does systematic sampling work
one advantage
one disadvantage
number every piece of data in the population, then use a random number generator to choose a starting point and select every nth piece of data from there
adv: random so less likely to lead to bias
disadv : need sampling frame
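A short Python sketch of systematic sampling, assuming a numbered sampling frame whose size is a multiple of the sample size. The frame of 100 items and the function name `systematic_sample` are illustrative only.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start within the first interval, then take every nth item."""
    rng = random.Random(seed)
    n = len(frame) // sample_size      # sampling interval: every nth item
    start = rng.randrange(n)           # random starting index, 0 <= start < n
    return frame[start::n][:sample_size]

frame = list(range(1, 101))            # made-up sampling frame numbered 1 to 100
sample = systematic_sample(frame, sample_size=10, seed=0)
print(sample)
```

Only the starting point is random; every later item is fixed by the interval.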
how does opportunity sampling work
one advantage
one disadvantage
pick the data as it becomes available
adv: easy and quick (cheap)
disadv : not random , can lead to bias
how does simple random sampling work
one advantage
one disadvantage
number every piece of data in the population, then use a random number generator to pick the numbers in the sample, and keep going (ignoring repeats) until you have your full sample
Adv : random and less likely to be biased, each piece of data has an equal chance of being picked
disadv : requires a sampling frame
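The procedure on this card can be sketched in Python as follows. The frame size of 50 and the function name are assumptions for illustration; in practice `random.sample` does the same job in one call.

```python
import random

def simple_random_sample(frame_size, sample_size, seed=None):
    """Generate random item numbers, ignoring repeats, until the sample is full."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, frame_size)   # each number 1..frame_size equally likely
        if number not in chosen:              # ignore repeats
            chosen.append(number)
    return chosen

picked = simple_random_sample(frame_size=50, sample_size=5, seed=3)
print(picked)
```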
if you have outliers which value of the average and which measure of the spread would be best to use
the median as the mean is distorted by extreme values
interquartile range as this is not affected by outliers - represents the middle 50%
which value of the average and measure of the spread is most accurate and why
the mean and standard deviation as both measures include every value
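A small Python check of why the median and interquartile range suit data with outliers. The two data sets are made up, and quartile conventions vary between textbooks and calculators; this sketch uses the default method of `statistics.quantiles`.

```python
import statistics

def summary(values):
    """Return the mean, median, standard deviation and interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # lower quartile, median, upper quartile
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
        "iqr": q3 - q1,
    }

typical = [12, 14, 15, 16, 17, 18, 20]
with_outlier = [12, 14, 15, 16, 17, 18, 200]   # same data, largest value replaced by an outlier

print(summary(typical))
print(summary(with_outlier))
```

The mean and standard deviation change a lot when the outlier is introduced, while the median and interquartile range stay the same.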
is the explanatory variable the x or y values
x values
is the response variable the x or y values
y values
if a line of regression y = 17.0 + 15.4x represents the relationship between the percentage (x%) of cocoa solids and the price (y pence) of different chocolates, interpret the value 15.4
for every 1% more cocoa solids that the chocolate contains, the price increases by 15.4 pence on average
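The interpretation of the gradient can be checked with a short Python sketch, taking the intercept as 17.0 and the gradient as 15.4 (the value the card asks about). The function name and the cocoa percentages are illustrative.

```python
def predicted_price(cocoa_percent, intercept=17.0, gradient=15.4):
    """Predicted price in pence from the regression line y = intercept + gradient * x."""
    return intercept + gradient * cocoa_percent

# Increasing the cocoa percentage by 1 changes the prediction by the gradient
increase = predicted_price(41) - predicted_price(40)
print(round(increase, 1))
```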
if the relationship between the variables is p on n, is the linear regression line
p = an +b
or
n = ap + b
p = an + b
where data is coded what is the mean affected by
addition / subtraction
multiplication / division
continuous data ….
can take any value within a range, e.g. height
where data is coded what is the standard deviation affected by
multiplication / division
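The two coding cards (mean and standard deviation) can be checked numerically. This Python sketch assumes the coding y = (x - 100) / 5 and a made-up data set.

```python
import math
import statistics

data = [105, 110, 120, 125, 140]           # made-up raw values
coded = [(x - 100) / 5 for x in data]      # coding: y = (x - 100) / 5

mean_x, sd_x = statistics.mean(data), statistics.pstdev(data)
mean_y, sd_y = statistics.mean(coded), statistics.pstdev(coded)

# The mean is affected by both the subtraction and the division
print(mean_y, (mean_x - 100) / 5)
# The standard deviation is affected only by the division
print(sd_y, sd_x / 5)
```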
discrete data is ….
data that can only take specific values, e.g. shoe size
in the large data set, which UK locations are on the coast (windy)
north to south
Leuchars, Hurn, Camborne
in the large data set which worldwide locations are on the coast
Jacksonville and Perth
in the large data set, what is the daily maximum gust measured in, and give a definition of what this means
Knots
1 knot = 1.15 mph
in the large data set, what are the only 3 categories of data where the data is continuous
daily mean rainfall
daily hours of sunshine
daily max temperature
helpful histogram formula
area = k × frequency
what is the definition in words of the standard deviation
a measure of how far, on average, the values lie from the mean (the square root of the mean of the squared deviations)
for a hypothesis test that uses binomial distribution what are the null and alternative hypothesis if you are testing a two tailed test
H0 : p = p0 (the value being tested)
H1 : p ≠ p0
for a hypothesis test that is testing positive correlation what are the null and alternative hypotheses
H0 : ρ = 0
H1 : ρ > 0
for a hypothesis test that is testing to see if the mean has decreased, what are the null and alternative hypotheses
H0 : μ = μ0 (the previous mean)
H1 : μ < μ0
if you are using the normal approximation to a binomial distribution, X ~ B(10, 0.2), and need to find P(X > 3), what is the probability for the normal distribution
use a continuity correction: P(X > 3) includes the values 4, 5, 6, …, 10
P(Y > 3.5)
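A Python sketch comparing the exact binomial probability with the continuity-corrected normal approximation, using the card's X ~ B(10, 0.2). It assumes the approximating distribution is Y ~ N(np, np(1 - p)); n = 10 is small for a normal approximation, so the numbers only illustrate how the correction works.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.2   # X ~ B(10, 0.2), as on the card

# Exact binomial: P(X > 3) = P(X = 4) + P(X = 5) + ... + P(X = 10)
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(4, n + 1))

# Approximating normal Y ~ N(np, np(1 - p)); NormalDist takes the SD, not the variance
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Continuity correction: the integers 4, 5, ..., 10 correspond to Y > 3.5
approx = 1 - Y.cdf(3.5)

print(f"exact  P(X > 3)   = {exact:.4f}")
print(f"approx P(Y > 3.5) = {approx:.4f}")
```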
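hypothesis test for a mean using the normal distribution (ND): if the ND is K ~ N(120, 5) and a sample of 25 values is taken, what ND do you use for the test and what is the new value for the standard deviation
mean stays the same (120)
new SD = original SD / √25 (the variance is divided by the sample size, 25)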
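The scaling of the standard deviation for a sample mean can be checked with a short Python sketch. It assumes the 5 on the card is the population standard deviation (if it is the variance, take its square root first); the function name is illustrative.

```python
import math

def sample_mean_sd(population_sd, sample_size):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)

# Assumption: population SD = 5, sample of 25 values
sd_of_mean = sample_mean_sd(5, 25)
print(sd_of_mean)
```

The mean of the sample-mean distribution is unchanged at 120; only the spread shrinks.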
ACTUAL significance level of a hypothesis test , what are they asking for
the actual probability of the test statistic falling in the critical region, e.g. if the significance level is 0.05 but the probability of the critical region is 0.0223, then 0.0223 is the actual significance level
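A Python sketch of finding an actual significance level for a one-tailed (lower tail) binomial test. The test itself is hypothetical: X ~ B(20, 0.3) under H0 with a 5% significance level; the function names are illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def lower_tail_critical_region(n, p, significance):
    """Largest c with P(X <= c) <= significance, and that probability.

    Returns (None, 0.0) if even P(X = 0) exceeds the significance level.
    """
    c, actual = None, 0.0
    for k in range(n + 1):
        prob = binomial_cdf(k, n, p)
        if prob > significance:
            break
        c, actual = k, prob
    return c, actual

c, actual = lower_tail_critical_region(n=20, p=0.3, significance=0.05)
print(f"critical region: X <= {c}, actual significance level = {actual:.4f}")
```

The actual significance level is below 5% because the binomial distribution is discrete, so the critical region cannot hit 0.05 exactly.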
when doing a two tailed hypothesis test what do you need to Ensure you always do first
halve the significance level, so each tail gets half
what type of question means you have to use the standard normal distribution Z ~ N(0, 1^2)
when the mean or the SD (or both) is missing
when trying to find the mean or SD using Z-N(0,1^2), what is the key formula
z = (x - μ) / σ, where μ is the mean and σ is the SD
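A worked Python sketch of using the standardisation formula to find a missing mean. The question is hypothetical: X is normal with SD 4 and P(X < 50) = 0.9, and the mean is unknown.

```python
from statistics import NormalDist

# Hypothetical question: X ~ N(mu, 4^2) and P(X < 50) = 0.9. Find mu.
sigma = 4
x = 50

# z-value with P(Z < z) = 0.9 for the standard normal Z ~ N(0, 1^2)
z = NormalDist().inv_cdf(0.9)

# Rearranging z = (x - mu) / sigma gives mu = x - z * sigma
mu = x - z * sigma
print(round(z, 4), round(mu, 2))
```

The same rearrangement gives the SD when the mean is known, and two such equations solved simultaneously give both.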
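what is the general formula for finding probabilities with the binomial distribution using sigma (Σ) notation
sum the binomial probabilities from the lower value to the upper value, using the binomial coefficient nCr:
P(lower ≤ X ≤ upper) = Σ nCr × p^r × (1 − p)^(n − r), summed over r = lower to upper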
if you are using the linear regression line to make a prediction, what are the two things you need to watch for
is the value within the range of the data (interpolation) or outside it (extrapolation)? extrapolated predictions are not reliable
are you using the independent variable (x) to predict the dependent variable (y)? the regression line of y on x should only be used this way round
with vectors what is the formula that will give you the position vector of the final position, where the object does not start at the origin
r1 = r0 + s, where r0 is the starting position vector and s is the displacement
if a string is inextensible what are you able to assume
the objects it connects have accelerations of equal magnitude
the object is moving in the direction of 5i - 2j
velocity = k(5i - 2j), where k is a positive constant
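A Python sketch of finding the multiplier for a velocity in a given direction. It assumes the speed is known; the speed used here (2√29) and the function name are made up for illustration.

```python
import math

def velocity_from_direction(direction, speed):
    """Scale a direction vector (i, j components) so its magnitude equals the speed."""
    dx, dy = direction
    k = speed / math.hypot(dx, dy)   # speed = k * |direction|, with k > 0
    return (k * dx, k * dy)

# Direction 5i - 2j has magnitude sqrt(29); a speed of 2 * sqrt(29) gives k = 2
speed = 2 * math.hypot(5, -2)
velocity = velocity_from_direction((5, -2), speed)
print(velocity)
```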
if the string is LIGHT what are you able to assume
tension is the same throughout the string
if an object is positioned south east what does this mean
r = k(i - j), where k is a positive constant (i pointing east, j pointing north)
other than air resistance, what may affect an object moving through the air under gravity
spin of the object, dimensions of the object, wind effects