Statistics Year 1 Flashcards
population
the whole set of items that are of interest
census
+ It should give a completely accurate result
- Time consuming and expensive
- Cannot be used when the testing process destroys the item
- Hard to process a large quantity of data
sample
+ Less time consuming and expensive than a census
+ Fewer people have to respond
+ Less data to process than in a census
- The data may not be as accurate
- The sample may not be large enough to give information about small subgroups of the population
population
census
sampling frame
sampling units
average heights
population : everyone who has ever walked into the school
census : must find everyone who has walked into the school - impossible!
sampling frame : practical list from which you can pick people to survey, e.g. a list of all students/teachers; a narrowed-down focus, the best we can get of the population
sampling units : the individual people on the list (each student or teacher)
bias
sample doesn’t represent population fairly
random sampling
all sampling units have an equal chance of being picked
+ Free of bias
+ Easy and cheap to implement for small populations and small samples
+ Each sampling unit has a known and equal chance of selection
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
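A minimal sketch of simple random sampling, assuming the sampling frame is a list of ID numbers. The function name, the frame and the seed are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct sampling units, each equally likely."""
    # The seed is only there so the example can be repeated.
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: student ID numbers 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```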
systematic sampling
the required elements are chosen at regular intervals from an ordered list
+ Simple and quick to use
+ Suitable for large samples and large populations
- People who are picked may not want to take part in the survey
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
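A rough sketch of systematic sampling, assuming the interval is population size ÷ sample size (rounded down) and the starting point is picked at random from the first interval. The names and numbers are illustrative only.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th element of an ordered list, from a random start."""
    rng = random.Random(seed)
    k = len(ordered_list) // sample_size   # the regular interval
    start = rng.randrange(k)               # random start within the first k items
    return [ordered_list[start + i * k] for i in range(sample_size)]

# Hypothetical ordered sampling frame of 100 ID numbers, sample of 20
frame = list(range(1, 101))
sample = systematic_sample(frame, 20, seed=1)
print(sample)
```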
stratified sampling
the population is divided into mutually exclusive strata (males and females, for example) and a random sample is taken from each.
+ Sample accurately reflects the population structure
+ Guarantees proportional representation of groups within a population
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
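A small sketch of how many to sample from each stratum, assuming proportional allocation: number in stratum ÷ number in population × overall sample size. The strata sizes are made-up numbers.

```python
def stratum_sample_sizes(strata_sizes, total_sample_size):
    """Share the sample between strata in proportion to their size."""
    population_size = sum(strata_sizes.values())
    return {
        name: total_sample_size * size / population_size
        for name, size in strata_sizes.items()
    }

# Hypothetical school: 120 males and 80 females, sample of 40
sizes = stratum_sample_sizes({"males": 120, "females": 80}, 40)
print(sizes)  # {'males': 24.0, 'females': 16.0}
```

A random sample of each size is then taken from within its stratum.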
quota sampling
an interviewer or researcher selects a sample that reflects the characteristics of the whole population.
+ Allows a small sample to still be representative of the population
+ No sampling frame required
+ Quick, easy and inexpensive
+ Allows for easy comparison between different groups within a population
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, which adds time and expense
- Non-responses are not recorded as such
difference between quota and stratified
q: you meet the people and select them yourself, no sampling frame involved, and allocate each person to the appropriate quota
s: if you want 5 tall people, you randomly pick 5 from a list (sampling frame) of all the tall people
opportunity sampling
+ Easy to carry out
+ Inexpensive
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
continuous variable
■ A variable that can take any value in a given range is a continuous variable.
For example, time can take any value, e.g. 2 seconds, 2.1 seconds, 2.01 seconds etc
e.g. foot size
discrete
A variable that can take only specific values in a given range is a discrete variable.
For example, the number of girls in a family is a discrete variable as you can’t have 2.65 girls in a family
e.g. shoe size, goes up in halves
mode/ modal class
■ The mode or modal class is the value or class that occurs most often.
median
■ The median is the middle value when the data values are put in order.
For data given in a frequency table, the mean can be calculated using the formula
x̄ = ∑xf / ∑f
mean calculated
x̄ = ∑ x/ n
how to find the quartiles
Q1 is the (1/4 × n)th value
Q2 is the (1/2 × n)th value
Q3 is the (3/4 × n)th value
the value found at that position in the ordered data is where the lower/upper quartile lies
interpolation
if modal class is 34-36
number line = 33.5 to 36.5 (the class boundaries)
interpolation steps
- find the position of the value you're interested in (e.g. the median or a quartile)
- find the class it's in
- draw a number line for that class, taking the first value down and the second up to the class boundaries
- on the bottom write the cumulative frequency at the start and end of the class
- find the fraction of the way along the number line that the value lies
- multiply the fraction by the width of the class on the number line and add it to the lower boundary
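The interpolation steps can be sketched as below. The cumulative frequencies and position are made-up numbers, and the function name is just for illustration.

```python
def interpolate(lower_boundary, upper_boundary, cf_start, cf_end, position):
    """Estimate the value at a given position inside a class."""
    # Fraction of the way through the class, using cumulative frequencies
    fraction = (position - cf_start) / (cf_end - cf_start)
    # Go that fraction of the class width past the lower boundary
    return lower_boundary + fraction * (upper_boundary - lower_boundary)

# Hypothetical: class 34-36 has boundaries 33.5 and 36.5, cumulative
# frequency is 10 at the start and 30 at the end, and we want the 20th value
estimate = interpolate(33.5, 36.5, 10, 30, 20)
print(estimate)  # 35.0
```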
interpolation assumes the data is
evenly spaced out within the class
spread
dispersion
variance and standard deviation
takes account of all pieces of data
Variance
σ² = Σ(x − x̄)² / n - the mean squared distance from the mean
can't just average x − x̄, as some values will be negative and cancel out the positive ones
easier formula to calculate:
σ² = Σx² / n − (Σx / n)²
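A quick check that the two variance formulas agree, using a made-up data set. Both functions divide by n, as in the formulas on this card.

```python
from math import sqrt

def variance_definition(data):
    """Mean of the squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data
print(variance_definition(data))      # 4.0
print(variance_shortcut(data))        # 4.0
print(sqrt(variance_shortcut(data)))  # 2.0, the standard deviation
```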
lower case sigma
σ
difference between sd and v
σ- standard deviation, measure of spread
σ² - variance, the standard deviation squared
variance and standard deviation frequency
σ² = Σfx² / Σf − (Σfx / Σf)²
σ = √( Σfx² / Σf − (Σfx / Σf)² )
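A sketch of the frequency-table formulas in code, with a made-up table. For grouped data, x would be the class midpoint.

```python
from math import sqrt

def frequency_table_mean_and_sd(values, frequencies):
    """Mean and standard deviation from a frequency table."""
    sum_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2
    return mean, sqrt(variance)

# Hypothetical table: the value 1 occurs once, 2 occurs twice, 3 occurs once
mean, sd = frequency_table_mean_and_sd([1, 2, 3], [1, 2, 1])
print(mean, sd)
```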
coding
makes numbers easier to deal with
coding measure of spread
ignore adding/taking away when converting back, as this doesn’t change the measure of spread; only undo the multiplying/dividing
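A small sketch of coding, assuming the code y = (x − a) / b with made-up values a = 1000 and b = 10. The mean needs both steps undone, while the standard deviation only needs multiplying by b.

```python
from statistics import mean, pstdev  # pstdev divides by n, matching σ

data = [1010, 1020, 1030, 1040, 1050]  # hypothetical raw data
a, b = 1000, 10                        # hypothetical coding constants

coded = [(x - a) / b for x in data]    # 1.0, 2.0, 3.0, 4.0, 5.0

mean_x = b * mean(coded) + a           # undo the dividing, then the subtracting
sd_x = b * pstdev(coded)               # undo the dividing only

print(mean_x)  # 1030.0
print(sd_x)
```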
outliers
greater than Q3 + k(Q3 − Q1)
less than Q1 − k(Q3 − Q1)
(Q3 − Q1 is the interquartile range, IQR)
standard value for k is 1.5
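The outlier rule can be sketched as below. The data set and its quartiles (Q1 = 11, Q3 = 17) are made up for illustration.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Values strictly below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

data = [1, 10, 12, 13, 15, 16, 18, 40]  # hypothetical data
print(outlier_limits(11, 17))           # (2.0, 26.0)
print(find_outliers(data, 11, 17))      # [1, 40]
```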
Anomalies
Anomalies can be the result of experimental or recording error, or could be data values which are not relevant to the investigation.
we don't want to include anomalies as they aren't representative of the population
box plots and outliers
the end of the whisker on the box plot goes on the data point right after the outlier (the most extreme value that is not an outlier)
Area of histogram
proportional to the frequency
f.d. (frequency density)
frequency divided by the class width
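A quick sketch of frequency density for a made-up grouped table, where each class is given as (lower boundary, upper boundary, frequency).

```python
def frequency_density(frequency, lower_boundary, upper_boundary):
    """Frequency divided by class width."""
    return frequency / (upper_boundary - lower_boundary)

# Hypothetical classes: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
densities = [frequency_density(f, lo, hi) for lo, hi, f in classes]
print(densities)  # [0.5, 2.0, 0.4]
```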
frequency polygon
join the midpoints of the tops of the bars
using the regression line of y on x to work out an x value
you can't predict x from y, as the regression is of y on x, not x on y
independent variable
x axis
dependent variable
y axis
can you use the graph to predict points outside the graph
you shouldn't extrapolate outside the range of the data
interpolation
predicting a value that lies inside the range of the data
an experiment is
a repeatable process that gives rise to a number of outcomes
an event
An event is a collection of one or more outcomes.
sample space
A sample space is the set of all possible outcomes.
The event A and B
intersection
A ∩ B
The event A or B
A ∪ B
A and not B
A ∩ B′
Venn diagrams: filling in values
always work out the centre (the intersection) first; if unknown, call it x
Mutually exclusive events
P(A ∪ B) = P(A) + P(B)
independent events
P(A ∩ B) = P(A) × P(B)
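A small sketch for checking these two rules with made-up probabilities. It assumes the general addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which reduces to the rule above when A and B are mutually exclusive, so P(A ∩ B) = 0.

```python
def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Independent if P(A and B) equals P(A) x P(B)."""
    return abs(p_a_and_b - p_a * p_b) < tol

def are_mutually_exclusive(p_a_and_b, tol=1e-9):
    """Mutually exclusive if A and B cannot happen together."""
    return abs(p_a_and_b) < tol

def p_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule; the default covers mutually exclusive events."""
    return p_a + p_b - p_a_and_b

# Hypothetical values: P(A) = 0.5, P(B) = 0.4, P(A and B) = 0.2
print(are_independent(0.5, 0.4, 0.2))   # True
print(are_mutually_exclusive(0.2))      # False
print(p_a_or_b(0.5, 0.4, 0.2))          # roughly 0.7
```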
probability questions
which diagram to draw?
- venn
- tree
- simple sample space diagram
try one, if it doesn’t work try the other
random variable
the ‘thing’ whose value is the outcome of the experiment
- all the outcomes of the experiment
- the result of the experiment
P(X=4)
what's the probability of the random variable X being 4
X~B(n,p)
n= number of trials
p= probability of success on each trial
■ You can model X with a binomial distribution, B(n, p), if:
● there are a fixed number of trials, n
● there are two possible outcomes (success and failure)
● there is a fixed probability of success, p
● the trials are independent of each other
can’t look up in tables:
strictly <, it has to be less than or equal to, so rewrite P(X < x) as P(X ≤ x − 1)
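A sketch of binomial probabilities without tables, using the standard formula P(X = x) = nCx × p^x × (1 − p)^(n − x). The function names and the example X~B(10, 0.5) are just for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), the value a cumulative table would give."""
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

# Example with X ~ B(10, 0.5)
print(binomial_pmf(4, 10, 0.5))  # P(X = 4) = 210/1024 = 0.205078125
print(binomial_cdf(3, 10, 0.5))  # P(X < 4) = P(X <= 3) = 176/1024 = 0.171875
```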
Null hypothesis
H0 is the hypothesis that you assume to be correct.
Alternative hypothesis
H1 tells you about the parameter if your assumption is shown to be wrong.
Steps to answering a hypothesis testing question
- state what probability we are looking at
e.g. we are looking at whether the probability of landing heads is less than 0.5
- let X be the number of times the coin lands on heads
X~B(10, 0.5)
- Test statistic is X = 0
- let p be the probability of getting heads
- H0: p = 1/2
H1: p < 1/2
- set significance level at 5%
P(X = 0) = 0.001
0.0010 < 0.05
- we can reject H0 as there is sufficient evidence at the 5% level to suggest that the coin is biased against heads
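The coin example can be checked with a short sketch, assuming a one-tailed test in the lower tail, so the probability needed is P(X ≤ observed value) under H0. The variable names are made up for illustration.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x + 1))

n = 10               # number of coin flips
p_null = 0.5         # probability of heads under H0
observed = 0         # test statistic: number of heads seen
significance = 0.05  # 5% significance level

# Lower-tail probability of a result at least as extreme as the one observed
probability = binomial_cdf(observed, n, p_null)
reject_h0 = probability < significance

print(probability)  # 0.0009765625, about 0.0010
print(reject_h0)    # True
```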
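reject H0 if
the probability is smaller than the significance level (e.g. 0.05)
the test statistic is inside the critical region (do not reject if it is outside the critical region)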