Stats Flashcards
Definition of a population
A complete set of data where every element is included
Definition of a sample
A selection of the population
Definition of a census
Every element of the population is surveyed. Censuses are rare because they are expensive and labour intensive
Definition of a sampling frame
A list of all the sampling units in the population, each individually named or numbered
Random sampling methods
1) Simple random sample (eg random number generator or names out of a hat)
2) Systematic sample (all elements are numbered, the 1st element is chosen at random, then every nth element is chosen, eg pick a random student from 1-10, then every 10th student after that)
3) Stratified sample- proportions of groups in the population are represented in the sample (eg for a sample of 10 from 100 boys and 100 girls, use 5 random boys and 5 random girls. For a sample of 20 from 103 boys and 107 girls, use 9.8 boys (round to 10) and 10.2 girls (round to 10))
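The stratified proportions above can be checked with a short calculation. This is a minimal sketch: the function name `stratified_allocation` and the dictionary layout are assumptions made for illustration, and simple rounding will not always add up to the exact sample size.

```python
def stratified_allocation(group_sizes, sample_size):
    """Share a sample between groups in proportion to each group's size."""
    total = sum(group_sizes.values())
    return {
        group: round(size / total * sample_size)
        for group, size in group_sizes.items()
    }

# 103 boys and 107 girls, sample of 20: 9.8 and 10.2 both round to 10
print(stratified_allocation({"boys": 103, "girls": 107}, 20))
```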
Non-random sampling methods
1) Opportunity sampling- whoever is present
2) Quota sampling- a set number is needed from each group, filled on a first come, first served basis until each quota is met
Risk of non-random sampling
Risk of bias/ lack of equal representation
Qualitative data
Non-numerical data eg colours, tv shows, etc
Quantitative data
Numerical data eg height, number of siblings, etc
Discrete data
Type of quantitative data.
Can only take certain values, usually whole numbers.
There are a few exceptions, eg shoe sizes can come in half sizes
Usually comes from counting
Continuous data
Type of quantitative data.
Can take any decimal value in a certain range.
Usually comes from measuring
What is n/a replaced by in data
n/a is ignored
What is tr (trace) replaced by in data
0
Advantages of simple random sampling vs disadvantages
+ve:
1) Free of bias
2) Easy and cheap for small samples
3) Each sampling unit has a known and equal chance of selection
- ve:
1) Requires a sampling frame
2) Unsuitable for larger samples/ populations as time consuming, disruptive and expensive
Advantages of systematic sampling vs disadvantages
+ve:
1) Simple and quick
2) Suitable for larger samples/ populations
- ve:
1) Sampling frame required
2) Can introduce bias if sampling frame is not random
Advantages of stratified sampling vs disadvantages
+ve:
1) Sample accurately reflects population structure
2) Guarantees representation of all groups within a sample/population
- ve:
1) Population must be clearly classified into distinct groups
2) Selection within each stratum can be time consuming/ expensive and requires a sampling frame
Advantages of quota sampling vs disadvantages
+ve:
1) Allows even small samples to be representative of population
2) No sampling frame required
3) Quick, easy and inexpensive
4) Allows for easy comparison between groups in a population
- ve:
1) Non-random so can introduce bias
2) Population must have separate, distinct groups
3) Can be costly or inaccurate
4) Non-responses are not recorded as such
5) Increasing scope of study/ number of groups adds time and expense
Advantages of opportunity sampling vs disadvantages
+ve:
1) Easy to carry out
2) Inexpensive
- ve:
1) Unlikely to provide representative results
2) Highly dependent on individual researcher
variance=
standard deviation^2
variance=
Σfx^2/n - mean^2 (where n = Σf)
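As a rough check of this formula, the sketch below computes the variance from a frequency table as "mean of the squares minus square of the mean". The function name `grouped_variance` and the sample values are made up for illustration.

```python
def grouped_variance(values, freqs):
    """Variance from a frequency table: sum(f*x^2)/n - mean^2, with n = sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(f * x ** 2 for x, f in zip(values, freqs)) / n
    return mean_of_squares - mean ** 2

# x = 1, 2, 3 with frequencies 1, 2, 1: mean = 2, mean of squares = 4.5
print(grouped_variance([1, 2, 3], [1, 2, 1]))  # 0.5
```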
How is standard deviation affected by coding
Affected by multiplying and dividing (x and ÷), but not by adding or subtracting (+ or -)
How is mean affected by coding
Affected by multiplying and dividing (x and ÷), as well as adding and subtracting (+ and -)
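The effect of coding can be seen on a small made-up data set. This is only a sketch: the coding y = 3x + 10 and the data values are assumptions chosen so the numbers come out neatly.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean 5, standard deviation 2
coded = [3 * x + 10 for x in data]       # hypothetical coding y = 3x + 10
shifted = [x + 10 for x in data]         # adding only

# The mean follows the whole coding: 3 * 5 + 10 = 25
print(mean(coded))
# The standard deviation is only scaled by the multiplier: 3 * 2 = 6
print(pstdev(coded))
# Adding a constant leaves the standard deviation unchanged
print(pstdev(shifted))
```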
Units for daily max temp
Degrees Celsius
Units for daily total rainfall
millimetres (mm). If the total amount of rainfall collected is less than 0.05 mm, it is referred to as a trace of rain
Units for daily total sunshine
given in hours and to one decimal place
Units for Daily Maximum Relative Humidity
Values for this are recorded as percentages (%). Relative humidities above 95% are associated with mist and fog
Units for daily mean windspeed and daily max gust speed
The daily mean windspeed is given in knots. 1 knot is about 1.15 mph. The windspeeds are also categorised according to the Beaufort scale
Units for Daily Mean Wind Direction and daily max gust direction
The value is given in degrees relative to true north
Units for cloud cover
It is measured in eighths. The technical unit used in this case is called oktas.
0 oktas indicates a completely clear sky, while 8 oktas indicates complete overcast.
Units for visibility
Metres (greatest horizontal distance at which an object can be seen)
Units for pressure
hectopascals (hPa).
Warmest/ highest temp places in large data set
Jacksonville (24.8)
Beijing (22.6)
Both are in the northern hemisphere and close to the equator
Not Perth, as it is in the southern hemisphere
Warmest locations in UK from large data set
Based on mean temperature: Heathrow (15.6), Hurn (14.1), Camborne (13.6)
Coldest location
Perth (15.2) outside UK
Leuchars (12.2) including UK
Most rainfall
Based on mean rainfall:
Camborne, UK - 2.8mm
Jacksonville, World wide -
Driest places
Heathrow, UK - 1.8mm
Heathrow, World wide - 1.8mm (If just B, J & P then Beijing at 2.1mm)
Interpolation eqn
Lower bound + (how far through group/group frequency) x class width
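The interpolation formula above can be sketched as a small function. The parameter names and the example class (10 to 20, frequency 16, with 12 values below it) are hypothetical and only there to show the arithmetic.

```python
def interpolate(lower_bound, position_in_class, class_frequency, class_width):
    """Estimate a value by linear interpolation within a class."""
    return lower_bound + (position_in_class / class_frequency) * class_width

# Median of 40 values is the 20th value. If 12 values lie below the
# class 10-20, the median is the 8th of the 16 values in that class.
print(interpolate(lower_bound=10, position_in_class=20 - 12,
                  class_frequency=16, class_width=10))  # 15.0
```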
Histogram frequency eqn
Frequency = class width x frequency density
Area of each bar represents the frequency of the group
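A quick sketch of the histogram relationship, using made-up class data. The function names are assumptions for illustration.

```python
def frequency_density(frequency, class_width):
    """Bar height for a histogram class."""
    return frequency / class_width

def frequency_from_bar(class_width, density):
    """Recover the frequency as the area of the bar."""
    return class_width * density

# A class of width 5 containing 12 observations
height = frequency_density(12, 5)
print(height)                          # 2.4
print(frequency_from_bar(5, height))   # area of the bar gives back the frequency
```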
When does a product moment correlation coefficient suggest there is a linear relationship between 2 variables
When it is close to 1 (+ve linear) or close to -1 (-ve linear)
P(AnB) for mutually exclusive events=
0 (the events cannot happen together)
Rules to check if 2 events are statistically independent (2 rules)
1) P(AnB) = P(A) x P(B)
2) P(A|B) = P(A)
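The first independence rule can be tested numerically. This is a minimal sketch with made-up probabilities; the function name `are_independent` and the tolerance are assumptions, and the tolerance is only there because decimals are not stored exactly.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Check whether P(A and B) equals P(A) x P(B), within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

print(are_independent(0.5, 0.4, 0.2))  # True, since 0.5 x 0.4 = 0.2
print(are_independent(0.5, 0.4, 0.3))  # False
```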
n vs u meaning in probability
P(XnY) = prob of X and Y (intersection of X and Y)
P(XuY) = prob of X or Y (union of X and Y)
How to find lower quartile for discrete data
The (n/4)th data value
If n/4 is not a whole number, round up to the next whole observation
If n/4 is a whole number, take the midpoint of that observation and the next one
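A sketch of the lower quartile rule for discrete data. It assumes the common A level convention that a non-whole n/4 is rounded up, and that a whole n/4 means taking the midpoint of that value and the next; check this against your own course notes. The function name is hypothetical.

```python
import math

def lower_quartile(data):
    """Lower quartile of discrete data using the n/4 position rule."""
    ordered = sorted(data)
    position = len(ordered) / 4
    if position == int(position):
        k = int(position)
        # Whole number: midpoint of the kth and (k+1)th values
        return (ordered[k - 1] + ordered[k]) / 2
    # Not whole: round up to the next observation
    return ordered[math.ceil(position) - 1]

print(lower_quartile([1, 3, 5, 7, 9, 11, 13]))    # n/4 = 1.75, so 2nd value: 3
print(lower_quartile([1, 2, 3, 4, 5, 6, 7, 8]))   # n/4 = 2, midpoint of 2 and 3: 2.5
```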
Discrete uniform distribution meaning
When probability of each potential outcome is equal
P(X=r) for X ~ B(n, p)
nCr x p^r x (1-p)^(n-r)
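The binomial formula can be written out directly using `math.comb` for nCr. This is a small sketch; the function name `binomial_pmf` and the example values are chosen for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr x p^r x (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# X ~ B(10, 0.5): P(X = 3) = 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```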
P(A’)=
1- P(A)
If mutually exclusive, P(AuB)=
P(A) + P(B)
P(A|B)=
P(AnB)/ P(B)
Addition rule of P(AuB)
P(AuB)= P(A) + P(B) - P(AnB)
Addition rule of P(AnB)
P(AnB) = P(A) + P(B) - P(AuB)
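The addition rule and the conditional probability formula can be tried on made-up numbers. The function names and the probabilities below are assumptions for illustration only.

```python
def union_probability(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def conditional_probability(p_a_and_b, p_b):
    """P(A given B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# P(A) = 0.5, P(B) = 0.4, P(A and B) = 0.2
print(round(union_probability(0.5, 0.4, 0.2), 4))   # 0.7
print(conditional_probability(0.2, 0.4))            # 0.5
```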
Normal distribution notation
X ~ N(mean, variance)
Standard normal substitution for Z (used to find unknown values of the mean/ standard deviation)
Z= (X- mean) / standard deviation
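The Z substitution can be checked with Python's built-in `statistics.NormalDist`. This is a sketch with a made-up distribution (mean 100, standard deviation 15); note that `NormalDist` takes the standard deviation, not the variance.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardise a value: Z = (X - mean) / standard deviation."""
    return (x - mean) / sd

X = NormalDist(mu=100, sigma=15)   # hypothetical X ~ N(100, 15^2)
z = z_score(130, 100, 15)
print(z)                           # 2.0

# P(X < 130) matches P(Z < 2) on the standard normal
print(X.cdf(130), NormalDist().cdf(z))
```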
Which way must the inequality face for the inverse normal distribution
< (same as binomial)
Median of a normal distribution
for a normal distribution, mean = median = mode.
Which area does the inverse normal function on the calculator use
The probability of being less than the observed value (area to the left)
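The "area to the left" idea can be seen with `NormalDist.inv_cdf`, which behaves like a calculator's inverse normal function. The distribution below (mean 170, standard deviation 10) is made up for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # hypothetical distribution

# To solve P(X > a) = 0.1, first rewrite it as P(X < a) = 0.9,
# because inv_cdf expects the area to the LEFT of the value.
a = heights.inv_cdf(0.9)
print(round(a, 2))

# Check: the area to the left of a is 0.9
print(round(heights.cdf(a), 4))
```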
How many decimal places to use for Z values in a normal distribution question
Give to 4dp, as this is how they are given in the table of z values
How to tell if data has a +ve or -ve skew
+ve skew: mean > median
-ve skew: mean < median
+ve skew: Q3-Q2 > Q2-Q1
-ve skew: Q2-Q1 > Q3-Q2
Formula for coefficient of skewness
Coefficient of skewness= 3(mean-median) / standard deviation
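A sketch of the coefficient of skewness on made-up data. It assumes the population standard deviation (`statistics.pstdev`); the function name is hypothetical.

```python
from statistics import mean, median, pstdev

def skewness_coefficient(data):
    """3 x (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / pstdev(data)

# One large value drags the mean above the median: positive skew
print(skewness_coefficient([1, 2, 2, 3, 12]) > 0)   # True
# Symmetrical data: mean = median, so the coefficient is 0
print(skewness_coefficient([1, 2, 3, 4, 5]))
```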
What skew is required for a normal distribution
No skew/ slight skew/ almost symmetrical distribution
Sxx=
Σx^2 - ((Σx)^2 / n) = variance x n (given in the formula book)
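The link between Sxx and the variance can be checked on a small made-up data set. This is a sketch; the function name `sxx` is an assumption, and `statistics.pvariance` is used as the (divide by n) variance.

```python
from statistics import pvariance

def sxx(data):
    """Sxx = sum of x^2 minus (sum of x)^2 / n."""
    n = len(data)
    return sum(x ** 2 for x in data) - sum(data) ** 2 / n

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sxx(data))                      # 232 - 1600/8 = 32.0
print(pvariance(data) * len(data))    # variance x n gives the same value
```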
Outlier eqns
Any value GREATER than: Q3 + k(Q3-Q1)
Any value LOWER than: Q1 - k(Q3-Q1)
If k is not given, use k=1.5
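The outlier boundaries can be sketched as follows. The function names and the quartile values are made up for illustration; k defaults to 1.5 as in the card above.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier boundaries using k x IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if the value lies outside the boundaries."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

print(outlier_fences(10, 20))   # (-5.0, 35.0)
print(is_outlier(40, 10, 20))   # True
print(is_outlier(30, 10, 20))   # False
```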
When a question says to use log
Use log_10 (the log button on the calculator)
Q3=
The (3n/4)th value
Q1=
The (n/4)th value
y bar (y with a bar above) =
mean of y
Sx =
Sample standard deviation (a different version of st dev that divides by n - 1 instead of n)
Sxx=
Variance x n
standard dev =
square root of (Sxx/n)
If y= ax^n for constants a and n then
log(y) = log(a) + n log(x)
If y= kb^x for constants k and b then
log(y) = log(k) + x log(b)
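Taking logs of both sides turns each model into a straight line: log(y) = log(a) + n log(x) for y = ax^n, and log(y) = log(k) + x log(b) for y = kb^x. The sketch below converts a gradient and intercept from such a line back into the constants, assuming base 10 logs; the function names are hypothetical.

```python
from math import log10

def power_law_from_line(gradient, intercept):
    """log(y) against log(x): gradient = n, intercept = log(a)."""
    return 10 ** intercept, gradient            # (a, n)

def exponential_from_line(gradient, intercept):
    """log(y) against x: gradient = log(b), intercept = log(k)."""
    return 10 ** intercept, 10 ** gradient      # (k, b)

# y = 2x^3 gives a line with gradient 3 and intercept log(2)
a, n = power_law_from_line(3, log10(2))
# y = 5 x 2^x gives a line with gradient log(2) and intercept log(5)
k, b = exponential_from_line(log10(2), log10(5))
print(round(a, 6), n, round(k, 6), round(b, 6))
```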
Reasons to use a histogram
1) Continuous data
2) Unequal class widths