Stats Flashcards
median is also known as
Q2, second quartile
how to calculate the median, and rules for calculating the median
take n and divide it by 2
- if n/2 is a decimal, round up and take the value at that position
- if n/2 is a whole number, take the average of the values at that position and the next one
what does n represent
the sample size
how to calculate the mean from raw data
sum of x / n
how to calculate the mean from a frequency table
sum of fx / sum of f
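A minimal sketch of the frequency table formula, sum of fx / sum of f. The function name and the shoe size data are made up for illustration.

```python
def mean_from_frequency_table(values, frequencies):
    """Return sum of fx divided by sum of f."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / total_f


# made-up data: shoe sizes and how many people wear each size
sizes = [4, 5, 6, 7]
counts = [2, 5, 2, 1]
print(mean_from_frequency_table(sizes, counts))  # 52 / 10 = 5.2
```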
positions of the median, lower quartile and upper quartile
- median: n/2
- lower quartile: n/4
- upper quartile: 3n/4
how to find the xth percentile
Px is the value at position x * n / 100
what is discrete data
what is continuous data
discrete data can only take specific, separate values
continuous data can take any value within a range
examples of continuous data
- grouped data
- time
rules for discrete data
- if the position is a decimal, round up
- if the position is a whole number, take the average of the value at that position and the value after it
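The discrete rule can be sketched in code: work out the position (n/2, n/4, 3n/4 or x * n / 100), round up if it is a decimal, and average with the next value if it is a whole number. The function names and data here are hypothetical, and the sketch only covers percentiles strictly between 0 and 100.

```python
import math


def value_at_position(sorted_data, position):
    """Apply the discrete-data rule to a 1-based position such as n/2."""
    position = float(position)
    if not position.is_integer():
        # decimal position: round up and take that value
        k = math.ceil(position)
        return sorted_data[k - 1]
    # whole-number position: average this value and the one after it
    k = int(position)
    return (sorted_data[k - 1] + sorted_data[k]) / 2


def percentile(data, x):
    """x-th percentile of discrete data, using position x * n / 100."""
    ordered = sorted(data)
    n = len(ordered)
    return value_at_position(ordered, x * n / 100)


odd = [3, 7, 8, 5, 12, 14, 21, 13, 18]    # n = 9
print(percentile(odd, 50))   # position 4.5 -> 5th value -> 12
print(percentile(odd, 25))   # position 2.25 -> 3rd value -> 7
print(percentile(odd, 75))   # position 6.75 -> 7th value -> 14

even = [1, 3, 5, 7, 9, 11, 13, 15]        # n = 8
print(percentile(even, 50))  # position 4 -> (7 + 9) / 2 -> 8.0
```

The 50th, 25th and 75th percentiles are the median, lower quartile and upper quartile, so subtracting the last two gives the interquartile range (14 - 7 = 7 for the first list).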
rules for continuous data
- check for gaps in the data
- take exact values
- use linear interpolation
what to do if your data has gaps
rewrite the frequency classes in boundary form
eg 300 - 349 would be 299.5 - 349.5
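A rough sketch of the continuous-data rules for a grouped frequency table: close the gaps by rewriting classes as boundaries, keep the exact position (no rounding), then use linear interpolation inside the class that contains it. The function names and the frequencies are made up, and the sketch assumes every gap between classes is the same size.

```python
def to_boundaries(classes):
    """Rewrite classes such as 300-349, 350-399 in boundary form.

    Half of the gap between neighbouring classes is added to each side,
    so 300-349 becomes 299.5-349.5. Assumes all gaps are equal.
    """
    gap = classes[1][0] - classes[0][1] if len(classes) > 1 else 0
    half = gap / 2
    return [(low - half, high + half) for low, high in classes]


def interpolate_position(boundaries, frequencies, position):
    """Estimate the value at an exact position by linear interpolation."""
    cumulative = 0
    for (low, high), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= position:
            # fraction of the way through this class
            fraction = (position - cumulative) / f
            return low + fraction * (high - low)
        cumulative += f
    raise ValueError("position is larger than the total frequency")


classes = [(300, 349), (350, 399), (400, 449)]
frequencies = [10, 20, 10]                 # made-up frequencies, n = 40
bounds = to_boundaries(classes)
print(bounds[0])                           # (299.5, 349.5)
n = sum(frequencies)
print(interpolate_position(bounds, frequencies, n / 2))  # 374.5
```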
INTERQUARTILE RANGE
Q3 - Q1
How to find the distance between the nth and mth percentiles (interpercentile range)
Pm - Pn
what are the variance and standard deviation of a data set
they measure the average spread of the data values about the mean
- variance is a square measure
- standard deviation is the square root of the variance
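A small sketch of both measures, assuming the version that divides by n (not n - 1). The function names and data are illustrative only.

```python
import math


def variance(data):
    """Mean of the squared distances from the mean (divides by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up values, mean is 5
print(variance(data))                 # 32 / 8 = 4.0
print(standard_deviation(data))       # 2.0
```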
what is random sampling
when each sampling unit in our sampling frame has an equal chance of being chosen in order to avoid bias
what is simple random sampling, and what are its advantages and disadvantages
- every sampling unit in sampling frame has equal chance of being selected
- each item in sampling frame has an identifying number
- use random number generator
adv:
- bias free
- easy/cheap to implement
- fair
disadv:
- not suitable when population size is large
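Simple random sampling could be sketched like this: number every item in the sampling frame and let a random number generator pick without repeats. The function name, the seed parameter and the frame of 100 numbered people are assumptions for the example.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size different items, each equally likely to be chosen."""
    rng = random.Random(seed)   # seed is only there to make runs repeatable
    return rng.sample(sampling_frame, sample_size)


frame = list(range(1, 101))     # identifying numbers 1 to 100
print(simple_random_sample(frame, 10, seed=1))
```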
what is systematic sampling
what are its advantages and disadvantages
- when required elements are chosen at regular intervals in an ordered list
adv:
- simple/quick to use
- suitable for larger samples
disadv:
- you need to have an ordered list (a sampling frame)
- can introduce bias if the sampling frame is not random
how to carry out systematic sampling
- work out k = population size / sample size
- pick a random starting number from 1 to k
- eg k = 50 and the random start is 17
- select every 50th person from there, eg 17, 67, 117
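A sketch of the systematic sampling steps: find the interval by dividing population size by sample size, pick a random start inside the first interval, then take every item one interval apart. The function name and the population of 1000 are hypothetical, and the interval is rounded down when the division is not exact.

```python
import random


def systematic_sample(ordered_list, sample_size, seed=None):
    """Take items at regular intervals from an ordered list."""
    k = len(ordered_list) // sample_size      # interval between picks
    rng = random.Random(seed)
    start = rng.randint(1, k)                 # random start from 1 to k
    positions = [start + i * k for i in range(sample_size)]
    return [ordered_list[p - 1] for p in positions]   # positions are 1-based


population = list(range(1, 1001))             # people numbered 1 to 1000
sample = systematic_sample(population, 20, seed=3)
print(sample[:3])                             # three numbers that are 50 apart
```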
what is stratified sampling
how is it carried out
advantages and disadvantages
- population is divided into groups (strata) and a simple random sample is carried out in each group
- used when the sample is large and the population naturally divides into groups
- number sampled from each stratum = (number in stratum / population size) x overall sample size
adv:
- reflects population structure
- guarantees proportional representation of groups within population
disadv:
- population must be clearly classified into distinct strata
- selection within each stratum suffers from the same disadvantages as simple random sampling
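A possible sketch of stratified sampling: each group gets a share of the sample in proportion to its size, then a simple random sample is taken inside each group. The function name and the year group sizes are made up, and rounding the shares is an assumption that can make the total differ slightly from the target.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """strata maps a group name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number in stratum / population size * overall sample size
        share = round(len(members) * sample_size / population_size)
        sample[name] = rng.sample(members, share)
    return sample


# made-up school with three year groups, 400 students in total
strata = {
    "year 10": [f"Y10-{i}" for i in range(120)],
    "year 11": [f"Y11-{i}" for i in range(180)],
    "year 12": [f"Y12-{i}" for i in range(100)],
}
result = stratified_sample(strata, 40, seed=5)
print({name: len(chosen) for name, chosen in result.items()})
# {'year 10': 12, 'year 11': 18, 'year 12': 10}
```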
why would random sampling be problematic
might not know the sampling frame
non random sampling - quota sampling
adv and disadv
- divide population into groups according to characteristic of interest
- choose people within each group via suitable means until the quota for each group is filled
adv - allows small sample to still be representative of population
- no sampling frame required
- quick, easy, inexpensive
- allows for easy comparison between different groups in population
disadv - non random sampling can introduce bias
- population must be divided into groups, which can be costly or inaccurate
- increasing scope of study increases number of groups, adding time/expense
- non responses are not recorded
non random sampling - opportunity/convenience sampling
- sample taken from people who are available at the time of study
- interviewer selects the actual sampling units according to the set criteria
adv
- easy and inexpensive to carry out
disadv
- unlikely to provide a representative sample
- highly dependent on the individual researcher
types of data
- qualitative/categorical - non numerical values
- quantitative- numerical data
- discrete - can only take specific values eg shoe size - part of quantitative
- continuous - can take any decimal value - part of quantitative
locations of the 5 uk weather stations
3 international weather stations
UK
- Heathrow
- Camborne - on the coast
- Hurn - on the coast
- Leeming
- Leuchars - on the coast
INTERNATIONAL
- Beijing - northern
- Jacksonville - northern hemisphere, hot/tropical Florida, lots of hurricanes and tornadoes
- Perth - southern hemisphere
difference between northern and southern hemisphere
when northern is in winter southern is in summer and vice versa
rules for events which are independent
Two events 𝐴 and 𝐵 are independent if the occurrence of one event does not affect the probability of the other. This means:
P(A ∩ B) = P(A) × P(B)
- The probability of both events occurring is the product of their individual probabilities.
- Independent events can overlap; they are not necessarily disjoint
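The product rule for independence can be checked on a small example. This sketch assumes a fair six-sided die, so every outcome is equally likely; the function names and the chosen events are made up.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(event) when all outcomes in the sample space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_independent(a, b, sample_space):
    """True when P(A and B) equals P(A) times P(B)."""
    p_both = probability(a & b, sample_space)
    return p_both == probability(a, sample_space) * probability(b, sample_space)


die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}            # P = 1/2
at_most_4 = {1, 2, 3, 4}    # P = 2/3
at_most_3 = {1, 2, 3}       # P = 1/2

print(are_independent(even, at_most_4, die))  # True: 1/3 = 1/2 x 2/3
print(are_independent(even, at_most_3, die))  # False: 1/6 is not 1/2 x 1/2
```

The first pair is independent even though the two events overlap (both contain 2 and 4).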
rules for mutually exclusive events
- Two events 𝐴 and 𝐵 are mutually exclusive if they cannot both occur at the same time. This means:
P(A ∩ B) = 0
and
P(A ∪ B) = P(A) + P(B)
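A quick check of the mutually exclusive rules, again assuming a fair six-sided die with equally likely outcomes. The function names and events are hypothetical.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(event) when all outcomes in the sample space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_mutually_exclusive(a, b):
    """True when the events share no outcomes, so P(A and B) = 0."""
    return len(a & b) == 0


die = {1, 2, 3, 4, 5, 6}
low = {1, 2}
high = {5, 6}

print(are_mutually_exclusive(low, high))                  # True
print(probability(low | high, die))                       # 2/3
print(probability(low, die) + probability(high, die))     # 2/3
```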
why might you include an outlier in the readings
and
why might you not include it
include:
- it is still part of the data
- it may be a genuine extreme value rather than an anomaly
do not include:
- it is an outlier, so it may not be representative of the typical data
- it might be a mistake
in y = a + bx, what do the coefficients represent
- the coefficient a is the y-intercept, the value of y when x = 0
- the coefficient b is the gradient, the change in y for each unit change in x
what is interpolation (scatter diagram) and what is extrapolation
- interpolation is making a prediction based on a value inside the range
- extrapolation is making a prediction based on a value outside the range, and gives a less reliable estimate
how predictions are made with scatter diagrams
- only make predictions for the dependent variable
- the independent variable is used to make the prediction
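A sketch of making a prediction from y = a + bx and labelling it as interpolation or extrapolation. The cards do not say how a and b are found, so this assumes the usual least squares formulas (b = Sxy / Sxx, a = mean of y - b x mean of x); the function names and data are made up.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + bx, assuming a least squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(xs, ys, x_new):
    """Predict the dependent variable y from the independent variable x."""
    a, b = fit_line(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x_new, kind


xs = [1, 2, 3, 4, 5]          # independent variable
ys = [3, 5, 7, 9, 11]         # dependent variable, exactly y = 1 + 2x
print(predict(xs, ys, 2.5))   # (6.0, 'interpolation')
print(predict(xs, ys, 10))    # (21.0, 'extrapolation'), less reliable
```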
how else can you work out the standard deviation
square root of (Sxx / n), where Sxx = sum of x^2 - (sum of x)^2 / n
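A short sketch of the Sxx route, where Sxx = sum of x^2 - (sum of x)^2 / n. Sxx / n gives the variance, so the standard deviation is its square root. The function names and data are illustrative.

```python
import math


def sxx(data):
    """Sxx = sum of x^2 - (sum of x)^2 / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


def standard_deviation(data):
    """Square root of Sxx / n."""
    return math.sqrt(sxx(data) / len(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up values
print(sxx(data))                    # 232 - 1600 / 8 = 32.0
print(sxx(data) / len(data))        # variance = 4.0
print(standard_deviation(data))     # 2.0
```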