AS Statistics Flashcards
Population
The whole set of items that are of interest
Census
Observes or measures every member of a population
Sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
Sampling frame
List of sampling units, with each unit given an identifying name or number
Advantages/disadvantages of a census
Advantages:
Completely accurate result
Disadvantages:
Time consuming, expensive, cannot be used when testing process destroys the item, hard to process large quantity of data
Sample advantages/disadvantages
Advantages:
Less time consuming/expensive, fewer people have to respond, less data to process than in a census
Disadvantages:
Data may not be as accurate, sample may not be large enough to give data about small subgroups of the population
Sampling units
Individual units of a population
Simple random sample
Where every sample of size n has an equal chance of being selected
Systematic sampling
Required elements are chosen at regular intervals from an ordered list
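A minimal sketch of systematic sampling, assuming the sampling frame is an ordered list numbered 1 to 100 and that the starting point is chosen at random within the first interval. The function name and the data are made up for illustration.

```python
import random

def systematic_sample(ordered_list, sample_size, rng=random):
    """Take every k-th element from an ordered list, starting at a random point."""
    k = len(ordered_list) // sample_size   # sampling interval
    start = rng.randrange(k)               # random start within the first k items
    return [ordered_list[start + i * k] for i in range(sample_size)]

frame = list(range(1, 101))                # hypothetical sampling frame: units 1 to 100
sample = systematic_sample(frame, 20, random.Random(1))
print(sample)                              # 20 units, each 5 places after the previous one
```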
Stratified sampling
Population is divided into mutually exclusive strata and a random sample is taken from each
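A rough sketch of stratified sampling, assuming the number taken from each stratum is proportional to its size: (stratum size / population size) x sample size. The year groups and names below are invented; rounding can make the stratum counts add up to slightly more or less than the target in other data sets.

```python
import random

def stratified_sample(strata, sample_size, rng=random):
    """strata: dict mapping stratum name -> list of sampling units."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation for this stratum, rounded to a whole number.
        n = round(len(units) / population_size * sample_size)
        # Simple random sample (without replacement) inside the stratum.
        sample[name] = rng.sample(units, n)
    return sample

strata = {
    "Year 12": [f"Y12-{i}" for i in range(120)],   # hypothetical stratum of 120 students
    "Year 13": [f"Y13-{i}" for i in range(80)],    # hypothetical stratum of 80 students
}
chosen = stratified_sample(strata, 20, random.Random(0))
print({name: len(units) for name, units in chosen.items()})
```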
Simple random sampling advantages/disadvantages
Advantages: free of bias, easy and cheap to implement for small samples/populations, each sampling unit has a known and equal chance of selection
Disadvantages: not suitable for large populations (time consuming, disruptive, expensive), a sampling frame is needed
Systematic sampling advantages/disadvantages
Advantages: simple and quick to use, suitable for large samples/populations
Disadvantages: a sampling frame is needed, can introduce bias if the sampling frame is not random
Stratified sampling advantages/disadvantages
Advantages: sample accurately reflects population structure, guarantees proportional representation of groups within a population.
Disadvantages: population must be classified into distinct strata, selection within each stratum suffers from same disadvantages as simple random sampling
Quota sampling
an interviewer or researcher selects a sample that reflects the characteristics of the whole population
Opportunity sampling
Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
Quota sampling advantages/disadvantages
Advantages: allows a small sample to still be representative of the whole population, no sampling frame required, quick, easy and inexpensive, allows for easy comparison between different groups within a population
Disadvantages: non-random sampling can introduce bias, population must be divided into groups which can be costly and inaccurate, increasing scope of study increases number of groups (adding time and expense), non-responses are not recorded as such
Opportunity sampling advantages/disadvantages
Advantages: easy to carry out, inexpensive
Disadvantages: unlikely to provide a representative sample, highly dependent on individual researcher
Quantitative data/variables
variables or data associated with numerical observations
Qualitative data/variables
variables or data associated with non-numerical observations
continuous variable
a variable that can take any value in a given range
discrete variable
a variable that can take only specific values in a given range
grouped frequency table (gft)
the specific data values are not shown but are included in groups (or classes)
mid-point (gft)
average of class boundaries
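As an illustration, mid-points can be computed from the class boundaries and then used to estimate the mean of grouped data. The table below is made up, and the estimate assumes each value in a class sits at the mid-point of that class.

```python
# Each class is (lower boundary, upper boundary, frequency). Invented data.
classes = [(0, 10, 4), (10, 20, 9), (20, 40, 7)]

# Mid-point = average of the two class boundaries.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

total_frequency = sum(f for _, _, f in classes)
# Estimated mean = sum of (mid-point x frequency) / total frequency.
estimated_mean = sum(m * f for m, (_, _, f) in zip(midpoints, classes)) / total_frequency

print(midpoints)
print(estimated_mean)
```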
Daily mean temperature units
degrees Celsius
Daily total rainfall units
mm
Amounts less than 0.05 mm are recorded as ‘tr’ or ‘trace’
Daily total sunshine
recorded to the nearest tenth of an hour
Daily mean windspeed/daily maximum gust units
knots
Daily maximum relative humidity
Given as a percentage of air saturation with water vapour.
Above 95% gives rise to misty and foggy conditions
Daily mean cloud cover units
‘oktas’ (eighths of the sky covered by cloud)
Daily mean visibility units
decametres (Dm)
Daily mean pressure units
hectopascals (hPa)
measure of location
a single value which describes a position in a data set
Measure of central tendency
a single value which describes the centre of the data
Mode/modal class
the value or class that occurs most often
Median
the middle value when the data values are put in order
Mean formula
sum of the values/number of values
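A quick sketch of mode, median and mean on a small made-up data set, using Python's standard statistics module alongside the formula written out by hand.

```python
from statistics import mean, median, mode

data = [2, 3, 4, 7, 7, 7, 12]          # invented values

modal_value = mode(data)                # value that occurs most often
middle_value = median(data)             # middle value once the data are in order
mean_by_formula = sum(data) / len(data) # sum of the values / number of values

print(modal_value, middle_value, mean_by_formula)
```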
Lower quartile
one-quarter of the way through the data set
Upper quartile
three-quarters of the way through the data set
Range
difference between the largest and smallest values in the data set
Interquartile range (IQR)
the difference between the upper and lower quartiles
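Quartile conventions vary between textbooks and software. The sketch below assumes one common school convention for discrete data: work out the position (n/4 or 3n/4); if it is a whole number, go halfway between that data point and the next; otherwise round up. The helper name and data are illustrative only.

```python
import math

def quartile(sorted_data, fraction):
    """Value `fraction` of the way through sorted data (0.25 for Q1, 0.75 for Q3)."""
    position = fraction * len(sorted_data)
    if position == int(position):
        # Whole number: halfway between this data point and the one above.
        i = int(position)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    # Not a whole number: round up and take that data point (1-based position).
    return sorted_data[math.ceil(position) - 1]

data = sorted([2, 4, 5, 7, 8, 10, 11, 13, 15, 20])   # invented values
q1 = quartile(data, 0.25)
q3 = quartile(data, 0.75)
iqr = q3 - q1                       # interquartile range
data_range = data[-1] - data[0]     # largest value minus smallest value

print(q1, q3, iqr, data_range)
```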
Interpercentile range
the difference between the values for two given percentiles
Standard deviation
square root of the variance
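A short worked example, assuming the variance is taken as the mean of the squared deviations from the mean (dividing by n, which is the form usually used at this level). The data are made up.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # invented values
n = len(data)

mean = sum(data) / n
# Variance: average of the squared distances from the mean (divides by n).
variance = sum((x - mean) ** 2 for x in data) / n
# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)
```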
Coding
a way of simplifying statistical calculations
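A sketch of how coding can simplify a calculation, assuming a linear code of the form y = (x - a) / b. Under that assumption the mean of x is b x (mean of y) + a, and the standard deviation of x is b x (standard deviation of y). The values of a, b and the data are chosen purely for illustration.

```python
from statistics import mean, pstdev

x = [1010, 1020, 1030, 1040, 1050]   # invented raw data
a, b = 1000, 10                      # assumed coding constants

# Coded data: subtract a, then divide by b, to get small numbers.
y = [(value - a) / b for value in x]

# Undo the coding to recover the statistics of the original data.
mean_x = b * mean(y) + a
sd_x = b * pstdev(y)                 # subtracting a does not change the spread

print(y, mean_x, sd_x)
```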
Outlier
an extreme value that lies outside the overall pattern of the data
Outlier common definition
Greater than Q3 + k(IQR)
Or less than Q1 - k(IQR)
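A minimal sketch of this rule. The value of k depends on the question; k = 1.5 is used here only as an assumed default, and the quartiles are passed in rather than calculated.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values above Q3 + k(IQR) or below Q1 - k(IQR)."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]

data = [2, 4, 5, 7, 8, 10, 11, 13, 15, 40]   # invented values
outliers = find_outliers(data, q1=5, q3=13)  # limits are -7 and 25

print(outliers)
```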
cleaning the data
the process of removing anomalies from the data
anomalies
outliers that should be removed from the data because they are clearly errors and would be misleading
frequency polygon
Formed by joining the middle of the top of each bar in a histogram with straight lines
frequency density equation
frequency/class width
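For example, frequency densities for a made-up grouped frequency table could be worked out as follows, with each class given as (lower boundary, upper boundary, frequency).

```python
# Each class is (lower boundary, upper boundary, frequency). Invented data.
classes = [(0, 10, 4), (10, 20, 9), (20, 40, 7)]

# Frequency density = frequency / class width.
densities = [f / (upper - lower) for lower, upper, f in classes]

print(densities)
```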
Bivariate data
data which has pairs of values for two variables
Independent/explanatory variable
the variable controlled by the researcher (x-axis)
Dependent/response variable
the variable measured by the researcher (y-axis)
correlation
describes the nature of the linear relationship between two variables
causal relationship
when a change in one variable causes a change in the other
(Correlation does not mean causation!)
You need to use context of question and common sense to determine this
experiment
a repeatable process that gives rise to a number of outcomes
event
a collection of one or more outcomes
sample space
the set of all possible outcomes
mutually exclusive
when events have no outcomes in common
Addition rule (probability)
For mutually exclusive events:
P(A or B) = P(A) + P(B)
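A small check of the addition rule, assuming one roll of a fair six-sided die. The events A and B are chosen so that they have no outcomes in common.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}   # roll a 1 or a 2
B = {5, 6}   # roll a 5 or a 6

mutually_exclusive = (A & B == set())   # no outcomes in common
p_a_or_b = prob(A | B)                  # probability of "A or B"

print(mutually_exclusive, p_a_or_b, prob(A) + prob(B))
```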
Independent
when one event has no effect on another
Multiplication rule (probability)
For independent events:
P(A and B) = P(A) x P(B)
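A small check of the multiplication rule for two independent events, assuming a fair die is rolled and a fair coin is flipped. The sample space lists every (die, coin) pair.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (die, coin) outcomes (assumed fair die and coin).
sample_space = set(product(range(1, 7), "HT"))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {outcome for outcome in sample_space if outcome[0] == 6}     # die shows 6
B = {outcome for outcome in sample_space if outcome[1] == "H"}   # coin shows heads

p_both = prob(A & B)   # probability of "A and B"

print(p_both, prob(A) * prob(B))
```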
tree diagram
can be used to show the outcomes of two or more events happening in succession
random variable
a variable whose value depends on the outcome of a random event
Sample space
the range of values that a random variable can take
probability distribution
fully describes the probability of any outcome in the sample space
discrete uniform distribution
when all of the probabilities are the same
Sum of the probabilities of all outcomes in the sample space
ΣP(X=x) = 1
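As a sketch, the score on a fair six-sided die (an assumed example) has a discrete uniform distribution, and its probabilities can be checked to add up to 1.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]   # possible values of X (assumed fair die)

# Discrete uniform distribution: every value has the same probability.
distribution = {x: Fraction(1, len(outcomes)) for x in outcomes}

total = sum(distribution.values())   # should equal 1

print(distribution[3], total)
```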
test statistic
the result of the experiment or the statistic that is calculated from the sample
null hypothesis, H0
the hypothesis you assume to be correct
alternative hypothesis, H1
tells you about the parameter if your assumption is wrong
critical region
region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
critical value
first value to fall inside the critical region
actual significance level
probability of incorrectly rejecting the null hypothesis
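A sketch of finding a critical value and the actual significance level, under assumptions not stated in the cards above: the test statistic X follows a binomial distribution B(10, 0.5) under H0, the test is one-tailed (lower tail), and the target significance level is 5%.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n, p0, alpha = 10, 0.5, 0.05   # assumed test set-up

# Lower-tail critical value: the largest c with P(X <= c) <= alpha.
# (This would fail if no value of c qualified; it does not happen here.)
critical_value = max(c for c in range(n + 1) if binomial_cdf(c, n, p0) <= alpha)

# Actual significance level: probability of landing in the critical region
# when H0 is true, i.e. of incorrectly rejecting H0.
actual_significance = binomial_cdf(critical_value, n, p0)

print(critical_value, actual_significance)
```

With these assumed numbers the critical region is X <= 1, and the actual significance level (11/1024, about 1.07%) is lower than the 5% target because X can only take whole-number values.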