Statistics Year 1 Flashcards
Define population
Population- complete collection of people or items
Define sample
Sample- part of population- as it is usually not possible to gather data about every individual in population, a sample is used to gather info from which conclusions about the population are drawn
How many types of sampling are there?
7
What is simple random sampling?
Simple random sampling- sampling method in which items in sample are chosen by a random process e.g. drawing names from a hat- every member of population has equal chance of being selected
What is opportunity sampling?
Opportunity sampling- choosing individuals for sample as opportunity arises e.g. interviewing passers-by
What is systematic sampling?
Systematic sampling- select individuals from population using systematic method e.g. selecting every 10th person on list of population
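The two methods above (simple random and systematic) can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers and the function names are made up here, not taken from any course material.

```python
import random

# Hypothetical population: ID numbers 1 to 100 (made up for illustration).
population = list(range(1, 101))

def simple_random_sample(items, n, seed=None):
    """Every member has an equal chance of selection (like names from a hat)."""
    rng = random.Random(seed)
    return rng.sample(items, n)  # sampling without replacement

def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]

random_sample = simple_random_sample(population, 10, seed=1)
every_tenth = systematic_sample(population, 10, start=9)  # 10th, 20th, ... person on the list
```

The seed is only there so the random sample can be reproduced; leave it out for a fresh sample each time.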
What is stratified sampling?
Stratified sampling- used when population can be divided into subgroups (strata) using criteria e.g. age or gender 👨 👩 and ensures all strata represented in sample
- Sometimes requirement that numbers sampled from each stratum are proportional to sizes of the strata (proportional stratified sampling)
- Otherwise, weighting used
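A rough sketch of proportional stratified sampling, assuming a made-up school with two year groups as the strata. The function name and the data are hypothetical, and rounding is handled in the simplest possible way.

```python
import random

def proportional_stratified_sample(strata, sample_size, seed=None):
    """strata: dict mapping stratum name -> list of members.
    Returns a dict of randomly chosen members per stratum,
    with numbers in proportion to the size of each stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the overall total differ slightly from sample_size.
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata: 60 pupils in year 12 and 40 in year 13.
strata = {
    "year12": [f"Y12-{i}" for i in range(60)],
    "year13": [f"Y13-{i}" for i in range(40)],
}
chosen = proportional_stratified_sample(strata, 10, seed=0)
counts = {name: len(members) for name, members in chosen.items()}
# counts is {"year12": 6, "year13": 4}: a 60:40 population gives a 6:4 sample
```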
What is quota sampling?
Quota sampling- can also be used when population can be divided into strata- certain number of items from each stratum are required
What is cluster sampling?
Cluster sampling- used when population consists of subgroups which are each reasonably representative of population (e.g. year 6 classes in several schools)- sample taken from just a few of these subgroups
What is self selected/volunteer sampling?
Self-selected sampling- individuals choose to be part of sample e.g. survey posted on internet
Which sampling techniques are prone to bias?
1) Opportunity sampling
2) Self-selected sampling
- sample unlikely to be representative of population
What is a good thing about larger samples?
Larger samples usually more representative of population than smaller samples
What are the types of sampling?
1) Simple random
2) Opportunity
3) Systematic
4) Stratified
5) Quota
6) Cluster
7) Self selected/volunteer
How many types of data are there?
3
What are statistical diagrams?
Statistical diagrams used to illustrate data
What are types of data?
1) Categorical
2) Discrete
3) Continuous
What are bar charts?
Bar charts 📊 show frequencies for each item of data
- height of bar equal to frequency
- unlike histograms, there are gaps between bars- indicates discrete data
- bar charts 📊 often used for categorical data
What is a dot plot?
Dot plot- similar to bar chart📊 but uses stacks of dots to represent frequency
What is a vertical line chart?
Vertical line chart- similar to bar chart 📊 BUT uses vertical lines instead of bars
- more appropriate than bar chart 📊 for showing numerical data
What is a histogram?
Histogram- used to illustrate grouped data
- vertical axis gives frequency density (frequency ÷ class width)
- frequency for each group proportional to area of bars
- no gaps between the bars
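A small worked example of frequency density, assuming made-up class boundaries and frequencies. It also checks that bar area (density × width) gives back the frequency.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

def frequency_density(lower, upper, frequency):
    """Frequency divided by class width."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
# densities are 0.5, 1.2, 0.8 and 0.2 for the four classes

# Area of each bar (density x class width) recovers the frequency.
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, classes)]
```

Note that the widest class (40 to 80) has the lowest bar even though its frequency is not the smallest, which is why frequency is read from area rather than height.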
What is a frequency chart?
Frequency chart- similar to histogram BUT has equal width bars and its vertical axis gives frequency
What is a stem and leaf diagram?
Stem-and-leaf diagram- used for numerical data
- stem indicates groups of data
- leaves give actual data
- shows shape of distribution in same way as bar chart, dot plot or vertical line chart does BUT includes actual raw data
What is a pie chart?
Pie 🥧 chart- used for categorical data.
- frequencies of data items displayed as sectors of a circle ⭕️ with angle in each sector proportional to frequency
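The sector angle is the item's share of the total frequency multiplied by 360°. A quick sketch with made-up car colour counts:

```python
# Hypothetical frequencies of car colours (made up for illustration).
counts = {"red": 6, "blue": 9, "silver": 15}

total = sum(counts.values())  # 30 cars in total
angles = {colour: 360 * f / total for colour, f in counts.items()}
# red: 72 degrees, blue: 108 degrees, silver: 180 degrees
```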
How many ways can data be distributed?
5
What is a box and whisker diagram?
Box-and-whisker diagram (boxplot)- summarises numerical data by showing lowest value, lower quartile, median, upper quartile and highest value
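The five values a boxplot shows can be found with Python's statistics module. This is a sketch on made-up data; quartile conventions differ between textbooks and software, and the "inclusive" method used here is just one of them, so answers may differ slightly from a hand method.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 13, 15, 21]  # hypothetical data set

# n=4 splits the data into quarters and returns the 3 cut points.
# method="inclusive" is an assumption; other conventions exist.
lower_q, median, upper_q = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), lower_q, median, upper_q, max(data))
# (2, 5.0, 8.0, 13.0, 21)
```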
What is a cumulative frequency curve?
Cumulative frequency curve- graph illustrating numerical data
- cumulative frequency curve useful for estimating values of median, quartiles or other percentiles
How many statistical diagrams are there?
9
What are the types of distribution of data?
1) Positively skewed
2) Negatively skewed
3) Symmetrical
4) Unimodal
5) Bimodal
What is positively skewed data?
Distribution has a longer right-hand tail
- median closer to lower quartile than to upper quartile
What is negatively skewed data?
Distribution has a longer left-hand tail
- median closer to upper quartile than lower quartile
What is symmetrical data?
Peak of data approximately in centre and distribution looks reasonably symmetrical
- mean and the median will be close together
What is unimodal data?
A unimodal distribution has 1 peak
What is bimodal data?
A bimodal distribution has 2 distinct peaks
How many measures of central tendency are there?
3
What are the measures of central tendency?
1) Mean
2) Median
3) Mode
What is the mean?
Found by adding up data items and dividing by number of data items
What is the mode?
Most frequently occurring data value
How many measures of variation are there?
4
What is the median?
Midpoint of data when placed in numerical order
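The three measures of central tendency on one small made-up data set, using Python's built-in statistics module:

```python
import statistics

data = [3, 5, 5, 6, 8, 9]  # hypothetical data, already in numerical order

mean = statistics.mean(data)      # (3 + 5 + 5 + 6 + 8 + 9) / 6 = 36 / 6 = 6
median = statistics.median(data)  # even number of items: mean of 5 and 6 = 5.5
mode = statistics.mode(data)      # 5 occurs most often
```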
How do you calculate the variance?
SEE MATHS WORD DOC CALLED GRAPHS TO KNOW
What is the range?
Difference between highest and lowest values from data
How do you calculate the standard deviation?
Square root of variance
SEE MATHS WORD DOC CALLED GRAPHS TO KNOW
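A worked sketch of variance and standard deviation on made-up data. Courses differ on whether the sum of squared deviations is divided by n or by n - 1, so both are shown; check your own notes for which one applies. The variable names are only illustrative.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)

mean = sum(data) / n                      # 40 / 8 = 5.0
sxx = sum((x - mean) ** 2 for x in data)  # sum of squared deviations = 32.0

variance_n = sxx / n                      # divisor n: 32 / 8 = 4.0
variance_n_minus_1 = sxx / (n - 1)        # divisor n - 1: 32 / 7

# Standard deviation is the square root of the variance.
sd_n = math.sqrt(variance_n)              # 2.0
sd_n_minus_1 = math.sqrt(variance_n_minus_1)
```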
What is the interquartile range?
Difference between upper quartile (value 3/4 of the way through data when ranked numerically) and lower quartile (value 1/4 of the way through data when ranked numerically)
What is the variance?
Variance- measure of spread of sample of data, based on squared distances of data items from mean
What is standard deviation?
Measure of typical distance of data items from mean (square root of variance)
What must you remember with scatter diagrams and the type of population they represent?
Sometimes a scatter diagram shows data falling in 2 or more groups
These may represent different sections of population (e.g. adults 🧑 and children 👶), so it may not be appropriate to treat data as single set
What are the measures of variation?
1) Range
2) IQR
3) Variance
4) Standard deviation
What is bivariate data?
Data which involves 2 variables, e.g. height and weight
How can you illustrate bivariate data?
Illustrated on scatter diagram in which axes represent the 2 variables and each data item is plotted using coordinates
What can you infer once you have plotted bivariate data on a scatter diagram?
If bivariate data plotted on scatter diagram fall close to straight line = linear correlation (the closer the data lie to the line, the stronger the correlation)
If line has positive gradient = positive correlation
If line has negative gradient = negative correlation
If all data lies on line = perfect linear correlation
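One common way to put a number on how closely points follow a line is Pearson's correlation coefficient r. These cards do not define it, so treat this as an added illustration: r near +1 or -1 means strong correlation, and exactly +1 or -1 means perfect linear correlation. The height and weight figures are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [150, 160, 170, 180, 190]     # hypothetical heights
weights_on_line = [50, 55, 60, 65, 70]  # lie exactly on a line with positive gradient
weights_near_line = [52, 54, 63, 62, 71]  # close to a line, but not on it
weights_falling = [70, 65, 60, 55, 50]  # exactly on a line with negative gradient

r_perfect = pearson_r(heights, weights_on_line)    # perfect positive correlation
r_strong = pearson_r(heights, weights_near_line)   # strong positive, less than 1
r_negative = pearson_r(heights, weights_falling)   # perfect negative correlation
```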
What is important to remember about causation and correlation?
Correlation does not imply causation (cause and effect cannot be established just because 2 co-variables show a correlation)
What is association?
Relationship between variables which is not necessarily linear
What is an outlier?
Unusually high or low value in set of data
How many definitions of outliers are there?
2
What is categorical data?
Categorical data- not numerical in value (e.g. colours of cars)
What is discrete data?
Discrete data- numerical data- can take only specific values e.g. shoe 👞 sizes or number of pets 🐕
What is continuous data?
Continuous data- numerical data- can take any real values in a range e.g. weights or times ⏰
How can you define outliers in a set of data?
1) Any data value which is more than 2 standard deviations away from mean
2) Any data value which is more than 1.5 times the IQR above the upper quartile or below the lower quartile
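Both outlier rules applied to one made-up data set. Two assumptions are baked in and may not match your course: the standard deviation uses divisor n (statistics.pstdev), and the quartiles use the "inclusive" convention.

```python
import statistics

data = [12, 14, 15, 15, 16, 17, 18, 19, 40]  # hypothetical data, 40 looks unusual

# Rule 1: more than 2 standard deviations away from the mean.
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # divisor n (assumption)
sd_outliers = [x for x in data if abs(x - mean) > 2 * sd]

# Rule 2: more than 1.5 x IQR above the upper quartile or below the lower quartile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")  # 15.0 and 18.0
iqr = q3 - q1                                                    # 3.0
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
```

Here both rules flag only the value 40, but the two definitions do not always agree.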
What is cleaning data?
Cleaning data involves dealing with missing data, errors and outliers
What is a sample space?
Set of all possible outcomes of a trial or experiment
What does P(A ∪ B) mean?
‘probability that either event A occurs, or event B occurs (or both)’
What does P(A ∩ B) mean?
‘probability that both event A and event B occur’
What are mutually exclusive events?
2 events are mutually exclusive if it is impossible for them to occur together
How can you confirm if 2 events are mutually exclusive?
If two events are mutually exclusive, then
P(A ∪ B) = P(A) + P(B)
What are independent events?
2 events A and B are independent if whether or not A occurs has no effect on whether or not B occurs
How can you confirm if 2 events are independent?
If events A and B are independent, then
P(A ∩ B) = P(A) × P(B)
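The rules for mutually exclusive and independent events can be checked on a small sample space. This sketch assumes one roll of a fair die; the events and the helper function are made up for illustration. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (equally likely outcomes)

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

# Mutually exclusive: even and odd cannot occur together.
mutually_exclusive = prob(even & odd) == 0
addition_rule_holds = prob(even | odd) == prob(even) + prob(odd)

# Independent: P(A and B) equals P(A) x P(B).
# P(even and at most 2) = 1/6, and 1/2 x 1/3 = 1/6.
independent = prob(even & at_most_two) == prob(even) * prob(at_most_two)

# Even and odd are NOT independent: P(both) = 0 but 1/2 x 1/2 = 1/4.
even_odd_independent = prob(even & odd) == prob(even) * prob(odd)
```

In Python, `&` on sets is intersection (∩) and `|` is union (∪).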
What are the 4 main steps of hypothesis testing?
1) State the distribution X ~ B(n, p)
2) State the null and alternative hypotheses, then decide whether test is 1-tail or 2-tail
3) State your test statistic (e.g. number of sixes obtained in the 50 spins)
4) Note the significance level
What is the critical region?
Set of values for test statistic X for which you would reject null hypothesis
What is the critical value?
Value of X for which you change from accepting null hypothesis to rejecting it
- critical region includes critical value
What is the acceptance region?
Range of values of X for which you would accept the null hypothesis
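A sketch of finding the critical value and the two regions for a 1-tail (upper tail) binomial test. Everything specific here is an assumption for illustration: 50 trials, H0: p = 1/6, H1: p > 1/6, and a 5% significance level.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Assumed set-up: X ~ B(50, 1/6) under H0, upper-tail test at the 5% level.
n, p0, alpha = 50, 1 / 6, 0.05

# Critical value: smallest k for which P(X >= k) is no more than the significance level.
critical_value = next(k for k in range(n + 1) if upper_tail(k, n, p0) <= alpha)

critical_region = range(critical_value, n + 1)  # reject H0 for these values of X
acceptance_region = range(0, critical_value)    # accept H0 for these values of X
```

For a lower-tail test the same idea applies with P(X <= k), and a 2-tail test splits the significance level between the two tails.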
What must you remember about a hypothesis test?
Remember that result of hypothesis test does not ‘prove’ anything!
You can always have unusual results from a sample, that are not representative of the population
How does the p-value or the critical region/value determine if you accept or reject the null hypothesis?
If your p-value is less than the significance level (alternatively, your test statistic lies in the critical region), you reject H0 (null hypothesis) and accept the alternative hypothesis
If your p-value is greater than the significance level (alternatively, your test statistic lies outside the critical region), you accept H0 (null hypothesis): there is not enough evidence to support the alternative hypothesis
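The decision rule above, sketched for a made-up example: a coin is tossed 20 times and lands heads 15 times, with H0: p = 0.5, H1: p > 0.5 (1-tail) at the 5% significance level. The numbers and names are hypothetical.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumed set-up: X ~ B(20, 0.5) under H0, observed value X = 15.
n, p0, alpha, observed = 20, 0.5, 0.05, 15

# p-value: probability of a result at least as extreme as the one observed, if H0 is true.
p_value = upper_tail(observed, n, p0)  # 21700 / 2**20, about 0.0207

reject_h0 = p_value < alpha  # True here, since 0.0207 < 0.05

# Conclusion stated in context, without claiming proof.
if reject_h0:
    conclusion = "There is evidence at the 5% level that the coin is biased towards heads."
else:
    conclusion = "There is not enough evidence at the 5% level that the coin is biased towards heads."
```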
What MUST you remember about the conclusion of a hypothesis test?
Always give your conclusion in terms of the original problem- CONTEXT
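How do you calculate the median?
Place data in numerical order and take the middle value- with n data items this is the (n + 1)/2 th item, so for an even number of items take the mean of the 2 middle values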