Statistics Year 1 Flashcards

1
Q

Define population

A

Population- complete collection of people or items

2
Q

Define sample

A

Sample- part of population- often ✖️ possible to gather data about every individual in population, so sample used to gather info from which conclusions about population are drawn

3
Q

How many types of sampling are there?

A

7

5
Q

What is simple random sampling?

A

Simple random sampling- sampling method in which items in sample chosen by random process e.g. drawing names from 🎩- every possible sample of the chosen size is equally likely, so every member of population has equal chance of being selected
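A minimal sketch of the computer equivalent of drawing names from a hat, assuming the population is held as a Python list. The function name and the list of names are hypothetical; `random.Random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n members without replacement; every possible sample is equally likely."""
    rng = random.Random(seed)  # optional seed so a draw can be repeated
    return rng.sample(population, n)


# hypothetical population list (the names in the 🎩)
names = ["Ava", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal"]
print(simple_random_sample(names, 3, seed=1))
```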

6
Q

What is opportunity sampling?

A

Opportunity sampling- choosing individuals for sample as opportunity arises e.g. interviewing passers-by

7
Q

What is systematic sampling?

A

Systematic sampling- select individuals from population using systematic method e.g. selecting every 10th person on list of population
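A sketch of the "every 10th person" idea, assuming the population is a numbered list. Starting from a randomly chosen position within the first interval is a common practice but is an assumption here; the function name is hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Take every k-th item (k = len(population) // n) from a random start in the first k."""
    k = len(population) // n                    # sampling interval
    start = random.Random(seed).randrange(k)    # random starting position 0..k-1
    return population[start::k][:n]


people = list(range(1, 101))  # e.g. a numbered list of 100 people
print(systematic_sample(people, 10, seed=2))  # every 10th person
```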

8
Q

What is stratified sampling?

A

Stratified sampling- used when population can be divided into subgroups (strata) using criteria e.g. age or gender 👨 👩 and ensures all strata represented in sample

  • Sometimes requirement that numbers sampled from each stratum are proportional to sizes of the strata (proportional stratified sampling)
  • Otherwise, weighting used
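A sketch of working out how many to sample from each stratum in proportional stratified sampling. The school, year groups and sizes are hypothetical; in practice the items within each stratum would then be chosen by simple random sampling, and rounding can leave the total one out, needing a manual adjustment.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, proportional to the stratum's size (rounded)."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


# hypothetical school of 600 pupils split by year group; sample of 50
years = {"Year 7": 180, "Year 8": 120, "Year 9": 300}
print(proportional_allocation(years, 50))  # {'Year 7': 15, 'Year 8': 10, 'Year 9': 25}
```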
9
Q

What is quota sampling?

A

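Quota sampling- can also be used when population can be divided into strata- a certain number of items (a quota) from each stratum is required, but items ✖️ chosen at random (e.g. interviewer picks people until each quota is filled)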

10
Q

What is cluster sampling?

A

Cluster sampling- used when population consists of subgroups which are each reasonably representative of population (e.g. year 6 classes in several schools)- sample taken from just a few of these subgroups

11
Q

What is self-selected/volunteer sampling?

A

Self-selected sampling- individuals choose to be part of sample e.g. survey posted on internet

12
Q

Which sampling techniques are prone to bias?

A

1) Opportunity sampling
2) Self-selected sampling
- sample unlikely to be representative of population

13
Q

What is a good thing about larger samples?

A

Larger samples usually ⬆️ representative of population than smaller samples

14
Q

What are the types of sampling?

A

1) Simple random
2) Opportunity
3) Systematic
4) Stratified
5) Quota
6) Cluster
7) Self-selected/volunteer

16
Q

How many types of data are there?

A

3

18
Q

What are statistical diagrams?

A

Statistical diagrams used to illustrate data

19
Q

What are types of data?

A

1) Categorical
2) Discrete
3) Continuous

20
Q

What are bar charts?

A

Bar charts 📊 show frequencies for each item of data

  • height of bar equal to frequency
  • unlike histograms, there are gaps between bars- shows data are not continuous (categorical or discrete)
  • 📊 often used for categorical data
21
Q

What is a dot plot?

A

Dot plot- similar to bar chart 📊 but uses stacks of dots to represent frequency

22
Q

What is a vertical line chart?

A

Vertical line chart- similar to bar chart 📊 BUT uses vertical lines instead of bars
- ⬆️ appropriate than 📊 to show numerical 1️⃣2️⃣3️⃣ data

23
Q

What is a histogram?

A

Histogram- used to illustrate grouped data

  • vertical axis gives frequency density (frequency ➗ class width)
  • frequency for each group proportional to area of bars
  • no gaps between the bars
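A sketch of the frequency density calculation for unequal class widths; the grouped times are hypothetical. Multiplying each density back by its class width recovers the frequency, which is why bar area (not height) represents frequency.

```python
def frequency_densities(classes):
    """classes: list of (lower boundary, upper boundary, frequency) tuples."""
    return [(lower, upper, f / (upper - lower)) for lower, upper, f in classes]


# hypothetical grouped data: times in minutes, unequal class widths
times = [(0, 10, 5), (10, 15, 12), (15, 30, 18)]
for lower, upper, fd in frequency_densities(times):
    # bar area = frequency density x class width = frequency
    print(f"{lower}-{upper}: frequency density {fd}, area {fd * (upper - lower)}")
```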
24
Q

What is a frequency chart?

A

Frequency chart- similar to histogram BUT has equal width bars and its vertical axis gives frequency

25
Q

What is a stem and leaf diagram?

A

Stem-and-leaf diagram- used for numerical data

  • stem indicates groups of data
  • leaves give actual data
  • shows shape of distribution in same way as bar chart, dot plot or vertical line chart does BUT includes actual raw data
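A sketch of building a stem-and-leaf diagram, assuming non-negative whole numbers with tens as stems and units as leaves (other stem choices are possible). The scores are hypothetical; key: 2 | 3 means 23.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return 'stem | leaves' lines, stems = tens digit(s), leaves = units digit."""
    groups = defaultdict(list)
    for x in sorted(data):              # sorting puts leaves in order
        groups[x // 10].append(x % 10)
    return [f"{stem} | {' '.join(str(leaf) for leaf in groups[stem])}"
            for stem in range(min(groups), max(groups) + 1)]  # include empty stems


scores = [23, 31, 27, 45, 38, 33, 41, 29, 36]
print("\n".join(stem_and_leaf(scores)))
```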
26
Q

What is a pie chart?

A

Pie 🥧 chart- used for categorical data.

- frequencies of data items displayed as sectors of a circle ⭕️ with angle in each sector proportional to frequency
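A sketch of the sector-angle calculation: each angle is frequency ÷ total frequency × 360°. The car-colour counts are hypothetical.

```python
def pie_angles(frequencies):
    """Sector angle for each category: frequency / total * 360 degrees."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


# hypothetical car colours in a car park
cars = {"red": 12, "blue": 18, "silver": 30}
print(pie_angles(cars))  # {'red': 72.0, 'blue': 108.0, 'silver': 180.0}
```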

27
Q

How many ways can data be distributed?

A

5

28
What is a box and whisker diagram?
Box-and-whisker diagram (boxplot)- summarises numerical data by showing lowest value, lower quartile, median, upper quartile and highest value
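A sketch of the 5 values a boxplot shows. Quartile conventions differ between textbooks and calculators; this assumes Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd, and needs at least 2 data items. The data are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # values below the median
    upper = xs[half + n % 2:]     # values above the median (skip it when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (3, 6.0, 12, 16.0, 21)
```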
29
What is a cumulative frequency curve?
Cumulative frequency curve- graph of cumulative frequency plotted against upper class boundary for numerical data - useful for estimating values of median, quartiles or other percentiles
30
How many statistical diagrams are there?
9
31
What are the types of distribution of data?
1) Positively skewed
2) Negatively skewed
3) Symmetrical
4) Unimodal
5) Bimodal
32
What is positively skewed data?
Right-hand tail to distribution
- median closer to lower quartile than to upper quartile
33
What is negatively skewed data?
Left-hand tail to distribution
- median closer to upper quartile than to lower quartile
34
What is symmetrical data?
Peak of data approximately in centre and distribution looks reasonably symmetrical - mean and the median will be close together
35
What is unimodal data?
A unimodal distribution has 1 peak
36
What is bimodal data?
A bimodal distribution has 2 distinct peaks
37
How many measures of central tendency are there?
3
39
What are the measures of central tendency?
1) Mean
2) Median
3) Mode
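A sketch computing all 3 measures with Python's standard `statistics` module; the shoe sizes are hypothetical.

```python
from statistics import mean, median, mode

shoe_sizes = [6, 7, 7, 8, 9, 7, 10, 6]  # hypothetical data

print(mean(shoe_sizes))    # sum of items / number of items -> 7.5
print(median(shoe_sizes))  # middle of sorted data; mean of 2 middle values here -> 7.0
print(mode(shoe_sizes))    # most frequently occurring value -> 7
```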
40
What is the mean?
Found by ➕ up data items and ➗ by number of data items
41
What is the mode?
Most frequently occurring data value
42
How many measures of variation are there?
4
43
What is the median?
Middle value of data when placed in numerical order (mean of the 2 middle values if there is an even number of data items)
44
How do you calculate the variance?
SEE MATHS WORD DOC CALLED GRAPHS TO KNOW
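For quick reference, the standard formulas in one common notation. Whether a course uses divisor n or n - 1 (and the symbols chosen) depends on the specification, so treat this as a reminder rather than the course's definitive version.

```latex
\bar{x} = \frac{\sum x}{n}, \qquad
S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2

\text{variance (divisor } n\text{)} = \frac{S_{xx}}{n}, \qquad
s^2 = \frac{S_{xx}}{n - 1}
```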
45
What is the range?
Difference between highest and lowest values from data
46
How do you calculate the standard deviation?
Square root of variance
SEE MATHS WORD DOC CALLED GRAPHS TO KNOW
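A sketch of standard deviation as the square root of variance, computed from S_xx with both divisors (which one a course expects is an assumption to check) and cross-checked against Python's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1). The data are hypothetical.

```python
from math import sqrt
from statistics import pvariance, variance


def sxx(data):
    """Sum of squared deviations from the mean."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
n = len(data)
sd_n = sqrt(sxx(data) / n)          # divisor n
sd_n1 = sqrt(sxx(data) / (n - 1))   # divisor n - 1 (sample standard deviation)

print(sd_n, sqrt(pvariance(data)))  # same value via the standard library
print(sd_n1, sqrt(variance(data)))
```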
47
What is the interquartile range?
Difference between upper quartile (value 3/4 of the way through data when ranked numerically) and lower quartile (value 1/4 of the way through data when ranked numerically)
48
What is the variance?
Variance- measure of spread of data, based on the squared deviations of data items from the mean
50
What is standard deviation?
Square root of variance- measure of spread in same units as data, roughly the typical distance of data items from mean
51
What must you remember with scatter diagrams and the type of population they represent?
Sometimes scatter diagrams show data falling into 2 or ⬆️ groups. These may represent different sections of population (e.g. adults 🧑 and children 👶), so it may ✖️ be appropriate to treat data as single set
52
What are the measures of variation?
1) Range
2) IQR
3) Variance
4) Standard deviation
53
What is bivariate data?
Data which involves 2 variables, e.g. height and weight
54
How can you illustrate bivariate data?
Illustrated on scatter diagram in which axes represent the 2 variables and each data item is plotted using coordinates
55
What can you infer once you have plotted bivariate data on a scatter diagram?
If bivariate data plotted on scatter diagram fall close to straight line = linear correlation (the closer the data lie to the line, the stronger 💪 the correlation)
If line has ➕ gradient = ➕ correlation
If line has ➖ gradient = ➖ correlation
If all data lie on line = perfect linear correlation
56
What is important to remember about causation and correlation?
Correlation does ✖️ imply causation (showing a correlation between 2 co-variables does ✖️ by itself establish cause and effect)
58
What is association?
Relationship between variables which is not linear
59
What is an outlier?
Unusually ⬆️ or ⬇️ value in set of data
60
How many definitions of outliers are there?
2
61
What is categorical data?
Categorical data- ✖️ numerical in value (e.g. colours of cars)
62
What is discrete data?
Discrete data- numerical data- can take only specific values e.g. shoe 👞 sizes or number of pets 🐕
63
What is continuous data?
Continuous data- numerical data- can take any real values in a range e.g. weights or times ⏰
65
How can you define outliers in a set of data?
1) Any data value which is ⬆️ than 2 standard deviations away from mean
2) Any data value which is ⬆️ than 1.5 times the IQR above the upper quartile or below the lower quartile
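A sketch applying both outlier definitions to the same hypothetical data. Two assumptions to note: the standard deviation uses divisor n (`statistics.pstdev`), and the quartiles are the medians of the lower and upper halves; other conventions can shift the boundaries slightly.

```python
from statistics import mean, median, pstdev


def quartiles(data):
    """(Q1, Q3) as medians of the lower and upper halves (one common convention)."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[half + len(xs) % 2:])


def outliers_sd(data, k=2):
    """Values more than k standard deviations from the mean."""
    m, s = mean(data), pstdev(data)
    return [x for x in data if abs(x - m) > k * s]


def outliers_iqr(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


data = [12, 14, 15, 15, 16, 17, 18, 19, 45]  # hypothetical data
print(outliers_sd(data), outliers_iqr(data))  # [45] [45]
```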
66
What is cleaning data?
Cleaning data involves dealing with missing data, errors and outliers
67
What is a sample space?
Set of all possible outcomes of a trial or experiment
68
What does P(A ∪ B) mean?
‘probability that either event A occurs, or event B occurs (or both)’
69
What does P(A ∩ B) mean?
‘probability that both event A and event B occur’
70
What are mutually exclusive events?
2 events are mutually exclusive if it is impossible for them to occur together
71
How can you confirm if 2 events are mutually exclusive?
If 2 events are mutually exclusive, then P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
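A sketch checking mutual exclusivity on the sample space of one fair die, treating events as sets of outcomes (Python's `&` is intersection, `|` is union). The events A and B are hypothetical; exact fractions avoid rounding.

```python
from fractions import Fraction

die = range(1, 7)  # sample space for one fair die


def p(event):
    """Probability of a set of outcomes, all outcomes equally likely."""
    return Fraction(len(event), len(die))


A = {1, 2}  # event A: score 1 or 2
B = {5, 6}  # event B: score 5 or 6

print(p(A & B) == 0)            # True: A and B cannot occur together
print(p(A | B) == p(A) + p(B))  # True: so the addition rule holds
```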
72
What are independent events?
2 events A and B are independent if whether or not A occurs has ✖️ effect on whether or not B occurs
73
How can you confirm if 2 events are independent?
If events A and B are independent, then P(A ∩ B) = P(A) × P(B)
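A sketch checking the product rule on the 36 equally likely outcomes of rolling 2 fair dice. The events (first die even, second die a six) are hypothetical examples chosen so that one has no effect on the other.

```python
from fractions import Fraction
from itertools import product

# sample space: the 36 equally likely outcomes (first die, second die)
space = list(product(range(1, 7), repeat=2))


def p(event):
    """Probability of an event given as a True/False test on an outcome."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def first_even(o):   # event A: first die shows an even number
    return o[0] % 2 == 0


def second_six(o):   # event B: second die shows a six
    return o[1] == 6


def both(o):         # event A ∩ B
    return first_even(o) and second_six(o)


print(p(both), p(first_even) * p(second_six))  # 1/12 1/12, so A and B are independent
```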
74
What are the 4 main steps of hypothesis testing?
1) State the distribution X ~ B(n, p)
2) State the null and alternative hypotheses ... decide whether 1- or 2-tailed
3) State your test statistic (e.g. number of sixes obtained in the 50 spins)
4) Note the significance level
75
What is the critical region?
Set of values for test statistic X for which you would reject null hypothesis
76
What is the critical value?
Value of X for which you change from accepting null hypothesis to rejecting it - critical region includes critical value
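A sketch of finding the critical value for an upper-tailed binomial test, using exact tail probabilities from `math.comb` (Python 3.8+). The setup (X = number of sixes in 50 spins, H0: p = 1/6, H1: p > 1/6, 5% level) is an assumed illustration; a lower-tailed test works the same way with P(X ≤ c).

```python
from math import comb


def upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def upper_critical_value(n, p, alpha):
    """Smallest c with P(X >= c) <= alpha, so the critical region is X >= c."""
    for c in range(n + 1):
        if upper_tail(c, n, p) <= alpha:
            return c
    return None  # no value is extreme enough at this significance level


# hypothetical 1-tailed test: H0: p = 1/6, H1: p > 1/6, 5% significance level
c = upper_critical_value(50, 1 / 6, 0.05)
print(f"Critical value {c}; critical region X >= {c}")
print(f"Actual significance level P(X >= {c}) = {upper_tail(c, 50, 1 / 6):.4f}")
```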
77
What is the acceptance region?
Range of values of X for which you would accept the null hypothesis
78
What must you remember about a hypothesis test?
Remember that result of hypothesis test ✖️ ‘prove’ anything! You can always have unusual results from a sample, that are not representative of the population
79
How does the p-value or the critical region/value determine if you accept or reject the null hypothesis?
If your p-value is ⬇️ than the significance level (alternatively, your test statistic lies in the critical region), you reject H0 (null hypothesis) and accept the alternative hypothesis
If your p-value is ⬆️ than the significance level (alternatively, your test statistic lies outside critical region), you accept H0 (null hypothesis)- strictly, there is insufficient evidence to reject it in favour of the alternative hypothesis
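A sketch of the p-value route for an upper-tailed binomial test: the p-value is the probability, assuming H0, of a result at least as extreme as the one observed. The observed counts, hypotheses and 5% level are hypothetical; the final conclusion should still be written in the context of the original problem.

```python
from math import comb


def p_value_upper(x, n, p):
    """P(X >= x) for X ~ B(n, p): chance of a result at least this extreme under H0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def decide(x, n, p0, alpha):
    """Compare the p-value with the significance level."""
    pv = p_value_upper(x, n, p0)
    if pv < alpha:
        return pv, "reject H0: evidence to support H1"
    return pv, "do not reject H0: insufficient evidence to support H1"


# hypothetical: 15 sixes in 50 spins, H0: p = 1/6, H1: p > 1/6, 5% level
pv, verdict = decide(15, 50, 1 / 6, 0.05)
print(f"p-value = {pv:.4f}; {verdict}")
```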
80
What MUST you remember about the conclusion of an hypothesis test?
Always give your conclusion in terms of the original problem- CONTEXT
81
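How do you calculate the median?
Place data in numerical order; for n data items the median is the ((n + 1)/2)th value, i.e. the middle value if n is odd, or the mean of the 2 middle values if n is even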