Statistics Year 1 Flashcards

1
Q

Define population

A

Population- complete collection of people or items

2
Q

Define sample

A

Sample- part of a population. It is usually not possible to gather data about every individual in the population, so a sample is used to gather information, which is then used to draw conclusions about the population

3
Q

How many types of sampling are there?

A

7

5
Q

What is simple random sampling?

A

Simple random sampling- sampling method in which items in sample are chosen by a random process, e.g. drawing names from a hat- every member of population has equal chance of being selected
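A minimal Python sketch of drawing a simple random sample. The population of 30 student names, the function name and the `seed` parameter are all made up for illustration; the seed is only there so a run can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)  # seed only makes the run repeatable
    return rng.sample(population, n)

# Hypothetical population: 30 student names
population = [f"student_{i}" for i in range(1, 31)]
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```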

6
Q

What is opportunity sampling?

A

Opportunity sampling- choosing individuals for sample as opportunity arises e.g. interviewing passers-by

7
Q

What is systematic sampling?

A

Systematic sampling- select individuals from population using systematic method e.g. selecting every 10th person on list of population
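A short Python sketch of the "every 10th person" idea. Choosing the starting position at random within the first 10 is an assumption added here (a common way to do it), and the numbered list of 100 people is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every k-th member of an ordered list, from a random start in the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # assumed: random starting index 0..k-1
    return population[start::k]

population = list(range(1, 101))  # hypothetical list of 100 people, numbered 1..100
sample = systematic_sample(population, 10, seed=0)
print(sample)
```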

8
Q

What is stratified sampling?

A

Stratified sampling- used when population can be divided into subgroups (strata) using criteria such as age or gender; ensures all strata are represented in sample

  • Sometimes there is a requirement that the numbers sampled from each stratum are proportional to the sizes of the strata (proportional stratified sampling)
  • Otherwise, weighting is used
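Proportional stratified sampling can be sketched in Python as below. The three age strata and their sizes are invented for illustration, and rounding each stratum's share to the nearest whole number is an assumption (with awkward sizes the total can come out one above or below the target).

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """strata: dict mapping stratum name -> list of members.
    Returns dict name -> simple random sample sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)  # assumed rounding rule
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical strata: 60, 120 and 20 people (200 in total)
strata = {
    "under_18": [f"u{i}" for i in range(60)],
    "18_to_64": [f"a{i}" for i in range(120)],
    "65_plus": [f"s{i}" for i in range(20)],
}
sample = proportional_stratified_sample(strata, 20, seed=2)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)  # {'under_18': 6, '18_to_64': 12, '65_plus': 2}
```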
9
Q

What is quota sampling?

A

Quota sampling- can also be used when population can be divided into strata- certain number of items from each stratum are required

10
Q

What is cluster sampling?

A

Cluster sampling- used when population consists of subgroups which are each reasonably representative of population (e.g. year 6 classes in several schools)- sample taken from just a few of these subgroups

11
Q

What is self selected/volunteer sampling?

A

Self-selected sampling- individuals choose to be part of sample e.g. survey posted on internet

12
Q

Which sampling techniques are prone to bias?

A

1) Opportunity sampling
2) Self-selected sampling
- sample unlikely to be representative of population

13
Q

What is a good thing about larger samples?

A

Larger samples are usually more representative of the population than smaller samples

14
Q

What are the types of sampling?

A

1) Simple random
2) Opportunity
3) Systematic
4) Stratified
5) Quota
6) Cluster
7) Self selected/volunteer

16
Q

How many types of data are there?

A

3

18
Q

What are statistical diagrams?

A

Statistical diagrams used to illustrate data

19
Q

What are types of data?

A

1) Categorical
2) Discrete
3) Continuous

20
Q

What are bar charts?

A

Bar charts show frequencies for each item of data

  • height of bar equal to frequency
  • unlike histograms, there are gaps between the bars- this indicates discrete data
  • often used for categorical data
21
Q

What is a dot plot?

A

Dot plot- similar to bar chart but uses stacks of dots to represent frequency

22
Q

What is a vertical line chart?

A

Vertical line chart- similar to bar chart BUT uses vertical lines instead of bars
- more appropriate than a bar chart for showing numerical data

23
Q

What is a histogram?

A

Histogram- used to illustrate grouped data

  • vertical axis gives frequency density (frequency ÷ class width)
  • frequency for each group proportional to area of bars
  • no gaps between the bars
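The frequency density rule (frequency ÷ class width) can be checked with a few lines of Python. The grouped data below are hypothetical; each class is written as (lower bound, upper bound, frequency).

```python
def frequency_densities(classes):
    """classes: list of (lower_bound, upper_bound, frequency) tuples."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped data with unequal class widths
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]
densities = frequency_densities(classes)
print(densities)  # [0.5, 1.2, 0.8, 0.2]
```

Multiplying each density by its class width gives back the frequency, which is why the area of each bar is proportional to the frequency of its group.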
24
Q

What is a frequency chart?

A

Frequency chart- similar to histogram BUT has equal width bars and its vertical axis gives frequency

25
Q

What is a stem and leaf diagram?

A

Stem-and-leaf diagram- used for numerical data

  • stem indicates groups of data
  • leaves give actual data
  • shows shape of distribution in same way as a bar chart, dot plot or vertical line chart does BUT includes actual raw data
26
Q

What is a pie chart?

A

Pie chart- used for categorical data

- frequencies of data items displayed as sectors of a circle, with angle of each sector proportional to frequency

27
Q

How many ways can data be distributed?

A

5

28
Q

What is a box and whisker diagram?

A

Box-and-whisker diagram (boxplot)- summarises numerical data by showing lowest value, lower quartile, median, upper quartile and highest value

29
Q

What is a cumulative frequency curve?

A

Cumulative frequency curve- graph illustrating numerical data
- cumulative frequency curve useful for estimating values of median, quartiles or other percentiles

30
Q

How many statistical diagrams are there?

A

9

31
Q

What are the types of distribution of data?

A

1) Positively skewed
2) Negatively skewed
3) Symmetrical
4) Unimodal
5) Bimodal

32
Q

What is positively skewed data?

A

Right-hand tail to distribution

- median closer to lower quartile than to upper quartile

33
Q

What is negatively skewed data?

A

left-hand tail to distribution

- median closer to upper quartile than lower quartile

34
Q

What is symmetrical data?

A

Peak of data approximately in centre and distribution looks reasonably symmetrical
- mean and the median will be close together

35
Q

What is unimodal data?

A

A unimodal distribution has 1 peak

36
Q

What is bimodal data?

A

A bimodal distribution has 2 distinct peaks

37
Q

How many measures of central tendency are there?

A

3

39
Q

What are the measures of central tendency?

A

1) Mean
2) Median
3) Mode

40
Q

What is the mean?

A

Found by adding up data items and dividing by number of data items

41
Q

What is the mode?

A

Most frequently occurring data value

42
Q

How many measures of variation are there?

A

4

43
Q

What is the median?

A

Midpoint of data when placed in numerical order
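A small Python sketch of finding the midpoint. It assumes the usual convention that, with an even number of items, the median is the mean of the two middle values; the data are made up.

```python
def median(data):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1, 5, 11]))  # 6.0
```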

44
Q

How do you calculate the variance?

A

SEE MATHS WORD DOC CALLED GRAPHS TO KNOW

45
Q

What is the range?

A

Difference between highest and lowest values from data

46
Q

How do you calculate the standard deviation?

A

Square root of the variance

SEE MATHS WORD DOC CALLED GRAPHS TO KNOW
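The standard deviation is the square root of the variance. The Python sketch below is one common way to compute both and is not taken from the Word doc mentioned above: it assumes the variance is the sum of squared deviations from the mean divided by n - 1 (for a sample) or by n, since courses differ on which divisor they use. The data set is hypothetical.

```python
import math

def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample=True) or n."""
    n = len(data)
    mean = sum(data) / n
    s_xx = sum((x - mean) ** 2 for x in data)
    return s_xx / (n - 1 if sample else n)  # assumed: check which divisor your course uses

def standard_deviation(data, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(data, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, mean 5
print(variance(data, sample=False))            # 4.0
print(standard_deviation(data, sample=False))  # 2.0
print(variance(data))                          # same sum of squares, divisor n - 1
```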

47
Q

What is the interquartile range?

A

Difference between upper quartile (value 3/4 of the way through data when ranked numerically) and lower quartile (value 1/4 of the way through data when ranked numerically)

48
Q

What is the variance?

A

Variance- measure of spread of sample of data

50
Q

What is standard deviation?

A

Measure of spread- square root of the variance; roughly the typical distance of data items from the mean

51
Q

What must you remember with scatter diagrams and the type of population they represent?

A

Sometimes a scatter diagram shows data falling into 2 or more groups
These may represent different sections of population (e.g. adults and children), so it may not be appropriate to treat data as a single set

52
Q

What are the measures of variation?

A

1) Range
2) IQR
3) Variance
4) Standard deviation

53
Q

What is bivariate data?

A

Data which involves 2 variables, e.g. height and weight

54
Q

How can you illustrate bivariate data?

A

Illustrated on scatter diagram in which axes represent the 2 variables and each data item is plotted using coordinates

55
Q

What can you infer once you have plotted bivariate data on a scatter diagram?

A

If bivariate data plotted on scatter diagram fall close to a straight line = linear correlation (the closer the data lie to the line, the stronger the correlation)
If line has positive gradient = positive correlation
If line has negative gradient = negative correlation
If all data lie on the line = perfect linear correlation

56
Q

What is important to remember about causation and correlation?

A

Correlation does not imply causation (cause and effect cannot be established just because 2 variables show a correlation)

58
Q

What is association?

A

Relationship between variables which is not necessarily linear

59
Q

What is an outlier?

A

Unusually high or low value in set of data

60
Q

How many definitions of outliers are there?

A

2

61
Q

What is categorical data?

A

Categorical data- not numerical in value (e.g. colours of cars)

62
Q

What is discrete data?

A

Discrete data- numerical data- can take only specific values e.g. shoe sizes or number of pets

63
Q

What is continuous data?

A

Continuous data- numerical data- can take any real value in a range e.g. weights or times

65
Q

How can you define outliers in a set of data?

A

1) Any data value which is more than 2 standard deviations away from mean
2) Any data value which is more than 1.5 times the IQR above the upper quartile or below the lower quartile
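Both outlier rules can be sketched in Python. The data set is hypothetical. Two assumptions are made: the standard deviation uses divisor n - 1, and the quartiles are passed in by hand (here 12.5 and 16.5, the medians of the lower and upper halves) because different courses and calculators find quartiles in slightly different ways.

```python
import math

def outliers_by_sd(data):
    """Values more than 2 standard deviations away from the mean."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # assumed divisor n - 1
    return [x for x in data if abs(x - mean) > 2 * sd]

def outliers_by_iqr(data, lower_quartile, upper_quartile):
    """Values more than 1.5 x IQR below the lower quartile or above the upper quartile."""
    iqr = upper_quartile - lower_quartile
    low_fence = lower_quartile - 1.5 * iqr
    high_fence = upper_quartile + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [10, 12, 13, 14, 15, 16, 17, 40]  # hypothetical data set
print(outliers_by_sd(data))                                             # [40]
print(outliers_by_iqr(data, lower_quartile=12.5, upper_quartile=16.5))  # [40]
```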

66
Q

What is cleaning data?

A

Cleaning data involves dealing with missing data, errors and outliers

67
Q

What is a sample space?

A

Set of all possible outcomes of a trial or experiment

68
Q

What does P(A ∪ B) mean?

A

‘probability that either event A occurs, or event B occurs (or both)’

69
Q

What does P(A ∩ B) mean?

A

‘probability that both event A and event B occur’

70
Q

What are mutually exclusive events?

A

2 events are mutually exclusive if it is impossible for them to occur together

71
Q

How can you confirm if 2 events are mutually exclusive?

A

If two events are mutually exclusive, then

P(A ∪ B) = P(A) + P(B)
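A quick Python check of the addition rule, using one roll of a fair die as a made-up sample space. The events A, B and C and the helper `prob` are illustrative names only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}      # roll is 1 or 2
B = {5, 6}      # roll is 5 or 6
C = {2, 4, 6}   # roll is even

print(prob(A | B) == prob(A) + prob(B))  # True: A and B cannot happen together
print(prob(A | C) == prob(A) + prob(C))  # False: the outcome 2 is in both A and C
```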

72
Q

What are independent events?

A

2 events A and B are independent if whether or not A occurs has no effect on whether or not B occurs

73
Q

How can you confirm if 2 events are independent?

A

If events A and B are independent, then

P(A ∩ B) = P(A) × P(B)
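The multiplication rule can be checked in Python with two rolls of a fair die as a hypothetical sample space. The events and the helper `prob` are invented for the example.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))  # two rolls of a fair die

def prob(event):
    """Probability of an event when all 36 outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == 6}      # first roll is a 6
B = {o for o in sample_space if o[1] % 2 == 0}  # second roll is even
C = {o for o in sample_space if sum(o) >= 11}   # total is at least 11

print(prob(A & B) == prob(A) * prob(B))  # True: the first roll has no effect on the second
print(prob(A & C) == prob(A) * prob(C))  # False: a 6 on the first roll makes a high total more likely
```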

74
Q

What are the 4 main steps of hypothesis testing?

A

1) State the distribution: X ~ B(n, p)
2) State the null and alternative hypotheses, and so decide whether the test is 1-tailed or 2-tailed
3) State your test statistic (e.g. number of sixes obtained in the 50 spins)
4) Note the significance level

75
Q

What is the critical region?

A

Set of values for test statistic X for which you would reject null hypothesis

76
Q

What is the critical value?

A

Value of X for which you change from accepting null hypothesis to rejecting it
- critical region includes critical value
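For a binomial test, the critical value can be found by working along the tail probabilities. The Python sketch below covers only a 1-tailed test in the upper tail, and the example X ~ B(10, 0.5) at the 5% level is hypothetical; the function names are made up.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def upper_critical_value(n, p, significance):
    """Smallest c with P(X >= c) <= significance; the critical region is X >= c."""
    for c in range(n + 2):  # c = n + 1 means an empty critical region
        if upper_tail(c, n, p) <= significance:
            return c

# Hypothetical test: X ~ B(10, 0.5), H1: p > 0.5, 5% significance level
c = upper_critical_value(10, 0.5, 0.05)
print(c)                       # 9
print(upper_tail(8, 10, 0.5))  # 0.0546875, above 5%, so 8 is in the acceptance region
print(upper_tail(9, 10, 0.5))  # 0.0107421875, below 5%, so 9 is in the critical region
```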

77
Q

What is the acceptance region?

A

Range of values of X for which you would accept the null hypothesis

78
Q

What must you remember about a hypothesis test?

A

Remember that the result of a hypothesis test does not ‘prove’ anything!
You can always have unusual results from a sample that are not representative of the population

79
Q

How does the p-value or the critical region/value determine if you accept or reject the null hypothesis?

A

If your p-value is less than the significance level (equivalently, your test statistic lies in the critical region), you reject H0 (null hypothesis) and accept the alternative hypothesis

If your p-value is greater than the significance level (equivalently, your test statistic lies outside the critical region), you accept H0 (null hypothesis) and reject the alternative hypothesis
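The p-value decision can be sketched in Python for a 1-tailed binomial test in the upper tail. The example (9 successes observed when X ~ B(10, 0.5), tested at the 5% level) is hypothetical, and the function names are made up.

```python
from math import comb

def upper_tail_p_value(observed, n, p):
    """P(X >= observed) for X ~ B(n, p): the p-value for an upper-tail test."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(observed, n + 1))

def decide(p_value, significance):
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value < significance else "accept H0"

p_value = upper_tail_p_value(9, 10, 0.5)  # 11/1024
print(round(p_value, 4))                  # 0.0107
print(decide(p_value, 0.05))              # reject H0
```

The conclusion would still need to be written in the context of the original problem.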

80
Q

What MUST you remember about the conclusion of a hypothesis test?

A

Always give your conclusion in terms of the original problem- CONTEXT

81
Q

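How do you calculate the median?

A

Put the data in numerical order; the median is the middle value, or the mean of the two middle values when there is an even number of data items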