💝✨Sampling✨💝 Flashcards

1
Q

What is statistics?

🍬Hint: ITSOHTCOAAID

A

✨It's the study of how to collect, organise, analyse and interpret data.✨

2
Q

Individuals = ?

A

✨People or objects included in a study.✨

3
Q

Variable = ?

A

✨Characteristics of the individual to be measured or observed.✨

4
Q

Population = ?

A

✨The entire group of individuals of interest, sharing a common theme.✨

5
Q

Sample = ?

🍬Hint: ASPOTPICBROB

A

✨A small portion of the population. It can be representative or biased✨

6
Q

Census

🍬Hint: ARACIAAWP

A

✨Acquiring, recording, and calculating information (ARC) about a whole population.✨

7
Q

N = ?

A

✨Population size (the total number of individuals in the population).✨

8
Q

n = ?

A

✨Sample size (the number of individuals in the sample).✨

9
Q

Parameters = ?

A

✨A measure that describes the entire population.✨

10
Q

Statistic = ?

🍬Hint: AMTDAS

A

✨A measure that describes a sample.✨

11
Q

🌸Descriptive statistics = ?🌸

🍬Hint: MOOPASIFSAP

A

✨Methods of organising, picturing, and summarising (OPS) information from samples and populations.✨

12
Q

🌸Inferential statistics = ?🌸

🍬Hint: MOUIFASTDCRAP

A

✨Methods of using information from a sample to draw conclusions regarding a population.✨

13
Q

⚠ It's important to properly identify measures as [ 👛 ] or [ 🐽 ] in statistics…..?

A

✨ 👛Population Parameters 👛 or 🐽Sample Statistics.🐽✨

14
Q

⚠️ Different types of data are……?

🍬Hint: UFPAS

A

✨Used for parameters and statistics.✨

15
Q

Quantitative = ?

🍬Hint: ANMOS

A

✨A numerical measurement of something.✨

16
Q

🌸Qualitative = ?🌸

🍬Hint: AQOCCOS

A

✨A quality or categorical characteristic of something.✨

17
Q

🌸Interval = ?🌸

A

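✨A quantitative scale where differences between values are meaningful but there is no true 0 (e.g., temperature in °C).✨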

18
Q

Ratio = ?

A

✨A quantitative scale with a true 0, so ratios between values are meaningful (e.g., height or weight).✨

19
Q

Nominal = ?

🍬Hint: CLONTDHAHO

A

✨Categories, labels or names, that don’t have a hierarchical order✨

20
Q

Ordinal = ?

🍬Hint: CLN

A

✨Categories, labels and names that do have a hierarchy✨

21
Q

QUANTITATIVE ——– QUALITATIVE
🎀👇🏻🎀 💅🏻👇🏻💅🏻

A

✨🎀Interval and ratio🎀 💅🏻Nominal and ordinal💅🏻✨

22
Q

🌸All data can be classified as……?🌸

🍬Hint: QOQATF

A

✨Quantitative or Qualitative and then further.✨

23
Q

🌸Sampling frame = ?🌸

🍬Hint: LOIFWASIAS

A

✨List of individuals from which a sample is actually selected.✨

24
Q

🌸Undercoverage = ?🌸

🍬Hint: OPMFTSF

A

✨Omitting population members from the sampling frame.✨

25
🌸Sampling error = ?🌸 | 🍬Hint: IETCWSBATEOTDWAIANWWTFP
✨Inevitable errors that come with sampling because at the end of the day we are inferring, and not working with the full population.✨
26
🌸Non-sampling error = ?🌸 | 🍬Hint: AVOAHE
✨Any variety of avoidable human errors.✨
27
Simulations = ?
✨Numerical facsimiles that mimic real-world processes using random sampling to estimate probabilities or outcomes.✨
28
⭐️A common rule of thumb is...?⭐️
✨The more complex the model and assumptions, the higher the likelihood of error. Generally, aim for at least a 95% confidence level in results, but always validate with real data when possible✨
29
🎀Simple random sampling (SRS)💅🏻 = ?
✨A method where each member of a population has an equal chance of being selected for the sample, and every possible sample of the same size is equally likely.✨
30
⚠️ A high quality sampling frame is crucial in....?
✨🎀SRS💅🏻✨
31
🎀💅🏻 = ?
🎀✨Known terms✨💅🏻
32
🎀Stratified sampling💅🏻
✨Dividing your list into subgroups (strata) based on specific variables.✨
33
⚠️ If you have a stratum that is somewhere in between specific variables, then you must....?
✨choose a variable in an objective manner to put the stratum into✨
34
The strata are based on.....?
✨a specific characteristic✨
35
⭐️Draw an SRS from each stratum because....?⭐️
✨To ensure each subgroup (stratum) is properly represented, reducing bias and increasing accuracy✨
36
⭐️ You can further [ ] your strata for better accuracy.⭐️
✨stratify✨
37
⚠️ Oversampling = ?
✨Intentionally collecting more data from an underrepresented group.✨ ⚠️ It distorts the natural proportions of the data leading to bias if not handled correctly
38
kᵗʰ = ?
✨Every kᵗʰ item in a sequence.✨ Example: If k = 5, you pick every 5th item (5th, 10th, 15th, etc.). 📊
39
🎀Systematic sampling💅🏻 = ?
✨Systematic sampling is when you pick every kᵗʰ person from a list after randomly starting somewhere ✨
40
Steps of 🎀Simple random sampling💅🏻 = ?
1. 💖Define the population: Identify everyone or everything in your group.💖 2. 💖List all individuals: Write down each item or person in the population (could be in a list or database). 💖 3. 💖Random selection: Use a random method (like drawing names from a hat or using a computer generator) to pick individuals.💖 4. 💖Select your sample: The number of individuals you pick depends on your desired sample size, but every person has an equal chance of being chosen.💖
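The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the population fits in a list; the names `simple_random_sample` and `person_1` ... `person_50` are made up for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n individuals without replacement; each has an equal chance."""
    rng = random.Random(seed)  # the seed is optional, only there for repeatability
    return rng.sample(list(population), n)

# Hypothetical population of 50 people (step 1 and 2: define and list)
population = [f"person_{i}" for i in range(1, 51)]

# Steps 3 and 4: random selection of the desired sample size
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

`random.sample` plays the role of drawing names from a hat: nobody can be drawn twice.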
41
Steps of 🎀Systematic sampling💅🏻 = ?
💖 Define the population: Identify the full group of items or individuals (like all your Barbie accessories). 💖 💖 Choose your sample size: Decide how many you want to pick (e.g., 10 Barbie accessories). 💖 💖 Calculate the interval (k): Divide the total number of items by the sample size. For example, if you have 100 accessories and want 10, your interval (k) is 10. 💖 💖 Pick a random starting point: Choose a random number between 1 and k. Let’s say you randomly choose 3. 💖 💖 Select every kᵗʰ item: Starting from your random point (e.g., 3), pick every 10th item (3, 13, 23, etc.). 💖
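Here is a small Python sketch of those steps, using the card's numbers (100 items, sample of 10, so k = 10). The function name `systematic_sample` and the numbered "accessories" are assumptions for illustration, and it assumes the population size divides evenly by the sample size.

```python
import random

def systematic_sample(items, sample_size, seed=None):
    """Pick every k-th item after a random start between 1 and k."""
    rng = random.Random(seed)
    k = len(items) // sample_size      # interval: population size / sample size
    start = rng.randint(1, k)          # random starting position, 1..k inclusive
    # positions are 1-based on the card, list indexes are 0-based in Python
    return items[start - 1::k][:sample_size]

accessories = list(range(1, 101))      # hypothetical: accessories numbered 1..100
picked = systematic_sample(accessories, 10, seed=3)
print(picked)
```

If the random start happened to be 3, the picks would be 3, 13, 23 and so on, exactly as in the card.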
42
🎀Stratified sampling💅🏻 steps = ?
1. 💖 Define the population: Think of this as the whole brain—every neuron and pathway you need to consider in your study. 💖 2. 💖 Divide into strata: Just like how a neurosurgeon divides the brain into different regions (frontal, temporal, occipital, etc.), you categorize your population into specific groups based on a characteristic—such as age or gender. 💖 3. 💖 Randomly sample within each stratum: After identifying the brain regions, you carefully extract samples from each region (just like taking samples from specific parts of the brain during surgery) to make sure each region is well-represented in your data. 💖 4. 💖 Combine the samples: Once you’ve sampled from each region, you reconnect the brain regions into one holistic view, just like stitching up the brain after surgery, combining all your samples into one complete dataset. 💖
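The same four steps can be sketched in Python. Treat this as a rough illustration only: the `stratified_sample` helper, the `age_group` field and the 30 made-up people are all assumptions, and it draws the same number from each stratum for simplicity.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Split individuals into strata, then draw an SRS from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:                 # step 2: divide into strata
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):                # step 3: SRS within each stratum
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample                              # step 4: combined sample

# Hypothetical population: 30 people tagged with an age group
people = [
    {"id": i, "age_group": "30 and over" if i % 3 == 0 else "under 30"}
    for i in range(1, 31)
]
sample = stratified_sample(people, lambda p: p["age_group"], per_stratum=4, seed=7)
print(sample)
```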
43
⚠️ 🎀Systematic sampling💅🏻 is unreliable when the list has [ ]
✨Patterns (a repeating order in the list that lines up with k will bias the sample)✨
44
🎀Cluster Sampling💅🏻 = ?
✨A method where the population is divided into groups (clusters), and entire clusters are randomly selected for study.✨
45
🎀Cluster Sampling💅🏻 steps = ?
✨💖 1. Define the population: Identify the entire group you're studying (e.g., all schools, hospitals, or neighborhoods). 💖 2. Divide the population into clusters: Break the population into natural groups or clusters (e.g., different cities, schools, or regions). 💖 3. Randomly select clusters: Choose entire clusters at random from the list of clusters you created. 💖 4. Study all individuals in selected clusters: Collect data from every individual within the randomly chosen clusters. 💖 5. Analyze the data: Combine the data from all the individuals within the selected clusters to draw conclusions about the population. 💖✨
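A quick Python sketch of steps 2 to 4, assuming the clusters are already known. The school names, pupils and the `cluster_sample` helper are invented for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # step 3: pick clusters
    return {name: clusters[name] for name in chosen}    # step 4: study everyone

# Hypothetical clusters: four schools and their pupils
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2", "d3"],
}
chosen = cluster_sample(schools, 2, seed=5)
print(chosen)
```

Notice the randomness is only at the cluster level: once a school is picked, all of its pupils are in the sample.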
46
⚠️ 🎀Cluster Sampling💅🏻 is better used when.....?
✨💖 When the population is large or geographically spread out: It’s more practical to divide the population into clusters (e.g., cities, schools, or neighborhoods) and randomly select entire clusters to make the sampling process more manageable. 💖 💖 When you can’t list every individual: If it’s hard or time-consuming to list every individual in the population, using clusters allows you to focus on groups, making it more feasible. 💖 💖 When cost and time are factors: Studying entire groups within clusters can save you time and money compared to sampling individuals from every possible group. 💖 💖 When the clusters are similar to each other: Cluster sampling works well when each cluster is a mixed, miniature version of the whole population, so studying a few clusters can give you a good understanding of the whole population. 💖 In short, use it when it’s more efficient and practical than other sampling methods, especially for large or hard-to-reach populations. If time and money aren't a factor, it's usually better to do more individualised studies.💖✨
47
🎀Convenience Sampling💅🏻 = ?
✨A non-probability sampling method where participants are selected based on their ease of access (convenience) rather than randomly.✨
48
Steps of 💅🏻Convenience Sampling🎀 = ?
✨💖 1. Define your population: Identify the group you want to study (e.g., customers, students, etc.). 💖 2. Select participants based on availability: Choose participants who are easiest to reach or access, such as people nearby, volunteers, or those readily available. 💖 3. Collect data: Gather information from the selected individuals. 💖 4. Analyze the data: Use the collected data to draw conclusions about the population (though remember, the results may not be fully representative due to potential bias). 💖 It's all about ease and accessibility, but it can introduce bias because it doesn’t provide a random selection. 💖✨
49
🎀Multistage Sampling💅🏻 = ?
✨A method where sampling happens in steps, selecting large groups first and then narrowing down using different sampling methods at each stage.✨
50
🎀Multistage Sampling💅🏻 steps = ?
💖 1. Define the population: Identify the entire group you want to study. 💖 2. Divide the population into large clusters: Break it into big, manageable groups (e.g., cities, schools, hospitals). 💖 3. Select clusters using a sampling method: Use random sampling (or another method) to pick some clusters. 💖 4. Further divide the selected clusters: Break them down into smaller subgroups (e.g., schools → classrooms → students). 💖 5. Use a different sampling method at each stage: For example, SRS for cities, cluster sampling for schools, and systematic sampling for students. 💖 6. Collect data from the final sample: Gather data from the individuals selected in the last stage.
51
⚠️Multistage sampling can be less accurate due to.....?
✨Multiple sampling steps. It is also more complex to design and execute, and has a higher risk of bias if the sampling methods at each stage aren’t properly chosen.✨
52
🌸Probability sampling = ?🌸
✨Every member of the population has a known and non-zero chance of being selected (e.g., SRS, stratified sampling).✨
53
🌸Eight rules of thumb for conducting a Statistical Study = ?🌸
✨1.💖 State a hypothesis.💖 2. 💖 Identify individuals of interest.💖 3. 💖 Specify the variables to measure.💖 4. 💖 Determine if you will use an entire population or a sample. (If you choose a sample, choose a sampling method)💖 5. 💖 Address ethical concerns before data collection💖 6. 💖Collect data💖 7. 💖 Use descriptive or inferential statistics to answer your hypothesis💖 8. 💖Note any concerns about your data collection or analyses and make recommendations for future studies💖✨
54
⚠️ 👛Which review board decides if your study is considered "Human Research"?👛 ⚠️ 🐽What are the implications of your study being considered "Human Research"?🐽
✨👛Institutional Review Board (IRB)👛✨ ✨🐽Consent: If the study is being conducted with children, you'll need it from BOTH their parents and the children.🐽✨
55
💫 What are two options for Data Collection?💫
✨1. 💖Collect from Existing Data sets💖 2. 💖Collect Manually💖✨
56
⭐️ If you need population measures you can collect from [ ] data sets.
✨Government✨
57
💫👛In a [ ] measurements or observations from the entire population are used.👛💫 💫🐽In a [ ] measurements or observations from part of the population are used.🐽💫
✨👛Census👛 ✨ ✨ 🐽Sample🐽✨
58
💫 What are the two main types of studies?💫
✨ Experimental and Observational✨
59
🌸Experimental study = ?🌸
✨A treatment or intervention is deliberately assigned to the individuals. i.e. Controlling conditions, applying treatments. ✨
60
🌸Observational study = ?🌸
✨No treatment or intervention is deliberately assigned to individuals. i.e. Watching and recording, no interference. ✨
61
💫Purpose of Experimental study?💫
✨To study the possible effect of the treatment or intervention on the variables measured✨
62
💫Studies must be done rigorously enough to be [ ]💫
✨Replicated✨
63
💫Purpose of Observational study?💫
✨To analyse relationships between variables without applying a treatment or intervention✨
64
💫What are the 8 types of bias?💫 🍬 Hint: Sassy Researchers Really Can Mess Numbers, Right Harper?
✨1. 💖S – Selection Bias (Sample isn’t random or representative) 💖 2. 💖R – Response Bias 🎤 (People alter answers due to wording or pressure)💖 3. 💖R – Recall Bias 🧠 (People misremember past events)💖 4. 💖C – Confirmation Bias 🔍 (Looking for data that supports beliefs)💖 5. 💖M – Measurement Bias 📏 (Flawed tools or inconsistent data collection)💖 6. 💖N – Nonresponse Bias 📭 (People who don’t respond may be different)💖 7. 💖R – Survivorship Bias 💃 (Only looking at successful cases)💖 8. 💖H – Hidden Bias (Bias that influences results unnoticed, skewing the study without anyone realising it)💖✨
65
🌸Lurking Variable = ?🌸
✨A hidden factor that influences both the independent and dependent variables, causing a false impression of a relationship between them. ✨
66
🌸Blocked Randomisation = ?🌸
✨Grouping participants by a characteristic and then randomly assigning them to treatment groups within each group.✨
67
Steps of Blocked 🎀Randomisation💅🏻 = ?
✨1. 💖Identify blocks: Group participants by a characteristic (e.g., age, gender).💖 2. 💖Randomly assign: Within each block, randomly assign participants to different treatment groups.💖 3. 💖Repeat: Continue for all blocks to ensure balanced groups.💖✨
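As a rough sketch of these three steps in Python: the participants, the age blocks and the `blocked_randomisation` helper below are all hypothetical, and alternating treatments after a shuffle is just one simple way to keep each block balanced.

```python
import random
from collections import defaultdict

def blocked_randomisation(participants, block_of, treatments, seed=None):
    """Randomly assign treatments separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for person in participants:                # step 1: identify blocks
        blocks[block_of(person)].append(person)
    assignment = {}
    for name in sorted(blocks):                # step 3: repeat for every block
        members = blocks[name][:]
        rng.shuffle(members)                   # step 2: random order within block
        for i, person in enumerate(members):
            assignment[person] = treatments[i % len(treatments)]
    return assignment

# Hypothetical participants as (id, age block) pairs
participants = [(f"P{i}", "young" if i <= 4 else "old") for i in range(1, 9)]
assignment = blocked_randomisation(
    participants, lambda p: p[1], ["treatment", "control"], seed=2
)
print(assignment)
```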
68
🌸Blinding = ?🌸
✨where a person (participant, research staff) is deliberately not told of a treatment assignment in a study so s/he is not biased in reporting study information.✨
69
🌸Double Blinding = ?🌸
✨where both study staff and participants do not know the treatment assignment✨
70
🌸Unblinding procedures = ?🌸
✨1.💖Emergency Unblinding – Done when a participant's safety is at risk.💖 2. 💖Planned Unblinding – Occurs at pre-specified points in the study.💖 3.💖 Accidental Unblinding – Happens unintentionally due to errors or clues.💖 4. 💖Partial Unblinding – Only specific study personnel are unblinded.💖 5. 💖Complete Unblinding – Everyone is unblinded, usually at study completion.💖✨
71
🌸Frequency Histogram = ?🌸
✨A graph that shows how data is distributed across classes (Tupperwares), with bars representing the frequency of data in each class.✨
72
💫Steps to a 🎀Frequency Histogram💅🏻?💫
✨1. 💖Collect all your data points💖 2. 💖Choose your classes💖 3. 💖Sort your data points into the classes💖 4. 💖Note your classes on the x axis💖 5. 💖 Note the frequency on the y axis 💖 6. 💖For each class, draw a bar that reaches up to the corresponding frequency.💖 💖P.S. Consider making a Frequency Table first for clarity💖✨
73
🌸Relative Frequency Histogram = ?🌸
✨A Relative Frequency Histogram shows the percentage of data in each class instead of the count.✨
74
Steps to 🎀Relative Frequency Histogram💅🏻?
✨1. 💖 Collect all your data points.💖 2.💖 Choose your classes.💖 3.💖 Sort your data points into the classes.💖 4.💖 Calculate the relative frequency for each class (class frequency divided by total number of data points).💖 5.💖 Note your classes on the x-axis.💖 6.💖 Note the relative frequency on the y-axis.💖 7.💖 For each class, draw a bar that reaches up to the corresponding relative frequency.💖 💖P.S. It will be very helpful to make a Frequency Table here 💖 ✨
75
🌸Distribution = ?🌸
✨The description of how the data points of a variable are spread or arranged. It shows the frequency or probability of different outcomes in a dataset.✨
76
5 Main types of 🎀Distribution💅🏻?
✨1. 💖Normal Distribution: Most numbers are close to the middle. It’s balanced. 💖 2. 💖Uniform Distribution: Every number has the same chance of happening. No number is more likely than the other. 💖 3. 💖Skewed Right: Most numbers are small, but a few big numbers stretch the group to the right. 💖 4. 💖Skewed Left: Most numbers are big, but a few small numbers stretch the group to the left. 💖 5. 💖Bimodal Distribution: There are two separate groups of numbers that appear the most. 💖✨
77
🌸Outliers = ?🌸
✨Data points that are significantly different from the rest of the data.✨
78
2 Main types of 🎀Outliers💅🏻?
✨1. 💖Global Outliers: These are numbers that are really far away from the rest of the numbers.💖 2. 💖Contextual Outliers: These numbers are strange in one group, but okay in another group.💖✨
79
🌸Cumulative Distribution = ?🌸
✨It’s the total number of times something has happened up to a certain point (Your maximum). ✨
80
💫The point of a 🎀Histogram💅🏻 is to reveal the [ ]💫
✨Distribution✨
81
🌸Time Series Graph = ?🌸
✨A time series graph shows how data changes over time, with time on the x-axis and values on the y-axis. It helps identify trends, patterns, and anomalies.✨
82
💫🎀Time Series Graph💅🏻 steps?💫
✨1. 💖Collect your data over time💖 2. 💖Label the x-axis with time intervals💖 3. 💖Label the y-axis with the measured values💖 4. 💖Plot each data point at the correct time💖 5. 💖Connect the points with a line to show trends💖 6. 💖Analyse for patterns, trends, and outliers💖✨
83
🌸Bar Graph = ?🌸
✨A chart that uses rectangular bars to show the size of different categories✨
84
⚠️Changing the scale of the y axis can be [🐽].⚠️ ⚠️When comparing to another Data point use the [👛] on the y axis.⚠️
✨🐽Misleading🐽✨ ✨👛same scale👛✨
85
🌸Clustering = ?🌸
✨Splitting bars in bar charts into subcategories for further precision✨
86
🌸Pie Chart = ?🌸
✨A circular chart that is divided into slices to show how different categories compare as parts of a whole✨
87
💫Pie charts work best with [ ] categories, e.g., small/medium/large, yes/no, white/black.💫
✨Mutually Exclusive✨
88
🎀Pie Chart💅🏻 steps?
✨1.💖Add up all the numbers to get the total. (Example: 5 + 3 + 8 = 16)💖 2.💖Find the percentage for each part by dividing each part by the total and multiplying by 100. (Example: (5 ÷ 16) × 100 = 31.25%, (3 ÷ 16) × 100 = 18.75%, (8 ÷ 16) × 100 = 50%)💖 3.💖Convert percentages to decimals by dividing each by 100. (Example: 31.25 ÷ 100 = 0.3125, 18.75 ÷ 100 = 0.1875, 50 ÷ 100 = 0.5)💖 4.💖Multiply the decimal by 360 to get the degrees for each section. (Example: 0.3125 × 360 = 112.5°, 0.1875 × 360 = 67.5°, 0.5 × 360 = 180°)💖 5.💖Draw a circle and mark the center.💖 6.💖Use a protractor to measure the angles for each section based on the degrees you calculated. (Example: 112.5° for the first part, 67.5° for the second, and 180° for the third.)💖 7.💖Label each section with the corresponding category and percentage.💖✨
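Steps 1 to 4 are pure arithmetic, so they can be checked with a short Python sketch. The values 5, 3 and 8 come from the card; the category names and the `pie_slices` helper are made up for the example.

```python
def pie_slices(counts):
    """Return the percentage and the angle in degrees for each category."""
    total = sum(counts.values())               # step 1: add everything up
    slices = {}
    for category, value in counts.items():
        proportion = value / total             # steps 2-3: share of the whole
        slices[category] = {
            "percent": proportion * 100,
            "degrees": proportion * 360,       # step 4: share of a full circle
        }
    return slices

slices = pie_slices({"small": 5, "medium": 3, "large": 8})
for category, s in slices.items():
    print(category, s["percent"], s["degrees"])
```

The degrees always add up to 360, which is a handy check before reaching for the protractor.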
89
💫Rules of thumb for ALL Graphs?💫
1. 💖 Provide a title Always include a clear and descriptive title that explains what the graph shows.💖 2. 💖 Label axes Clearly label both the x-axis and y-axis with their respective variables. This helps explain what each axis represents.💖 3. 💖 Identify units of measure Always include the units of measurement (e.g., dollars, hours, percentage) so people understand the scale of the data.💖 4. 💖 Make the graph clear Font size: Use legible fonts and avoid making text too small. Graph complexity: Avoid overcrowding the graph with too many data points or categories. Keep it simple and focused. Colors: Use contrasting colors for clarity and to differentiate between categories, but don’t overdo it. Legends: Use legends or labels to explain what different colors or lines represent if needed.💖 5. 💖 Scale appropriately Choose an appropriate scale for the data. Ensure the graph is not misleading by adjusting the scale to fit the data points properly.💖 6. 💖 Show trends clearly If possible, emphasize the patterns or trends in the data (e.g., using lines or markers on a line graph) to make the message easier to understand.💖 7. 💖 Consistent formatting Make sure your graphs are consistent, especially if you’re comparing multiple graphs. Use the same colors, fonts, and layout style.💖
90
💫Cases where each graph is most useful?💫 1.🌈Frequency Histogram?🌈 2.🌺Relative Frequency Histogram ?🌺 3.🏝️Stem-And-Leaf Display?🏝️ 4.🦩Time Series Graph?🦩 5.🐠Bar Graph?🐠 6.🍍Pareto Chart?🍍 7.🦜Pie Graph?🦜
1.🌈For quantitative data, when you want to see the distribution.🌈 2. 🌺For quantitative data, when you want to see the distribution. Also, good for comparing to other data.🌺 3.🏝️For quantitative data, when you want to see the distribution. Easier to make by hand than histogram.🏝️ 4.🦩For graphing a variable that changes over time and is measured at regular intervals.🦩 5.🐠For qualitative or quantitative data, and for displaying frequency or percentage.🐠 6.🍍For frequencies of rare events in descending order.🍍 7.🦜For mutually-exclusive categories (quantitative or qualitative).🦜
91
🌸Frequency table = ?🌸
✨organises data by showing how often each value appears. It’s useful for spotting patterns, summarising large datasets, and preparing for further analysis.✨
92
🌸Class = ?🌸
✨An interval grouping data values. Example: Between 30 and 40 miles.✨
93
🌸Class Limit = ?🌸
✨The smallest and largest values that fit in a class. Example: 30 is the lower class limit, and 40 is the upper class limit.✨
94
🌸Class Width = ?🌸
✨The size of a class. Example: Upper class limit (40) minus lower class limit (30) = 10, then add 1 → 11. Example: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 → 11 values.✨
95
🌸Frequency = ?🌸
✨ The number of data points that fall within a class. Example: The number of patients transported 30 to 40 miles.✨
96
💫How to decide on Classes?💫
1. Find the range → Biggest number − Smallest number. Example: 98 − 12 = 86. 2. Pick the number of classes (usually 5 to 10). Let’s say 6. 3. Find the class width → Range ÷ Number of classes. 86 ÷ 6 ≈ 14.3 (always round up, so we use 15). 4. Make class groups (start at or just below the smallest number and keep adding the width): 10-24, 25-39, 40-54, 55-69, 70-84, 85-99. 5. Count how many numbers fit in each group → That’s the frequency!
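A small Python sketch of this recipe, for whole-number data. The data set is invented (it just has the same smallest value 12 and largest value 98 as the card), the helper names are assumptions, and `start=10` is passed in only to reproduce the card's class groups.

```python
import math

def make_classes(data, n_classes, start=None):
    """Build (lower limit, upper limit) pairs for whole-number data."""
    low, high = min(data), max(data)
    # Round up to the next whole number; an exact division also moves up
    # (an assumption, so the largest value always lands inside the last class).
    width = math.floor((high - low) / n_classes) + 1
    first = low if start is None else start
    return [(first + i * width, first + (i + 1) * width - 1) for i in range(n_classes)]

def frequency_table(data, classes):
    """Count how many data points fall inside each class."""
    return {limits: sum(limits[0] <= x <= limits[1] for x in data) for limits in classes}

data = [12, 18, 25, 31, 40, 47, 55, 58, 63, 70, 77, 85, 91, 98]  # hypothetical
classes = make_classes(data, 6, start=10)
table = frequency_table(data, classes)
for (lower, upper), f in table.items():
    print(f"{lower}-{upper}: {f}")
```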
97
🌸Relative Frequency Table = ?🌸
✨A table that shows the proportion of data that falls into each class relative to the total sample size.✨
98
🌸Relative = ?🌸
✨In relationship to the rest of the data.✨
99
🌸f = ?🌸
✨Frequency✨
100
🌸n = ?🌸
✨Total Sample Size✨
101
🌸f/n = ?🌸
✨Relative Frequency✨
102
🌸Relative Frequency = ?🌸
✨Is the proportion of the values that are in that class.✨
103
💫How to calculate Relative Frequency?💫
✨1.💖Find the frequency (f) of the class you're interested in.💖 2.💖Find the total sample size (n) by adding up the frequencies of all classes. 💖 3.💖Calculate relative frequency by dividing frequency (f) by total sample size (n): relative frequency = f ÷ n💖 4.💖Convert to percentage if needed by multiplying by 100.💖✨
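To see f/n in action, here is a tiny Python sketch. The class labels and frequencies are invented for illustration, as is the `relative_frequencies` helper.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency f by the total sample size n."""
    n = sum(frequencies.values())              # step 2: n is the sum of all f
    return {cls: f / n for cls, f in frequencies.items()}   # step 3: f / n

freqs = {"10-24": 2, "25-39": 4, "40-54": 6, "55-69": 8}    # hypothetical table
rel = relative_frequencies(freqs)
percent = {cls: round(r * 100, 1) for cls, r in rel.items()}  # step 4
print(rel)
print(percent)
```

The relative frequencies of all classes should add up to 1 (or 100%), which is a quick way to catch counting mistakes.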
104
💫How do you do a 🎀 Stem-And-Leaf Display💅🏻?💫
✨1. 💖Sort the data: Put the data in order from smallest to largest.💖 2.💖 Split the numbers into stem and leaf: The stem is everything except the last digit (the tens, hundreds, etc.). The leaf is just the last digit (ones, tenths, etc.). For example, 52 → stem = 5, leaf = 2.💖 3. 💖List the stems: Write down all the stems (without repeating). If you have 52, 54, and 58, you will just write "5" for the stem.💖 4.💖 Add the leaves: For each stem, list all the corresponding leaves next to it. For example, if you have 52, 54, and 58, the stem "5" will have leaves "2", "4", and "8".💖 5. 💖Organise the leaves: Arrange the leaves in numerical order for each stem. For example, "5 | 8, 2, 4" becomes "5 | 2, 4, 8" after ordering.💖 6. 💖Repeat for all stems: Do the same for every stem in your data.💖 7. 💖Final display: Now, your stem-and-leaf display is ready! Each stem shows a group of numbers, and the leaves show the details of each number.💖 Example: Data: 52, 54, 58, 61, 63, 67, 72, 74, 75 Steps: 💖Sort the data: 52, 54, 58, 61, 63, 67, 72, 74, 75💖 💖Split into stems and leaves: 52 → Stem = 5, Leaf = 2; 54 → Stem = 5, Leaf = 4; 58 → Stem = 5, Leaf = 8; 61 → Stem = 6, Leaf = 1; 63 → Stem = 6, Leaf = 3; 67 → Stem = 6, Leaf = 7; 72 → Stem = 7, Leaf = 2; 74 → Stem = 7, Leaf = 4; 75 → Stem = 7, Leaf = 5💖 💖List the stems: 5, 6, 7💖 💖Add the leaves: 5 | 2, 4, 8; 6 | 1, 3, 7; 7 | 2, 4, 5💖 💖Organise the leaves: Already in order.💖 💖Final stem-and-leaf display: 5 | 2, 4, 8; 6 | 1, 3, 7; 7 | 2, 4, 5. And that's it! You've got your stem-and-leaf display.💖✨
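The same display can be built with a short Python sketch, using the card's example data. It assumes non-negative whole numbers where the last digit is the leaf; the `stem_and_leaf` helper is just an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return one 'stem | leaves' line per stem, leaves in order."""
    stems = defaultdict(list)
    for value in sorted(data):             # sorting first keeps leaves ordered
        stem, leaf = divmod(value, 10)     # stem = all but last digit, leaf = last digit
        stems[stem].append(leaf)
    return [
        f"{stem} | {', '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

lines = stem_and_leaf([52, 54, 58, 61, 63, 67, 72, 74, 75])
print("\n".join(lines))
```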
105
🌸Central Tendency = ?🌸
✨tells us where the centre or middle of the dataset is.✨
106
💫Three main types of 🎀Central Tendency💅🏻?💫
✨Mean (Average) Add up all the numbers. Divide by how many numbers there are. Example: (5 + 10 + 15) ÷ 3 = 10 Mean = Σx / n Median (Middle Value) Put the numbers in order from smallest to largest. If there’s an odd number of values, pick the middle one. If there’s an even number, take the average of the two middle numbers. Example: Ordered list: 3, 5, 8, 12, 15 → Median = 8 Example (even count): Ordered list: 2, 6, 9, 11 → (6 + 9) ÷ 2 = 7.5 Mode (Most Frequent Number) Find the number that appears the most. There can be one mode, multiple modes, or no mode if all numbers appear equally. Example: 4, 7, 7, 9, 10 → Mode = 7✨
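The examples above can be double-checked with Python's built-in `statistics` module. This is only a quick check of the card's numbers, not a required method.

```python
from statistics import mean, median, multimode

mean_value = mean([5, 10, 15])             # (5 + 10 + 15) / 3
median_odd = median([3, 5, 8, 12, 15])     # odd count: the middle value
median_even = median([2, 6, 9, 11])        # even count: average of the two middle values
modes = multimode([4, 7, 7, 9, 10])        # list of the most frequent value(s)

print(mean_value, median_odd, median_even, modes)
```

`multimode` returns a list because a data set can have more than one mode.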
107
💫When to use each main type of 🎀Central Tendency💅🏻?💫
✨Mean: Best when data has no extreme values (outliers). Median: Best when data has outliers or is skewed. Mode: Best for categorical data (e.g., favourite colour, most common shoe size).✨
108
🌸Greek letter capital sigma Σ = ?🌸
✨shorthand for "sum of" Example: if x = {3,4,7} then Σx = 14 Also, ∑xy is shorthand for "multiply x and y for each pair, then add them up." Example: If x = {2, 5, 7} and y = {3, 4, 1}, First, multiply: (2×3), (5×4), (7×1) → {6, 20, 7} Now, add: 6 + 20 + 7 = 33✨
109
🌸x̄ (x-bar) = ?🌸
✨is the average of a sample (a small group from a big group).✨
110
🌸μ (mu) = ?🌸
✨is the average of the whole group.✨
111
⚠️P.S. Means are very sensitive to [👛], Medians aren't. ⚠️
✨ 👛outliers👛 Example {1, 2, 3, 4, 100} Median: The middle number is 3. It stays the same even though 100 is way higher than the other numbers. Mean: If you add all the numbers up (1 + 2 + 3 + 4 + 100 = 110) and divide by 5, the mean is 22. The mean is much higher because 100 is an outlier, and it affects the mean a lot. So, in cases with outliers, the median is more reliable for representing the “typical” value. The mean is less reliable because it can be easily skewed by extreme numbers.✨
112
🌸Trimmed Mean = ?🌸
✨A method of ameliorating the influence of the outliers How to Calculate a 5% Trimmed Mean: 💖Find the total number of data points (n): Example: 100 data points.💖 💖Calculate 5% of the data points: 5% of 100 = 5.💖 💖Order the data: Sort from lowest to highest (or vice versa).💖 💖Remove the 5% from both ends: 💖 💖Remove the 5 smallest and 5 largest values.💖 💖Calculate the new mean: Find the mean of the remaining data, which is now less affected by outliers.💖✨
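A minimal Python sketch of the 5% trimmed mean, assuming the number of values to cut is rounded down to a whole number. The 100 data points (98 ordinary values plus two big outliers) and the `trimmed_mean` helper are made up for the example.

```python
def trimmed_mean(data, percent):
    """Drop `percent` % of the values from each end, then average the rest."""
    ordered = sorted(data)                         # order the data
    cut = int(len(ordered) * percent / 100)        # how many to remove per end
    kept = ordered[cut:len(ordered) - cut]         # remove smallest and largest
    return sum(kept) / len(kept)

data = list(range(1, 99)) + [500, 900]             # hypothetical: n = 100
plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 5)
print(plain, trimmed)
```

The two outliers pull the plain mean well above the trimmed mean, which is the whole point of trimming.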
113
⭐️To find the percentage of anything quickly⭐️
✨Multiply the whole by the percentage written as a decimal (the percentage ÷ 100). Example: To find 5% of 1200: 1200 × 0.05 = 60✨
114
🌸Weighted Average = ?🌸
✨a method of computing an average where some data points contribute more than others✨
115
💫How to calculate a 🎀Weighted Average💅🏻? 💫
✨1.💖Identify Given Information💖 2.💖Multiply Each Value by Its Weight💖 3.💖Sum Up the Weighted Values💖 4.💖Divide by the Sum of Weights💖✨
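Those four steps fit in a few lines of Python. The scores and the 30/70 weights below are invented for illustration, and `weighted_average` is just a hypothetical helper name.

```python
def weighted_average(values, weights):
    """Multiply each value by its weight, add up, divide by the total weight."""
    weighted_sum = sum(v * w for v, w in zip(values, weights))  # steps 2-3
    return weighted_sum / sum(weights)                          # step 4

scores = [80, 90]        # hypothetical: homework score, exam score
weights = [30, 70]       # hypothetical: homework counts 30, exams count 70
result = weighted_average(scores, weights)
print(result)
```

Because the exam carries more weight, the result sits closer to 90 than the plain average of 85 would.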
116
💫How to decide the importance of 🎀Weighted Averages💅🏻?💫
✨The weights depend on importance and are usually assigned in one of three ways: Arbitrarily – If no real justification exists, weights might just be assigned based on intuition or preference. Example: A teacher decides homework is worth 30% and exams 70% just because they think exams matter more. Empirically – Based on data or past trends. Example: A company weighs customer feedback scores differently based on how predictive they are of future sales. By Policy/Rules – Set by an institution, standard, or contract. Example: University grading systems (e.g., final exams = 50%, quizzes = 30%, participation = 20%). It all depends on what matters most in the context.✨
117
🌸Normal Distribution = ?🌸
✨A symmetric, bell-shaped distribution in which the mean, median, and mode are all the same value. Mean: The average of all the data points. Median: The middle value when all data points are arranged in order. Mode: The value that appears most frequently. In a perfect normal distribution (like a bell curve), these three measures of central tendency (mean, median, and mode) will be the same and perfectly aligned with the center of the curve. Why? The data is symmetrically distributed, so the average, middle, and most frequent values all occur at the same point.✨
118
🌸Skewed Distributions = ?🌸
✨The positions of the mean, median, and mode change depending on the direction of the skew: Right-skewed distribution (positively skewed): The tail of the distribution is stretched to the right, meaning there are more lower values and a few higher values. In this case, the mean is greater than the median, and the median is greater than the mode. The order of the measures from left to right is: Mode < Median < Mean. Left-skewed distribution (negatively skewed): The tail of the distribution is stretched to the left, meaning there are more higher values and a few lower values. In this case, the mean is less than the median, and the median is less than the mode. The order of the measures from left to right is: Mean < Median < Mode. Example: Right-skewed: {1, 2, 2, 2, 3, 3, 4, 5, 9, 15} Mode: 2 (most frequent) Median: 3 (middle value) Mean: 4.6 (average) Left-skewed: {1, 7, 11, 12, 13, 13, 14, 14, 14, 15} Mode: 14 (most frequent) Median: 13 (middle value) Mean: 11.4 (average) In skewed distributions, the skew affects where the mean gets pulled relative to the median and mode.✨
119
🌸Variation = ?🌸
✨How spread out the data is. If numbers are close together, variation is low (consistent). If numbers are far apart, variation is high (all over the place). Example: Two classes have the same average grade of 75, but the grades could look very different: Class A: Everyone got 75, 75, 75, 75, 75 → No variation, everyone got the same grade. Class B: Some got 50, 60, 75, 90, 100 → Big variation, some did much better or worse than others. Even though both have the same mean (75), Class B has more spread-out grades. That’s why we need variation (like range, variance, or standard deviation) to see how much the data is spread out or clustered together. ✨
120
💫Measures of Variation = ?💫
✨1.💖Range → Difference between the largest and smallest values.💖 2.💖Variance → The average of the squared differences from the mean.💖 3.💖Standard Deviation → The square root of the variance; shows how much data deviates from the mean.💖 4.💖Coefficient of Variation → Standard deviation divided by the mean; useful for comparing variability between different datasets.💖✨
121
🌸Range = ?🌸
✨The Largest Value - The Smallest Value 🧮 For example: Data: 42, 33, 21, 78, 62 Maximum = 78 Minimum = 21 Range = 78 − 21 = 57 ✅ The data spreads 57 units from the smallest to the largest value.✨
122
💫⚠️Why isn't the 🎀Range💅🏻 all that useful?⚠️💫
✨Range is too sensitive to outliers, so we use better tools like: standard deviation, variance, and interquartile range. Range only looks at two numbers: the highest and the lowest. It ignores everything else in the middle. So if you change just one of those two numbers (highest or lowest), the range changes a lot, even if the rest of the data stays the same. Example: Original: [21, 33, 42, 62, 78] → Range = 78 − 21 = 57 Now change one number: [21, 33, 42, 62, 90] → Range = 90 − 21 = 69 The range changed a lot (from 57 to 69), but most of the numbers didn’t.✨
123
🌸Variance = ?🌸
✨Variance – average of the squared differences from the mean It tells you how spread out the numbers are from the mean. If the variance is small, your numbers are huddled close together. If it’s big, your numbers are scattered like confetti 🎉. Think of it like: “How well does the average (mean) describe everyone?” ✨
124
🌸Standard Deviation = ?🌸
✨Standard deviation – the square root of the variance Variance gives you the squared spread — a bit awkward. Standard deviation is just the square root of that, so it’s easier to interpret and in the same units as your data. So if you’re measuring grades, standard deviation is also in grade units (not grade² like variance is).✨
125
💫How to calculate Variance and Standard deviation? 💫
✨1. Find the mean: (8+6+2)/3 = 5.333 2. Subtract the mean from each value (get deviations): 8−5.333 = 2.667, 6−5.333 = 0.667, 2−5.333 = −3.333 3. Square each deviation: 2.667² = 7.113, 0.667² = 0.445, (−3.333)² = 11.109 4. Add them up: 7.113 + 0.445 + 11.109 = 18.667 5. Divide by n−1 (for sample variance): 18.667 ÷ (3−1) ≈ 9.333 → This is the variance 6. Square root the variance: √9.333 ≈ 3.055 → This is the standard deviation ✅✨
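The same walk-through can be followed in Python with the card's data (8, 6, 2). It is a plain step-by-step sketch, with `sample_variance` as an illustrative helper name.

```python
import math

def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n                                   # find the mean
    sum_of_squares = sum((x - mean) ** 2 for x in data)    # deviations, squared, added
    return sum_of_squares / (n - 1)                        # divide by n - 1

data = [8, 6, 2]
variance = sample_variance(data)
sd = math.sqrt(variance)                                   # square root of the variance
print(round(variance, 3), round(sd, 3))
```

Python's standard library has `statistics.variance` and `statistics.stdev` for the same sample formulas, if you just want the answer.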
126
💫What do 🎀Variance💅🏻 and 🎀Standard Deviation💅🏻 tell you?💫
✨🔢 Variance Tells you how spread out your data is. ➡️ Higher variance = more spread ➡️ Lower variance = more consistent Example: Set A: [5,5,5] → Variance = 0 (super consistent) Set B: [2,5,8] → Variance > 0 (more spread) 📏 Standard Deviation Tells you on average how far values are from the mean. ➡️ It’s the "typical" distance from the center. Example: Mean = 5 Standard Deviation = 2 → Most values are about 2 units away from 5 ✅✨
127
🔔Reminder!🔔
✨🎯 Focus on Sample Formulas We usually don’t know the whole population — just a sample. So we use sample variance and sample standard deviation formulas. s² = (sum of (x - mean)²) ÷ (n - 1) s = square root of s² Where: x = each value in the dataset mean = average of all values n = number of values s² = variance s = standard deviation ✨
128
💫How many ways are there to calculate variance and standard deviation?💫
✨Two — the defining formula and the computational formula.✨
129
🌸Defining formula = ?🌸
✨The step-by-step method: mean → deviations → squares → average → square root. It is built on the sum of squared deviations, Σ(x − x̄)².✨
130
🌸Computational Formula = ?🌸
✨A shortcut that gives the same result but skips the logic. It uses Σx² ("square all the x-values first, then add the results") and (Σx)² ("add all the x-values first, then square the result").✨
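For comparison, here is a sketch of the shortcut in Python. The card doesn't spell the formula out, so this assumes the usual textbook form, sum of squares = Σx² − (Σx)² / n, and reuses the data 8, 6, 2 from the earlier calculation card.

```python
import math

def sample_variance_computational(data):
    """Sample variance via the shortcut: SS = sum(x^2) - (sum(x))^2 / n."""
    n = len(data)
    sum_x = sum(data)                          # Σx: add first
    sum_x2 = sum(x * x for x in data)          # Σx²: square first, then add
    sum_of_squares = sum_x2 - sum_x ** 2 / n   # (Σx)² means add, then square
    return sum_of_squares / (n - 1)

data = [8, 6, 2]
variance = sample_variance_computational(data)
sd = math.sqrt(variance)
print(round(variance, 3), round(sd, 3))
```

It lands on the same variance and standard deviation as the step-by-step method, just without ever computing the individual deviations.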
131
🌸Which formula is easier to understand?🌸
✨ The defining formula — it shows what’s really happening in the data.✨
132
💫Which formula is faster for large datasets but harder to understand?💫
✨The computational formula.✨
133
🔔Reminder🔔
✨Sample Defining Formulas These are the formulas used to calculate sample variance and standard deviation. The main part of the formula, Σ(x − x̄)², is called the Sum of Squares. This captures how much each value differs from the mean, squared and then added together. Sample Variance (s²):  s² = Σ(x − x̄)² / (n − 1) Sample Standard Deviation (s):  s = √[Σ(x − x̄)² / (n − 1)] We divide by (n − 1) when using a sample. First calculate the Sum of Squares, then plug it into the formula. Standard deviation is just the square root of the variance.✨
134