Lecture 1 Ucc Flashcards
What is descriptive statistics?
State the three forms that descriptive stats can be in
Descriptive statistics provide an overview of the general features of a given dataset.
They may be in the form of tables, graphs or numerical summary measures (e.g. proportions, means, etc.)
Examples of numerical summary measures in statistics, which summarize and describe important features of data, include:
- Mean: The average of all data points.
- Median: The middle value in a sorted dataset.
- Mode: The most frequently occurring value(s).
- Range: The difference between the maximum and minimum values.
- Variance: The average squared deviation from the mean.
- Standard Deviation: The square root of the variance; measures how spread out the values are from the mean.
- Interquartile Range (IQR): The range between the first (Q1) and third quartile (Q3), representing the middle 50% of the data.
- Percentiles: Values that divide the data into 100 equal parts.
- Quartiles: Values that divide the data into four equal parts (Q1, Q2, Q3).
- Z-Score: Represents how many standard deviations a data point is from the mean. A z-score of 0 means the data point is exactly at the mean.
• A positive z-score means the data point is above the mean.
• A negative z-score means the data point is below the mean.
• The magnitude of the z-score indicates how far the data point is from the mean. For example, a z-score of +2 means the data point is 2 standard deviations above the mean.
Uses of Z-Score:
2. Identifying Outliers: Data points with very high or very low z-scores (e.g., above 3 or below -3) are often considered outliers.
In a standard normal distribution (mean = 0, standard deviation = 1), z-scores help determine the probability of a data point occurring within the distribution. For instance:
• About 68% of data points fall within a z-score range of -1 to +1.
• About 95% fall within -2 to +2.
• About 99.7% fall within -3 to +3.
- Skewness: Indicates the asymmetry of the data distribution.
- Kurtosis: Measures the “tailedness” of the distribution.
These numerical summaries provide insights into the distribution, central location, spread, and overall structure of the data.
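The measures listed above can be tried out on a small dataset. This is a minimal sketch using Python's standard `statistics` module; the dataset is made up for illustration, and the population versions of variance and standard deviation are used to match the "average squared deviation" definition (a sample variance would divide by n - 1 instead). The helper names `summarize` and `z_scores` are hypothetical.

```python
import statistics


def z_scores(values):
    """Return how many standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / sd for v in values]


def summarize(values):
    """Collect several numerical summary measures in one dictionary."""
    ordered = sorted(values)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "modes": statistics.multimode(ordered),   # all most-frequent values
        "range": ordered[-1] - ordered[0],        # maximum minus minimum
        "variance": statistics.pvariance(ordered),
        "std_dev": statistics.pstdev(ordered),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustration data
summary = summarize(data)
scores = z_scores(data)

print(summary)
print(scores)
```

A positive entry in `scores` marks a value above the mean and a negative entry a value below it, as described in the z-score bullet.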
What is inferential statistics?
Inferential statistics: how you use data from a sample to draw conclusions, test hypotheses and make inferences about a population.
State the population, target population and sample in the example below
Example: a study of 200 pregnant women in Cape Coast.
Population: women in Cape Coast
Target population: pregnant women in Cape Coast
Sample: the 200 pregnant women studied
What is discrete and continuous data?
What is nominal data?
What is ordinal data?
Give three examples each
Here are two more challenging MCQs to test your understanding of discrete and continuous variables:
A researcher is studying the growth of bacteria in a lab over time. They record the size of the bacterial colony every 12 hours. Which statement best describes the variable “colony size”?
A) It is discrete because the bacteria are counted.
B) It is continuous because the size of the colony can be measured in precise units like millimeters or micrometers.
C) It is discrete because the colony size is measured at regular intervals.
D) It is continuous because the number of bacteria in the colony grows over time.
A hospital tracks the number of days each patient stays in the intensive care unit (ICU). Which of the following best describes the variable “length of stay in the ICU”?
A) It is continuous because the total time can be measured in hours and minutes.
B) It is discrete because the number of days is counted as whole numbers.
C) It is continuous because it can vary significantly between patients.
D) It is discrete because the stay in the ICU is divided into distinct time blocks.
Qualitative (categorical) data:
Ordinal data: The categories are ordered in some way. Examples: disease state (mild, moderate or severe anaemia), degree of pain, wealth index (rich, poor, poorest).
Nominal data: The categories are not ordered but simply have names. Examples:
Blood group (A, B, AB and O),
Marital status (married/widowed/single),
Sex (male/female),
Political party affiliation,
Educational level,
HIV status (positive, negative).
Quantitative or numerical data:
Numerical data are either discrete or continuous. Examples of discrete data: days in the week or months in the year, number of people in a class, number of cars.
Continuous variables can assume any value within a specified relevant interval.
e.g. height, weight, and temperature of a patient
Continuous: the weight of an individual, because it can take any value, including decimals, each time you measure it.
To know which is which, ask whether a decimal value would make sense for the variable. If not, it is discrete.
Example: you can't have 44.5 children.
In theory: Age is continuous since it can be measured to any degree of precision.
In practice: Age is often treated as discrete for convenience when we round to whole numbers.
Here are two more challenging MCQs to test your understanding of discrete and continuous variables:
A researcher is studying the growth of bacteria in a lab over time. They record the size of the bacterial colony every 12 hours. Which statement best describes the variable “colony size”?
A) It is discrete because the bacteria are counted.
B) It is continuous because the size of the colony can be measured in precise units like millimeters or micrometers.
C) It is discrete because the colony size is measured at regular intervals.
D) It is continuous because the number of bacteria in the colony grows over time.
Answer: B) It is continuous because the size of the colony can be measured in precise units like millimeters or micrometers.
(The size of the bacterial colony is measured in a continuous manner, as it can take any value within a range.)
A hospital tracks the number of days each patient stays in the intensive care unit (ICU). Which of the following best describes the variable “length of stay in the ICU”?
A) It is continuous because the total time can be measured in hours and minutes.
B) It is discrete because the number of days is counted as whole numbers.
C) It is continuous because it can vary significantly between patients.
D) It is discrete because the stay in the ICU is divided into distinct time blocks.
Answer: B) It is discrete because the number of days is counted as whole numbers.
(The length of stay is often recorded as whole days, making it a discrete variable in this context.)
These questions focus on understanding the nature of measurement and how variables are typically recorded in real-world scenarios.
What are numerical summary measures and measures of central tendency
Numerical summary measures are used to make concise quantitative statements that characterize the whole distribution of quantitative values.
Measures of Central Tendency
These are measures used to investigate the central characteristic of the data, that is, the point around which the observations (or values) tend to cluster.
What is arithmetic mean?
question:
The forced expiratory volume (FEV) in 1 second for 13 study participants are as follows: 2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 4.02, 2.85, 3.38. Calculate the arithmetic mean for FEV
Arithmetic mean: It is the sum of all observations in a set of data divided by the total number of observations
The arithmetic mean is extremely sensitive to unusual values.
Examples
The forced expiratory volume (FEV) in 1 second for 13 study participants are as follows: 2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 4.02, 2.85, 3.38. Calculate the arithmetic mean for FEV values
The arithmetic mean is given by
(2.30+2.15+3.50+2.60+2.75+2.82+4.05+2.25+2.68+3.00+4.02+2.85+3.38)/13 = 38.35/13 = 2.95 litres
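The FEV calculation can be checked with a few lines of Python. This is only a sketch of the "sum divided by count" definition; the function name `arithmetic_mean` is a hypothetical helper, not something from the lecture.

```python
# FEV values (litres) for the 13 study participants, copied from the example.
fev = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
       2.25, 2.68, 3.00, 4.02, 2.85, 3.38]


def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


fev_mean = arithmetic_mean(fev)
print(round(fev_mean, 2))  # rounds to 2.95 litres
```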
Definition: The arithmetic mean is what most people commonly refer to as the “average.” You add up all the numbers and then divide by how many numbers there are.
Formula:
\[
\text{Arithmetic Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}
\]
Example:
Consider these numbers: 4, 7, 10.
- Sum: 4 + 7 + 10 = 21
- Count: There are 3 numbers.
- Arithmetic Mean: 21/3 = 7
So, the arithmetic mean is 7.
Definition: The harmonic mean is used for rates or ratios. You take the reciprocal (1 divided by the number) of each value, find their average, and then take the reciprocal of that average.
Formula:
\[
\text{Harmonic Mean} = \frac{\text{Number of values}}{\text{Sum of the reciprocals of all values}}
\]
Example:
Consider these values: 4, 6.
- Reciprocals: 1/4 = 0.25, 1/6 ≈ 0.167
- Sum of Reciprocals: 0.25 + 0.167 = 0.417
- Harmonic Mean: 2/0.417 ≈ 4.8
The numerator is 2 because there are 2 values. If there were 3 values, it would be 3 divided by the sum of the reciprocals.
So, the harmonic mean is approximately 4.8.
Definition: The geometric mean is used for sets of numbers that are multiplied together or when dealing with percentages. You multiply all the numbers together, then take the nth root, where n is the number of values.
Formula:
\[
\text{Geometric Mean} = \sqrt[n]{\text{Product of all values}}
\]
Example:
Consider these numbers: 2, 8.
- Product: 2 x 8 = 16
- Geometric Mean: \sqrt{16} = 4
It is a square root because there are 2 values.
If there were 3 values, it would be a cube root.
So, the geometric mean is 4.
- Arithmetic Mean: Average of numbers (e.g., 7 for 4, 7, 10).
- Harmonic Mean: Useful for rates (e.g., 4.8 for 4, 6).
- Geometric Mean: Average for multiplicative datasets (e.g., 4 for 2, 8).
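The three means can be compared side by side in code. The sketch below writes each formula out by hand, using the same example numbers as above; the function names are hypothetical helpers. Python's standard `statistics` module also offers `mean`, `harmonic_mean` and `geometric_mean`, which should agree with these up to floating-point rounding.

```python
import math


def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


def geometric_mean(values):
    """n-th root of the product of all n values."""
    return math.prod(values) ** (1 / len(values))


am = arithmetic_mean([4, 7, 10])   # 21 / 3
hm = harmonic_mean([4, 6])         # 2 / (1/4 + 1/6)
gm = geometric_mean([2, 8])        # square root of 16

print(am, round(hm, 1), gm)
```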
How is the calculation of the mean for ungrouped data different from that for grouped data?
For a grouped frequency distribution table with classes 20-40, 40-60, 60-80 and 2, 4, 6 as their respective frequencies,
what is the mean?
The mean for ungrouped data is calculated differently from the mean for grouped data.
The mean of ungrouped data is the sum of all the values divided by the number of values.
For grouped data, you construct a table of ranges (classes) or categories, then find the frequency of each class. Example: 0-5, 6-10, 11-25
How many people fell within 0-5, how many between 6-10, and so on.
The mean for grouped data is Σfx (sigma fx) divided by Σf (sigma f), where f is the frequency of a class and x is the midpoint of that class.
So for a grouped frequency distribution table with classes 20-40, 40-60, 60-80 and 2, 4, 6 as their respective frequencies,
the mean is:
20-40 midpoint is 60/2 = 30
40-60 midpoint is 100/2 = 50
60-80 midpoint is 140/2 = 70
So Σfx = (30 x 2) + (50 x 4) + (70 x 6)
= 60 + 200 + 420
= 680
Σf = 2 + 4 + 6 = 12
So Σfx/Σf is 680/12
= 56.67 (to two decimal places)
So the mean for this grouped data is 56.67
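The grouped-mean steps can be written as a short function. This is a sketch under the assumption that each class is given as a (lower limit, upper limit, frequency) triple; the name `grouped_mean` and that input layout are illustrative choices, not part of the lecture.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f*x divided by sum of f.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    x is the class midpoint, f is the class frequency.
    """
    total_fx = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # e.g. (20 + 40) / 2 = 30
        total_fx += freq * midpoint
        total_f += freq
    return total_fx / total_f


table = [(20, 40, 2), (40, 60, 4), (60, 80, 6)]
mean = grouped_mean(table)
print(round(mean, 2))  # 680 / 12, rounded to two decimal places
```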
What is the median under measures of central tendency
What is the median of these values: 2, 4, 1, 2, 3, 3, 1
Find the median of the following values: 6, 5, 2, 4, 3, 2
Median: it is the middle value of the set of n observations after they have been arranged in order of magnitude
For a set of n observations, the position of the median is given by the formula (n+1)/2
Medians are not sensitive to unusual values
Examples
What is the median of these values: 2, 4, 1, 2, 3, 3, 1. Sorted, the values are 1, 1, 2, 2, 3, 3, 4. n is 7, so the median is in position (7+1)/2 = 4. Median is 2
Find the median of the following values: 6, 5, 2, 4, 3, 2
Sorted: 2, 2, 3, 4, 5, 6
n/2 is 6/2, the 3rd position, which holds the value 3
(n/2)+1 is the 4th position, which holds the value 4
So (3+4)/2 is 3.5. Median is 3.5
For an odd number of values: use (n+1)/2 to get the position of the median after you've arranged the values in order of magnitude
For an even number of values: use n/2 to find the position of the first middle number and (n/2)+1 to find the position of the second middle number. So if x is the first middle value and y is the second middle value, then (x+y)/2 is the median of the even set of values
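The odd and even rules can be combined into one small function. This is a sketch that follows the position rules in these notes; the function name `median_by_position` is hypothetical. The last lines also illustrate why the median is called insensitive to unusual values, using a small set with one very large value.

```python
def median_by_position(values):
    """Median using the position rules: (n+1)/2 for odd n,
    the average of positions n/2 and n/2 + 1 for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2           # 1-based position of the middle value
        return ordered[position - 1]      # convert to 0-based index
    first_middle = ordered[n // 2 - 1]    # value at position n/2
    second_middle = ordered[n // 2]       # value at position n/2 + 1
    return (first_middle + second_middle) / 2


odd_median = median_by_position([2, 4, 1, 2, 3, 3, 1])
even_median = median_by_position([6, 5, 2, 4, 3, 2])

# One unusually large value (40) pulls the mean up but leaves the median alone.
with_outlier = [2, 3, 3, 1, 3, 5, 40]
outlier_mean = sum(with_outlier) / len(with_outlier)
outlier_median = median_by_position(with_outlier)

print(odd_median, even_median, round(outlier_mean, 2), outlier_median)
```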
How is the calculation of the median for an odd number of values different from that for an even number of values?
Median:
The calculation for an odd number of observations is different from that for an even number. Note that you can sort the values in descending or ascending order.
Odd number of observations: arrange in ascending or descending order, then calculate (n+1)/2.
The number you get is the position of the median.
So for 1, 1, 2, 2, 3, 3, 4, n is 7. (7+1) divided by 2 is 4. So the 4th value is the median, so 2 is the median.
Even number of observations: arrange in order, then average the values in positions n/2 and (n/2)+1.
Mode:most frequent value
From an ungrouped frequency, the mode is the value with the highest frequency
What are measures of dispersion under numerical summary measures
What is range?
Is range sensitive to unusual values? Why?
Find the range of the following observations:
1. 5, 4, 6, 2, 2, 3, 3, 8
2. 2, 3, 3, 1, 3, 5, 40
Measures of Dispersion
These are measures used to describe the spread (or variability) of the data.
Range: It is the difference between the minimum and maximum values of a dataset.
It is more useful when the minimum and maximum values are reported
Since range depends on only the minimum and maximum values, it is sensitive to unusual values.
First sort the values in ascending order, otherwise it is easy to miss the largest or smallest value.
1. Sorted: 2, 2, 3, 3, 4, 5, 6, 8. Range is 8-2 = 6. (Without sorting, it is easy to overlook the 8 and wrongly get 6-2 = 4.)
2. Sorted: 1, 2, 3, 3, 3, 5, 40. Range is 40-1 = 39.
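Letting code pick out the minimum and maximum avoids the "missed the largest value" mistake. This is a minimal sketch with a hypothetical helper name, `value_range`; it also shows how much one unusual value (the 40) changes the range.

```python
def value_range(values):
    """Range: difference between the maximum and minimum values."""
    return max(values) - min(values)


first = [5, 4, 6, 2, 2, 3, 3, 8]
second = [2, 3, 3, 1, 3, 5, 40]

range_first = value_range(first)      # 8 - 2
range_second = value_range(second)    # 40 - 1

# Dropping the unusual value 40 shrinks the range sharply,
# which is why the range is called sensitive to unusual values.
without_outlier = [v for v in second if v != 40]
range_without_outlier = value_range(without_outlier)   # 5 - 1

print(range_first, range_second, range_without_outlier)
```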
What is the interquartile range?
Difference between IQR and Q2
Interquartile Range: It is the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile)
The interquartile range contains middle 50% of the observations in a dataset
It is not sensitive to unusual values
The following steps are used to find the 25th and 75th percentiles:
The interquartile range (IQR) and the median (Q2) both relate to measures of central tendency and dispersion but serve different purposes. Here’s a detailed comparison:
- Interquartile Range (IQR):
  - Definition: The IQR measures the spread of the middle 50% of the data. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1):
    \[
    \text{IQR} = Q3 - Q1
    \]
  - Purpose: It provides an indication of the variability or dispersion of the central 50% of the data, excluding the lower 25% and the upper 25%. It helps to identify the spread of data and is useful for detecting outliers.
- Median (Q2):
  - Definition: The median, or the second quartile (Q2), is the middle value of the dataset when it is ordered. It divides the dataset into two equal halves:
    \[
    Q2 = \text{median}
    \]
  - Purpose: It represents the 50th percentile and indicates the central value of the dataset. It shows where the center of the data lies, but it does not measure the spread or variability.
Key Differences:
- Measurement Focus:
  - The IQR measures the spread of the central 50% of the data.
  - The Median (Q2) measures the center of the data distribution.
- Calculation:
  - The IQR is derived from Q1 and Q3, focusing on the range between these quartiles.
  - The Median (Q2) is a single value representing the middle of the dataset.
- Utility:
  - The IQR helps in understanding the distribution and identifying outliers.
  - The Median helps in understanding the central tendency of the data.
In summary, while both IQR and the median relate to the distribution of data, they serve different purposes: IQR measures the range within which the central 50% of data falls, while the median indicates the middle point of the data.
How do you get the 25th percentile (the first quartile, Q1)?
Don’t use this one
I’ve found an easier one
Interquartile range: a quartile comes from dividing something into 4. 100 divided into 4 gives 25, 50, 75 and 100.
IQR = Q3-Q1
1. Arrange the values you have in ascending order
2. Find the median position
3. Find the median position of the median position.
So if the median position is the 5th, then find the median of this: (5+1) divided by 2, which is 3.
4. The value at the position you get is Q1, the lower quartile
K is the percentile you’re interested in
How do you get the 75th percentile (the third quartile, Q3)?
Find the interquartile range of the observations below
1. 2, 4, 4, 1, 1, 5, 4, 4, 4
2. 6, 6, 2, 1, 3, 5, 6, 7, 8
Values:    1 1 2 4 4 4 4 4 5
Positions: 1 2 3 4 5 6 7 8 9
These are the values of the first dataset with their positions, so 5 is in the 9th position
Step 1: rearrange in ascending order
Step 2: find the median of the dataset above
So (9+1)/2 = 5
So the value in the fifth position is the median, so 4 is the median.
Now split positions 1-5 into two, and positions 5-9 also, to find the quartiles. So you find the median of the median position.
So (5+1)/2 = 3
So the value that corresponds to the third position is 2, so 2 is the lower quartile.
The third quartile or 75th percentile is at position 3(n+1)/4, with n = 9. The answer is 7.5, so it is rounded up to 8. (You never use a decimal position. Always round up to the next whole number: round the result to the immediate integer larger than the result to give you the location of the kth percentile. For example, 3.12 has to be rounded to 4 and not 3.)
So the value in the 8th position is 4
So IQR = Q3-Q1
So IQR is 4-2 = 2. (Writing the answer as 4-2 also shows both quartiles, which is useful in the same way as reporting the minimum and maximum alongside the range.)
Or
Start by rearranging the values in ascending order
Q1 = (n+1)/4: the answer is a position, which you then relate to the dataset you have
Q2 is the same as finding the median of a regular set
Q3 = 3(n+1)/4, so the value that this position corresponds to is the 75th percentile
Applying this to the same dataset gives the same answer. So Q1 = (9+1)/4 = 2.5, which is rounded up to 3. The value in the third position is 2. So 2 is the lower quartile
Then Q3 = 3(9+1)/4
= 7.5, which is rounded up to 8, and the value in the 8th position is 4. So Q3 is 4, hence IQR is 4-2 = 2
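The position method can be sketched in code as follows. It follows the rule used in these notes (positions (n+1)/4 and 3(n+1)/4, with any decimal position rounded up); the function names are hypothetical. Other textbooks and software use different quartile conventions, for example interpolating between neighbouring values, so results from a statistics package may differ slightly.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based position, rounding decimal positions up."""
    index = min(math.ceil(position), len(ordered))  # never go past the last value
    return ordered[index - 1]                       # convert to 0-based index


def quartiles_round_up(values):
    """Return (Q1, Q3) using positions (n+1)/4 and 3(n+1)/4, rounded up."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3


data = [2, 4, 4, 1, 1, 5, 4, 4, 4]   # first dataset from the question
q1, q3 = quartiles_round_up(data)
iqr = q3 - q1

print(q1, q3, iqr)
```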
What term encompasses the methods of collecting, summarizing, analysing and drawing conclusions from the data?
What term is given to a set of recorded observations on one or more variables?
What is the term given to a characteristic (personal aspect) or attribute (feel, behave, or think) of an individual or an organization that can be measured or observed and varies among individuals or organizations?
What is Statistics?
● Statistics encompasses the methods of collecting, summarizing, analysing and drawing conclusions from data.
● Data: a set of recorded observations on one or more variables.
E.g. data on the demography of patients with a particular illness.
● A variable is a characteristic (personal aspect) or attribute (how they feel, behave, or think) of an individual or an organization that:
(a) can be measured or observed
(b) varies among individuals or organizations