Quantitative Statistics Lecture 1 Flashcards
What are the major types of data statisticians work with?
interval, nominal, ordinal
Why do we use sampling in statistics? What is the meaning of random sampling, can you
provide an example?
To reduce errors for researchers. Trying to say something about a large population by measuring every member carries a high risk of error. A sample is a small piece of the population, so it can be measured carefully and then used to say something about the whole.
Why do we use sampling in statistics? What is the meaning of random sampling, can you
provide an example?
We use sampling in statistics to say something about the whole population without measuring every member of it. Random sampling means that every member of the population has an equal chance of being selected.
- Random sampling example:
- Suppose we have 1,000 students and want to estimate the average GPA.
- We assign a unique number to each of the 1,000 students.
- We use a random number generator to select 100 students, so every student has an equal chance of being included.
- We collect data from the 100 students.
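The random sampling steps above can be sketched in Python. This is a minimal illustration, assuming the students are simply numbered 1 to 1,000; the names `student_ids` and `sample_ids` and the seed value are made up for the example.

```python
import random

# Hypothetical population: student IDs 1..1000 (assumed numbering scheme).
student_ids = range(1, 1001)

rng = random.Random(42)  # fixed seed only so the example is repeatable

# Draw 100 distinct students; every ID has the same chance of being picked.
sample_ids = rng.sample(student_ids, k=100)

print(len(sample_ids), "students selected, e.g.", sorted(sample_ids)[:5])
```

The GPA data would then be collected only for the IDs in `sample_ids`.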
What are some of the challenges or issues in sampling of data? List out any three.
- Selection bias: Selection bias occurs when the sampling method produces a sample that doesn’t truly represent the population being studied.
- Sample size issues: Choosing an appropriate sample size is critical. If the sample size is too small, it may not provide enough information to draw meaningful conclusions or detect significant patterns.
- Non-response bias: Non-response bias occurs when a significant portion of the selected sample does not participate or respond to the survey or study.
What are the mean and the median? Why should we care about these statistics?
- The mean refers to the average of a sample or population.
- The median is the middle value of a sample or population when the values are sorted.
- They give you an understanding of your data and make it easy to compare with other data. They can help you decide, for example, what the average salary in a company is or what the median student GPA is.
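A small sketch of the two statistics, using Python's standard `statistics` module. The salary figures are invented for illustration; one deliberately high value shows how the mean and the median can differ.

```python
import statistics

# Made-up monthly salaries; one very high value pulls the mean upward.
salaries = [3000, 3200, 3100, 3300, 12000]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # middle value after sorting

print("mean:", mean_salary)      # 4920
print("median:", median_salary)  # 3200
```

Here the median stays close to a typical salary, while the mean is pulled up by the single large value.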
What does standard deviation in a dataset tell us? What does it mean when standard
deviation is small or large?
- Standard deviation tells us how spread out the values in the dataset are.
- If the standard deviation is small, the values lie close to the mean.
- If the standard deviation is large, the values are spread out far from the mean.
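As a rough illustration, the two made-up datasets below have the same mean but very different standard deviations. The sketch uses the population standard deviation (`statistics.pstdev`); the data and variable names are assumptions for the example.

```python
import statistics

tight = [49, 50, 50, 51, 50]    # values close to the mean
spread = [10, 30, 50, 70, 90]   # values far from the mean

sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

# Both datasets have mean 50, but the spread differs a lot.
print(statistics.mean(tight), round(sd_tight, 2))
print(statistics.mean(spread), round(sd_spread, 2))
```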
Define numerical variable and categorical variable. Provide examples.
Numerical variable. A variable on which you can perform mathematical operations, for example age, income, temperature.
- Categorical variable. A variable on which you cannot perform mathematical operations, for example gender, education level, marital status.
Provide examples for discrete and continuous data.
- Discrete data. Data that can only take separate, countable values, typically whole numbers. When you roll a die you get a whole number.
- Continuous data. Data that can take decimal or fractional values between the whole numbers. The height of an individual is a good example.
List out one difference between discrete and continuous variables. Provide an example of
each.
One difference between discrete and continuous variables is that a discrete variable takes whole-number values, while a continuous variable can also take decimal or fractional values between the whole numbers.
- Discrete data. When you roll a die you get a whole number.
- Continuous data. The height of an individual, because each individual has a different height, and it usually has a decimal or fractional part after the whole number.
What does it mean when one says - “90th percentile”?
- It is the value below which 90% of the data falls.
What does it mean when one says - “10th percentile”?
- This is the value below which 10% of the data falls.
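A minimal sketch of both percentiles, assuming the data are simply the integers 1 to 100. It uses `statistics.quantiles` from the standard library; different software uses slightly different interpolation rules, so exact percentile values can vary a little between tools.

```python
import statistics

data = list(range(1, 101))  # made-up data: the integers 1..100

# quantiles(..., n=10) returns the 9 cut points between ten equal-sized groups.
deciles = statistics.quantiles(data, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]

# Share of the data lying below each percentile value.
share_below_p10 = sum(x < p10 for x in data) / len(data)
share_below_p90 = sum(x < p90 for x in data) / len(data)

print(p10, share_below_p10)
print(p90, share_below_p90)
```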
If one flips a fair coin many times, what is the probability of getting heads?
- 50%
If one flips a fair coin many times, what is the probability of getting tails?
- 50%
If one rolls a fair six-sided die {1, 2, 3, 4, 5, 6}, what is the probability of each face
occurring?
1/6, or about 16.67%
Explain any one symmetric distribution - either discrete or continuous.
The normal distribution is a symmetric, continuous distribution. It has a single peak at the mean, and the data are distributed symmetrically on either side of that peak.
Properties of a binomial distribution, mention any two. Provide an example.
Fixed number of trials, each with only two possible outcomes, which are often labeled success or failure.
- Independence of trials. Each trial is independent, which means one trial does not affect the outcome of any other trial.
- Consider a coin flip where the fixed number of trials is 3. When I flip the coin 3 times, the first flip does not affect the other 2 flips.
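The three-flip example can be worked out numerically. The sketch below assumes a fair coin (success probability 0.5, with "heads" counted as success) and uses the standard binomial formula; the helper name `binomial_pmf` is made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips, p_heads = 3, 0.5  # assumed: 3 flips of a fair coin
pmf = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]

# Probabilities of 0, 1, 2, and 3 heads.
print(pmf)  # [0.125, 0.375, 0.375, 0.125]
```

The four probabilities add up to 1, since 0 to 3 heads covers every possible outcome.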
List out properties of the normal distribution. Mention any three.
Symmetry: the normal distribution is symmetric, which means it has a central peak and the two sides are mirror images of each other.
- Unimodality: it has a single peak.
- Asymptotic tails: the tails extend indefinitely in both directions, and the curve gets closer and closer to zero without ever reaching it.
Properties of continuous random variables. Mention any three.
- Infinite possible values: a continuous random variable can take any of an infinite number of values within an interval.
- Probability density function: a continuous distribution does not assign a probability to individual values but provides a continuous curve that describes the relative likelihood of different values.
- No probability mass at individual values: the probability of any single exact value is zero. For example, the individual values 1, 2, and 3 have no probability attached to them; only intervals have positive probability.
What is the area under the curve for a continuous distribution?
The area under the curve over a specific interval is the probability that the random variable takes a value within that interval. The total area under the whole curve is 1.
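To make the link between area and probability concrete, the sketch below approximates the area under a standard normal density with the trapezoidal rule. The choice of the standard normal, the interval from -1 to 1, and the helper name `area_under_pdf` are assumptions for illustration.

```python
from statistics import NormalDist

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

standard_normal = NormalDist(mu=0, sigma=1)

area_between = area_under_pdf(standard_normal, -1, 1)   # probability of -1 <= X <= 1
area_total = area_under_pdf(standard_normal, -10, 10)   # practically the whole curve

print(round(area_between, 4), round(area_total, 4))
```

The area between -1 and 1 matches `standard_normal.cdf(1) - standard_normal.cdf(-1)`, and the area over a very wide interval is essentially 1.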
How does mean and standard deviation affect the shape of the normal distribution?
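Mean: if the mean shifts to the right or left, the peak of the normal distribution moves with it, while the shape stays the same.
- Standard deviation: it determines how wide or narrow the normal distribution is. A larger standard deviation gives a wider, flatter curve, and a smaller one gives a narrower, taller curve.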
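A quick numerical check of both effects, using `statistics.NormalDist`. The specific means and standard deviations are arbitrary values chosen for the example.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)      # same mean, larger standard deviation
shifted = NormalDist(mu=5, sigma=1)   # same spread, larger mean

# Height of the curve at its peak (the density at the mean).
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

print(round(peak_narrow, 4), round(peak_wide, 4), round(peak_shifted, 4))
```

Doubling the standard deviation halves the peak height (the curve becomes wider and flatter), while moving the mean only relocates the peak and leaves its height unchanged.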
What is the relationship between mode, mean and median in a normal distribution?
They all share the same value and location, which is the center of the normal distribution.
What is an outlier and what are the ways of dealing with them?
- An outlier is an observation or data point that is significantly different from the rest of the dataset.
- Identify and examine: make a visualization to identify the outliers.
- You can replace the outlier, but it will affect your result.
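Besides a visualization, outliers can also be flagged numerically. The sketch below uses the common 1.5 × IQR rule, which is one convention among several and is not mentioned in the notes above; the data are made up.

```python
import statistics

values = [10, 12, 11, 13, 12, 11, 95]  # made-up data with one extreme value

q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range

# Assumed convention: flag anything more than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print(outliers)  # [95]
```

A flagged point still needs to be examined before it is removed or replaced, since it may be a genuine observation.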
Under a normal distribution, what interval does 95% of the probability fall within? And
for 90%?
For 95%, the probability falls within a z-score of 1.96 on either side of the mean, so the interval runs from -1.96 to 1.96 standard deviations around the mean.
- For 90%, the probability falls within a z-score of 1.645, so the interval runs from -1.645 to 1.645 standard deviations around the mean.
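These z-scores can be recovered from the inverse of the standard normal cumulative distribution function. A minimal sketch using `statistics.NormalDist`; the helper name `central_z` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def central_z(coverage):
    """z such that the interval [-z, z] holds `coverage` of the probability."""
    tail = (1 - coverage) / 2          # probability left in each tail
    return standard_normal.inv_cdf(1 - tail)

z95 = central_z(0.95)
z90 = central_z(0.90)

print(round(z95, 3), round(z90, 3))  # 1.96 1.645
```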
What is the empirical rule, and when can it be helpful?
The empirical rule is also known as the 68-95-99.7 rule. Approximately 68% of the data falls within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three standard deviations.
- When using the empirical rule, it is important to note that it holds exactly only for a perfectly normal distribution; for roughly bell-shaped data it is an approximation. It can be helpful for setting rough confidence intervals.
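The three percentages can be checked against the standard normal distribution. This is a small sketch using `statistics.NormalDist`; the helper name `share_within` is an assumption for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()

def share_within(k):
    """Probability of landing within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Percentages for 1, 2, and 3 standard deviations.
shares = {k: round(share_within(k) * 100, 1) for k in (1, 2, 3)}
print(shares)
```

The values come out close to 68%, 95%, and 99.7%, which is why the rule is stated with "approximately".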
Define central limit theorem. Why is the central limit theorem important in statistics?
The central limit theorem states that, regardless of the shape of the original population (provided it has a finite variance), the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
- It is important for statistical inference. t-tests and confidence intervals rely on the CLT to make inferences about population parameters, so we can work with real-world data that may not be normally distributed.
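A small simulation can illustrate the theorem. The sketch assumes a fair die as the population (a flat, non-normal distribution), samples of 30 rolls, and 5,000 repeated samples; all of these numbers and the seed are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so the run is repeatable
sample_size, n_samples = 30, 5000

# Each sample: 30 rolls of a fair die; keep only the mean of each sample.
sample_means = [
    statistics.mean([rng.randint(1, 6) for _ in range(sample_size)])
    for _ in range(n_samples)
]

population_sd = statistics.pstdev(range(1, 7))    # spread of a single roll
mean_of_means = statistics.mean(sample_means)     # should be near 3.5
sd_of_means = statistics.stdev(sample_means)      # should be near the value below
expected_sd = population_sd / sample_size ** 0.5  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Plotting a histogram of `sample_means` would show a bell shape centered near 3.5, even though a single die roll is uniformly distributed.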
What is the Law of large numbers?
The law of large numbers has two versions: the weak law of large numbers and the strong law of large numbers.
- The weak law of large numbers states that as you take larger and larger samples from a population and calculate the mean, the probability that the sample mean is close to the population mean approaches 1.
- The strong law of large numbers states that, with probability 1 (almost surely), the sample mean converges to the population mean as the sample size grows without bound.
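The idea can be illustrated with simulated die rolls, whose population mean is 3.5. This is only a sketch with arbitrary sample sizes and a fixed seed; a simulation demonstrates the tendency but does not prove the law.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed only so the run is repeatable
population_mean = 3.5   # expected value of one fair die roll

# Distance between the sample mean and the population mean for growing samples.
errors = {}
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    errors[n] = abs(statistics.mean(rolls) - population_mean)
    print(n, round(errors[n], 4))
```

With larger samples the sample mean typically lands closer to 3.5, although for any single run a smaller sample can occasionally be closer by chance.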
Explain sampling distribution.
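A sampling distribution is the distribution of a sample statistic, such as the sample mean, across repeated samples of the same size from a population. It provides valuable insight into the behavior of sample statistics and forms the basis for making inferences about the population and assessing the reliability of sample estimates.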
In experimental design - what is a treatment group, and what is a control group?
The treatment group consists of the subjects or participants who are exposed to the experimental treatment.
- The control group is treated the same as the treatment group, except that it does not receive the experimental treatment.