Quantitative Statistics (Kvantitativ statistik) lecture 1 Flashcards
What are the major types of data statisticians work with?
Nominal, ordinal, interval, and ratio.
Why do we use sampling in statistics? What is the meaning of random sampling? Can you provide an example?
Studying every member of a large population is usually too costly, too slow, or simply impossible. Researchers therefore study a smaller sample and generalize from it; when the sample is chosen well, the error in those conclusions stays small and can be estimated.
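We use sampling in statistics to draw conclusions about a whole population from a smaller part of it. Random sampling means that every member of the population has the same chance of being selected, so the sample is not systematically different from the population.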
- Random sampling example:
- Suppose we have 1,000 students and want to estimate their average GPA.
- We assign a unique number to each of the 1,000 students.
- We use a random number generator to select 100 of them, so every student has an equal chance of being included.
- We collect GPA data from those 100 students and use their average to estimate the average GPA of all 1,000.
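The steps above can be sketched in Python as a minimal example. The function name `simple_random_sample` and the GPA values are hypothetical; the GPAs are simulated with the `random` module, not real data.

```python
import random
import statistics

def simple_random_sample(population_size, sample_size, seed=None):
    """Hypothetical helper: draw `sample_size` distinct IDs from 1..population_size,
    where every ID has the same chance of being picked."""
    rng = random.Random(seed)
    student_ids = range(1, population_size + 1)  # a unique number for every student
    return rng.sample(student_ids, sample_size)  # sampling without replacement

# Hypothetical GPAs (simulated, not real data) for 1,000 students, keyed by ID.
gpa_rng = random.Random(1)
gpas = {sid: round(gpa_rng.uniform(2.0, 4.0), 2) for sid in range(1, 1001)}

sample_ids = simple_random_sample(1000, 100, seed=7)
sample_gpas = [gpas[sid] for sid in sample_ids]  # collect data from the 100 students
estimate = statistics.mean(sample_gpas)          # sample mean as the estimate
true_mean = statistics.mean(gpas.values())       # known here only because the data is simulated
print(f"Estimated average GPA: {estimate:.2f} (population average: {true_mean:.2f})")
```

With real data the population average is unknown; the simulation just makes it possible to see how close the sample estimate lands.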
What are some challenges or issues in sampling data? List any three.
- Selection bias: Selection bias occurs when the sampling method produces a sample that does not truly represent the population being studied.
- Sample size issues: Choosing an appropriate sample size is critical. If the sample is too small, it may not provide enough information to draw meaningful conclusions or detect significant patterns.
- Non-response bias: Non-response bias occurs when a significant portion of the selected sample does not participate or respond to the survey or study. If the people who do not respond differ from those who do, the results are skewed.
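A small simulation can illustrate the sample size point. It assumes a hypothetical, simulated population of 10,000 scores between 0 and 4, and the helper `spread_of_sample_means` is illustrative rather than a standard function. Estimates from samples of 10 should vary far more than estimates from samples of 100.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=1000, seed=0):
    """Hypothetical helper: draw `repeats` random samples and return the
    standard deviation of their means (how much the estimate wobbles)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 simulated scores between 0 and 4 (not real data).
pop_rng = random.Random(42)
population = [pop_rng.uniform(0.0, 4.0) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=10)
large = spread_of_sample_means(population, sample_size=100)
print(f"n = 10:  sample means vary with standard deviation {small:.3f}")
print(f"n = 100: sample means vary with standard deviation {large:.3f}")
```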
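What are the mean and the median? Why should we care about these statistics?
- The mean is the average: the sum of all values divided by the number of values, in a sample or a population.
- The median is the middle value when the values in a sample or population are sorted; with an even number of values it is the average of the two middle ones.
- They give an understanding of where the center of the data lies and make it easy to compare datasets. They also support decisions, for example knowing the average salary in a company or the median GPA of students. The median is less affected by extreme values than the mean.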
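A quick check with Python's standard `statistics` module, using a hypothetical list of salaries with one very high earner, shows how an extreme value pulls the mean but barely moves the median.

```python
import statistics

# Hypothetical monthly salaries in a small company; the last one is an extreme value.
salaries = [30000, 32000, 35000, 38000, 40000, 250000]

mean_salary = statistics.mean(salaries)      # sum / count = 425000 / 6
median_salary = statistics.median(salaries)  # average of the two middle values (35000, 38000)

print(f"Mean:   {mean_salary:.2f}")    # Mean:   70833.33
print(f"Median: {median_salary:.2f}")  # Median: 36500.00
```

Here the mean is far above what five of the six employees earn, while the median stays in the middle of the group.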
What does the standard deviation of a dataset tell us? What does it mean when the standard deviation is small or large?
- The standard deviation measures how spread out the values in a dataset are around the mean.
- If the standard deviation is small, the values lie close to the mean.
- If the standard deviation is large, the values are spread far away from the mean.
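To see this in numbers, here is a sketch with two hypothetical datasets that share the same mean (50) but differ in spread. `statistics.pstdev` treats the list as a whole population; `statistics.stdev` would give the sample version instead.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
# tight: mean = 50, standard deviation = 1.41
# wide: mean = 50, standard deviation = 28.28
```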
Define numerical variable and categorical variable. Provide examples.
- Numerical variable: a variable measured as a number, on which arithmetic makes sense, for example age, income, temperature.
- Categorical variable: a variable whose values are categories or labels, on which arithmetic does not make sense, for example gender, education level, marital status.
Provide examples of discrete and continuous data.
- Discrete data: data that take separate, countable values, usually whole numbers. Rolling a die gives a whole number from 1 to 6.
- Continuous data: data that can take any value within a range, including decimals and fractions. The height of an individual is a good example.
List one difference between discrete and continuous variables. Provide an example of each.
One difference is that discrete variables take separate, countable values (usually whole numbers), while continuous variables can take any value in a range, including decimals and fractions.
- Discrete data: rolling a die gives a whole number.
- Continuous data: the height of an individual, because heights differ from person to person and are usually not whole numbers.
What does it mean when one says - “90th percentile”?
- It is the value below which 90% of the data falls.
What does it mean when one says - “10th percentile”?
- It is the value below which 10% of the data falls.
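Python's `statistics.quantiles` can compute these cut points. In this sketch with the hypothetical data 1 to 100, asking for `n=10` returns the nine decile cut points, so the first is the 10th percentile and the last is the 90th. Several percentile conventions exist, so other tools may give slightly different values.

```python
import statistics

# Hypothetical data: the whole numbers 1 to 100.
data = list(range(1, 101))

# n=10 splits the data into 10 equal groups and returns the 9 cut points (deciles).
deciles = statistics.quantiles(data, n=10)
p10, p90 = deciles[0], deciles[-1]

# Check the definition: the share of values that fall below each cut point.
share_below_p10 = sum(x < p10 for x in data) / len(data)
share_below_p90 = sum(x < p90 for x in data) / len(data)
print(f"10th percentile: {p10}, share of data below it: {share_below_p10:.0%}")
print(f"90th percentile: {p90}, share of data below it: {share_below_p90:.0%}")
```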
If one flips a fair coin many times, what is the probability of getting heads?
- 50%
If one flips a fair coin many times, what is the probability of getting tails?
- 50%
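A simulation illustrates the "many times" part: each flip of a fair coin lands heads with probability 0.5, and the share of heads in a long run of flips settles close to 50%. The helper `share_of_heads` is hypothetical, and the exact counts depend on the seed.

```python
import random

def share_of_heads(flips, seed=0):
    """Hypothetical helper: simulate `flips` tosses of a fair coin
    and return the fraction that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))  # heads with probability 0.5
    return heads / flips

for flips in (10, 1_000, 100_000):
    print(f"{flips:>7} flips: share of heads = {share_of_heads(flips):.3f}")
```

Short runs can drift well away from 0.5; only the long-run share is expected to be close to it.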
If one rolls a fair six-sided die {1, 2, 3, 4, 5, 6}, what is the probability of each face occurring?
- 1/6, or about 16.67%
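The same idea for the die, as a sketch: `fractions.Fraction` keeps the exact value 1/6, the six probabilities add up to 1, and a simulated run of rolls (fixed seed, hypothetical) gives relative frequencies near 1/6 for every face.

```python
import random
from collections import Counter
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
# Fair die: every face has the same exact probability.
probabilities = {face: Fraction(1, len(faces)) for face in faces}
print(probabilities[1], float(probabilities[1]))  # 1/6 0.16666666666666666

# Simulated rolls (fixed seed for repeatability).
rng = random.Random(0)
rolls = 60_000
counts = Counter(rng.randint(1, 6) for _ in range(rolls))
for face in faces:
    print(f"face {face}: relative frequency {counts[face] / rolls:.4f}")
```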
Explain any one symmetric distribution - either discrete or continuous.
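The normal distribution is a continuous, symmetric, bell-shaped distribution. It has a single peak at the mean, which is also the median, and the curve is a mirror image on either side of that peak: values the same distance above and below the mean have the same probability density.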
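Python's `statistics.NormalDist` can illustrate the symmetry. The mean of 170 and standard deviation of 10 (for example heights in cm) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# The peak is at the mean, and the mean and median coincide.
print(heights.mean, heights.median)

# Mirror image: points equally far below and above the mean have equal density...
print(heights.pdf(160), heights.pdf(180))

# ...and equal tail areas: P(X < 160) equals P(X > 180).
print(heights.cdf(160), 1 - heights.cdf(180))

# Exactly half of the distribution lies below the mean.
print(heights.cdf(170))
```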