Introduction To Statistics Flashcards
(POLL)
Two sections of one hospital have very different survival rates for patients with heart problems. Station 2 (second floor) performs much better than station 1 (ground floor). Who is most likely responsible?
- The doctors on station two: they are just better!
- The porter, who usually asks the patients: “Do you feel fit enough to walk the stairs to the second floor?”
- The patients, who just feel better on higher floors
- None of them
The porter, who usually asks the patients: “Do you feel fit enough to walk the stairs to the second floor?” (selection bias: the fitter patients end up on the second floor)
(POLL)
Looking at the relation between dark chocolate consumption and IQ is:
- correlational research
- experimental research
- making me hungry
- none of these suggestions
correlational research
(POLL)
Double-blind trials are clinical trials in which neither the patient nor the doctor knows which treatment is given. Select the true statements:
- they are correlational research
- they are experimental research
- they are worse than observational studies because they increase selection bias
- they are better than observational studies because they decrease selection bias
- they have no selection bias
- they still can have selection bias
- they are experimental research
- they are better than observational studies because they decrease selection bias
- they still can have selection bias
(POLL)
To evaluate the outcome of patients after a virus infection, the following boxes were prepared for a survey: asymptomatic, common cold, long-term suffering, dead … What type of variable is this?
- discrete numerical 0, 1, 2, 3, etc. for the levels
- continuous numerical 0.0, 1.0, 2.0, 3.0 for the levels
- nominal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
- ordinal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
ordinal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
(POLL)
Which of the following measures are robust against outliers?
- mean
- trimmed mean
- median
- trimmed mean
- median
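A small Python sketch of why the trimmed mean and the median resist outliers while the mean does not. The data values are invented for illustration, and `trimmed_mean` is a hypothetical helper (Python's `statistics` module has no trimmed mean); it drops a fixed proportion of values from each end, which is one common convention among several.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end (hypothetical helper)."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    kept = data[k:len(data) - k] if k > 0 else data
    return statistics.mean(kept)

clean = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9]
with_outlier = clean[:-1] + [90]  # replace the largest value (9) by an outlier (90)

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, statistics.mean(data), trimmed_mean(data), statistics.median(data))
# clean:        mean 6.3,  trimmed mean 6.25, median 6.0
# with outlier: mean 14.4, trimmed mean 6.25, median 6.0
```

Only the mean moves when one extreme value is added; the trimmed mean discards it and the median only depends on the middle of the ordered data.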
(POLL)
Which measures give information about the spread of the data?
- mean
- trimmed mean
- median
- IQR
- sd
- z-score
- IQR
- sd
Name some terms from statistics that are deceptive when compared with the same terms in a general scientific context.
significant, error, hypothesis
Explain the difference between dependent and independent variables and give an example.
dependent:
- depends on another variable –> is an outcome variable
independent:
- influences another (dependent) variable –> is a predictor variable
- might be manipulated
example:
let’s assume we can predict weight based on sex and height (machine learning!)
- weight is the outcome variable (dependent)
- sex and height are predictor variables (independent)
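To make the two roles concrete, here is a minimal sketch in Python with NumPy, assuming an ordinary least-squares linear model (one possible choice of "machine learning" model; the card does not name one). The six people and the rule that generates their weights are invented, so the fit simply recovers the invented coefficients; the point is only which variables go in as predictors and which one comes out.

```python
import numpy as np

# Hypothetical toy data: weights are generated from an invented rule, not measured
height = np.array([160.0, 165.0, 170.0, 175.0, 180.0, 185.0])  # independent (predictor), cm
sex = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])                  # independent (predictor), 0/1 coded
weight = -100.0 + 0.95 * height + 6.0 * sex                     # dependent (outcome), kg

# Design matrix: intercept column plus the two predictors
X = np.column_stack([np.ones_like(height), height, sex])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
print(coef)  # close to the invented rule: intercept -100, 0.95 per cm, +6 for sex = 1

# Predict the outcome for a new (hypothetical) person: height 172 cm, sex = 1
new_person = np.array([1.0, 172.0, 1.0])
predicted = new_person @ coef
print(predicted)
```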
Define descriptive statistics.
How does it differ from inferential statistics?
📊descriptive statistics:
- describe main features of data (sample)
- in quantitative terms
vs inferential statistics:
- used to draw conclusions (inferences) that go beyond the observed data
- data (sample👥) –> population 👥👥👥
Define inferential statistics
Statistical inference or statistical induction comprises the use of statistics and random sampling to make inferences concerning some unknown aspect of a population.
Name the different sample data centers
- Mode: most frequent value
- Median: value where 50% of the data are smaller and 50% are larger (robust against outliers); related spread measure: IQR (interquartile range) = 3rd quartile − 1st quartile = middle 50% of the data
- Mean
Which sample data center can be used for nominal datatypes?
- mode
Which sample data center can be used for ordinal datatypes?
- mode
- (median ?)
- (mean ?)
Which sample data center can be used for numerical datatypes?
- median
- mean
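A short Python sketch matching each data type to a center it supports, using the standard `statistics` module. The example values are invented; the ordinal codes reuse the survey levels from the virus-infection card (0 asymptomatic, 1 cold, 2 suffer a lot, 3 dead). `median_low` always returns an observed value, so for ordinal data it yields an actual category; whether a mean of ordinal codes is meaningful is debated, which is why that card marks it with "?".

```python
import statistics

# Hypothetical example data
blood_groups = ["A", "O", "B", "A", "AB", "A", "O"]  # nominal: no order, only the mode makes sense
severity = [0, 1, 1, 2, 0, 1, 3]                     # ordinal codes: 0 asymptomatic ... 3 dead
ages = [23, 35, 31, 52, 47, 29, 38]                  # numerical

print(statistics.mode(blood_groups))         # "A" (most frequent category)
print(statistics.median_low(severity))       # 1 (an actual category code: cold)
print(statistics.mean(ages), statistics.median(ages))
```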
How can one describe the sample distribution?
with:
- max, min
- quantiles
- IQR (interquartile range)
- standard deviation
- CV (coefficient of variation)
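The descriptors listed above can be computed in a few lines of Python; `describe` is a hypothetical helper and the data are invented. Note that software packages use different quartile conventions (here the default `method="exclusive"` of `statistics.quantiles`), so quartiles and IQR can differ slightly between tools.

```python
import statistics

def describe(values):
    """Summarize a numerical sample: min, max, quartiles, IQR, sd, CV (hypothetical helper)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points (exclusive method)
    sd = statistics.stdev(values)                   # sample sd (n - 1 in the denominator)
    mean = statistics.mean(values)
    return {
        "min": min(values),
        "max": max(values),
        "quartiles": (q1, q2, q3),
        "IQR": q3 - q1,
        "sd": sd,
        "CV": sd / mean,  # only meaningful for ratio-scale data with a positive mean
    }

summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)  # quartiles (4.0, 4.5, 6.5), IQR 2.5
```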
Do the following describe the sample distribution?
- SEM
- CI
- P-value
No; they rather describe how well the sample estimates the population (inferential statistics).
SEM:
What does this stand for?
What does it measure?
How is it calculated?
stands for:
standard error of the mean.
measures:
how far the sample’s mean is likely to be from the population mean (the standard deviation of the sampling distribution of the mean).
calculate:
SD / sqrt(N)
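A Python sketch of SD / sqrt(N), plus a simulation showing what the SEM measures: the spread of sample means across many repeated samples. The population (normal, mean 100, SD 15) and the sample size are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def sem(values):
    """Standard error of the mean, estimated from one sample: SD / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Assumed population for illustration: normal with mean 100 and SD 15
random.seed(1)
MU, SIGMA, N = 100, 15, 25

one_sample = [random.gauss(MU, SIGMA) for _ in range(N)]
print("SEM estimated from one sample:", sem(one_sample))

# Compare with the spread of many sample means (the sampling distribution of the mean)
means = [statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(2000)]
print("SD of 2000 sample means:", statistics.stdev(means))
print("theoretical SIGMA / sqrt(N):", SIGMA / math.sqrt(N))  # 3.0
```

Because of the square root, quadrupling N only halves the SEM.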
CI:
What does this stand for?
What does it measure?
How is it calculated?
Confidence interval
Range of values that, at a chosen confidence level (e.g. 95%), is expected to contain the true population value; if the sampling were repeated many times, about 95% of such intervals would contain it.
Confidence, in statistics, is another way to describe probability.
CI = estimate plus and minus the variation in that estimate (critical value × standard error), e.g. mean ± 1.96 × SEM for an approximate 95% CI.
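A minimal Python sketch of "estimate ± critical value × SEM" for a mean, assuming a normal approximation (critical value from `statistics.NormalDist`). This approximation suits large samples; for a small sample like the invented one below, a t critical value (about 2.36 for 7 degrees of freedom at 95%) would give a wider and more appropriate interval.

```python
import statistics

def mean_ci(values, confidence=0.95):
    """Approximate CI for the mean: mean ± z * SEM (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9]))        # roughly (3.52, 6.48)
print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9], 0.99))  # wider: more confidence needs a wider range
```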
P-value
What does this stand for?
What does it measure?
How is it calculated?
p stands for probability
P-values are used in hypothesis testing to help decide whether to reject the null hypothesis (inferential statistics).
Describes how likely it is to find observations at least as extreme as the ones observed if the null hypothesis were true.
Calculated from a statistical test.
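One way a p-value is "calculated from a statistical test" can be sketched with an exact binomial test, an example the cards do not mention. Suppose (hypothetically) a coin lands heads 9 times in 10 flips, and the null hypothesis is that the coin is fair (p = 0.5). The hypothetical helper below uses one common convention for two-sided tests: every outcome that is no more probable than the observed one under the null hypothesis counts as "at least as extreme".

```python
from math import comb

def binomial_p_value(k, n, p=0.5):
    """Two-sided exact binomial p-value (hypothetical helper, 'no more likely' convention)."""
    # Probability of every possible count 0..n if the null hypothesis is true
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum all outcomes at least as extreme, i.e. no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties
    extreme = sum(prob for prob in probs if prob <= observed * (1 + 1e-7))
    return min(1.0, extreme)

print(binomial_p_value(9, 10))  # 0.021484375, i.e. 22/1024
print(binomial_p_value(5, 10))  # 1.0: 5 heads is the most likely outcome for a fair coin
```

A p-value of about 0.021 is below the usual 0.05 threshold, so under that convention one would reject the null hypothesis of a fair coin.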
IQR
Stands for?
Meaning?
Interquartile range.
In descriptive statistics: tells you spread of middle half of distribution.
Quartiles segment any distribution that’s ordered from low to high into four equal parts.
The interquartile range (IQR) contains the second and third quartiles, or the middle half of your data set.
What is meant by parameters vs statistics?
A parameter is a number describing a whole population (e.g., population mean)
A statistic is a number describing a sample (e.g., sample mean).
We use the sample to estimate the parameters of the population.
Parameters and statistics can have the same name but mean different things (e.g. the sample mean ȳ estimates μ, the mean of the population).
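A Python sketch of a parameter versus a statistic, assuming a simulated population of 100,000 normally distributed values (mean 170, SD 10) that stands in for a population we normally cannot observe in full. μ is computed from the whole population; ȳ from a random sample of 200, and it estimates μ.

```python
import random
import statistics

# Hypothetical population (in practice, mu is unknown)
random.seed(42)
population = [random.gauss(170, 10) for _ in range(100_000)]
mu = statistics.fmean(population)  # parameter (Greek letter): population mean

sample = random.sample(population, 200)
y_bar = statistics.fmean(sample)   # statistic (Latin letter): sample mean, estimates mu
print(f"mu = {mu:.2f}, y_bar = {y_bar:.2f}")
```

Drawing another sample would give a different ȳ, while μ stays fixed; that sample-to-sample variation is what the SEM quantifies.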
How can you tell which one is a parameter and which one is a statistic (e.g. ȳ, μ)?
Usually:
- use Latin letters for statistics (sample)
- use Greek letters for parameters (population)
How is the deviation of samples and populations described?
- **deviation from the mean: s, sd (sample), σ (population)**
- (sample variance s², population variance σ²)
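Python's `statistics` module distinguishes the two cases: `variance`/`stdev` divide by n − 1 (sample s², s, which estimate the population values), while `pvariance`/`pstdev` divide by N (the data are treated as the whole population, σ², σ). The data below are invented.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32

# Treat the data as a sample: divide by n - 1
s2 = statistics.variance(data)   # 32 / 7, about 4.571
s = statistics.stdev(data)

# Treat the data as the whole population: divide by N
sigma2 = statistics.pvariance(data)  # 32 / 8 = 4
sigma = statistics.pstdev(data)      # 2.0
```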
How is the uncertainty (quality) of results described in inferential statistics?
SEM
CI