5 | Introduction To Statistics Flashcards
(POLL)
Two wards of one hospital have very different survival rates for patients with heart problems. Ward 2 (second floor) performs much better than ward 1 (ground floor). Who is most likely responsible?
- The doctors on ward 2; they are just better!
- The porter, who usually asks the patients: “Do you feel fit enough to walk up the stairs to the second floor?”
- The patients, who just feel better on higher floors
- None of them
The porter, who usually asks the patients: “Do you feel fit enough to walk up the stairs to the second floor?”
(POLL)
Looking at the relation between dark chocolate consumption and IQ is:
- correlational research
- experimental research
- making me hungry
- none of these suggestions
correlational research
(POLL)
Double-blind trials are clinical trials in which neither the patient nor the doctor knows which medication is given. Select the true statements:
- they are correlational research
- they are experimental research
- they are worse than observational studies because they increase selection bias
- they are better than observational studies because they decrease selection bias
- they have no selection bias
- they still can have selection bias
- they are experimental research
- they are better than observational studies because they decrease selection bias
- they still can have selection bias
(POLL)
To record a patient's outcome after a viral infection, the following checkboxes were prepared for a survey: asymptomatic, common cold, long-term suffering, dead … What type of variable is this?
- discrete numerical 0, 1, 2, 3 etc for the levels
- continuous numerical 0.0, 1.0, 2.0, 3.0 for the levels
- nominal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
- ordinal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
ordinal categorical, 0 (asymptomatic), 1 (cold), 2 (suffer a lot), 3 (dead)
(POLL)
Which of the following measures are robust against outliers?
- mean
- trimmed mean
- Median
- trimmed mean
- Median
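A minimal sketch of why the trimmed mean and the median count as robust. The data values and the `trimmed_mean` helper are made up for illustration; the last value plays the outlier.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

# Hypothetical measurements; 100 is an obvious outlier.
data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 100]

plain_mean = statistics.mean(data)     # pulled towards the outlier
robust_mean = trimmed_mean(data, 0.1)  # drops 10 and 100 before averaging
middle = statistics.median(data)       # depends only on the central values

print(plain_mean, robust_mean, middle)
```

Replacing 100 by 1000 would change `plain_mean` a lot, while `robust_mean` and `middle` would stay the same.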
(POLL)
Which measures give information about the spread of the data?
- mean
- trimmed mean
- median
- IQR
- sd
- z-score
- IQR
- sd
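A small sketch, with made-up samples, showing that two data sets can share the same mean while IQR and sd tell them apart. The `iqr` helper is an illustrative name, and `statistics.quantiles` is used with its default method, which is only one of several quartile conventions.

```python
import statistics

# Two hypothetical samples with the same mean but different spread.
narrow = [48, 49, 50, 50, 50, 51, 52]
wide = [20, 35, 50, 50, 50, 65, 80]

def iqr(values):
    """Interquartile range: distance between the first and third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

for name, sample in (("narrow", narrow), ("wide", wide)):
    print(name,
          "mean:", statistics.mean(sample),
          "sd:", round(statistics.stdev(sample), 2),
          "IQR:", iqr(sample))
```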
Name some terms from statistics that are misleading when compared with how the same terms are used in science in general
significant, error, hypothesis
What is the aim of statistics?
descriptive:
- describe and summarize the data
- visualize trends for better understanding
inferential (draw conclusions about the population from a sample):
- decide whether two groups are different
- decide whether a sample differs from the total population
- describe the relationship between two variables
Statistics:
- can help decide whether an observed difference in the data is just a random one
- only estimates the “degree” of randomness
- can’t tell you how likely it is that there really is a difference
Sampling: requirements / best practices?
- sample items must be randomly selected (not some subset)
- items must be independent of each other (selecting one item should not alter the chance of other items being selected)
- have a plan before entering the greenhouse
- more samples are better
- if there are groups (eg experiment and control): balanced sampling is better
Name some common sampling problems
- your cohort and topic might change over time (cancer, virus)
- true population is more diverse than the population you were sampling from
- using a convenience sample rather than a random sample (falling cats)
- your measured variable is just a proxy for another variable (poll)
- imprecise measurements (misunderstandings, wrong scale for some people)
- combination of different measurements required
- even with clear results: there is room for interpretation
- statistics can’t help against bad experimental design
Three examples of poor sampling
- study about cats surviving falls: cats that ended up in the bin were not part of the study, only those that were taken to the vet.
- telephone interview: for the 1936 US election (Landon vs. Roosevelt), a landslide victory for Landon was predicted, but telephone owners were disproportionately conservative/Republican
- coronavirus in Germany: total cases were increasing, but no real random sampling took place, eg to check for antibodies. The numbers were based on reported illnesses
Name 4 different types of sampling strategies
- simple random sampling (SRS)
- systematic sampling
- stratified sampling
- cluster sampling
Simple random sampling?
simple random sampling:
all subsets of a given size from the sampling frame have an equal probability of being selected
eg roll a die to decide who to choose
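A minimal sketch of simple random sampling. The patient list, the sample size and the fixed seed are assumptions for illustration only.

```python
import random

# Hypothetical sampling frame: 100 numbered patients.
sampling_frame = [f"patient_{i:03d}" for i in range(1, 101)]

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# random.sample draws without replacement; every subset of size 10
# is equally likely to be chosen.
srs = rng.sample(sampling_frame, k=10)
print(srs)
```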
Systematic sampling?
systematic sampling
relies on arranging the study population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
eg every kth person
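One way to sketch "every kth person" in code. The random start within the first k items is a common convention assumed here, not something the card specifies; `systematic_sample` and the numbered population are illustrative.

```python
import random

def systematic_sample(ordered_items, k, rng=random):
    """Pick every k-th item after a random start within the first k items."""
    start = rng.randrange(k)
    return ordered_items[start::k]

# Hypothetical ordered study population of 100 people.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, rng=random.Random(1))
print(sample)
```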
Stratified sampling?
stratified sampling
dividing the population into subgroups (strata) based on categories, eg male and female, and then sampling from each subgroup
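A rough sketch of stratified sampling with proportional allocation, meaning the same fraction is drawn from every stratum. Proportional allocation, the `stratified_sample` helper and the 60/40 population are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 women ("f") and 40 men ("m").
people = [("f", i) for i in range(60)] + [("m", i) for i in range(40)]
chosen = stratified_sample(people, stratum_of=lambda p: p[0],
                           fraction=0.1, rng=random.Random(0))
print(chosen)
```

The sample keeps the 60/40 split of the population, which a simple random sample of the same size would only match on average.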
Clustered sampling?
Sometimes it is more cost-effective to select respondents in groups (‘clusters’). Sampling is often clustered by geography, or by time periods.
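A sketch of one-stage cluster sampling, one common variant: a few whole clusters are chosen at random and everyone in them is surveyed. The villages, their sizes and the number of chosen clusters are made up.

```python
import random

# Hypothetical population grouped by village (the clusters).
villages = {
    f"village_{v}": [f"v{v}_person_{p}" for p in range(20)]
    for v in range(1, 6)
}

rng = random.Random(7)  # fixed seed only for reproducibility

# Randomly choose whole clusters, then include every member of each one.
chosen_villages = rng.sample(sorted(villages), k=2)
respondents = [person for v in chosen_villages for person in villages[v]]
print(chosen_villages, len(respondents))
```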
Research Methods for the Sample - two types?
correlational and experimental
Correlational vs experimental research - what’s the difference?
Explain with the example: does reading books help with learning?
correlational research
– we don’t manipulate a variable
experimental research
– we manipulate a variable
Eg: does reading books help with learning?
correlational research
– we don’t manipulate a variable
– we observe what happens naturally w/o interfering
– we just collect answers for reading behaviour
experimental research
– we manipulate a variable
– divide our sample for reading into two groups
– one group must read statistics books
– the other group is not allowed to read statistics books
– after a month we summarize and compare the two groups
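A sketch of how the sample might be divided into the two groups. Random assignment is assumed here (the card only says "divide"); the participant names and group size are hypothetical.

```python
import random

# Hypothetical participants in the reading experiment.
participants = [f"student_{i}" for i in range(1, 21)]

rng = random.Random(3)  # fixed seed only for reproducibility
shuffled = participants[:]
rng.shuffle(shuffled)

# The first half reads statistics books, the second half does not.
half = len(shuffled) // 2
reading_group = shuffled[:half]
control_group = shuffled[half:]
print(reading_group, control_group)
```

Assigning by chance rather than by choice keeps keen readers from ending up mostly in one group.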
Correlational vs experimental - what is the preferred method?
experimental
Correlational vs experimental - why would you choose correlational?
Experimental research can be difficult or impossible, for example for:
- ethical reasons
- financial reasons
What are the four different levels of measurement, in ascending order of precision?
Data types:
Categorical (quality)
- nominal
- ordinal
Numerical (quantity)
- discrete
- continuous
(the distinction is often not clear-cut, eg with height)
Define this level of measurement and give an example:
nominal
A categorical level of measurement that doesn’t have any order.
eg
- gender: female, male (binary)
- smoker: yes, no (binary)
- protein structures: H, E, …
- nucleotides: A,C, G, T, U
Define this level of measurement and give an example:
ordinal
A categorical level of measurement with a specified order.
eg:
- age: young, medium, old
- grade: 1,2,3,4,5
- month: 01..12(?)
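A sketch of an ordinal variable in code, reusing the outcome scale from the virus-infection poll in these cards. The `Outcome` class and the observed values are hypothetical. The codes carry an order, so comparisons and the median are meaningful, but the distances between levels are not defined.

```python
from enum import IntEnum
import statistics

class Outcome(IntEnum):
    """Ordinal codes: only the order matters, not the numeric distance."""
    ASYMPTOMATIC = 0
    COLD = 1
    LONG_TERM = 2
    DEAD = 3

# Hypothetical survey answers.
observed = [Outcome.COLD, Outcome.ASYMPTOMATIC, Outcome.DEAD,
            Outcome.COLD, Outcome.LONG_TERM]

worst = max(observed)                      # ordering is meaningful
typical = statistics.median_low(observed)  # the median needs only the order
print(worst.name, typical.name)
```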
Define this level of measurement and give an example:
discrete
A numerical level of measurement that takes only distinct, countable values (typically whole numbers).
eg:
– age: 6, 8, 84
– height: 112, 176, 161cm
– number of helices per 1000 AA
– length of helices
Define this level of measurement and give an example:
continuous
A numerical level of measurement that can take any value within a range (not restricted to discrete steps).
eg:
– weight: 79.99kg, 72kg,…
– height: 12.2, 12.5, 15.0