Chapter 1 - Data Visualisation Flashcards
Remove observations (rows) with missing values
DF = DF %>% tidyr::drop_na()
dplyr::glimpse()
str()
Inspect structure of objects
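A minimal sketch of the two cards above, assuming a small made-up data frame named `df`. It uses base R's `complete.cases()` so it runs without extra packages; `tidyr::drop_na(df)` should keep the same rows.

```r
# Hypothetical data frame: rows 2 and 3 each contain a missing value
df <- data.frame(
  id    = 1:4,
  score = c(10.5, NA, 7.2, 8.8),
  group = c("a", "b", NA, "a")
)

# Keep only rows with no missing values (base R stand-in for tidyr::drop_na())
df_complete <- df[complete.cases(df), ]

# Inspect the structure of the result: column types and first values
str(df_complete)

# Number of rows that survived
nrow(df_complete)
```

If dplyr is installed, `dplyr::glimpse(df_complete)` gives a similar overview with one column per line.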
Matrix[1,2]
Access element in 1st row, 2nd column
Matrix[1:3, 2:4]
Returns matrix with data from rows 1, 2, 3 and columns 2, 3, 4
Matrix[ ,1]
All elements of the 1st column
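The three indexing cards above can be tried on a small example. The matrix `m` below is made up for illustration; `drop = FALSE` is an extra argument not covered by the cards, shown only because a single column is otherwise returned as a plain vector.

```r
# Hypothetical 4 x 5 matrix filled column by column with 1..20
m <- matrix(1:20, nrow = 4, ncol = 5)

m[1, 2]               # single element: row 1, column 2
m[1:3, 2:4]           # 3 x 3 sub-matrix: rows 1-3, columns 2-4
m[, 1]                # all elements of column 1, returned as a vector
m[, 1, drop = FALSE]  # same column, kept as a 4 x 1 matrix
```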
Literary Digest poll (1936)
Roosevelt won a large number of votes in the US presidential election despite the poll’s predicted victory for Landon.
10 million people were polled via a mail survey, of whom 2.4 million responded.
- Americans were suffering from the Great Depression at the time
Survey sampling
- selecting a subset of observations from an entire population to draw conclusions about the whole population
- consider design, maximise profit, high reliability, optimisation with fewer resources
- choose the type of survey design that’s appropriate for the study
Selection bias
Taking a large sample does not fix a biased selection method; it only repeats the error on a large scale
- inaccurate representation of the population
- getting opinions shaped by the particular questions asked, e.g. “Should a doctor be allowed to murder unborn kids who can’t defend themselves?”
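A small simulation can illustrate why a large sample does not cure selection bias. Everything here is an assumption made for the sketch: the population size, the 60% true support, and the reach probabilities are invented numbers, not figures from any real poll.

```r
set.seed(1)

# Hypothetical population of 100,000 voters; about 60% support candidate A
n_pop <- 100000
supports_a <- rbinom(n_pop, size = 1, prob = 0.6)

# Assumed biased sampling frame: supporters of A are half as likely
# to be reached as everyone else
reach_prob <- ifelse(supports_a == 1, 0.2, 0.4)
reached <- rbinom(n_pop, size = 1, prob = reach_prob) == 1

# Large biased sample (everyone reached) vs. small simple random sample
random_idx <- sample(n_pop, size = 500)

true_share   <- mean(supports_a)
biased_share <- mean(supports_a[reached])
random_share <- mean(supports_a[random_idx])

c(true = true_share, biased = biased_share, random = random_share,
  n_biased = sum(reached), n_random = length(random_idx))
```

Under these assumptions the biased sample is tens of thousands of voters yet lands further from the true share than the random sample of 500.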
Measurement bias
- recall bias
- misinterpret questions
- sensitive questions
- wording of question
- sampling method influences data obtained
Non-response bias
- certain groups are under-represented due to lack of participation
E.g. those who choose not to respond or participate in a research experiment, or those who don’t want to leave a tip or fill out a ‘customer satisfaction’ form after eating at a restaurant
Randomised controlled double-blind trials
Subjects are allocated randomly to a treatment group and a control group
- neither subjects nor investigators know who’s in which group
- investigator compares responses of the 2 groups; if there is a difference in response, it’s likely caused by the treatment
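Random allocation with blinding might be sketched as below. The 20 subjects, the equal group sizes, and the `kit_code` column are assumptions made for illustration, not a prescribed trial protocol.

```r
set.seed(42)

# Hypothetical trial with 20 subjects identified only by number
subjects <- data.frame(id = 1:20)

# Shuffle an equal number of "treatment" and "control" labels
subjects$group <- sample(rep(c("treatment", "control"), each = 10))

# Blinding: subjects and investigators see only an uninformative code;
# the code-to-group key is held back until the analysis
subjects$kit_code <- sprintf("K%03d", sample(100:999, 20))

# Check the allocation is balanced
table(subjects$group)
```

Because chance alone decides the labels, the two groups should be similar on average in every other respect, which is what lets a difference in response be attributed to the treatment.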
Observational studies
- don’t establish causation, only association
- used in educational research
- a confounding variable, if present, can lead to Simpson’s paradox
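Simpson’s paradox can be shown with a tiny table. The counts below are made up for illustration: two hypothetical teaching methods, with students’ prior attainment acting as the confounding variable.

```r
# Made-up counts: passes and totals for two teaching methods,
# split by students' prior attainment (the confounder)
d <- data.frame(
  method = c("A", "A", "B", "B"),
  prior  = c("high", "low", "high", "low"),
  passed = c(18, 48, 68, 10),
  total  = c(20, 80, 80, 20)
)

# Pass rate within each prior-attainment group: A is ahead in both
d$rate <- d$passed / d$total
d

# Pass rate after pooling the groups: B is ahead overall
pooled <- aggregate(cbind(passed, total) ~ method, data = d, FUN = sum)
pooled$rate <- pooled$passed / pooled$total
pooled
```

The reversal arises because method A was mostly used with low-attainment students and method B mostly with high-attainment students, so the pooled comparison mixes the effect of the method with the effect of prior attainment.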
Confounding variables
When the treatment and control groups differ in some third variable that also influences the response being studied
- e.g. if not all subjects keep taking the treatment/placebo, we can get confounding between adherers and non-adherers