Introduction To Data Flashcards
Three components of statistics
Collect
Analyze
Infer
Study of how best to collect, analyze and draw conclusions from data
Statistics
In a study, the group that serves as the reference point for comparison with the treatment group is the
Control group
Single number summarizing a large amount of data
Summary statistic
The first step in most analyses
Effective presentation and description of data
Each row in a data table is a
Case
Each column in a data table is a
Variable
Rows (cases) + columns (variables)
Data matrix
Another term for case
Unit of observation or an observational unit
A variable with values that can be added, subtracted or averaged is
Numerical
A numerical variable that can only take whole non-negative numbers (0, 1, 2, ...) is
Discrete
A variable whose values are categories is
Categorical
Each possible value of a categorical variable is called a
Level
A categorical variable whose levels have a natural ordering is
Ordinal
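The cards above on cases, variables and variable types can be pictured with a tiny data matrix. This is only a sketch: the three cases and the variable names (`height_cm`, `siblings`, `blood_type`, `education`) are made up for illustration.

```python
from statistics import mean

# Hypothetical data matrix: each dict is one case (row),
# each key is one variable (column).
data = [
    {"height_cm": 170.0, "siblings": 2, "blood_type": "O", "education": "college"},
    {"height_cm": 158.5, "siblings": 0, "blood_type": "A", "education": "high school"},
    {"height_cm": 181.5, "siblings": 1, "blood_type": "O", "education": "graduate"},
]

# "siblings" is numerical and discrete: whole non-negative counts
# that can sensibly be averaged.
mean_siblings = mean(row["siblings"] for row in data)

# "blood_type" is categorical: its distinct values are its levels.
blood_type_levels = sorted({row["blood_type"] for row in data})

# "education" is ordinal: its levels have a natural ordering,
# which is written out here by hand (an assumption of this sketch).
education_order = ["high school", "college", "graduate"]
highest_education = max(
    (row["education"] for row in data), key=education_order.index
)
```

The mean number of siblings is a summary statistic: a single number standing in for the whole column.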
When two variables show some connection with one another, they are called ___________________ or _____________________ variables.
Associated; dependent
If one variable increases as the other decreases, there is a
Negative association
If one variable increases as the other increases, there is a
Positive association
Two variables that are not associated are said to be
Independent
Can a pair of variables be associated and independent at the same time?
No
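One rough way to check the direction of an association between two numerical variables is the sign of their sample covariance. The sketch below assumes made-up study data and a hypothetical helper `association_direction`; it only detects a linear trend, so a result of zero does not by itself show that two variables are independent.

```python
def association_direction(xs, ys):
    """Return the direction of the linear trend between two numerical variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when the variables tend to
    # rise together, negative when one rises as the other falls.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_deviation > 0:
        return "positive"
    if co_deviation < 0:
        return "negative"
    return "no linear trend"

# Hypothetical values for four cases.
hours_studied = [1, 2, 3, 4]
exam_score = [50, 60, 65, 80]
errors_made = [9, 7, 4, 2]

score_trend = association_direction(hours_studied, exam_score)
error_trend = association_direction(hours_studied, errors_made)
```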
Each research question refers to a target
Population
A subset of cases, often a small fraction of the population, is known as a
Sample
Data collected in a haphazard fashion is
Anecdotal evidence
If someone was permitted to pick and choose exactly the included subjects in a sample, this introduces _____________ into a sample.
Bias
Most basic random sample is
Simple random sample
In simple random sample, each case in a population has a/an __________ chance of being included
Equal
Bias can crop up even in a random sample. If only 30% of the people randomly sampled actually respond, it is unclear whether the results are __________________ of the entire population. This _____________ bias can skew results.
Representative; non-response
When individuals who are easily accessible are more likely to be included in the sample, this is a _____________________.
Convenience sample
An explanatory variable is one suspected of affecting the
Response variable
Association implies causation. True or false?
False. Association does not imply causation.
Two primary types of data collection
Observational studies
Experiments
Collecting data in a way that does not directly interfere with how the data arise is
Observational study
When researchers want to investigate the possibility of a causal connection, they conduct a/an
Experiment
When individuals are randomly assigned to a group, the experiment is called a
Randomized experiment
In a two group experiment, the fake treatment is called a
Placebo
Causation can only be inferred from a ______________.
Randomized experiment
A variable correlated with both the explanatory and response variables
Confounding variable
Two forms of observational studies
Prospective
Retrospective
Which type of observational study identifies individuals and collects information as events unfold?
Prospective
Which type of observational study collects data after events have taken place, e.g., researchers review past events in medical records?
Retrospective
Three random sampling techniques
Simple
Stratified
Cluster
Most intuitive form of random sampling
Simple random sampling
Drawing names from a fishbowl is an example of
Simple random sampling
Divide-and-conquer sampling strategy
Stratified sampling
When similar cases are grouped together into strata and simple random sampling is then employed within each stratum, this is
Stratified sampling
A two-stage simple random sample is
A cluster sample
Similar to stratified sampling, but with no requirement to sample from every group
Cluster sampling
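The three sampling techniques on the cards above can be sketched side by side. This is an illustration only: the population of 40 numbered cases and the four group labels are hypothetical, and the cluster version follows the two-stage description used in these cards (pick clusters at random, then sample within the chosen clusters).

```python
import random

def group_cases(cases, key_of):
    """Split cases into groups (strata or clusters) by a key function."""
    groups = {}
    for case in cases:
        groups.setdefault(key_of(case), []).append(case)
    return groups

def simple_random_sample(cases, n, rng):
    """Every case has an equal chance of being included."""
    return rng.sample(cases, n)

def stratified_sample(cases, stratum_of, n_per_stratum, rng):
    """Take a simple random sample inside EVERY stratum."""
    sample = []
    for members in group_cases(cases, stratum_of).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(cases, cluster_of, n_clusters, n_per_cluster, rng):
    """Pick some clusters at random, then sample only within those."""
    clusters = group_cases(cases, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for key in chosen:
        sample.extend(rng.sample(clusters[key], n_per_cluster))
    return sample

# Hypothetical population: 40 cases, each tagged with one of 4 groups.
population = [(i, "group%d" % (i % 4)) for i in range(40)]
group_of = lambda case: case[1]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(population, 8, rng)
strat = stratified_sample(population, group_of, 2, rng)
clust = cluster_sample(population, group_of, 2, 4, rng)
```

All three samples contain 8 cases, but the stratified sample is guaranteed to cover all four groups, while the cluster sample covers only the two groups that were drawn.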
Studies where researchers assign treatments to cases are called
Experiments
Four principles of experimental design
Controlling
Randomization
Replication
Blocking
Asking all patients to drink a 12-ounce glass of water with the pill demonstrates
Control
To even out differences and prevent accidental bias, what is done?
Randomization
Observing more cases to estimate an effect more accurately, or repeating an entire study to verify an earlier finding, is
Replication
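If a variable other than the treatment is suspected of influencing the response, first group the cases into blocks by that variable, then randomize the cases within each block to the treatment groups. This is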
Blocking
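Blocking combined with randomization can be sketched as follows. The patient list, the `risk` blocking variable and the helper name `randomize_within_blocks` are all hypothetical; the even split within each block is one simple choice, not the only valid design.

```python
import random

def randomize_within_blocks(cases, block_of, rng):
    """Group cases into blocks, then randomly split each block
    between the treatment and control groups."""
    blocks = {}
    for case in cases:
        blocks.setdefault(block_of(case), []).append(case)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]      # copy so the input list is untouched
        rng.shuffle(shuffled)      # randomization happens inside the block
        half = len(shuffled) // 2
        for case in shuffled[:half]:
            assignment[case] = "treatment"
        for case in shuffled[half:]:
            assignment[case] = "control"
    return assignment

# Hypothetical patients: 6 high-risk and 10 low-risk.
patients = [("p%d" % i, "high" if i < 6 else "low") for i in range(16)]
risk_of = lambda patient: patient[1]

assignment = randomize_within_blocks(patients, risk_of, random.Random(1))

treated_high = sum(
    1 for p, arm in assignment.items() if arm == "treatment" and p[1] == "high"
)
treated_low = sum(
    1 for p, arm in assignment.items() if arm == "treatment" and p[1] == "low"
)
```

Because the split is made separately in each block, both arms end up with the same mix of high-risk and low-risk patients, so risk level cannot accidentally line up with the treatment.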