Chapter 1 - Introduction to data Flashcards

1
Q

Randomized Experiment

A

An experiment in which individuals are randomly assigned to the treatment groups.

2
Q

Anecdotal evidence

A

Very little data. For example, if we have data on only one person, and that person dies after taking a medicine, this may be an extreme case, even though the medicine may well be a good one after all.
Be careful of data collected in a haphazard fashion. Such evidence may be true and verifiable,
but it may only represent extraordinary cases.

3
Q

What is a summary statistic?

A

A summary statistic is a single number summarizing a large amount of data.
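As a minimal sketch, assuming a small hypothetical list of daily email counts, the mean and the median are two summary statistics that each reduce the whole list to one number:

```python
from statistics import mean, median

# Hypothetical data: number of emails received on each of five days.
emails_per_day = [4, 8, 6, 5, 7]

# Each summary statistic condenses the whole list into a single number.
avg = mean(emails_per_day)    # arithmetic mean
mid = median(emails_per_day)  # middle value after sorting

print(avg, mid)
```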

4
Q

What are variables?

A

In a data matrix, columns represent characteristics, called variables.

5
Q

What is a data matrix?

A

A table: a convenient and common way to organize data, especially when collecting data in a spreadsheet. Each row of a data matrix corresponds to a unique case (observational unit), and each column corresponds to a variable.
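A minimal sketch of the row/column idea in plain Python, assuming a tiny hypothetical data set (the names and values are made up): each row is one case, and each column is one variable.

```python
# Hypothetical data matrix: each inner list is one row (case),
# and each position in a row matches one column (variable).
columns = ["name", "age", "smoker"]
rows = [
    ["Ann", 34, "no"],
    ["Bo", 51, "yes"],
    ["Cai", 27, "no"],
]

# A row describes one case (observational unit) across all variables.
second_case = dict(zip(columns, rows[1]))

# A column collects one variable across all cases.
age_col = columns.index("age")
ages = [row[age_col] for row in rows]

print(second_case)  # {'name': 'Bo', 'age': 51, 'smoker': 'yes'}
print(ages)         # [34, 51, 27]
```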

6
Q

What is a case or observational unit?

A

A row in a table (data matrix).

7
Q

What is a numeric variable?

A

A variable is numerical if it can take a wide range of numerical values and it is sensible to add, subtract, or take averages with those values.

8
Q

What is a discrete variable?

A

A discrete variable is a numerical variable that can only take values with jumps between them, typically whole numbers such as counts.

9
Q

What is a continuous variable?

A

A numerical variable that can take any value within a range, not just whole numbers.

10
Q

What is a categorical variable?

A

A variable whose values are categories, often recorded as text. The possible values of a categorical variable are called the variable’s levels.

11
Q

What are ordinal and nominal variables?

A

An ordinal variable is a categorical variable whose levels have a natural ordering. A categorical variable without such an ordering is called a nominal variable.
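A small sketch of the difference, assuming hypothetical education and eye-color data: for an ordinal variable we can write down the order of its levels and sort by it, while for a nominal variable any ordering (here alphabetical) is arbitrary.

```python
# Hypothetical categorical data.
education = ["high school", "PhD", "bachelor", "high school"]  # ordinal
eye_color = ["brown", "blue", "green", "brown"]                # nominal

# Ordinal: the levels have a natural order, so we encode it explicitly
# (this particular list of levels is an assumption for the example).
EDUCATION_ORDER = ["high school", "bachelor", "master", "PhD"]
ranked = sorted(education, key=EDUCATION_ORDER.index)
print(ranked)  # ['high school', 'high school', 'bachelor', 'PhD']

# Nominal: only the set of levels matters; alphabetical order carries no meaning.
levels = sorted(set(eye_color))
print(levels)  # ['blue', 'brown', 'green']
```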

12
Q

What is a scatterplot?

A

Scatterplots are one type of graph used to study the relationship between two numerical variables.

13
Q

What is it called when two variables show some connection with one another?

A

When two variables show some connection with one another, they are called
associated variables. Associated variables can also be called dependent variables and vice-versa.

14
Q

What is it called when two variables don’t have a connection with one another?

A

Independent

15
Q

What are explanatory and response variables?

A

When we suspect one variable might causally affect another, we label the first variable the
explanatory variable and the second the response variable.
For many pairs of variables, there is no hypothesized relationship, and these labels would not
be applied to either variable in such cases.

16
Q

What is an observational study?

A

Researchers perform an observational study when they collect data in a way that does not
directly interfere with how the data arise. For instance, researchers may collect information via
surveys, review medical or company records, or follow a cohort of many similar individuals to form
hypotheses about why certain diseases might develop. In each of these situations, researchers merely
observe the data that arise. In general, observational studies can provide evidence of a naturally
occurring association between variables, but they cannot by themselves show a causal connection.
Observational studies are generally only sufficient to show associations or form
hypotheses that we later check using experiments.

17
Q

What is an experiment?

A

When researchers want to investigate the possibility of a causal connection, they conduct an
experiment. Usually there will be both an explanatory and a response variable. For instance, we
may suspect administering a drug will reduce mortality in heart attack patients over the following
year. To check if there really is a causal connection between the explanatory variable and the
response, researchers will collect a sample of individuals and split them into groups. The individuals
in each group are assigned a treatment.

18
Q

What is a randomized experiment?

A

An experiment in which people are randomly assigned to the groups of the experiment (e.g. treatment and control).
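A minimal sketch of random assignment, assuming hypothetical patient IDs and a simple two-group design: shuffling the list makes every ordering equally likely, so neither the researcher nor the patients choose who gets the treatment.

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into a treatment group and a control group.

    With an odd number of participants, the control group gets the extra person.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical patient IDs.
patients = [f"patient{i}" for i in range(1, 11)]
treatment, control = randomize(patients, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

The fixed seed only makes the example reproducible; in a real study the assignment would not be chosen by hand.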

19
Q

What is a placebo?

A

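A fake treatment, often given to the control group so that patients do not know whether they are receiving the real treatment.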

20
Q

What is a sample?

A

Often it is too expensive
to collect data for every case in a population. Instead, a sample is taken. A sample represents
a subset of the cases and is often a small fraction of the population. For instance, 60 swordfish
(or some other number) in the population might be selected, and this sample data may be used to
provide an estimate of the population average and answer the research question.

21
Q

What is the problem with selecting samples by hand?

A

When selecting samples by hand, we run the risk of picking a biased sample, even if the bias isn’t intended.

22
Q

What is a simple random sample?

A

The most basic random sample is called a simple random sample, which is
equivalent to using a raffle to select cases. This means that each case in the population has an equal
chance of being included and there is no implied connection between the cases in the sample.
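A minimal sketch of the raffle idea, assuming a hypothetical population of 1,000 numbered cases with made-up measurements: `random.Random.sample` draws without replacement and gives every case the same chance of selection, and the sample mean then serves as an estimate of the population mean.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: 1,000 numbered cases, each with one measured value.
population = {case_id: rng.uniform(0.0, 2.0) for case_id in range(1000)}

# Simple random sample: like drawing 60 tickets from a raffle drum without
# replacement, so every case has the same chance of being selected.
sample_ids = rng.sample(sorted(population), k=60)

estimate = mean(population[i] for i in sample_ids)  # sample mean
truth = mean(population.values())                   # population mean (usually unknown)
print(f"sample estimate: {estimate:.3f}  population mean: {truth:.3f}")
```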

23
Q

What is the problem if the non-response rate is high?

A

If only 30% of the people randomly sampled for
a survey actually respond, then it is unclear whether the results are representative of the entire
population. This non-response bias can skew results.
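A small simulation sketch of how this can happen. It assumes, purely for illustration, a hypothetical population of opinion scores from 1 to 10 and that people with high scores are three times as likely to reply; the sample itself is drawn at random, yet the respondents' average drifts away from the population average.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population of 10,000 people, each with an opinion score from 1 to 10.
population = [rng.randint(1, 10) for _ in range(10_000)]

# Assumption for illustration: people with high scores are far more
# likely to answer the survey than everyone else.
def responds(score):
    return rng.random() < (0.6 if score >= 8 else 0.2)

sampled = rng.sample(population, k=1_000)           # a proper random sample...
respondents = [s for s in sampled if responds(s)]  # ...but only some people reply

print(f"response rate:    {len(respondents) / len(sampled):.0%}")
print(f"population mean:  {mean(population):.2f}")
print(f"respondents mean: {mean(respondents):.2f}")
```

With these assumed response probabilities, roughly a third of the sampled people reply, and the respondents' mean comes out clearly higher than the population mean.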

24
Q

What is a convenience sample?

A

A sample in which individuals who are easily accessible are more likely to be included.

25
Q

What is a confounding variable?

A

A variable that is correlated with both the explanatory and response variables. While one method to justify making causal conclusions from observational studies is to exhaust the search for confounding variables, there is no guarantee that all confounding variables can be examined or measured.

Example:
Some previous research tells us that using sunscreen actually reduces skin cancer risk, so maybe
there is another variable that can explain this hypothetical association between sunscreen usage and
skin cancer. One important piece of information that is absent is sun exposure. If someone is out
in the sun all day, she is more likely to use sunscreen and more likely to get skin cancer. Exposure
to the sun is unaccounted for in the simple investigation.
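A simulation sketch of the sunscreen example, built on assumed, made-up probabilities: sun exposure raises both the chance of using sunscreen and the chance of skin cancer, while sunscreen itself is given no effect at all. Ignoring sun exposure still produces an apparent association; comparing people with the same sun exposure makes it disappear.

```python
import random

rng = random.Random(5)

# Hypothetical simulation. Sun exposure (the confounder) raises both the chance of
# using sunscreen and the chance of skin cancer; sunscreen itself has NO effect here.
people = []
for _ in range(20_000):
    sun = rng.random() < 0.5
    sunscreen = rng.random() < (0.8 if sun else 0.2)
    cancer = rng.random() < (0.10 if sun else 0.02)
    people.append({"sun": sun, "sunscreen": sunscreen, "cancer": cancer})

def cancer_rate(group):
    """Fraction of people in the group who developed skin cancer."""
    return sum(p["cancer"] for p in group) / len(group)

users = [p for p in people if p["sunscreen"]]
non_users = [p for p in people if not p["sunscreen"]]
print(f"ignoring sun exposure: users {cancer_rate(users):.3f}, non-users {cancer_rate(non_users):.3f}")

# Comparing within levels of the confounder makes the apparent association vanish.
for level in (True, False):
    u_lvl = [p for p in users if p["sun"] == level]
    n_lvl = [p for p in non_users if p["sun"] == level]
    print(f"sun={level}: users {cancer_rate(u_lvl):.3f}, non-users {cancer_rate(n_lvl):.3f}")
```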

26
Q

What is a prospective study?

A

A prospective study identifies individuals and collects information as events unfold. For instance, medical
researchers may identify and follow a group of patients over many years to assess the possible influences
of behavior on cancer risk.

27
Q

What is a retrospective study?

A

Retrospective studies collect data after events have taken place, e.g. researchers may review past events in medical records.

28
Q

What are the four random sampling techniques?

A

simple, stratified, cluster, and multistage sampling

29
Q

What is simple random sampling?

A

In general, a sample is referred to as “simple random”
if each case in the population has an equal chance of being included in the final sample and knowing
that a case is included in a sample does not provide useful information about which other cases are
included.

30
Q

What is stratified random sampling?

A

Stratified sampling is a divide-and-conquer sampling strategy. The population is divided
into groups called strata. The strata are chosen so that similar cases are grouped together, then a
second sampling method, usually simple random sampling, is employed within each stratum. In the
baseball salary example, the teams could represent the strata, since some teams have a lot more
money (up to 4 times as much!). Then we might randomly sample 4 players from each team for a
total of 120 players.
Stratified sampling is especially useful when the cases in each stratum are very similar with
respect to the outcome of interest. The downside is that analyzing data from a stratified sample
is a more complex task than analyzing data from a simple random sample. The analysis methods
introduced in this book would need to be extended to analyze data collected using stratified sampling.
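A minimal sketch of the baseball example, assuming a hypothetical league of 30 teams with made-up 25-player rosters: a simple random sample of 4 players is drawn inside every team (stratum), giving 120 players in total.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of `per_stratum` cases from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

rng = random.Random(7)
# Hypothetical league: 30 teams (strata), each with a 25-player roster.
teams = {
    f"team{t:02d}": [f"team{t:02d}-player{p:02d}" for p in range(25)]
    for t in range(30)
}
sample = stratified_sample(teams, per_stratum=4, rng=rng)
print(len(sample))  # 120
```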

31
Q

What is a cluster sample?

A

In a cluster sample, we break up the population into many groups, called clusters. Then
we sample a fixed number of clusters and include all observations from each of those clusters in the
sample.
Sometimes cluster or multistage sampling can be more economical than the alternative sampling
techniques. Also, unlike stratified sampling, these approaches are most helpful when there is a lot of
case-to-case variability within a cluster but the clusters themselves don’t look very different from one
another.
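A minimal sketch, assuming a hypothetical population of 20 villages (clusters) with 10 households each: a few whole clusters are chosen at random, and every household in a chosen village enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick `n_clusters` clusters and keep every observation in each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [obs for name in chosen for obs in clusters[name]]

rng = random.Random(9)
# Hypothetical population: 20 villages (clusters) of 10 households each.
villages = {
    f"village{v:02d}": [f"village{v:02d}-house{h}" for h in range(10)]
    for v in range(20)
}
sample = cluster_sample(villages, n_clusters=3, rng=rng)
print(len(sample))  # 30
```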

32
Q

What is a multistage sample?

A

A multistage sample is like a cluster sample, but rather than keeping all observations in
each cluster, we collect a random sample within each selected cluster. (In a cluster sample, we break up
the population into many groups, called clusters, then sample a fixed number of clusters and include all
observations from each of those clusters in the sample.)
Sometimes cluster or multistage sampling can be more economical than the alternative sampling
techniques. Also, unlike stratified sampling, these approaches are most helpful when there is a lot of
case-to-case variability within a cluster but the clusters themselves don’t look very different from one
another.
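A minimal two-stage sketch, again assuming a hypothetical population of 20 villages with 10 households each: stage one picks villages at random, and stage two draws a simple random sample of households inside each chosen village instead of keeping them all.

```python
import random

def multistage_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: sample cases within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [obs
            for name in chosen
            for obs in rng.sample(clusters[name], per_cluster)]

rng = random.Random(9)
# Hypothetical population: 20 villages (clusters) of 10 households each.
villages = {
    f"village{v:02d}": [f"village{v:02d}-house{h}" for h in range(10)]
    for v in range(20)
}
sample = multistage_sample(villages, n_clusters=3, per_cluster=4, rng=rng)
print(len(sample))  # 12
```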

33
Q

Randomized experiments are generally built on which four principles?

A

Controlling, randomization, replication and blocking.

34
Q

What is the controlling principle about?

A

Researchers assign treatments to cases, and they do their best to control any other
differences in the groups. For example, when patients take a drug in pill form, some patients
take the pill with only a sip of water while others may have it with an entire glass of water. To
control for the effect of water consumption, a doctor may ask all patients to drink a 12-ounce
glass of water with the pill.

35
Q

What is the randomization principle about?

A

Researchers randomize patients into treatment groups to account for variables
that cannot be controlled. For example, some patients may be more susceptible to a disease
than others due to their dietary habits. Randomizing patients into the treatment or control
group helps even out such differences, and it also prevents accidental bias from entering the
study.

36
Q

What is the replication principle about?

A

The more cases researchers observe, the more accurately they can estimate the effect
of the explanatory variable on the response. In a single study, we replicate by collecting a
sufficiently large sample. Additionally, a group of scientists may replicate an entire study to
verify an earlier finding.
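A simulation sketch of why larger samples estimate more accurately. It assumes, for illustration only, that measurements follow a standard normal distribution; for each sample size, many hypothetical studies are repeated and the spread of their estimates is measured. The spread shrinks as the number of cases grows.

```python
import random
from statistics import mean, stdev

rng = random.Random(11)

def spread_of_estimates(n, repeats=500):
    """Standard deviation of the sample mean across many repeated studies of size n."""
    estimates = [mean(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(repeats)]
    return stdev(estimates)

spreads = {n: spread_of_estimates(n) for n in (10, 100, 1000)}
for n, s in spreads.items():
    print(f"n={n:>4}: spread of estimates {s:.3f}")
```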

37
Q

What is the blocking principle about?

A

Researchers sometimes know or suspect that variables, other than the treatment, influence the response. Under these circumstances, they may first group individuals based on this
variable into blocks and then randomize cases within each block to the treatment groups. This
strategy is often referred to as blocking. For instance, if we are looking at the effect of a drug
on heart attacks, we might first split patients in the study into low-risk and high-risk blocks,
then randomly assign half the patients from each block to the control group and the other half
to the treatment group, as shown in Figure 1.16. This strategy ensures each treatment group
has an equal number of low-risk and high-risk patients.
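A minimal sketch of the heart-attack example, assuming 24 hypothetical patients in which every third one is labeled high-risk: patients are grouped into blocks by risk level, and randomization happens separately inside each block, so both groups end up with the same mix of risk levels.

```python
import random

def block_randomize(patients, block_of, rng):
    """Group patients into blocks, then randomize half of each block to treatment."""
    blocks = {}
    for p in patients:
        blocks.setdefault(block_of(p), []).append(p)

    treatment, control = [], []
    for members in blocks.values():
        shuffled = members[:]  # copy before shuffling
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        treatment.extend(shuffled[:half])
        control.extend(shuffled[half:])
    return treatment, control

rng = random.Random(3)
# Hypothetical patients as (id, risk) pairs: every third patient is high-risk.
patients = [(i, "high" if i % 3 == 0 else "low") for i in range(1, 25)]
treatment, control = block_randomize(patients, block_of=lambda p: p[1], rng=rng)

for name, group in (("treatment", treatment), ("control", control)):
    high = sum(1 for _, risk in group if risk == "high")
    print(f"{name}: {high} high-risk, {len(group) - high} low-risk")
```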

38
Q

What is a blind study?

A

A study in which researchers keep the patients uninformed about whether they are in the treatment group or the control group.

39
Q

What is a double-blind study?

A

The patients are not the only ones who should be blinded: doctors and researchers can accidentally
bias a study. When a doctor knows a patient has been given the real treatment, she
might inadvertently give that patient more attention or care than a patient that she knows is on
the placebo. To guard against this bias, which again has been found to have a measurable effect
in some instances, most modern studies employ a double-blind setup where doctors or researchers
who interact with patients are, just like the patients, unaware of who is or is not receiving the
treatment.