Chapter 1, Intro to Data Flashcards

1
Q

What is a summary statistic?

A

a single number summarizing a large amount of data
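
As a small illustration of this answer, the sketch below reduces a list of values to single numbers. The pay rates are made-up data, and the mean and median are just two common choices of summary statistic.

```python
from statistics import mean, median

# Hypothetical data: hourly pay rates (in dollars) for eight workers.
hourly_pay = [15.50, 18.00, 22.25, 17.75, 30.00, 16.00, 19.50, 21.00]

# Each of these is one number that summarizes the whole list.
average_pay = mean(hourly_pay)
middle_pay = median(hourly_pay)

print(f"mean = {average_pay:.2f}, median = {middle_pay:.2f}")
```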

2
Q

What is a proper data set called, and what makes it “proper”?

A

a data matrix: each row corresponds to a unique case and each column corresponds to a variable.
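
A minimal sketch of such a layout, using invented names and values: each dictionary stands for one case (row), and each key stands for one variable (column), with units spelled out in the key names.

```python
# Hypothetical data matrix: one dict per case (row), one key per variable (column).
data_matrix = [
    {"name": "Ana",  "age_years": 34, "hourly_pay_usd": 18.00, "education": "bachelor"},
    {"name": "Ben",  "age_years": 27, "hourly_pay_usd": 15.50, "education": "high school"},
    {"name": "Caro", "age_years": 45, "hourly_pay_usd": 30.00, "education": "master"},
]

# The variables are the column names; the cases are the rows.
variables = list(data_matrix[0].keys())
n_cases = len(data_matrix)

# Reading down one column gives every case's value for that variable.
ages = [case["age_years"] for case in data_matrix]

print(variables, n_cases, ages)
```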

3
Q

What is the formal name for a row?

A

case or observational unit

4
Q

What do columns represent, and what is important to know about them?

A

characteristics, called variables (it is important to understand what each variable means, as well as its units of measurement)

5
Q

What are 2 types of variables?

A

Numerical and Categorical

6
Q

What are the 2 kinds of numerical variable?

A

Discrete and continuous

7
Q

What are the 2 kinds of categorical variable?

A

Ordinal and nominal

8
Q

What is a discrete numerical variable?

A

a numerical variable that can only take whole-number values, e.g. population, since you can’t have half a person

9
Q

What is a continuous numerical variable?

A

a numerical variable that can take values in between whole numbers, e.g. an hourly pay rate.

10
Q

What is an ordinal categorical variable?

A

a categorical variable that involves an ordering, e.g. educational level attained

11
Q

What is a nominal categorical variable?

A

a categorical variable that doesn’t involve an ordering, e.g. color

12
Q

What are the possible values of a categorical variable called?

A

levels

13
Q

What makes 2 variables “associated” or “dependent”?

A

When they show some connection with one another.

14
Q

What is a scatterplot graph useful for?

A

Showing whether or not 2 variables are associated, as well as trends in the relationship

15
Q

What is a positive correlation between 2 variables?

A

a relationship where, as one variable increases, the other tends to increase as well (and vice versa)

16
Q

What is a negative correlation between 2 variables?

A

a relationship where, as one variable increases, the other tends to decrease (and vice versa)
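
One common way to check the direction of a relationship is the sign of the Pearson correlation coefficient. The cards do not name a specific measure, so treat this as an assumed choice; the helper `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 63, 71, 80]   # rises with hours studied
hours_of_tv = [5, 4, 4, 2, 1]       # falls as hours studied rise

print(pearson_r(hours_studied, exam_score) > 0)   # positive correlation
print(pearson_r(hours_studied, hours_of_tv) < 0)  # negative correlation
```

A value near zero would suggest no linear association, although two variables can still be associated in a non-linear way.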

17
Q

What are independent variables?

A

variables that aren’t associated

18
Q

What 2 words express whether or not one variable affects another?

A

explanatory and response: an explanatory variable might affect a response variable

19
Q

What are the 2 primary types of data collection?

A

observational and experimental

20
Q

What makes a study observational? And why use this method?

A

Researchers do not interfere directly with how the data arise (e.g. they run surveys, collect data from existing records, or follow a cohort of similar individuals in studies of diseases). Observational studies can provide evidence of association between variables, but can’t show a causal connection. They can give rise to hypotheses to be checked using experiments.

21
Q

Why use an experiment?

A

to investigate the possibility of a causal connection

22
Q

What is a sample?

A

A subset of the population to be studied.

23
Q

Define anecdotal evidence. Why is it a problem?

A

Data collected in a haphazard fashion. May not be representative of the population.

24
Q

What is a non-response rate and why is it important?

A

Non-response rate is the proportion of people selected for the sample who do not respond. A high non-response rate can bias the results, because the people who respond may not be representative of the population.

25
Q

What is a confounding variable?

A

a variable that is correlated with both explanatory and response variables. E.g. sun exposure is related to both the use of sunscreen and skin cancer. Also called a lurking variable, confounding factor, or a confounder.

26
Q

What are the 2 kinds of observational study?

A

Prospective, which identifies individuals and collects info as events unfold
Retrospective, which collects data after events have taken place (e.g. studying medical records)

27
Q

What are the 4 sampling methods?

A
  1. Simple random (SRS): like a lottery
  2. Stratified: divide population into groups of similar individuals (strata), then take a random sample from each group.
  3. Cluster: divide population into groups of dissimilar individuals (clusters), then randomly select some of the clusters and use the data from them
  4. Multistage: involves more than one stage of sampling: first take a cluster sample, then take an SRS within each selected cluster.
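
The four methods above can be sketched with Python's `random` module. The population, the `region` field, and all function names are hypothetical. The same grouping is reused as both strata and clusters only to keep the sketch short; in practice strata are chosen to be internally similar, while clusters are usually natural groupings such as towns or schools.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 people, each tagged with a region.
population = [{"id": i, "region": region}
              for i, region in enumerate(["north"] * 4 + ["south"] * 4 + ["east"] * 4)]

def group_by(cases, key):
    """Split cases into groups that share the same value of `key`."""
    groups = {}
    for case in cases:
        groups.setdefault(case[key], []).append(case)
    return groups

def simple_random_sample(cases, n):
    """SRS: every case has the same chance of selection, like a lottery."""
    return rng.sample(cases, n)

def stratified_sample(cases, key, n_per_stratum):
    """Take a random sample from every stratum."""
    sample = []
    for stratum in group_by(cases, key).values():
        sample.extend(rng.sample(stratum, n_per_stratum))
    return sample

def cluster_sample(cases, key, n_clusters):
    """Randomly pick whole clusters and keep every case in them."""
    clusters = group_by(cases, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [case for name in chosen for case in clusters[name]]

def multistage_sample(cases, key, n_clusters, n_per_cluster):
    """Randomly pick clusters, then take an SRS within each chosen cluster."""
    clusters = group_by(cases, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

print(len(simple_random_sample(population, 3)))
print(len(stratified_sample(population, "region", 2)))
print(len(cluster_sample(population, "region", 2)))
print(len(multistage_sample(population, "region", 2, 2)))
```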
28
Q

What is sampling variability?

A

The natural variation in samples. Unavoidable, doesn’t usually cause problems.

29
Q

What are 2 biased sampling methods? (not everyone in the population has an equal chance of being part of the sample)

A
Convenience sample: people who are easy to reach, e.g. standing on a street and collecting data from people who walk by.
Voluntary response sample: people who have chosen to include themselves in the sample; people with a strong interest in the topic are most likely to respond.
30
Q

What is a population parameter?

A

a number that describes something about an entire group or population

31
Q

What makes a study an experiment?

A

researchers assign treatments to cases. When treatments are assigned randomly, it’s called a randomized experiment.

32
Q

What are the 4 principles randomized experiments are built on?

A
  1. controlling for possible confounding variables such as how much water a person takes a pill with
  2. randomization into treatment and control groups in order to account for variables that can’t be controlled
  3. replication: the more often a result is replicated (through a sufficiently large sample), the more accurately the effect of the explanatory variable on the response variable can be estimated
  4. Blocking: using strata within the experimental population, i.e., grouping the population into blocks that share certain characteristics, if researchers suspect those characteristics (variables) may influence the response.
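
Randomization and blocking from the list above can be combined: group cases into blocks first, then randomize within each block. The sketch below is one possible way to do that, assuming a hypothetical `risk` variable as the blocking characteristic and an even split between treatment and control.

```python
import random

def blocked_random_assignment(patients, block_key, seed=0):
    """Randomly assign half of each block to treatment and the rest to control."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    # Group patients into blocks by the suspected influential variable.
    blocks = {}
    for patient in patients:
        blocks.setdefault(patient[block_key], []).append(patient)

    # Within each block, shuffle and split, so both groups get a similar mix.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for patient in shuffled[:half]:
            assignment[patient["id"]] = "treatment"
        for patient in shuffled[half:]:
            assignment[patient["id"]] = "control"
    return assignment

# Hypothetical patients: ids 0-3 are high risk, ids 4-9 are low risk.
patients = [{"id": i, "risk": "high" if i < 4 else "low"} for i in range(10)]
groups = blocked_random_assignment(patients, "risk")
print(groups)
```

Because the split happens inside each block, the treatment and control groups end up with the same proportion of high-risk and low-risk patients, which plain randomization does not guarantee in a small sample.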
33
Q

What are the 2 ways to employ randomization and what is the benefit of using them?

A
  1. random sampling allows you to generalize results to the target population
  2. random assignment to treatment or control group strengthens the suggestion of causality between explanatory and response variables.