Chapter 1, Intro to Data Flashcards by Deanna Kemler

What is a summary statistic?

a single number summarizing a large amount of data
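
For example, the mean and the median each reduce a whole list of values to one number. This is a minimal Python sketch using the standard-library `statistics` module. The `incomes` values are made up for illustration.

```python
import statistics

# Hypothetical data: annual incomes (in thousands of dollars) for 7 people
incomes = [32, 45, 38, 51, 29, 120, 41]

mean_income = statistics.mean(incomes)      # arithmetic average
median_income = statistics.median(incomes)  # middle value once sorted

print(f"mean = {mean_income:.1f}, median = {median_income}")
# prints: mean = 50.9, median = 41
```

Each summary statistic hides some detail. Here the single large value (120) pulls the mean well above the median.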

What is a proper data set called, and what makes it “proper”?

a data matrix: each row corresponds to a unique case and each column corresponds to a variable.
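
In code, a small data matrix can be sketched as a list of records, one per case, where every record has the same variables. The `students` table and its columns are hypothetical.

```python
# Hypothetical data matrix: each dict is one row (a case) and each key
# is one column (a variable). Every row has the same set of variables.
students = [
    {"id": 1, "age": 19, "major": "biology", "gpa": 3.4},
    {"id": 2, "age": 21, "major": "history", "gpa": 3.1},
    {"id": 3, "age": 20, "major": "biology", "gpa": 3.8},
]

variables = list(students[0].keys())  # column names
n_cases = len(students)               # number of rows

# Check the "proper" shape: one row per unique case, same columns in every row
assert len({row["id"] for row in students}) == n_cases
assert all(list(row.keys()) == variables for row in students)

print(n_cases, "cases; variables:", variables)
```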

What is the formal name for a row?

case or observational unit

What do columns represent, and what is important to know about them?

characteristics, called variables (it is important to understand what each variable means, as well as its units of measurement)

What are 2 types of variables?

Numerical and Categorical

What are the 2 kinds of numerical variable?

Discrete and continuous

What are the 2 kinds of categorical variable?

Ordinal and nominal

What is a discrete numerical variable?

a numerical value that can only be a whole number, e.g. population, since you can’t have half a person

What is a continuous numerical variable?

a numerical value that can fall in between whole numbers, e.g. an hourly pay rate.

What is an ordinal categorical variable?

a categorical variable that involves an ordering, e.g. educational level attained

What is a nominal categorical variable?

a categorical variable that doesn’t involve an ordering, e.g. color

What are the possible values of a categorical variable called?

levels

What makes 2 variables “associated” or “dependent”?

When they show some connection with one another.

What is a scatterplot graph useful for?

Showing whether or not 2 variables are associated, as well as trends in the relationship

What is a positive correlation between 2 variables?

a relationship where, as one variable increases, the other also tends to increase (and as one decreases, the other tends to decrease)

What is a negative correlation between 2 variables?

a relationship where, as one variable increases, the other tends to decrease (and vice versa)
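
The cards describe correlation only in words. One common way to put a number on it, not named in the cards, is the Pearson correlation coefficient, which comes out positive for a positive association and negative for a negative one. This is a minimal sketch with made-up data. `pearson_r` and all variable names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: near +1 is strong positive, near -1 strong negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data for 5 students
hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 62, 70, 74, 85]  # rises with study hours: positive
hours_gaming = [9, 7, 6, 4, 2]     # falls as study hours rise: negative

print(round(pearson_r(hours_studied, exam_score), 2))
print(round(pearson_r(hours_studied, hours_gaming), 2))
```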

What are independent variables?

variables that aren’t associated

What 2 words express whether or not one variable affects another?

explanatory and response: an explanatory variable might affect a response variable

What are the 2 primary types of data collection?

observational and experimental

What makes a study observational? And why use this method?

Researchers do not interfere directly with how the data arise (e.g. surveys, data collected from existing records, or following a cohort of similar individuals in studies of disease). An observational study can provide evidence of an association between variables but cannot by itself show a causal connection. It can, however, give rise to hypotheses to be checked using experiments.

Why use an experiment?

to investigate the possibility of a causal connection

What is a sample?

A subset of the population to be studied.

Define anecdotal evidence. Why is it a problem?

Data collected in a haphazard fashion. May not be representative of the population.

What is a non-response rate and why is it important?

The non-response rate is the fraction of people selected for the sample who do not respond. A high non-response rate can skew the results, because the people who respond may differ systematically from those who don't.
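
As a worked example with made-up counts, the rate is the number of non-respondents divided by the number of people selected for the sample:

```python
# Hypothetical survey: 1,200 people were selected for the sample,
# but only 420 returned a completed questionnaire.
sampled = 1200
responded = 420

non_response_rate = (sampled - responded) / sampled
print(f"non-response rate: {non_response_rate:.0%}")  # prints: non-response rate: 65%
```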

What is a confounding variable?

a variable that is correlated with both the explanatory and response variables. E.g. sun exposure is related to both sunscreen use and skin cancer. Also called a lurking variable, confounding factor, or confounder.

What are the 2 kinds of observational study?

1. Prospective: identifies individuals and collects information as events unfold.
2. Retrospective: collects data after events have taken place (e.g. studying medical records).

What are the 4 sampling methods?

1. Simple random sample (SRS): like a lottery; every individual has an equal chance of being chosen.
2. Stratified: divide the population into groups of similar individuals (strata), then take a random sample from each group.
3. Cluster: divide the population into groups (clusters) that each contain dissimilar individuals, then use data from a random sample of the clusters.
4. Multistage: involves more than one stage of sampling: first take a cluster sample, then take an SRS within each selected cluster.
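
The four methods can be sketched with Python's `random` module. The population below is hypothetical: 60 students tagged with a class year (used as strata) and a dorm (used as clusters, each with a mix of years). The sample sizes are arbitrary choices.

```python
import random

random.seed(42)

# Hypothetical population: 60 students, 20 per class year, 10 per dorm
years = ["first", "second", "third"]
population = [{"id": i, "year": years[i % 3], "dorm": f"dorm{i // 10}"} for i in range(60)]

# 1. Simple random sample: every individual equally likely, like a lottery
srs = random.sample(population, 6)

# 2. Stratified: group by year (strata), then take an SRS of 2 from each stratum
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)
stratified = [p for group in strata.values() for p in random.sample(group, 2)]

# 3. Cluster: group by dorm (clusters), pick 2 dorms at random, keep everyone in them
clusters = {}
for person in population:
    clusters.setdefault(person["dorm"], []).append(person)
chosen_dorms = random.sample(sorted(clusters), 2)
cluster_sample = [p for d in chosen_dorms for p in clusters[d]]

# 4. Multistage: pick 2 dorms at random, then an SRS of 3 within each chosen dorm
stage_one = random.sample(sorted(clusters), 2)
multistage = [p for d in stage_one for p in random.sample(clusters[d], 3)]

print(len(srs), len(stratified), len(cluster_sample), len(multistage))  # prints: 6 6 20 6
```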

What is sampling variability?

The natural variation from one sample to the next. It is unavoidable but usually doesn't cause problems.
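
One way to see sampling variability is to draw several random samples from the same population and compare their means. This is a sketch on a synthetic population (made-up heights). The population mean plays the role of a population parameter, and each sample mean is an estimate of it.

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 10,000 heights (cm); its mean is a population parameter
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)

# Five random samples of 50 people each; the sample means differ slightly
sample_means = [statistics.mean(random.sample(population, 50)) for _ in range(5)]

print(f"population mean: {mu:.1f}")
print("sample means:", [round(m, 1) for m in sample_means])
```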

What are 2 biased sampling methods? (not everyone in the population has an equal chance of being part of the sample)

1. Convenience sample: people who are easy to reach, e.g. standing on a street and collecting data from whoever walks by.
2. Voluntary response sample: people who have chosen to include themselves in the sample; those with a strong interest in the topic are most likely to respond.

What is a population parameter?

a number that describes something about an entire group or population

What makes a study an experiment?

researchers assign treatments to cases. When treatments are assigned randomly, it's called a randomized experiment.

What are the 4 principles randomized experiments are built on?

1. Controlling: control for possible confounding variables, such as how much water a person takes a pill with.
2. Randomization: randomly assign cases to treatment and control groups in order to account for variables that can't be controlled.
3. Replication: the more cases observed (through a sufficiently large sample), the more accurately the effect of the explanatory variable on the response variable can be estimated.
4. Blocking: if researchers suspect certain characteristics (variables) may influence the response, they first group the experimental population into blocks based on those characteristics, then randomize cases to treatment and control within each block.
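
Blocking combined with randomization can be sketched as follows. The patients and the blocking variable (`risk`) are hypothetical. Within each block, a random half of the cases goes to treatment and the rest to control, so both groups get the same mix of high- and low-risk patients.

```python
import random

random.seed(3)

# Hypothetical patients; risk level is the characteristic suspected to affect the response
patients = [{"id": i, "risk": "high" if i < 8 else "low"} for i in range(20)]

# Blocking: group patients into blocks by risk level
blocks = {}
for p in patients:
    blocks.setdefault(p["risk"], []).append(p)

# Randomization within each block: a random half to treatment, the rest to control
for members in blocks.values():
    shuffled = random.sample(members, len(members))
    half = len(shuffled) // 2
    for p in shuffled[:half]:
        p["group"] = "treatment"
    for p in shuffled[half:]:
        p["group"] = "control"

for risk, members in blocks.items():
    n_treat = sum(p["group"] == "treatment" for p in members)
    print(f"{risk} block: {n_treat} treatment, {len(members) - n_treat} control")
# prints: high block: 4 treatment, 4 control
#         low block: 6 treatment, 6 control
```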

What are the 2 ways to employ randomization and what is the benefit of using them?

1. Random sampling allows you to generalize results to the target population.
2. Random assignment to treatment or control group strengthens the suggestion of causality between explanatory and response variables.
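
The two uses of randomization happen at different steps, as this sketch shows. The target population (500 IDs) and the group sizes are hypothetical. First, a random sample is drawn from the population. Then the sampled participants are randomly split into treatment and control groups.

```python
import random

random.seed(11)

# Hypothetical target population of 500 people (IDs only)
target_population = list(range(500))

# 1. Random sampling: choose 40 participants so results can generalize
participants = random.sample(target_population, 40)

# 2. Random assignment: shuffle the participants and split them into two equal groups
random.shuffle(participants)
treatment, control = participants[:20], participants[20:]

print(len(treatment), len(control))  # prints: 20 20
```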

(33 cards)