Exam 1 Terms Flashcards

Question 1

Q

What is a case?

Answer

A

An individual unit that is often a person, place, or thing. A row of data usually represents a case.

Question 2

Q

What are variables?

Answer

A

A variable is a characteristic or measurement that describes the cases. Typically, a column of data represents a variable. Examples are height, weight, age, temperature, time, etc.

Question 3

Q

What are the two main types of variables?

Answer

A

categorical and quantitative

Question 4

Q

What is a categorical variable?

Answer

A

A variable that divides the cases into 2 or more categories. (ex. gender)

Question 5

Q

What is a quantitative variable?

Answer

A

A variable that measures a numerical quantity. (ex. GPA, pulse rate, height)

Question 6

Q

What are the two subsets of quantitative variables?

Answer

A

Continuous and discrete

Question 7

Q

What is a continuous variable?

Answer

A

A type of quantitative variable that can take on an infinite set of values within some range. (ex. temperature, life expectancy, food calories)

Question 8

Q

What is a discrete variable?

Answer

A

A type of quantitative variable that can take only a countable set of separate values, often whole-number counts. (ex. number of babies born in a pregnancy, number of courses you are taking next semester)

Question 9

Q

What is a population?

Answer

A

The entire set of cases.

Question 10

Q

What is a sample?

Answer

A

A subset of the population. We collect data for the sample.

Question 11

Q

What is a parameter?

Answer

A

A number that describes the population. (ex. population mean, mean GPA for an entire class)

Question 12

Q

What is a statistic?

Answer

A

A number that describes the sample. (ex. sample mean, mean GPA for a selection of students in a class)

Question 13

Q

What is statistical inference?

Answer

A

The process of using data from a sample to gain information about the population.

Question 14

Q

Why should we take random samples?

Answer

A

A sample that is not randomly selected from the population may be prone to bias. The goal is to obtain a sample that is representative of the population.

Question 15

Q

What is a representative sample?

Answer

A

A subset of the population from which data are collected that accurately reflects the population.

Question 16

Q

What is bias?

Answer

A

The systematic favoring of certain outcomes.

Question 17

Q

What is sampling bias?

Answer

A

Systematic favoring of certain outcomes due to the methods employed to obtain the sample.

Question 18

Q

What is simple random sampling? Why do we do it?

Answer

A

A method of obtaining a sample where every member of the population has an equal chance of being selected (similar to drawing names from a hat). Samples are selected without replacement.

SRS is done to avoid sampling bias and to obtain a sample that’s representative of a population.

ex. if we wanted to research how long PSU students sleep at night, it would be best to randomly select students for the sample rather than only surveying students in an 8 AM class.
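As a rough illustration of the "names from a hat" idea, the sketch below draws a simple random sample without replacement using Python's standard `random` module. The roster of 100 student IDs and the helper name `simple_random_sample` are hypothetical, made up only for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical roster of 100 student IDs (not real data)
students = [f"student_{i}" for i in range(1, 101)]

sample = simple_random_sample(students, 10, seed=42)
print(sample)
```

Because `random.sample` never returns the same element twice, no student can appear in the sample more than once, which matches sampling without replacement.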

Question 19

Q

What is a convenience sample?

Answer

A

A method of obtaining a sample by ease of accessibility. These samples are NOT random and they may NOT represent the intended population.

Question 20

Q

Besides convenience sampling, what are other sources of bias?

Answer

A

non-response bias
response bias

Question 21

Q

What is non-response bias?

Answer

A

Individuals who do not participate in a study differ from those who do participate.

inability to contact individual
individual chose not to participate

Question 22

Q

What is response bias?

Answer

A

Individuals participate, but do not respond truthfully.

may do so to align with social norms
may do so to appease the researcher

Question 23

Q

What is a confounding variable?

Answer

A

A third variable that may explain the association between two other variables.

Ex. when ice cream sales increase, so do shark attacks. This is association only, not causation. Temperature is a confounding variable here because as it increases, so do both ice cream sales and the number of people going to the beach.

Question 24

Q

What are the two main types of studies?

Answer

A

observational and experimental

Question 25

Q

What is an observational study?

Answer

A

Researchers simply observe the data as they occur. We cannot say that there is a cause and effect based on this type of study because there can be confounding variables.

These studies almost always have confounding variables.

Observational studies can almost never be used to establish causation.

ex. Question: Does coffee cause hyperactivity in college students?
A researcher randomly samples students and surveys them about their coffee intake and hyperactivity

Question 26

Q

What is an experimental study?

Answer

A

Researchers actively control one or more of the variables of interest. Because the researchers manipulate the explanatory variable themselves, these studies can be used to establish cause and effect.

Ex. Question: Does coffee cause hyperactivity in college students?
A researcher randomly samples students and randomly assigns them to drink coffee with or without caffeine.

Question 27

Q

How can confounding variables be avoided?

Answer

A

By using a randomized experiment.

Question 28

Q

What is a randomized experiment?

Answer

A

When the treatment for each case is randomly assigned.

Question 29

Q

What are the two types of randomized experiments?

Answer

A

Comparative experiments and matched pair experiments

Question 30

Q

What is a comparative experiment?

Answer

A

Cases are randomly assigned to different treatment groups
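One possible way to carry out the random assignment is to shuffle the cases and deal them out to the treatment groups in turn. This is only a sketch: the participant labels, the treatment names, and the function `assign_treatments` are hypothetical.

```python
import random

def assign_treatments(cases, treatments, seed=None):
    """Randomly split cases into treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(cases)      # copy so the original order is untouched
    rng.shuffle(shuffled)       # random order removes any systematic pattern
    groups = {t: [] for t in treatments}
    for i, case in enumerate(shuffled):
        # deal the shuffled cases out to the groups in turn
        groups[treatments[i % len(treatments)]].append(case)
    return groups

# Hypothetical participants and treatments
participants = [f"p{i}" for i in range(1, 21)]
groups = assign_treatments(participants, ["caffeinated", "decaf"], seed=1)

for treatment, members in groups.items():
    print(treatment, members)
```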

Question 31

Q

What is a matched pair experiment?

Answer

A

Each case gets BOTH treatments

Question 32

Q

What is a control group?

Answer

A

A group of cases that does not receive the treatment; it serves as a comparison group.

Question 33

Q

What is a placebo?

Answer

A

A fake treatment; used to control placebo effect

Question 34

Q

What is a single-blind study?

Answer

A

When participants do not know to which group they belong

Question 35

Q

What is a double-blind study?

Answer

A

When participants and researchers interacting with the participants BOTH do not know which participants were assigned to which group.

Question 36

Q

How can we summarize one categorical variable?

Answer

A

can use a frequency table
can take a proportion (relative frequency)
can make a relative frequency table (does not include counts)
bar chart
pie chart

Question 37

Q

What is a proportion?

Answer

A

A relative frequency

Proportion = count for category of interest/ total counts in sample
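The formula can be tried out on a small categorical variable. The sketch below builds a frequency table and then divides each count by the total to get proportions. The ten survey responses are invented for illustration.

```python
from collections import Counter

# Hypothetical sample: favorite season reported by 10 students
responses = ["fall", "summer", "fall", "spring", "winter",
             "summer", "fall", "summer", "fall", "spring"]

counts = Counter(responses)      # frequency table: category -> count
total = sum(counts.values())     # total counts in the sample

# Relative frequency table: proportion = count / total for every category
proportions = {category: count / total for category, count in counts.items()}

for category, count in counts.most_common():
    print(f"{category:<8} count={count}  proportion={proportions[category]:.2f}")
```

The proportions across all categories add up to 1, which is a quick check that the table was built correctly.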

Question 38

Q

How can we summarize two categorical variables?

Answer

A

use a two way table
use a segmented (stacked) bar chart
use a side-by-side bar chart
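A two-way table can be sketched by counting each combination of the two categorical variables. The data below (class year and preferred drink for 8 students) are hypothetical and serve only to show the layout.

```python
from collections import Counter

# Hypothetical data: (class year, preferred drink) for 8 students
cases = [
    ("first-year", "coffee"), ("first-year", "tea"),
    ("first-year", "coffee"), ("senior", "coffee"),
    ("senior", "tea"), ("senior", "tea"),
    ("senior", "coffee"), ("first-year", "coffee"),
]

cell_counts = Counter(cases)                     # count of each (row, column) pair
rows = sorted({year for year, _ in cases})       # categories of the first variable
cols = sorted({drink for _, drink in cases})     # categories of the second variable

# Print the table with a total column for each row
print(f"{'':<12}" + "".join(f"{c:>8}" for c in cols) + f"{'total':>8}")
for r in rows:
    row_counts = [cell_counts[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{n:>8}" for n in row_counts) + f"{sum(row_counts):>8}")
```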

Question 39

Q

How can we summarize 1 quantitative variable?

Answer

A

Can use a …
- dotplot
- histogram

Question 40

Q

When are histograms ideal?

Answer

A

Histograms are the ideal graph when there are 30 or more cases.

Question 41

Q

What shapes can histograms be?

Answer

A

bell shaped/symmetric
left-skewed
right-skewed

Question 42

Q

What is the mean?

Answer

A

The mean, or average, is the sum of data values/ number of values.

Question 43

Q

What is the median?

Answer

A

The middle value when the data are ordered.
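A small sketch of both measures of center, using Python's standard `statistics` module on a made-up data set:

```python
from statistics import mean, median

# Hypothetical data set of 7 values (already ordered for readability)
values = [2, 3, 3, 4, 5, 7, 11]

x_bar = mean(values)     # sum of data values / number of values = 35 / 7
middle = median(values)  # middle value of the ordered data (the 4th of 7)

print("mean:", x_bar)
print("median:", middle)
```

For an even number of values there is no single middle value, and `statistics.median` returns the average of the two middle values.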

Question 44

Q

Describe the mean and median when the data is symmetric.

Answer

A

mean roughly equals median

Question 45

Q

Describe the mean and median when the data is right skewed.

Answer

A

Mean > median

The right tail pulls the mean in that direction.

Question 46

Q

Describe the mean and median when the data is skewed to the left.

Answer

A

Mean < median

Question 47

Q

When is the mean meaningless?

Answer

A

When the data are strongly skewed in one direction, because the tail pulls the mean away from the typical values.

Question 48

Q

What is an outlier?

Answer

A

A data point that is notably distant from the other values in a data set.

Question 49

Q

What is resistance?

Answer

A

A statistic is resistant if it is relatively unaffected by extreme values such as outliers.
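One way to see resistance in action is to replace a single value with an extreme one and compare statistics before and after. The numbers below are hypothetical.

```python
from statistics import mean, median

# Hypothetical data, then the same data with the largest value made extreme
original = [10, 12, 13, 14, 15]
with_outlier = [10, 12, 13, 14, 150]

print("mean:  ", mean(original), "->", mean(with_outlier))
print("median:", median(original), "->", median(with_outlier))
```

In this example the median stays the same while the mean moves a long way toward the extreme value.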

Question 50

Q

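Is the median resistant to outliers?

Answer

A

Yes. The median depends only on the middle of the ordered data, so extreme values have little effect on it.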

Question 51

Q

Is the mean resistant to outliers?

Answer

A

No. Every value enters the sum, so a single extreme value can pull the mean toward it.

Question 52

Q

What is standard deviation?

Answer

A

A measure of how spread out the data are.
Denoted by “s” for a sample.
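The card does not give a formula, so the sketch below assumes the usual sample standard deviation, s = sqrt( sum of (x - x bar)^2 / (n - 1) ), and checks it against `statistics.stdev`. Both data sets are invented and share the same mean so that only the spread differs.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, assuming the (n - 1) divisor."""
    n = len(values)
    x_bar = sum(values) / n
    squared_devs = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))

# Two hypothetical data sets with the same mean (10) but different spread
tight = [9, 10, 10, 11]
spread = [2, 6, 14, 18]

print("tight: ", sample_sd(tight))
print("spread:", sample_sd(spread))
```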

Question 53

Q

What does a larger standard deviation mean?

Answer

A

The larger the standard deviation, the more variability there is, and the more spread out the data are.

Question 54

Q

Is standard deviation resistant to outliers?

Answer

A

No, because it uses the mean in its calculation.

Question 55

Q

What is the 95% rule?

Answer

A

For a bell-shaped distribution, about 95% of the data fall within two standard deviations of the mean (i.e., between x bar - 2s and x bar + 2s).
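The rule can be checked informally by simulation. The sketch below generates bell-shaped data with `random.gauss` and counts how much of it lands between x bar - 2s and x bar + 2s. The mean of 100, standard deviation of 15, and sample size of 10,000 are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is reproducible
data = [rng.gauss(100, 15) for _ in range(10_000)]  # simulated bell-shaped data

x_bar = mean(data)
s = stdev(data)
lower, upper = x_bar - 2 * s, x_bar + 2 * s

# Fraction of values that fall within two standard deviations of the mean
fraction_inside = sum(lower <= x <= upper for x in data) / len(data)

print(f"interval: {lower:.1f} to {upper:.1f}")
print(f"fraction inside: {fraction_inside:.3f}")
```

The printed fraction should come out close to 0.95. The rule is an approximation and is not expected to hold for skewed data.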

Question 56

Q

What is a z-score?

Answer

A

The number of standard deviations a value is from the mean: z = (x - x bar) / s. The larger the magnitude of the z-score, the farther the data point is from the mean.
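As a quick worked example, a z-score is the distance from the mean divided by the standard deviation, z = (x - x bar) / s. The exam scores below are hypothetical.

```python
def z_score(x, x_bar, s):
    """Number of standard deviations that x lies from the mean x_bar."""
    return (x - x_bar) / s

# Hypothetical exam with mean 70 and standard deviation 10
print(z_score(85, 70, 10))  # above the mean, so the z-score is positive
print(z_score(55, 70, 10))  # below the mean, so the z-score is negative
```

The sign shows the direction (above or below the mean) and the magnitude shows how unusual the value is.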

Question 57

Q

How can we estimate standard deviation by looking at a histogram?

Answer

A

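Pick two values that contain roughly the middle 95% of the data, subtract the smaller from the larger, and divide by 4. This follows from the 95% rule: that interval spans about 4 standard deviations.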
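This estimate works backwards from the 95% rule: if about 95% of a bell-shaped distribution lies between x bar - 2s and x bar + 2s, that interval is about 4 standard deviations wide. A minimal sketch, with hypothetical endpoints read off a histogram:

```python
def estimate_sd(lower, upper):
    """Rough standard deviation from an interval holding about 95% of the data."""
    return (upper - lower) / 4

# Hypothetical histogram where most of the data sit between 40 and 80
print(estimate_sd(40, 80))
```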

Question 58

Q

What is a percentile?

Answer

A

The pth percentile is the value that is greater than p% of the data.

Ex. if your height is the 40th percentile, 40% of people are shorter than you
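The height example can be sketched by counting what percent of the data fall below a given value. The ten heights (in inches) and the helper name `percent_below` are hypothetical.

```python
def percent_below(data, value):
    """Percent of the data values that are strictly less than value."""
    return 100 * sum(1 for x in data if x < value) / len(data)

# Hypothetical heights in inches for 10 people
heights_in = [60, 62, 63, 64, 65, 66, 67, 68, 70, 72]

# 4 of the 10 heights are below 65 inches
print(percent_below(heights_in, 65))
```

Here a height of 65 inches sits at the 40th percentile of this small data set.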

Question 59

Q

What does the five number summary include?

Answer

A

minimum, Q1, median, Q3, maximum

Question 60

Q

What is Q1 (first quartile)?

Answer

A

Median of values below the median (25th percentile)

Question 61

Q

What is Q3 (third quartile)?

Answer

A

Median of values above the median (75th percentile)
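The sketch below computes the five number summary using the "median of each half" definition of Q1 and Q3 given on these cards. It assumes that, when the number of values is odd, the overall median is left out of both halves. Software packages often use slightly different quartile rules, so their results may not match exactly. The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]  # values above the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set with 6 values
print(five_number_summary([4, 8, 15, 16, 23, 42]))
```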

Question 62

Q

What is the range?

Answer

A

Maximum - minimum

Question 63

Q

Is the range resistant to outliers?

Answer

A

No, because the range is calculated from the minimum and maximum, which are exactly the values most likely to be outliers.

Question 64

Q

What is IQR?

Answer

A

Interquartile Range
Q3 - Q1
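To see how the IQR behaves next to the range, the sketch below computes both for a small data set and again after one value is made extreme. It assumes the median-of-halves rule for quartiles, leaving out the overall median when the count is odd. The data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

def value_range(values):
    return max(values) - min(values)

# Hypothetical data, then the same data with the largest value made extreme
original = [1, 3, 5, 7, 9, 11, 13]
with_outlier = [1, 3, 5, 7, 9, 11, 130]

print("IQR:  ", iqr(original), "->", iqr(with_outlier))
print("range:", value_range(original), "->", value_range(with_outlier))
```

In this example the IQR does not change, while the range grows with the extreme value.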

Answer 63

A

Yes, the IQR is resistant to outliers, because it is NOT calculated with them. The IQR only captures the middle 50% of the data.

Answer 64

A

Preferred for skewed distributions (rather than the mean and standard deviation)

Answer 65

A

Boxplots are used for one quantitative variable and they display the five number summary.

Answer 66

A

side-by-side histogram
side-by-side dotplot
side-by-side boxplot