Exam 1 Terms Flashcards

1
Q

What is a case?

A

An individual unit that is often a person, place, or thing. A row of data usually represents a case.

2
Q

What are variables?

A

A variable is a characteristic or measurement that describes the cases. Typically, a column of data represents a variable. Examples are height, weight, age, temperature, and time.

3
Q

What are the two main types of variables?

A

categorical and quantitative

4
Q

What is a categorical variable?

A

A variable that places each case into one of two or more categories. (ex. gender)

5
Q

What is a quantitative variable?

A

A variable that measures a numerical quantity. (ex. GPA, pulse rate, height)

6
Q

What are the two subsets of quantitative variables?

A

Continuous and discrete

7
Q

What is a continuous variable?

A

A type of quantitative variable that can take on an infinite set of values within some range. (ex. temperature, life expectancy, food calories)

8
Q

What is a discrete variable?

A

A type of quantitative variable that can take only a countable set of separate values, usually whole-number counts. (ex. number of babies born in a pregnancy, number of courses you are taking next semester)

9
Q

What is a population?

A

The entire set of cases.

10
Q

What is a sample?

A

A subset of the population. We collect data for the sample.

11
Q

What is a parameter?

A

A number that describes the population. (ex. population mean, mean GPA for an entire class)

12
Q

What is a statistic?

A

A number that describes the sample. (ex. sample mean, mean GPA for a selection of students in a class)

13
Q

What is statistical inference?

A

The process of using data from a sample to gain information about the population.

14
Q

Why should we take random samples?

A

A random sample should be selected from a population, otherwise it may be prone to bias. The goal is to obtain a sample that is representative of the population.

15
Q

What is a representative sample?

A

A subset of the population from which data are collected that accurately reflects the population.

16
Q

What is bias?

A

The systematic favoring of certain outcomes.

17
Q

What is sampling bias?

A

Systematic favoring of certain outcomes due to the methods employed to obtain the sample.

18
Q

What is simple random sampling? Why do we do it?

A

A method of obtaining a sample where every member of the population has an equal chance of being selected (similar to drawing names from a hat). Samples are selected without replacement.

SRS is done to avoid sampling bias and to obtain a sample that’s representative of a population.

ex. if we wanted to research how long PSU students sleep at night, it would be best to randomly select students for the sample rather than only surveying students in an 8 AM class.
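A minimal sketch of drawing a simple random sample, assuming the population is available as a list. The student ID numbers, the sample size of 50, and the seed are hypothetical values chosen for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 students.
population = list(range(1, 1001))

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(42)

# random.sample draws without replacement, so no student is chosen twice.
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))
```

Every student in the list has the same chance of ending up in the sample, which is what separates this from surveying only an 8 AM class.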

19
Q

What is a convenience sample?

A

A method of obtaining a sample by ease of accessibility. These samples are NOT random and they may NOT represent the intended population.

20
Q

Besides convenience sampling, what are other sources of bias?

A
  • non-response bias
  • response bias
21
Q

What is non-response bias?

A

Bias that arises when individuals who do not participate in a study differ from those who do participate.

  • inability to contact individual
  • individual chose not to participate
22
Q

What is response bias?

A

Bias that arises when individuals participate but do not respond truthfully.

  • may do so to align with social norms
  • may do so to appease the researcher
23
Q

What is a confounding variable?

A

A third variable that may explain the association between two other variables.

Ex. when ice cream sales increase, so do shark attacks. This is association only, not causation. Temperature is a confounding variable here because as it rises, both ice cream sales and the number of people at the beach rise.

24
Q

What are the two main types of studies?

A

observational and experimental

25
Q

What is an observational study?

A

Researchers simply observe the data as they occur. We cannot say that there is a cause and effect based on this type of study because there can be confounding variables.

These studies almost always have confounding variables.

Observational studies can almost never be used to establish causation.

ex. Question: Does coffee cause hyperactivity in college students?
A researcher randomly samples students and surveys them about their coffee intake and hyperactivity

26
Q

What is an experimental study?

A

Researchers actively control one or more of the variables of interest. Because the researchers decide which cases get which treatment, these studies can provide evidence of cause and effect.

Ex. Question: Does coffee cause hyperactivity in college students?
A researcher randomly samples students and randomly assigns them to drink coffee with or without caffeine.

27
Q

How can confounding variables be avoided?

A

By using a randomized experiment.

28
Q

What is a randomized experiment?

A

When the treatment for each case is randomly assigned.
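A minimal sketch of random assignment, assuming 20 hypothetical cases split evenly between a treatment group and a control group. The case labels and the seed are made up for illustration.

```python
import random

# Hypothetical cases taking part in the experiment.
cases = [f"student_{i}" for i in range(1, 21)]

# Shuffle a copy so chance alone decides who lands in which group.
rng = random.Random(7)  # arbitrary seed, used only for reproducibility
shuffled = cases[:]
rng.shuffle(shuffled)

# The first half receives the treatment, the second half is the control group.
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print(len(treatment), len(control))
```

Because chance decides the groups, a possible confounding variable such as sleep habits should be spread roughly evenly across both.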

29
Q

What are the two types of randomized experiments?

A

Comparative experiments and matched pair experiments

30
Q

What is a comparative experiment?

A

Cases are randomly assigned to different treatment groups

31
Q

What is a matched pair experiment?

A

Each case gets BOTH treatments

32
Q

What is a control group?

A

A group of cases that do not receive treatment; serve as a comparison group

33
Q

What is a placebo?

A

A fake treatment; used to control for the placebo effect

34
Q

What is a single-blind study?

A

When participants do not know to which group they belong

35
Q

What is a double-blind study?

A

When participants and researchers interacting with the participants BOTH do not know which participants were assigned to which group.

36
Q

How can we summarize one categorical variable?

A
  • can use a frequency table
  • can take a proportion (relative frequency)
  • can make a relative frequency table (does not include counts)
  • bar chart
  • pie chart
37
Q

What is a proportion?

A

A relative frequency

Proportion = count for category of interest/ total counts in sample
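A minimal sketch of a frequency table and the matching proportions, assuming a small made-up list of survey responses.

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["yes", "no", "yes", "yes", "no", "unsure", "yes", "no"]

# Frequency table: the count for each category.
counts = Counter(responses)

# Proportion = count for the category / total count in the sample.
n = len(responses)
proportions = {category: count / n for category, count in counts.items()}

for category, count in counts.items():
    print(category, count, proportions[category])
```

Here "yes" appears 4 times out of 8, so its proportion is 0.5.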

38
Q

How can we summarize two categorical variables?

A
  • use a two way table
  • use a segmented (stacked) bar chart
  • use a side-by-side bar chart
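A minimal sketch of building a two-way table, assuming each case is recorded as a pair of categories. The class-year and yes/no values are hypothetical.

```python
from collections import Counter

# Each hypothetical case is (class year, answer to a yes/no question).
cases = [
    ("first-year", "yes"), ("first-year", "no"), ("senior", "yes"),
    ("senior", "yes"), ("first-year", "yes"), ("senior", "no"),
]

# Count how many cases fall into each combination of categories.
table = Counter(cases)

# Print one row per class year and one column per answer.
for year in ("first-year", "senior"):
    print(year, table[(year, "yes")], table[(year, "no")])
```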
39
Q

How can we summarize 1 quantitative variable?

A

  • dotplot
  • histogram

40
Q

When are histograms ideal?

A

A histogram is generally the better choice for larger data sets (roughly 30 or more cases), where a dotplot becomes crowded.

41
Q

What shapes can histograms be?

A
  • bell shaped/symmetric
  • left-skewed
  • right-skewed
42
Q

What is the mean?

A

The mean, or average, is the sum of data values/ number of values.

43
Q

What is the median?

A

The middle value when the data are ordered. With an even number of values, it is the average of the two middle values.
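A minimal sketch of the mean and the median written out from their definitions. The function names and data values are illustrative only; Python's built-in statistics module offers equivalents.

```python
def mean(values):
    # Sum of the data values divided by the number of values.
    return sum(values) / len(values)


def median(values):
    # Middle value of the ordered data; with an even count,
    # the average of the two middle values.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [2, 3, 5, 7, 8]
print(mean(data), median(data))
```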

44
Q

Describe the mean and median when the data is symmetric.

A

mean roughly equals median

45
Q

Describe the mean and median when the data is right skewed.

A

Mean > median

The right tail pulls the mean in that direction.

46
Q

Describe the mean and median when the data is skewed to the left.

A

Mean < median

47
Q

When is the mean meaningless?

A

The mean can be a misleading measure of center when the data are strongly skewed or contain outliers.

48
Q

What is an outlier?

A

A data point that is notably distant from the other values in a data set.

49
Q

What is resistance?

A

A statistic is resistant if it is relatively unaffected by extreme values such as outliers.

50
Q

Is the median resistant to outliers?

A

yes

51
Q

Is the mean resistant to outliers?

A

no
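A minimal sketch of resistance, assuming a small made-up data set in which one value is replaced by an extreme outlier.

```python
import statistics

# Hypothetical data, then the same data with the largest value made extreme.
data = [10, 12, 13, 15, 16]
with_outlier = [10, 12, 13, 15, 160]

# The median stays at the middle value, while the mean is pulled upward.
print(statistics.median(data), statistics.median(with_outlier))
print(statistics.mean(data), statistics.mean(with_outlier))
```

The median is 13 in both lists, while the mean moves from 13.2 to 42.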

52
Q

What is standard deviation?

A

A measure of how spread out the data are.
Denoted by s for a sample.

53
Q

What does a larger standard deviation mean?

A

The larger the standard deviation, the more variability there is, and the more spread out the data are.

54
Q

Is standard deviation resistant to outliers?

A

No, because it uses the mean in its calculation.

55
Q

What is the 95% rule?

A

For a bell shaped distribution, about 95% of the data falls within two standard deviations of the mean. (i.e. are between x bar - 2s and x bar + 2s)
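A minimal sketch that checks the 95% rule on simulated bell-shaped data. The mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions.

```python
import random
import statistics

# Simulate bell-shaped data from a normal distribution.
rng = random.Random(0)  # arbitrary seed for reproducibility
data = [rng.gauss(100, 15) for _ in range(10_000)]

x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# Fraction of values between x_bar - 2s and x_bar + 2s.
lower, upper = x_bar - 2 * s, x_bar + 2 * s
within = sum(lower <= x <= upper for x in data) / len(data)

print(round(within, 3))
```

For data like these the fraction should come out close to 0.95.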

56
Q

What is a z-score?

A

The number of standard deviations a value is from the mean. A higher magnitude z-score means the particular data point is more unlike the mean.
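A minimal sketch of a z-score, computed as (value - mean) / standard deviation. The exam scores are made up, and the sample standard deviation s is assumed.

```python
import statistics


def z_score(value, data):
    # Number of sample standard deviations that value lies from the mean.
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return (value - x_bar) / s


scores = [60, 70, 80, 90, 100]  # hypothetical exam scores
print(z_score(100, scores))  # positive: above the mean
print(z_score(60, scores))   # negative: below the mean
```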

57
Q

How can we estimate standard deviation by looking at a histogram?

A

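For a roughly bell-shaped histogram, pick the two values that bracket about the middle 95% of the data, subtract the smaller from the larger, and divide by 4. By the 95% rule, that interval spans about four standard deviations.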

58
Q

What is a percentile?

A

The pth percentile is the value that is greater than p% of the data.

Ex. if your height is at the 40th percentile, 40% of people are shorter than you.

59
Q

What does the five number summary include?

A

minimum, Q1, median, Q3, maximum

60
Q

What is Q1 (first quartile)?

A

Median of values below the median (25th percentile)

61
Q

What is Q3 (third quartile)?

A

Median of values above the median (75th percentile)
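A minimal sketch of the five number summary, using the "median of each half" definition of Q1 and Q3 from these cards. The function name and data are illustrative. Software packages often use slightly different quartile rules, so their results may not match exactly.

```python
import statistics


def five_number_summary(values):
    # Assumes at least two values. With an odd count, the overall
    # median itself is left out of both halves.
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return (
        ordered[0],                     # minimum
        statistics.median(lower_half),  # Q1
        statistics.median(ordered),     # median
        statistics.median(upper_half),  # Q3
        ordered[-1],                    # maximum
    )


print(five_number_summary([1, 3, 4, 6, 7, 9, 12]))
```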

62
Q

What is the range?

A

Maximum - minimum

63
Q

Is the range resistant to outliers?

A

No, because the range is computed from the minimum and maximum, which are the values most likely to be outliers.

64
Q

What is IQR?

A

Interquartile Range
Q3 - Q1

65
Q

Is IQR resistant to outliers?

A

Yes, because it depends only on Q1 and Q3, so extreme values in the tails do not affect it. The IQR spans the middle 50% of the data.
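A minimal sketch comparing the range and the IQR when the largest value becomes an outlier. The data are made up, and Q1 and Q3 are taken as the medians of the lower and upper halves, which is one of several quartile conventions.

```python
import statistics


def value_range(values):
    # Maximum - minimum.
    return max(values) - min(values)


def iqr(values):
    # Q3 - Q1, with quartiles as medians of the lower and upper halves.
    ordered = sorted(values)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])
    q3 = statistics.median(ordered[(n + 1) // 2 :])
    return q3 - q1


data = [1, 3, 4, 6, 7, 9, 12]
with_outlier = [1, 3, 4, 6, 7, 9, 120]

print(value_range(data), value_range(with_outlier))
print(iqr(data), iqr(with_outlier))
```

The range jumps from 11 to 119, while the IQR stays at 6.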

66
Q

When is the five number summary preferred?

A

Preferred for skewed distributions (rather than the mean and standard deviation)

67
Q

What do boxplots display?

A

Boxplots are used for one quantitative variable and they display the five number summary.

68
Q

How do we represent data with both quantitative AND categorical variables?

A
  • side-by-side histogram
  • side-by-side dotplot
  • side-by-side boxplot