Exam 1 Real Flashcards

1
Q

What is statistics?

A

Study of methods for measuring aspects of populations from samples and for quantifying the uncertainty of the measurements

2
Q

What is a population versus a sample?

A

A population is all of the individual units of interest and a sample is a subset of the population

3
Q

What are variables?

A

Characteristics that differ among individuals

4
Q

What is a parameter?

A

A quantity describing a population (the real, true value)

5
Q

What is an estimate or statistic?

A

A related quantity calculated from a sample (a subset of the population)

6
Q

What does the error of an estimate or statistic depend on?

A

It depends on the variability within the population (and on the sample size)

7
Q

What is estimation?

A

The process of inferring an unknown quantity of a population using sample data

8
Q

What is a random sample?

A

In a random sample each member of the population has an equal and independent chance of being selected

9
Q

What do random samples achieve?

A

They minimize bias and make it possible to measure the amount of sampling error

10
Q

What is a sample of convenience?

A

A collection of individuals that are easily available to the researcher

11
Q

What is the parameter?

A

The truth

12
Q

What is sampling error?

A

The difference between an estimate and the population parameter that is caused by chance

13
Q

What is bias?

A
  • Bias is a systematic discrepancy between the estimates we would obtain if we could sample a population again and again and the true population characteristic
  • The error would be in the same direction each time you repeated the sample
14
Q

What is volunteer bias?

A

Bias resulting from systematic differences between the pool of volunteers and the population to which they belong

15
Q

What does accurate mean?

A

The closer the statistic or estimate is to the truth, the more accurate it is

16
Q

What does precise mean?

A

Describes how repeatable an estimate is; high precision can be due to low variability in the population

17
Q

Data can be ________ or ________

A

Categorical or numerical

18
Q

Categorical data can be ________ or ________

A

Nominal - no inherent order
Ordinal - inherent order

19
Q

Numerical data can be ________ or ______

A

Continuous - any real number
Discrete - indivisible units (# of children)

20
Q

What is a frequency distribution?

A

The number of times each value of a variable occurs in a sample
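
A minimal sketch of tallying a frequency distribution. The household data are made up for illustration, and `Counter` is just one convenient way to count.

```python
from collections import Counter

# Hypothetical sample: number of children in each of 10 households.
children = [0, 2, 1, 2, 3, 0, 2, 1, 2, 4]

# Frequency distribution: how many times each value occurs in the sample.
freq = Counter(children)

# Relative frequency: the same counts as a fraction of the sample size.
rel_freq = {value: count / len(children) for value, count in freq.items()}

for value in sorted(freq):
    print(value, freq[value], rel_freq[value])
```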

21
Q

What are two types of studies?

A

Experimental and observational

22
Q

What are two types of variables?

A

explanatory and response variables

23
Q

How are variables graphed?

A

Explanatory variable on the x axis and response variable on the y axis

24
Q

What is a lurking/confounding variable?

A

A variable that masks or distorts the causal relationship between measured variables in a study

25
Q

What are 3 problems with 3D bar graphs?

A

They take the average, comparisons are difficult because of the way the data are displayed, and magnitudes are distorted so that differences look out of proportion

26
Q

What is good about graphs?

A

Good when you want to show trends or patterns in values

27
Q

When are tables good?

A

When you want to report/compare specific values with precision

28
Q

What is a bar graph used for?

A

Uses the height of rectangular bars to display the frequency distribution of a categorical variable

29
Q

What is a grouped bar graph?

A

Uses the height of rectangular bars to display the frequency distributions of two or more categorical variables

30
Q

Which is better, a bar graph or a pie chart?

A

The textbook prefers bar graphs to pie charts; use a pie chart only when there are just two categories

31
Q

What is a histogram?

A

Like a bar graph, but the x axis shows a numerical variable divided into bins

32
Q

Describe the aspects of a histogram shape.

A
  • the mode is the highest peak in the frequency distribution
  • skew refers to asymmetry in the shape
  • an outlier is an observation far from the rest of the data
33
Q

Which plot uses the area of rectangles?

A

mosaic plot

34
Q

What does a mosaic plot display?

A
  • uses the area of rectangles to display the relative frequency of occurrence of two categorical variables
35
Q

What is a scatter plot?

A

graphical display of two numerical variables, each observation a point on a graph of two axes

36
Q

What is a strip plot?

A

a graphical display of a numerical variable and a categorical variable in which each observation is represented as a dot

37
Q

What is useful about a strip plot?

A

Gives a good idea of sample size

38
Q

What is a box plot?

A

a graph that uses lines and a rectangular box to display the median, quartiles, range, and extreme measurements of the data

39
Q

What is a violin plot?

A

a graph that shows an approximation of the frequency distribution of a numerical variable in each group and its mirror image

40
Q

What does the width of a violin plot indicate?

A
  • distribution of the data
  • width is proportional to the density of data points
41
Q

What is a good tip for multiple histograms?

A
  • better to stack vertically rather than side by side because it is easier to compare groups
    -use same scale for x axis
42
Q

What is interquartile range?

A

upper quartile - lower quartile

43
Q

What describes the spread of a distribution?

A

standard deviation and variance

44
Q

What is deviation?

A

difference between a data point and the mean

45
Q

What is the sum of squares?

A

the sum of the squared deviations

46
Q

What is variance?

A

s.d. squared
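
The chain from deviations to standard deviation can be worked through in a few lines. This is a sketch on made-up data; it assumes the usual sample convention of dividing the sum of squares by n - 1, which the cards do not state explicitly.

```python
import math
import statistics

# Hypothetical sample of 8 measurements.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n

# Deviation: difference between each data point and the mean.
deviations = [y - mean for y in data]

# Sum of squares: the sum of the squared deviations.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance (assumed n - 1 divisor) and standard deviation.
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(mean, sum_of_squares, variance, sd)

# Cross-check against the standard library's sample variance.
print(statistics.variance(data))
```

Note that the deviations always sum to zero, which is why they are squared before being added up.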

47
Q

What is standard deviation?

A

the square root of the variance

48
Q

Why is there a preference for standard deviation?

A
  • never negative
  • in the same units as the observation
  • helpful rule of thumb properties
49
Q

What are the rule of thumb properties of standard deviation?

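A

For a bell-shaped distribution, about two-thirds of observations lie within one standard deviation of the mean and about 95% lie within two
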
50
Q

What is the issue with comparing the spread of distributions in different populations?

A

Example: mouse vs. elephant weights. A larger standard deviation doesn't necessarily mean a bigger relative spread

51
Q

What is useful for comparing the spread of distributions in different populations?

A

coefficient of variation

52
Q

What is the coefficient of variation?

A
  • the standard deviation expressed as a percentage of the mean
  • CV = (s / mean) × 100%
  • Larger CV = wider spread
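
A small sketch of the mouse vs. elephant comparison. The means and standard deviations are invented for illustration, and `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

# Hypothetical weights in grams (not real measurements).
mouse_cv = coefficient_of_variation(sd=4, mean=20)
elephant_cv = coefficient_of_variation(sd=500_000, mean=5_000_000)

print(mouse_cv, elephant_cv)
```

In this made-up case the elephants have a far larger standard deviation but the smaller CV, so their relative spread is narrower.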
53
Q

What is the median?

A

the middle measurement of a set of observations

54
Q

What do percentiles indicate?

A

the xth percentile is the value below which x percent of the observations lie

55
Q

What is the line in the middle of a box plot?

A

the median

56
Q

Explain the box and whiskers in a box plot.

A
  • The box covers the entire IQR (quartile 1 to quartile 3)
  • The upper whisker extends to the highest data point at or below quartile 3 + 1.5*IQR
  • The lower whisker extends to the lowest data point at or above quartile 1 - 1.5*IQR
  • Data points beyond the whiskers are drawn as dots (outliers)
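
The whisker rule can be sketched as follows. The data are made up, and the quartile convention (`method="inclusive"`) is an assumption; different software computes quartiles slightly differently.

```python
import statistics

# Hypothetical data with one unusually large value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles under one of several common conventions (an assumption).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the farthest a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
lower_whisker = min(y for y in data if y >= lower_fence)
upper_whisker = max(y for y in data if y <= upper_fence)

# Anything outside the fences is plotted as a separate dot.
outliers = [y for y in data if y < lower_fence or y > upper_fence]

print(q1, median, q3, iqr)
print(lower_whisker, upper_whisker, outliers)
```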
57
Q

Where should the median be in a bell shaped curve?

A

right in the middle of the box

58
Q

What is the plot with frequency lines?

A
  • A cumulative frequency distribution: the cumulative relative frequency at a given measurement is the fraction of observations less than or equal to that measurement
  • A steep jump indicates a cluster of many data points
  • A horizontal line indicates a gap in data points
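
A minimal sketch of the cumulative relative frequency on made-up data. The function name is hypothetical.

```python
def cumulative_relative_frequency(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for y in data if y <= x) / len(data)

# Hypothetical sample: a cluster at 2 and a gap between 3 and 7.
data = [1, 2, 2, 2, 3, 7, 8]

for x in [1, 2, 3, 6, 7, 8]:
    print(x, cumulative_relative_frequency(data, x))
```

The value jumps sharply at 2 (the cluster) and stays flat from 3 to 6 (the gap), reaching 1 at the largest observation.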
59
Q

What is the IQR?

A

the difference between the third and first quartiles of the data. It is the span of the middle 50% of the data

60
Q

Median is ____ mean is_____

A

Median is the middle value, while the mean is the center of gravity

61
Q

What is proportion?

A
  • The proportion of observations in a given category
  • p̂ = number in category / n
  • The p wears a little hat (p̂) when it is an estimate calculated from a sample
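
A short sketch of a sample proportion, using invented presence/absence data.

```python
# Hypothetical sample of a categorical variable.
observations = ["present", "absent", "present", "present", "absent"]

n = len(observations)
number_in_category = observations.count("present")

# p-hat: the proportion estimated from the sample.
p_hat = number_in_category / n

print(p_hat)
```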
62
Q

Describe how sampling distributions change with sample size.

A
  • The spread of the sampling distribution depends on the sample size
  • As you increase the number of observations per sample, the spread (sd) of the sampling distribution decreases
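
This can be checked with a small simulation. It is a sketch under stated assumptions: the population is simulated from a normal distribution with made-up parameters, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 individuals.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def sd_of_sample_means(sample_size, n_samples=1000):
    """Draw many random samples and return the sd of their means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

spread_small = sd_of_sample_means(5)
spread_large = sd_of_sample_means(50)

print(spread_small, spread_large)
```

The sample means from the larger samples cluster more tightly around the population mean.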
63
Q

What is the standard error of an estimate?

A
  • The standard error of an estimate is the standard deviation of the estimate’s sampling distribution
  • For a sample mean: SE_Ȳ = s/√n
  • Reflects the precision of the estimate
  • The smaller the standard error the less uncertainty there is in the estimate of the target parameter
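
A minimal sketch of the standard error of a sample mean, computed as s/√n on made-up data.

```python
import math
import statistics

# Hypothetical sample of 8 measurements.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
s = statistics.stdev(data)   # sample standard deviation
se = s / math.sqrt(n)        # standard error of the mean

print(s, se)
```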
64
Q

What is the standard error of the mean?

A

σ_Ȳ = σ/√n
- we usually don’t know the actual population standard deviation σ, so we use the sample standard deviation s as an estimate of σ: SE_Ȳ = s/√n

65
Q

σ

A

population standard deviation

66
Q

s

A

sample standard deviation

67
Q

What is a confidence interval?

A

a range of values surrounding the sample estimate that is likely to contain the population parameter

68
Q

What is the most commonly used confidence interval?

A

The 95% confidence interval provides a most plausible range for a parameter.

69
Q

How do you describe confidence interval certainty?

A
  • Right: We are 95% confident that the true mean lies between ___ and ____
  • Wrong: there is a 95% probability that the true mean falls between 2827.8 and 3828.4
70
Q

What are error bars?

A
  • lines on a graph extending outward from the sample estimate to illustrate uncertainty about the value of the parameter being estimated
  • used to display the uncertainty, not the spread of the data
71
Q

What is the 2SE rule?

A

A rough approximation of the 95% confidence interval for a mean can be calculated as the sample mean plus and minus two standard errors
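
The 2SE rule can be sketched like this. `approx_ci_95` is a hypothetical helper and the data are made up; the result is only the rough approximation the card describes, not an exact interval.

```python
import math
import statistics

def approx_ci_95(sample):
    """Rough 95% confidence interval: sample mean plus and minus 2 SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se

# Hypothetical measurements.
sample = [10, 12, 9, 11, 13, 10, 12, 11]

low, high = approx_ci_95(sample)
print(low, high)
```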

72
Q

What is a random trial?

A
  • a process or experiment that has two or more possible outcomes
  • Examples: rolling a die, flipping a coin
73
Q

What is an event in a random trial?

A
  • Event (of interest): any potential subset of all possible outcomes
  • Flipping coin: heads
  • Rolling die: 3
74
Q

What is probability?

A

the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions
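
A simulation sketch of this long-run idea, assuming a fair six-sided die. The seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(1)

# Repeat the random trial (rolling a die) many times.
n_trials = 60_000
threes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 3)

# Proportion of trials in which the event "roll a 3" occurred.
proportion = threes / n_trials

print(proportion, 1 / 6)
```

With many repetitions the proportion settles close to 1/6.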

75
Q

How do you abbreviate probability?

A

Pr[A] means “the probability of event A”

76
Q

What does mutually exclusive mean?

A

Two events are mutually exclusive if they cannot occur at the same time

77
Q

What is probability distribution?

A

a list of the probabilities of all mutually exclusive outcomes of a random trial

78
Q

How do you represent the probability distribution of different variables?

A
  • A discrete variable is measured in indivisible units
    -All categorical variables (present or absent) and many numerical variables (number of mates)
  • Continuous variables can take on any real number value within some range
    -Probability of Y being in some range is indicated by the area under the curve
79
Q

What is the addition rule?

A

if two events A and B are mutually exclusive then Pr[A or B] = Pr[A] + Pr[B]

80
Q

What is the general addition rule?

A
  • Not all events are mutually exclusive, so extra term is needed so you don’t double count outcomes
  • Pr[A or B] = Pr[A] + Pr[B] – Pr[A and B]
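
A worked check of the general addition rule on one roll of a fair die. The events chosen are just an illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: roll an even number
B = {4, 5, 6}   # event: roll greater than 3

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# General addition rule: subtract the overlap so 4 and 6 are not counted twice.
by_rule = pr(A) + pr(B) - pr(A & B)

# Direct count of outcomes in A or B.
by_counting = pr(A | B)

print(by_rule, by_counting)
```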
81
Q

What are independent events?

A
  • Two events are independent if the occurrence of one does not inform us about the probability that the second will occur
  • Two flips of a coin or roll of a die
82
Q

What is the multiplication rule?

A

If two events are independent then the probability that they both occur is the probability of the first event multiplied by the probability of the second event: Pr[A and B] = Pr[A] × Pr[B]

83
Q

What are dependent events?

A

the probability of a particular event in the second trial depends on what happened in the first trial

84
Q

What is the general multiplication rule?

A
  • Finds the probability that both of two events occur even if the two are dependent
  • Pr[A and B] = Pr[A]Pr[B|A]
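
A worked sketch of the general multiplication rule with dependent events: drawing two aces in a row from a standard 52-card deck without replacement. The example is an illustration, not from the cards.

```python
from fractions import Fraction
from itertools import permutations

# Pr[A]: first card is an ace. Pr[B|A]: second is an ace given the first was.
pr_first_ace = Fraction(4, 52)
pr_second_ace_given_first = Fraction(3, 51)

# General multiplication rule: Pr[A and B] = Pr[A] * Pr[B|A].
pr_both = pr_first_ace * pr_second_ace_given_first

# Check by counting every ordered pair of two different cards.
deck = ["ace"] * 4 + ["other"] * 48
pairs = list(permutations(deck, 2))
both_aces = sum(1 for first, second in pairs if first == "ace" and second == "ace")
by_counting = Fraction(both_aces, len(pairs))

print(pr_both, by_counting)
```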
85
Q

Standard deviation, standard error, 95% confidence interval: how do their sizes compare?

A

SD > 95% CI > SE

86
Q

Explain the difference between a bar plot and a histogram.

A

Bar graphs are used to show the frequency distribution of a categorical variable whereas histograms are used to show the frequency distribution of a numerical variable.

87
Q

How do you identify a skew?

A

The skew is in the direction of the longer tail (a long right tail means the distribution is skewed right)

88
Q

sd

A
89
Q

The standard error of a sample mean is ___.

A

the standard deviation of the means of randomly drawn samples from the population

90
Q

Select the proper interpretation of a confidence interval for a mean at a confidence level of C%.

A range of values _____.

A

produced by a method such that C% of confidence intervals produced by the same method contain the population mean
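
This interpretation can be illustrated with a simulation. It is a sketch under stated assumptions: the population is normal with made-up parameters, the intervals use the rough 2SE rule rather than an exact method, and the seed is fixed only for repeatability.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population parameters (known here only because we simulate).
TRUE_MEAN, TRUE_SD = 100.0, 15.0

def interval_2se(sample):
    """Approximate 95% interval: sample mean plus and minus 2 SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 2 * se, mean + 2 * se

# Apply the same method to many independent random samples.
n_intervals = 2000
hits = 0
for _ in range(n_intervals):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(30)]
    low, high = interval_2se(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1

# Fraction of intervals that contain the population mean.
coverage = hits / n_intervals
print(coverage)
```

The coverage should come out near 0.95: the confidence level describes how often the method captures the parameter, not the chance that any single interval does.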