Penn State Stats Review Flashcards

1
Q

Define a population

A

A population is any large collection of objects or individuals, such as Americans, students, or trees about which information is desired.

https://newonlinecourses.science.psu.edu/statprogram/reviews/statistical-concepts/terminology

2
Q

Define a parameter

A

A parameter is any summary number, like an average or percentage, that describes the entire population.

3
Q

What is the symbol for and pronunciation of the population mean?

A

μ (the Greek letter “mu”)

Ex. We might be interested in learning about µ, the average weight of all middle-aged female Americans. The population consists of all middle-aged female Americans, and the parameter is µ.

4
Q

What is the population proportion?

A

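The population proportion, denoted p, is the proportion of the population that has some characteristic of interest.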
Ex. We might be interested in learning about p, the proportion of likely American voters approving of the president’s job performance. The population comprises all likely American voters, and the parameter is p.

5
Q

Define a sample

A

A sample is a representative group drawn from the population.

6
Q

Define a statistic

A

A statistic is any summary number, like an average or percentage, that describes the sample.

7
Q

What is the symbol for the sample mean?

A

X-bar, or x̄

We might use x̄, the average weight of a random sample of 100 middle-aged female Americans, to estimate µ, the average weight of all middle-aged female Americans.

8
Q

What is the symbol for the sample proportion?

A

P-hat or p̂

We might use p̂, the proportion in a random sample of 1000 likely American voters who approve of the president’s job performance, to estimate p, the proportion of all likely American voters who approve of the president’s job performance.
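
As a minimal sketch of how the two sample statistics are computed, the Python snippet below uses made-up data. The weights and the poll responses are hypothetical and do not come from the source.

```python
# Hypothetical sample of 8 weights in pounds (made-up values for illustration).
weights = [152.0, 148.5, 160.2, 171.3, 139.8, 155.0, 166.4, 158.8]
x_bar = sum(weights) / len(weights)  # sample mean, used to estimate mu

# Hypothetical poll of 10 likely voters: True means "approves".
approvals = [True, False, True, True, False, True, False, True, True, False]
p_hat = sum(approvals) / len(approvals)  # sample proportion, used to estimate p

print(f"x-bar = {x_bar:.2f}, p-hat = {p_hat:.2f}")
```

Both numbers describe only the sample, so they are statistics; the unknown values they estimate are the parameters.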

9
Q

How do you learn about a population parameter?

A

Two ways

1) We can use CONFIDENCE INTERVALS to estimate parameters.

“We can be 95% confident that the proportion of Penn State students who have a tattoo is between 5.1% and 15.3%.”

2) We can use HYPOTHESIS TESTS to test and ultimately draw conclusions about the value of a parameter.

“There is enough statistical evidence to conclude that the mean normal body temperature of adults is lower than 98.6 degrees F.”

10
Q

What is the principle behind confidence intervals?

A

Suppose we want to estimate an actual population mean µ. As you know, we can only obtain x̄, the mean of a sample randomly selected from the population of interest. We can use x̄ to find a range of values:

Lower value < population mean (μ) < Upper value

that we can be really confident contains the population mean µ. The range of values is called a “confidence interval.”

In general, the narrower the confidence interval, the more information we have about the value of the population parameter. Therefore, we want all of our confidence intervals to be as narrow as possible.

11
Q

Define the general form for most confidence intervals

A

Sample estimate ± margin of error

12
Q

Define the t-interval for population mean

A

The formula for the confidence interval in words is

Sample mean ± (t-multiplier × standard error)

The quantity to the right of the ± sign, i.e., “t-multiplier × standard error,” is just a more specific form of the margin of error. That is, the margin of error in estimating a population mean µ is calculated by multiplying the t-multiplier by the standard error of the sample mean.

The formula is only appropriate if a certain assumption is met, namely that the data are normally distributed.
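
The formula in words can be sketched in Python as follows. The data set and the helper name `t_interval` are hypothetical; the t-multiplier 2.365 is the usual t-table value for 95% confidence with 7 degrees of freedom, entered by hand rather than computed.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_multiplier):
    """Return (lower, upper) for: sample mean +/- t-multiplier * standard error."""
    n = len(data)
    x_bar = mean(data)
    standard_error = stdev(data) / math.sqrt(n)  # s / sqrt(n)
    margin_of_error = t_multiplier * standard_error
    return x_bar - margin_of_error, x_bar + margin_of_error

# Hypothetical sample of n = 8 observations, so n - 1 = 7 degrees of freedom.
data = [2, 4, 4, 4, 5, 5, 7, 9]
T_95_DF7 = 2.365  # t-table value for 95% confidence and 7 degrees of freedom

lower, upper = t_interval(data, T_95_DF7)
print(f"95% t-interval: ({lower:.3f}, {upper:.3f})")
```

The interval is centered on the sample mean, and everything to the right of the ± sign is the margin of error.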

13
Q

Define the t-multiplier

A

Denoted t_{α/2, n-1}, where α/2 and n-1 are subscripts on t.

Depends on the sample size through n - 1 (called the “degrees of freedom”) and on the confidence level, (1 - α) × 100%, through α/2.
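
To see how the multiplier depends on both α and n - 1, here is a rough standard-library sketch that integrates the t density numerically (Simpson's rule) and then searches for the cutoff by bisection. The function names and the search bound of 100 are assumptions made for this illustration; in practice the value is usually read from a t-table or statistical software.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_multiplier(alpha, n):
    """Approximate the t-multiplier for alpha/2 and n - 1 degrees of freedom."""
    df = n - 1
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed wide enough for alpha >= 0.01 and df >= 1
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_multiplier(0.05, 10), 3))  # 95% confidence, n = 10
print(round(t_multiplier(0.10, 10), 3))  # 90% confidence, n = 10
print(round(t_multiplier(0.05, 30), 3))  # 95% confidence, n = 30
```

Lowering the confidence level or increasing the sample size both give a smaller multiplier.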

14
Q

Define “degrees of freedom”

A

n-1
n = sample size

Another way to say this is that the number of degrees of freedom equals the number of “observations” minus the number of required relations among the observations (e.g., the number of parameter estimates). For a 1-sample t-test, one degree of freedom is spent estimating the mean, and the remaining n - 1 degrees of freedom estimate variability.
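
One way to picture the "required relation" is that deviations from the sample mean always sum to zero, so once n - 1 of them are known the last one is fixed. The short Python sketch below illustrates this with hypothetical data.

```python
# Hypothetical sample of n = 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

# Deviations from the sample mean must sum to zero (the one required relation).
deviations = [x - x_bar for x in data]

# So the last deviation can be recovered from the first n - 1 of them.
implied_last = -sum(deviations[:-1])

degrees_of_freedom = n - 1
print(deviations, implied_last, degrees_of_freedom)
```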

15
Q

Define standard error

A

The “standard error,” which is s divided by square root of n, quantifies how much the sample means vary from sample to sample. That is, the standard error is just another name for the estimated standard deviation of all the possible sample means.
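
The claim that the standard error estimates the spread of all possible sample means can be checked with a small simulation. This is only a sketch: the population (normal with mean 100 and standard deviation 15), the sample size, and the number of repetitions are assumptions chosen for illustration.

```python
import math
import random
from statistics import stdev

random.seed(0)  # fixed seed so the run is repeatable
MU, SIGMA = 100.0, 15.0  # hypothetical population mean and standard deviation
N, REPS = 25, 2000       # sample size and number of simulated samples

sample_means = []
first_sample_se = None
for rep in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(sum(sample) / N)
    if rep == 0:
        # Standard error computed from a single sample: s / sqrt(n)
        first_sample_se = stdev(sample) / math.sqrt(N)

spread_of_means = stdev(sample_means)  # observed variation of the sample means
theoretical = SIGMA / math.sqrt(N)     # sigma / sqrt(n) = 3.0 for this setup

print(f"one-sample standard error: {first_sample_se:.2f}")
print(f"std. dev. of {REPS} sample means: {spread_of_means:.2f} (theory: {theoretical:.2f})")
```

A single sample's s divided by the square root of n is a noisy estimate of the same quantity that the simulation measures directly.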

16
Q

Define alpha (α)

A

With respect to estimation problems, alpha refers to the likelihood that the true population parameter lies outside the confidence interval. Alpha is usually expressed as a proportion. Thus, if the confidence level is 95%, then alpha would equal 1 - 0.95 or 0.05.

With respect to hypothesis tests, alpha refers to significance level, the probability of making a Type I error.
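
For the estimation meaning of alpha, a simulation can show roughly what fraction of 95% t-intervals fail to contain the true mean. The sketch below assumes a hypothetical normal population and uses the t-table value 2.262 (95% confidence, 9 degrees of freedom); it reads alpha as a long-run miss rate over repeated samples.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable
MU, SIGMA, N = 50.0, 8.0, 10  # hypothetical population and sample size
T_MULT = 2.262                # t-table value for 95% confidence, 9 degrees of freedom
REPS = 5000

misses = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))
    margin = T_MULT * s / math.sqrt(N)
    # Count the intervals that do NOT contain the true mean.
    if not (x_bar - margin <= MU <= x_bar + margin):
        misses += 1

miss_rate = misses / REPS
print(f"fraction of intervals missing the true mean: {miss_rate:.3f}")
```

With a 95% confidence level the miss rate should land near alpha = 0.05, up to simulation noise.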

17
Q

What is a common way of measuring the confidence interval?

A

The t-interval for population mean

18
Q

What factors affect the width of the confidence interval?

A

As the sample mean increases, the width stays the same. That is, the sample mean plays no role in the width of the interval.

As the sample standard deviation s decreases, the width of the interval decreases. Since s is an estimate of how much the data vary naturally, we have little control over s other than making sure that we make our measurements as carefully as possible.

As we decrease the confidence level, the t-multiplier decreases, and hence the width of the interval decreases. In practice, we wouldn’t want to set the confidence level below 90%.

As we increase the sample size, the width of the interval decreases. This is the factor that we have the most flexibility in changing, the only limitation being our time and financial constraints.
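
These effects follow from the width of a t-interval being 2 × t-multiplier × s / √n. The Python sketch below compares a baseline against three variations; the values of s and n are hypothetical, and the multipliers are standard t-table values typed in by hand.

```python
import math

def interval_width(s, n, t_multiplier):
    """Width of a t-interval: twice the margin of error. The sample mean does not appear."""
    return 2 * t_multiplier * s / math.sqrt(n)

# t-table values, keyed by (confidence level, degrees of freedom).
T_TABLE = {(0.95, 9): 2.262, (0.90, 9): 1.833, (0.95, 29): 2.045}

base = interval_width(s=10, n=10, t_multiplier=T_TABLE[(0.95, 9)])
smaller_s = interval_width(s=5, n=10, t_multiplier=T_TABLE[(0.95, 9)])
lower_conf = interval_width(s=10, n=10, t_multiplier=T_TABLE[(0.90, 9)])
larger_n = interval_width(s=10, n=30, t_multiplier=T_TABLE[(0.95, 29)])

print(f"baseline (s=10, n=10, 95%): {base:.2f}")
print(f"smaller s (s=5):            {smaller_s:.2f}")
print(f"lower confidence (90%):     {lower_conf:.2f}")
print(f"larger sample (n=30):       {larger_n:.2f}")
```

Note that a larger n narrows the interval twice over: through √n in the denominator and through a smaller t-multiplier.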

19
Q

Describe the general idea of hypothesis testing

A

The general idea of hypothesis testing involves:

  1. Making an initial assumption.
  2. Collecting evidence (data).
  3. Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

Every hypothesis test — regardless of the population parameter involved — requires the above three steps.

20
Q

Describe how the null hypothesis is approached in statistics

A

Similar to “the defendant is innocent until proven guilty.” In statistics, we always assume the null hypothesis is true. That is, the null hypothesis is always our initial assumption.

(subscripts after H)
H0: Null hypothesis
HA: Alternative hypothesis

We either reject or do not reject the null hypothesis.

We do not “prove” the null hypothesis. We “behave as if” it is right/wrong.

21
Q

Define a type I error

A

The null hypothesis is rejected when it is true.

22
Q

Define a type II error

A

The null hypothesis is not rejected when it is false.
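
The two error types can be laid out as a small decision table. The function below is a hypothetical helper written for illustration; in a real study the truth of the null hypothesis is unknown, so this only classifies outcomes in a thought experiment or simulation.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label the outcome of a hypothesis test given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "Type I error"   # rejected a true null hypothesis
    if not null_is_true and not null_rejected:
        return "Type II error"  # failed to reject a false null hypothesis
    return "correct decision"

for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 true={truth}, rejected={rejected}: {classify_decision(truth, rejected)}")
```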

23
Q

What are the two types of hypothesis tests?

A
  1. Critical value approach
  2. P-value approach

24
Q

Describe the critical value approach to hypothesis testing

A

The critical value approach involves determining “likely” or “unlikely” by determining whether or not the observed test statistic is more extreme than would be expected if the null hypothesis were true. That is, it entails comparing the observed test statistic to some cutoff value, called the “critical value.” If the test statistic is more extreme than the critical value, then the null hypothesis is rejected in favor of the alternative hypothesis. If the test statistic is not as extreme as the critical value, then the null hypothesis is not rejected.

25
Q

What are the four steps involved in using the critical value approach to hypothesis testing?

A
  1. Specify the null and alternative hypotheses.
  2. Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. To conduct the hypothesis test for the population mean μ, we use the t-statistic which follows a t-distribution with n - 1 degrees of freedom.
  3. Determine the critical value by finding the value of the known distribution of the test statistic such that the probability of making a Type I error, which is denoted α (Greek letter “alpha”) and is called the “significance level of the test,” is small (typically 0.01, 0.05, or 0.10).
  4. Compare the test statistic to the critical value. If the test statistic is more extreme in the direction of the alternative than the critical value, reject the null hypothesis in favor of the alternative hypothesis. If the test statistic is less extreme than the critical value, do not reject the null hypothesis.
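
The four steps above can be sketched in Python for a left-tailed test of H0: μ = 98.6 against HA: μ < 98.6. The ten temperature readings are made up for illustration, and the critical value -1.833 is the t-table value for α = 0.05 with 9 degrees of freedom, entered by hand.

```python
import math
from statistics import mean, stdev

# Step 1: hypotheses. H0: mu = 98.6, HA: mu < 98.6 (left-tailed test).
MU_0 = 98.6

# Step 2: t-statistic from (hypothetical) sample data, assuming H0 is true.
temps = [98.2, 97.9, 98.6, 98.4, 98.0, 98.7, 98.1, 98.3, 97.8, 98.5]
n = len(temps)
standard_error = stdev(temps) / math.sqrt(n)
t_stat = (mean(temps) - MU_0) / standard_error

# Step 3: critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom.
CRITICAL_VALUE = -1.833  # negative t-table value because the test is left-tailed

# Step 4: compare the test statistic with the critical value.
decision = "reject H0" if t_stat < CRITICAL_VALUE else "do not reject H0"
print(f"t = {t_stat:.3f}, critical value = {CRITICAL_VALUE}, decision: {decision}")
```

For a right-tailed or two-tailed alternative, the direction of the comparison and the critical value would change accordingly.
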
26
Q

Define the P-value approach to hypothesis testing

A

The P-value approach involves determining “likely” or “unlikely” by determining the probability — assuming the null hypothesis were true — of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P-value is small, say less than (or equal to) α, then it is “unlikely.” And, if the P-value is large, say more than α, then it is “likely.”

If the P-value is less than (or equal to) α, then the null hypothesis is rejected in favor of the alternative hypothesis. And, if the P-value is greater than α, then the null hypothesis is not rejected.

27
Q

What are the 4 steps involved in the P-value approach to hypothesis testing?

A
  1. Specify the null and alternative hypotheses.
  2. Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. Again, to conduct the hypothesis test for the population mean μ, we use the t-statistic which follows a t-distribution with n - 1 degrees of freedom.
  3. Using the known distribution of the test statistic, calculate the P-value: “If the null hypothesis is true, what is the probability that we’d observe a more extreme test statistic in the direction of the alternative hypothesis than we did?” (Note how this question is equivalent to the question answered in criminal trials: “If the defendant is innocent, what is the chance that we’d observe such extreme criminal evidence?”)
  4. Set the significance level, α, the probability of making a Type I error, to be small (0.01, 0.05, or 0.10). Compare the P-value to α. If the P-value is less than (or equal to) α, reject the null hypothesis in favor of the alternative hypothesis. If the P-value is greater than α, do not reject the null hypothesis.
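
As a rough illustration of these four steps, the sketch below runs a left-tailed test of H0: μ = 98.6 against HA: μ < 98.6 on made-up temperature data. The P-value is approximated by numerically integrating the t density with Simpson's rule; the helper names are assumptions for this example, and statistical software would normally supply the P-value directly.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Step 1: hypotheses. H0: mu = 98.6, HA: mu < 98.6.
MU_0 = 98.6

# Step 2: t-statistic from hypothetical sample data, assuming H0 is true.
temps = [98.2, 97.9, 98.6, 98.4, 98.0, 98.7, 98.1, 98.3, 97.8, 98.5]
n = len(temps)
t_stat = (mean(temps) - MU_0) / (stdev(temps) / math.sqrt(n))

# Step 3: P-value = probability of a t-statistic at least this far below zero.
p_value = t_cdf(t_stat, df=n - 1)

# Step 4: compare the P-value with the significance level alpha.
ALPHA = 0.05
decision = "reject H0" if p_value <= ALPHA else "do not reject H0"
print(f"t = {t_stat:.3f}, P-value = {p_value:.4f}, decision: {decision}")
```

For a right-tailed alternative the P-value would be 1 minus the CDF, and for a two-tailed alternative it would be twice the smaller tail area.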