Class Created Flash Cards

1
Q

What is a p-value?

A

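A p-value is the probability of obtaining a result at least as extreme as the observed result, assuming the null hypothesis is true. In a Monte Carlo simulation, it is estimated as the proportion of simulated results that are at least as extreme as the observed one.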

2
Q

What is a z-score?

A

A z-score is the observed result’s distance from the mean, in standard deviations.

It may be positive or negative depending on whether the observed result is above or below the mean.

3
Q

Do you expect to get 5 heads and 5 tails every time you flip a fair coin 10 times? Why or why not?

A

No. This is because randomness is present in the real world. The probability model for generating 1 variable (1 flip of the coin) will not always match the observed results perfectly. However, as the number of trials approaches infinity, the difference between the proportions of the observed results and the theoretical probability model should go to 0.
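
A short sketch of this idea in Python, assuming a fair coin simulated with the standard library's `random` module. The function name `proportion_heads` and the fixed seed are illustrative choices, not part of the card.

```python
import random

def proportion_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Small runs can land far from 0.5; large runs should settle close to it.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```

With only 10 flips the proportion can easily be 0.3 or 0.7; with 100,000 flips it should sit very close to 0.5.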

4
Q

How do we describe the shape of a distribution?

A

Describe the skew (if applicable), the number of peaks, and the symmetry.

5
Q

What are the measures of center?

A

The mode, median, and mean.

6
Q

How can you establish statistical significance?

A

Show that the p-value is below the chosen alpha level, commonly p-value < 0.05.

7
Q

What is the difference between a population and a sample?

A

A population is the total group you are drawing conclusions about. A sample is the group from this population that you have selected to analyze.

8
Q

What does “variability” mean? Explain like you are talking to a 3rd grader. Give an example.

A

Variability is when you have 1 situation but different outcomes. For example, every time you roll the same die, you might get a different number.

9
Q

Does sample size impact bias?

A

No, because the sampling method, which determines bias, remains the same.

10
Q

What is a “uniform probability model”?

A

A model in which every outcome has equal probability.

11
Q

How do you calculate a z-score?

A

(Observed result - mean of distribution)/standard deviation
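
A minimal Python sketch of the calculation. The function name `z_score` and the example numbers are made up for illustration.

```python
def z_score(observed, mean, std_dev):
    """Distance of the observed result from the mean, in standard deviations."""
    return (observed - mean) / std_dev

print(z_score(7, 5, 2))  # 1.0  (one standard deviation above the mean)
print(z_score(2, 5, 2))  # -1.5 (one and a half standard deviations below)
```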

12
Q

How do you calculate standard deviation?

A

σ = √(∑(x_i − μ)^2 / N), with x_i as each value in the population, μ as the population mean, and N as the number of values in the population.
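
A minimal Python sketch of the population standard deviation: subtract the mean from each value, square each difference, average the squares, then take the square root. The function name `population_sd` and the data set are illustrative.

```python
import math

def population_sd(values):
    """Population standard deviation: root of the mean squared deviation."""
    n = len(values)
    mu = sum(values) / n                               # population mean
    squared_diffs = [(x - mu) ** 2 for x in values]    # square each deviation
    return math.sqrt(sum(squared_diffs) / n)           # divide by N, not N - 1

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

The standard library's `statistics.pstdev` computes the same quantity.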

13
Q

What is a “hypothesis test”?

A

A hypothesis test is a statistical method in which you compare the observed result to what the no-effect (null) hypothesis predicts. It determines whether the data are inconsistent with “no effect”; it does NOT directly prove that there IS an effect.

14
Q

What is the meaning of statistical significance?

A

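A result is statistically significant when it would be unlikely to occur by random chance alone if the null hypothesis were true, so it is more plausibly attributed to a specific cause than to chance.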

15
Q

What are a few examples of questions we can answer using hypothesis testing?

A

Does Congress proportionally represent the U.S. population on the basis of gender? Are kids more likely to take a toy or candy on Halloween? Is seven really the most popular favorite number (between 1-10)?

16
Q

Does sample size impact sampling variability?

A

Yes. The larger the sample, the less likely it is that the data in the sample will be wildly different from the population or original data set; statistics from larger samples vary less from sample to sample. With a smaller sample size, the sampling variability can be higher.
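
A small simulation sketch of this effect in Python. The population (the integers 1 to 1000), the sample sizes, and the function name `spread_of_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, n_samples=1000, seed=0):
    """Draw many random samples and return the standard deviation of their means."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.pstdev(means)

population = list(range(1, 1001))  # hypothetical population
small = spread_of_sample_means(population, sample_size=5)
large = spread_of_sample_means(population, sample_size=100)
print(small, large)  # the spread for size-5 samples should be much larger
```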

17
Q

What is a good way to think about what the “median” means?

A

The median is the centermost value in a distribution. Think of it as though the data points are lined up from smallest to largest and numbered from 1 to 500 (regardless of the actual values they represent): the median is the data point in the middle of the line, around number 250 (strictly, with 500 points it is the average of points 250 and 251). The value that data point represents is the actual median.

18
Q

Does population size impact bias?

A

Yes, if a population is too small, it will be more difficult to make unbiased samples.

CORRECTION (?): Bias is only determined by sampling method, not sample or population size. Increasing the population will NOT introduce systematic bias.

19
Q

What is an Alpha level?

A

A measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant.

20
Q

What is the symbol in statistics for z-score?

A

z

21
Q

What does the mu ( µ) symbol mean in statistics?

A

µ is the Greek letter mu, which is used to denote a population mean or expected value.

22
Q

Why do we sample?

A

We sample because it isn’t always realistic to study an entire population. A sample, as long as it’s a representative one, can be generalized to the source population.

Some examples of sampling methods are simple random sampling, stratified sampling, and block sampling, to name a few.

23
Q

What is the symbol in statistics for the mean of a sample distribution?

A

The sample mean symbol is x̄, pronounced “x bar”.

24
Q

What does the symbol r represent?

A

(This one doesn’t have an answer!)

25
Q

What does p represent in statistics?

A

The variable p represents a probability (or a population proportion) in statistics.

26
Q

What is the difference between a one and two-tailed test in hypothesis testing?

A

One-tailed tests allow for the possibility of an effect in one direction. Two-tailed tests test for the possibility of an effect in two directions—positive and negative.

27
Q

What do you have to do to adjust for a simulation when you calculate a p-Value? Why?

A

You must adjust a p-value from a simulation because random chance has to be taken into account. The simulation may have failed to produce a data point as extreme as or more extreme than the observed value purely because of sampling variability. If no simulated data points fall at or beyond the observed value (away from the mean), the raw calculation would be 0/500. Because you can’t claim a 0% likelihood that something could occur, add 1 to both the numerator and the denominator, giving 1/501.
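
A sketch of the adjustment in Python, assuming a fair-coin null hypothesis, an upper-tailed test, and 500 simulated trials. The function name `simulated_p_value` and its parameters are hypothetical.

```python
import random

def simulated_p_value(observed_heads, n_flips, n_trials=500, seed=0):
    """Upper-tail p-value for a fair-coin null, with the +1 adjustment."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    extreme = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:  # at least as extreme as the observed result
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_trials + 1)

print(simulated_p_value(observed_heads=20, n_flips=20))
```

Even if none of the 500 trials reaches 20 heads, the reported p-value is 1/501 rather than 0.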

28
Q

How do you calculate an interquartile range?

A

First, find the median of the lower half and the median of the upper half of the dataset. These are the first quartile (Q1) and the third quartile (Q3). The IQR is then calculated as Q3 - Q1. This provides a measure of the spread of the middle 50% of the data.

29
Q

How do you decide on the number of repeats to run when you are modeling the null hypothesis for some situation using TinkerPlots?

A

The repeat value should be set to the number of data points found in the real world so that random chance will affect both equally.

30
Q

How do you decide if you want to do a one-sided or two-sided hypothesis test?

A

1-sided test:
if you want to detect a directional change (more/less) from the probability model

2-sided test:
if you want to detect any change from the probability model

31
Q

What does the sigma (σ) symbol mean in statistics?

A

Standard Deviation

32
Q

How do you represent the null and alternate hypothesis using symbols?

A

H0 for the null hypothesis, and Ha (or H1) for the alternative hypothesis.

33
Q

How do you calculate the MAD?

A
  1. Calculate the absolute difference between each point and the mean.
  2. Sum all of the absolute differences.
  3. Divide the sum by the number of data points.
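
The three steps above can be sketched in Python as follows. The function name `mean_absolute_deviation` and the data are illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each data point from the mean."""
    mean = sum(values) / len(values)
    abs_diffs = [abs(x - mean) for x in values]  # step 1: absolute differences
    total = sum(abs_diffs)                       # step 2: sum them
    return total / len(values)                   # step 3: divide by the count

print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```
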
34
Q

How do you write a null hypothesis?

A
  1. State that there is no effect.
  2. Give the probability model that describes “no effect”.
  3. State that any variation from that model is due to random chance.
35
Q

What is a “stem plot” (also known as a “stem and leaf plot”)?

A

A stem-and-leaf display or stem-and-leaf plot is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution.

More useful: a stem and leaf plot consists of a T-chart where the left side holds the first few digits of a number and the right side holds the remaining digit. Numbers with the same first few digits appear as a list on one row.

36
Q

How do you make a stem plot?

A
  1. The stem is the first digit or digits of a value.
  2. The leaf is the final digit of a value.
  3. Each stem can consist of any number of digits.
  4. Each leaf can have only a single digit.
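
A minimal Python sketch of these rules, assuming non-negative whole numbers so that the stem is everything except the last digit and the leaf is the last digit. The function name `stem_plot` is hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # stem = leading digits, leaf = final digit
    rows = []
    # Walk every stem in range, including empty ones, so the shape stays honest.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stem_plot([12, 15, 21, 23, 23, 38, 40]):
    print(row)
# 1 | 25
# 2 | 133
# 3 | 8
# 4 | 0
```
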
37
Q

Why might you want to make a stem plot? What statistical need would a stem plot satisfy?

A

A stem and leaf plot, or stem plot, is a technique for organizing the values of a discrete or continuous variable so that the shape of the distribution is visible while every individual value is still shown.

38
Q

What is a 5 number summary?

A

A five-number summary is especially useful in descriptive analyses or during the preliminary investigation of a large data set. A summary consists of five values: the most extreme values in the data set (the maximum and minimum values), the lower and upper quartiles, and the median.

39
Q

What are the meanings of “parameter” and “statistic”? How are they the same, how are they different?

A

parameter: a number describing a whole population
statistic: a number describing a sample
similarity: they are both describing a group
difference: a parameter describes the whole population, while a statistic describes only a sample drawn from it

40
Q

What is a sample?

A

A sample is a small quantity of collected data meant to represent what the whole would be like.

41
Q

What is a population?

A

A population is a finite or infinite collection of items under consideration.

42
Q

What is the term for the standard deviation of a sampling distribution?

A

standard error

43
Q

What is an interquartile range?

A

The interquartile range (IQR) contains the second and third quartiles, or the middle half of your data set.

44
Q

How do you calculate a p-value?

A

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test).

45
Q

What are one-sided and two-sided hypothesis tests?

A

One-tailed tests allow for the possibility of an effect in one direction. Two-tailed tests test for the possibility of an effect in two directions—positive and negative.

46
Q

What is a Monte Carlo simulation?

A

A Monte Carlo simulation is a method used to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables. It is a computational algorithm that uses random sampling to simulate the behavior of a system and to estimate the probability distribution of its outcomes.

47
Q

How do you conduct a Monte Carlo simulation?

A
  1. Establish the mathematical model.
  2. Define an equation that brings the output and input variables together.
  3. Determine the input values.
  4. Create a sample dataset.
  5. Set up the Monte Carlo simulation software. …
  6. Analyze the results.
48
Q

What are the three different ways to determine if a point is an outlier?

A
  1. Create a box plot or scatter plot to see whether a point falls outside the overall pattern of the data.
  2. Use a z-score or the interquartile range (IQR) to identify points that fall outside a defined threshold or range.
  3. Use clustering or density-based methods, which can identify points that are far away from the other points in the dataset.
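
As a sketch of the IQR approach, the Python below flags points more than 1.5 × IQR beyond the quartiles. The 1.5 multiplier is a common convention assumed here, not something the card specifies, and `iqr_outliers` is a hypothetical name. `statistics.quantiles` uses its own interpolation rule, so its quartiles can differ slightly from ones found by hand.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr         # outlier fences
    return [x for x in values if x < low or x > high]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 10, 12, 50]))  # [50]
```
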
49
Q

How do you find the quartiles of a distribution?

A
  1. Find the median of the dataset. This is known as the second quartile.
  2. Split the dataset into two groups: the lower half and the upper half.
  3. Find the median of the lower half of the dataset. This is known as the first quartile.
  4. Find the median of the upper half of the dataset. This is known as the third quartile.
  5. If a half has an even number of observations, its median is the average of the two middle values.
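
The steps above can be sketched in Python as follows. One assumption is labeled in the code: when the dataset has an odd number of values, the overall median is left out of both halves, which is one common convention among several. The function name `quartiles` is illustrative.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: for an odd count, exclude the middle value from both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = statistics.median(lower)
    q2 = statistics.median(data)
    q3 = statistics.median(upper)
    return q1, q2, q3

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
print(q1, q2, q3, q3 - q1)  # 4.0 8.0 12.0 8.0 (the last number is the IQR)
```
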
50
Q

What is the difference between positive skew and negative skew?

A

Positive skew is when the tail on the right side of the probability density function is longer than the tail on the left side. This means that the mean is greater than the median, and the mode is usually less than the mean. The majority of the data will be concentrated on the left side of the distribution, with a few large values on the right.

Negative skew occurs when the tail on the left side of the probability density function is longer than the tail on the right side. This means that the mean is less than the median, and the mode is usually greater than the mean. In a negatively skewed distribution, the majority of the data will be concentrated on the right side of the distribution, with a few small values on the left.
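
A rough Python sketch of the mean-versus-median comparison described above. This is only a heuristic for guessing the direction of skew, not a formal skewness statistic; `skew_direction` and both data sets are made up for illustration.

```python
import statistics

def skew_direction(values):
    """Guess the direction of skew by comparing the mean to the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"   # a few large values pull the mean to the right
    if mean < median:
        return "negative"   # a few small values pull the mean to the left
    return "none detected"

print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))        # positive
print(skew_direction([1, 17, 18, 18, 18, 19, 19, 20]))  # negative
```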

51
Q

What are different statistical interpretations of “range”?

A

“Range” typically refers to the difference between the largest and smallest values in a dataset.

Other interpretations of “range” in statistics include:

Interquartile range (IQR), which is the difference between the 75th and 25th percentiles of a dataset.
Range in probability and statistics is also used to refer to the difference between the minimum and maximum values of a probability distribution.
The range of a function is the set of all possible output values of the function when the input is chosen from the domain of the function.

52
Q

What do “quartile 1”, “quartile 2”, and “quartile 3” mean in terms of a distribution?

A

Quartile 1 (Q1) is the value that separates the lowest 25% of the data from the rest of the dataset.
Quartile 2 (Q2) is the middle value of the dataset, with 50% of the data falling below it and 50% of the data falling above it.
Quartile 3 (Q3) is the value that separates the lowest 75% of the data from the top 25% of the dataset.

53
Q

What does p hat (sorry- can’t put in the correct symbol for the hat in Sheets) mean in statistics?
How are Greek and Roman letters used in statistics?

A
54
Q

Does population size impact sampling variability?

A

Yes, a larger population entails more sampling variability.

55
Q

What is a two-tailed hypothesis test?

A

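A test of whether the observed result differs from the value expected under the null hypothesis in either direction: it counts results at least as far above the mean as the observed result and results at least as far below it.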

56
Q

Are there patterns in randomness? Explain.

A

Yes, there are patterns in randomness: different things have different probabilities of happening, making some outcomes more likely than others.

57
Q

How do we measure the spread of a distribution?

A

The spread of a distribution is typically measured with the Average Distance from the Mean (ADM, also called the mean absolute deviation or MAD) or the Standard Deviation (SD).

58
Q

In statistical hypothesis testing, what is the symbol for the research question?

A

The research question does not get a symbol. The symbol for the null hypothesis is typically H0.

59
Q

What does p̂ mean in statistics?

A

p̂ refers to the sample proportion, meaning the proportion of individuals in the observed data with a given trait. It is an observed proportion, not a p-value.

60
Q

What is a confidence interval?

A

A confidence interval is the mean of your estimate plus and minus the variation in that estimate. This is the range of values you expect your estimate to fall between if you redo your test, within a certain level of confidence.

61
Q

What is a good way to think about what “mode” means?

A

The mode is the most commonly found value in a dataset.

62
Q

What is a good way to think about what “mean” means?

A

If all the values in a dataset are equal weights placed along a balance beam at their values, the mean is the point at which the beam balances: the total distance of the values below the mean equals the total distance of the values above it.

63
Q

What are some things that students get confused about regarding box plots?

A

Students seem to get confused about where and what the different quartiles are. Q1 and Q3 are not ranges; they are specific points on the box plot. Q1 is the left border of the box, and Q3 is the right border; the box between them spans the interquartile range.

64
Q

A student says, “I wanted to see if the results were due to random chance.” How might a student go about doing this? What is the reasoning behind that approach— why does it work?

A

A student can perform a statistical test to determine whether the results are due to random chance. Using a null hypothesis and an alternative hypothesis, create a hypothesis test. The student would then collect data and calculate a test statistic to determine the likelihood of obtaining the results under the null hypothesis. If the p-value is low, the student would reject the null hypothesis and conclude that the results are unlikely to be due to random chance alone. This works because the null model shows how much variation random chance produces by itself, so a result far outside that variation is hard to explain by chance.

65
Q

What are the Four Pillars of Inference?

A
  1. Representativeness: The degree to which a sample is similar to the population from which it was drawn.
  2. Randomness: The degree to which the sample was selected randomly and independently from the population.
  3. Size: The size of the sample, as larger samples tend to be more representative and less susceptible to random fluctuations.
  4. Unbiasedness: The degree to which the sample is free from systematic error or bias.
66
Q

What’s an “alpha level” and how do you determine one?

A

An “alpha level” is the threshold at which a null hypothesis is rejected in a hypothesis test, and is commonly set at 0.05. This means that if the p-value is less than or equal to 0.05, the null hypothesis is rejected and the results are considered statistically significant.

67
Q

What is a good rule of thumb for how large a sample needs to be to predict a population parameter?

A

A sample should be approximately 10% of the population size. Of course, if the population is extremely large, a smaller sample may be taken.