Class Created Cards Flashcards

1
Q

What is the difference between one and two-tailed tests in hypothesis testing?

A
  • One-tailed (one-sided) tests measure how far the actual data fall above OR below the distribution based on the null hypothesis, BUT NOT BOTH: only one direction counts as extreme.
  • Two-tailed (two-sided) tests measure how far the actual data fall above AND below the distribution based on the null hypothesis: either direction counts as extreme.
2
Q

Is Block 1 the best stats block?

A

Obviously

3
Q

What does the symbol “r” represent?

A

It represents the correlation coefficient, r = cov(x,y)/(σx*σy), and generally tells you how close a set of points is to being fit by a straight line (r = 1 is a perfect line with positive slope, r = 0 means no linear relationship, and r = -1 is a perfect line with negative slope).
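
A minimal Python sketch of the r formula, using made-up data. The function name `correlation` and the sample values are illustrative assumptions; the covariance and standard deviations here divide by n, and r comes out the same as long as both use the same divisor.

```python
from math import sqrt

def correlation(xs, ys):
    """Return r = cov(x, y) / (sigma_x * sigma_y) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of the paired deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1, 2, 3, 4, 5]                        # made-up data
r_up = correlation(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_down = correlation(xs, [10, 8, 6, 4, 2])  # points on a falling line
print(round(r_up, 6), round(r_down, 6))
```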

4
Q

Does population size impact bias?

A

No. Bias is only affected by the method of sampling

5
Q

What does the sigma (σ) symbol mean in statistics?

A

The population standard deviation

6
Q

Does rejecting the null hypothesis mean that the null hypothesis is false?

A

No. Rejecting the null hypothesis means that we are accepting the alternative hypothesis (that the effect does exist in the population) because the observed data would be unlikely if the null hypothesis were true. This does not prove that the null hypothesis is a false statement.

7
Q

What is a good way to think about what the “mean” means?

A

The mean can be thought of as the fulcrum of a scale: the balance point of the data. The deviations above the mean exactly cancel the deviations below it. If you were to select random points from a distribution, the mean is the value that minimizes the average squared distance to them.

8
Q

What is a z-score?

A

The number of standard deviations a certain value lies above or below the mean.

9
Q

What is a good way to think about what the “median” means?

A

The median is the number that is in the center when the data are sorted in numerical order. Example: for 1 4 6 8 9, the median is 6. With an even number of values, the median is the average of the two middle values.

10
Q

Why do we use hypothesis testing?

A

We use a hypothesis test to account for the uncertainty caused by sampling variation

11
Q

What is a good way to think about what the “mode” means?

A

The mode is the number in the data that appears most often.

12
Q

How are Greek and Roman letters used in statistics?

A

To represent concepts/words
- Greek letters tend to represent population parameters (e.g., µ, σ) and Roman letters tend to represent sample statistics (e.g., x̄, s)

13
Q

Does sample size impact bias?

A

No. Bias comes from the method of sampling, so a biased method stays biased no matter how big or small the sample size is. A larger sample reduces sampling variability, not bias.

14
Q

What is the difference between a population and a sample?

A

A population is the entire group that you are trying to draw a conclusion about while the sample is the group within the population that you collect data from.

15
Q

Are there patterns in randomness? Explain.

A

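Randomness is characterized by the lack of predictability in individual events. A random process, such as the output of a truly random number generator, gives no way to predict the next outcome. Even so, short runs often show apparent patterns (such as streaks) purely by chance, and over many repetitions long-run regularities emerge, such as the proportion of heads in fair coin flips settling near 50%.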

16
Q

What does p represent in statistics?

A

p is short for the p-value, which represents the probability of obtaining results at least as extreme as the observed result, assuming the null hypothesis is true.

17
Q

What is a population?

A

A population is the entire pool from which a sample is drawn to study and interpret.

18
Q

What does the mu (µ) symbol mean in statistics?

A

The population mean

19
Q

What is the symbol in statistics for the mean of a sample distribution?

A

x̄ (x-bar)

20
Q

What are the Four Pillars of Inference?

A

These are four types of conclusions to take away from data: significance, estimation, generalization, and causation.

21
Q

What happens to the variation of your sampling distribution as you increase the number of trials from which you are collecting the statistic of interest when you are modeling the null hypothesis for some situation using TinkerPlots?

A

The variation of the sampling distribution decreases with a larger number of trials.

22
Q

A student says, “I wanted to see if the results were due to random chance.” How might a student go about doing this? What is the reasoning behind that approach— why does it work?

A

A good way to do this would be a Null Hypothesis Test through a Monte Carlo simulation. The first step would be to construct a sampler that corresponds to the probability model suggested in the null hypothesis. Note that the null hypothesis assumes a no-effect model.

Next, you would want to run your sampler at least 500 times and graph the resulting distribution. From this graph, you can get an idea of how likely various results are to be obtained from random chance.

Finally, we compare our observed result to the simulated distribution. If the p-value is less than our alpha value, then we reject the null hypothesis and say it’s unlikely that this result was generated due to random chance. Thank you for attending my TED talk.
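
The steps above can be sketched in Python for a hypothetical coin example. The scenario (16 heads observed in 20 flips, tested one-sided against a fair-coin null), the function names, the 1000 trials, and the fixed seed are all illustrative assumptions, not part of the original card. The p-value adds 1 to the numerator and denominator, as if one extra trial had produced the observed result.

```python
import random

def simulate_null(n_flips, n_trials, rng):
    """Set up and simulate: count heads in n_flips fair-coin flips, n_trials times."""
    return [sum(rng.random() < 0.5 for _ in range(n_flips))
            for _ in range(n_trials)]

def one_sided_p_value(simulated, observed):
    """Evaluate: share of simulated results at least as large as the observed one.

    Adding 1 to the numerator and denominator keeps the p-value above zero.
    """
    extreme = sum(result >= observed for result in simulated)
    return (extreme + 1) / (len(simulated) + 1)

rng = random.Random(0)     # fixed seed so the run is repeatable (assumption)
observed_heads = 16        # hypothetical observed result out of 20 flips
simulated = simulate_null(n_flips=20, n_trials=1000, rng=rng)
p_value = one_sided_p_value(simulated, observed_heads)
alpha = 0.05
print(f"p = {p_value:.4f}; reject null: {p_value < alpha}")
```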

23
Q

What is a p-value?

A

A p-value is the probability of having a result be at least as extreme as the observed result if the null hypothesis is true.

24
Q

What is a two-tailed hypothesis test?

A

A hypothesis test where you check if the observed result is significantly higher or lower than the simulated results (both sides)

25
Q

How can you establish statistical significance?

A

To establish statistical significance, a common approach is hypothesis testing. It involves creating a null hypothesis (no significant difference) and an alternative hypothesis (significant difference), then using a test statistic (such as a t-value) and its p-value to determine how likely the sample data are given the null hypothesis. The p-value is compared with the significance level (often 0.05) to decide between rejection and non-rejection of the null hypothesis. If the p-value is less than the significance level, the result is considered statistically significant and the null hypothesis is rejected; otherwise it’s not statistically significant and the null hypothesis is not rejected.

26
Q

What is the term for the standard deviation of a sample distribution?

A

The standard deviation of a sample distribution is usually called the standard error and written SE.

27
Q

How do you determine an appropriate alpha level?

A

Alpha = 1 - confidence level. For example, if you want to be 90% confident in your analysis, alpha = 1 - 0.9 = 0.1, or 10%. For a one-tailed test, the whole alpha sits in one tail; for a two-tailed test, divide this number by 2 so that each tail gets half.

28
Q

Does population size impact sampling variability?

A

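Generally no. Sampling variability depends mainly on the sample size, not the population size, as long as the population is much larger than the sample. It is larger samples, not larger populations, that are more likely to produce means that are closer to the actual mean of the whole population.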

29
Q

What is a sample?

A

A sample is a set of data that is taken from a population

30
Q

What is the difference between a simulation and a theoretical approach to statistical hypothesis testing?

A

Simulation and theoretical are two methods of hypothesis testing. Simulation generates data under the null hypothesis and reads the p-value off the simulated results, while theoretical uses mathematical formulas to calculate test statistics and p-values. Simulation is useful when the data-generating process is complex, while theoretical is suitable when the underlying assumptions of the model are met.

31
Q

What is a Hypothesis Test?

A

A procedure for using sample data to judge a claim about a population. One compares the observed result with what the null hypothesis predicts, and then makes inferences about whether the data are consistent with it.

32
Q

What are one-sided and two-sided hypothesis tests?

A

One-sided: A hypothesis test where you check for statistical significance in one direction
Two-sided: A hypothesis test where you check for statistical significance in both directions

33
Q

How do you decide if you want to do a one-sided or two-sided hypothesis test?

A

A one-sided hypothesis test is when you are interested in seeing if the result is either greater than or less than the value stated in the null hypothesis, for example 50% (looking at a difference between groups in a certain direction). A two-sided hypothesis test is when you are interested in seeing if the result is or is not equal to that value (to check if there is any difference between the groups you are comparing).

34
Q

What do you have to adjust for a simulation when you calculate a p-Value? Why?

A

You have to add one to the denominator and numerator as if you ran one extra trial which obtained your observed result. We do this to ensure we never get a p-value of zero, as there will always be a chance (albeit small) to generate an observed result from random chance.

35
Q

What does variability mean?

A

How spread out the data points in a data set are, relative to each other or to the mean. Examples of ways we quantify variability are range and standard deviation.

36
Q

Does sample size impact sampling variability?

A

Yes, sample size affects sampling variability: statistics computed from larger samples vary less from sample to sample.

37
Q

What’s an “alpha level” and how do you determine one?

A

An alpha level is a predetermined threshold that we compare the p-value to in order to decide whether to reject the null hypothesis. It is chosen before running the test, commonly 5%.

38
Q

What is a “uniform probability model”?

A

A probability model which assigns an equal probability to each possible outcome.

39
Q

How do you calculate the ADM (or MAD)?

A

(Σ|score-mean|)/(total number of scores)
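
A short Python sketch of this formula on made-up scores; the function name `mean_absolute_deviation` and the data are illustrative assumptions.

```python
def mean_absolute_deviation(scores):
    """Average distance of each score from the mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(score - mean) for score in scores) / len(scores)

scores = [2, 4, 6, 8, 10]   # made-up data; the mean is 6
mad = mean_absolute_deviation(scores)
print(mad)  # distances 4, 2, 0, 2, 4 average to 2.4
```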

40
Q

How do you calculate the standard deviation?

A

Find the mean. For each point find the square of its distance to the mean. Sum the values. Divide by the number of data points. This is the variance, the standard deviation is the square root of the variance.
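
The steps above can be sketched in Python as follows. The function name `population_sd` and the data are illustrative assumptions. Dividing by the number of data points, as the card describes, gives the population standard deviation; the sample standard deviation divides by n - 1 instead.

```python
from math import sqrt

def population_sd(data):
    """Standard deviation following the card's steps (divide by n)."""
    mean = sum(data) / len(data)                        # find the mean
    squared_distances = [(x - mean) ** 2 for x in data] # square each distance
    variance = sum(squared_distances) / len(data)       # sum, then divide by n
    return sqrt(variance)                               # square root of variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, variance 4
sd = population_sd(data)
print(sd)
```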

41
Q

What are a few examples of questions we can answer using hypothesis testing?

A

Is the mean value of a certain measurement (e.g. blood pressure) different between two groups of people (e.g. smokers vs. non-smokers)?

Is the proportion of customers who buy a certain product significantly different between two different marketing campaigns?

Is the average time to complete a task significantly different between two different training methods?

42
Q

How do we measure the spread (variability) of a distribution?

A

The most common measure of spread is the standard deviation; the range and the interquartile range are also used.

43
Q

What symbol is used in statistics for the sample distribution standard deviation?

A

SE, standard error.

44
Q

How do we describe the shape of a distribution?

A

The shape of a distribution is described by its symmetry, its number of peaks, and its skew, alongside the measures of central tendency and the measures of spread of the data. A distribution is symmetric if the data are evenly distributed around the center, so that the two sides are mirror images. Bell-shaped: a single peak with the data concentrated around it. Right skew: longer tail on the right, with the data concentrated on the left. Left skew: longer tail on the left, with the data concentrated on the right.

45
Q

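What is the meaning of “statistically significant”?

A

A result is statistically significant when it would be unlikely to arise from random chance alone if the null hypothesis were true (the p-value is below the alpha level). This suggests, but does not prove, that some non-random factor is involved.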

46
Q

What is a parameter of interest in a hypothesis test?

A

The parameter of interest is the population quantity that the hypothesis test makes a claim about. It is often the mean or a proportion.

47
Q

In statistical hypothesis testing, what is the symbol for the research question?

A

In statistical hypothesis testing, the research question is typically expressed as a pair of hypotheses, denoted H₀ (null hypothesis) and H₁ (alternative hypothesis).

48
Q

How do you calculate the standard error?

A

Divide the standard deviation by the square root of the sample size.
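
A one-function Python sketch of this calculation; the name `standard_error` and the numbers are illustrative assumptions.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / sqrt(n)

se = standard_error(sd=10, n=25)   # made-up values: 10 / 5
print(se)  # 2.0
```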

49
Q

Why do we sample?

A

We sample because it is difficult to collect data from the entire population.

50
Q

What symbol is used in statistics for the sample distribution standard deviation?

A

The sigma symbol with a subscript for the statistic (for example, σx̄), also called the standard error (SE)

51
Q

How do you calculate a p-value?

A

Numerator: number of points at least as extreme as the observed result + 1
Denominator: total number of simulated results + 1
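
As a worked instance of this fraction, with made-up counts (12 of 500 simulated results at least as extreme as the observed one):

```latex
p = \frac{(\text{number of simulated results at least as extreme as observed}) + 1}
         {(\text{total number of simulated results}) + 1}
  = \frac{12 + 1}{500 + 1} = \frac{13}{501} \approx 0.026
```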

52
Q

How do you write a null hypothesis?

A

To write a null hypothesis, restate the research question as a no-effect claim, add the probability model that claim implies, and then state that any variation from it is due to random chance.

53
Q

In statistics, what symbol is used for the population standard deviation?

A

σ (lowercase sigma)

54
Q

What do Q1, Q2, and Q3 mean in terms of a distribution (and boxplot)?

A

Q1, Q2, and Q3 are each single points on the boxplot. Q1 is the left edge of the box, Q2 is the median of the dataset, and Q3 is the right edge of the box. Q1 and Q3 are also the lower and higher edges of the middle 50% range of the distribution.

55
Q

How do you decide on the number of repeats to run when you are modeling the null hypothesis for some situation using TinkerPlots?

A

The number of repeats should be equal to the sample size of the sample we’re analyzing. The symbol for this number is n

56
Q

What can, and cannot, be inferred about a population by looking at its box plot?

A

Can: Median, interquartile range, range, min and max

Cannot: Mean, Standard Deviation

57
Q

What are the measures of center?

A

Mean, median, mode
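
A quick illustration of all three measures with Python's standard `statistics` module; the data values are made up for the example.

```python
import statistics

data = [1, 4, 4, 6, 8, 13]          # made-up data

mean = statistics.mean(data)        # balance point: 36 / 6
median = statistics.median(data)    # middle of the sorted data: (4 + 6) / 2
mode = statistics.mode(data)        # most frequent value

print(mean, median, mode)
```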

58
Q

How do you conduct a Monte Carlo Simulation?

A

Three steps: Set up, Simulate, Evaluate. Run at least 500 trials.

59
Q

How do you calculate a z-score?

A

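Subtract the mean from the item and divide by the standard deviation: z = (value - mean) / standard deviation. The result is the number of standard deviations the item is away from the mean.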
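
A minimal Python sketch of the z-score calculation; the function name `z_score` and the numbers are illustrative assumptions.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z = z_score(value=130, mean=100, sd=15)   # made-up values: 30 / 15
print(z)  # 2.0, i.e. two standard deviations above the mean
```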

60
Q

How do you calculate the interquartile range (IQR)?

A

The IQR is the middle 50% of the data spanning from Q1 to Q3. Calculate it as Q3 - Q1. Once a box plot is divided into 4 sections, the IQR is the middle 2.
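
A sketch of the calculation using `statistics.quantiles` from Python's standard library, on made-up data. Conventions for locating quartiles differ between textbooks and tools, so other methods can give slightly different values on other data sets; the function name `interquartile_range` is an assumption.

```python
import statistics

def interquartile_range(data):
    """Return Q3 - Q1 using the default quartile method of statistics.quantiles."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7]       # made-up data: Q1 = 2, median = 4, Q3 = 6
iqr = interquartile_range(data)
print(iqr)
```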

61
Q

What is the confidence interval?

A

A range of values, computed from sample data, that is used to estimate an unknown population parameter with a stated level of confidence.

62
Q

How do you make a stem plot?

A

To make a stem plot, take the first digit(s) of your numbers and place them sequentially in the left column (don’t repeat values). Then, list out the rest of the digits of each number in the right column.

63
Q

Do you expect to get 5 heads and 5 tails every time you flip a fair coin? Why not?

A

No. A fair coin has an equal probability of landing heads or tails, but that is a probability, not a guarantee. In any 10 flips the counts vary by chance; only when the coin is flipped many times would you expect about 50% heads and 50% tails.

64
Q

What is an interquartile range?

A

It is the distance between the third and first quartiles (Q3 - Q1). It spans the middle 50% of the data.

65
Q

How do we find the range of “typical values”?

A

The range of typical values of a distribution is defined by the mean of the distribution + or - 2 * standard deviation.
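
A small Python sketch of this rule on made-up data; the function name `typical_range` is an assumption, and it uses the population standard deviation (`statistics.pstdev`).

```python
import statistics

def typical_range(data):
    """Return (mean - 2 * SD, mean + 2 * SD) for the data."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)    # population standard deviation
    return mean - 2 * sd, mean + 2 * sd

low, high = typical_range([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, SD 2
print(low, high)
```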

66
Q

What are some things that students get confused about regarding box plots?

A

The quartiles are not the actual sections of data but the dividers between them. Q1 refers to a line, as do Q2 and Q3. There is technically no “4th quartile” despite there being a 4th section. Calling these quartiles is confusing, as there are only three.

67
Q

What are stem (stem and leaf) plots?

A
68
Q

What does p hat (p̂) mean in statistics?

A

The hat character “^” is usually used to denote that we are talking about a statistic (for a sample) or an estimate/predicted value. The P(x) refers to the proportion of x in the population but the p̂(x) refers to the proportion within a sample. The character π is also sometimes used rather than P.

69
Q

What do students get confused about when describing the skew of a distribution?

A

When describing the skew of a distribution, students often get confused between left and right skew, since the side with the longer tail defines the skew (rather than the peak).

70
Q

What is “positive skew”? What is “negative skew”?

A

“positive skew” is when a distribution has a longer tail on the positive (right) side than the other side.
“negative skew” is when a distribution has a longer tail on the negative (left) side than the other side.

71
Q

How do we find the range of “typical values”?

A

To find the range of “typical values,” go two standard deviations to each side of the mean; those two numbers define the range of “typical values.”

72
Q

What is a “false positive”?

A

A “false positive” is when a result incorrectly indicates that something is there/happening (in a hypothesis test, rejecting a null hypothesis that is actually true)

73
Q

What is a “false negative”?

A

A “false negative” is when a result incorrectly indicates that nothing is there/happening (in a hypothesis test, failing to reject a null hypothesis that is actually false)

74
Q

What is an alpha level?

A

An alpha level is a percentage that one sets at the beginning of a hypothesis test as a measure of the amount of evidence needed to reject the null hypothesis. It’s usually set to 5%.

75
Q

What are three different ways of determining if a point is an outlier?

A
  1. If you display the data, you can identify outliers just by choosing the points that seem too far removed from the rest of the data, especially when considering the variable in question.
  2. Any point that is at least 3 SD away from the mean can be considered an outlier.
  3. If you use a box plot, then any point at least 1.5 IQRs away from the nearest quartile can be considered an outlier.
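
Rules 2 and 3 above can be sketched in Python as follows. The function names, the data, and the use of the population standard deviation and the default quartile method of `statistics.quantiles` are illustrative assumptions.

```python
import statistics

def sd_outliers(data, k=3):
    """Points at least k standard deviations away from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) >= k * sd]

def iqr_outliers(data, k=1.5):
    """Points at least k IQRs below Q1 or above Q3."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x <= q1 - k * iqr or x >= q3 + k * iqr]

# Made-up data with one value far from the rest.
data = [10, 11, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 16, 17, 40]
print(sd_outliers(data), iqr_outliers(data))
```
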
76
Q

How do you represent the null and alternative hypothesis using symbols?

A

Null hypothesis (H0): There’s no effect in the population. Alternative hypothesis (Ha or H1): There’s an effect in the population

77
Q

What is the symbol in statistics for the z-score, aka standard score?

A

z

78
Q

What are different statistical interpretations of “range”?

A

The distance between the highest and lowest data points

79
Q

What can, and cannot, be inferred about a population by looking at its box plot?

A

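By looking at a box plot, you can infer the median, the quartiles and interquartile range, and the overall range (min and max). You cannot infer the mean or the standard deviation.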

80
Q

What do “quartile 1”, “quartile 2”, and “quartile 3” mean in terms of a distribution?

A

Quartiles 1 and 3 are the ends of the middle 50% of the data
Quartile 2 is the median

81
Q

What are some good uses of a boxplot? What statistical need might a boxplot satisfy?

A

Box plots are good for summarizing the center and spread of a data set and for comparing it to other sets of data side by side

82
Q

What is a “stem plot” (also known as a “stem and leaf plot”)?

A

A stem and leaf plot is a plot used to represent a set of data. It splits each value at the 10s place to show how many values fall in each interval.

For example, the values 32 and 34 would be represented by a stem of 3 and leaves of 2 and 4.
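
A minimal Python sketch of this construction for non-negative whole numbers, splitting each value at the tens place. The function name `stem_plot` and the data are illustrative assumptions, and for brevity it lists only stems that have at least one leaf.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-and-leaf rows, one string per stem, in increasing order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)   # stem = tens, leaf = ones
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]

rows = stem_plot([32, 34, 41, 45, 47, 58])      # made-up data
print("\n".join(rows))
```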