Quiz Incorrect Flashcards
If an independent variable has a p-value of 0.0050, which of the following could represent the Lower 95% and the Upper 95% for that variable?
-235.62; -5.64
The p-value, 0.0050, is less than 0.05 so the independent variable is significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable does not include zero. The interval between -235.62 and -5.64 does not contain zero.
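The rule above can be sketched as a quick check. Only the first interval comes from the card; the other two candidates are made-up distractors for illustration.

```python
# Candidate (Lower 95%, Upper 95%) pairs. Only the first pair is from the card;
# the other two are hypothetical distractors.
candidates = [(-235.62, -5.64), (-12.40, 30.15), (-0.85, 0.02)]


def excludes_zero(lower, upper):
    """True when zero lies outside the interval [lower, upper]."""
    return lower > 0 or upper < 0


# A p-value below 0.05 means the 95% confidence interval cannot contain zero.
consistent = [ci for ci in candidates if excludes_zero(*ci)]
print(consistent)
```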
For a standard normal distribution (µ=0, σ=1), the area under the curve less than 1.5 is 93.32%. What is the approximate percentage of the area under the curve less than -1.5?
6.68%
100% - 93.32% = 6.68% is the area under the curve greater than 1.5. Since the normal distribution is symmetric, 6.68% is also the area under the curve less than -1.5.
100% - area under the curve below 1.5
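A minimal sketch of the same symmetry argument, using Python's standard library rather than the course's Excel workflow:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

below_pos = z.cdf(1.5)          # area to the left of 1.5
above_pos = 1 - below_pos       # area to the right of 1.5
below_neg = z.cdf(-1.5)         # area to the left of -1.5

# By symmetry, the right tail beyond 1.5 equals the left tail below -1.5.
print(f"{below_pos:.2%} {above_pos:.2%} {below_neg:.2%}")
```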
If we remove OUTLIERS from the data set, what would happen to the standard deviation?
The standard deviation would decrease.
The standard deviation gives more weight to observations that are further from the mean. Therefore, removing the outliers would decrease the standard deviation.
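As a small illustration, with made-up numbers that are not from the quiz, dropping a single extreme value shrinks the sample standard deviation:

```python
from statistics import stdev

# Hypothetical data set; 95 is far from the other values and acts as the outlier.
data = [48, 50, 51, 52, 49, 50, 95]
without_outlier = [x for x in data if x != 95]

sd_all = stdev(data)                  # sample standard deviation with the outlier
sd_trimmed = stdev(without_outlier)   # sample standard deviation without it

print(sd_all, sd_trimmed)
```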
A journalist wants to determine the average annual salary of CEOs in the S&P 1,500. He does not have time to survey all 1,500 CEOs but wants to be 95% confident that his estimate is within $50,000 of the true mean. The journalist takes a preliminary sample and estimates that the standard deviation is approximately $449,300. What is the minimum number of CEOs that the journalist must survey to be within $50,000 of the true average annual salary? Remember that the z-value associated with a 95% confidence interval is 1.96.
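CALCULATE SAMPLE SIZE
n = (z × s / M)^2, where z is the z-value, s is the estimated standard deviation, and M is the desired margin of error
n = (1.96 × 449,300 / 50,000)^2 ≈ 310.2, so the journalist must survey at least 311 CEOs (always round up)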
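The sample size calculation can be checked with a short sketch. The variable names are my own, and the values are the ones given in the question:

```python
import math

z = 1.96        # z-value for a 95% confidence level
s = 449_300     # estimated standard deviation, in dollars
M = 50_000      # desired margin of error, in dollars

n_exact = (z * s / M) ** 2      # required sample size before rounding
n_min = math.ceil(n_exact)      # round up: a fraction of a CEO cannot be surveyed

print(round(n_exact, 1), n_min)
```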
Regression analysis: which Y and X inputs would you use to analyze the effect of the number of visits to the website on a given day and the number of visits on the previous day on in-store sales?
Y range is in-store sales
X range is the number of visits to the website on the given day and on the previous day
You report a confidence interval to your boss but she says that she wants a narrower range. In what ways can you reduce the width of the confidence interval?
- Increase the sample size
Increasing the sample size provides a more accurate representation of the population and therefore reduces the width of the confidence interval. Note that another option is also correct.
- Decrease the confidence level
Decreasing the confidence level reduces the width of the confidence interval. Note that another option is also correct.
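Both effects can be seen in a rough sketch of the margin of error for a large-sample confidence interval, z × s / √n. The standard deviation and sample sizes below are hypothetical, and `margin_of_error` is an illustrative helper, not an Excel function:

```python
import math
from statistics import NormalDist


def margin_of_error(s, n, confidence):
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)


s = 20.0  # hypothetical sample standard deviation

base = margin_of_error(s, n=50, confidence=0.95)
bigger_n = margin_of_error(s, n=200, confidence=0.95)    # larger sample
lower_conf = margin_of_error(s, n=50, confidence=0.90)   # lower confidence level

print(base, bigger_n, lower_conf)
```

Quadrupling the sample size halves the margin of error, because the width shrinks with the square root of n.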
A streaming music site changed its format to focus on previously unreleased music from rising artists. The site manager now wants to determine whether the number of unique listeners per day has changed. Before the change in format, the site averaged 131,520 unique listeners per day. Now, beginning three months after the format change, the site manager takes a random sample of 30 days and finds that the site now has an average of 124,247 unique listeners per day. Using the data provided below, calculate the p-value for the following hypothesis test:
To use Excel’s T.TEST function for a hypothesis test with one sample, you must create a second column of data that will act as a second sample.
Then, the p-value of the two-sided hypothesis test is T.TEST(array1, array2, tails, type)=T.TEST(A2:A31,B2:B31,2,3), which is approximately 0.0743.
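Why the second-column trick works can be sketched as follows, assuming the second column simply repeats the historical average (131,520) in every row; the card does not state this explicitly. A constant column has zero variance, so the unequal-variance (type 3) statistic and its degrees of freedom collapse to the one-sample versions. The listener counts below are made up, since the card's data is not shown.

```python
import math
from statistics import mean, variance


def welch_t_and_df(a, b):
    """Test statistic and degrees of freedom for an unequal-variance t-test."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_sample_t_and_df(a, mu0):
    """Test statistic and degrees of freedom for a one-sample t-test."""
    t = (mean(a) - mu0) / math.sqrt(variance(a) / len(a))
    return t, len(a) - 1


# Hypothetical daily listener counts (NOT the card's data).
sample = [124_000, 131_000, 118_500, 127_250, 122_900, 129_400]
historical = 131_520
constant_column = [historical] * len(sample)   # assumed form of the second column

t_two, df_two = welch_t_and_df(sample, constant_column)
t_one, df_one = one_sample_t_and_df(sample, historical)
print(t_two, df_two, t_one, df_one)
```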
Calculate the p-value when sample size is 60
=CONFIDENCE.NORM
For a one-sample hypothesis test, which Excel function do we use?
T.TEST
A researcher at a university wants to better understand how study habits affect student grades. To determine whether there is a link between hours spent studying and exam scores, the researcher recruits fifty students in an advanced calculus class to record their study hours over the semester, and also obtains permission to see their scores on the final exam.
Perform a regression analysis, where the average number of hours spent studying per week over the two-month data collection period is the independent variable and the score on the final exam is the dependent variable.
X and Y axis
Y = dependent variable = score on exam
X = independent variable = avg hours studied
What shows the correlation coefficient in a single-variable regression table?
The Multiple R value. Remember that for single-variable linear regression, Multiple R, which is the square root of R Square, is equal to the absolute value of the correlation coefficient.
The regression coefficient for Average Weekly Hours Studying (0.03, as shown in the bottom table of the output) is positive, so the slope of the regression line is positive. Therefore, the correlation coefficient must also be positive.
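A minimal sketch of this relationship, using made-up data with a falling trend to show why the sign has to come from the slope: Multiple R is always non-negative, so the correlation coefficient is Multiple R with the slope's sign attached. The helper functions are illustrative, not part of any regression package.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    my = mean(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """Correlation coefficient computed directly from the data."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y falls as x rises, so the correlation is negative.
x = [2, 4, 5, 7, 9, 10]
y = [91, 85, 80, 74, 70, 65]

slope, _ = fit_line(x, y)
multiple_r = math.sqrt(r_squared(x, y))       # what the regression table reports
recovered_r = math.copysign(multiple_r, slope)  # attach the slope's sign
r = pearson_r(x, y)
print(multiple_r, recovered_r, r)
```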
What does heteroskedasticity look like in a residual plot?
The residuals form a funnel shape, which indicates that they are heteroskedastic. That is, the size of the residuals grows (in absolute value) as the average weekly hours studying decreases.
The linear model does not appear to be a good fit because the residuals are not randomly distributed.
Which of the following formulas would calculate the statistic that is MOST APPROPRIATE for comparing the variability of two data sets with different distributions?
Standard Deviation/Mean
This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution’s variation relative to the mean.
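A short sketch of the coefficient of variation on two hypothetical data sets measured on very different scales (the numbers are made up for illustration):

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)


# Hypothetical data sets with very different units and magnitudes.
salaries = [52_000, 58_000, 61_000, 49_000, 55_000]   # dollars
heights = [160, 172, 168, 181, 175]                   # centimeters

cv_salaries = coefficient_of_variation(salaries)
cv_heights = coefficient_of_variation(heights)

# The raw standard deviations differ by orders of magnitude,
# but the unitless ratios can be compared directly.
print(stdev(salaries), stdev(heights))
print(cv_salaries, cv_heights)
```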
The sports bar owner runs a regression to test whether there is a relationship between Red Sox away games and daily revenue.
Based on the regression output, what proportion of the variability in revenue can be accounted for by whether the Red Sox are playing away?
[what to look at?]
The R Square value measures how much of the total variation in the dependent variable (in this case, revenue) is explained by the independent variable (in this case, away game).
So we take R Square and turn it into %
According to the Central Limit Theorem, the means of random samples from which of the following distributions will be normally distributed, assuming the samples are sufficiently large?
According to the Central Limit Theorem, if we take large enough samples, the distribution of sample means will be normally distributed regardless of the shape of the underlying population.
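A rough simulation of this idea, assuming a strongly right-skewed exponential population (my choice for illustration; the card does not name one). The sample size and number of samples are arbitrary:

```python
import random
from statistics import mean, stdev

random.seed(0)        # fixed seed so the run is repeatable
SAMPLE_SIZE = 50      # observations per sample
NUM_SAMPLES = 2000    # number of samples drawn

# Exponential population with rate 1: right-skewed, mean 1, standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = mean(sample_means)    # should be close to the population mean, 1
spread = stdev(sample_means)   # should be close to 1 / sqrt(50), about 0.141

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean.
theoretical_sd = 1 / SAMPLE_SIZE ** 0.5
within_one_sd = sum(abs(m - 1) <= theoretical_sd for m in sample_means) / NUM_SAMPLES

print(center, spread, within_one_sd)
```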
What does the R-square value tell us?
R-square indicates what percentage of the variability in the dependent variable is explained by the regression line
E.g. an R Square of 0.7059 shows that 71% of the variability in the number of chairs produced can be explained by whether the shift is in the morning or evening and whether it is a weekday shift or weekend shift.
Using a regression model, how would you FORECAST the number of sales when 2 independent variables are present?
Intercept + (Coefficient of X1 × X1 value) + (Coefficient of X2 × X2 value), where a dummy variable takes the value 1 or 0
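A forecast from a regression with two independent variables is the intercept plus each coefficient times its variable's value. The sketch below uses made-up coefficients and two dummy variables; none of the numbers come from the quiz's regression output.

```python
def forecast(intercept, coefficients, values):
    """Predicted value: intercept plus the sum of coefficient * value terms."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical regression output.
intercept = 100.0
coefficients = [12.5, -8.0]   # [morning-shift dummy, weekday dummy]

# Forecast for a morning shift (dummy = 1) on a weekend (weekday dummy = 0).
prediction = forecast(intercept, coefficients, [1, 0])
print(prediction)  # 112.5
```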
A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business majors and finds the sample’s average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average score is 89. What is the alternative hypothesis of this test?
μBusiness≠μLiberal Arts
The alternative hypothesis is the claim that is being tested. Since the student wants to test whether there is a difference between business school majors’ and liberal arts majors’ trivia scores, the alternative hypothesis is that the mean scores are not equal.
A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample’s average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average score is 89. Using the data provided below, calculate the p-value for the following hypothesis test:
[explain the Formula; and how you would assign tails, and type]
The p-value of the two-sided hypothesis test is T.TEST(array1, array2, tails, type)
=T.TEST(A2:A31,B2:B31,2,3), which is approximately 0.0524.
You must designate this test as a two-sided test (that is, assign the value 2 to the tails argument) and as a type 3 test (an unpaired test with unequal variances) because you are testing two different samples. You must link directly to values in order to obtain the correct answer.
A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample’s average test score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average quiz score is 89. The student finds that the p-value for the hypothesis test equals approximately 0.0524. What can be concluded at α = 5%?
The student should fail to reject the null hypothesis and conclude that there is insufficient evidence of difference between business and liberal arts majors’ knowledge of trivia.
Since the p-value, 0.0524, is greater than the significance level, 0.05, the student should fail to reject the null hypothesis and conclude that there is insufficient evidence of difference between business and liberal arts majors’ knowledge of trivia. Because the null hypothesis is that there is no difference between the two types of majors, this answer is correct.
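The decision rule can be written as a one-line comparison. `decide` is an illustrative helper name, not an Excel function:

```python
def decide(p_value, alpha):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0524, 0.05))   # fail to reject H0
print(decide(0.0524, 0.10))   # reject H0
```

The same p-value leads to different conclusions at different significance levels, which is why α has to be fixed before the test is run.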