Quiz Incorrect Flashcards by Beatrice Jurskyte

If an independent variable has a p-value of 0.0050, which of the following could represent the Lower 95% and the Upper 95% for that variable?

-235.62; -5.64
The p-value, 0.0050, is less than 0.05 so the independent variable is significant at the 5% significance level. Therefore, the 95% confidence interval for the coefficient of the independent variable does not include zero. The interval between -235.62 and -5.64 does not contain zero.

For a standard normal distribution (µ=0, σ=1), the area under the curve less than 1.5 is 93.32%. What is the approximate percentage of the area under the curve less than -1.5?

6.68%
1–93.32%=6.68% is the area under the curve greater than 1.5. Since the normal distribution is symmetric, 6.68% is also the area under the curve less than -1.5.

Shortcut: area below -1.5 = 1 - area below 1.5
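
As a quick check of the symmetry argument, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The variable names are illustrative and not part of the original card.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0, sigma=1)

area_below_pos = std_normal.cdf(1.5)   # area to the left of +1.5
area_below_neg = std_normal.cdf(-1.5)  # area to the left of -1.5

# By symmetry, the tail below -1.5 equals the tail above +1.5
print(round(area_below_pos, 4), round(area_below_neg, 4))
```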

How well did you know this?

Not at all

Perfectly

If we remove OUTLIERS from the data set, what would happen to the standard deviation?

The standard deviation would decrease.
The standard deviation gives more weight to observations that are further from the mean. Therefore, removing the outliers would decrease the standard deviation.
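
A small Python sketch can illustrate the effect. The data set below is hypothetical, and the 40 to 60 cutoff used to drop outliers is an assumption made only for this example.

```python
from statistics import stdev

# Hypothetical data set; the last two values are outliers
data = [48, 50, 51, 49, 52, 50, 95, 5]
without_outliers = [x for x in data if 40 <= x <= 60]

sd_all = stdev(data)                  # sample standard deviation, all points
sd_trimmed = stdev(without_outliers)  # sample standard deviation, outliers removed

print(sd_all > sd_trimmed)
```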

A journalist wants to determine the average annual salary of CEOs in the S&P 1,500. He does not have time to survey all 1,500 CEOs but wants to be 95% confident that his estimate is within $50,000 of the true mean. The journalist takes a preliminary sample and estimates that the standard deviation is approximately $449,300. What is the minimum number of CEOs that the journalist must survey to be within $50,000 of the true average annual salary? Remember that the z-value associated with a 95% confidence interval is 1.96.

CALCULATE SAMPLE SIZE

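n = (z × s / M)², where s is the estimated standard deviation and M is the desired margin of error

n = (1.96 × 449,300 / 50,000)² ≈ 310.2, so the journalist must survey at least 311 CEOs (always round up)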
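
The sample-size calculation can be sketched in Python as follows. The function name `min_sample_size` is hypothetical; the inputs are the values from the question.

```python
import math

def min_sample_size(z, s, margin):
    """Smallest whole n such that z * s / sqrt(n) <= margin."""
    return math.ceil((z * s / margin) ** 2)

n = min_sample_size(z=1.96, s=449_300, margin=50_000)
print(n)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.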

Which Y and X input ranges would you use in a regression analysis of the effect of the number of visits to the website on a given day and the number of visits to the website on the previous day on in-store sales?

Y range is in-store sales

X range is the two columns with the # of visits to the website (same day and previous day)

You report a confidence interval to your boss but she says that she wants a narrower range. In what ways can you reduce the width of the confidence interval?

Increase the sample size
Increasing the sample size provides a more accurate representation of the population and therefore, reduces the width of the confidence interval. Note that another option is also correct.
Decrease the confidence level
Decreasing the confidence level reduces the width of the confidence interval. Note that another option is also correct.
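
Both effects can be seen in a short Python sketch. It assumes a normal-based interval with half-width z × s / √n; the standard deviation of 10 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def ci_half_width(s, n, confidence):
    """Half-width (margin of error) z * s / sqrt(n) of a normal-based interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)

s = 10.0  # hypothetical sample standard deviation
base = ci_half_width(s, n=25, confidence=0.95)
bigger_sample = ci_half_width(s, n=100, confidence=0.95)
lower_confidence = ci_half_width(s, n=25, confidence=0.90)

print(base, bigger_sample, lower_confidence)
```

Under this assumption, quadrupling the sample size halves the width, because the width shrinks with the square root of n.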

A streaming music site changed its format to focus on previously unreleased music from rising artists. The site manager now wants to determine whether the number of unique listeners per day has changed. Before the change in format, the site averaged 131,520 unique listeners per day. Now, beginning three months after the format change, the site manager takes a random sample of 30 days and finds that the site now has an average of 124,247 unique listeners per day. Using the data provided below, calculate the p-value for the following hypothesis test:

To use Excel’s T.TEST function for a hypothesis test with one sample, you must create a second column of data that will act as a second sample.

Then, the p-value of the two-sided hypothesis test is T.TEST(array1, array2, tails, type)=T.TEST(A2:A31,B2:B31,2,3), which is approximately 0.0743.

Calculate the p-value when sample size is 60

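=CONFIDENCE.NORM(alpha, standard_dev, size). Note that this function returns the margin of error of a confidence interval (appropriate when the sample size is at least 30), not a p-value.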

For a one-sample hypothesis test, which Excel function do we use?

T.TEST

A researcher at a university wants to better understand how study habits affect student grades. To determine whether there is a link between hours spent studying and exam scores, the researcher recruits fifty students in an advanced calculus class to record their study hours over the semester, and also obtains permission to see their scores on the final exam.

Perform a regression analysis, where the average number of hours spent studying per week over the two-month data collection period is the independent variable and the score on the final exam is the dependent variable.

X and Y axis

Y = dependent variable = score on exam
X = independent variable = avg hours studied

What shows the correlation coefficient in a single-variable regression output table?

The Multiple R value. Remember that for single-variable linear regression, Multiple R, which is the square root of R², is equal to the absolute value of the correlation coefficient.

The regression coefficient for Average Weekly Hours Studying (0.03, as shown in the bottom table of the output) is positive, so the slope of the regression line is positive. Therefore, the correlation coefficient must also be positive.
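
To see the link between the slope, the correlation coefficient, and Multiple R, here is a minimal Python sketch. The hours and scores are hypothetical data, and `simple_regression_stats` is an illustrative helper, not an Excel function.

```python
import math

def simple_regression_stats(xs, ys):
    """Return (slope, correlation r) for a single-variable linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

hours = [2, 4, 6, 8, 10]        # hypothetical average weekly study hours
scores = [65, 70, 74, 81, 85]   # hypothetical exam scores

slope, r = simple_regression_stats(hours, scores)
r_squared = r ** 2                 # R Square in a single-variable regression
multiple_r = math.sqrt(r_squared)  # always non-negative

print(slope, r, multiple_r)
```

Because Multiple R drops the sign, the sign of the correlation has to be read from the slope.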

What does heteroskedasticity look like in a residual plot?

The residuals form a funnel shape, which indicates that they are heteroskedastic. That is, the size of the residuals grows (in absolute value) as the average weekly hours studying decreases.

The linear model does not appear to be a good fit because the residuals are not randomly distributed.

Which of the following formulas would calculate the statistic that is MOST APPROPRIATE for comparing the variability of two data sets with different distributions?

Standard Deviation/Mean
This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution’s variation relative to the mean.
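
As an illustration, the Python sketch below computes the coefficient of variation for two hypothetical data sets measured on very different scales. The function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return stdev(values) / mean(values)

small_scale = [9, 10, 11, 10, 10]            # hypothetical data
large_scale = [900, 1000, 1100, 1000, 1000]  # same shape, 100 times larger

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

# The standard deviations differ by a factor of 100, but the relative
# variability is the same
print(stdev(small_scale), stdev(large_scale), cv_small, cv_large)
```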

The sports bar owner runs a regression to test whether there is a relationship between Red Sox away games and daily revenue.

Based on the regression output, what proportion of the variability in revenue can be accounted for by whether the Red Sox are playing away?

[what to look at?]

The R Square value measures how much of the total variation in the dependent variable (in this case, revenue) is explained by the independent variable (in this case, away game).

So we take the R Square value and express it as a percentage.

According to the Central Limit Theorem, the means of random samples from which of the following distributions will be normally distributed, assuming the samples are sufficiently large?

According to the Central Limit Theorem, if we take large enough samples, the distribution of sample means will be normally distributed regardless of the shape of the underlying population.
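
A small simulation can make this concrete. The sketch below draws from a strongly skewed exponential population; the sample size of 50, the 2,000 repetitions, and the random seed are all arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = mean(sample_means)  # close to the population mean, 1
sd_of_means = stdev(sample_means)   # close to 1 / sqrt(50), about 0.14

print(mean_of_means, sd_of_means)
```

Even though the population is skewed, the sample means cluster symmetrically around the population mean with a spread close to σ/√n.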

What does the R-square value tell us?

R-square indicates what percentage of the variability in the dependent variable is explained by the regression line

E.g., an R Square of 0.7059 shows that 71% of the variability in the number of chairs produced can be explained by whether the shift is in the morning or evening and whether it is a weekday shift or weekend shift.

Using a regression model, how would you FORECAST the number of sales when 2 independent variables are present?

Intercept + (Coefficient 1 × Value of X1) + (Coefficient 2 × Value of X2), where the value of a dummy variable is 0 or 1
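
A point forecast from a regression with two independent variables is the intercept plus each coefficient times its variable's value. The Python sketch below uses hypothetical coefficients and two hypothetical dummy variables; none of these numbers come from the original card.

```python
def forecast(intercept, coefficients, values):
    """Point forecast: intercept + sum of coefficient * value for each X."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical regression output
intercept = 200.0
coefficients = [35.0, -20.0]  # e.g. a weekend dummy and a rain dummy
values = [1, 0]               # weekend = 1, rain = 0

predicted_sales = forecast(intercept, coefficients, values)
print(predicted_sales)
```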

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business majors and finds the sample’s average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average score is 89. What is the alternative hypothesis of this test?

μBusiness≠μLiberal Arts

The alternative hypothesis is the claim that is being tested. Since the student wants to test whether there is a difference between business school majors’ and liberal arts majors’ trivia scores, the alternative hypothesis is that the mean scores are not equal.

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample’s average score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average score is 89. Using the data provided below, calculate the p-value for the following hypothesis test:

[explain the Formula; and how you would assign tails, and type]

The p-value of the two-sided hypothesis test is T.TEST(array1, array2, tails, type)

=T.TEST(A2:A31,B2:B31,2,3), which is approximately 0.0524.

You must designate this test as a two-sided test (that is, assign the value 2 to the tails argument) and as a type 3 test (an unpaired test with unequal variances) because you are testing two different samples. You must link directly to values in order to obtain the correct answer.
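
For intuition about what an unpaired test with unequal variances computes, here is a minimal Python sketch of the test statistic and its approximate degrees of freedom (often called Welch's t-test). The quiz scores are hypothetical, `welch_t` is an illustrative name, and the sketch stops short of the p-value because Python's standard library has no t-distribution.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two unpaired samples."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

business = [84, 88, 86, 85, 87]      # hypothetical scores
liberal_arts = [90, 88, 91, 87, 89]  # hypothetical scores

t_stat, df = welch_t(business, liberal_arts)
print(t_stat, df)
```

The two-sided p-value is then the probability, under a t-distribution with `df` degrees of freedom, of a statistic at least as far from zero as `t_stat`.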

A college student is interested in testing whether business majors or liberal arts majors are better at trivia. The student gives a trivia quiz to a random sample of 30 business school majors and finds the sample’s average test score is 86. He gives the same quiz to 30 randomly selected liberal arts majors and finds the sample’s average quiz score is 89. The student finds that the p-value for the hypothesis test equals approximately 0.0524. What can be concluded at α = 5%?

The student should fail to reject the null hypothesis and conclude that there is insufficient evidence of difference between business and liberal arts majors’ knowledge of trivia.
Since the p-value, 0.0524, is greater than the significance level, 0.05, the student should fail to reject the null hypothesis and conclude that there is insufficient evidence of difference between business and liberal arts majors’ knowledge of trivia. Because the null hypothesis is that there is no difference between the two types of majors, this answer is correct.
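
The decision rule itself is a single comparison, sketched here in Python. The function name `decide` and the wording of the returned strings are illustrative.

```python
def decide(p_value, alpha):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decision = decide(p_value=0.0524, alpha=0.05)
print(decision)
```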

To calculate a conditional mean, we use the Excel function:

AVERAGEIF

IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. What percent of the population has an IQ over 115?

EXCEL

To find P(x>115), the percent of the population that has an IQ over 115, first compute the cumulative probability, P(x≤115), using the Excel function NORM.DIST(x, mean, standard_dev, TRUE).

Here NORM.DIST(115,B1,B2,TRUE)=NORM.DIST(115,100,15,TRUE)=0.84, or 84%. Thus, P(x>115)=1–P(x≤115)=1–0.84=0.16, or 16%.
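
The same calculation can be reproduced outside Excel. This is a minimal Python sketch in which `NormalDist.cdf` plays the role of NORM.DIST with the cumulative argument set to TRUE.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_at_most_115 = iq.cdf(115)     # cumulative probability P(x <= 115)
p_over_115 = 1 - p_at_most_115  # upper tail P(x > 115)

print(round(p_at_most_115, 2), round(p_over_115, 2))
```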

IQ scores are known to be normally distributed. The mean IQ score is 100 and the standard deviation is 15. The top 25% of the population (ranked by IQ score) have IQ’s above what value?

[what EXCEL Function]

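NORM.INV(probability, mean, standard_dev). The top 25% starts at the 75th percentile, so use NORM.INV(0.75,100,15), which is approximately 110.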
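
As a cross-check, here is a minimal Python sketch in which `NormalDist.inv_cdf` plays the role of NORM.INV. The cumulative probability of 0.75 reflects that the top 25% lies above the 75th percentile.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Value below which 75% of the population falls; the top 25% lies above it
cutoff = iq.inv_cdf(0.75)

print(round(cutoff, 1))
```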

A business school professor is interested to know if watching a video about the Central Limit Theorem helps students understand it. To assess this, the professor tests students’ knowledge both immediately before they watch the video and immediately after. The professor takes a sample of students, and for each one compares their test score after the video to their score before the video. Using the data below, calculate the p-value for the following hypothesis test:

[HOW TO ASSIGN TAILS & TYPE]?

The p-value of the one-sided hypothesis test is T.TEST(array1, array2, tails, type)=T.TEST(B2:B31,C2:C31,1,1), which is approximately 0.0128. You must designate this test as a one-sided test (that is, assign the value 1 to the tails argument) and as a type 1 (a paired test) because you are testing the same students on the same knowledge at two points in time. You must link directly to values in order to obtain the correct answer.
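
A paired test works on the per-student differences rather than on the two columns separately. The Python sketch below computes the paired t statistic for hypothetical before and after scores; `paired_t` is an illustrative name, and the p-value step is omitted because Python's standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

before = [60, 70, 65, 80, 75]  # hypothetical scores before the video
after = [66, 72, 70, 83, 79]   # hypothetical scores after the video

t_stat, df = paired_t(before, after)
print(t_stat, df)
```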

A curious student in a large economics course is interested in calculating the percentage of his classmates who scored lower than he did on the GMAT; he scored 490. He knows that GMAT scores are normally distributed and that the average score is approximately 540. He also knows that 95% of his classmates scored between 400 and 680. Based on this information, calculate the percentage of his classmates who scored lower than he did. EXPLAIN EXCEL FUNCTION

Use the Excel function NORM.DIST(x, mean, standard_dev, TRUE). Here, NORM.DIST(B4,B1,71.4,TRUE) = NORM.DIST(490,540,71.4,TRUE) = 0.24, or 24%. To find the standard deviation, note that 95% of values lie within 1.96 standard deviations of the mean, so standard deviation = (upper bound - mean)/1.96 = (B3-B1)/1.96 = (680-540)/1.96 = 71.4.
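
The two steps (recover the standard deviation from the 95% range, then take the cumulative probability) can be sketched in Python. The variable names are illustrative.

```python
from statistics import NormalDist

mean_score = 540
upper_95 = 680  # upper end of the range containing 95% of scores
score = 490

# 95% of values lie within 1.96 standard deviations of the mean
sd = (upper_95 - mean_score) / 1.96

share_below = NormalDist(mu=mean_score, sigma=sd).cdf(score)
print(round(sd, 1), round(share_below, 2))
```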

P-value(s) at which you would reject the null hypothesis for a two-sided test at the 90% confidence level.

To REJECT, the p-value must be less than 0.10 (that is, 1 - 0.90)

Which Excel function calculates the correlation coefficient?

=CORREL(array1, array2)

What can be concluded from the fact that the correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students in those programs who are employed upon graduation is -0.32?

On average, as the acceptance rate decreases, the percent of students employed upon graduation increases. The correlation coefficient, -0.32, is negative, which indicates that the two variables tend to move in opposite directions.

Is multicollinearity an issue when the regression model is only being used for forecasting?

Multicollinearity is typically not a problem when the model is being used for forecasting, especially if the predictive power of the model is increased by the additional variable(s).

How to calculate histogram range?

Highest value bin - lowest value bin

The percent of variation that can be explained by the number of factory workers is represented by the ____ in the regression table

R² value. E.g., the R² value is 57.56%.

A reasonable estimate of the prediction interval is _______________

the point forecast (131,958) plus or minus the z-value times the standard error of the regression (14,994.93).
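
As a sketch of this estimate in Python, the function below takes the point forecast and standard error from the card. The 95% level is an assumption (the card does not state one), and `prediction_interval` is an illustrative name.

```python
from statistics import NormalDist

def prediction_interval(point_forecast, standard_error, confidence):
    """Rough interval: point forecast plus or minus z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return point_forecast - margin, point_forecast + margin

# 95% is an assumed level for illustration
lower, upper = prediction_interval(131_958, 14_994.93, confidence=0.95)
print(round(lower), round(upper))
```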

What is the z-value for a 68% prediction interval?

Approximately 1, since about 68% of values lie within one standard deviation of the mean.

Probability Expression: P(μ–σ≤x≤μ+σ) means

Approximately 68% of values are within one standard deviation of the mean. That is, P(μ–σ≤x≤μ+σ)=68%

Probability Expression: P(μ–2σ≤x≤μ) means

Approximately 95% of values are within two standard deviations of the mean, so 95%/2 = 47.5% of values are between two standard deviations below the mean and the mean. That is, P(μ–2σ≤x≤μ)=47.5%

Probability Expression: P(μ–σ≤x≤μ) means

Approximately 68% of values are within one standard deviation of the mean, so 68%/2 = 34% of values are between one standard deviation below the mean and the mean. That is, P(μ–σ≤x≤μ)=34%. Since P(μ–2σ≤x≤μ)=47.5%, P(μ–2σ≤x≤μ–σ)=P(μ–2σ≤x≤μ)–P(μ–σ≤x≤μ)=47.5%–34%=13.5% of values are between two standard deviations below the mean and one standard deviation below the mean.

Probability Expression: P(μ+2σ≤x)

Approximately 95% of values are within two standard deviations of the mean, so 5% of values are outside of that range. Thus, 5%/2 = 2.5% are greater than or equal to two standard deviations above the mean.
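
The rule-of-thumb percentages in these probability cards can be compared with exact values for a normal distribution. This Python sketch works in units of standard deviations from the mean; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: distances measured in standard deviations

within_1 = z.cdf(1) - z.cdf(-1)           # P(mu - sigma <= x <= mu + sigma)
minus2_to_mean = z.cdf(0) - z.cdf(-2)     # P(mu - 2 sigma <= x <= mu)
minus2_to_minus1 = z.cdf(-1) - z.cdf(-2)  # P(mu - 2 sigma <= x <= mu - sigma)
above_2 = 1 - z.cdf(2)                    # P(x >= mu + 2 sigma)

for p in (within_1, minus2_to_mean, minus2_to_minus1, above_2):
    print(round(p, 4))
```

The exact values differ slightly from 68%, 47.5%, 13.5%, and 2.5% because the 95% range corresponds to about 1.96 standard deviations rather than exactly 2.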

The owner of an ice cream shop wants to determine whether there is a relationship between ice cream sales and temperature. The owner collects data on temperature and sales for a random sample of 30 days and runs a regression to determine if there is a relationship between temperature (in degrees) and ice cream sales. The p-value for the two-sided hypothesis test is 0.04. How would you interpret the p-value?

If there is no relationship between temperature and sales, the chance of selecting a sample this extreme would be 4%. The null hypothesis is that there is no relationship. The p-value indicates how likely we would be to select a sample at least this extreme if the null hypothesis is true.

Below is a partial regression output table. Which of the following values most likely belongs in the Lower 95% cell for the independent variable in the output table?

Since the p-value, 0.3956, is GREATER than 0.05, the linear relationship is NOT SIGNIFICANT at the 95% confidence level. Therefore, the 95% confidence interval of the slope MUST CONTAIN ZERO. The confidence interval is centered around the slope of 1.78, so the lower and upper bounds must be equally distant from the slope. The Upper 95% minus the slope is 6.01–1.78=4.23, so the Lower 95% is 1.78–4.23=-2.45.
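
The symmetry argument reduces to two subtractions, sketched here in Python with the slope and Upper 95% values from the card.

```python
slope = 1.78
upper_95 = 6.01

margin = upper_95 - slope  # distance from the slope to the upper bound
lower_95 = slope - margin  # same distance on the other side

print(round(lower_95, 2))
```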

If the street fair organizer wanted to compare the explanatory power of the original model and the following new regression model, which value should he consult for the new model?

It is important to use the Adjusted R² to compare two regression models that have a different number of independent variables. 0.9225 is the Adjusted R² of the new model.

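How do you assess significance in a regression table when the p-value is not provided?

If the 95% confidence interval for the variable’s coefficient (Lower 95% to Upper 95%) does not contain 0, the variable IS SIGNIFICANT at the 5% level; if it does contain 0, it IS NOT.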
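
This check can be written as a one-line test on the Lower 95% and Upper 95% values. The Python sketch below applies it to two intervals that appear elsewhere in this deck; `is_significant` is an illustrative name.

```python
def is_significant(lower_95, upper_95):
    """A coefficient is significant at the 5% level if its 95% interval excludes zero."""
    return not (lower_95 <= 0 <= upper_95)

# Intervals taken from other cards in this deck
print(is_significant(-235.62, -5.64))  # interval lies entirely below zero
print(is_significant(-2.45, 6.01))     # interval straddles zero
```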

Coefficient of Variation=

STD / MEAN

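Suppose we want to assign dummy variables to the seasons (Winter, Spring, Summer, Fall). How many dummy variables do we need?

3. We need one fewer dummy variable than the number of categories; the omitted season serves as the baseline.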
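
One common way to encode four categories is three 0/1 columns, with the remaining category as the baseline. The Python sketch below assumes Winter as the baseline; that choice and the helper name `encode` are illustrative only.

```python
SEASONS = ["Winter", "Spring", "Summer", "Fall"]
BASELINE = "Winter"  # assumed baseline; any one season could be omitted

# One dummy column per non-baseline season
DUMMY_COLUMNS = [s for s in SEASONS if s != BASELINE]

def encode(season):
    """Return the 0/1 dummy values for a season, in DUMMY_COLUMNS order."""
    return [1 if season == column else 0 for column in DUMMY_COLUMNS]

for season in SEASONS:
    print(season, encode(season))
```

The baseline season is represented by all dummies being 0, which is why a fourth column is unnecessary.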

3 ways to calculate mean:

The Descriptive Statistics tool
AVERAGE(B2:B17)
SUM(B2:B17)/COUNT(B2:B17)

Assuming that all else remains constant, what happens to a confidence interval around the mean if we raise the sample size from 25 to 100?

The width of the confidence interval narrows.

The linear relationship between two variables can be statistically significant but not explain a large percentage of the variation between the two variables. This would correspond to which pair of R^2 and p-value?

Low R-squared, Low p-value

A professor wants to know if the average exam score differs between students who attended a review session and those who did not attend. What null hypothesis should the professor use to test this claim?

µattended = µdid not attend

The mean score on a particular standardized test is 500, with a standard deviation of 100. To assess whether a training course has been effective in improving scores on the test, we take a random sample of 20 students from the course and find that the average score of this sample is 550. Which function would correctly calculate the 95% range of likely sample means under the null hypothesis?

500 ± CONFIDENCE.T(0.05,100,20)

The error term, ε, is the difference between ___

the actual observed value of the dependent variable y for a specified value of x and the expected value of y for that value of x. An error term in a regression analysis is also called a residual.

When a linear model is a good fit, the residuals are ...

randomly scattered above and below the horizontal axis. When a linear model is not a good fit, we see patterns, such as curves or heteroskedasticity, in the residuals.

Given the general regression equation, ŷ = a + bx, which of the following describes ŷ? Select all that apply.

The expected value of the dependent variable | The value we are trying to predict

For each of the following variables, determine if dummy variables should be created for a regression analysis. SELECT ALL THAT APPLY
Gender
Age (in years)
Income (in dollars)
Level of Education (highest degree earned)

Gender | Level of Education (highest degree earned)

Quiz Incorrect Flashcards

(52 cards)