Inference Flashcards
Sample_size = 1250
p=450/1250
What is the 95% confidence interval estimate for the population proportion?
- p = 450/1250 = 0.36
- The critical z-scores for 95% confidence are ±1.96
- So the answer is: 0.36 ± 1.96*sqrt(0.36*0.64/1250) ≈ 0.36 ± 0.027
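As a quick check of the arithmetic above, here is a minimal sketch of the normal-approximation interval. The helper name proportion_ci is made up for this card and is not from any library.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (hypothetical helper)."""
    p_hat = successes / n
    # Margin of error: critical z-score times the standard error of p_hat
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(450, 1250)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The interval comes out to roughly 0.333 to 0.387.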
A country is voting to choose a President; there are two candidates A and B. Every citizen has already decided if they are going to vote for A or B. You conduct a sample survey to predict the election result. What distribution will you use to model this problem?
A Bernoulli distribution
A country with 100 million voters is holding an election with two candidates, A and B. A random sample of 100 voters shows that 57 people voted for A and 43 people voted for B. What is our estimate of the population variance in terms of B?
s2 = (57*(0-0.43)^2 + 43*(1-0.43)^2)/(100-1)
= 0.24757575757
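The same number can be reproduced by coding each vote as 0 (for A) or 1 (for B) and applying the sample variance formula. This is only a sketch of the calculation on this card; the variable names are illustrative.

```python
# Code each sampled vote as 1 if it went to B, 0 if it went to A
votes = [0] * 57 + [1] * 43

n = len(votes)
p_hat = sum(votes) / n  # sample proportion voting for B

# Sample variance with the n - 1 denominator, as in the card
s2 = sum((v - p_hat) ** 2 for v in votes) / (n - 1)
print(p_hat, s2)
```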
We want to compare the effect of access to computers on student performance in a large high school. Two random samples of 10 students each are drawn: one sample with computers and another sample without. A t-test is used to compare the GPAs of the two groups. Which of the following is a necessary assumption?
(A) The population standard deviations from each group are known.
(B) The population standard deviations from each group are unknown.
(C) The population standard deviations from each group are equal.
(D) The population of GPA scores from each group is normally distributed.
(E) The samples must be independent samples and for each sample np and n(1-p) must both be at least 10.
(D) Since the sample sizes are small, the samples must come from normally distributed populations. The samples should be independent, but np and n(1-p) refer to conditions for tests involving sample proportions, not means.
How does doubling the sample size change the confidence interval size?
(A) Doubles the interval size
(B) Halves the interval size
(C) Multiplies the interval size by 1.414
(D) Divides the interval size by 1.414
(E) This question cannot be answered without knowing the sample size.
(D) If the sample size is increased by a factor of d, the width of the interval estimate is divided by sqrt(d). Here d = 2, so the width is divided by sqrt(2) ≈ 1.414.
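A small sketch can confirm the square-root scaling. The standard deviation of 10 and the sample sizes 100 and 200 are arbitrary values chosen for illustration; only the ratio matters.

```python
import math

def interval_width(sigma, n, z=1.96):
    """Full width of a z-interval for a mean: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

# Arbitrary illustrative values: sigma = 10, n doubled from 100 to 200
ratio = interval_width(10.0, 100) / interval_width(10.0, 200)
print(ratio)  # the original width divided by the new width
```

The ratio equals sqrt(2) regardless of the sigma and n chosen, which is why the question can be answered without knowing the sample size.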
Population mean estimate is 9250 with standard deviation =2575. A random sample of size 50 is found to have a sample mean more than 500 different from the population mean estimate. What is the probability we will mistakenly reject a true claim?
(A) 0.043
(B) 0.085
(C) 0.170
(D) .830
(E) .915
(C)
H0: µ=9250
Ha: µ≠9250
Standard deviation of the sample means is: 2575/sqrt(50) = 364.2. (Should we not be using the sample standard deviation?)
The critical z-scores are ±500/364.2 = ±1.373
alpha = P(z ≤ -1.373) + P(z ≥ 1.373) = 1 - normalcdf(-1.373, 1.373) = .170.
There is a 17% chance of a Type I error.
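The calculator call normalcdf can be mimicked with the error function from the standard library. This is a sketch of the same steps; normal_cdf is a helper defined here, not a built-in.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

sigma, n, cutoff = 2575, 50, 500

se = sigma / math.sqrt(n)        # standard deviation of the sample means
z = cutoff / se                  # critical z-score
alpha = 2 * (1 - normal_cdf(z))  # both tails beyond ±z
print(round(se, 1), round(z, 3), round(alpha, 3))
```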
A 2007 survey of 980 American drivers concluded that 38 percent of the driving population would be willing to pay higher gas prices to protect the environment. Which of the following best describes what is meant by the poll having a margin of error of 3%?
(A) Three percent of those surveyed refused to participate in the poll.
(B) It would not be unexpected for 3 percent of the population to readily agree to the higher gas price.
(C) Between 343 and 402 of the 980 drivers surveyed responded that they would be willing to pay higher gas prices to protect the environment.
(D) If a similar survey of 980 American drivers was taken weekly, a 3% change in each week’s results would not be unexpected.
(E) It is likely that between 35 and 41 percent of the driving population would be willing to pay higher gas prices to protect the environment.
(E) Using a measurement from a sample, we can never say exactly what the corresponding population proportion is. We can say that we have a certain confidence that the population proportion lies in a particular interval. (I think we can say that we are 95% certain that the answer lies within 38 ± 3 percent.)
A doctor wants to know if the number of patients he sees is related to the day of the week. Write out the expression for the chi-square statistic for the appropriate test.
Mon : 12, Tue: 5, Wed: 9, Thu: 4, Fri: 15
The expected value for each cell for a uniform distribution is:
(12+5+9+4+15)/5 = 45/5 = 9
The chi-square = ∑((obs - exp)^2)/exp
= ((12-9)^2 + (5-9)^2 + (9-9)^2 + (4-9)^2 + (15-9)^2)/9
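To evaluate the expression above, the sketch below computes the statistic from the observed counts, assuming (as the card does) that patients are spread uniformly across the five days.

```python
observed = {"Mon": 12, "Tue": 5, "Wed": 9, "Thu": 4, "Fri": 15}

# Under a uniform distribution every day has the same expected count
expected = sum(observed.values()) / len(observed)

# Chi-square statistic: sum of (obs - exp)^2 / exp over all cells
chi_square = sum((obs - expected) ** 2 / expected for obs in observed.values())
print(expected, round(chi_square, 3))
```

The statistic works out to 86/9, or about 9.56.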
What is the chi-squared test?
A chi-squared test, also referred to as a χ^2 test (or chi-square test), is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true.
What is the difference between a numerical variable and a categorical variable?
Categorical variables yield data in categories, and numerical variables yield data in numerical form. Responses to such questions as “What is your major?” or “Do you own a car?” are categorical because they yield data such as “biology” or “no.” In contrast, responses to such questions as “How tall are you?” or “What is your G.P.A.?” are numerical. Numerical data can be either discrete or continuous.
Define: standard independent normal variable
The simplest case of a normal distribution is known as the standard normal distribution. This is a special case when μ=0 and σ=1, and it is described by this probability density function:
phi(x) = e^(-(x^2)/2) / sqrt(2*π)
Why is the chi-squared test called the chi-squared?
- The name refers to the square of the Greek letter chi (χ), written χ^2, which looks like X^2.
- The X actually refers to an independent, standard normally distributed random variable.
- In other words: if you take an independent standard normal random variable and square it, you get a χ^2 variable, which follows its own distribution with 1 degree of freedom.
- For example:
chi_square_3_degrees = (X1)^2 + (X2)^2 + (X3)^2
where X1, X2 and X3 are three independent standard normal random variables.
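A rough simulation can make the definition concrete. The sketch below draws sums of three squared standard normal values and checks that their average is close to 3, since a chi-square variable with k degrees of freedom has mean k. The seed and the number of draws are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable

def chi_square_draw(k):
    """One draw from chi-square with k degrees of freedom, built from its definition."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

draws = [chi_square_draw(3) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should land near 3
```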
A restaurant owner thinks that he gets customers in the following distribution:
Mon: 10%, Tue:10%, Wed:15%, Thu:20%, Fri:30%, Sat:15%.
The restaurant is closed on Sunday. Sal keeps track of the data for 1 week and gets the following numbers:
Mon: 30, Tue: 14, Wed: 34, Thu: 45, Fri: 57, Sat: 20.
Is the restaurant owner correct, setting alpha=5%?
Observed: Mon: 30, Tue: 14, Wed: 34, Thu: 45, Fri: 57, Sat: 20 (total 200)
Expected (%): Mon: 10%, Tue: 10%, Wed: 15%, Thu: 20%, Fri: 30%, Sat: 15%
Expected (count): Mon: 20, Tue: 20, Wed: 30, Thu: 40, Fri: 60, Sat: 30
X2 = ((30-20)^2)/20 + ((14-20)^2)/20 + ((34-30)^2)/30 + ((45-40)^2)/40 + ((57-60)^2)/60 + ((20-30)^2)/30
= 136/20 + 16/30 + 25/40 + 9/60 + 100/30 ≈ 11.44
Degrees of freedom = 6-1=5
Critical value for 5 degrees of freedom and alpha =5% is: 11.07.
Since 11.44 is greater than the critical value of 11.07, we reject the null hypothesis that the owner is correct.
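The goodness-of-fit steps above can be sketched as follows. The critical value 11.07 is taken from the card rather than computed, and the variable names are illustrative.

```python
observed = [30, 14, 34, 45, 57, 20]              # Mon..Sat counts for one week
shares = [0.10, 0.10, 0.15, 0.20, 0.30, 0.15]    # owner's claimed distribution

total = sum(observed)
expected = [total * share for share in shares]   # expected counts under the claim

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

critical_value = 11.07  # from the card: df = 5, alpha = 5%
reject_null = chi_square > critical_value
print(round(chi_square, 2), degrees_of_freedom, reject_null)
```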
A survey of 1000 Americans reveals that 525 believe that whales are an endangered animal; in a survey of 750 Japanese, 325 believe that they are endangered. To test at the 5% significance level whether or not the data are significant evidence that the proportion of Japanese who believe that whales need protection is less than the proportion of Americans with this belief, a student sets up the following:
H0: p=0.525 and Ha: p < 0.525
Which of the following is a true statement?
(A) The student has set up a correct hypothesis test.
(B) Given the large sample sizes, a 1% percent significance level would be more appropriate.
(C) A two sided test would be more appropriate.
(D) Given that (525+325)/(1000+750) = .486, Ha: p
(E) A two-population difference in proportions hypothesis test would be more appropriate.
(E) We are comparing two population proportions and thus the correct test involves: H0: p1 - p2 = 0 and Ha: p1 - p2 > 0, where p1 and p2 are the respective proportions of Americans and Japanese who believe that whales need protection.
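The card stops at naming the right test. As an illustration only, here is a sketch of how the two-proportion z-test could be carried out with a pooled standard error, which is one common textbook form; the card itself does not specify the computation.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

x1, n1 = 525, 1000   # Americans
x2, n2 = 325, 750    # Japanese

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)   # combined proportion under H0: p1 = p2

# Assumption: pooled standard error for the difference in proportions
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 1 - normal_cdf(z)      # one-sided, Ha: p1 - p2 > 0
print(round(pooled, 3), round(z, 2), p_value)
```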
What is the probability of a Type II error when a hypothesis test is being conducted at the 5 percent significance level (alpha = .05)?
(A) .05 (B) .10 (C) .90 (D) .95
(E) There is insufficient information to answer the question.
(E) There is a different probability of Type II error for each possible correct value of the population parameter.
A fitness center advertises that the average pulse rate of its members is 68.4 bpm.
An SRS of 48 members finds a mean of 71.0 bpm with a standard deviation of 10.3 bpm. In which of the following intervals is the P-value located?
(A) P < .01
(B) .01 < P < .02
(C) .02 < P < .05
(D) .05 < P < .1
(E) P > .1
(D) H0: µ = 68.4, Ha: µ ≠ 68.4.
σs = 10.3/sqrt(48) = 1.487.
t-score for 71.0 is (71 - 68.4)/1.487 = 1.748
With df = 48 - 1 = 47 we have P/2 = tcdf(1.748, 1000, 47) = .0435 and P=.087
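The calculator call tcdf can be approximated with the standard library by integrating the t density numerically. The sketch below uses Simpson's rule, which is a choice made here for illustration; t_pdf and t_upper_tail are helpers defined in this block, not library functions.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

sample_mean, mu0, s, n = 71.0, 68.4, 10.3, 48
se = s / math.sqrt(n)                        # standard error of the mean
t_score = (sample_mean - mu0) / se
p_value = 2 * t_upper_tail(t_score, n - 1)   # two-sided P-value
print(round(se, 3), round(t_score, 3), round(p_value, 3))
```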
A guidance counselor wishes to determine the mean number of changes in academic major by college students to within ±0.1 at a 90% confidence level. What sample size should be chosen if it is known that the standard deviation is 0.45?
(A) 8 (B) 54 (C) 55 (D) 78 (E) 110
(C).
1.645*(0.45/sqrt(n)) ≤ 0.1
sqrt(n) ≥ 7.4 and n ≥ 54.8
So choose n = 55
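The inequality above can be solved directly for n and rounded up. This is a minimal sketch; required_sample_size is a hypothetical helper name.

```python
import math

def required_sample_size(sigma, margin, z):
    """Smallest n with z * sigma / sqrt(n) <= margin (hypothetical helper)."""
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=0.45, margin=0.1, z=1.645)
print(n)  # 55
```

Rounding up rather than to the nearest integer matters here: n = 54 would leave the margin of error slightly above 0.1.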
For a one-sided hypothesis test for the mean, for a random sample of size 15, the t-score of the sample mean is 2.615. Is this significant at the 5 percent level? At the 1 percent level?
(A) Significant at the 1 percent level but not at the 5 percent level.
(B) Significant at the 5 percent level but not at the 1 percent level.
(C) Significant at both the 1 percent and 5 percent levels.
(D) Significant at neither the 1 percent nor 5 percent levels.
(E) Cannot be determined from the given information.
(B)
With df = 15 -1=14, the critical t-scores for the .05 and .01 tail probabilities are 1.761 and 2.624 respectively.
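We have 2.615 > 1.761 but 2.615 < 2.624, so the result is significant at the 5 percent level but not at the 1 percent level.
Alternatively: P(t > 2.615) = tcdf(2.615,1000,14) = .0102
.01 < .0102 < .05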
A confidence interval estimate is calculated from the data of a random sample of size n. All other things being equal, which of the following will result in a smaller margin of error?
(A) A greater confidence level
(B) A larger sample standard deviation
(C) A larger sample size
(D) Accepting less precision
(E) Introducing bias into sampling
C
The margin of error varies:
- directly with the critical z-value and
- directly with the standard deviation of the sample
- inversely with square root of the sample size.
Which of the following is true:
(A) Tests of significance (hypothesis tests) are designed to measure the strength of evidence against the null hypothesis.
(B) A well-planned test of significance should result in a statement that the null hypothesis is true or false.
(C) The null hypothesis is one-sided and expressed using either < or > if there is interest in deviations in only one direction.
(D) When a true parameter value is farther from the hypothesized value, it becomes easier to reject the alternative hypothesis.
(E) Increasing the sample size makes it more difficult to conclude that an observed difference between observed and hypothesized values is significant.
A
- We attempt to show that the null hypothesis is unacceptable by demonstrating that it is improbable.
- We cannot show that it is definitely true or false.
- If the interest is in deviations in only one direction, the alternative hypothesis is expressed using either < or >.
- When a true parameter value is farther from the hypothesized value, it becomes easier to reject the null hypothesis.
- Increasing the sample size makes it easier to conclude that the observed difference between observed and hypothesized values is significant.