Freytag Lectures 6, 7 Flashcards
Which would be a dichotomous variable?
a. Car brand
b. Sex
c. Blood type
d. Age
b. Sex
When should a chi-square distribution test be used?
a. When a data set has more than 2 variables
b. When the sample is not random
c. Only when the degrees of freedom is >3
d. When you are comparing one parameter (variable) between groups
d. When you are comparing one parameter (variable) between groups
A contingency table…
a. Is not an appropriate way of presenting qualitative data
b. Displays the frequency distribution of variables
c. Allows for easy estimation of p values
d. Can only be used to display polytomous variables
b. Displays the frequency distribution of variables
Weldon rolled 12 dice 26,306 times. Assuming each side is equally likely to come up, how many 3’s would you expect to observe?
a. (12 X 26,306)/3 = 105,224
b. √(6 X 12 X 26,306) = 1376
c. (1/6) X 12 X 26,306 = 52,612
d. (1/6) X 26,306 = 4,384
c. (1/6) X 12 X 26,306 = 52,612
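The arithmetic behind this answer can be checked in a few lines. This is a minimal sketch in Python, assuming fair dice; the variable names are illustrative only.

```python
from fractions import Fraction

dice_per_roll = 12        # dice thrown each time
rolls = 26_306            # number of throws
p_three = Fraction(1, 6)  # assumed probability of a 3 on a fair die

total_dice = dice_per_roll * rolls      # individual die outcomes observed
expected_threes = p_three * total_dice  # expected count = probability x trials
print(expected_threes)                  # 52612
```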
Consider the following statement: “there is no inconsistency between the observed and the expected counts. The observed counts follow the same distribution as the expected counts”
a. This is likely to describe an alternative hypothesis
b. If this statement was true, we could conclude that there is no effect
c. The distribution of the observed counts and expected counts must be normal
d. This statement would likely be symbolised by H0 (null hypothesis)
d. This statement would likely be symbolised by H0 (null hypothesis)
Never claim that there is NO effect
How can you control the significance level of your test?
a. By controlling for Type I errors (false positive)
b. By controlling for both Type I and Type II errors
c. By accepting the null hypothesis when p<0.05
d. By controlling for Type II errors (false negatives)
a. By controlling for Type I errors (false positive)
What does a Goodness of Fit test measure?
a. The P value required for H0 to be false
b. How well the observed data fit the expected distribution
c. How much observed values deviate from the expected values in a normal distribution
d. The test statistic of polytomous variables only
b. How well the observed data fit the expected distribution
How is the Goodness of Fit test statistic distributed?
a. As an X^2 distribution with k-1 degrees of freedom
b. As an X^2 distribution with X+1 degrees of freedom
c. Using the equation (O-E)/E
d. With a skew to the left
a. As an X^2 distribution with k-1 degrees of freedom
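As a rough illustration of the goodness-of-fit statistic, the sketch below sums (O-E)^2/E over the k categories and reports k-1 degrees of freedom. The counts are hypothetical (a die rolled 120 times) and the function name `gof_statistic` is made up for this example.

```python
def gof_statistic(observed, expected):
    """Return (X^2 statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of squared deviations, each scaled by its expected count
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k categories give k-1 degrees of freedom
    return stat, df

# Hypothetical data: counts of faces 1 to 6 in 120 rolls of one die
observed = [18, 22, 16, 25, 19, 20]
n = sum(observed)
expected = [n / 6] * 6  # equal expected counts under H0 (fair die)

stat, df = gof_statistic(observed, expected)
print(stat, df)
```

The statistic is then compared with an X^2 distribution with `df` degrees of freedom to obtain a p value.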
Which statement about an X^2 distribution is FALSE?
a. The degrees of freedom is the only parameter
b. Shape, centre and spread are influenced by the degrees of freedom
c. They are always positive and often right skewed
d. They do not require random sampling
d. They do not require random sampling
What is a condition for an X^2 test for goodness of fit?
a. Observations must not be independent of one another
b. The sample size must be small
c. Random sampling cannot be used
d. Expected cell counts should preferably be more than 10
d. Expected cell counts should preferably be more than 10
The test of significance is designed to assess the strength of evidence AGAINST the null hypothesis.
True
A type II error can be controlled whilst a type I error cannot.
False
Type I and type II errors are used together to gauge statistical significance.
False. They cannot be used together.
The null hypothesis must be a statement of "equal to", "greater than or equal to", or "less than or equal to".
True
You can never ACCEPT the null hypothesis due to the influence of Type II errors (false negatives)
True
If the null hypothesis states that men and women suffering a heart attack in New York were equally likely to die (i.e. sex and death are independent), what would a p value < 0.05 suggest?
a. The null hypothesis is accepted and each variable is independent
b. The null hypothesis is rejected and each variable is independent
c. The null hypothesis is rejected and the variables are dependent on each other
d. The null hypothesis is accepted and the variables are dependent on each other
c. The null hypothesis is rejected and the variables are dependent on each other
When performing an independence test in R…
a. A two variable contingency table must not be used
b. The degrees of freedom are (k-1)(l-1), where k is the number of columns and l is the number of rows
c. A Chi square test is irrelevant
d. A Yates-corrected chi-square test should be used by multiplying the T value by 0.5
b. The degrees of freedom are (k-1)(l-1), where k is the number of columns and l is the number of rows
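The flashcards refer to R, but the calculation itself is language-neutral. The sketch below uses plain Python to show how the expected counts, the X^2 statistic and the (k-1)(l-1) degrees of freedom come out of a contingency table. The 2 X 2 table is hypothetical, the function name is made up, and no continuity correction is applied. The p value shortcut via `math.erfc` is valid only for 1 degree of freedom.

```python
import math

def independence_statistic(table):
    """Return (X^2 statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total x column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (k-1)(l-1)
    return stat, df

# Hypothetical counts: rows = sex, columns = died / survived
table = [[30, 70],
         [20, 80]]
stat, df = independence_statistic(table)

# For df == 1 only, the upper-tail X^2 probability equals erfc(sqrt(stat / 2))
p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None
print(stat, df, p_value)
```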
Which statement is false?
a. A test of homogeneity can assess whether two or more multinomial distributions are equal
b. The same assumptions for a Goodness of Fit test apply to tests for homogeneity (Random sampling, independence, large sample size, cell count)
c. Fisher’s exact test calculates the probability of obtaining a contingency table with the observed counts using a hypergeometric distribution
d. Fisher’s exact test can only be used on contingency tables with 1 variable
d. Fisher’s exact test can only be used on contingency tables with 1 variable
No, it is used for 2 X 2 contingency tables
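The hypergeometric probability mentioned in option c can be sketched as follows. This computes only the probability of one observed 2 X 2 table with its margins held fixed; the full test sums such probabilities over all tables at least as extreme. The counts and the function name are hypothetical.

```python
from fractions import Fraction
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    # Ways to fill each row, divided by ways to choose the first column from all n
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))

# Hypothetical table: [[3, 1], [1, 3]]
p = table_probability(3, 1, 1, 3)
print(p, float(p))
```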
What should be included in Excel data provided to a statistician? (Select all that apply)
- Calculations, to save the statistician from having to examine the raw data
- No empty cells. “NA” should be used instead
- The data should be rectangular
- Data should be kept in multiple sheets when there are many values
- More rows than columns
- Well labelled data, columns and rows
- Documentation of the experiment
- No empty cells. “NA” should be used instead
- The data should be rectangular
- Well labelled data, columns and rows
- Documentation of the experiment
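Two of these points (rectangular data, "NA" instead of empty cells) are easy to check before sending a file. The sketch below assumes the sheet has been exported as a list of rows; the function name `fill_missing` and the sample values are hypothetical.

```python
def fill_missing(rows, placeholder="NA"):
    """Check that rows form a rectangle and replace empty cells with a placeholder."""
    n_columns = len(rows[0])  # the header row defines the expected width
    cleaned = []
    for row in rows:
        if len(row) != n_columns:
            raise ValueError("data is not rectangular")
        cleaned.append([
            placeholder if cell is None or str(cell).strip() == "" else cell
            for cell in row
        ])
    return cleaned

# Hypothetical export: header row plus two observations, one with a gap
rows = [["id", "Max_Temp"], ["1", ""], ["2", "31.5"]]
print(fill_missing(rows))
```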
What is NOT a feature of tidy data?
a. Replicate values are placed in the same cell
b. Each variable forms a column
c. Each observation forms a row
d. Each type of observation unit forms a table
a. Replicate values are placed in the same cell
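To make the point about replicates concrete, the sketch below turns a layout with several replicate values in one cell into one row per observation. The sample names, the values and the ";" separator are assumptions for illustration.

```python
# Untidy: all replicate measurements for a sample share one cell
untidy = [
    {"sample": "A", "replicates": "5.1;5.3;4.9"},
    {"sample": "B", "replicates": "6.0;6.2"},
]

# Tidy: each observation gets its own row, each variable its own column
tidy = [
    {"sample": row["sample"], "replicate": i, "value": float(v)}
    for row in untidy
    for i, v in enumerate(row["replicates"].split(";"), start=1)
]

for record in tidy:
    print(record)
```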
I have to label a variable in Excel for data that I will be sending to a statistician. Which name would be most suitable?
a. Maximum Temp (C)
b. Max Temp DegreesC
c. Max_Temp
d. Max/Temp.C
c. Max_Temp
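One way to screen variable names before sending a file is a simple pattern check. The rule used here (start with a letter, then only letters, digits and underscores) is an assumption inferred from the answer above, not a rule stated in the lectures.

```python
import re

# Assumed rule: a letter first, then letters, digits or underscores only
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_safe_name(name):
    """Return True if the whole name matches the assumed naming rule."""
    return SAFE_NAME.fullmatch(name) is not None

candidates = ["Maximum Temp (C)", "Max Temp DegreesC", "Max_Temp", "Max/Temp.C"]
for name in candidates:
    print(name, is_safe_name(name))
```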
Spaces can only be used when typing variable names in Excel.
False
It is okay to embed a graph in your Excel data if it will help the statistician better understand your project.
False
Send separate documentation instead.
Commas must not be used in Excel data.
True