6 | Statistics for Proportions and Frequencies I Flashcards
(POLL)
You would like to characterize the penguin species distribution on an island. Which measures could you use?
- median
- mean
- mode (modus)
- proportions
- standard deviation
mode, proportions
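Mode and proportions only require counting categories, which is why they suit nominal data such as species. A minimal Python sketch, assuming a small made-up list of observations (the species labels are illustrative, not real survey data):

```python
from collections import Counter

# Hypothetical observations (illustrative only, not real data)
species = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie", "Gentoo"]

counts = Counter(species)                    # frequency of each species
mode, mode_count = counts.most_common(1)[0]  # the most frequent category
n = len(species)

# Proportion of each species: count / total number of observations
proportions = {s: c / n for s, c in counts.items()}
```

The mean, median, and standard deviation would need numerical values, which species labels do not have.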
(POLL)
Which of the following statements is true about an independence table?
* it shows the observed numbers of our real data
* it shows the expected numbers of our data
* it shows the Pearson residuals of our data
* is used to calculate the Pearson residuals
* shows expected numbers if both variables are not related
- it shows the expected numbers of our data
- is used to calculate the Pearson residuals
- shows expected numbers if both variables are not related
not these:
* it shows the observed numbers of our real data (no, that is the contingency table)
(POLL)
The Pearson residual(s) show the …
* strength of the association between two variables
* normalized deviation from the expected values
* raw deviation from the expected values
* is one number for a 2x2 table
* are 4 numbers for a 2x2 table
- normalized deviation from the expected values
- are 4 numbers for a 2x2 table
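The Pearson residual of a cell is usually computed as $(O - E)/\sqrt{E}$, where $E$ comes from the independence table, so a 2x2 table yields four residuals. A minimal Python sketch with a hypothetical 2x2 table; the squared residuals add up to the chi-square statistic:

```python
import math

# Hypothetical 2x2 contingency table (rows: group A/B, columns: yes/no)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / total.
# Pearson residual per cell: (observed - expected) / sqrt(expected)
residuals = [
    [(o - r * c / total) / math.sqrt(r * c / total) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

# Sum of squared residuals equals the chi-square statistic
chi2 = sum(res ** 2 for row in residuals for res in row)
```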
(POLL)
A good way to show the distribution of very similar count data of a single variable is the …
* Assocplot
* barplot
* dotchart
* Histogram
* piechart
* Xyplot
- barplot
- dotchart
no:
* piechart (questionable: similar proportions are hard to compare as pie slices)
* Assocplot (no, usually used for 2 variables!)
* Histogram (no, for numerical data)
* Xyplot (no, for two numerical variables)
(POLL)
Which of the following distributions can be used for the statistics of univariate data?
* Bernoulli distribution
* Binomial distribution
* Chisq distribution
* Normal distribution
* Poisson distribution
- Bernoulli distribution
- Binomial distribution
- Poisson distribution
no:
* Chisq distribution (2 variables)
* Normal distribution (numerical)
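For counts of a single categorical variable, the Bernoulli distribution models one yes/no trial, the binomial distribution the number of "successes" in $n$ trials, and the Poisson distribution counts of events. A minimal stdlib sketch of their probability mass functions; the function names and the example numbers are illustrative assumptions:

```python
import math

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for one yes/no trial, x in {0, 1}."""
    return p if x == 1 else 1 - p

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for event counts with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical example: probability of exactly 3 Gentoo among 10 penguins
# if the true Gentoo proportion were 0.3
p3 = binom_pmf(3, 10, 0.3)
```

A Bernoulli trial is the binomial case with $n = 1$.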
(POLL)
Which of the following distributions can be used for the statistics of bivariate data?
* Bernoulli distribution
* Binomial distribution
* Chisq distribution
* Normal distribution
* Poisson distribution
- Chisq distribution (most appropriate)
yes but not the best:
* Bernoulli distribution (yes, but not the best)
* Binomial distribution (yes, but not the best)
* Poisson distribution (yes, but not the best)
no:
* Normal distribution (no, not this)
What are contingency tables? What kind of data are they used for?
- tables of counts (occurrences of each level, or of each combination of levels)
- normally used on factors/categorical data
- [numerical data can be categorized with cut (“good” breaks → use quantiles for cutting)]
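Building a contingency table amounts to counting every combination of levels. A minimal Python sketch, assuming hypothetical (species, island) pairs; in R, `table(x, y)` produces the same kind of cross-tabulation for two factors:

```python
from collections import Counter

# Hypothetical paired observations: (species, island) for each penguin
obs = [("Adelie", "Dream"), ("Gentoo", "Biscoe"), ("Adelie", "Biscoe"),
       ("Adelie", "Dream"), ("Chinstrap", "Dream")]

counts = Counter(obs)              # count of each combination of levels
rows = sorted({s for s, _ in obs}) # levels of the first variable
cols = sorted({i for _, i in obs}) # levels of the second variable

# One row per species level, one column per island level;
# combinations that never occur get a count of 0
table = [[counts[(r, c)] for c in cols] for r in rows]
```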
How can numerical data be categorized so that one can create a contingency table? What is a good way to do this?
- numerical data can be categorized with cut
- (“good” break → use quantiles for cutting)
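A sketch of quantile-based categorization in Python, assuming a hypothetical numeric vector. `statistics.quantiles(..., method="inclusive")` computes the cut points (we believe this matches R's default quantile type 7), and `bisect` assigns each value to a right-closed bin like `(a, b]`, similar to the default of R's `cut()`:

```python
import bisect
import statistics

# Hypothetical numerical data (e.g. body mass in g), illustrative only
values = [3200, 3450, 3700, 3800, 4050, 4300, 4700, 5000, 5400, 5600]

# Quartile cut points -> 4 bins with roughly equal numbers of values
breaks = statistics.quantiles(values, n=4, method="inclusive")
labels = ["Q1", "Q2", "Q3", "Q4"]

# bisect_left sends a value equal to a cut point to the lower bin,
# i.e. the bins are right-closed: (-inf, q1], (q1, q2], (q2, q3], (q3, inf)
categories = [labels[bisect.bisect_left(breaks, v)] for v in values]
```

Quantile breaks keep the category counts balanced, which avoids nearly empty cells in the resulting table.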
What is the frequency in the context of categorical data?
- How often a category $c$ is counted: $n_c$
- For $k$ categories the frequencies sum to $n$: $\sum_{c=1}^{k} n_c = n$
What is the relative frequency in the context of categorical data?
- sample proportion of a single category: $\hat{p}_c = n_c / n$
- for $k$ categories the proportions sum to 1: $\sum_{c=1}^{k} \hat{p}_c = 1$
- percentages: relative frequencies × 100
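Frequencies, relative frequencies, and percentages follow directly from the counts. A minimal Python sketch with a hypothetical sample of survey answers:

```python
from collections import Counter

# Hypothetical sample of one categorical variable (illustrative only)
answers = ["yes", "no", "yes", "unsure", "yes", "no", "yes", "yes"]
n = len(answers)

freq = Counter(answers)                              # n_c for each category c
rel_freq = {c: n_c / n for c, n_c in freq.items()}   # p_hat_c = n_c / n
percent = {c: 100 * p for c, p in rel_freq.items()}  # relative frequency * 100

assert sum(freq.values()) == n  # the frequencies add up to n
```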
Frequency table vs contingency table?
- frequency table: 1 variable
- contingency table: 2 or more variables (counts of each combination of levels)
What can we calculate from contingency tables?
- Expected values
- Residuals
How can we calculate expected values?
- Contingency table → Margin table → independence table
- Each cell of the independence table contains the expected count: row total * column total / grand total
What is a margin table?
- The contingency table with the row totals, column totals (the margins), and grand total added
What is an independence table?
- Calculated from the margin table
- Expected number of observations in each cell if the variables were independent (no association)
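The path contingency table → margin table → independence table can be sketched in Python as follows, assuming a hypothetical 2x3 table of counts (in R, `addmargins()` performs the margin step):

```python
# Hypothetical 2x3 contingency table (rows: 2 levels, columns: 3 levels)
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]        # right margin
col_totals = [sum(col) for col in zip(*observed)]  # bottom margin
total = sum(row_totals)                            # grand total

# Margin table: observed counts plus row totals, column totals, and grand total
margin_table = [row + [rt] for row, rt in zip(observed, row_totals)]
margin_table.append(col_totals + [total])

# Independence table: expected count = row total * column total / grand total
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
```

The independence table keeps the same margins as the observed table; only the distribution inside the cells changes.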
Which bracket means inclusive: ( or [ ?
- [
Which bracket means exclusive: ( or [ ?
- (
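In interval notation `[` includes the endpoint and `(` excludes it, so `(0, 10]` contains 10 but not 0. R's `cut()` labels its bins this way (right-closed) by default. A small illustrative Python helper; the function names are hypothetical:

```python
def in_interval(x: float, lo: float, hi: float,
                closed_left: bool = False, closed_right: bool = True) -> bool:
    """Membership test; the defaults describe (lo, hi]."""
    above = x >= lo if closed_left else x > lo   # '[' includes lo, '(' excludes it
    below = x <= hi if closed_right else x < hi  # ']' includes hi, ')' excludes it
    return above and below

def interval_label(lo: float, hi: float,
                   closed_left: bool = False, closed_right: bool = True) -> str:
    """Text label using the bracket convention, e.g. '(0, 10]'."""
    left = "[" if closed_left else "("
    right = "]" if closed_right else ")"
    return f"{left}{lo}, {hi}{right}"
```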
What is a Chi Square test? (statscast)
- A statistical test that checks for patterns or relationships in categorical variables
- It checks whether the observed deviations from evenly spread (expected) counts are meaningful or just due to chance
With how many variables can you do a Chi Square test? (statscast)
- One or more
Give some examples of things you could test with Chi Square and state the number of variables and levels (statscast)
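- Whether a die is fair? 1 variable with 6 levels → one-way chi square
- Whether participating in a study group is related to passing an exam? 2 variables with 2 levels each (participated yes/no, passed yes/no) → 2x2 table
- Does gender vary across educational majors? 2 variables: gender with 2 levels (female, male) and major with 3 levels (engineering, business, psychology) → 2x3 table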
Consider this question: Does gender vary across educational majors? Female, male vs engineering, business, psychology. (statscast)
What test could you use for this? What are the possibilities we want to determine?
- Chi square
- No relationship → expect gender evenly spread across majors → H0
- Relationship → expect gender unevenly spread across majors → alternative hypothesis
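A sketch of the full test for the gender/major question, assuming hypothetical counts (not real data). It computes the chi-square statistic, df = (2 - 1)(3 - 1) = 2, and a p-value; for df = 2 specifically, the chi-square upper tail has the closed form $P(X \ge x) = e^{-x/2}$, so no statistics library is needed:

```python
import math

# Hypothetical counts: rows = gender (female, male),
# columns = major (engineering, business, psychology). Not real data.
observed = [[10, 20, 30],
            [30, 20, 10]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
total = sum(row_totals)

# Chi-square statistic against the independence table (H0: no relationship)
chi2 = sum(
    (o - rt * ct / total) ** 2 / (rt * ct / total)
    for row, rt in zip(observed, row_totals)
    for o, ct in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (3 - 1) = 2

# Closed-form upper tail, valid ONLY for df = 2
p_value = math.exp(-chi2 / 2)
reject_h0 = p_value < 0.05  # True -> gender unevenly spread across majors
```

For other df values a chi-square distribution function (for example R's `pchisq()`) is needed instead of the closed form.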
As an _________, a chi square allows us to make __________ about the ________ beyond our data.
As an inferential statistic, a chi square allows us to make inferences about the population beyond our data.
How is the chi square value calculated? (statscast)
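- For each group (cell): $(\text{expected} - \text{observed})^2 / \text{expected}$
- Then add the values for all groups: $\chi^2 = \sum (\text{expected} - \text{observed})^2 / \text{expected}$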
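The calculation applied to the die example from above, as a minimal Python sketch with hypothetical counts from 60 rolls:

```python
# Hypothetical counts from 60 rolls of a die (illustrative, not real data)
observed = [8, 12, 9, 11, 6, 14]
n = sum(observed)
expected = [n / 6] * 6  # fair die: each face equally likely -> 10 per face

# Per-group contribution (expected - observed)^2 / expected, then the sum
contributions = [(e - o) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
```

Here $\chi^2 = 4.2$ with df = 6 - 1 = 5; the usual 5% critical value for df = 5 is about 11.07, so these counts would give no evidence that the die is unfair.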
Chi Square: how to find the degrees of freedom? (statscast)
- df = (number of levels of the first variable - 1) * (number of levels of the second variable - 1); for one variable with k levels, df = k - 1
- From contingency table: take away 1 row and 1 column and see how many cells are left
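Both ways of finding the degrees of freedom give the same number, as this small Python sketch shows (function names are illustrative):

```python
def chi2_df(n_rows: int, n_cols: int) -> int:
    """(levels of variable 1 - 1) * (levels of variable 2 - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def df_by_removing(n_rows: int, n_cols: int) -> int:
    """Remove one row and one column, then count the cells that are left."""
    remaining = [(i, j) for i in range(n_rows - 1) for j in range(n_cols - 1)]
    return len(remaining)

def one_way_df(k: int) -> int:
    """One variable with k levels (e.g. a die: k = 6)."""
    return k - 1

# The two-variable rules agree for every table size checked here
for r in range(2, 6):
    for c in range(2, 6):
        assert chi2_df(r, c) == df_by_removing(r, c)
```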