lecture 6-chia statistic Flashcards

Question 1

Q

Categorical data

Answer

A

Entities are divided into distinct categories
Binary variable: there are only 2 categories
(e.g. dead or alive; yes or no)
Nominal variable: there are more than two categories
(e.g. vegan, omnivore, vegetarian, fruitarian)
Ordinal variable: a nominal variable whose categories have a logical order
(e.g. H1, H2A, H2B, H3, Pass)

Question 2

Q

Continuous data

Answer

A

Entities receive a distinct score on a measurement scale
Interval variable: equal intervals on the variable represent equal differences in the property being measured
(e.g. difference between 2 and 4 is the same as the difference between 20 and 22)
Ratio variable: Same as interval variable but the ratios are meaningful, with a true zero point
(e.g. response times to the appearance of a target)

Question 3

Q

This distinction can be blurry

Answer

A

We can measure continuous data as categories
(e.g. age recorded in whole years)
We can treat categorical variables as if they were continuous
(e.g. the average number of boyfriends that women in their 20s have is 4.6, but what is .6 of a boyfriend?)

Question 4

Q

Analysing categorical data

Answer

A

We want to quantify the relationship between two categorical variables
(We can’t use the mean because a mean of categorical data is meaningless)
We analyse the number of things that fall into each category,
i.e. the count
Also known as the frequency
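As a minimal sketch of what "counting" means here, the snippet below tallies how many observations fall into each combination of two categorical variables. The (diet, exercises regularly) pairs are made-up illustration data, not values from the lecture.

```python
from collections import Counter

# Hypothetical (diet, exercises_regularly) pairs, for illustration only
observations = [
    ("vegan", "yes"),
    ("omnivore", "no"),
    ("omnivore", "yes"),
    ("vegan", "yes"),
    ("omnivore", "no"),
]

# Counter tallies the frequency (count) of each category combination
counts = Counter(observations)
print(counts[("omnivore", "no")])  # 2
```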

Question 5

Q

Frequency perspective

Answer

A

Frequency perspective: take a population and measure each person’s height*.
Graph this data on a histogram (or frequency distribution).
Height follows a normal (bell-shaped) (Gaussian) curve/distribution

Question 6

Q

Probability perspective

Answer

A

Probability perspective: take a person at random and measure their height.
What is the probability that they will be ~170cm tall?
Another way of asking this question is “How big is the blue area compared with all the values of the bars?”
Total count: 53,298 people
170cm people: 8,700
Probability = 8,700 / 53,298 ≈ 0.16 = 16%

The sizes of the bars relate directly to the probability of an event occurring
Probability of an event occurring ranges from 0 to 1
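The arithmetic on this card can be checked in a couple of lines. This is only a sketch; the variable names are illustrative, and the two counts are the ones quoted on the card.

```python
total_count = 53_298  # everyone measured (from the card)
count_170cm = 8_700   # people who are about 170 cm tall (from the card)

# Probability = count in the category of interest / total count
probability = count_170cm / total_count
print(round(probability, 2))  # 0.16
```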

Question 7

Q

Z-scores

Answer

A

Distributions of data will have different means and SDs
We can make use of the already calculated probabilities associated with the normal distribution (phew!)
To do this, we need to convert our data so it has a mean of 0 and a SD of 1
z = (each score – group mean) / group standard deviation
Our data is now fitted onto the normal curve
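A minimal sketch of the conversion, assuming the sample standard deviation (n – 1 in the denominator) is the "group standard deviation" meant here. The heights are hypothetical values chosen for illustration.

```python
import statistics

def z_scores(scores):
    """Convert raw scores so they have a mean of 0 and an SD of 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # assumption: sample SD (n - 1)
    return [(score - mean) / sd for score in scores]

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
z = z_scores(heights_cm)
print(z[2])  # 0.0, because 170 is exactly the group mean
```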

Question 8

Q

Null hypothesis testing

Answer

A

We assume the null hypothesis is true (i.e. there is no effect)
We fit a statistical model to the data that represents the alternative hypothesis and see how well the model fits the data (in terms of variance)
To determine the fit, we calculate the probability of getting that ‘model’ if the null hypothesis were true
If that probability is really small (.05 or less) then we conclude that the model fits the data well and we find support for the alternative/experimental hypothesis

Question 9

Q

Chi-square

Answer

A

The chi-squared distribution is one of the most widely used probability distributions in inferential statistics
This distribution can be used to calculate the probability of obtaining a chi-square value at least as large as a given score

Question 10

Q

Chi-square formula

Answer

A

χ² = Σ (observed_ij – model_ij)² / model_ij
χ means chi
Σ (sigma) means sum all of the information in the bracket afterwards
Where i represents the rows in the contingency table and j represents the columns
The observed frequencies are our counts of what happened (in our contingency table)
The model (expected) frequencies are what we would expect if things happened by chance (see next slide for how to calculate this)
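The formula can be sketched as a short function. The observed counts below are hypothetical (they are not the lecture's data), and the expected counts are the values implied by those counts under independence (row total × column total / n).

```python
def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )

# Hypothetical 2x2 contingency table (rows and columns are illustrative)
observed = [[30, 20], [10, 40]]
expected = [[20.0, 30.0], [20.0, 30.0]]

statistic = chi_square(observed, expected)
print(round(statistic, 2))  # 16.67
```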

Question 11

Q

Expected frequencies

Answer

A

To calculate the expected frequencies for each cell in the table we use the column and row totals for a particular cell…

model_ij = E_ij = (row total_i × column total_j) / n

Where n is the total number of observations (fish) (e.g. 100)
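A minimal sketch of the expected-frequency calculation for a table of any size. The counts are hypothetical, chosen so that n = 100; they are not the lecture's data.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]

observed = [[30, 20], [10, 40]]  # hypothetical counts, n = 100
print(expected_frequencies(observed))  # [[20.0, 30.0], [20.0, 30.0]]
```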

Question 12

Q

Cross tabulations

Answer

A

Differences may represent chance: there will most likely be a difference between observed and expected counts just by chance, even if the variables are independent
Are these differences large enough to be confident about an association?
We need to know what happens at the population level and a statistic will help us to know this. Which one?
Chi-square! It estimates the difference between the observed data and what would be expected if the two variables were independent
If the chi-square is large enough, then we can say that the two variables are associated

Question 13

Q

Degrees of freedom

Answer

A

We need to know the degrees of freedom

df 	= (number of rows – 1)(number of columns – 1)
	= (2-1)(2-1)
	= 1
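The same calculation as a tiny helper, in case the table is not 2 × 2. The function name is illustrative.

```python
def degrees_of_freedom(n_rows, n_columns):
    """df = (number of rows - 1) * (number of columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)

print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 2))  # 2
```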

Question 14

Q

Residuals

Answer

A

We can conclude that there is an association between training of goldfish and food used, but which food was driving the association?
We need to calculate the “adjusted, standardised residuals” to be confident about this
Observed – expected is called the “residual” for each cell
Adjusted, standardised residuals are residuals that are standardised so they are equivalent to a z-score in a normal distribution

Question 15

Q

Z-score distribution

Answer

A

We need the Adjusted Standardised Residuals, as there could be a difference between the observed and expected values just by chance (!)
By placing the residuals onto the z-distribution, we can take chance into account, allow a certain amount of error, and agree that a residual beyond ±1.96 (greater than 1.96 or less than –1.96) is a significant effect at the .05 level
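The card does not give the formula, so the sketch below assumes the common definition of the adjusted standardised residual: (observed – expected) / √(expected × (1 – row total / n) × (1 – column total / n)). The counts are hypothetical.

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardised residual for every cell (assumed formula)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - column_totals[j] / n))
            residual_row.append((o - e) / se)
        result.append(residual_row)
    return result

observed = [[30, 20], [10, 40]]  # hypothetical counts
residuals = adjusted_residuals(observed)
# Flag cells whose residual lies beyond +/-1.96
significant = [[abs(r) > 1.96 for r in row] for row in residuals]
print(round(residuals[0][0], 2))  # 4.08
```

In a 2 × 2 table every cell has the same absolute adjusted residual, so the sign tells you which cells hold more (positive) or fewer (negative) observations than expected.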

Question 16

Q

Handy summary of how to perform a chi-square

Answer

A

Note the observed values in a contingency table
Calculate the expected values
Calculate the chi-square
Calculate the degrees of freedom
Look up the chi-square, with the appropriate df, on the chi-square distribution
Calculate the adjusted, standardised residuals
Draw your conclusion
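The first five steps can be sketched end to end as follows. The counts are hypothetical, and the default critical value of 3.841 is the standard table value for df = 1 at the .05 level, so the final comparison is only valid for a 2 × 2 table. The residual step is left out to keep the sketch short.

```python
def chi_square_test(observed, critical_value=3.841):
    """Return (chi-square, df, significant?) for a contingency table.

    Assumption: critical_value is the .05 cut-off for df = 1, so pass a
    different value for tables larger than 2x2.
    """
    # Steps 1-2: observed counts in, expected counts from the totals
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in column_totals] for r in row_totals]
    # Step 3: the chi-square statistic
    chi2 = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )
    # Step 4: degrees of freedom
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # Step 5: compare against the tabled critical value
    return chi2, df, chi2 > critical_value

chi2, df, significant = chi_square_test([[30, 20], [10, 40]])
print(round(chi2, 2), df, significant)  # 16.67 1 True
```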

Question 17

Q

Assumptions of chi-square

Answer

A

The sample is drawn randomly from the population
The sample (whole contingency table) is sufficiently large
Within each cell, the expected frequency is large enough (typically greater than 5)
The observations are independent of each other

Question 18

Q

Fisher’s exact test

Answer

A

There is one problem with Pearson’s chi-square test…the sampling distribution of the test statistic has an approximate chi-square distribution
The larger the sample is, the better this approximation becomes. In large samples we don’t need to worry about this approximation
In small samples, this is a worry
To use the chi-square test, the expected frequencies in each cell of the contingency table need to be greater than 5
Fisher’s exact test allows for small sample sizes
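The card does not show how the exact test works, so here is a minimal sketch for a 2 × 2 table. It assumes the usual two-sided definition: with the row and column totals held fixed, add up the hypergeometric probabilities of every table that is no more probable than the observed one. The counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so tables tied with the observed one are included
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )

p = fisher_exact_two_sided(3, 1, 1, 3)  # hypothetical small sample, n = 8
print(round(p, 4))  # 0.4857
```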

Question 19

Q

Summary

Answer

A

The chi-square statistic estimates the difference between the observed data and what would be expected if the two variables were independent
If a chi-square statistic is large enough (and hence improbable, assuming independence), then there is evidence that the null hypothesis may not hold
More formally, if the probability of obtaining a chi-square statistic at least this large (assuming the null hypothesis is true) is less than .05, reject the null hypothesis
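For a 2 × 2 table (df = 1) this probability can be computed without a lookup table, because a chi-square variable with 1 degree of freedom is a squared z-score. The sketch below relies on that fact and is valid for df = 1 only; the statistic of 16.67 is a hypothetical value.

```python
import math

def p_value_df1(chi2):
    """P(chi-square >= chi2) when df = 1, i.e. P(|z| >= sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = 16.67  # hypothetical statistic from a 2x2 table
p = p_value_df1(chi2)
reject_null = p < 0.05
print(reject_null)  # True
```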

(19 cards)