test 3 Flashcards
what are zener cards
cards that test psychic powers, 5 possible symbols so 5 choices for each card
success/correct = the symbol you predicted matches the symbol on the card
difference between N, n, and p
N: number of experiments/observations (ex. number of students who each go through a deck)
n: number of trials in each experiment (ex. number of cards in the deck, the size argument in rbinom)
p: probability of success
what is a binomial distribution
used for determining probability of getting a certain number of successes, where each trial has only 2 possible outcomes: success or failure (used in zener decks)
difference between normal and binomial distribution
binomial: used when you have a fixed number of trials (counting fails or successes)
normal: used for continuous data that can take on any value within an interval, shaped like a bell curve, no fixed number of trials
explain how this sapply function works (which is used inside of data frames):
sapply(0:n, function(X) sum(x==X))
- sapply applies the given function
- 0:n generates a sequence from 0 to n number
- function(X) sum(x==X) is applied to each element of this sequence
- inside function, X represents each element of sequence
- for each X, x==X is TRUE wherever x equals X, and sum() counts those TRUEs, so sum(x==X) is how many times X occurs in the vector x (which was generated earlier)
so in essence: sapply function is used to count how many times each number from 0 to n occurs in the generated x data
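A small sketch of the counting step, using a made-up vector x and n = 4 (both are illustrative values, not from the class data):

```r
# Hypothetical small vector standing in for the generated data x
x <- c(0, 2, 2, 3, 3, 3)
n <- 4

# For each value 0, 1, ..., n count how many entries of x equal it
counts <- sapply(0:n, function(X) sum(x == X))
counts  # 1 0 2 3 0
```

The value 0 appears once, 1 never, 2 twice, 3 three times, and 4 never.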
what’s the code?
create a frequency distribution of the observed number correct in a zener deck with 25 cards and 100 students
first generate a set of data using the rbinom function
x <- rbinom(100, size=25, prob = 0.2)
prob of correct is 1/5 so prob is 0.2, 100 students is N (the first argument) and size is 25 cards (n)
now make a data frame for it:
df.zener <- data.frame(Count=0:25, Frequency = sapply(0:25, function(X) sum(x==X)))
this gives you the frequency distribution of the number correct
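A runnable version of the two steps together might look like this. The set.seed call is an addition so the random draw is repeatable; it is not part of the original code.

```r
set.seed(1)  # assumption: added only so the example is reproducible

# 100 students (N), each guessing 25 cards (size) with a 1/5 chance per card
x <- rbinom(100, size = 25, prob = 0.2)

# One row per possible number correct (0 to 25) and how many students got it
df.zener <- data.frame(Count = 0:25,
                       Frequency = sapply(0:25, function(X) sum(x == X)))

sum(df.zener$Frequency)  # the frequencies add up to the 100 students
head(df.zener)
```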
mean and standard deviation formulas in a binomial distribution + codes for mean and mean prob.
mean = np
n = number of trials
p = prob. of success
standard deviation = √(np(1-p))
- these formulas always work for finding the mean and standard deviation in a binomial distribution
codes:
mean(x)
mean(x)/n for the estimated prob. of success (p)
sd(x) for standard deviation
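To see the formulas next to the sample estimates, here is a sketch with the Zener values n = 25 and p = 0.2 (the seed is an added assumption for repeatability):

```r
set.seed(1)  # assumption: added for reproducibility
n <- 25
p <- 0.2
x <- rbinom(100, size = n, prob = p)

theo.mean <- n * p                  # mean = np = 5
theo.sd   <- sqrt(n * p * (1 - p))  # sd = sqrt(np(1-p)) = 2

# Sample estimates from the simulated data should land close to these
mean(x)
mean(x) / n  # estimate of p
sd(x)
```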
95% confidence interval - why it's used and how to code for it (using binomial distribution)
it's used because it gives the range that 95% of outcomes are expected to fall in, so values outside it are unusual
- similar to about 2 standard deviations (1.96) on either side of the mean
code:
qbinom(0.025, size= n, prob =p) #lower limit
qbinom(0.975, size=n, prob =p) #upper limit
qbinom function is used for quantiles, and you are finding the 2.5th percentile and 97.5th percentile to get the 95% confidence interval
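A sketch of the same two calls with the Zener values filled in (n = 25 and p = 0.2 are taken from the deck example):

```r
n <- 25
p <- 0.2

lower <- qbinom(0.025, size = n, prob = p)  # 2.5th percentile
upper <- qbinom(0.975, size = n, prob = p)  # 97.5th percentile
c(lower, upper)

# qbinom returns the smallest count whose cumulative probability
# reaches the requested level
pbinom(lower, size = n, prob = p)
pbinom(upper, size = n, prob = p)
```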
what’s the code?
use the probability distribution function to get the probability of observing more than a certain value
pbinom(qbinom(0.975, size =n, prob=p), size =n , prob=p, lower.tail=FALSE)
pbinom for the probability, lower.tail=FALSE to get the upper tail, P(X > q), instead of the default P(X <= q)
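A quick check of what lower.tail=FALSE does, again assuming n = 25 and p = 0.2:

```r
n <- 25
p <- 0.2

upper <- qbinom(0.975, size = n, prob = p)

# P(X > upper): probability of landing above the upper limit
p.above <- pbinom(upper, size = n, prob = p, lower.tail = FALSE)

# Same thing written as 1 minus the lower tail
p.check <- 1 - pbinom(upper, size = n, prob = p)
c(p.above, p.check)
```

Because the binomial is discrete, p.above comes out at or below 0.025 rather than exactly 0.025.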
what’s the code?
what’s the probability that at least one person in N gets more than 9 cards correct?
P = prob. that one person gets more than 9 cards correct, so prob. that one person does not → (1-P)
prob. of no one in N getting more than 9 correct → (1-P)^N, then find the complement of that by subtracting it from 1 → 1 - (1-P)^N
code:
1 - (1 - pbinom(9, size=n, prob=p, lower.tail=FALSE))^N
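The same calculation split into steps, assuming the deck example values N = 100, n = 25, p = 0.2:

```r
N <- 100  # students
n <- 25   # cards per student
p <- 0.2  # chance of a correct guess

P <- pbinom(9, size = n, prob = p, lower.tail = FALSE)  # one person gets more than 9
p.none <- (1 - P)^N                                     # nobody gets more than 9
p.any  <- 1 - p.none                                    # at least one person does
p.any
```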
what’s the code?
finding the 95% confidence interval using the normal distribution
qnorm(0.025, mean=n*p, sd=sqrt(n*p*(1-p))) #lower limit
qnorm(0.975, mean=n*p, sd=sqrt(n*p*(1-p))) #upper limit
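With the Zener values plugged in (n = 25, p = 0.2, so mean 5 and sd 2), a sketch looks like:

```r
n <- 25
p <- 0.2

mu    <- n * p                  # 5
sigma <- sqrt(n * p * (1 - p))  # 2

lower <- qnorm(0.025, mean = mu, sd = sigma)  # about 1.08
upper <- qnorm(0.975, mean = mu, sd = sigma)  # about 8.92
c(lower, upper)
```

Unlike qbinom, the normal limits are not whole numbers because the normal distribution is continuous.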
formula to find Z score
Z = (X - μ)/σ
X = vector of observations
μ = mean
σ = standard deviation
Z score (what it is, what values are common + Z score code)
Z score: tells u how far a particular data point is from the average of a group of data points, measured in terms of standard deviations
- Z score of 1 means 1 standard deviation and so on
Z 0.025 = -1.96 & Z 0.975 = 1.96
Z- score code: qnorm(0.975) or qnorm(0.025)
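A sketch of turning data into Z scores with the formula Z = (X - μ)/σ. The vector x here is made up for illustration, and the sample mean and sd stand in for μ and σ:

```r
# Hypothetical observations
x <- c(2, 4, 4, 5, 7, 8)

# Z score of each observation: distance from the mean in standard deviations
z <- (x - mean(x)) / sd(x)
z

# Z scores always have mean 0 and standard deviation 1
mean(z)
sd(z)

# Critical values for a 95% interval
qnorm(c(0.025, 0.975))  # about -1.96 and 1.96
```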
what’s the code?
expected lower and upper 95% confidence limit of X using Z score
when it says expected, it means the limits come from the theoretical distribution (using n and p), not from the observed data
qnorm(0.025) * sqrt(n*p*(1-p)) + n*p #lower limit
qnorm(0.975) * sqrt(n*p*(1-p)) + n*p #upper limit
formula: X =σZ + μ
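To check that X = σZ + μ gives the same limits as passing mean and sd straight to qnorm, here is a sketch assuming n = 25 and p = 0.2:

```r
n <- 25
p <- 0.2
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

# Limits from the Z score: X = sigma * Z + mu
from.z <- sigma * qnorm(c(0.025, 0.975)) + mu

# Limits from qnorm with mean and sd supplied directly
direct <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)

all.equal(from.z, direct)  # TRUE
```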
what's the code?
one sided z-test for proportions using p and p0 & two-sided z-test
z <- (p - p0) / sqrt(p0 * (1-p0)/N)
one sided
pnorm(z, lower.tail=TRUE)
- in one sided tests, we only care about one tail; here it's the left hand side (bc observed value is less than what we expected it to be)
what you just calculated is the p value: if it's less than 0.05, you reject the null hypothesis; if it's greater, you fail to reject the null hypothesis
two sided
2*pnorm(abs(z), lower.tail=FALSE)
- can do either FALSE or TRUE for tail but be consistent with the sign of z: 2*pnorm(-abs(z), lower.tail=TRUE) gives the same answer
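A worked sketch of both tests. The numbers are hypothetical: an observed proportion p = 0.15 from N = 200 guesses, tested against p0 = 0.2.

```r
p  <- 0.15  # hypothetical observed proportion
p0 <- 0.2   # proportion under the null hypothesis
N  <- 200   # hypothetical number of guesses

z <- (p - p0) / sqrt(p0 * (1 - p0) / N)

# One sided (lower tail, because the observed proportion is below p0)
p.one <- pnorm(z, lower.tail = TRUE)

# Two sided: double the tail beyond |z|
p.two <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z = z, one.sided = p.one, two.sided = p.two)
```

Because z is negative here, the two sided p value is exactly double the one sided one.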
what are Q-Q plots used for?
to assess departures from normality
- whether or not data follows a theoretical distribution
what's the code to find the empirical and expected (were the data normal) 99th percentile of weight
empirical means observed and you use the quantile function for that
for expected, you use the classic formula of X = σZp + μ
quantile(weight, probs=0.99) #empirical/observed 99th percentile
type=7 is the default; add type=1 at the end if you want the quantile to be an actual observed value
sd(weight)*qnorm(0.99)+mean(weight) #expected 99th percentile
what you’re calculating is basically the number that 99% of the data fall below
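A sketch comparing the two percentiles. The babies data is not included here, so weight is simulated from a normal distribution with made-up mean and sd; those values are assumptions for illustration only.

```r
set.seed(1)  # assumption: added for reproducibility
# Hypothetical stand-in for the weight column of the babies data
weight <- rnorm(1000, mean = 128, sd = 20)

empirical <- quantile(weight, probs = 0.99)              # observed 99th percentile
expected  <- sd(weight) * qnorm(0.99) + mean(weight)     # X = sigma * Zp + mu

c(empirical = unname(empirical), expected = expected)

# About 99% of the data fall below the empirical value
mean(weight <= empirical)
```

Since the simulated data really are normal, the two numbers should be close; with real data a large gap hints at non-normality.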
how do you plot a Q-Q plot with the normal line too? (what’s the code for it)
use the weight example after attaching babies
qqnorm function creates the graph
qqnorm(weight, xlab="theoretical quantiles", ylab="maternal weight (lbs)") #q-q plot
qqline(weight, col="red") #normal line
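A self-contained sketch of the plot. The weight vector is simulated as a stand-in for the babies data, so the mean and sd are assumptions:

```r
set.seed(1)  # assumption: added for reproducibility
# Hypothetical stand-in for the weight column of the babies data
weight <- rnorm(200, mean = 128, sd = 20)

# qqnorm draws the plot and also returns the plotted coordinates
qq <- qqnorm(weight, xlab = "theoretical quantiles",
             ylab = "maternal weight (lbs)")  # q-q plot
qqline(weight, col = "red")                   # normal line

str(qq)  # x = theoretical quantiles, y = the data
```

Points hugging the red line suggest the data are close to normal; curves at the ends suggest heavy or light tails.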
what is a t test used for
used to compare the means of 2 different populations
- more helpful when there are relatively small sample sizes and you are dealing with a normal distribution
lower and upper limit values of the normal distribution
qnorm(0.025)
qnorm(0.975)
Z = -1.96 and 1.96
95% of the values on a standard normal distribution will be within this range
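A quick check that the range between these two limits really covers 95% of a standard normal distribution:

```r
lower <- qnorm(0.025)  # about -1.96
upper <- qnorm(0.975)  # about  1.96

# Area under the standard normal curve between the two limits
coverage <- pnorm(upper) - pnorm(lower)
coverage  # 0.95
```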