Week 2 Flashcards

1
Q

Write one line of code that would simulate three dice rolls

A

np.random.choice(np.arange(1, 7), 3)

2
Q

Write a line of code that would decide whether this is a ‘poker’

A

if np.std(x) == 0:
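
A slightly fuller sketch (the variable name x and the five-dice hand are illustrative assumptions, not the assignment's exact code):

x = np.random.choice(np.arange(1, 7), 5)   # simulate five dice
if np.std(x) == 0:                         # all dice show the same value
    print("poker!")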

3
Q

What do the following lines of code denote?

stats.norm.rvs(0, 1, 100)
stats.binom.rvs(1, 0.5, size=100)

A

stats.norm.rvs(0, 1, 100) draws 100 random values from a normal distribution with mean 0 and standard deviation 1.
stats.binom.rvs(1, 0.5, size=100) draws 100 values from a binomial distribution with n = 1 trial (i.e. Bernoulli trials) and success probability 0.5.

4
Q

What do the following lines of code denote:

stats.norm.ppf(0.946)
stats.norm.rvs(0, 1, 50)
stats.norm.cdf(2)
stats.norm.pdf(2)

A

stats.norm.cdf(2) = the cumulative probability to the left of z = 2 under the standard normal distribution
stats.norm.pdf(2) = the height (density) of the standard normal distribution at z = 2
stats.norm.ppf(0.946) = the z value such that the area to its left equals 0.946 (the inverse of the cdf)
stats.norm.rvs(0, 1, 50) = draws 50 random values from the standard normal distribution
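
A quick check of how cdf and ppf relate (ppf is the inverse of the cdf):

from scipy import stats
print(stats.norm.cdf(2))                   # ~0.977
print(stats.norm.ppf(stats.norm.cdf(2)))   # 2.0 -- ppf undoes cdf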

5
Q

How do you get a p value from a t test in Python?

A
alpha = 0.05
n = 100
x = stats.norm.rvs(loc=0.0, scale=1.0, size=n)

pg.ttest(x, 0)                       # full results table (pingouin)
pg.ttest(x, 0)['p-val'][0]           # just the p value
pg.ttest(x, 0)['p-val'][0] < alpha   # reject H0 at alpha?

6
Q

Show code that computes the power of the t test by simulation

A

Add effect to the values at the start:
n = 100; effect = .3; std = 1; alpha = .05; replications = 500

rejections = np.zeros(replications)

for i in range(replications):
    x = stats.norm.rvs(loc=effect, scale=std, size=n)
    if pg.ttest(x, 0)['p-val'][0] < alpha:
        rejections[i] = 1

print(np.mean(rejections))  # power

The power is the mean of the rejections: the proportion of replications in which H0 was rejected, with the data generated around the effect size (loc=effect).

7
Q

How can you compute the power analytically?

A

df = 99  # n - 1
ncp = effect / (std / np.sqrt(n))  # noncentrality parameter

print(1 - stats.t.cdf(stats.t.ppf(1 - alpha/2, df), df, loc=ncp))

or:

pg.power_ttest(effect, n, alpha=alpha, contrast='one-sample')

8
Q

What is the power to detect an effect = 0?

A

Alpha (0.05): when the true effect is 0, the probability of rejecting H0 is just the false-positive rate.

9
Q

The Z value for 95% confidence is Z=1.96.

Some students answered qnorm(.95) or stats.norm.ppf(.95). What goes wrong?

A

We need 2.5% in each tail for a two-sided confidence interval, so use qnorm(.975) in R and stats.norm.ppf(.975) in Python.
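
Quick check in Python (scipy):

from scipy import stats
print(stats.norm.ppf(0.975))   # 1.96
print(stats.norm.ppf(0.95))    # 1.64 -- the one-sided value, not what we want here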

10
Q

What is the relationship between power and assumptions?

A

The more assumptions you are willing to make (provided they hold), the higher the power.

11
Q

Compare the Wilcoxon test to the t test

A

It carries out the same task but is based on ranks (non-parametric; no normal distribution assumed). It only requires that the distribution of the data is symmetric.

wilcox.test(x,mu=0)
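
A Python counterpart (the line above is R; here scipy's one-sample Wilcoxon signed-rank test, which tests symmetry around 0):

from scipy import stats
stat, p = stats.wilcoxon(x)   # p value for the test that x is symmetric around 0
print(p)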

12
Q

Name a technique similar to the t test and Wilcoxon test, and how it differs

A

The proportion test: it works at the nominal level, with no distributional assumptions at all.

prop.test(sum(x>0),n)

Each value is recoded as 1 if it is larger than 0 and 0 otherwise; the test then checks whether the proportion of positive values differs from 0.5.
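
A Python analogue (a sketch; the line above is R's prop.test, here replaced by scipy's exact binomial test):

from scipy import stats
res = stats.binomtest(int(np.sum(x > 0)), n, 0.5)
print(res.pvalue)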

13
Q

For assignment 6 we compared the power of the parametric, non-parametric and nominal tests in a solution with n=100, effect=.3, std=1.

What was concluded?

A

The power of the parametric (t test) and non-parametric (Wilcoxon) tests is comparable, but the power of the nominal test (proportion test) is seriously lower.
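
A simulation sketch of that comparison (assumptions: numpy/scipy/pingouin imported as np/stats/pg; the R tests are replaced by scipy equivalents for illustration):

n = 100; effect = .3; std = 1; alpha = .05; replications = 500
rej_t = rej_w = rej_p = 0
for i in range(replications):
    x = stats.norm.rvs(loc=effect, scale=std, size=n)
    rej_t += pg.ttest(x, 0)['p-val'][0] < alpha                       # parametric
    rej_w += stats.wilcoxon(x)[1] < alpha                             # non-parametric
    rej_p += stats.binomtest(int(np.sum(x > 0)), n).pvalue < alpha    # nominal
print(rej_t / replications, rej_w / replications, rej_p / replications)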

14
Q

How is the cusp catastrophe related to Jaap’s research?

A

Sudden transitions, such as when people start smoking, relapse into depression, or fall asleep.

15
Q

How does Jaap relate a catastrophe to perception?

A

Visual illusions, e.g. the ambiguous (Necker) cube, which jumps between two perceived orientations; he made a mathematical (catastrophe) model of this.

16
Q

What did we learn about power by carrying out regression analysis in Python?

A

You need a large number of participants to reliably recover the underlying parameters.

17
Q

What is meant by Occam’s razor?

A

When faced with two opposing explanations for the same set of evidence, our minds naturally prefer the explanation that makes the fewest assumptions.

18
Q

How is Occam's razor related to statistics?

A

The problem of overfitting data to a model. A more complex model with more parameters will always fit the data better; the question is how much better it should fit before we accept the more complex model.

19
Q

Describe the three basic ways of evaluating this more complex model

A

Cross-validation
Resampling (using simulation)
Logistic regression

20
Q

What does this line of code do when generating data?

y1 = 3 + 1*x + stats.norm.rvs(0, 1, N)

A

Generates data according to the linear equation y = 3 + 1*x plus standard-normal noise.
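
For context, a minimal sketch of how N, x and y1 fit together (the value of N and the distribution of x are illustrative assumptions):

N = 100
x = stats.norm.rvs(0, 1, N)               # predictor
y1 = 3 + 1*x + stats.norm.rvs(0, 1, N)    # intercept 3, slope 1, plus noise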

21
Q

What do these lines of code do in a regression analysis?

design_matrix1 = np.vstack((np.ones(N), x)).T
design_matrix3 = np.vstack((np.ones(N), x, x**2, x**3)).T
lm1 = sm.OLS(y1, design_matrix1)
results1 = lm1.fit()
pred1 = results1.predict()
lm3 = sm.OLS(y1, design_matrix3)
results3 = lm3.fit()
A

design_matrix1 = np.vstack((np.ones(N), x)).T
lm1 = sm.OLS(y1, design_matrix1)
results1 = lm1.fit()
Fits the linear model (intercept + x).

design_matrix3 = np.vstack((np.ones(N), x, x**2, x**3)).T
lm3 = sm.OLS(y1, design_matrix3)
results3 = lm3.fit()
Fits the cubic (third-degree polynomial) model (intercept + x + x**2 + x**3).

22
Q

What did we learn from this overfitting assignment

A

Although the data were generated with a linear model, the more complex (cubic) model fit the data better. We kept half of the generated data aside to cross-check the models; on that held-out half the linear model fit better than the complex one, demonstrating the problem of overfitting. Cross-validation is like a replication in statistical terms.
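
A minimal cross-validation sketch along these lines (variable names follow the previous card; the exact split is illustrative, not the assignment's code):

train, test = slice(0, N // 2), slice(N // 2, N)

fit1 = sm.OLS(y1[train], design_matrix1[train]).fit()   # simple (true) model
fit3 = sm.OLS(y1[train], design_matrix3[train]).fit()   # complex model

# prediction error on the held-out half
mse1 = np.mean((y1[test] - fit1.predict(design_matrix1[test])) ** 2)
mse3 = np.mean((y1[test] - fit3.predict(design_matrix3[test])) ** 2)
print(mse1, mse3)   # the simpler model typically predicts the new data better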

23
Q

How do you interpret AIC and BIC?

A

The lower the value, the better: they measure goodness of fit penalized by the number of parameters. They are used to compare models fitted to the same data, similar in purpose to an ANOVA model comparison.
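
For the two models from the previous cards, a minimal comparison sketch (statsmodels OLS results expose aic and bic attributes):

print(results1.aic, results1.bic)   # linear model
print(results3.aic, results3.bic)   # cubic model -- lower values indicate the preferred model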

24
Q

AIC and BIC both penalize goodness of fit with the number of parameters used in the model. What is their difference?

A

BIC takes sample size into account and AIC does not; the penalty is larger for BIC (the number of parameters is multiplied by log(n) rather than 2).

25
Q

Logistic regression can also be run directly with the following code:

x = stats.norm.rvs(0, 1, 1000)
logit = 1 + 1*x  # make logit data
y = (stats.uniform.rvs(0, 1, 1000) < stats.logistic.cdf(logit))
g = sm.GLM(y, x, family=sm.families.Binomial()).fit()
print(g.summary())  # results

Why does Han not recommend this?

A

Because it just "gives this GLM thing": you want to be sure you are doing something sensible. If you first generate data under the model yourself, feed it back in, and recover your parameter values, then you are in charge of what is going on.
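
A sketch of that "be in charge" approach, recovering the generating parameters (note sm.add_constant, which the raw call above omits; adding it is an assumption about how one would complete the example):

X = sm.add_constant(x)                                          # intercept column + x
g = sm.GLM(y.astype(int), X, family=sm.families.Binomial()).fit()
print(g.params)                                                  # should be close to (1, 1), the generating values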

26
Q

How can resampling (using simulation to do statistics) be useful? (3)

A

Resampling = using simulation to do statistics:

Validation: validating models by using random subsets (bootstrapping, cross-validation), e.g. the regression example (cross-validation).

Precision: estimating the precision of sample statistics (medians, variances, percentiles) by drawing randomly with replacement from a set of data points (bootstrapping), e.g. the confidence interval of the mean.

Significance tests: exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests), e.g. the t-test example.

27
Q

When doing statistical tests by resampling in R or Python, what assumptions do we rely on regarding the distribution of the data?

A

Instead of relying on assumptions about the statistical distribution of the data (e.g. normality), we use simulation to generate distributions for comparison with the data. There are many different options; in the assignment we did a nonparametric bootstrap test of a correlation.

28
Q

In the significance test, what do the following lines of code do?

for i in range(N):
    rs[i] = np.corrcoef(x, np.random.choice(y, len(y), replace=False))[0, 1]
np.sum(rs >= r) / N

A

They build a null distribution of correlations by repeatedly shuffling y (np.random.choice with replace=False is a permutation) and correlating it with x. The last line counts how often these simulated correlations are at least as large as the observed correlation r, divided by N: the (one-sided) permutation p value, i.e. how exceptional your correlation is against the distribution of simulated correlations.
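
For completeness, a runnable framing of the same permutation test (assumptions: x and y already exist; N permutations; names follow the snippet above):

N = 1000
r = np.corrcoef(x, y)[0, 1]     # observed correlation
rs = np.zeros(N)                # permutation (null) distribution
for i in range(N):
    rs[i] = np.corrcoef(x, np.random.choice(y, len(y), replace=False))[0, 1]
print(np.sum(rs >= r) / N)      # one-sided permutation p value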