Simulations Flashcards

Question

How do you generate random samples from a beta distribution?

Answer 1

stats.beta.rvs

Answer 2

stats.chi2.rvs

Answer 3

stats.gamma.rvs
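
The three samplers above share the same calling pattern in scipy.stats. A minimal sketch follows; the shape parameters (a, b, df, scale), the sample size, and the seed are illustrative assumptions, not values from the cards.

```python
import numpy as np
from scipy import stats

# Seeded generator so the draws are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

# Shape parameters below are illustrative assumptions.
beta_samples = stats.beta.rvs(a=2, b=5, size=1000, random_state=rng)
chi2_samples = stats.chi2.rvs(df=3, size=1000, random_state=rng)
gamma_samples = stats.gamma.rvs(a=2, scale=1.5, size=1000, random_state=rng)

print(beta_samples.mean(), chi2_samples.mean(), gamma_samples.mean())
```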

Answer 4

special normal distribution where mean=0, sd = 1

Answer 5

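Test statistic expressed in standard-deviation units (the t-test uses the analogous t statistic). The area under the standard normal curve to the right of a z-score is the one-tailed p-value: the probability of a result at least as extreme as your observation if the null hypothesis is true.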

Answer 6

probability of obtaining test results at least as extreme as results actually observed, under assumption that null hypothesis is true
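
As a rough illustration of "at least as extreme", the tail area of the standard normal can be read off with scipy.stats.norm.sf. The z-score of 1.96 is an assumed example value.

```python
from scipy import stats

z = 1.96  # illustrative z-score (assumed)

# Survival function = 1 - CDF = area to the right of z under N(0, 1).
p_one_sided = stats.norm.sf(z)

# A two-sided p-value counts both tails.
p_two_sided = 2 * stats.norm.sf(abs(z))

print(p_one_sided, p_two_sided)
```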

Answer 7

False positive rate (Type 1 error rate): the probability of believing that there is a genuine effect in the population when in fact there isn’t. With real data, we don't define alpha as part of the code; instead we look at the p-value and compare it with alpha.

Answer 8

stats.ttest_1samp(sample_data, population_mean) or pg.ttest(data, population_mean)
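
A small sketch of the scipy call on simulated data, followed by the comparison of the p-value with alpha. The true mean of 0.5, the sample size of 50, and the seed are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population_mean = 0.0  # value under the null hypothesis
alpha = 0.05

# Simulated sample; true mean 0.5 and n = 50 are illustrative assumptions.
sample_data = rng.normal(loc=0.5, scale=1.0, size=50)

result = stats.ttest_1samp(sample_data, population_mean)
t_stat, p_value = result.statistic, result.pvalue

# Decision rule: reject the null when the p-value falls below alpha.
reject_null = p_value < alpha
print(t_stat, p_value, reject_null)
```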

Answer 9

probability that the test correctly rejects a false null hypothesis (H0). It’s the complement of the Type II error rate (i.e., Power=1−β).

Answer 10

refers to the magnitude of the difference that you expect to detect

Answer 11

pg.power_ttest parameters:
- n = sample size (if not provided, calculated from the other parameters)
- d = effect size
- alpha = significance level
- power (if not provided, calculated from the other parameters)
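
Power can also be estimated by simulation, which gives a cross-check on an analytic calculator. The sketch below uses scipy rather than pg.power_ttest and covers a one-sample t-test only; the effect size d, the sample size n, and the number of simulations are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d = 0.5        # effect size in SD units (assumed)
n = 30         # sample size per simulated experiment (assumed)
alpha = 0.05
n_sims = 5000

# Each row is one simulated experiment in which the null (mean = 0) is false.
samples = rng.normal(loc=d, scale=1.0, size=(n_sims, n))

# One-sample t-test per row, all at once.
p_values = stats.ttest_1samp(samples, 0.0, axis=1).pvalue

# Power = share of experiments that correctly reject the false null.
power = np.mean(p_values < alpha)
print(power)
```

Raising n or d in this sketch should push the estimated power towards 1, and lowering alpha should reduce it.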

Answer 12

tests which don't require that your data follows normal distribution

Answer 13

A parameter of a model is a variable that can take a range of values to describe the data.

Answer 14

to simulate models to then generate simulated data

Answer 15

1) Report the overall significance of the main model first and compare it to other models (F, p-values, BIC, AIC).
2) Report findings for individual parameters (effects in linear regression).
3) Include post-hoc and additional analyses.
4) Do not report all non-significant results.
5) Round to 3 decimal places when reporting p-values/Bayes factors.

Answer 16

OLS stands for Ordinary Least Squares, a method for estimating the parameters of a linear regression model. The goal is to find the line that minimizes the sum of squared differences between the observed values and the values predicted by the model. It requires 2 main inputs:
- dependent variable (the observed data that you want to model; the goal is to predict it)
- independent variable(s), which can be coded in a design matrix

Answer 17

use function .fit()
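
The least-squares idea behind fitting can be sketched with NumPy alone. This is not the stats library's .fit() call; np.linalg.lstsq is used here as a stand-in, and the true intercept and slope (1.0 and 2.0), the noise level, and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
x = np.linspace(0, 2, N)                     # independent variable
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, N)    # dependent variable (simulated)

# Design matrix: a column of ones (intercept) next to the predictor.
X = np.column_stack([np.ones(N), x])

# Coefficients minimising the sum of squared residuals.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coefs

# Model predictions for the observed x values.
prediction = X @ coefs
print(intercept, slope)
```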

Answer 18

They need to be simulated separately, e.g. each drawn from a random distribution, so you CANNOT use linspace!

Answer 19

Generates evenly spaced values over a specified interval. For example, x = np.linspace(0, 2, N) generates N evenly spaced points between 0 and 2 (both endpoints included) and stores them in the variable x.
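
A tiny example with N = 5 (an arbitrary choice) makes the spacing visible:

```python
import numpy as np

N = 5
x = np.linspace(0, 2, N)

# Both endpoints are included, so the step is (2 - 0) / (N - 1) = 0.5.
print(x.tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]
```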

Answer 20

When a statistical model captures the noise in the data: it fits the data it was trained on too well and generalizes poorly to new data.

Answer 21

when faced with 2 opposing explanations for the same set of evidence, preference is for the explanation making the fewest assumptions

Answer 22

The more complex model, because it is overfitting the data.

Answer 23

Use predict() function and apply it to the results from model fit (calculating coefficients)

Answer 24

statistical measure in regression analysis that represents the proportion of the variance for a dependent variable explained by an independent variable or variables, with values ranging from 0 to 1

Answer 25

A good model fit to the data; however, it doesn't say anything about causation.

Answer 26

r2 = np.round(1 - (np.var(prediction - y1, ddof=1) / np.var(y1, ddof=1)), 2)
Essentially 1 - variance of residuals / variance of original data.
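
To see the variance-ratio formula run end to end, here is a sketch on simulated data. The straight-line model, its coefficients, and the use of np.polyfit for the fit are assumptions made for the example; prediction and y1 follow the names used on the card.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 2, 100)
y1 = 1.0 + 2.0 * x + rng.normal(0, 0.5, x.size)  # simulated data (assumed model)

# Straight-line fit; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(x, y1, 1)
prediction = intercept + slope * x

# R^2 = 1 - variance of residuals / variance of the observed data.
r2 = np.round(1 - np.var(prediction - y1, ddof=1) / np.var(y1, ddof=1), 2)
print(r2)
```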

Answer 27

Cross-validation! We can split the initial data into separate training and test subsets, then train the model on the training subset and test it on the test subset.
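
A minimal sketch of a single train/test split, comparing a simple and a more complex polynomial model. The data-generating line, the 40/20 split, the polynomial degrees, and the helper name mse are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 60
x = rng.uniform(0, 2, N)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, N)   # truly linear data (assumed)

# Random split: 40 shuffled points for training, the remaining 20 for testing.
idx = rng.permutation(N)
train, test = idx[:40], idx[40:]

def mse(degree, fit_idx, eval_idx):
    """Fit a polynomial on fit_idx and return mean squared error on eval_idx."""
    coefs = np.polyfit(x[fit_idx], y[fit_idx], degree)
    return np.mean((np.polyval(coefs, x[eval_idx]) - y[eval_idx]) ** 2)

# Columns: degree, training error, test error.
for degree in (1, 6):
    print(degree, mse(degree, train, train), mse(degree, train, test))
```

The degree-6 polynomial cannot fit the training subset worse than the straight line, since it contains the line as a special case; the test error is what shows whether that extra flexibility generalizes.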

Answer 28

Bayesian Information Criterion. Once you have fitted a model with the stats library, it is enough to read the attribute from the fit results: results.bic. Lower BIC = better model.

Answer 29

they are metrics of model comparison that penalize model complexity (having more parameters)
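
To make the complexity penalty concrete, BIC for a least-squares fit with Gaussian errors can be written as -2 * log-likelihood + k * log(n). The helper gaussian_bic below is a hypothetical name written for this illustration, not a library function. Conventions differ on whether k also counts the noise variance; here it is assumed to count only the regression coefficients.

```python
import numpy as np

def gaussian_bic(ssr, n, k):
    """BIC = -2 * log-likelihood + k * log(n), assuming Gaussian errors.

    ssr: sum of squared residuals, n: observations, k: fitted coefficients.
    """
    log_likelihood = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    return -2 * log_likelihood + k * np.log(n)

rng = np.random.default_rng(6)
n = 100
x = np.linspace(0, 2, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)   # simulated, truly linear (assumed)

bic_by_degree = {}
for degree in (1, 6):
    coefs = np.polyfit(x, y, degree)
    ssr = np.sum((np.polyval(coefs, x) - y) ** 2)
    bic_by_degree[degree] = gaussian_bic(ssr, n, k=degree + 1)

print(bic_by_degree)  # lower BIC = preferred model
```

The k * log(n) term grows with every extra coefficient, so a more complex model has to reduce the residuals by enough to pay for its added parameters.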

Answer 30

Statistical procedure that resamples a single dataset to create many simulated samples. Each of these simulated samples has its own properties, such as its mean.

Answer 31

x = stats.norm.rvs(0, 1, n)
lower = np.mean(x) - 1.96 * np.std(x, ddof=1) / np.sqrt(n)

Answer 32

x = stats.norm.rvs(0, 1, n)
upper = np.mean(x) + 1.96 * np.std(x, ddof=1) / np.sqrt(n)
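
Both bounds can be computed together from the standard error of the mean, which is the sample SD divided by the square root of n. A runnable sketch, where the sample size and the seed are assumptions:

```python
import numpy as np
from scipy import stats

n = 200  # sample size (assumed)
x = stats.norm.rvs(0, 1, n, random_state=np.random.default_rng(7))

# Standard error of the mean: sample SD divided by sqrt(n).
sem = np.std(x, ddof=1) / np.sqrt(n)

# 1.96 is the two-sided 95% critical value of the standard normal.
lower = np.mean(x) - 1.96 * sem
upper = np.mean(x) + 1.96 * sem
print(lower, upper)
```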

Answer 33

1) Draw random samples with replacement from the original sample multiple times.
2) For each re-sample, calculate the mean and store it in a means array.
3) The bootstrap confidence interval is derived by sorting the bootstrap sample means and selecting the 2.5th and 97.5th percentiles as the lower and upper bounds of the CI.
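
The three steps above can be sketched in NumPy as follows. The original sample is simulated from an exponential distribution purely for illustration; the sample size, the number of re-samples, and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
sample = rng.exponential(scale=2.0, size=100)   # original sample (simulated, assumed)
n_boot = 5000

# 1) Resample with replacement; each row is one bootstrap re-sample.
resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)

# 2) Mean of each re-sample.
means = resamples.mean(axis=1)

# 3) The 2.5th and 97.5th percentiles bound the 95% bootstrap CI.
lower, upper = np.percentile(means, [2.5, 97.5])
print(lower, upper)
```

np.percentile handles the sorting internally, so the means array does not need to be sorted by hand.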

Answer 34

Bootstrapping is a non-parametric method (you don't draw from a normal distribution): it does not assume a particular form for the underlying distribution. It relies on resampling the data multiple times to approximate the sampling distribution of the mean.

(59 cards)