Statistics Flashcards
Inferential statistics
Allows generalisations to be made about a population from a representative sample
What are common problems in biological data
- small sample size
- unequal sample size
- correlation within data (repeated measurements from the same subject over time, or simultaneous measurements from different brain regions, will be correlated)
- unequal variance (heterogeneity)
- non-normal (skewed) distribution
Discrete variables
variables whose values are finite, or countably infinite, within a range
- eg: ‘pain relief’ vs ‘no pain relief’ or subjective rating scales
these values do not carry the same mathematical meaning as continuous variables (thus means have ‘less’ meaning)
continuous variables
variables whose values exist on an infinite continuum/are uncountable
- e.g. frequency, temperature, amplitude, enzyme concentration, receptor density
Binary variables
Yes or No outcomes
Nominal variables
Represents groups with no ‘rank’ or ‘order’ within them
- eg: species, colour, brands
Ordinal variables
Groups that are ranked in a specific order
- eg: Likert scales
Parametric statistics
Assume that the data follows a normal distribution, and that there is equal variance within each group (homogeneity of variance)
Nonparametric statistics
used when the data does not follow a normal/known distribution
- tend to be less statistically powerful
Parametric test used for 2 unpaired groups
Unpaired t-test
Non-parametric test used for 2 unpaired groups
Mann-Whitney U test
Parametric test used for 2 paired groups
Paired t-test
Non-parametric test used for 2 paired groups
Wilcoxon test
Parametric test used for ≥3 unmatched groups
1 way ANOVA
Non-parametric test used for ≥3 unmatched groups
Kruskal-Wallis test
Parametric test used for ≥3 matched groups
Repeated measures ANOVA
Non-parametric test used for ≥3 matched groups
Friedman test
Parametric test used to determine association between two variables
Pearson correlation
Non-parametric test used to determine association between two variables
Spearman correlation
Parametric test used to predict a value for one variable from other(s)
Simple linear/non-linear regression
Multiple linear/non-linear regression
Non-parametric test used to predict a value for one variable from other(s)
Non-parametric regression
Central limit theorem
As the sample size increases, the distribution of sample means approaches a normal distribution, regardless of the shape of the underlying population distribution
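A minimal sketch of the theorem in pure Python (all numbers illustrative): means of samples drawn from a deliberately skewed (exponential) population tighten and normalise as the sample size grows.

```python
import random
import statistics

random.seed(0)

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n drawn from a skewed
    (exponential, mean = 1) population."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

# The spread of the sample means shrinks roughly as 1/sqrt(n), and their
# distribution becomes increasingly normal as n grows.
sd_small = statistics.stdev(sample_means(5))    # roughly 1/sqrt(5)
sd_large = statistics.stdev(sample_means(50))   # roughly 1/sqrt(50)
print(sd_small, sd_large)
```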
The null hypothesis
Assumes that there is no difference between groups
Power
(1 − β)
the probability of correctly rejecting a false null hypothesis
- increasing sample size results in decreased variability, and thus greater power
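The sample-size/power relationship can be sketched by simulation (a one-sample z-test with known SD, chosen so no t critical values are needed; effect size and n values are made up):

```python
import random
import statistics

random.seed(1)

def power_sim(n, effect=0.5, z_crit=1.96, reps=2000):
    """Empirical power of a one-sample z-test (population SD = 1) when the
    true mean is shifted by `effect`; z_crit = 1.96 is two-sided alpha = 0.05."""
    rejections = 0
    for _ in range(reps):
        m = statistics.fmean(random.gauss(effect, 1.0) for _ in range(n))
        z = m / (1.0 / n ** 0.5)          # observed mean / SE of the mean
        if abs(z) > z_crit:
            rejections += 1               # false null correctly rejected
    return rejections / reps

# Larger n -> smaller SE -> greater power at the same effect size
print(power_sim(10), power_sim(40))
```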
α (alpha)
Type 1 error rate; rate of incorrect rejection of null hypothesis
**this is equal to the significance level (which is typically 0.05, meaning 5% probability of falsely rejecting the null hypothesis)
β (beta)
type II error rate; rate of incorrectly failing to reject a false null hypothesis
What metrics cannot be altered to increase power?
- variability, as it is fixed depending on type of data
- type I error
T-test
Ratio of the difference between two groups in relation to a measure of variability (standard error)
Examples of the two types of t-test
- non-paired: comparing cannabis treatment to placebo treatment in different groups
- paired: comparing cannabis treatment to saline treatment in the same group
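The unpaired case can be sketched in pure Python (all group values hypothetical): the t statistic is the mean difference divided by the pooled standard error.

```python
import statistics

def unpaired_t(a, b):
    """t = (difference of means) / (standard error of that difference),
    using a pooled variance (assumes equal variance in both groups)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

# Hypothetical pain scores: cannabis-treated vs placebo group
t = unpaired_t([4.1, 3.8, 5.0, 4.4], [6.2, 5.9, 6.8, 6.1])
print(t)  # large |t| relative to its df suggests a real group difference
```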
ANOVA
Analysis of variance
Used to determine whether ≥3 means are significantly different
Takes into account variance both between (treatment variance) and within (error variance) groups
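A minimal sketch of the F ratio (treatment variance over error variance) with made-up data:

```python
import statistics

def one_way_f(groups):
    """F = between-group (treatment) variance / within-group (error) variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical responses under three treatments; the third group clearly differs
f = one_way_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f)  # → 21.0
```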
1 factor ANOVA
Used when examining different treatments on different groups
Repeated measures ANOVA
Used when investigating different treatments on the same group
Multilevel ANOVA
Used when investigating ≥2 independent variables and the interactions between them
- provides a separate F value for each independent variable and interaction
Multiple comparisons
Using multiple t-tests is advised against, as it inflates the type I error rate
- a Bonferroni correction counteracts this
Hence why ANOVAs are preferred for ≥3 groups
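A quick numeric sketch of why this matters (the number of comparisons is illustrative): the family-wise error rate grows with the number of tests, and Bonferroni simply divides alpha by that number.

```python
# With m comparisons each at nominal alpha = 0.05, the chance of at least
# one false positive is 1 - (1 - alpha)^m; Bonferroni divides alpha by m.
alpha, m = 0.05, 6                       # e.g. all pairwise tests among 4 groups
family_error = 1 - (1 - alpha) ** m      # inflated type I error without correction
bonferroni_alpha = alpha / m             # stricter per-test threshold
print(round(family_error, 3), round(bonferroni_alpha, 4))  # → 0.265 0.0083
```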
Post-hoc tests
aka multiple comparisons tests
Used after completing an ANOVA to determine which groups are significantly different
- Dunnett test
- Tukey-Kramer test
Pseudo-replication
occurs when the number of measured values or data points exceeds the number of genuine replicates
- eg: confusing # slices with # animals
Leads to an inflation of sample size, thus artificial inflation of power
Linear mixed model analysis (when used + assumptions)
A statistical method used when data is not independent, and errors are correlated
Assumptions:
- does not assume independence of data
- does not assume balanced design
- does not assume homogenous variance
- assumes random sampling
- covariance structure must be specified
Covariance
Indicates the relationship between two variables
- direction of the relationship is given by the sign (valence) of the covariance
- magnitude is scale-dependent, so it does not quantify the strength (gradient) of the association
Correlation (R)
Measures the degree of association between two variables
- not sensitive to scale
- quantifies strength of correlation
Defined as a number r, where −1 ≤ r ≤ 1
- r > 0: positive correlation
- r = 0: no correlation
- r < 0: negative correlation
** closer to |1| = stronger correlation
Calculated from covariance of (x,y) with respect to individual variances of x,y
- pearsons (parametric)
- spearmans (non-parametric)
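The calculation above can be sketched in pure Python (example values made up): r is the covariance standardised by the two standard deviations, which is what makes it scale-insensitive.

```python
import statistics

def pearson_r(x, y):
    """r = cov(x, y) / (SD_x * SD_y): covariance standardised by the
    individual spreads of x and y, so -1 <= r <= 1 regardless of scale."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Perfect positive correlation; rescaling y does not change r
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # → 1.0
print(pearson_r([1, 2, 3, 4], [200, 400, 600, 800]))  # → 1.0
```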
R^2
The coefficient of determination:
- A metric of correlation that allows comparison of two correlations
= (variance around mean − variance around line) / variance around mean
= 1 − RSS/TSS
R^2 should be ≥ 0.80
eg: R^2 = 0.80
= the relationship between two variables accounts for 80% of the variation
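A minimal sketch of R² = 1 − RSS/TSS, fitting a least-squares line to made-up data:

```python
import statistics

def r_squared(x, y):
    """Fit y = a + b*x by least squares, then R^2 = 1 - RSS/TSS."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    tss = sum((yi - my) ** 2 for yi in y)                        # total
    return 1 - rss / tss

# Nearly linear data: the line accounts for almost all of the variation
r2 = r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(r2)
```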
Regression analyses
Statistical method that allows examination of the relationship between 2+ variables of interest through the generation of a line of best fit
- linear or non-linear
- t-tests/ANOVA can be used to determine significance of regression
Sum of squares
Total sum of squares (TSS) = variation of data about the mean
Residual sum of squares (RSS) = variation not explained by the regression line
sum of squared regression (SSR) = variance explained by regression
Simple regression
A statistical method that allows examination of the relationship between two variables of interest
- Calculate residual sum of squares
- Smaller RSS indicates a better fit
- used for all standard curves
T-test for regression
The regression co-efficient (slope) / standard error of slope co-efficient
= b/SE(b)
- can also be expressed as a confidence interval
= b ± t(α)·SE(b); typically set at 95%, meaning we are 95% confident that the interval contains the true slope
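The ratio b/SE(b) can be sketched in pure Python (same hypothetical data style as above; SE(b) = sqrt((RSS/(n−2))/Σ(x−x̄)²)):

```python
import statistics

def slope_t(x, y):
    """t = b / SE(b) for the least-squares slope, where
    SE(b) = sqrt( (RSS / (n - 2)) / sum((x - mean_x)^2) )."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = ((rss / (n - 2)) / sxx) ** 0.5
    return b / se_b

# Tight linear data gives a slope many SEs away from zero
t = slope_t([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(t)
```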
ANOVA for regression
Determines whether the amount of variation accounted for by the regression line (SSR) is greater than the variation NOT explained (RSS)
- signal > noise
Assumptions for t-test/ANOVA for regression
- residuals are normally distributed
- constant variance (SD) of residuals
- independent samples
If these are not fulfilled type I error increases
Non-linear regression
A statistical method that uses calculus and matrix algebra to determine the line of best fit for a non-linear relationship
- requires initial estimated parameters (mean, SD)
- can be used to interpolate values
- useful for obtaining Bmax, Ka, EC50 etc…
Linearising transform
Data can be transformed so that it fits the assumptions for linear regression
eg:
- scatchard plots for binding data
- Lineweaver-Burk plots for enzyme kinetics
- logarithmic plots for kinetic data
THE TRANSFORM DISTORTS THE ERROR
- violates the regression assumptions of normally distributed errors and approximately equal SE at each x value
Issues with Scatchard plots
X (bound drug) is often used to calculate Y (bound/free)
(i.e. the independent variable is part of the dependent variable)
- results in inaccurate Y values
- violates assumptions of linear regression (normal distribution and homoscedasticity; equal variance of errors)
Multivariate statistics
Statistical analyses used when there are multiple dependent and/or independent variables
- used commonly in clinical neuropharmacology
- becoming more common in genomics and proteomics
Multiple linear regression
An equation composed of multiple regression coefficients for different independent variables (x1,x2) but with a single dependent variable (y)
y = b1x1 + b2x2 +… + c
requires adjusted R^2 to take into account multiple variables as a function of sample size
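The adjustment can be sketched directly from the standard formula (the R², n, and p values below are made up): the same raw R² looks worse once more predictors have been 'spent'.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 penalises extra predictors:
    1 - (1 - R^2) * (n - 1) / (n - p - 1), with n samples and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Identical raw fit quality, increasingly penalised as predictors are added
print(adjusted_r2(0.90, n=20, p=2))
print(adjusted_r2(0.90, n=20, p=8))
```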
Multi-collinearity
Occurs when regression variables are highly correlated, resulting in inflated variance estimates through the sums of squares
- inaccurate coefficients
- can lead to a significant F value but no significant differences between any specific groups
The highly correlated variable should be removed as they are REDUNDANT
Principal component analysis
- identifies the most important features (principal components) that contribute to variation
- Plots these variables in order of importance according to ‘eigenvalue’
- the second PC is always perpendicular (orthogonal) to the first
Discriminant analysis
A statistical method that helps you to identify the most important variables that distinguish the different groups in data
- Principal component analysis
- Factor analysis
Factor analysis
Used to simplify complex data by identifying common factors that explain the relationships between dependent variables
Random forest classification
A machine learning method that utilises multiple ‘decision trees’ and finds the average to give a final result
- can be used to determine how good an independent variable is at predicting dependent
- error plateaus after ~100 trees
Eigenvalues
‘components’ or ‘factors’ (mathematically known as ‘roots’) that explain most of the variation in the data
- In an analogous way to ANOVA, these eigenvalues represent the major sources of variation in the covariance matrix
Cluster analysis
An exploratory technique often used on very large data sets to show variables that typically vary together, i.e. have a relationship
- Results often shown using a ‘dendrogram’
- often requires a ‘z’ transform
- Different algorithms can be used to determine clusters
Canonical correlation analysis
Used to identify and measure the associations between two sets of variables. Appropriate in the same situations as multiple regression, but where there are multiple intercorrelated outcome variables.
Non-parametric multivariate analysis
- few assumptions about data
eg: random forest classification/regression, PCA, and cluster analysis
Scree plot
Way of interpreting data from PCA
- plots each principal component in order based on amount of variation that it explains (eigenvalue)
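For two variables, the eigenvalues a scree plot ranks can be computed by hand from the 2×2 covariance matrix (closed-form sketch in pure Python; data made up): PC1's eigenvalue is the variance explained along the first principal component.

```python
import statistics

def cov2_eigenvalues(x, y):
    """Eigenvalues (largest first) of the 2x2 covariance matrix of (x, y),
    via the closed form: trace/2 +/- sqrt((trace/2)^2 - determinant)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    tr, det = vx + vy, vx * vy - cxy ** 2
    root = ((tr / 2) ** 2 - det) ** 0.5
    return tr / 2 + root, tr / 2 - root

# Strongly correlated data: PC1 dominates the scree plot, PC2 is near zero
l1, l2 = cov2_eigenvalues([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2])
print(l1, l2)
```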