Lecture 1 - Basics Flashcards
What is a parameter?
an attribute of a population, or relationship between populations or variables
What is a population?
The complete set of items about which you want to draw inferences
What is a sample?
a random subset of the population
exploratory vs confirmatory
Exploratory is generating hypotheses, confirmatory is testing hypotheses
how to calc SE
SD divided by the square root of the sample size (SE = SD / √n)
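A minimal R sketch (x here is a made-up sample):
x <- c(4.2, 5.1, 6.3, 5.8, 4.9)    # example data
se <- sd(x) / sqrt(length(x))      # SE = SD / sqrt(n)
se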
type 1 error
false rejection of H0
type 2 error
false acceptance of H0
internal validity
extent to which you believe results of the study
external validity
extent to which results apply to the real world
observational vs experimental
In an experiment, only one factor varies between conditions, so if there's a difference you can be confident it is causal; in an observational study other factors can covary with the one of interest
what is the mean? for norm dist
value that minimises the sum of SQUARED deviations of data values from it
The average (i.e. sum(x)/N)
The central value
The most likely value to get if you were to sample from the population
Centre of ‘mass’
what is the median?
value that minimises the sum of absolute UNSQUARED deviations of data values from it
which test if data is categorical?
chi squared
which test if data is ranked or ordinal?
non-parametric:
• Mann-Whitney U (or Wilcoxon-Mann-Whitney) – the non-parametric equivalent of the t-test; compares the medians of 2 independent groups (groups must be completely independent of each other)
• Wilcoxon one-sample test
• Wilcoxon signed-rank test – same as a paired t-test; it compares two related samples, matched samples, or repeated measurements on a single sample
• Kruskal-Wallis (one-way ANOVA with independent measures) – extends Mann-Whitney U to >2 samples; it compares two or more independent samples of equal or different sample sizes
• Friedman test (one-way ANOVA with repeated measures)
• Spearman correlation (non-parametric version of Pearson correlation)
what test to use for interval or ratio data?
Parametric tests: t-test, ANOVA, regression, etc.
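An illustrative R sketch of the test-by-data-type cards above (d is a made-up data frame):
set.seed(1)
d <- data.frame(group = rep(c("A", "B"), each = 10),
                x = runif(20),
                y = c(rnorm(10, 5), rnorm(10, 6)))
chisq.test(table(d$group, d$y > 5.5))   # categorical counts: chi-squared test
t.test(y ~ group, data = d)             # interval/ratio: two-sample t-test
summary(aov(y ~ group, data = d))       # interval/ratio: one-way ANOVA
summary(lm(y ~ x, data = d))            # interval/ratio: regression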
what is an outlier? and who made this definition
an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism - Hawkins
real life examples of when we look for outliers
Fraud detection & credit card theft- Unusual spending patterns
Medical diagnosis - Problems suggested by test results that don’t fit normal pattern for age/sex/history
Detecting drug cheats in sport - Abnormally high/low blood steroids, etc.
Detecting measurement errors or unusual events
3 ways to detect outliers
graphs used to detect them
boxplot: how far do points need to be to show as outliers?
Analyse sample - data points that are ‘far from’ the rest are outliers
Fit model to data – outliers are points that don’t fit that model
Either approach can be statistical or purely graphical (‘by eye’)
Start with graphical methods
Can use a histogram, scatterplot, normal probability plot (or QQ plot), or boxplot. In a boxplot, points higher than one-and-a-half inter-quartile ranges above the upper quartile, or lower than one-and-a-half inter-quartile ranges below the lower quartile, are plotted as circles and so identified as possible outliers.
what is the R package for outliers? and what could be a drawback?
Dixon test; drawback: it only works with small sample sizes (< 30)
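A hedged R sketch of the graphical checks and the Dixon test; assuming the package meant is 'outliers' (which provides dixon.test), with made-up data:
set.seed(1)
x <- c(rnorm(20), 8)           # sample with one extreme value
hist(x)                        # histogram
boxplot(x)                     # circles = points beyond 1.5 x IQR from the quartiles
qqnorm(x); qqline(x)           # normal probability (QQ) plot
# install.packages("outliers")
library(outliers)
dixon.test(x)                  # only valid for small samples (n < 30)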
which two tests are robust to even quite large violations of normality and homogeneity of variance and when do they become less so?
ANOVA and t-test - One-way ANOVA only starts to give odd results if largest variance is >9x smallest
what is homogeneity of variance? what test is it used in and when can this not be violated?
The assumption of homogeneity of variance is that the variance within each of the populations is equal. This is an assumption of analysis of variance (ANOVA). ANOVA works well even when this assumption is violated except in the case where there are unequal numbers of subjects in the various groups.
what are ‘robust’ statistical methods? (there are 4 answers)
• Non-parametric tests – simple and accepted, but the range of tests is limited
• Parametric tests on ranked data
• Permutation tests – flexible, but require a decent sample size per group (see lecture on Monte Carlo methods)
• Parametric methods that work on the ‘middle’ of the data
non-parametric equivalents of the following: one-sample t-test, two-sample t-test, paired t-test, one-way ANOVA, one-way repeated-measures ANOVA, correlation
one-sample t-test = Wilcoxon one-sample
two-sample t-test = Mann-Whitney U
paired t-test = Wilcoxon matched-pairs signed-rank
one-way ANOVA = Kruskal-Wallis
one-way repeated-measures ANOVA = Friedman test
correlation = Spearman
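The same mapping as R calls (made-up data, just to show the functions):
set.seed(2)
y1 <- rnorm(12); y2 <- rnorm(12, 0.5)
wilcox.test(y1, mu = 0)                   # Wilcoxon one-sample
wilcox.test(y1, y2)                       # Mann-Whitney U
wilcox.test(y1, y2, paired = TRUE)        # Wilcoxon matched-pairs signed-rank
kruskal.test(list(y1, y2, rnorm(12)))     # Kruskal-Wallis
friedman.test(cbind(y1, y2, rnorm(12)))   # Friedman (rows = subjects, columns = conditions)
cor.test(y1, y2, method = "spearman")     # Spearman correlation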
when residuals are normally distributed (and so parametrics are OK) the non-parametric equivalents are … less powerful.
5%
When residuals are non-normal, what has more power? non para or para?
non parametric
what are false positives?
type 1 error
pros and cons of parametric tests on ranked data
Doing parametric stats on rank-transformed data seems to give you the same Type I error rate and power as the custom non-parametric test
… and has the bonus of more possible tests being available (i.e. any parametric test)
But, you can’t interpret the exact form of a relationship sensibly, so beware
limitation of reg on ranked data
Regression of rank(Y) on rank(X) tells you they are related, but not the shape of curve
robust regression reduces the influence of outliers
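A sketch of both points, using lm() on ranks and rlm() from MASS as one common robust-regression option (data made up):
set.seed(4)
x <- runif(30, 1, 10)
y <- x^3 + rnorm(30, sd = 20)          # curved but monotonic relationship
summary(lm(rank(y) ~ rank(x)))         # shows they are related, but not the cubic shape
library(MASS)                          # ships with R
summary(rlm(y ~ x))                    # M-estimation down-weights outlying points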
why are missing values bad?
Lead to unbalanced design and so…
Loss of power (a shame but not tragic)
types of missing values
Missing Completely At Random - good
Missing At Random = Unobserved data are not random but follow the same pattern as the observed values
Not Missing At Random = Unobserved values are different from the observed ones – a bigger problem.
when are missing values okay?
when they are missing completely at random, or they are not random but follow the same pattern as the observed values.
when are missing values bad?
if not missing at random and if the unobserved values are different from the observed ones
what to do if missing at random?
ignore NA’s, unless repeated measures..
Step 1: investigate missing values – where are they?
Find whether the ‘NA’s are randomly distributed with respect to the predictors and responses
can do Loglinear model on missing/non-missing to see if they are missing at random!
Or can replace with values that don’t bias subsequent analyses, such as the most common value for that variable if it is normally distributed
BUT if the data are skewed, use the median!
Or use other variables that are correlated, OR use the 10 nearest neighbours
what does the DMwR package use as the ‘most central value’ when imputing?
median for numeric variables
the mode for categorical variables
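A hedged sketch, assuming the function meant is centralImputation() (DMwR2 provides the same helpers; knnImputation() is the nearest-neighbour option); the data frame is made up:
# install.packages("DMwR2")
library(DMwR2)
d <- data.frame(x = c(1, 2, NA, 4, 5),
                g = factor(c("a", "a", "b", NA, "b")))
centralImputation(d)        # NA -> median for numeric columns, mode for categorical columns
# knnImputation(d, k = 10)  # alternative: fill NAs from the 10 nearest neighbours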
survival analysis examples
-Parametric – specify a distribution, e.g. exponential, Weibull
-Cox regression – ‘semi-parametric’; assumes survival curves have the same shape, but different rates
-Non-parametric – can only test one factor
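One possible R sketch using the survival package (ships with R) and its built-in lung data:
library(survival)
fit_par <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")  # parametric
fit_cox <- coxph(Surv(time, status) ~ sex, data = lung)                      # semi-parametric Cox
survdiff(Surv(time, status) ~ sex, data = lung)                              # non-parametric log-rank test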
what to do with outliers?
delete if known cause.
Use robust statistics to reduce their influence
Or replace with ‘non-influential’ values
what is censoring?
Data points where the actual value isn’t known but you can set boundaries on what it must have been
Censoring is a special case of ‘partial information’ (not missing) that can be dealt with by survival analysis
what is the basis of a statistical model?
Pattern observed = signal + noise
the mean is a fixed signal, as is a slope, as is the difference between two means
Random normal variation – the ‘noise’
what are we modelling?
Not the data, but the population from which our sample came
overfitting – how can it come about?
More parameters can mean a better fit, but there is a risk of overfitting: you can just end up redescribing the data
Can all population parameters be estimated accurately with a large sample size?
no, better to start with the mean - can estimate the population mean from your sample mean and the likely range of values around this from your sample standard deviation
why is SD biased?
Because the values in a sample are more likely to come from near the mean
So the sample is unlikely to have as many extreme values as the parent population
So the variance (and stdev) of the sample are lower than that of the population
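This is why the sample variance uses an n − 1 denominator (Bessel’s correction); a quick R check with made-up data:
set.seed(5)
x <- rnorm(50)
sum((x - mean(x))^2) / length(x)    # divide by n: biased low for a sample
var(x)                              # divide by n - 1: R's (corrected) default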
how does a t test calculate its value?
the sample mean measured in “standard errors”: divide the sample mean (minus the hypothesised mean) by the standard error
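A one-sample sketch in R (made-up data, testing against a hypothesised mean of 0):
set.seed(6)
x <- rnorm(15, mean = 0.4)
(mean(x) - 0) / (sd(x) / sqrt(length(x)))   # (sample mean - H0 mean) / SE
t.test(x, mu = 0)$statistic                 # t.test() gives the same value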
what is power?
what do you need to know to calc it?
The probability of rejecting the null hypothesis when it is false, i.e. a correct rejection of H0 (power = 1 - the probability of a Type II error)
To calculate it you need the effect size and the expected SD
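In R, power.t.test() does this calculation; fix the effect size (delta), SD, and alpha, then solve for n or for power:
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)   # n per group needed
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)        # power for n = 20 per group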
internal validity
Internal validity is the extent to which you believe the results of the study
Factors that increase internal validity:
• Homogeneous sample (e.g. single strain, single sex, single genotype)
• Homogeneous conditions (constant temperature, humidity, lighting regime, diet)
• Single experimenter administering treatments
Is it ever justified to raise the threshold p-value?
Yes, if a Type II error is far worse than a Type I error
problems of multiple testing
With multiple tests (e.g. 10 tests at p < 0.05), the chance of at least one test being ‘significant’ is around 40% (1 − 0.95^10 ≈ 0.40) EVEN WHEN THE NULL HYPOTHESIS IS TRUE!
solutions to avoid multiple testing
Fix primary and secondary dependent variables a priori
Control experiment-wise (family-wise) alpha and adjust test-specific alpha accordingly
Use multivariate methods
Data reduction
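One way to control family-wise alpha in R is p.adjust() on the raw p-values (values below are made up):
p <- c(0.003, 0.012, 0.04, 0.21, 0.60)
p.adjust(p, method = "bonferroni")   # simple family-wise control
p.adjust(p, method = "holm")         # family-wise control, less conservative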
pseudoreplication?
Improper inflation of sample size due to non-independence of data
objects in R include..
Single variables
Arrays and matrices
Structures (containing, e.g., different types of variables: text or numeric)
sep “”
sep ‘\t’
sep ‘\n’
sep “” = space delimited data
sep ‘\t’ = tab-delimited data
sep ‘\n’ = new line delimited data
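As read into R (file names are placeholders):
d1 <- read.table("data.txt", sep = "", header = TRUE)     # space/whitespace-delimited
d2 <- read.table("data.tsv", sep = "\t", header = TRUE)   # tab-delimited
lines <- scan("data.txt", what = "", sep = "\n")           # one entry per line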
correlation vs regression?
Correlation requires both variables to be normally distributed; for regression, the residuals around the line relating y to x must be normally distributed – x need not be normal (nor even y)
Ordinary Least Squares
we minimise the sum of the squared residuals (+ve and −ve differences have the same effect)
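A small lm() sketch tying the two cards together (made-up data); lm() fits by ordinary least squares, and the residuals are what must look normal:
set.seed(3)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
fit <- lm(y ~ x)
sum(residuals(fit)^2)                            # the quantity OLS minimises
qqnorm(residuals(fit)); qqline(residuals(fit))   # check residual normality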
MA vs RMA (diffs in variance)
Major Axis (MA) regression – if the variances are similar
Reduced Major Axis (RMA) regression – if the variances are unequal