Definitions Flashcards

1
Q

Statistical Inference

A

The process of drawing conclusions about the probability distribution function associated to one or more variables on a population from information obtained on a sample.

2
Q

Population

A

A set about which we wish to draw conclusions.

3
Q

Variable

A

A variable defined on a population is some characteristic of the elements of the population.

4
Q

Census

A

A study where the variables in question are measured for every member of a population.

5
Q

Survey

A

A study where the variables in question are measured on an SRS of the population.

6
Q

SRS

A

Sample taken with replacement.

7
Q

Subset

A

A sample taken without replacement.

8
Q

Levels of a variable

A

The possible outcomes you consider for a variable.

9
Q

Simple Random Sample (SRS)

A

An SRS of size N of a population is a vector of length N consisting of elements of the population, where every element of the population has an equal chance of being chosen for each entry of the vector.
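
A minimal sketch in Python of the distinction these cards draw, assuming the deck's convention that an SRS is drawn with replacement and a subset without. The function names are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Vector of length n; every element has an equal chance at each entry (with replacement)."""
    return rng.choices(population, k=n)

def random_subset(population, n, rng=random):
    """n distinct picks from the population (without replacement)."""
    return rng.sample(population, n)

rng = random.Random(0)            # seeded so the draw is repeatable
population = list(range(100))     # hypothetical population of 100 labelled elements
srs = simple_random_sample(population, 10, rng)
subset = random_subset(population, 10, rng)
```

An SRS drawn this way may contain the same element more than once; the subset never does.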

10
Q

Observational study

A

The collection and analysis of data with the goal in mind of determining the characteristics of a population.

11
Q

Experiment

A

Occurs when a researcher is able to control which members of a sample receive one or more interventions or treatments (experimental group) and which do not (control group) or which receive some other comparison treatment (comparison group).

12
Q

True experiment

A

An experiment where participants are randomly allocated to groups.

13
Q

Quasi experiment

A

An experiment where participants are allocated to groups through some non-random process.

14
Q

Response/dependent variable

A

The variable whose values are to be predicted from other values.

15
Q

Predictor/independent variables

A

Variables whose values are used to predict values of another variable.

16
Q

Lurking/confounding variables

A

Variables that are not measured in an observational study, but which influence both the predictor and response variables.

17
Q

Nuisance variables/covariates

A

A variable that is recorded in a study because it may affect the response, but is not one of the primary variables of interest.

18
Q

Factors

A

The nuisance and predictor variables in a study or experiment.

19
Q

Probability Density Function (PDF)

A

A function that describes the likelihood of a random variable taking a given value.

20
Q

Statistic

A

Any quantity that may be calculated from the values of a set of random variables on a random sample of a population.

21
Q

Estimator

A

A statistic on a sample which is often taken to estimate some function of the parameters in a model for the random variables on the population.

22
Q

Sampling distribution

A

The probability density function associated to a statistic calculated on a sample of size n from a population.

23
Q

Most likely value

A

The most likely value of a statistic under a given null hypothesis is the value at which the sampling distribution for that statistic under the null hypothesis attains its maximum.

24
Q

Region of acceptance

A

Given a significance level, the region of acceptance for the null hypothesis is the interval of possible values for the statistic on a given sample that will not lead you to reject the null hypothesis.

25
Q

P-value

A

Tells you how likely a result as extreme as, or more extreme than, the one obtained from a given study or experiment is to have occurred purely by chance if the null hypothesis is correct.
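
As a hedged illustration, the sketch below computes an exact one-sided p-value for a hypothetical coin-flip study. The null hypothesis (a fair coin, p0 = 0.5), the reading of "more extreme" as "more heads", and the function name are all assumptions made for the example.

```python
from math import comb

def binomial_p_value(n, k, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0): chance of k or more heads if the null is correct."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical study: 8 heads observed in 10 flips
p = binomial_p_value(10, 8)  # equals 56/1024
```

With 8 heads in 10 flips the p-value is 56/1024, about 0.055, so the result would not be rejected at a 5% significance level in this one-sided setup.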

26
Q

Categorical variable

A

A random variable whose possible values cannot be put in any meaningful order.

27
Q

Quantitative variable

A

Any random variable whose values can be put in a meaningful order.

28
Q

Ordinal variable

A

Variables that have word labels and can be put into order.

29
Q

Model

A

A model for a random variable is a choice of a standard form that we know or assume the probability density function associated to the variable has.

30
Q

Bernoulli trial

A

A random variable, X, with two possible outcomes and a single parameter, representing p(X=1).

31
Q

Normal random variable

A

A random variable whose PDF is a normal distribution.

32
Q

Exploratory data analysis

A

A set of techniques involving summary statistics and graphical methods for exploring data before you do formal inference.

33
Q

Kth q-quantile

A

The kth q-quantile of a set of data or a distribution is the number below which a fraction k/q of the distribution lies.

34
Q

Q-Q plot

A

A Q-Q plot for a data set with n points against a model distribution is the plot of (x, y) values where the kth y-value is the kth smallest datapoint in the set, and the kth x-value is the kth n-quantile of the model distribution.

35
Q

Standard normal (Z)

A

A normal with mean = 0 and sd = 1.

36
Q

5 number summary

A

{lowest datapoint, lower quartile, median, upper quartile, highest datapoint}
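
A short sketch of the summary using Python's standard library. Quartile conventions differ between textbooks and packages; the "inclusive" method used here is an assumption and may not match the convention of the course, and the function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    # n=4 asks for the three cut points that split the data into quarters
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```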

37
Q

Robust

A

A statistic is robust if it is not strongly influenced by changes to only a few data points. The mean and standard deviation are not robust: they are strongly influenced by such changes.

38
Q

Interaction plot

A

Used when interested in studying the effect of two categorical predictor variables on a single response variable.

39
Q

Effect size

A

The difference between the actual value of the parameter, on the population, and the value of the parameter under the null hypothesis.

40
Q

Confidence interval

A

An X% confidence interval for a parameter theta is an interval (L,U) generated by some procedure that in repeated sampling has an X% probability of containing the true value of theta for all possible values of theta.

41
Q

Confidence procedure

A

An X% confidence procedure is any procedure that generates intervals containing theta in X% of repeated samples.
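
The repeated-sampling idea can be checked by simulation. This is a sketch under stated assumptions: the data are normal with a known standard deviation sigma, the procedure is the usual z-interval, and the function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Interval (L, U) for the mean of normal data with known sd sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(theta, sigma, n, repeats, seed=0):
    """Fraction of repeated samples whose interval contains the true mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma)
        hits += lower < theta < upper
    return hits / repeats
```

Calling `coverage(10.0, 2.0, 25, 2000)` should give a fraction close to 0.95, whatever true mean is chosen, which is what makes the procedure a 95% confidence procedure.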

42
Q

Unstandardised effect size

A

The difference in means, m1 - m2.

43
Q

Type 1 error

A

Rejecting the null hypothesis when it is true.

44
Q

Type 2 error

A

Not rejecting the null hypothesis when it is false.

45
Q

Smallest relevant effect size

A

The smallest difference from the null hypothesis value of the parameter that we consider to be important.

46
Q

Power

A

The power, 1 - beta, of a statistical test is the probability of rejecting the null when the null is false with some effect size greater than epsilon (the probability of not making a type 2 error when the effect size is large enough to be of interest to us).

47
Q

T-statistic

A

The statistic we get by replacing the population standard deviation by the sample standard deviation in the z-statistic.
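
A minimal sketch for the one-sample case, assuming the usual z-statistic (sample mean - mu0) / (sigma / sqrt(n)), which the card does not spell out. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t: the z-statistic with sigma replaced by the sample sd."""
    n = len(sample)
    s = stdev(sample)                 # sample standard deviation (divides by n - 1)
    return (mean(sample) - mu0) / (s / sqrt(n))

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample with mean 5
t = t_statistic(data, 4)              # compared against a null value of 4
```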

48
Q

Degrees of freedom

A

dF = n-1

49
Q

ANOVA

A

A generalisation of t-tests to comparisons of more than two group means.

50
Q

Full model density

A

mu + alpha(i) + epsilon, where mu is a reference level and alpha(i) represents the deviation of the mean for the ith treatment group from the reference level mu.

51
Q

Reduced model density

A

mu + epsilon

52
Q

SS(R)

A

Residual sum of squares from the reduced model.

53
Q

SS(F)

A

Residual sum of squares from the full model.

54
Q

One-way ANOVA

A

Used when there is one categorical predictor variable and one continuous response variable.

55
Q

F distributions

A

The F statistic is a measure used in ANOVA that follows an F distribution under the null hypothesis; the further above 1 it is, the stronger the evidence against the null.
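
The sketch below ties SS(R), SS(F) and the F statistic together for one-way ANOVA. It assumes the standard formula F = ((SS(R) - SS(F)) / (k - 1)) / (SS(F) / (n - k)) for k groups and n datapoints, which the cards do not state; the function names and data are hypothetical.

```python
from statistics import mean

def residual_ss_reduced(groups):
    """SS(R): squared deviations from the single grand mean (reduced model)."""
    values = [x for group in groups for x in group]
    grand = mean(values)
    return sum((x - grand) ** 2 for x in values)

def residual_ss_full(groups):
    """SS(F): squared deviations from each group's own mean (full model)."""
    total = 0.0
    for group in groups:
        group_mean = mean(group)
        total += sum((x - group_mean) ** 2 for x in group)
    return total

def f_statistic(groups):
    k = len(groups)                       # number of treatment groups
    n = sum(len(group) for group in groups)
    ss_r = residual_ss_reduced(groups)
    ss_f = residual_ss_full(groups)
    return ((ss_r - ss_f) / (k - 1)) / (ss_f / (n - k))

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data, three groups
```

For this data SS(R) is 32, SS(F) is 6 and F is 13, well above 1, because the third group's mean sits far from the other two.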

56
Q

Non-parametric test

A

One that does not make any assumptions about the distribution of residuals.

57
Q

Ranks

A

If you have a list of data from a quantitative or an ordinal variable, you can put it in order. The position of the datapoint in this ordered list is its rank. If several datapoints are equal, then the rank of each one is the average of their positions on the list.
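
The ranking rule, including the averaging of tied positions, can be sketched as follows. Positions are counted from 1, and the function name is hypothetical.

```python
def average_ranks(data):
    """Rank each datapoint; tied values share the average of their positions."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    start = 0
    while start < len(order):
        # Extend end to cover the whole run of equal values
        end = start
        while end + 1 < len(order) and data[order[end + 1]] == data[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2   # average of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

ranks = average_ranks([10, 20, 20, 30])
```

Here the two datapoints equal to 20 occupy positions 2 and 3, so each receives rank 2.5.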

58
Q

Wilcoxon signed-ranks test

A

Non-parametric version of a 1-sample or paired t-test. For a 1-sample test, it tests the null hypothesis H0: the median of the population is m0. For a paired test, it tests the null hypothesis H0: the medians of the two populations satisfy m1 - m2 = m0.

59
Q

Mann-Whitney U test

A

Non-parametric version of the independent samples t-test. It tests the null hypothesis that the probability of an element of the first group being greater than an element of the second group is exactly 0.5.
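
The probability named in this null hypothesis can be estimated from two samples by comparing every pair. This sketch only computes that estimate, not the test itself; counting a tie as half a win is an assumption, and the function name is hypothetical.

```python
def prob_first_greater(group1, group2):
    """Estimate P(element of group1 > element of group2) over all pairs."""
    pairs = len(group1) * len(group2)
    # A strict win counts 1, a tie counts 0.5 (assumed convention)
    wins = sum((x > y) + 0.5 * (x == y) for x in group1 for y in group2)
    return wins / pairs
```

An estimate near 0.5 is what the null hypothesis predicts; values near 0 or 1 indicate that one group tends to sit above the other.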

60
Q

Kruskal-Wallis test

A

Non-parametric version of one-way ANOVA carried out on ranks. H0: for any two groups you consider, the probability that a random element in the first group will yield a greater value of your variable than a random element in the second group is exactly 0.5. Ha: for at least two groups, the probability is different from 0.5.

61
Q

Chi-squared test

A

Compares the expected values in each cell of the table with the observed values. H0: response is independent of condition. H1: response depends upon condition. If there is a big difference between the expected and observed values, we can conclude that it is unlikely that there is no difference in the population.
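
A sketch of the comparison, assuming the standard expected count under independence (row total x column total / grand total) and the usual statistic, the sum over cells of (observed - expected)^2 / expected. Neither formula is given on the card, and the function names and table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts if response is independent of condition."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared_statistic(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[10, 20], [20, 10]]   # hypothetical 2x2 table: condition by response
```

Every expected count in this table is 15, and the statistic is 20/3, about 6.67. A larger statistic means a bigger gap between observed and expected counts.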

62
Q

Publication bias

A

Refers to the idea that the scientific studies which end up getting published are a biased sample of the total population of scientific studies.

63
Q

P-hacking

A

The practice, driven by incentives to find significant p-values, of repeating or adjusting analyses until a significant result appears.