Theory Flashcards

1
Q

What is nominal data?

A

Assigns observations to unordered categories. Examples include hair colour and political party. Being discrete, they can be graphed with a pie or bar chart and summarised with counts or percentages.

2
Q

What is ordinal data?

A

Assigns observations to ordered categories. Examples include level of education and meaningfulness categories. Being discrete, they can be graphed with a pie or bar chart and summarised with counts or percentages.

3
Q

What is interval/ratio data?

A

Assigns scores on a scale with quantitative information, so that the outcomes of calculations are meaningful. A ratio scale also has a true, meaningful zero point. Examples include reaction time (ratio) and IQ scores (interval). They can be graphed with a histogram or boxplot and summarised with a five-number summary or with the mean and standard deviation.

4
Q

What are the special features of mode?

A

It is not sensitive to outliers, making it a resistant measure. There can also be multiple modes in a data set

5
Q

What are the special features of median?

A

It is a resistant measure

6
Q

What are the special features of mean?

A

It is not a resistant measure
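
The difference between a resistant and a non-resistant measure can be seen by adding one extreme value to a small data set. The sketch below uses made-up scores (an assumption for illustration only) and Python's standard `statistics` module: the mean jumps, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical scores, made up for illustration.
scores = [2, 3, 4, 5, 6]
with_outlier = scores + [100]  # the same data plus one extreme value

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # 4 -> 20
print("median:", median_before, "->", median_after)  # 4 -> 4.5
```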

7
Q

What are the special features of range?

A

It is not a resistant measure

8
Q

What are the special features of variance?

A

It is not a resistant measure and it is expressed in the square of the unit of the variable (e.g. cm^2)

9
Q

What are the special features of standard deviation?

A

It is not a resistant measure and is expressed in the same unit as the variable
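
The unit difference between variance and standard deviation can be checked on a small example. The heights below are hypothetical, and the sketch assumes the sample versions of both measures (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
from statistics import stdev, variance

# Hypothetical heights in centimetres, made up for illustration.
heights_cm = [150, 160, 170, 180, 190]

var_cm2 = variance(heights_cm)  # sample variance, expressed in cm^2
sd_cm = stdev(heights_cm)       # sample standard deviation, expressed in cm

print(var_cm2)          # 250
print(round(sd_cm, 2))  # 15.81
```

Taking the square root of the variance brings the measure back to the original unit, which is why the standard deviation is usually easier to interpret.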

10
Q

What is the 1.5 x IQR rule?

A

If a value is higher than Q3 + 1.5 x IQR or lower than Q1 - 1.5 x IQR, where IQR = Q3 - Q1, the value is flagged as an outlier
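
A minimal sketch of the rule, using the fences Q1 - 1.5 x IQR and Q3 + 1.5 x IQR. The function name `iqr_outliers` and the data are hypothetical. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of `statistics.quantiles`, so the quartiles may differ slightly from a hand calculation.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values that fall outside the 1.5 x IQR fences."""
    # Quartiles under the "inclusive" convention (an assumption, see above).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data with one suspicious value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(iqr_outliers(data))  # [50]
```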

11
Q

What are some characteristics of a density curve?

A
  1. The curve is always on or above the horizontal axis
  2. The total area under the curve is equal to 1
  3. The area under the curve and above any range of values is the proportion of all observations that fall in that range
12
Q

When does the empirical rule apply?

A

When the distribution is (approximately) normal
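
The percentages in the empirical rule (roughly 68%, 95% and 99.7% within 1, 2 and 3 standard deviations of the mean) can be recovered from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```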

13
Q

What is the z-score?

A

The number of stdevs that an observation is from the mean
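
As a formula, z = (x - mean) / stdev. A short sketch, assuming an IQ scale with mean 100 and standard deviation 15 (a common convention, used here only as an illustrative assumption):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Assumed scale: mean 100, standard deviation 15.
print(z_score(130, mean=100, sd=15))  # 2.0
print(z_score(85, mean=100, sd=15))   # -1.0
```

A positive z-score means the observation lies above the mean, a negative one that it lies below.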

14
Q

What is a correlation as a measure?

A

It is a measure that indicates how strong a linear relationship is between 2 quantitative variables. It is sensitive to outliers.

15
Q

Why is it always beneficial to create a scatterplot before calculating a correlation?

A

To check whether the association is linear and whether there are outliers present in the data set.

16
Q

What is an independent variable?

A

A variable that explains or causes changes in the response variable

17
Q

What is a dependent variable?

A

A variable for which a prediction is made

18
Q

What is a regression line?

A

A straight line in a scatterplot that is as close as possible to the points in the graph

19
Q

What is the meaning of the slope and intercept in a regression line?

A

Slope is the amount by which y changes when x increases by 1 unit
Intercept is the value of predicted y when x = 0

20
Q

What is a residual and what is its correct notation?

A

The difference between the observed score yi and the predicted score y(hat)i that reflects the “error” you make in the prediction of observation i:

yi - y(hat)i

21
Q

What is the principle of ordinary least squares regression?

A

The best line is the line with the smallest sum of the squared residuals (vertical distances from the points to the regression line):

min∑(yi - y(hat)i)^2
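
For a single predictor, the line that minimises the sum of squared residuals has a closed-form solution: the slope is b = ∑(xi - x̄)(yi - ȳ) / ∑(xi - x̄)^2 and the intercept is a = ȳ - b·x̄. The sketch below applies this to made-up data; the names `fit_line` and `sum_squared_residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Return intercept a and slope b of the least squares line y(hat) = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (yi - y(hat)i)^2 for the line y(hat) = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))                       # 2.2 0.6
print(round(sum_squared_residuals(xs, ys, a, b), 2))  # 2.4
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared residuals on the same data.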

22
Q

What is the meaning of correlation coefficient in relation to the regression line?

A

A measure of the linear association between x and y, which indicates how closely the points in a scatterplot follow a straight line.

23
Q

How can you use R^2 to examine the fit of the regression model?

A

The square of the correlation (R^2) gives the proportion of the variance in y that is explained by x (percentage of explained variance). The higher this percentage, the better the prediction of y
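
For simple linear regression, the squared correlation equals 1 minus the ratio of the residual sum of squares to the total sum of squares of y. The sketch below checks this on made-up data; all variable names are hypothetical.

```python
from math import sqrt

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Correlation coefficient and its square.
r = sxy / sqrt(sxx * syy)

# Least squares line and the variation it leaves unexplained.
b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - sse / syy

print(round(r ** 2, 3), round(r_squared_from_fit, 3))  # 0.6 0.6
```

Here 60% of the variance in y is explained by x.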

24
Q

What is a residual plot and how can you use it to investigate the fit of the regression model?

A

A scatterplot of the residuals, with the value of the independent variable x on the x-axis and the value of the residuals on the y-axis. With a good model fit, the residuals are small and scattered around 0 without a clear pattern; with a perfect model fit, all residuals are equal to 0

25
Q

What is an outlier?

A

It is a value that appears to be unusually large or small, compared to the rest of the data

26
Q

What is an influential observation?

A

An observation that affects the results of the statistical outcomes considerably, with its removal resulting in considerable differences in the outcomes.

27
Q

What is a lurking variable?

A

A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

28
Q

What is a prediction model?

A

The equation of the regression line that is used to predict scores on a dependent variable y based on a score on an independent variable x: y(hat) = a + bx

29
Q

What is Simpson’s paradox?

A

A situation in which a trend or relationship that is observed within several groups disappears, or even reverses, when the groups are combined. The conclusion therefore depends on whether the data are analysed per group or combined.
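
A small numerical illustration, using entirely made-up data: within each of two hypothetical groups the slope of y on x is positive, but when the groups are pooled the slope becomes negative. The helper `slope` is a hypothetical name and computes the least squares slope.

```python
def slope(points):
    """Least squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Two hypothetical groups, made up for illustration.
group_a = [(1, 8), (2, 9), (3, 10)]
group_b = [(6, 1), (7, 2), (8, 3)]

print(slope(group_a))            # positive within group A
print(slope(group_b))            # positive within group B
print(slope(group_a + group_b))  # negative when the groups are combined
```

The reversal arises because group membership acts as a lurking variable: group B has higher x values but lower y values overall.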

30
Q

What are the similarities and differences between covariance and correlation coefficient?

A

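Both measures indicate the direction of the linear association between two variables. However, covariance is not standardised: it can take any value between -infinity and infinity and depends on the units of the variables, so its size is hard to interpret. The correlation coefficient is the standardised version and always lies between -1 and 1, so it also shows how strong the association is.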
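
The effect of standardisation can be seen by changing the unit of one variable. In the sketch below (hypothetical heights and weights, hypothetical helper names), converting height from centimetres to metres divides the covariance by 100 but leaves the correlation unchanged.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, made up for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 65, 72, 80]
heights_m = [h / 100 for h in heights_cm]

print(covariance(heights_cm, weights_kg), covariance(heights_m, weights_kg))
print(correlation(heights_cm, weights_kg), correlation(heights_m, weights_kg))
```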

31
Q

What is a monotonic graph?

A

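One in which, as x increases, y either never decreases (monotonically increasing) or never increases (monotonically decreasing), so the direction of the relationship does not change.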

32
Q

What are disjoint events?

A

Two events A and B are disjoint if they have no outcomes in common and can therefore never happen at the same time

33
Q

What are independent events?

A

Two events A and B are independent if knowing whether A has occurred does not change the probability that B occurs
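
Independence is equivalent to P(A and B) = P(A) x P(B), which can be checked by enumerating outcomes. The sketch below assumes two fair six-sided dice (an illustrative assumption) and uses hypothetical event names; it also shows that disjoint events are a different concept.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function of one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return sum(o) == 7

def sum_is_8(o):
    return sum(o) == 8

# Independent: the joint probability equals the product of the probabilities.
p_joint = prob(lambda o: first_even(o) and sum_is_7(o))
print(p_joint, prob(first_even) * prob(sum_is_7))  # 1/12 1/12

# Disjoint: the two events can never happen together.
print(prob(lambda o: sum_is_7(o) and sum_is_8(o)))  # 0
```

Disjoint events with non-zero probabilities are never independent: knowing that the sum is 7 tells you the sum is certainly not 8.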

34
Q

What are independent trials?

A

Two (or more) draws (of a random phenomenon) are independent if the outcome of one draw is not affected by the outcome of another draw

35
Q

What is the Central Limit Theorem?

A

If X has a normal distribution, the sampling distribution of the sample mean will also have a normal distribution. If X is not normally distributed but has a finite population mean and population standard deviation, the sampling distribution of the sample mean is approximately normal when the sample size is large.
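
A small simulation can make this concrete. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), draws many samples of size 50, and looks at the distribution of the sample means. Their mean should be close to the population mean and their standard deviation close to 1 / sqrt(50); the sample size, number of replications and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is reproducible
sample_size = 50
replications = 2000

# Each entry is the mean of one sample drawn from the skewed population.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(replications)
]

print("mean of sample means:", round(mean(sample_means), 3))
print("sd of sample means:  ", round(stdev(sample_means), 3))
print("theoretical sd:      ", round(1 / sqrt(sample_size), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.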

36
Q

When does the binomial distribution apply?

A

When there is a fixed number of observations that are independent, each observation falls into either success or failure, and the probability of a success is the same for each observation
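
Under these conditions, the probability of exactly k successes in n observations is P(X = k) = C(n, k) x p^k x (1 - p)^(n - k). A minimal sketch, with the hypothetical function name `binomial_pmf` and a fair coin as the assumed example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed example: exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, n=4, p=0.5))  # 0.375

# The probabilities over all possible counts add up to 1.
print(sum(binomial_pmf(k, n=4, p=0.5) for k in range(5)))  # 1.0
```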