Chapter 12 and 13 Flashcards

1
Q

Types of statistics

A

Descriptive and inferential

2
Q

Define both types of statistics

A

Descriptive statistics: used to describe the characteristics of a sample or population.
Ex: class average

Inferential statistics: used to infer (estimate) population parameters (value within a population) from a subgroup (sample) of the population.

3
Q

Parametric vs non-parametric statistics and their technical assumptions

A

Parametric statistics: have built-in assumptions about the data distribution that must be met if the statistic is to be used. For example, a parametric statistic may assume that the underlying populations are normally distributed.

Non-parametric statistics: have no built-in assumptions about the shape of the distribution.

4
Q

Raw vs relative frequency

A

Raw: the results indicate the actual number of cases that take on each value

Relative: the results are expressed as a percentage of the cases that take on each value
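
As a rough illustration of the two kinds of frequency, here is a minimal Python sketch. The `responses` list is made-up data, not taken from the course material.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration).
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]

raw = Counter(responses)        # raw frequency: number of cases per value
total = sum(raw.values())       # total number of cases

# Relative frequency: each count expressed as a percentage of all cases.
relative = {value: 100 * count / total for value, count in raw.items()}

for value in raw:
    print(f"{value}: raw={raw[value]}, relative={relative[value]:.1f}%")
```

Because relative frequencies always sum to 100%, they stay comparable across data sets with different numbers of cases.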

5
Q

What are the measures of central tendency?

A

Group of statistics that present a single value that best represents the distribution of responses
Mean, mode, median

6
Q

Measure of dispersion

A

group of statistics that indicate how well the measure of central tendency represents the distribution

Variation ratio, Range and Standard deviation

7
Q

Measures of central tendency and dispersion of nominal variables

A

Mode: measure of central tendency used with nominal variables
Most frequent

Variation ratio: proportion of cases that do not fit within the modal category
Larger values indicate more variation, meaning the mode does not represent the distribution well
Smaller values indicate less variation, indicating the mode does a good job of representing the distribution
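
A minimal sketch of how the mode and variation ratio could be computed. The function name `mode_and_variation_ratio` and the `transport` data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def mode_and_variation_ratio(values):
    """Return the modal category and the proportion of cases outside it."""
    counts = Counter(values)
    mode, mode_count = counts.most_common(1)[0]
    return mode, 1 - mode_count / len(values)

# Hypothetical nominal data: preferred transport for 10 respondents.
transport = ["car", "car", "bus", "car", "bike",
             "car", "bus", "car", "bike", "car"]

mode, vr = mode_and_variation_ratio(transport)
print(f"mode={mode}, variation ratio={vr:.2f}")
```

Here 6 of 10 cases fall in the modal category, so 4 of 10 (a variation ratio of about 0.4) do not.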

8
Q

Measures of central tendency and dispersion of ordinal variables

A

Median: the most appropriate measure of central tendency. Value of observation that splits the distribution of cases in half

Range: the measure of dispersion used with ordinal-level variables. The distance between the lowest and highest values that the variable takes on. Ignores all information except for the two most extreme scores

Interquartile range is more commonly used. The range between the 25th and 75th percentile. Not influenced by outliers
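
The sketch below compares the range with the interquartile range on made-up ordered scores that include one extreme value. It assumes Python's `statistics.quantiles` with its default method; other quartile conventions can give slightly different cut points.

```python
from statistics import median, quantiles

# Hypothetical ordered scores on a 1-10 scale, with one extreme case (10).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 10]

mid = median(scores)                     # value that splits the cases in half
value_range = max(scores) - min(scores)  # uses only the two extreme scores

q1, q2, q3 = quantiles(scores, n=4)      # 25th, 50th and 75th percentiles
iqr = q3 - q1                            # spread of the middle half of cases

print(f"median={mid}, range={value_range}, IQR={iqr}")
```

Replacing the 10 with a 7 would shrink the range but leave the interquartile range unchanged, which is why the latter is less sensitive to outliers.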

9
Q

what is an outlier?

A

Outlier: a case that differs significantly from the others

10
Q

Measures of central tendency and dispersion for interval/ratio variables

A

Arithmetic mean: calculated by adding all of the values and then dividing by the total number of cases
When the distribution is skewed, the median is a better measure because it is not influenced by extreme cases

Standard deviation: estimates the average amount that each observation differs from the mean.

11
Q

Positive vs negative skew

A

Skew: extreme scores pull the mean in their direction

Positively skewed: extreme scores pull the mean above the median

Negatively skewed: extreme scores pull the mean below the median
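
A small sketch of how comparing the mean and median hints at the direction of skew. The `incomes` values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); one very high value acts as an outlier.
incomes = [30, 32, 35, 38, 40, 42, 45, 200]

avg = mean(incomes)
mid = median(incomes)

if avg > mid:
    skew = "positive"   # extreme high scores pull the mean above the median
elif avg < mid:
    skew = "negative"   # extreme low scores pull the mean below the median
else:
    skew = "none"

print(f"mean={avg}, median={mid}, skew={skew}")
```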

12
Q

The greater the difference between the mean and median…

A

the more skewed the distribution is.

13
Q

what is the standard deviation?

A

Standard deviation: estimates the average amount that each observation differs from the mean.

14
Q

Characteristics of standard deviation

A

The size of the standard deviation depends on how clustered the scores are around the mean

Smaller deviation if the scores are closer to the mean

The value of a standard deviation is never negative

If all scores are identical there would be no deviation. Meaning it is equal to zero.
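
These properties can be checked with a short sketch. The three lists are made-up scores, and `statistics.pstdev` (the population standard deviation) is used here as one reasonable choice; the sample version `statistics.stdev` behaves the same way for these comparisons.

```python
from statistics import pstdev

# Three hypothetical sets of scores, all with a mean of 50.
clustered = [49, 50, 50, 51]   # scores close to the mean
spread = [20, 40, 60, 80]      # scores far from the mean
identical = [50, 50, 50, 50]   # no variation at all

sd_clustered = pstdev(clustered)
sd_spread = pstdev(spread)
sd_identical = pstdev(identical)

print(f"clustered={sd_clustered:.2f}, spread={sd_spread:.2f}, "
      f"identical={sd_identical:.2f}")
```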

15
Q

standardized scores

A

Scores expressed as the number of standard deviations that they fall from the mean of the total distribution of scores.

Standardized scores can be positive or negative depending on whether they fall above or below the mean
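
A minimal sketch of the calculation, assuming the population standard deviation is the divisor. The `scores` list is hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical test scores.
scores = [60, 70, 80, 90, 100]

avg = mean(scores)
sd = pstdev(scores)

# Standardized score: distance from the mean in standard deviation units.
z_scores = [(score - avg) / sd for score in scores]

for score, z in zip(scores, z_scores):
    print(f"score={score}, z={z:+.2f}")
```

Scores below the mean get negative values, the score equal to the mean gets 0, and scores above the mean get positive values.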

16
Q

contingency tables

A

Contingency tables: used when working with ordinal or nominal variables. Tables in which the cell where an individual case is located is contingent upon its scores on each of the variables.

17
Q

scatter plots

A

Scatter plots: used when working with interval/ratio variables. Graphs in which the point where an individual case lies is contingent upon its scores on each of the variables.

18
Q

what is a perfect correlation?

A

when knowing the value of one variable always allows us to predict the value of the other.

19
Q

Measures of association

A

indicate the strength of the relationship with a single numerical value

20
Q

What is the range of measures of association for each type of variable?

A

Nominal: 0 to 1
The closer the coefficient is to 0, the weaker the relationship

Ordinal and interval/ratio: -1 to +1
0 means no relationship, while closer to +1 or -1 means a stronger relationship

21
Q

positive vs negative coefficient

A

A positive coefficient means a positive relationship (change in the same direction)
A negative coefficient means a negative relationship (one variable increases as the other decreases)

22
Q

If you want to compare two data sets of different sizes, would it be better to use…

A

relative frequency

23
Q

What are standardized scores also called?

A

Z scores

24
Q

How do you identify which Z score is less typical of the distribution?

A

The further a Z score is from the mean (a Z score of 0), the less typical it is

25
Q

what is the alpha level also known as

A

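Significance level (the confidence level is 1 minus alpha; for example, an alpha of .05 corresponds to a 95% confidence level)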

26
Q

p-value

A

The probability of observing a sample statistic at least as extreme as the one obtained if the null hypothesis is true

The lower the p-value, the greater the confidence that the null hypothesis does not describe the population from which our sample was drawn.

27
Q

what does it mean when a value is statistically significant?

A

If a p-value is lower than our pre-determined alpha level, we conclude the relationship to be “statistically significantly different from the null hypothesis.”

Meaning a result like ours would rarely occur by chance if the null hypothesis were true.

28
Q

Type I vs Type II error

A

Type I means a false positive: we infer that a relationship found in the sample exists in the population when in fact it does not

Type II means a false negative: we do not find a relationship in the sample data when one in fact exists in the population

29
Q

What is the PRE measure of association for nominal data?

A

Lambda

Used when one or both bivariate variables are nominal

Uses the mode for prediction

30
Q

PRE?

A

Proportional reduction in error measures: before and after comparison. Comparing the amount of error we have before knowing the value of an independent variable with the amount of remaining error after knowledge about the independent variable is taken into account.
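
The before-and-after logic can be sketched for lambda, the PRE measure used with nominal data. The function name `goodman_kruskal_lambda`, the table layout (independent variable categories as outer keys) and the counts are all assumptions made for illustration.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column (dependent) variable from the rows."""
    n = sum(sum(row.values()) for row in table.values())

    # Overall totals for each category of the dependent variable.
    dv_totals = {}
    for row in table.values():
        for category, count in row.items():
            dv_totals[category] = dv_totals.get(category, 0) + count

    # Errors before: always guess the overall mode of the dependent variable.
    errors_before = n - max(dv_totals.values())
    # Errors after: guess the mode within each independent-variable category.
    errors_after = sum(sum(row.values()) - max(row.values())
                       for row in table.values())
    return (errors_before - errors_after) / errors_before

# Hypothetical counts: region (independent) by party preference (dependent).
table = {
    "urban": {"party_a": 30, "party_b": 10},
    "rural": {"party_a": 10, "party_b": 30},
}

lam = goodman_kruskal_lambda(table)
print(f"lambda={lam:.2f}")
```

In this made-up table, knowing the region cuts prediction errors from 40 to 20, a 50% reduction. The sketch does not handle the edge case where there are no errors before (all cases in one category).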

31
Q

What is the non-pre measure of association for nominal data?

A

Cramer’s V

Based on comparing the number of cases that would be expected in each cell if there were no relationship between the two variables to the actual number of cases observed.
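
A minimal sketch of that comparison, using the common formula V = sqrt(chi-square / (n × (k − 1))), where k is the smaller of the number of rows and columns. The function name and the `observed` counts are hypothetical.

```python
from math import sqrt

def cramers_v(observed):
    """Return (chi-square, Cramer's V) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Cases expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected

    k = min(len(observed), len(observed[0]))
    return chi2, sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [10, 30]]

chi2, v = cramers_v(observed)
print(f"chi-square={chi2:.2f}, V={v:.2f}")
```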

32
Q

T-test

A

Comparing the means of two groups
Considers the difference between the two sample means and the amount of variation within each sample
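
The two ingredients can be seen in a short sketch of the t statistic. It uses the unequal-variance (Welch) form as an assumption, since the card does not say which version of the t-test is meant, and the two groups are made-up data.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Difference between sample means divided by its standard error."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical scores for two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t = t_statistic(group_a, group_b)
print(f"t={t:.2f}")
```

A larger gap between the means raises t, while more variation within the groups lowers it.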

33
Q

Chi-square test

A

Measures the association between two categorical variables.
tests the independence of two variables by assessing the likelihood that the relationship observed in the sample is due to chance.

34
Q

What are the measures of association for ordinal data?

A

Gamma: A PRE measure that can be interpreted in terms of percentage reduction of error. Uses less information than Tau measures
Overstates the strength of the association
Can be used with both asymmetrical and symmetrical tables

Tau: uses more information and is less likely to inflate the strength of relationships
Selection between tau measures depends on the table dimensions

Tau B for symmetrical, Tau C for asymmetrical
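
A rough sketch of gamma for two ordinal variables, computed from concordant and discordant pairs of cases. The function name and the data are hypothetical. Tied pairs are simply dropped, which is the sense in which gamma uses less information than the tau measures.

```python
from itertools import combinations

def gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # the pair is ordered the same way on both
        elif product < 0:
            discordant += 1   # the pair is ordered in opposite ways
        # product == 0: the pair is tied on x or y and is ignored
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal scores (1 = low, 3 = high) for six cases.
education = [1, 1, 2, 2, 3, 3]
interest = [1, 2, 1, 3, 2, 3]

g = gamma(education, interest)
print(f"gamma={g:.2f}")
```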

35
Q

What is the measure of association for interval/ratio data?

A

Pearson’s r: measures the linear relationship between an independent and dependent variable.
Varies between -1 and +1
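
A minimal sketch of the calculation from deviations around each mean. The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations around the two means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariation / sqrt(ss_x * ss_y)

# Hypothetical interval/ratio data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_perfect = pearson_r(x, [2 * value for value in x])   # exact linear relation

print(f"r={r:.2f}, perfect r={r_perfect:.2f}")
```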

36
Q

Sampling distribution

A

The distribution of all the possible sample means for a given sample size. Created by totalling the number of combinations of cases that produce each sample mean
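
For a tiny population the sampling distribution can be listed in full. The sketch below assumes sampling without replacement and uses a made-up population of four values.

```python
from collections import Counter
from itertools import combinations

# Hypothetical population of four values.
population = [1, 2, 3, 4]
sample_size = 2

# Every possible sample of the given size, and the mean of each one.
sample_means = [sum(sample) / sample_size
                for sample in combinations(population, sample_size)]

# Tally how many samples produce each possible mean.
distribution = Counter(sample_means)

for sample_mean in sorted(distribution):
    print(f"mean={sample_mean}: {distribution[sample_mean]} sample(s)")
```

The mean of all six sample means equals the population mean of 2.5, and the middle value is produced by more samples than the extreme ones.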

37
Q

One-tailed vs two-tailed test

A

If the direction of the difference is not important, we use a two-tailed test

If the direction of difference does matter, we use a one-tailed test

38
Q

Basic linear regression

A

A statistical technique that estimates the location of the best-fitting line, giving a predicted value of the dependent variable for every value of the independent variable

In other words, predicting the value of Y from the value of X

39
Q

standard error

A

Standard error of the estimate: analogous to the standard deviation around the mean; it provides a single value that summarizes how closely the regression line fits the data.

40
Q

What do we need to minimize to get the best linear regression line?

A

Unexplained variance (the sum of the squared distances between the observed values and the line)

41
Q

How to analyze the data presented by the OLS formula?

A

First, look at the intercept (a), which gives the predicted value of Y when X is 0.

Then look at the slope (b), which indicates how much the predicted value of Y changes when X moves by 1 unit.
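
A minimal sketch of estimating and reading the two coefficients with the ordinary least squares formulas. The function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return the intercept (a) and slope (b) of the least-squares line."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data for an independent (x) and dependent (y) variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = ols_fit(x, y)
print(f"predicted Y when X is 0: {a:.1f}")
print(f"change in predicted Y per 1-unit increase in X: {b:.1f}")
```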

42
Q

Explain the type I and II errors that can occur in the following scenario,

A new drug is proposed but must undergo a clinical trial. The null hypothesis is that there are no dangerous side effects

A

If a Type I error is made, the drug is found to have dangerous side effects when it does not, and an opportunity for a promising drug is missed.

If a Type II error is made, the drug is found to have no dangerous side effects when it does, which poses a danger to the public. Both are risks

43
Q

what are dummy variables?

A

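Dummy variables: categorical variables recoded as 0/1 indicators (1 if the case belongs to the category, 0 if it does not). We can enter dummy variables as independent variables to assess their effect on the dependent variable.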
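
As a rough sketch, the example below codes a two-category variable as 0/1 and enters it in a simple regression. The group labels and outcome values are invented for illustration.

```python
# Hypothetical categorical variable and outcome scores.
groups = ["control", "control", "control",
          "treatment", "treatment", "treatment"]
outcome = [2, 3, 4, 6, 7, 8]

# Dummy coding: 1 if the case is in the treatment group, 0 otherwise.
dummy = [1 if group == "treatment" else 0 for group in groups]

# Simple least-squares regression of the outcome on the dummy variable.
mean_x = sum(dummy) / len(dummy)
mean_y = sum(outcome) / len(outcome)
slope = (sum((d - mean_x) * (o - mean_y) for d, o in zip(dummy, outcome))
         / sum((d - mean_x) ** 2 for d in dummy))
intercept = mean_y - slope * mean_x

print(f"intercept={intercept:.1f}, slope={slope:.1f}")
```

With a single dummy variable, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means.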

44
Q

what is insufficient evidence?

A

If the calculated value is less than the critical value, there is insufficient evidence to reject the null hypothesis, so we fail to reject it.

45
Q

how are statistical significance tests affected by sample size?

A

The bigger the sample size, the more likely you’ll find statistical significance

46
Q

What is the difference between substantive and statistical significance?

A

A relationship is statistically significant if it is unlikely to have occurred by chance. A relationship is substantively significant if it is theoretically important, that is, if it plays a role in elaborating, modifying, or rejecting the theory. The need for substantive significance requires that the researcher fully examine the relationship between the variables

47
Q

The central limit theorem

A

The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution
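
A small simulation sketch of the idea, assuming a strongly skewed (exponential) population with a mean of 1. The seed, sample sizes and number of samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so that the run is repeatable

def simulate_sample_means(sample_size, n_samples=2000):
    """Draw many samples from a skewed population; return each sample mean."""
    return [sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
            for _ in range(n_samples)]

small = simulate_sample_means(5)
large = simulate_sample_means(50)

print(f"n=5:  mean of sample means={mean(small):.2f}, spread={stdev(small):.2f}")
print(f"n=50: mean of sample means={mean(large):.2f}, spread={stdev(large):.2f}")
```

With the larger sample size the sample means cluster more tightly around the population mean, and a histogram of them would look closer to a normal curve even though the population itself is skewed.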

48
Q

Coefficient of determination

A

Also known as R². It is the proportion of variance in the outcome that can be explained by the predictor(s) in the model.
With a single predictor, it is equivalent to the squared correlation between X and Y

In other words, R² is the proportion of common variance between Y and the other variables in the model.

The closer it is to 1 (range of 0 to 1), the better the model fits the data.
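
A minimal sketch of the calculation for a one-predictor model, using 1 minus the ratio of unexplained to total variation. The data are hypothetical.

```python
# Hypothetical data for one predictor (x) and one outcome (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]

ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
ss_total = sum((yi - mean_y) ** 2 for yi in y)                     # total

r_squared = 1 - ss_residual / ss_total
print(f"R squared={r_squared:.2f}")
```

For these made-up values the model explains about 60% of the variance in Y, which matches the square of the correlation between X and Y.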

49
Q

we reject the null hypothesis…

A

if the probability that the result is due to chance is less than 5% (p < .05, the conventional alpha level).

Meaning we accept no more than a 5% risk of rejecting a null hypothesis that is actually true.

50
Q

In Chi-square testing, when would you reject the null hypothesis?

A

If the chi-square value obtained exceeds the critical chi-square value, we reject the null hypothesis of no relationship
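
The decision rule can be sketched for 2x2 tables, where the critical value at an alpha of .05 with 1 degree of freedom is 3.841. The two tables of counts are made up for illustration.

```python
CRITICAL_VALUE = 3.841  # chi-square critical value for df = 1, alpha = .05

def chi_square(observed):
    """Obtained chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((count - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i, row in enumerate(observed)
               for j, count in enumerate(row))

# Two hypothetical 2x2 tables of observed counts.
strong = [[30, 10], [10, 30]]
weak = [[20, 20], [18, 22]]

for name, table in (("strong", strong), ("weak", weak)):
    obtained = chi_square(table)
    decision = "reject" if obtained > CRITICAL_VALUE else "fail to reject"
    print(f"{name}: chi-square={obtained:.2f}, {decision} the null hypothesis")
```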