Final Exam - Pearson's Correlation Flashcards

1
Q

Why Screen Data?

A
  1. Avoiding erroneous conclusions by checking accuracy of data
    - Use SPSS (PASW) frequency procedure
  2. Avoiding missing data (from entry, participants, equipment, etc.)
  3. Avoiding extreme values (outliers): values so extreme that they distort results
  4. Meeting assumptions of particular tests
2
Q

Stem and Leaf Display

A

Like a grouped frequency distribution without loss of information

  • Stem: the intervals on the left
  • Leaf: the trailing digits on the right; the count of leaves in a row gives the frequency
3
Q

Why does data go missing?

A
  1. Measurement Equipment Fails
  2. Participants do not complete all trials or all items
  3. Errors occur during data entry
4
Q

Missing Data

A

If missing data are not randomly distributed, there can be systematic problems

5
Q

What do you do with missing data?

A
  1. Analyze difference between groups (those with missing and those without)
  2. Delete cases and/or items
  3. Estimate missing values using
    - Prior knowledge
    - Calculating means using available data
    - Use regression analyses to predict values
6
Q

How do we find missing data?

A
  1. Analyze -> Descriptive Statistics -> Frequencies and…
  2. Analyze -> Descriptive Statistics -> Explore

7
Q

Replacing Missing Data

A
  1. Transform -> Replace Missing Values
  2. Options include replacing with the series mean, the mean (or median) of nearby points, and other imputations

8
Q

Causes For Outliers

A
  1. Data-Entry Errors were made by the researcher
  2. The participant is not a member of the population for which the sample is intended
  3. The participant is simply different from the remainder of the sample
9
Q

Why are outliers problematic?

A
  1. Can have disproportionate influence on results (many tests take squared deviations from mean)
  2. Statistical Tests are sensitive to outliers
  3. Can create Type I and Type II errors
10
Q

How do we identify outliers in SPSS?

A
  1. Explore Menu (under Descriptive Statistics) can give you frequencies, highest and lowest scores, boxplots, and stem and leaf plots.
11
Q

What should you do with outliers?

A
  1. Conduct analyses with and without the outliers
  2. Some outliers are of interest (e.g., they can call attention to a poorly worded question)

12
Q

Are data normal?

A

Examine both univariate (individual variables) and multivariate (combination of variables) normality

13
Q

Ways to assess normality

A
  1. Skewness: Degree of symmetry of a distribution around the mean
  2. Kurtosis: Degree of peakedness of distribution
  3. When the distribution is normal, the values of both equal zero
  4. Kolmogorov-Smirnov statistic
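The deck's procedures use SPSS, but the same checks can be sketched in Python with scipy; the sample here is made up for illustration, and near-zero values are expected because the data are drawn from a normal distribution:

```python
import numpy as np
from scipy import stats

# Illustrative (made-up) sample drawn from a normal distribution
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=1000)

skewness = stats.skew(sample)   # symmetry of the distribution around the mean
kurt = stats.kurtosis(sample)   # excess kurtosis: a normal curve scores 0
print(f"skewness = {skewness:.3f}, kurtosis = {kurt:.3f}")
```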
14
Q

Kolmogorov-Smirnov statistic

A

Tests the null hypothesis that the population is normally distributed
-Significance of this test indicates non-normal data
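A quick sketch of the test in Python with scipy (the data are made up; note that comparing against parameters estimated from the sample itself is only approximate, since strictly that calls for a Lilliefors correction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=200)   # made-up, genuinely normal data

# Compare the sample against a normal distribution with the sample's
# own mean and SD; a non-significant p means we fail to reject normality
stat, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(f"K-S statistic = {stat:.3f}, p = {p:.3f}")
```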

15
Q

Normal distribution

A

A symmetrical, bell-shaped distribution having half the scores above the mean and half the scores below the mean

  • Most of the scores are clustered near the middle of the continuum of observed scores
  • Resembles bell shaped curve
16
Q

Variability

A

The extent to which scores spread out around the mean

17
Q

Range

A

A measure of variability that is computed by subtracting the smallest score from the largest score

18
Q

Variance

A

A single number that represents the total amount of variation in a distribution

19
Q

Standard Deviation

A

The standard deviation is the square root of the variance. It has important relations to the normal curve.

  • Most commonly used measure of dispersion
  • Approximately how far on the average a score is from the mean
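The square-root relation between variance and standard deviation can be checked directly with Python's standard library (the scores are made up):

```python
import statistics

scores = [4, 8, 6, 5, 3, 7, 9, 5]   # made-up scores

variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)           # standard deviation = square root of the variance
print(f"variance = {variance:.3f}, SD = {sd:.3f}")
```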
20
Q

Skewed Distribution

A

Most of the scores are clustered on one end of the continuum

  • Positively skewed: scores cluster at the lower end of the continuum (skewness statistic greater than zero)
  • Negatively skewed: scores cluster at the higher end of the continuum (skewness statistic less than zero)
21
Q

Kurtosis

A

Measure of the degree of peakedness of a distribution

22
Q

Leptokurtosis

A

Distribution is too peaked, with thin tails (kurtosis statistic greater than zero)

23
Q

Platykurtosis

A

Distribution is too flat, with many cases in the tail(s) (kurtosis statistic less than zero)

24
Q

Multimodal shapes

A

Scores tend to congregate around more than one point

25
Q

Bimodal shapes

A

scores are clustered in two places

26
Q

Trimodal shapes

A

Scores are clustered in three places

27
Q

Mode

A

Most frequently occurring score

28
Q

Median

A

Midpoint

  • Identifying the value that splits the distribution into two halves, each half having the same number of values.
  • Best measure of central tendency when the distribution includes extreme scores because it is less influenced by the extreme scores than is the mean
29
Q

Mean

A

Average
-Most commonly reported measure of central tendency and is determined by dividing the sum of the scores by the number of scores contributing to that sum

30
Q

Range

A

Difference between the highest and lowest scores

31
Q

Interquartile range

A

Spread of the middle 50% of the scores

  • Upper quartile: top 25%
  • Lower quartile: bottom 25%
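The IQR (distance between the 25th and 75th percentiles) can be computed with numpy; the score list is made up for the demo:

```python
import numpy as np

scores = np.array([2, 4, 5, 6, 7, 8, 9, 10, 12, 15, 18, 40])  # made-up scores

# 25th percentile = top of the lower quartile, 75th = bottom of the upper
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1   # spread of the middle 50% of scores
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```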
32
Q

Box-and-whisker plot

A

Summarizes the degree of variability with a picture

  • “Box”: middle 50% of scores
  • “Whiskers”: extend to highest score, 1.5 times the height of the rectangle, or to the 5th and 95th percentile
  • Line in the middle corresponds with median
  • Helps identify outliers
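One common whisker convention from the card, 1.5 times the box height, doubles as an outlier rule; a sketch in numpy with made-up data where one score is deliberately extreme:

```python
import numpy as np

scores = np.array([3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 25])  # made-up; 25 is suspect

q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1

# Boxplot convention: points beyond 1.5 x IQR from the box edges
# are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(outliers)
```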
33
Q

Outliers

A

Scores that lie far away from the data set

-Can lead to under- or overestimating a relationship

34
Q

Why do outliers occur?

A
  • Sabotage
  • Misunderstandings
  • Extreme thinking
  • Data Entry
  • Participant is not part of population from which sample is intended
  • Participant is different from rest of sample
35
Q

How can we address violations of normality assumption?

A

Data transformations

36
Q

Data transformations

A
  1. Application of mathematical procedures to make the data appear more normal
  2. Several different types of transformation exist. Appropriate one depends on shape of data.
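As an illustration of matching the transformation to the shape of the data: a log transform is a common choice for positive skew. This sketch uses made-up lognormal (positively skewed) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # positively skewed data

transformed = np.log(raw)  # log transform suits positive skew

skew_before = stats.skew(raw)
skew_after = stats.skew(transformed)   # much closer to zero after transforming
print(f"skew before = {skew_before:.2f}, after = {skew_after:.2f}")
```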
37
Q

Linearity

A

Assumption that there is a straight-line relationship between two variables
-Important because most statistical tests only capture linear relationships

38
Q

How do we assess linearity?

A

Residuals

Bivariate Scatterplots

39
Q

Residuals

A

Examining the differences between the predicted values and the plotted (actual) values
-Also known as prediction errors
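A minimal sketch of residuals in Python with made-up data: fit a straight line, then subtract each predicted value from the actual value:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # made-up data

# Fit a straight line, then residuals = actual - predicted
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept
residuals = y - predicted   # the prediction errors
print(residuals)
```

With a least-squares line that includes an intercept, the residuals always sum to zero; patterns in them (rather than random scatter) suggest nonlinearity.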

40
Q

Bivariate Scatterplots

A

Subjective method of assessing linearity

41
Q

Homogeneity of Variance

A
  1. Variance between groups is similar
  2. Assessed using Levene’s test
    - If significant at the 0.05 level, homogeneity of variance cannot be assumed
    - Done using General Linear Model menu
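The card's procedure is SPSS's General Linear Model menu; the same test can be sketched in Python with scipy. The two groups are made up with deliberately unequal spread, so the test comes out significant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=50, scale=1, size=100)    # tight spread
group_b = rng.normal(loc=50, scale=10, size=100)   # wide spread

# Significant at .05 here, so homogeneity of variance cannot be assumed
stat, p = stats.levene(group_a, group_b)
print(f"Levene W = {stat:.2f}, p = {p:.4f}")
```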
42
Q

Homoscedasticity (with two continuous variables)

A

Assumption that the variability in scores for one continuous variable is roughly the same at all values of another continuous variable

43
Q

Heteroscedasticity

A
  • This violation of the assumption of homoscedasticity can be assessed through the examination of the bivariate scatterplots
  • This violation will not prove fatal to an analysis
44
Q

Line Graph

A

A graph that is frequently used to depict the results of an experiment. The vertical or y axis is known as the ordinate and the horizontal or x axis is known as the abscissa.

45
Q

Correlational study

A

Measurement and determination of the relation between two variables

  • Used when data on two variables are available, but the variables can only be measured, not manipulated.
  • Cannot determine cause-and-effect
  • Correlation coefficient
    - Strength: number
    - Direction: sign
46
Q

Pearson Product-Moment Correlation Coefficient (r)

A
  1. This type of correlation coefficient is calculated when both the X variable and the Y variable are interval or ratio scale measurements and the data appear to be linear
  2. Other correlation coefficients can be calculated when one or both of the variables are not interval or ratio scale measurements or when the data do not fall on a straight line.
  3. Involves two ratio or interval variables
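A minimal sketch of Pearson's r in Python with scipy, using made-up interval-scale data for the two variables (the sign of r gives the direction, its size the strength):

```python
from scipy import stats

# Made-up interval-scale data for two variables
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [55, 61, 60, 70, 72, 79]

r, p = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.3f}, p = {p:.4f}")  # positive sign: scores rise with hours
```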
47
Q

Correlation matrix

A

Used when we have multiple correlations. Summarizes all correlations

48
Q

Different types of correlation

A
  1. Pearson’s Product-Moment Correlation (Pearson’s r)
  2. Spearman’s Rho
  3. Coefficient of Determination
49
Q

Spearman’s Rho

A

Calculated for ordinal data
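A sketch with scipy on made-up ordinal data; for untied ranks, scipy's result matches the classic formula rho = 1 - 6*sum(d^2)/(n(n^2 - 1)):

```python
from scipy import stats

# Made-up ordinal data: two judges ranking the same six contestants
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 3, 4, 6, 5]

rho, p = stats.spearmanr(judge_1, judge_2)
print(f"rho = {rho:.3f}")
```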

50
Q

Coefficient of Determination

A
  1. Result when r is squared
  2. Indicates proportion of variability in one variable that is associated with another variable
  3. Multiply the result by 100 to get the percentage of explained variability (or shared variance)
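The three steps above, sketched in Python with made-up data: compute r, square it, multiply by 100:

```python
from scipy import stats

x = [2, 4, 6, 8, 10]        # made-up data
y = [10, 14, 15, 22, 24]

r, _ = stats.pearsonr(x, y)
r_squared = r ** 2                 # coefficient of determination
percent_shared = r_squared * 100   # multiply by 100 for a percentage
print(f"r = {r:.3f}, r^2 = {r_squared:.3f} ({percent_shared:.1f}% shared variance)")
```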
51
Q

Strengths of r (effect size)

A
  1. 0.10 (or -0.10): small or weak
  2. 0.30 (or -0.30): medium or moderate
  3. 0.50 (or -0.50): large or strong
52
Q

Covariance

A

An association establishes that A and B covary (vary together)

53
Q

Temporal precedence (directionality problem)

A

Do we know which one came first in time?
Did A -> B
or Did B -> A
If we cannot tell which came first, we cannot infer causation.

54
Q

Internal validity (third-variable problem)

A

Is there a C variable that is associated with both A and B, independently?
-If there is a plausible third variable, we cannot infer causation.

55
Q

Problems with correlation

A
  • Cause and effect?
  • Directionality
  • Third Variable Problem
56
Q

Pie Chart

A

Graphical representation of the percentage allocated to each alternative as a slice of a circle

57
Q

Bar Graph

A

A graph in which the frequency for each category of a qualitative variable is represented as a vertical column. The columns of a bar graph do not touch.

58
Q

Histogram

A

A graph in which the frequency for each category of a quantitative variable is represented as a vertical column that touches the adjacent column.

59
Q

Frequency Polygon

A

A graph that is constructed by placing a dot in the center of each bar of a histogram and then connecting the dots.

60
Q

Data Analysis for an Experiment Comparing Means

A
  1. Getting to know the data
  2. Summarizing the data
  3. Using Confidence Intervals to Confirm what the Data Reveal
61
Q

Measures of central tendency

A

Mean, median, mode

-indicate the score that the data tend to center around

62
Q

Measure of dispersion (variability)

A

Indicate the breadth, or variability, of the distribution

  • Range
  • Standard deviation
63
Q

Standard error of the mean

A

The standard deviation of the theoretical sampling distribution of the mean
-Our ability to estimate the population mean from a sample depends on the size of the sample and on the variability in the population from which the sample was drawn, as estimated by the sample standard deviation
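The estimate described on this and the next card is simply s divided by the square root of n; a sketch with Python's standard library on a made-up sample:

```python
import math
import statistics

sample = [23, 25, 28, 30, 22, 27, 26, 29]  # made-up sample

s = statistics.stdev(sample)     # sample SD estimates the population SD
n = len(sample)
sem = s / math.sqrt(n)           # estimated standard error of the mean
print(f"s = {s:.3f}, SEM = {sem:.3f}")
```

Larger samples (bigger n) and less variable populations (smaller s) both shrink the SEM, which is why they improve our estimate of the population mean.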

64
Q

Estimated standard error of the mean

A

Typically, we do not know the standard deviation of the population, so we estimate it using the sample standard deviation (s)