Module 2, Examining Data Flashcards

1
Q

Why Examine Data?

A
  1. to gain an initial sense of the data
  2. to detect data entry errors or data coding errors
  3. to identify outliers
  4. to evaluate research methodology
  5. to determine whether data meet statistical criteria and assumptions
2
Q
  1. Gain an initial sense of the data
A
  • histogram: summarizes how many people have each score, with the score along the x axis and the frequency (count) along the y axis
  • dejections are variables
  • make sense of the graphs
  • we use pictures to make sense of the data
  • takeaway message: we need to know the actual scale of a graph, otherwise the same data can tell completely different stories; it helps to compare graphs side by side and check the actual values
3
Q
  1. Detecting data entry errors or data coding errors
A
  • reverse coding: 5 becomes 1, 4 becomes 2, 2 becomes 4, 1 becomes 5 (the meaning is flipped; 3 stays 3 on a 5-point scale) - we examine the data to see whether we made any mistakes in coding these types of scales
  • people respond strongly to negatively balanced items
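
A minimal sketch of reverse coding, assuming a 1-5 scale. The function name `reverse_code` and the sample responses are made up for illustration. The new score is (scale maximum + scale minimum) minus the old score.

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Flip a score so that high becomes low and low becomes high."""
    if not scale_min <= score <= scale_max:
        # An out-of-range score points to a data entry or coding error
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_max + scale_min - score

# Hypothetical responses to a negatively worded item
responses = [5, 4, 3, 2, 1]
reversed_responses = [reverse_code(r) for r in responses]
print(reversed_responses)  # [1, 2, 3, 4, 5]
```

The range check doubles as a simple way to catch miscoded values before they are flipped.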
4
Q
  1. To identify outliers
A
  • rare, extreme scores that are outside the range of most other scores in the data set
  • in histograms you can identify outliers: a bar separated from the others by a gap suggests an outlier
5
Q
  1. To evaluate research methodology
A
  • very similar scores may indicate problems with the measure used
  • if the scale is not sensitive enough we cannot tease out the variability, the scale may be too broad
  • similar scores equals a lack of variability
6
Q

Examining Data using Tables - Frequency Distribution Tables

A

summarizes the number and percentage of participants for the different values of the variable
- another way to show what was in a histogram
- frequency informs us on total number of people and how many represent each category or ranking
- percent: frequency divided by the total number of people in the study, including those who did not report on the variable
- valid percent: frequency divided by the number of people who actually reported on that variable (the denominator changes: “missing system” cases are excluded from the calculation)
- cumulative percent does not make a lot of sense if it is nominal or categorical - makes more sense if there is some sort of ranking (ordinal)
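
The difference between percent and valid percent can be sketched as follows. This is an illustration only: the function name `frequency_table`, the use of `None` for a missing response, and the data are all assumptions.

```python
from collections import Counter

def frequency_table(responses):
    """Return rows of (value, frequency, percent, valid percent).

    None marks a missing response. Percent divides by everyone in the
    study; valid percent divides only by those who reported a value.
    """
    total = len(responses)
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    rows = []
    for value in sorted(counts):
        freq = counts[value]
        rows.append((value, freq, 100 * freq / total, 100 * freq / len(valid)))
    return rows

# Hypothetical data: 10 participants, 2 of whom did not report
responses = [1, 1, 2, 2, 2, 3, 3, 3, None, None]
table = frequency_table(responses)
for value, freq, pct, valid_pct in table:
    print(f"{value}: n={freq}  percent={pct:.1f}  valid percent={valid_pct:.1f}")
```

With two missing responses, the valid percents are larger than the percents because the denominator drops from 10 to 8.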

7
Q

Creating Frequency Distribution Tables

A
  1. identify all possible values for the variable
  2. determine the frequency of participants who report each value
  3. calculate the percentage for each value
8
Q

Percentage Formula

A

% = (frequency / total number of scores) x 100
n = total number (look at frequency column for total number of scores)

9
Q

Looking for Data Problems

A
  • frequency tables can identify “problem data”
    ◦ incorrect entry: e.g. a BMI of 333
    ◦ restricted range (not much variability)
    ◦ highly skewed data
    ◦ missing data (want to figure out why there is missing data)
    ** if you do not know why the data are problematic, the best option is to remove them, because you lack the context needed to make a decision
    ◦ was the data entered incorrectly, or is it an outlier? (we do not know in this case)
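
A rough sketch of screening a variable for impossible entries and missing values. The helper `flag_out_of_range`, the plausible range of 10-80, and the BMI values are assumptions chosen only for illustration.

```python
def flag_out_of_range(values, low, high):
    """Return (index, value) pairs for entries outside the plausible range."""
    return [(i, v) for i, v in enumerate(values)
            if v is not None and not low <= v <= high]

# Hypothetical BMI column; None marks a missing value
bmi = [22.4, 27.1, 333, None, 19.8]

suspect = flag_out_of_range(bmi, low=10, high=80)  # assumed plausible range
missing = [i for i, v in enumerate(bmi) if v is None]

print(suspect)  # [(2, 333)]
print(missing)  # [3]
```

Flagging only locates the problem. Deciding whether the value is a typo or a genuine outlier still requires context.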
10
Q

Cumulative Percent

A

cumulative percent: take valid percent and add the next one (this matters now that it is ordinal)
- if we want the percentage of those who smoked 5 or fewer, we can read it from the cumulative percent column (this would include 3 categories in this case)
- beginning with the first valid percent and adding on from there
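
As a small illustration, cumulative percent is a running total of the valid percents. The category labels and percentages here are hypothetical and are not taken from the course data.

```python
from itertools import accumulate

# Hypothetical ordered categories (cigarettes per day) and valid percents
categories = ["0", "1-2", "3-5", "6-10", "11+"]
valid_percent = [40.0, 25.0, 15.0, 12.0, 8.0]

# Start with the first valid percent and keep adding the next one
cumulative = list(accumulate(valid_percent))
for label, cum in zip(categories, cumulative):
    print(f"{label}: {cum:.1f}")

# Percentage who smoked 5 or fewer: cumulative percent at the "3-5" row
five_or_fewer = cumulative[categories.index("3-5")]
print(five_or_fewer)  # 80.0
```

The last cumulative percent should always come to 100, which is a quick check on the table.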

11
Q

Group Frequency Distribution Table

A

a table that groups interval or ratio values of a variable into a smaller number of intervals (more manageable to look at visually)
- frequencies and percentages are calculated within the intervals
- can often be turned into a histogram: a grouped frequency distribution table helps us create a grouped histogram in which each bar represents a range of values

12
Q

Group frequency distribution table: Real Lower Limit & Real Upper Limit

A

real lower limit: smallest value of a variable that would be grouped in a particular interval

real upper limit: largest value of a variable that would be grouped into a particular interval

ex. 10-12: 9.5 (RLL) & 12.4 (RUL)

13
Q

What is the RLL & RUL for interval 16-18?

A

RLL: 15.5
RUL: 18.4

14
Q

Creating Grouped Frequency Distribution Tables (rules)

A
  1. variables are grouped into approximately 10 intervals (8-12)
  2. the number of intervals should accurately represent the data
  3. intervals should be of equal size
  4. intervals should not overlap
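
The rules above can be sketched in code, assuming whole-number scores. The function name `grouped_frequency_table` and the scores are made up, and only 4 intervals are used to keep the example short (fewer than the recommended 8-12).

```python
def grouped_frequency_table(scores, start, width, n_intervals):
    """Count scores in equal-width, non-overlapping integer intervals."""
    total = len(scores)
    rows = []
    for k in range(n_intervals):
        low = start + k * width      # every interval has the same width
        high = low + width - 1       # next interval starts at high + 1, so no overlap
        freq = sum(1 for s in scores if low <= s <= high)
        rows.append((f"{low}-{high}", freq, 100 * freq / total))
    return rows

# Hypothetical whole-number scores
scores = [10, 11, 11, 13, 14, 16, 16, 17, 18, 21]
table = grouped_frequency_table(scores, start=10, width=3, n_intervals=4)
for label, freq, pct in table:
    print(f"{label}: n={freq}  percent={pct:.1f}")
```

Each row of this table corresponds to one bar of a grouped histogram.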
15
Q

Bar Charts

A

nominal and ordinal data
- use bars to represent the frequency or percentage of values (is very similar to a histogram)
- do not care about the difference between bar chart and histogram
- looks a lot like histogram other than the fact that the bars are not touching
- often times they represent averages and bars will be means as opposed to frequency

16
Q

Pie Charts

A

nominal or ordinal (only when you have categories)
- represent the percentage of the sample corresponding to the value
- gives you an idea of proportions of a sample

17
Q

Histograms

A

interval and ratio data
- use bars to represent the frequency of values
- bars touch - indicates an interval variable
- score along x axis

18
Q

Frequency Polygons

A

interval and ratio data
- are line graphs that use data points to represent frequencies
- histograms give more of a smooth shape of the actual distribution compared to this
- you can smooth it out by creating a line overtop

19
Q

Inappropriate Conclusions from Figures

A
  • same data, but different y-axes!
  • the scaling of the y axis makes a big impact because it can be misleading causing you to draw wrong conclusion
20
Q

Modality

A
  • values with the highest frequency
  • with modality you need to see peaks and valleys
  • unimodal - one value that occurs with the highest frequency (histogram - one highest bar)
  • bimodal - two values that occur with the highest frequency
  • multimodal would include three or more peaks
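
A minimal sketch of classifying modality using the strict definition above (values tied for the highest frequency). The function names and data are hypothetical. It does not detect lower peaks separated by valleys, which still have to be judged from the histogram.

```python
from collections import Counter

def modes(scores):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def describe_modality(scores):
    """Label a distribution by how many values share the top frequency."""
    n = len(modes(scores))
    return {1: "unimodal", 2: "bimodal"}.get(n, "multimodal")

print(describe_modality([1, 2, 2, 3]))        # unimodal
print(describe_modality([1, 1, 2, 3, 3]))     # bimodal
print(describe_modality([1, 1, 2, 2, 3, 3]))  # multimodal
```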
21
Q

Symmetry

A
  • symmetric distributions have frequencies that change in a similar manner moving away from the mode in both directions
  • skewness comes in degrees, so describe it with nuance (for example, "slightly positively skewed")
  • symmetry: refers to how the frequencies of a variable's values change moving away from the most frequently occurring value
22
Q

Symmetry: Asymmetry

A
  • asymmetric distributions often have outliers that skew the shape of the distribution
  • frequencies change in a different manner moving away in each direction from the most frequently occurring value
  • this means that the most frequent values are located toward one end rather than in the middle
  • outliers are not always what creates the skew, but when present they certainly affect it
  • the portions at the ends of a distribution, where the values have the lowest frequencies, are called the tails; a long tail often contains the outliers that skew the shape
  • based on the location of the long tail we can determine if the data is positively or negatively skewed
23
Q

Positively Skewed

A

data is said to be positively skewed when the long tail is on the right side of the distribution, with the high frequency values clustered on the left

24
Q

Negatively Skewed

A

data is said to be negatively skewed when the long tail is on the left side of the distribution, with the high frequency values clustered on the right

25
Q

Quantifying Skewness

A

skewness statistic
positive statistic = positive skew
negative statistic = negative skew
0 = perfectly symmetric distribution (as in a normal distribution)
the further the skewness statistic is from 0 the more skewed the distribution
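
One way to compute a skewness statistic is the moment-based formula sketched below. This is an assumption about which formula to use: statistical packages often apply a sample-size adjustment, so their values may differ slightly, although the sign reads the same way. The data sets are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 1, 2, 2, 3, 10]    # long tail on the right
left_tail = [1, 8, 9, 9, 10, 10]    # long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail) > 0)  # True: positive skew
print(skewness(left_tail) < 0)   # True: negative skew
print(skewness(symmetric))       # 0.0
```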

26
Q

Variability

A
  • the amount of differences in the distribution of a variable (flatter distributions have more variability in their data)
  • are the scores different from or similar to one another?
  • kurtosis statistic helps us with variability
  • normal, peaked or flat
  • mesokurtic, leptokurtic or platykurtic
27
Q

Mesokurtic

A

neither peaked nor flat ~ kurtosis statistic would be zero (medium, middle)
- has more variability than peaked but less than flat

28
Q

Leptokurtic

A

more peaked relative to a normal distribution (two kangaroos back to back are pretty peaked and kangaroos leap)
- very little variability (for example all athletes achieved similar score on beep test)

29
Q

Platykurtic

A

flatter distribution relative to a normal distribution (platypus - pretty flat)
- the frequency of data is spread across values of the variable

30
Q

Quantifying Kurtosis

A

Kurtosis Statistic
positive statistic = indicates leptokurtic distribution

negative statistic = indicates platykurtic distribution

0 = mesokurtic (the kurtosis of a normal distribution)

the further the kurtosis statistic is from 0 the more likely the distribution is to be not normal
- the degree matters: a distribution may not be as leptokurtic or platykurtic as it first appears
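
A sketch of a kurtosis statistic centred on 0, using the moment-based "excess kurtosis" formula. The choice of formula is an assumption: software output may use an adjusted version, so exact values can differ. The data sets are hypothetical.

```python
def excess_kurtosis(scores):
    """Fourth central moment / variance squared, minus 3 so normal = 0."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance (population form)
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # frequencies spread evenly
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]   # most scores pile up on one value

print(excess_kurtosis(flat) < 0)    # True: platykurtic
print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
```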

31
Q

The Normal Curve

A
  • unimodal: one value that occurs with the highest frequency
  • symmetrical: right side and left side fall away in similar manner
  • neither peaked nor flat (mesokurtic)