Basic Descriptive Data Analysis Flashcards

1
Q

frequency distribution table

A

scores listed in rank order, showing the number of times each value occurred

2
Q

what does a frequency distribution table contain for categorical data

A

raw and relative frequency

3
Q

what does a frequency distribution table contain for continuous data

A

raw, relative, and cumulative frequency

4
Q

raw frequency

A

how many data points fall into a category or class interval, given as a whole-number count

example: 5 of 27 people were age 0-10, so the raw frequency is 5

5
Q

relative frequency

A

the share of the entire sample that falls into a category or class interval, in %

example: 5/27 = 18.5 %

6
Q

cumulative frequency

A

running total of the % of data points up to and including the range you are looking at
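
A minimal sketch of how the three columns of a frequency table relate. The interval labels and counts are hypothetical; only the first row (5 of 27) comes from the cards above.

```python
# Hypothetical raw frequencies per age interval (27 people in total).
raw = {"0-10": 5, "11-20": 8, "21-30": 9, "31-40": 5}

total = sum(raw.values())
table = []
running = 0
for interval, count in raw.items():
    running += count
    relative = 100 * count / total      # % of the whole sample
    cumulative = 100 * running / total  # % up to and including this interval
    table.append((interval, count, round(relative, 1), round(cumulative, 1)))

for row in table:
    print(row)  # (interval, raw, relative %, cumulative %)
```

The cumulative column of the last interval always reaches 100 %.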

7
Q

class intervals (bins)

A

defined range limits into which data are grouped
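
One possible way to group raw values into class intervals, as a sketch. The ages, the interval width of 10, and the helper name `interval_for` are all assumptions made for illustration.

```python
from collections import Counter

ages = [3, 7, 12, 15, 18, 21, 24, 29, 33, 38]  # hypothetical sample
WIDTH = 10                                     # assumed interval width


def interval_for(age, width=WIDTH):
    """Return the (low, high) limits of the interval containing age."""
    low = (age // width) * width
    return (low, low + width - 1)


# Raw frequency of each class interval.
counts = Counter(interval_for(a) for a in ages)

for (low, high), n in sorted(counts.items()):
    print(f"{low}-{high}: {n}")
```

The intervals here do not overlap (0-9, 10-19, ...), so every value lands in exactly one bin.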

8
Q

categorical variables

A

separate categories with no numerical relationship to one another; the categories may still have a ranking order (ordinal) or none (nominal)

9
Q

what are two major statistical outputs of categorical data

A

frequency and percentage

10
Q

what constitutes a normal distribution curve

A

bell shaped curve

symmetrical around the mean

11
Q

what is the statistical significance of normal distribution?

A

many datasets follow a bell-shaped distribution that is symmetrical around the mean

12
Q

what does bimodal mean

A

"two humps": the distribution has two peaks (two modes)

usually unwanted, because it is suggestive of 2 different populations

13
Q

left skewed

A

negatively skewed

tail is to the left

14
Q

right skewed

A

positively skewed

tail is to the right

15
Q

stem and leaf plot

A

used with continuous data

good for showing individual data
bad for large amounts of data
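
A small sketch of building a stem and leaf plot, assuming two-digit scores where the tens digit is the stem and the ones digit is the leaf. The scores are hypothetical.

```python
from collections import defaultdict

scores = [24, 12, 38, 15, 21, 30, 24, 41, 37]  # hypothetical two-digit scores

stems = defaultdict(list)
for s in sorted(scores):
    stems[s // 10].append(s % 10)  # tens digit is the stem, ones digit the leaf

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Every individual value can be read back from the plot, which is why it works well for small datasets and becomes unwieldy for large ones.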

16
Q

histogram

A

continuous data

we can get distribution curves with a histogram
good for showing midpoint of data and large amounts of data
bad for showing individual data

17
Q

mean

A

equals sum of all values / total number of values

18
Q

when is mean commonly used

A

as the default measure of central tendency, when data are roughly symmetrical and have no outliers

19
Q

when is mean less helpful

A

when outliers present or with skewed distribution

20
Q

median

A

equals the middle value of ranked data (the average of the two middle values when the number of values is even)

21
Q

when is median most helpful

A

more helpful than mean when outliers present or with skewed distribution
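
A short sketch of why the median holds up better than the mean when an outlier is present. The income figures are hypothetical; the functions come from Python's standard `statistics` module.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, in thousands
with_outlier = incomes + [500]   # add one extreme value

mean_before = statistics.mean(incomes)            # 175 / 5 = 35
median_before = statistics.median(incomes)        # middle value = 35
mean_after = statistics.mean(with_outlier)        # 675 / 6 = 112.5
median_after = statistics.median(with_outlier)    # (35 + 38) / 2 = 36.5

print(mean_before, median_before)
print(mean_after, median_after)
```

One extreme value more than triples the mean, while the median moves only from 35 to 36.5.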

22
Q

mode

A

equals value that occurs most often

less commonly used compared to mean and median

23
Q

what is the relationship between mean, median and mode with symmetrical data

A

mean = median = mode

24
Q

what is the relationship between mean, median and mode with right skewed data

A

mode < median < mean (the mean is pulled toward the right tail)

25
Q

what is the relationship between mean, median and mode with left skewed data

A

mode > median > mean (the mean is pulled toward the left tail)
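
The ordering of the three measures in skewed data can be checked on a small example. Both samples below are hypothetical, and the second is simply the mirror image of the first; the ordering shown is typical for skewed data, not guaranteed for every dataset.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]   # long tail to the right
left_skewed = [15 - x for x in right_skewed]     # mirror image, tail to the left


def summary(data):
    """Return (mode, median, mean) for a list of numbers."""
    return (statistics.mode(data), statistics.median(data), statistics.mean(data))


r_mode, r_median, r_mean = summary(right_skewed)
l_mode, l_median, l_mean = summary(left_skewed)

print(r_mode, r_median, r_mean)  # mode < median < mean
print(l_mode, l_median, l_mean)  # mode > median > mean
```

In both cases the mean sits closest to the tail, because extreme values pull it the most.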

26
Q

range

A

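difference between the 2 extreme scores (max - min)

needs data that can be ranked, so it does not apply to nominal data
can be listed as an interval (min to max); misleading value if outliers are present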

27
Q

percentiles

A

describes a score's position within a distribution, for example the 90th percentile is the value below which about 90 % of scores fall

28
Q

interquartile range

A

Q3 - Q1

more helpful than range when outliers present

29
Q

Q1

A

value that occurs at the first quarter mark (25%)

30
Q

Q2

A

value that occurs at the second quarter mark = median

31
Q

Q3

A

value that occurs at the third quarter mark (75%)

32
Q

box plot

A

summary in 5 numbers

min, Q1, Q2/median, Q3, max
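
A sketch of the five-number summary behind a box plot, using hypothetical data with one outlier. Quartile conventions differ between textbooks and tools; this example assumes the "inclusive" method of `statistics.quantiles`, so other software may report slightly different Q1 and Q3.

```python
import statistics

data = sorted([2, 4, 5, 7, 8, 10, 12, 15, 40])  # hypothetical, 40 is an outlier

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_numbers = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1                     # interquartile range, Q3 - Q1
full_range = max(data) - min(data)

print(five_numbers)
print(iqr, full_range)
```

The outlier stretches the range to 38, while the interquartile range stays at 7.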

33
Q

pros and cons of box plots

A

pro: good for summarizing a large number set

con: do not keep exact values
con: lose info within data

34
Q

standard deviation

A

typical distance of each point from the mean, calculated as the square root of the average squared distance from the mean
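
A brief sketch computing the standard deviation as the square root of the average squared distance from the mean, next to the plain average absolute distance for comparison. The data are hypothetical, and the population formula (dividing by n) is assumed; a sample SD would divide by n - 1 instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
mean = statistics.mean(data)     # 40 / 8 = 5

# Standard deviation: square root of the average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 32 / 8 = 4
sd = math.sqrt(variance)                                   # 2.0

# Average absolute distance from the mean (a different measure of spread).
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)  # 12 / 8 = 1.5

print(sd, mean_abs_dev)
```

The two measures are not the same number: squaring gives extra weight to points far from the mean, so the SD is at least as large as the average absolute distance.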

35
Q

what is standard deviation helpful for

A

describing how spread out the data are around the mean, and judging whether a data point is unusually far from the mean or within ordinary random fluctuation

36
Q

proportional area under a normal curve: 68 %

A

within 1 standard deviation from the mean

37
Q

proportional area under a normal curve: 95 %

A

within 2 standard deviations from the mean

38
Q

proportional area under a normal curve: 99.7 %

A

within 3 standard deviations from the mean
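
The 68 %, 95 %, and 99.7 % figures are rounded values, and they can be checked with the standard normal distribution. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of a normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {100 * area:.1f} %")
```

The same proportions hold for any normal distribution, whatever its mean and standard deviation.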

39
Q

coefficient of variation

A

unitless measure that depicts the size of the SD relative to the mean (SD / mean)

especially helpful when comparing the variation of 2 or more variables measured in different units

often expressed as a %
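
A sketch comparing the variation of two variables measured in different units. The heights and weights are hypothetical, the population SD is assumed, and the helper name `coefficient_of_variation` is made up for illustration.

```python
import statistics

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights


def coefficient_of_variation(data):
    """SD divided by the mean, expressed as a percentage."""
    return 100 * statistics.pstdev(data) / statistics.mean(data)


cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"height CV: {cv_height:.1f} %")
print(f"weight CV: {cv_weight:.1f} %")
```

The raw SDs (in cm and kg) cannot be compared directly, but the two percentages can: here weight varies far more, relative to its mean, than height does.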