Data Analysis Flashcards

1
Q

Quantitative or numerical variables

A

Result is a number (age, height, etc.)

2
Q

Categorical or nonnumerical variables

A

Result is something other than a number (eye color, person voted for, etc.)

3
Q

Frequency or count

A

Number of times a particular value or category appears in the data

4
Q

Relative frequency

A

Frequency of a value divided by the total number of data values (expressed as a fraction, decimal, or percent)
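As a small illustration of frequency versus relative frequency, here is a minimal Python sketch. The eye-color data are made up for the example and are not from the deck.

```python
from collections import Counter

# Hypothetical sample: eye colors recorded for 8 people.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

frequency = Counter(eye_colors)   # count of each category
total = len(eye_colors)           # total number of data values

# Relative frequency = frequency / total number of data values.
relative_frequency = {value: count / total for value, count in frequency.items()}

print(frequency["brown"])           # 4
print(relative_frequency["brown"])  # 0.5
```

The relative frequencies of all categories always add up to 1 (or 100%).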

5
Q

Histograms (4 things)

A

Show interval data (often with relative frequency, as a percentage, on the vertical axis). Unlike bar graphs, there are NO gaps between bars; a gap indicates no data in that interval. Useful for identifying the shape or spread of the data.

6
Q

Measures of central tendency

A

Goal: find the “center” of the data. Mean, median, and mode.

7
Q

Weighted mean

A

Multiply each DISTINCT value by its frequency (its weight), add the products, then divide by the sum of the frequencies (the total number of data values, not the number of distinct values)

Ex: 2, 4, 5, 5, 6, 6, 6, 7, 9
[(2) + (4) + 2(5) + 3(6) + (7) + (9)] / 9 = 50 / 9 ≈ 5.556
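To check the weighted-mean arithmetic for the list 2, 4, 5, 5, 6, 6, 6, 7, 9, here is a minimal Python sketch. It treats each distinct value's frequency as its weight and divides by the total weight.

```python
from collections import Counter

data = [2, 4, 5, 5, 6, 6, 6, 7, 9]
weights = Counter(data)  # distinct value -> frequency (the weight)

# Sum of (value x frequency): 2 + 4 + 2(5) + 3(6) + 7 + 9 = 50
weighted_sum = sum(value * freq for value, freq in weights.items())

# Sum of the frequencies: 1 + 1 + 2 + 3 + 1 + 1 = 9
total_weight = sum(weights.values())

weighted_mean = weighted_sum / total_weight
print(round(weighted_mean, 3))  # 5.556
```

When the weights are frequencies, the weighted mean equals the ordinary mean of the full list.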

8
Q

Which measure of central tendency is least affected by outliers?

A

The median

9
Q

Measures of position (6)

A

Least, greatest, median, quartiles (3 values that divide the data into 4 groups), percentiles (99 values that divide the data into 100 groups)

10
Q

How to calculate the 1st and 3rd quartiles

A

In an ordered list, the 1st quartile is the median of the lower half of the data (the values below the overall median), and the 3rd quartile is the median of the upper half (the values above the overall median)
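The "median of each half" method can be sketched in Python as below. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption; other conventions exist and can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # quartiles require an ordered list
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([2, 4, 5, 5, 6, 6, 6, 7, 9])
print(q1, q2, q3)  # 4.5 6 6.5
```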

11
Q

Measures of dispersion (3)

A

indicate the degree of spread of the data

range, interquartile range, standard deviation

12
Q

Interquartile range

A

difference between 3rd quartile and 1st quartile (measures the spread of the middle half of the data; less susceptible to outliers)

13
Q

How to find the standard deviation (5 steps)

A
  1. find the mean
  2. find the difference between the mean and each value
  3. square each difference
  4. find the average of the squared differences
  5. take the square root of the average
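
The five steps above can be sketched in Python as follows. The data set is made up for illustration. Dividing by the number of values in step 4 gives the population standard deviation; a sample standard deviation would divide by one less than that.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = sum(data) / len(data)             # step 1: find the mean (5.0)
differences = [x - mean for x in data]   # step 2: difference from the mean
squared = [d ** 2 for d in differences]  # step 3: square each difference
variance = sum(squared) / len(squared)   # step 4: average of the squares (4.0)
std_dev = math.sqrt(variance)            # step 5: square root of the average

print(std_dev)  # 2.0

# Cross-check against the standard library's population standard deviation.
print(statistics.pstdev(data))  # 2.0
```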
14
Q

The mean is X SD away from the mean.

A

The mean is 0 SD from the mean

15
Q

Most data fall within X SD of the mean

A

3 SD

16
Q

How many elements does set S have?

{1, 2, 3, 2}

A

3

17
Q

How many elements does set S have?

{3, 1, 2}

A

3

18
Q

T/F: {1, 2, 3, 2} and {3, 1, 2} are the same set

A

True

19
Q

In a set, repetitions…

and order…

A

repetitions are not counted

order does not matter

20
Q

In a list, repetitions…

and order…

A

repetitions are counted

order does matter

21
Q

T/F 1, 2, 3, 2 and 1, 2, 2, 3 are the same list

A

False
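
Python's built-in `set` and `list` types follow the same rules as the set and list cards above, so the answers can be checked directly. A minimal sketch:

```python
# Sets: repetitions are not counted and order does not matter.
set_a = {1, 2, 3, 2}
set_b = {3, 1, 2}
print(len(set_a))      # 3
print(set_a == set_b)  # True

# Lists: repetitions are counted and order matters.
list_a = [1, 2, 3, 2]
list_b = [1, 2, 2, 3]
print(len(list_a))       # 4
print(list_a == list_b)  # False
```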

22
Q

A ∪ C =

A

The union of sets A and C: all elements that are in A, in C, or in both

23
Q

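A ∩ C =

A

The intersection (overlap) of sets A and C: the elements that are in both A and C. If the intersection is empty, sets A and C are mutually exclusive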

24
Q

Inclusion-exclusion principle

A

The number of elements in the union of two sets equals the sum of their individual numbers of elements minus the number of elements in their intersection: |A ∪ B| = |A| + |B| - |A ∩ B|
(Think chemistry, algebra, physics problem)
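
The inclusion-exclusion principle can be checked with Python sets. The two sets below are arbitrary examples; `|` gives the elements in either set and `&` gives the elements in both.

```python
A = {1, 2, 3, 4}
C = {3, 4, 5}

union = A | C         # elements in A, in C, or in both: {1, 2, 3, 4, 5}
intersection = A & C  # elements in both A and C: {3, 4}

# Adding the sizes counts the shared elements twice,
# so the size of the intersection is subtracted once.
by_formula = len(A) + len(C) - len(intersection)

print(len(union))  # 5
print(by_formula)  # 5
```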

25
Q

Multiplication principle

A

Two choices made sequentially, where the second choice is independent of the first: if the first choice has k possibilities and the second has m, there are k × m possibilities in total
Ex: 5 entrées, 3 desserts = 15 different meal combinations
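
The meal example can be enumerated with `itertools.product`, which pairs every first choice with every second choice. The dish names are invented for illustration.

```python
from itertools import product

# Hypothetical menu: 5 entrées and 3 desserts.
entrees = ["pasta", "steak", "salmon", "curry", "tacos"]
desserts = ["cake", "pie", "sorbet"]

# Every (entrée, dessert) pair is one possible meal.
meals = list(product(entrees, desserts))

print(len(meals))                    # 15
print(len(entrees) * len(desserts))  # 15
```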

26
Q

0!

A

1

27
Q

permutations of n objects taken k at a time (select and order k objects from a group of n objects)

A

n! / (n-k)!
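
As a check on the permutation formula, this sketch compares it with a direct enumeration for n = 5 and k = 3 (values chosen arbitrarily). It assumes Python 3.8 or later for `math.perm`.

```python
import math
from itertools import permutations

n, k = 5, 3

# Formula: n! / (n-k)! = 120 / 2 = 60
by_formula = math.factorial(n) // math.factorial(n - k)

# Enumeration: every ordered selection of k objects out of n.
by_enumeration = len(list(permutations(range(n), k)))

print(by_formula)       # 60
print(by_enumeration)   # 60
print(math.perm(n, k))  # 60
```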

28
Q

Permutations vs. combinations

A

Permutations–order DOES matter (can’t repeat or put back, etc.)
Combinations–order does NOT matter

29
Q

combinations of n objects taken k at a time (n choose k)

A

n! / (k! (n-k)!)
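
The same kind of check works for combinations, again with n = 5 and k = 3 as arbitrary values and assuming Python 3.8 or later for `math.comb`. Each unordered selection of k objects can be ordered in k! ways, which is why the count is the permutation count divided by k!.

```python
import math
from itertools import combinations

n, k = 5, 3

# Formula: n! / (k! (n-k)!) = 120 / (6 * 2) = 10
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Enumeration: every unordered selection of k objects out of n.
by_enumeration = len(list(combinations(range(n), k)))

print(by_formula)       # 10
print(by_enumeration)   # 10
print(math.comb(n, k))  # 10
```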

30
Q

Probability formula

A

The number of outcomes in the event (the outcomes that fit the condition) divided by the total number of possible outcomes, assuming all outcomes are equally likely
Ex: probability of rolling an even number on a die: 3 (2, 4, 6) / 6 = 1/2
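
The die example can be written out in Python. Using `fractions.Fraction` keeps the result exact; the sketch assumes a fair six-sided die, so all outcomes are equally likely.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]                  # all possible outcomes
event = [x for x in sample_space if x % 2 == 0]    # outcomes in the event: 2, 4, 6

# Probability = outcomes in the event / total possible outcomes
probability = Fraction(len(event), len(sample_space))

print(probability)  # 1/2
```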

31
Q

If event E is certain to occur, then P is…

A

1

32
Q

If event E is certain NOT to occur, then P is…

A

0

33
Q

If an event is possible but not certain, then P is…

A

between 0 and 1

34
Q

The probability that an event will NOT occur is equal to…

A

1 - the probability that it will occur: P(not E) = 1 - P(E)

35
Q

P(E or F) =

in general

A

P(E) + P(F) - P(E and F)

36
Q

P(E or F) =

if E and F are mutually exclusive

A

P(E) + P(F) if E and F are mutually exclusive

37
Q

P(E and F) =

A

P(E) P(F) if E and F are independent
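
The complement, "or", and "and" rules can be checked on one roll of a fair die. The events here are chosen only for illustration: E is "roll an even number" and F is "roll a number greater than 4". These two events happen to be independent, so the product rule applies.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}  # even number
F = {5, 6}     # number greater than 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# General rule: P(E or F) = P(E) + P(F) - P(E and F)
p_or = prob(E) + prob(F) - prob(E & F)
print(p_or, prob(E | F))  # 2/3 2/3

# Independent events: P(E and F) = P(E) P(F)
print(prob(E & F), prob(E) * prob(F))  # 1/6 1/6

# Complement: P(not E) = 1 - P(E)
print(1 - prob(E), prob(sample_space - E))  # 1/2 1/2
```

For mutually exclusive events the intersection is empty, so the subtracted term is 0 and the general rule reduces to P(E) + P(F).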

38
Q

What is the link between data distributions and probability distributions?

A

For a random variable that represents a randomly chosen value from a distribution of data, the probability distribution of the random variable is the same as the relative frequency distribution of the data.

39
Q

4 properties of a bell curve

A
  1. mean, median, and mode are all nearly equal
  2. data are grouped fairly symmetrically around the mean
  3. 2/3 of data are within 1 SD of mean
  4. almost all of the data are within 2 SD of the mean
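
Properties 3 and 4 can be compared with an ideal normal distribution, which is an assumption here: real bell-shaped data only approximate it. This sketch uses `statistics.NormalDist` (Python 3.8 or later), and the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of an ideal normal distribution within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = share_within(1)  # roughly 0.68, close to 2/3
within_2_sd = share_within(2)  # roughly 0.95, "almost all"

print(round(within_1_sd, 2), round(within_2_sd, 2))
```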