Data Analysis Flashcards
Distribution =
How frequently different values are observed in the data
Frequency =
Number of times value appears in the data
Frequency distribution =
Table or graph that shows values and their corresponding frequencies
Relative Frequency =
Frequency of a value/Total Number of Data Entries
Relative frequency distribution
Table or graph showing relative frequencies of each value
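A minimal Python sketch of the frequency and relative frequency cards above. The data values are made up for illustration.

```python
from collections import Counter

data = [2, 3, 3, 5, 5, 5, 7]  # made-up sample values

# Frequency distribution: value -> number of times it appears in the data
frequency = Counter(data)

# Relative frequency: frequency of a value / total number of data entries
total = len(data)
relative_frequency = {value: count / total for value, count in frequency.items()}

for value in sorted(frequency):
    print(value, frequency[value], round(relative_frequency[value], 3))
```

The relative frequencies always add up to 1, which is a quick check on the table.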
Make predictions with the slope of the trend line of a scatter plot
- Take or estimate 2 points on the trend line
- Work out the slope
- Slope = the change in y for each 1-unit increase in x (change in y / change in x)
- Multiply the slope if needed to change the x-axis unit (for example, from per hour to per week)
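The steps above can be sketched in Python. The two points, the units (hours and km) and the helper name predict are assumptions chosen for illustration.

```python
# Two points estimated from a hypothetical trend line: (hours, distance in km)
x1, y1 = 2, 10
x2, y2 = 6, 30

# Slope = change in y per 1-unit change in x
slope = (y2 - y1) / (x2 - x1)   # 5.0 km per hour

# Change the x-axis unit: 24 hours in a day
km_per_day = slope * 24         # 120.0 km per day

def predict(x):
    """Predict y at x by extending the trend line from a known point."""
    return y1 + slope * (x - x1)

print(slope, km_per_day, predict(8))
```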
Arithmetic Mean =
Sum of all the values/ No. Of Values
Weighted Mean =
Sum of (each unique value × its weight) / Sum of the weights
Weight of a value =
Frequency it appears
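A short Python sketch of a weighted mean, assuming the weight of each value is the frequency with which it appears. The data values are made up.

```python
from collections import Counter

values = [4, 4, 4, 7, 10, 10]   # made-up data

# Weight of each unique value = how often it appears
weights = Counter(values)       # 4 -> 3, 7 -> 1, 10 -> 2

# Weighted mean = sum of (value x weight) / sum of weights
weighted_mean = sum(v * w for v, w in weights.items()) / sum(weights.values())

# With frequencies as weights, this matches the arithmetic mean of the raw list
arithmetic_mean = sum(values) / len(values)

print(weighted_mean, arithmetic_mean)
```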
Median =
‘Middle Number’
- Order values from smallest to biggest
- If no. Of Values is Odd, Median = number in the middle of this list
- If no. of values is even, there are 2 numbers in the middle. Median = mean of these 2 values
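The median steps above, written as a small Python function. The function name median and the sample lists are chosen for illustration.

```python
def median(values):
    """Middle number of the values after ordering them."""
    ordered = sorted(values)           # order from smallest to biggest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the single middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```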
Mode
‘Most frequent’
Value that occurs most frequently in list
There can be more than one in a data set
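A quick Python illustration of a data set with more than one mode, using statistics.multimode from the standard library (Python 3.8 or later). The data is made up.

```python
from statistics import multimode

data = [1, 2, 2, 3, 3, 4]   # made-up data: 2 and 3 both appear twice

# multimode returns every value tied for most frequent
modes = multimode(data)
print(modes)   # [2, 3]
```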
Positions of data
(Order data from least to greatest)
L = Least
M = Median
G = Greatest
Quartiles
Q1, Q2(M), Q3 split data into 4 groups:
L - Q1
Q1 - Q2(M)
Q2(M) - Q3
Q3 - G
Percentiles
The 99 percentiles split the data into 100 groups
Group 1: L - 1st percentile
Group 100: 99th percentile - G
How to find Q1
Find median of 1st half of data (the data before median)
How to find Q3
Find median of Second half of data (data after the median)
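A Python sketch of the median-of-halves method described on the two cards above. The helper names are made up for illustration, and other quartile conventions exist (some software interpolates instead), so results from other tools may differ slightly.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of each half of the data."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # data before the median
    upper = ordered[n // 2 + n % 2 :]    # data after the median (median itself skipped when n is odd)
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))   # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```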
Dispersion
Degree of spread of the data
Most common = range, interquartile range, standard deviation
Range =
G - L
Greatest - Least
(Shows maximal spread of data but can be affected by outliers)
Interquartile Range =
Q3 - Q1
Shows spread of middle data. Is not affected by outliers
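A small Python comparison of range and interquartile range on made-up data, with and without an outlier. It assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data; the function name spread is hypothetical.

```python
from statistics import median

def spread(values):
    """Return (range, interquartile range) of the values."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])           # median of the first half
    q3 = median(ordered[n // 2 + n % 2 :])   # median of the second half
    return ordered[-1] - ordered[0], q3 - q1

print(spread([1, 3, 5, 7, 9, 11, 13]))    # (12, 8)
print(spread([1, 3, 5, 7, 9, 11, 130]))   # (129, 8)
```

Replacing 13 with the outlier 130 changes the range from 12 to 129, while the interquartile range stays at 8.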
Standard Deviation =
Measure of spread that depends on every number in the data set (unlike ranges).
The more data is spread away from the mean - the greater the standard deviation
Sometimes called population standard deviation (to differentiate it from sample standard deviation)
How to calculate standard deviation =
- Find the mean
- Find the difference between each value and the mean and square it
- Find the mean of these squared differences
- Square root this number (take only the positive answer)
How to find the SAMPLE Standard Deviation
- Find the mean
- Find the difference between each value and the mean and square it
- Sum of these squared differences/ (no. of Values - 1)
- Square root this number (take only the positive answer)
(Sometimes preferred for a sample of data taken from a larger ‘population’ (set) of data)
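Both sets of steps above can be sketched as one Python function. The function name std_dev, the sample flag and the data are chosen for illustration; the standard library functions statistics.pstdev and statistics.stdev compute the same two quantities.

```python
import math
import statistics

def std_dev(values, sample=False):
    """Population standard deviation, or sample standard deviation if sample=True."""
    mean = sum(values) / len(values)                      # find the mean
    squared_diffs = [(v - mean) ** 2 for v in values]     # squared differences from the mean
    divisor = len(values) - 1 if sample else len(values)  # n - 1 for a sample, n otherwise
    return math.sqrt(sum(squared_diffs) / divisor)        # positive square root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

print(std_dev(data))                # 2.0
print(std_dev(data, sample=True))   # slightly larger, because the divisor is n - 1
print(statistics.pstdev(data), statistics.stdev(data))
```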
1 , 2 , 3 Standard deviations above the mean =
Mean + 1d
Mean + 2d
Mean + 3d
d = standard deviation
1 , 2 , 3 Standard deviations below the mean =
Mean - 1d
Mean - 2d
Mean - 3d
d = standard deviation
How many standard deviations from the mean is X?
If X > Mean
Mean + Rd = X
If X < Mean
Mean - Rd = X
Where R = no. of standard deviations and d = standard deviation
So rewritten:
R = (X - Mean) / d (when X > Mean)
OR
R = (Mean - X) / d (when X < Mean)
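A Python sketch of this calculation, using a single signed formula: a positive result means X is above the mean and a negative result means X is below it. The function name and the numbers (mean 50, standard deviation 10) are made up.

```python
def deviations_from_mean(x, mean, d):
    """Signed number of standard deviations (d) that x lies from the mean."""
    return (x - mean) / d

print(deviations_from_mean(70, mean=50, d=10))   # 2.0  -> 2 standard deviations above
print(deviations_from_mean(35, mean=50, d=10))   # -1.5 -> 1.5 standard deviations below
```

Taking the absolute value of the result gives R, the unsigned count of standard deviations.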
In any group of data, most values are within about ____ standard deviations of the mean
In any group of data, most values are within about 3 standard deviations of the mean
Set =
Collection of objects (aka members or elements)
Repetitions do not count as additional elements
Order does not matter
Finite set
All elements can be completely counted
Infinite Set
Elements cannot all be counted
E.g.: set of all integers
Empty set
Has no elements/members
Denoted by ∅
Non Empty Set =
A set with 1 or more members/elements
Subset
A set whose elements are all also elements of another set.
Example: A and B are sets. All the elements in Set A are also in Set B. Therefore A is a SUBSET of B.
Set A = {2, 8}, Set B = {0, 2, 4, 6, 8}
∅ is a subset of ______
∅ is a subset of every set
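A short Python check of the subset cards above, using the built-in set type and its issubset method. Sets A and B are the ones from the example; set() stands in for ∅.

```python
A = {2, 8}
B = {0, 2, 4, 6, 8}
empty = set()   # the empty set

print(A.issubset(B))       # True: every element of A is also in B
print(B.issubset(A))       # False: 0, 4 and 6 are not in A
print(empty.issubset(A))   # True
print(empty.issubset(B))   # True
```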
List =
A collection of elements in which order matters
Can have repeating elements
(Unlike a set)
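Python's built-in list and set types behave the same way as the list and set cards describe, which makes for a quick illustration. The values are made up.

```python
as_list = [3, 1, 3, 2]    # a list keeps order and repetitions
as_set = set(as_list)     # a set drops the repeated 3

print(len(as_list))             # 4
print(len(as_set))              # 3
print([1, 2, 3] == [3, 2, 1])   # False: order matters in a list
print({1, 2, 3} == {3, 2, 1})   # True: order does not matter in a set
```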
Intersections
A set formed from the elements that appear in both of 2 other sets.
Example: the intersection of X and Y (written X ∩ Y) = all the elements that appear in both Set X and Set Y
Union
A set that is made up of all of the elements in 2 other sets (don’t include elements twice)
Example: The union of X and Y (written X ∪ Y) = all of the elements of Set X and Set Y
If sets are mutually exclusive
|X ∪ Y| = |X| + |Y|
If sets can intersect - inclusion-exclusion principle
If sets have no elements in common they are said to be ____
If sets have no elements in common they are said to be mutually exclusive (or disjoint).
Written as X ∩ Y = ∅
Inclusion-exclusion principle
IF THE SETS CAN INTERSECT
|A ∪ B| = |A| + |B| - |A ∩ B|
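Union, intersection and the inclusion-exclusion principle can be checked with Python sets, where | is union and & is intersection. The sets A, B and C are made up for illustration.

```python
A = {0, 2, 4, 6, 8}
B = {2, 3, 5, 7}
C = {1, 9}              # shares no elements with A

union = A | B           # every element of A and B, each counted once
intersection = A & B    # elements that appear in both A and B

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
print(len(union), len(A) + len(B) - len(intersection))   # 8 8

# Mutually exclusive sets: the intersection is empty, so the sizes simply add
print(A & C == set())                   # True
print(len(A | C), len(A) + len(C))      # 7 7
```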
Multiplication principle
If K = different possibilities for first choice
M = different possibilities for second choice (that is independent of first choice)
KM = different possibilities for the pair of choices
Example: 5 meals × 3 desserts = 15 combos
(Note can be more than 2 choices)
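The meal example can be enumerated in Python with itertools.product, which lists every pairing of independent choices. The meal and dessert names are placeholders.

```python
from itertools import product

meals = ["meal 1", "meal 2", "meal 3", "meal 4", "meal 5"]   # K = 5 choices
desserts = ["dessert 1", "dessert 2", "dessert 3"]           # M = 3 choices

combos = list(product(meals, desserts))   # every (meal, dessert) pair

print(len(combos))                  # 15
print(len(meals) * len(desserts))   # 15, the multiplication principle K x M
```

Passing more than two lists to product extends the same idea to more than 2 choices.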
Permutation
An order of elements
Example: how many permutations of the letters A, B and C are there? (Answer: 3! = 6)
Factorial
n! = n(n-1)(n-2)(n-3)...(1)
Example
3! = (3)(3-1)(3-2) = (3)(2)(1) = 6
Solving Permutation problems
- Find number of elements (n)
- Calculate: n!
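A Python check of these steps for the letters A, B and C, listing every ordering with itertools.permutations and comparing the count with math.factorial.

```python
from itertools import permutations
from math import factorial

letters = "ABC"   # n = 3 elements

orderings = ["".join(p) for p in permutations(letters)]

print(orderings)                 # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(orderings))            # 6
print(factorial(len(letters)))   # 3! = 6
```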
No. of permutations (objects are placed in order) of k objects taken from a set of n objects
also written as: permutations of n objects taken k at a time
nPk = n!/(n-k)!
Example: how many 5 digit positive integers can be formed using the digits 1,2,3,4,5,6,7 if no digit can occur more than once?
- n = 7 k = 5
- 7!/(7-5)! = 2,520
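The 5 digit example can be verified in Python three ways: with the factorial formula, with math.perm (Python 3.8 or later), and by listing every arrangement.

```python
from itertools import permutations
from math import factorial, perm

n, k = 7, 5

by_formula = factorial(n) // factorial(n - k)       # n! / (n - k)!
by_library = perm(n, k)                             # built-in nPk
by_listing = len(list(permutations("1234567", k)))  # count every ordered choice of 5 digits

print(by_formula, by_library, by_listing)   # 2520 2520 2520
```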
No. of combinations (objects not placed in order) of k objects taken from a set of n objects
also written as: combinations of n objects taken k at a time
nCk = n!/(k!(n-k)!)
Example: How many ways to select a 3 person committee from group of 9?
- n= 9 k = 3
- 9!/(3!(9-3)!) = 84
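The committee example can be verified in Python with the factorial formula, math.comb (Python 3.8 or later), and itertools.combinations. People are represented by the numbers 0 to 8 for illustration.

```python
from itertools import combinations
from math import comb, factorial, perm

n, k = 9, 3

by_formula = factorial(n) // (factorial(k) * factorial(n - k))   # n! / (k!(n - k)!)
by_library = comb(n, k)                                          # built-in nCk
by_listing = len(list(combinations(range(n), k)))                # every unordered group of 3

print(by_formula, by_library, by_listing)   # 84 84 84

# Each committee of 3 can be ordered in 3! ways, so nPk = nCk x k!
print(perm(n, k), comb(n, k) * factorial(k))   # 504 504
```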
Permutations nPk =
Combinations nCk =
Permutations = The number of ways to select AND ORDER k Objects from a set of n Objects
Combinations = The number of subsets containing k objects that can be chosen from a set of n objects (order does not matter)
Sample Space
Set of all possible outcomes
Event
particular set of outcomes
Probability that event (E) occurs =
P(E) = no. of outcomes that satisfy E / Number of total possible outcomes
If event E is certain to occur P(E) =
P(E) = 1
If event E is certain not to occur P(E) =
P(E) = 0
IF event E is possible but not certain
0 < P(E) < 1
Probability Event E won't happen =
1 - P(E)
Sum of probabilities of all possible outcomes =
1
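A Python sketch of these probability cards for one roll of a fair six-sided die, assuming every outcome is equally likely. The event and function names are made up for illustration; Fraction keeps the results exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all possible outcomes of one roll

def probability(event):
    """P(E) = outcomes that satisfy E / total possible outcomes (equally likely outcomes assumed)."""
    return Fraction(len(event), len(sample_space))

even = {o for o in sample_space if o % 2 == 0}   # event: roll an even number
seven = {o for o in sample_space if o == 7}      # event that is certain not to occur

print(probability(even))           # 1/2
print(1 - probability(even))       # 1/2, the probability the roll is not even
print(probability(sample_space))   # 1, a certain event
print(probability(seven))          # 0

# The probabilities of all individual outcomes sum to 1
print(sum(probability({o}) for o in sample_space))   # 1
```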
The probability that both event E and F occur =
IF events E and F are independent: P (E and F) = P(E)P(F)
IF events E and F are mutually exclusive (cannot occur at same time) : P (E and F) = 0 - impossible
The probability that event E or F or both occur =
For any events E and F: P(E or F) = P(E) + P(F) - P(E and F)
IF events E and F are independent: P(E and F) = P(E)P(F), so P(E or F) = P(E) + P(F) - P(E)P(F)
IF events E and F are mutually exclusive (cannot occur at same time): P(E or F) = P(E) + P(F)
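The "and" and "or" rules can be checked by counting outcomes for two fair dice in Python. The events E, F and G and the helper P are made up for illustration, and equally likely outcomes are assumed.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) pair, 36 outcomes in total
sample_space = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(sample_space))

E = {o for o in sample_space if o[0] == 6}   # first die shows 6
F = {o for o in sample_space if o[1] == 6}   # second die shows 6 (independent of E)
G = {o for o in sample_space if o[0] == 1}   # first die shows 1 (mutually exclusive with E)

# Independent events: P(E and F) = P(E)P(F)
print(P(E & F), P(E) * P(F))                 # 1/36 1/36

# E or F or both: P(E) + P(F) - P(E and F)
print(P(E | F), P(E) + P(F) - P(E & F))      # 11/36 11/36

# Mutually exclusive events: P(E and G) = 0 and P(E or G) = P(E) + P(G)
print(P(E & G), P(E | G), P(E) + P(G))       # 0 1/3 1/3
```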