Exam 1 Flashcards
Visual Displays, Probability and Frequency
Descriptive Statistics vs Inferential Statistics
Summarizing and exploring data vs. making inferences about a population based on sample data
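Population: WEAK
The entire group of units we want to draw conclusions about
Sample: WEAK
A subset of the population that is actually observed and measured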
Theory
Answers a how or why question. Unlike laws, theories have not been repeatedly verified. Note (PSET 1): the Law of Supply and Demand is considered a widely agreed-upon THEORY.
Concepts: Weak definition
The elements and ideas behind a theory
Hypothesis
A testable prediction about the relationship between variables
Variable
Makes up a hypothesis. A characteristic that can vary among subjects within a population
Instrument
A device or tool used to measure variables
Unit of Analysis
The thing about which we are collecting information. A unit of analysis has variable characteristics that we analyze
Units of Measurement
used to record measurements of the variable
Independent Variables vs Dependent Variables
Cause variables vs effect variables
4 types of relationships between variables
- Positive
- Negative
- Linear / Nonlinear
- Statistically Significant (p-values)
Mutually Exclusive
You can only select ONE option (e.g., I can only be born in one state, such as New York)
Collectively Exhaustive **
All possible categories are included, so every observation fits into one of them
Qualitative Variables
Scale of measurement is a set of unordered categories
Categories differ in quality not quantity
Quantitative
Numerical
Set of Ordered Categories
Categories differ in quantity and/or magnitude
Discrete
Integer Values
Continuous
Can be any real value, can be subdivided (measurement rather than counting typically)
NOIR
Nominal: Are there different values?
Ordinal: Can we order the values?
Interval: Can we measure the distance between the values?
Ratio: Is there a meaningful zero, so that you can say something is 2 times as large?
Qualitative variables can only be nominal or ordinal
Cross Sectional Data
Observations on different units taken at a single point in time (a snapshot)
Time Series
Observations on a variable over time
Pooled Cross Sectional
Cross sections from multiple years (multiple snapshots, not necessarily of the same units)
Panel / Longitudinal Data
Follows units within a cross section over a given period of time
Visual Displays (5 rules)
- Unit of Analysis in Rows
- Variables in Columns
- Zero Origin (a nonzero origin can be misleading)
- Proper Scaling of Axes
- Cite the Specific Data Source
Grouped Frequency Distribution: 50 - 59
49.5 : Lower Class Boundary
50: Lower Class Limit
59: Upper Class Limit
59.5: Upper Class Boundary
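A minimal sketch of the boundary rule above, assuming whole-number data so that adjacent classes are 1 unit apart (the 0.5 is half of that gap). The function name `class_boundaries` and its `gap` parameter are hypothetical.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower class boundary, upper class boundary) for one class.

    gap: distance from one class's upper limit to the next class's lower
    limit (1 for whole-number data, e.g. 59 -> 60). Hypothetical helper.
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


print(class_boundaries(50, 59))  # (49.5, 59.5)
```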
Ogive: WEAK
A line graph of cumulative frequencies (each point is plotted at a class's upper class boundary)
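To show what an ogive actually plots, here is a sketch that turns a grouped frequency table into ogive points. The class counts are made up for illustration, and whole-number data (boundaries at ±0.5) is assumed.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit, frequency)
classes = [(50, 59, 3), (60, 69, 5), (70, 79, 8), (80, 89, 4)]

# x = upper class boundary, y = running total of frequencies
upper_boundaries = [upper + 0.5 for _, upper, _ in classes]
cumulative = list(accumulate(freq for _, _, freq in classes))

# The ogive starts at 0 at the first class's lower boundary
points = [(classes[0][0] - 0.5, 0)] + list(zip(upper_boundaries, cumulative))
print(points)  # [(49.5, 0), (59.5, 3), (69.5, 8), (79.5, 16), (89.5, 20)]
```

The last cumulative frequency always equals the total number of observations.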
Stem and Leaf Plots
A Vertical Frequency Chart
Left side of the line (stem): the first one or two digits, which define a category
Right side of the line (leaves): the last digit of each number in that category
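A small sketch of building a stem-and-leaf display, assuming non-negative integers where the stem is everything but the last digit. The function `stem_and_leaf` and the sample values are hypothetical.

```python
def stem_and_leaf(values):
    """Return display lines: stem | leaves (last digit of each value)."""
    leaves_by_stem = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 37 -> stem 3, leaf 7
        leaves_by_stem.setdefault(stem, []).append(leaf)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in stem_and_leaf([12, 15, 21, 24, 24, 37, 53]):
    print(line)
# 1 | 2 5
# 2 | 1 4 4
# 3 | 7
# 4 |
# 5 | 3
```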
Histogram: WEAK
Frequency Distribution in Bar form
Sturges' Rule
Used to calculate an appropriate # of bins for a histogram
When you double n, you can add another bin
# of bins: 1 + log2(n) ≈ 1 + 3.3 log10(n)
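A quick check of Sturges' rule. Rounding up to a whole bin is assumed here as one common convention; some texts round to the nearest integer instead. The function name `sturges_bins` is hypothetical.

```python
import math


def sturges_bins(n):
    """Number of histogram bins by Sturges' rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


for n in (16, 32, 64, 100):
    print(n, sturges_bins(n))
# 16 5
# 32 6
# 64 7
# 100 8
```

Going from 16 to 32 to 64 observations adds exactly one bin each time, which is the "double n, add a bin" idea.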
Mode
- Most frequent
- There can be multiple modes
- Works for all of NOIR (the only measure of center for nominal data)
Median
- The p(50) position (50th percentile)
- Can be determined for OIR
- Usually unique
- Generally unaffected by outliers
Mean
- Works for IR (interval and ratio data)
- Unique
- Sensitive to outliers (pulled toward extreme values)
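To see the outlier effect on the three measures of center, here is a sketch using Python's standard `statistics` module on made-up data, where one value is swapped for an outlier.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]
with_outlier = [2, 3, 3, 5, 100]  # hypothetical: 7 replaced by an outlier

print(mean(data), median(data), mode(data))                          # mean 4, median 3, mode 3
print(mean(with_outlier), median(with_outlier), mode(with_outlier))  # mean 22.6, median 3, mode 3
```

Only the mean moves; the median and mode ignore how extreme the largest value is.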
Box Plots
Illustrate frequency distributions
Box spans p(25) to p(75), with a line at the median p(50)
Lower fence is p(25) - 1.5(p(75) - p(25))
Upper fence is p(75) + 1.5(p(75) - p(25))
Points beyond the fences are flagged as outliers
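A sketch of the box-plot numbers, assuming quartiles from `statistics.quantiles(..., method="inclusive")` (Python 3.8+). Percentile rules differ between textbooks, so hand-computed p(25) and p(75) may not match exactly. The function name and data are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(data):
    # Quartiles by linear interpolation; your course's p(25)/p(75) rule may differ
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                    # p(75) - p(25)
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q2, q3, lower_fence, upper_fence, outliers


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# (3.25, 5.5, 7.75, -3.5, 14.5, [100])
```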
Trimmed Mean
Mean calculated after removing a fixed percentage of the smallest and largest values (limits the influence of outliers)
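A minimal trimmed-mean sketch, assuming the same proportion is cut from each end and the count is rounded down. The function `trimmed_mean` is hypothetical; SciPy offers `scipy.stats.trim_mean` if it is available.

```python
def trimmed_mean(data, proportion=0.1):
    """Mean after dropping `proportion` of the values from EACH end of the sorted data."""
    values = sorted(data)
    k = int(len(values) * proportion)  # number of values cut from each end
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))    # ordinary mean: 14.5
print(trimmed_mean(data, 0.1))  # 10% trimmed mean: 5.5
```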
range
difference between max and min
greatly impacted by outliers
Average Deviation from the Mean
The sum of each data point's difference from the mean, divided by n (problematic because it always equals 0)
Average Absolute Deviation
The sum of the absolute values of each data point's difference from the mean, divided by n
Average Squared Deviation: Variance
The sum of each data point's squared difference from the mean, divided by n (never negative, but in squared units)
Standard Deviation
The square root of the variance, which puts the spread back in the original units
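The four spread measures above, computed step by step on a small made-up data set. The card's divide-by-n (population) versions are used; a sample variance would divide by n - 1 instead.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                      # 5.0
deviations = [x - mean for x in data]

avg_deviation = sum(deviations) / n       # always 0: positive and negative deviations cancel
avg_abs_deviation = sum(abs(d) for d in deviations) / n
variance = sum(d ** 2 for d in deviations) / n  # divide by n (sample version: n - 1)
std_dev = math.sqrt(variance)

print(avg_deviation, avg_abs_deviation, variance, std_dev)  # 0.0 1.5 4.0 2.0
```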
Coefficient of Variation
How large is the spread relative to the mean?
CV = (std dev / mean) * 100 (a percentage)
Used when comparing the variability of two or more variables OR two or more groups, especially when their units or means differ
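A sketch of why CV is useful for comparisons: group B below is group A times 10 (made-up data), so its standard deviation is 10 times larger, yet the relative spread is the same. The population standard deviation is assumed, matching the divide-by-n cards; `coefficient_of_variation` is a hypothetical name.

```python
import statistics


def coefficient_of_variation(data):
    """CV in percent: population standard deviation relative to the mean."""
    return 100 * statistics.pstdev(data) / statistics.mean(data)


group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [10 * x for x in group_a]  # same shape, 10x the scale

print(statistics.pstdev(group_a), statistics.pstdev(group_b))  # 2.0 vs 20.0
print(coefficient_of_variation(group_a), coefficient_of_variation(group_b))  # 40.0 and 40.0
```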
z scores
How many standard deviations a value is from the mean
z = (value - mean) / std dev
Used to compare two or more individual VALUES measured on different scales
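A minimal z-score sketch comparing two values on different scales. The scores, means, and standard deviations are hypothetical numbers chosen for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Hypothetical: an exam score vs. a test score on a different scale
exam_z = z_score(85, mean=75, std_dev=5)       # 2.0
sat_z = z_score(1300, mean=1100, std_dev=200)  # 1.0
print(exam_z > sat_z)  # True: the exam score is relatively higher
```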
Chebyshev's Theorem
For any set of data and any k > 1, at least 1 - 1/k^2 of the data must lie within k standard deviations of the mean
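A sketch that checks Chebyshev's bound on a skewed, made-up data set (not bell-shaped). Population mean and standard deviation are assumed; `chebyshev_bound` is a hypothetical name.

```python
import statistics


def chebyshev_bound(k):
    """Minimum share of ANY data set within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
k = 2
share_within = sum(abs(x - mu) <= k * sigma for x in data) / len(data)

print(chebyshev_bound(k), share_within)  # 0.75 0.9: the observed share meets the bound
```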
Empirical Rule
For normal (bell-shaped) distributions:
Within 1 standard deviation (z between -1 and 1): about 68%
Within 2: about 95%
Within 3: about 99.7%
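The empirical-rule percentages can be recovered from the standard normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # Share of a normal distribution between z = -k and z = +k
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(share, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Compared with Chebyshev's guaranteed 75% at k = 2, assuming normality gives a much tighter figure.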
Combination
An unordered sample
(n r) = n!/(r!(n-r)!)
Permutations
Order matters!
n!/(n-r)!
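A check of both counting formulas with Python's `math.comb` and `math.perm` (Python 3.8+). The values n = 5 and r = 2 are arbitrary.

```python
import math

n, r = 5, 2
print(math.comb(n, r))  # 10: unordered selections of 2 from 5
print(math.perm(n, r))  # 20: ordered selections; each combination counted 2! ways

# Same results from the factorial formulas on the cards
assert math.comb(n, r) == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
```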
Random Experiment
the process by which an observation is obtained. There must be at least 2 possible outcomes and there must be uncertainty
Basic Outcome
the result of a random experiment
Sample Space
set of all basic outcomes
Event
A set of one or more basic outcomes (a subset of the sample space)
Complement
All basic outcomes in the sample space that are not in the event
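The definitions above map naturally onto Python sets. This sketch uses flipping two coins as the random experiment; the event "at least one head" is an arbitrary choice.

```python
from itertools import product

# Random experiment: flip two coins; basic outcomes are strings like "HT"
sample_space = {"".join(flips) for flips in product("HT", repeat=2)}
at_least_one_head = {outcome for outcome in sample_space if "H" in outcome}  # an event
complement = sample_space - at_least_one_head  # outcomes NOT in the event

print(sorted(sample_space))  # ['HH', 'HT', 'TH', 'TT']
print(sorted(complement))    # ['TT']
```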
Empirical Probability
Based on observed relative frequencies in data; no prior knowledge of the events is needed (think medical data, e.g., counts of births)
Subjective Probability
Based on personal judgment or past experience; essentially an informed prediction
Classical Probability
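Based on deduction from the structure of the experiment, assuming equally likely outcomes: P(A) = # outcomes in A / # outcomes in the sample space
Think dice and coins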
Subtraction Rule
P(A) + P(A complement) = 1, so P(A complement) = 1 - P(A)
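A sketch of classical probability and the subtraction rule for one fair die, using exact fractions. The event "roll higher than 4" and the function `classical_probability` are hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one fair die: equally likely outcomes
event = {5, 6}                     # "roll higher than 4"


def classical_probability(event, sample_space):
    """P(event) = # outcomes in the event / # outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))


p_a = classical_probability(event, sample_space)                     # 1/3
p_not_a = classical_probability(sample_space - event, sample_space)  # 2/3
print(p_a, p_not_a, p_a + p_not_a)  # 1/3 2/3 1
```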
Statistical Independence
- The occurrence of one event does not change the probability of the other
P(A) = P(A|B). Another way to check: the percentage with A in the general population equals the percentage with A within group B
NOTE: mutually exclusive is NOT statistically independent (if A and B both have nonzero probability but cannot occur together, then P(A|B) = 0, which differs from P(A))
Multiplication Rule
P(A|B) * P(B) = P(A and B)
P(B|A) * P(A) = P(A and B)
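A sketch that tests independence and both forms of the multiplication rule from a table of counts. All counts are hypothetical, chosen so that A and B happen to be independent.

```python
from fractions import Fraction

# Hypothetical counts from 200 people
total = 200
count_a = 50        # people with characteristic A
count_b = 80        # people in group B
count_a_and_b = 20  # people who are both

p_a = Fraction(count_a, total)                  # 1/4
p_b = Fraction(count_b, total)                  # 2/5
p_a_given_b = Fraction(count_a_and_b, count_b)  # 1/4: share of group B with A
p_b_given_a = Fraction(count_a_and_b, count_a)  # 2/5

print(p_a == p_a_given_b)  # True: A and B are independent in this table
# Multiplication rule, both directions, vs. the direct count
print(p_a_given_b * p_b, p_b_given_a * p_a, Fraction(count_a_and_b, total))  # 1/10 1/10 1/10
```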