Chapter 3: Graphical Descriptive Techniques II Flashcards
Classes
Set of intervals that together cover the complete range of observations
- no overlap
- need not be equally wide, but equal widths make analysis easier
Histogram
Bar graph where bases of bars are the intervals and heights are frequencies
Good for showing a frequency distribution
Use Excel's Data Analysis function
- Excel counts numbers that are greater than lower limit and less than or equal to upper limit
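The counting rule above (greater than the lower limit, less than or equal to the upper limit) can be sketched in Python. This is only an illustration of the rule, not a reproduction of Excel itself; the function name, the extra slot for values above the last limit, and the sample data are assumptions made for the example.

```python
from bisect import bisect_left

def count_by_upper_limits(data, upper_limits):
    """Count observations per class, where each class is (lower, upper].

    upper_limits must be sorted ascending. The final slot of the result
    collects observations larger than the last upper limit (assumed here).
    """
    counts = [0] * (len(upper_limits) + 1)
    for x in data:
        # bisect_left returns the first class whose upper limit is >= x,
        # so a value equal to an upper limit stays in that class.
        counts[bisect_left(upper_limits, x)] += 1
    return counts

# Hypothetical observations; 15, 30, 45 and 60 are the class upper limits.
bills = [3.5, 15, 15.01, 30, 44.9, 45, 60, 75]
frequencies = count_by_upper_limits(bills, [15, 30, 45, 60])
print(frequencies)
```

Note that 15 is counted in the first class and 15.01 in the second, which is the "less than or equal to upper limit" behaviour described above.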
Determining number of class intervals
Depends on number of observations in data set
Sturges’ formula:
For n observations, number of class intervals = 1 + 3.3 × log10(n)
Round up
Rule-of-thumb guideline (number of observations : number of classes)
- fewer than 50 : 5-7 classes
- 50-200 : 7-9 classes
- 200-500 : 9-10 classes
- 500-1000 : 10-11 classes
- 1000-5000 : 11-13 classes
- 5000-50,000 : 13-17 classes
- more than 50,000 : 17-20 classes
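As a quick check of Sturges' formula, here is a minimal Python sketch. It assumes the logarithm is base 10 and that the result is rounded up, as the card states; the function name is made up for the example.

```python
import math

def sturges_classes(n):
    """Number of class intervals for n observations: 1 + 3.3*log10(n), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))

for n in (50, 200, 1000):
    print(n, sturges_classes(n))
```

For 50, 200 and 1000 observations this gives 7, 9 and 11 classes, which falls inside the guideline ranges for those sample sizes.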
To determine approximate class interval width
Width = (largest observation - smallest observation) / number of classes
Round to convenient value
Then define lower limit for first class from which all other limits will be determined (first class interval must contain smallest observation)
Consider ease of interpretation! Depart from these guidelines if they would not yield a histogram that is easy to interpret
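The width calculation and the choice of class limits can be sketched as follows. The data, the decision to round 14.5 up to 15, and the choice of 0 as the first lower limit are all hypothetical choices for illustration; rounding to a "convenient value" remains a judgment call.

```python
def approximate_width(data, num_classes):
    """(largest observation - smallest observation) / number of classes."""
    return (max(data) - min(data)) / num_classes

def class_limits(first_lower_limit, width, num_classes):
    """Build (lower, upper) limits for each class from the first lower limit."""
    return [
        (first_lower_limit + i * width, first_lower_limit + (i + 1) * width)
        for i in range(num_classes)
    ]

# Hypothetical data set with smallest value 3 and largest value 119.
data = [3, 18, 27, 42, 55, 61, 74, 88, 96, 119]

raw_width = approximate_width(data, 8)   # (119 - 3) / 8
width = 15                               # rounded by hand to a convenient value
limits = class_limits(0, width, 8)       # first class must contain the smallest value

print(raw_width)
print(limits)
```

With these choices the classes run from 0-15 up to 105-120, so the first class contains the smallest observation (3) and the last contains the largest (119).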
Histogram shape: symmetrical
A histogram is symmetric if, when a vertical line is drawn down its center, the two sides are identical in shape and size
Histogram shape: skewness
A histogram with a long tail extending in one direction (tail = fewer occurrences)
Extending to the right: positively skewed
Extending to the left: negatively skewed
Mode
Observation which occurs with the greatest frequency
Modal class
The class in a histogram (frequency table) with the largest number of observations
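The difference between the mode (of raw observations) and the modal class (of a frequency table) can be illustrated with a short Python sketch. The observations, class labels, and frequencies below are invented for the example.

```python
from statistics import multimode

# Mode: the observation(s) occurring with the greatest frequency.
observations = [2, 3, 3, 5, 7, 7, 7, 9]
modes = multimode(observations)   # returns a list, since ties are possible

# Modal class: the class with the largest frequency in a frequency table.
class_frequencies = {"0-15": 4, "15-30": 11, "30-45": 7}
modal_class = max(class_frequencies, key=class_frequencies.get)

print(modes)
print(modal_class)
```

Here the mode is 7 and the modal class is 15-30. `multimode` is used rather than `mode` so that a data set with tied peaks reports all of them.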
Histogram shape: unimodal
A histogram with a single peak (one class contains the most data)
Histogram shape: bimodal
A histogram with two peaks. (Peaks do not have to be of equal height)
Often indicates that two different distributions are present
Histogram shape: Bell
Symmetric, unimodal histogram
Return on investment
= gain (or loss)/ value of investment
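A small arithmetic sketch of the return formula, with made-up dollar amounts:

```python
def return_on_investment(gain_or_loss, investment_value):
    """Return on investment = gain (or loss) / value of investment."""
    return gain_or_loss / investment_value

# Hypothetical figures: a $1,000 investment that gains $150, or loses $50.
roi_gain = return_on_investment(150, 1000)
roi_loss = return_on_investment(-50, 1000)

print(f"{roi_gain:.1%}")
print(f"{roi_loss:.1%}")
```

A loss is entered as a negative number, so the second return comes out negative (a 15% gain versus a 5% loss in this example).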
Modal class
Class that contains the greatest frequency of observations
Factors that identify when to use a histogram
Objective is to describe a single set of data
Data type: interval
Cross-sectional data
Observations are measured at the same time
Time-series data
Observations are measurements at successive points in time
Data may be interval (quantitative values) or nominal (frequencies of values)
Line chart
Often used to graphically depict time-series data
Plots the values of a variable (vertical axis) over time (horizontal axis)
CPI
(Cost of basket at current-year prices / cost of basket at base-year prices) x 100
To remove the effect of inflation
Divide current prices by the current CPI and then multiply by 100
Expresses prices in base-year terms
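The CPI formula and the inflation adjustment can be worked through in a few lines of Python. The basket costs and the $50 price are hypothetical numbers chosen to keep the arithmetic simple.

```python
def cpi(basket_cost_current, basket_cost_base):
    """CPI = (basket at current-year prices / basket at base-year prices) x 100."""
    return basket_cost_current / basket_cost_base * 100

def deflate(current_price, current_cpi):
    """Remove the effect of inflation: divide by the current CPI, multiply by 100."""
    return current_price / current_cpi * 100

# Hypothetical basket: cost 200 in the base year and 250 in the current year.
current_index = cpi(250, 200)

# A price of 50 today, expressed in base-year terms.
base_year_price = deflate(50, current_index)

print(current_index)
print(round(base_year_price, 2))
```

With a CPI of 125, prices have risen 25% since the base year, so a current price of 50 corresponds to about 40 in base-year terms.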
Scatter diagram
Technique to describe the relationship between two variables
When one is dependent:
Y axis = dependent variable
X axis = independent variable
Otherwise which variable is on which axis is arbitrary
Most important characteristics are strength and direction of the linear relationship
Linearity
How closely the scattered points adhere to a straight line.
Can be:
strong (points are close to a line)
medium-strength
weak (points appear to be scattered randomly)
The least squares method is used to choose the line objectively
(Relationships might also be quadratic or exponential)
JUST BECAUSE VARIABLES HAVE A LINEAR RELATIONSHIP DOES NOT MEAN THERE’S A CAUSAL RELATIONSHIP.
Correlation =/= causation
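The least squares method mentioned in this card can be sketched with the standard textbook formulas for a line y = intercept + slope * x. This is a minimal illustration, not the notes' own procedure; the function name and the data (chosen to lie exactly on y = 2x + 1) are assumptions.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising the sum of squared
    vertical distances between the points and the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = least_squares_line(xs, ys)

print(slope, intercept)
```

The sign of the slope gives the direction of the linear relationship (positive here). A fitted line describes association only; it says nothing about whether one variable causes the other.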
Positive linear relationship
One variable increases as the other increases (variables trend in the same direction)
Negative linear relationship
One variable decreases as the other increases (variables trend in opposite directions)
Factors that identify when to use a scatter diagram
When the objective is to describe a relationship between two variables
With interval-type data