LECTURE WEEK 2 Flashcards
What is a frequency distribution table, and what are its characteristics?
The only numerical indicator we can compute for nominal data is the frequency (count) of each category.
A frequency distribution refers to a table that presents the categories and their counts
A relative frequency distribution refers to a table with categories and their proportion of total count
A cross-classification table is used to describe the relationship between two nominal variables. It lists the frequency of each combination of the values of the two nominal variables
All tables can be built using either the entire population or a sample selected from the population
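The three kinds of table above can be sketched in Python with `collections.Counter` from the standard library. This is a minimal illustration, not a prescribed method; the function names and the survey data (a hypothetical sample of eight respondents) are made up for the example.

```python
from collections import Counter


def frequency_table(values):
    """Frequency distribution: {category: count}."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Relative frequency distribution: {category: proportion of the total count}."""
    n = len(values)
    return {category: count / n for category, count in Counter(values).items()}


def cross_classification(pairs):
    """Cross-classification table: count of each (variable 1, variable 2) combination."""
    return dict(Counter(pairs))


# Hypothetical sample: preferred newspaper and occupation of 8 respondents.
newspaper = ["A", "B", "A", "C", "A", "B", "C", "A"]
occupation = ["blue", "white", "white", "blue", "blue", "white", "blue", "white"]

print(frequency_table(newspaper))           # {'A': 4, 'B': 2, 'C': 2}
print(relative_frequency_table(newspaper))  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
print(cross_classification(zip(newspaper, occupation)))
```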
What are bar charts?
Each category is represented by a bar whose height (or length) captures the frequency of that category.
Vertical bar chart: bars rise from the horizontal axis, which lists the categories.
Horizontal bar chart: bars extend from the vertical axis, which lists the categories.
Either orientation can be used.
How do you build a bar chart?
Build a frequency table by counting the number of occurrences (also known as frequencies) of each value. You can also convert these counts to proportions.
Use the frequencies (or proportions) as the heights of the bars to draw the bar chart.
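As a minimal sketch of the two steps above, the following Python builds the frequency table with `collections.Counter` and then draws a horizontal bar chart as plain text, with one `#` per occurrence. The function `text_bar_chart` and the responses are hypothetical; in practice a plotting library (for example matplotlib's `bar` or `barh`) would draw the chart itself.

```python
from collections import Counter


def text_bar_chart(values, symbol="#"):
    """Draw a horizontal bar chart as text lines: one bar per category."""
    counts = Counter(values)  # step 1: frequency table (category -> count)
    label_width = max(len(str(category)) for category in counts)
    lines = []
    for category, count in counts.items():
        # step 2: the bar length equals the category's frequency
        lines.append(f"{str(category):<{label_width}} | {symbol * count} {count}")
    return lines


# Hypothetical survey responses.
responses = ["Yes", "No", "Yes", "Unsure", "Yes", "No"]
for line in text_bar_chart(responses):
    print(line)
# Yes    | ### 3
# No     | ## 2
# Unsure | # 1
```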
How do you build a pie chart?
A pie chart is a circle subdivided into slices whose areas are proportional to the frequencies of the categories.
To build a pie chart:
- Build a relative frequency table that gives each category's frequency as a percentage; these percentages determine the angle of each slice
- The angles of all slices sum to 360 degrees. Therefore, if 100% corresponds to 360 degrees, then y% corresponds to (y × 360)/100 = 3.6y degrees
- Use the angles to draw the pie chart
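The angle calculation can be sketched in a few lines of Python. The function `pie_chart_angles` is an illustrative name (not a library function), and the 20 colour observations are hypothetical.

```python
from collections import Counter


def pie_chart_angles(values):
    """Return {category: (percentage, slice angle in degrees)}."""
    counts = Counter(values)
    n = len(values)
    table = {}
    for category, count in counts.items():
        percent = 100 * count / n   # relative frequency in %
        angle = percent * 3.6       # y% of 360 degrees = y * 360 / 100 = 3.6 * y
        table[category] = (percent, angle)
    return table


# Hypothetical sample of 20 observations.
colours = ["red"] * 10 + ["blue"] * 5 + ["green"] * 5
for category, (percent, angle) in pie_chart_angles(colours).items():
    print(f"{category}: {percent:.1f}% -> {angle:.1f} degrees")
# red: 50.0% -> 180.0 degrees
# blue: 25.0% -> 90.0 degrees
# green: 25.0% -> 90.0 degrees
```

Whatever the data, the angles add up to 360 degrees (up to floating-point rounding), because the percentages add up to 100.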
Histogram for numerical data
The most common graphical method for numerical (quantitative or interval) data is the histogram.
A histogram can be used to describe probabilities, provided its total area is normalised to 1: the area of each bar then equals the proportion of observations in that class.
A histogram looks like a bar chart; however, its bars represent adjacent intervals on a numerical scale rather than separate categories, so there is no space between the bars.
The height of each bar captures the frequency or relative frequency of the class (or interval) it represents.
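To illustrate the normalisation to an area of 1, one common approach (assumed here) sets each bar's height to frequency / (n × class width). Each bar's area is then its relative frequency, and the areas add up to 1. The function name and the class frequencies below are hypothetical.

```python
def density_heights(frequencies, widths):
    """Bar heights that make the total area of the histogram equal to 1.

    Each height is f / (n * w), so the bar's area (height * width) is f / n,
    the relative frequency of that class.
    """
    n = sum(frequencies)
    return [f / (n * w) for f, w in zip(frequencies, widths)]


# Hypothetical frequencies for five classes of width 10 (0-10, 10-20, ..., 40-50).
freqs = [2, 5, 8, 4, 1]
widths = [10] * 5
heights = density_heights(freqs, widths)
total_area = sum(h * w for h, w in zip(heights, widths))
print(total_area)              # 1.0, up to floating-point rounding
print(heights[2] * widths[2])  # area of the 20-30 bar: about 8 / 20 = 0.4
```

Passing a list of widths (rather than a single number) means the same sketch also works for unequal class widths.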
Building a histogram (number of classes)
Collect the data.
Create a frequency distribution table for the data.
Determine the number of classes to use (there is no single correct answer).
Sturges' formula can be used: K = 1 + 3.3 log10(n), where K is the number of classes and n is the number of observations.
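Sturges' formula is easy to evaluate directly. The sketch below rounds K to the nearest whole number; that rounding rule is an assumption, and some texts round up instead. Because 3.3 ≈ 1/log10(2), the formula is approximately K = 1 + log2(n).

```python
import math


def sturges_classes(n):
    """Number of classes K = 1 + 3.3 * log10(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("need at least one observation")
    return round(1 + 3.3 * math.log10(n))


for n in (50, 100, 1000):
    print(n, sturges_classes(n))
# 50 7
# 100 8
# 1000 11
```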
Building a histogram (class width)
After choosing the number of classes, how do we decide the boundaries between them and, as a result, the class width?
Equal class widths are usually best, but sometimes unequal widths are necessary.
With equal widths: class width = (largest value - smallest value) / number of classes.
Unequal class widths are often used when the frequency of some classes is too low. As a result:
- Several classes are combined together to form a wider and more populated class
- An open-ended class is formed at the higher or lower end of the histogram
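A minimal sketch of equal-width classes in Python, using class width = (largest value - smallest value) / number of classes. It assumes half-open classes [lower, upper), with the largest value placed in the last class; the data and the function name `equal_width_classes` are hypothetical. In practice the width is often rounded up to a convenient number before the boundaries are set.

```python
def equal_width_classes(data, k):
    """Split the range of data into k equal-width classes and count each class.

    Classes are half-open [lower, upper), except the last, which also
    includes the largest value.
    """
    smallest, largest = min(data), max(data)
    width = (largest - smallest) / k  # (largest - smallest) / number of classes
    if width == 0:
        raise ValueError("all values are equal; cannot form classes")
    counts = [0] * k
    for x in data:
        index = min(int((x - smallest) / width), k - 1)  # largest value -> last class
        counts[index] += 1
    bounds = [(smallest + i * width, smallest + (i + 1) * width) for i in range(k)]
    return width, bounds, counts


# Hypothetical observations.
data = [12, 15, 18, 21, 22, 25, 27, 30, 31, 35, 38, 42]
width, bounds, counts = equal_width_classes(data, 3)
print(width)   # 10.0
print(bounds)  # [(12.0, 22.0), (22.0, 32.0), (32.0, 42.0)]
print(counts)  # [4, 5, 3]
```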
What are the different shapes of histograms?
Symmetry of histograms
• A histogram is said to be symmetric if, when we draw a vertical line down the centre of the histogram, the two sides are identical in shape and size
Skewness of histograms
• A skewed histogram is one with a long tail extending either to the right or to the left (a positively skewed histogram has its long tail to the right; a negatively skewed histogram has its long tail to the left)
Modality of histograms
• A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks
• A modal class is the class with the largest number of observations
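Finding the modal class amounts to picking the class with the largest count. This Python sketch uses hypothetical class labels and frequencies; returning a list (an illustrative design choice) keeps ties visible instead of silently picking one class.

```python
def modal_classes(labels, counts):
    """Return every class whose count equals the largest count (ties are kept)."""
    top = max(counts)
    return [label for label, count in zip(labels, counts) if count == top]


# Hypothetical class labels and frequencies.
labels = ["10-20", "20-30", "30-40", "40-50"]
print(modal_classes(labels, [3, 9, 6, 2]))  # ['20-30']
print(modal_classes(labels, [3, 9, 2, 9]))  # ['20-30', '40-50']
```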
Bell shape
• A special type of symmetric unimodal histogram is one that is bell shaped
• Many statistical techniques require that the distribution of the population be bell-shaped
What is a frequency polygon?
A frequency polygon is obtained by plotting the frequency of each class above the midpoint of that class and then joining the points with straight lines.
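The points of a frequency polygon are just (class midpoint, frequency) pairs. A short Python sketch with hypothetical class boundaries and counts; a plotting tool would then join the points with straight lines.

```python
def frequency_polygon_points(bounds, counts):
    """Pair each class midpoint with its frequency."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(bounds, counts)]


# Hypothetical classes 10-20, 20-30, 30-40, 40-50 and their frequencies.
bounds = [(10, 20), (20, 30), (30, 40), (40, 50)]
counts = [3, 9, 6, 2]
print(frequency_polygon_points(bounds, counts))
# [(15.0, 3), (25.0, 9), (35.0, 6), (45.0, 2)]
```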
What is a CDF?
The cumulative relative frequency of a particular class is the proportion of measurements that are less than the upper limit of that class.
To obtain the cumulative relative frequency of a class, we add the relative frequency of that class to the relative frequencies of all previous (lower) classes.
The graph of the CDF is the graph of the cumulative relative frequency distribution. It is obtained by:
- Calculating the relative frequencies
- Calculating the cumulative relative frequencies
- Graphing the cumulative relative frequencies
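The three steps can be sketched in Python as follows. Each cumulative relative frequency is paired with the upper limit of its class, matching the definition of the cumulative relative frequency of a class. The function name and the class data are hypothetical; `itertools.accumulate` supplies the running totals.

```python
from itertools import accumulate


def cumulative_relative_points(upper_limits, counts):
    """Return the relative frequencies and (upper limit, cumulative relative frequency) points."""
    n = sum(counts)
    relative = [c / n for c in counts]  # step 1: relative frequencies
    # Step 2: running totals. Dividing the cumulative counts by n gives the same
    # values as adding up the relative frequencies, with less rounding error.
    cumulative = [total / n for total in accumulate(counts)]
    points = list(zip(upper_limits, cumulative))  # step 3: points to graph
    return relative, points


# Hypothetical classes 10-20, 20-30, 30-40, 40-50 (upper limits 20, 30, 40, 50).
relative, points = cumulative_relative_points([20, 30, 40, 50], [3, 9, 6, 2])
print(relative)  # [0.15, 0.45, 0.3, 0.1]
print(points)    # [(20, 0.15), (30, 0.6), (40, 0.9), (50, 1.0)]
```

The last cumulative relative frequency is always 1, because every observation lies below the upper limit of the highest class.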