Lecture 3 - Tables and Charts Flashcards

1
Q

What is an experiment?

A

Any activity aimed at collecting data.

2
Q

Why do we present data?

A

So that we can communicate the salient features of the data to others with little or no difficulty.
Example: suppose we collected data on total household income from a random sample of 1000 households drawn from St. Lucia, St. Vincent, St. Kitts and Trinidad & Tobago. The 1000 values of household income so collected constitute a dataset. It is practically impossible to look at a dataset of size 1000 and successfully identify its salient features, so summarising the data is a necessary step in analysing a dataset.

3
Q

Basic forms of a summary of a dataset

A

Summary Table and Frequency Table

4
Q

Other forms of summary

A

–The Cumulative Frequency Table
–The Relative Frequency Table
–The Percentage Frequency Table

5
Q

Summary Table

A

*Provide users of the data with a summary
*Bring together a ‘mass’ of connected information for digestion ‘at a glance’
*Help both the researcher and the user to draw preliminary conclusions from the data
*Can become cluttered very quickly and hence confusing; be wary of truncated axes and of changing the scale of either axis.

6
Q

Frequency Table

A

*Consist of two columns
*The left column lists the values assumed by the variable (quantitative data) or the categories/attributes (qualitative data)
*In the quantitative case, the values are listed either as grouped data or as ungrouped data
*The right column displays the corresponding frequencies
*All the guidelines for table construction apply

7
Q

Constructing the grouped frequency table

A

*Divide the range of values into a finite number of equal sub-intervals, called classes
*Classes must be defined so that no observation from the survey data can fall into more than one class
*Too many classes give the table a cluttered appearance; a suggested range is between 6 and 15 classes
*Class Limits – the lower limit and upper limit of the class
*Class Mark – the midpoint of the class
*Class Width – the difference between two successive class marks
*Class Frequency – the number of observations that fall into the class

8
Q

Frequency table grouped data

A

Raw data for the recorded ages at diagnosis of 80 patients with stage 1 carcinoma of the cervix.
84 20 31 43 24 76 67 55 46 59 48 52 74 23 35 65 36 63 51 49 52 48 53 47 38 68 37 67 45 55 48 58 47 57 72 23 65 35 61 31 53 43 54 44 55 45 56 46 33 63 34 64 51 41 52 42 53 43 54 44 55 45 32 62 33 63 34 64 27 77 56 46 57 47 58 48 59 49 34 74
1. Scan the data for the highest and lowest observations (here 84 and 20).
2. Find the difference between the highest and lowest observations (84 - 20 = 64); this is the range of values that must be divided into class intervals.
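A minimal Python sketch of these two steps, followed by a tally of the ages into classes. It assumes the ages are whole numbers of years and uses the seven classes of width 10 (20 – 29 up to 80 – 89) listed on the Class limits card; `grouped_frequency_table` and its parameters are illustrative names, not part of the lecture.

```python
# Ages at diagnosis of the 80 patients, copied from the raw data above.
ages = [
    84, 20, 31, 43, 24, 76, 67, 55, 46, 59, 48, 52, 74, 23, 35, 65, 36, 63, 51, 49,
    52, 48, 53, 47, 38, 68, 37, 67, 45, 55, 48, 58, 47, 57, 72, 23, 65, 35, 61, 31,
    53, 43, 54, 44, 55, 45, 56, 46, 33, 63, 34, 64, 51, 41, 52, 42, 53, 43, 54, 44,
    55, 45, 32, 62, 33, 63, 34, 64, 27, 77, 56, 46, 57, 47, 58, 48, 59, 49, 34, 74,
]

def grouped_frequency_table(data, lower_start, width, num_classes):
    """Count observations in equal-width classes with whole-number limits.

    Class i runs from lower_start + i*width to lower_start + (i+1)*width - 1,
    so a whole-number observation can never fall into two classes.
    """
    table = []
    for i in range(num_classes):
        lower = lower_start + i * width
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    return table

# Steps 1 and 2: highest, lowest and range.
lowest, highest = min(ages), max(ages)
data_range = highest - lowest

# Seven classes of width 10 cover the range 20 to 84.
table = grouped_frequency_table(ages, lower_start=20, width=10, num_classes=7)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The tally gives frequencies 5, 13, 22, 22, 12, 5 and 1, which add up to 80; seven classes also sits inside the suggested range of 6 to 15.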

9
Q

Class limits

A

Class limits, boundaries, width, midpoint
Age range (Years)
20 – 29
30 – 39
40 – 49
50 – 59
60 – 69
70 – 79
80 - 89
Consider the second interval, 30 – 39. The values 30 and 39 are referred to as the class limits: 30 is the lower limit of the second class interval and 39 is its upper limit.

10
Q

Class boundaries

A

Suppose an individual is 29 years and 10 months of age. In which interval would you record their age?
20 – 29
30 – 39
40 – 49
50 – 59
60 – 69
70 – 79
80 - 89
Answer: 30 – 39.
Now suppose an individual is 29 years and 2 months of age. In which interval would you record their age?
Answer: 20 – 29.
This implies that the 30 – 39 interval records ages greater than or equal to 29.5 and less than 39.5. The values 29.5 and 39.5 are the class boundaries of the second interval.
The difference between the boundaries gives the class width or class size:
Class width = UCB – LCB = 39.5 – 29.5 = 10
UCB = upper class boundary
LCB = lower class boundary
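The boundary rule can be sketched in Python as follows, assuming ages are recorded to the nearest whole year so that each boundary lies half a unit outside the class limit. The helpers `class_boundaries`, `class_width` and `find_class` are hypothetical names used only for illustration.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (LCB, UCB): each class limit moved half a recording unit outwards."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_width(lower_limit, upper_limit, unit=1):
    """Class width = UCB - LCB."""
    lcb, ucb = class_boundaries(lower_limit, upper_limit, unit)
    return ucb - lcb

def find_class(age, classes, unit=1):
    """Return the class whose boundaries satisfy LCB <= age < UCB."""
    for lower, upper in classes:
        lcb, ucb = class_boundaries(lower, upper, unit)
        if lcb <= age < ucb:
            return lower, upper
    return None

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
print(class_boundaries(30, 39))           # (29.5, 39.5)
print(class_width(30, 39))                # 10.0
print(find_class(29 + 10 / 12, classes))  # 29 years 10 months -> (30, 39)
print(find_class(29 + 2 / 12, classes))   # 29 years 2 months  -> (20, 29)
```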

11
Q

Midpoint

A

What is the midpoint of the second interval 30 – 39?
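We can determine the midpoint by listing the values in the class:
30, 31, 32, 33, 34, 35, 36, 37, 38, 39
There are ten values, so the midpoint lies halfway between the 5th and 6th values: midpoint = (34 + 35)/2 = 34.5
Even easier, the midpoint is equal to the sum of the class limits divided by 2:
Midpoint = (30 + 39)/2 = 34.5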
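Both midpoint methods can be checked in a few lines of Python; `class_mark` is an illustrative helper name. Successive class marks differ by the class width, which matches the definition of class width on the grouped-table card.

```python
def class_mark(lower_limit, upper_limit):
    """Midpoint (class mark): the sum of the class limits divided by 2."""
    return (lower_limit + upper_limit) / 2

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
marks = [class_mark(lo, hi) for lo, hi in classes]
print(marks)  # [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]

# Cross-check with the "list the values" method for 30 - 39.
values = list(range(30, 40))          # 30, 31, ..., 39 (ten values)
middle_pair = values[4], values[5]    # the 5th and 6th values: 34 and 35
assert class_mark(30, 39) == sum(middle_pair) / 2 == 34.5
```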
12
Q

Cumulative Frequency Table

A

*A derivative of a Frequency Table
*Applicable to quantitative data only
*We can accumulate the values in the left column in any of four (4) ways:
–less than
–less than or equal to
–greater than
–greater than or equal to
*A corresponding accumulation of the frequencies in the right column must be done
*The result is a table consisting of two columns; the cumulative frequencies are displayed in the right column.
*Care should be taken to adjust the column headings
*Rules for table construction apply
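A minimal Python sketch of two of the four accumulations, using `itertools.accumulate`. The frequencies are those obtained by tallying the 80 ages at diagnosis into the classes 20 – 29 through 80 – 89. For whole-number ages, the "less than" and "greater than" versions are the same columns with relabelled headings (for example, "< 30" is the same as "≤ 29").

```python
from itertools import accumulate

# Class limits and frequencies for the 80 ages at diagnosis.
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [5, 13, 22, 22, 12, 5, 1]

# "Less than or equal to" table: accumulate from the first class downwards.
less_equal = list(accumulate(freqs))

# "Greater than or equal to" table: accumulate from the last class upwards.
greater_equal = list(accumulate(reversed(freqs)))[::-1]

# Column headings change to reflect the accumulation used.
print("Age (years)   Cum. freq   |   Age (years)   Cum. freq")
for (lower, upper), le, ge in zip(classes, less_equal, greater_equal):
    print(f"<= {upper:<10} {le:>9}   |   >= {lower:<10} {ge:>9}")
```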

13
Q

Relative Frequency Table

A

*A derivative of a Frequency Table
*Applicable to both qualitative and quantitative data
*Relative Frequency = Frequency / Total Frequency
*Convert all Frequencies in the right column into Relative Frequencies
*The result is a table consisting of two columns; the relative frequencies are displayed in the right column.
*Care should be taken to adjust the column headings
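Relative and percentage frequencies for the same age classes, as a short Python sketch. The dictionary layout is an illustrative choice; a percentage frequency is simply the relative frequency multiplied by 100.

```python
freqs = {"20-29": 5, "30-39": 13, "40-49": 22, "50-59": 22,
         "60-69": 12, "70-79": 5, "80-89": 1}
total = sum(freqs.values())  # total frequency

# Relative frequency = frequency / total frequency.
relative = {cls: f / total for cls, f in freqs.items()}
# Percentage frequency = relative frequency x 100.
percentage = {cls: 100 * f / total for cls, f in freqs.items()}

for cls in freqs:
    print(f"{cls}: relative = {relative[cls]:.4f}, percentage = {percentage[cls]:.2f}%")
```

The relative frequencies sum to 1 and the percentages to 100 (up to floating-point rounding), which is a quick check that no class was missed.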

14
Q

Charts and Graphs

A

Once we have constructed a Frequency Table, we may seek to create a pictorial representation of it.

15
Q

Types of Charts and Graphs

A

*The Pie Chart
*The Histogram
*The Frequency Graph/Polygon
*The Ogive
*The Stem and Leaf Display
*The Box Plot
*The Scattergraph

16
Q

Pie Chart

A

*Pictorial representation of a Frequency Table/Distribution
*Applicable to qualitative and quantitative data
*Consists of a ‘pie’ in two or three dimensions, subdivided into sectors or ‘slices’, one for each line of the Frequency Table/Distribution
*The areas of the sectors or ‘slices’ are in the same proportion as the frequencies
*Equivalently, we want the angles of the sectors to be in the same ratio as the frequencies:
*Angle = (Class Frequency / Total Frequency) x 360 degrees
*Avoid too many sectors or slices
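The angle formula applied to the age-at-diagnosis classes, as a Python sketch (`pie_angles` is a hypothetical helper). The 80 – 89 class gets only a 4.5° slice, which shows how small classes quickly make a pie chart hard to read.

```python
def pie_angles(freqs):
    """Sector angle for each class: (class frequency / total frequency) x 360 degrees."""
    total = sum(freqs)
    return [f * 360 / total for f in freqs]

freqs = [5, 13, 22, 22, 12, 5, 1]   # ages 20-29, 30-39, ..., 80-89
angles = pie_angles(freqs)
print(angles)  # [22.5, 58.5, 99.0, 99.0, 54.0, 22.5, 4.5]
```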

17
Q

Bar Graph

A

Another pictorial representation of a Frequency Table/Distribution
*Very common format for presenting both qualitative and quantitative data
*Takes the form of a series of rectangular bars in either a vertical or horizontal configuration; one for each line of the frequency table
*The bars are of equal width or thickness
*The heights or lengths of the bars are in the same proportion as the frequencies
*Bar charts can show the actual frequencies whereas pie charts can only show relative frequencies.

18
Q

Histogram

A

*Another pictorial representation of a frequency table/distribution
*Similar in presentation to a bar chart
*Comprises a series of vertical or horizontal bars whose lengths/heights are in the same proportion as their frequencies
*Unlike the bar chart, however, the width of each bar must represent the class width
*The ends of the base of each bar represent the lower and upper class boundaries
*The midpoint of the base of each bar represents the class mark
*Unlike Bar Charts, Histograms do not have gaps in between successive bars.
*Plays a significant role in probability theory.
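A sketch of the histogram layout in Python, assuming whole-number ages so that the bar edges are the class boundaries; `histogram_bars` is a hypothetical name, and the `#` rendering only stands in for a real plotting library. Because each bar runs from one boundary to the next, its width equals the class width, the midpoint of its base is the class mark, and successive bars touch.

```python
def histogram_bars(classes, freqs, unit=1):
    """Return (left edge, right edge, height) for each bar.

    Edges are the class boundaries (limits -/+ half a recording unit), so each
    bar is one class width wide and successive bars leave no gaps.
    """
    half = unit / 2
    return [(lower - half, upper + half, f) for (lower, upper), f in zip(classes, freqs)]

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [5, 13, 22, 22, 12, 5, 1]
bars = histogram_bars(classes, freqs)

# Text rendering: one row per bar, '#' repeated in proportion to the frequency.
for left, right, height in bars:
    print(f"{left:5.1f}-{right:5.1f} | {'#' * height}")
```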

19
Q

Multiple Bar Graphs

A

Applicable when we wish to present the comparative frequencies of more than one variable/category/attribute over a finite number of time periods or circumstances
*Also applicable when there is a need to track the values of an aggregate and its component values over a finite number of time periods or circumstances

20
Q

Frequency Polygon

A

*A polygon is a many-sided figure.
*If we join the midpoints of the tops of consecutive bars in a histogram with straight lines, the area below these lines takes a shape that we refer to as a frequency polygon.
*The area under the frequency polygon approximates the area under the histogram.
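The polygon's vertices can be computed from the class marks. The sketch below assumes equal class widths and closes the polygon at zero-frequency points one class width beyond each end; that closing step is a common textbook convention rather than something stated in these notes. With it, the total area under the polygon equals the total area of the histogram, which the trapezoid check confirms. Function names are hypothetical.

```python
def frequency_polygon(classes, freqs):
    """Vertices (class mark, frequency), closed with a zero point at each end.

    Assumes equal class widths; the zero-frequency end points sit one class
    width before the first class mark and after the last one.
    """
    marks = [(lower + upper) / 2 for lower, upper in classes]
    width = classes[1][0] - classes[0][0]
    return ([(marks[0] - width, 0)]
            + list(zip(marks, freqs))
            + [(marks[-1] + width, 0)])

def area_under(points):
    """Trapezoidal area under a piecewise-linear curve through `points`."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [5, 13, 22, 22, 12, 5, 1]
polygon = frequency_polygon(classes, freqs)
print(polygon[:3])                             # [(14.5, 0), (24.5, 5), (34.5, 13)]
print(area_under(polygon), 10 * sum(freqs))    # polygon area vs histogram area
```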

21
Q

DotPlot

A

*A special graphical presentation of the actual dataset from an experiment
*The range of the data is accommodated on the horizontal axis of the graph
*A ‘dot’ is inserted on the graph for each observation in the dataset
*At a glance one can discern the smallest observation, the largest observation and the most frequently occurring observation
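A text-only dot plot in Python, using the 11 observations from the Quantiles Calculation card. `dot_plot` is a hypothetical helper and assumes whole-number data; each column stands for one value on the horizontal axis. Reading it back gives the smallest and largest observations and the most frequent ones (here two values tie).

```python
from collections import Counter

def dot_plot(data):
    """Text dot plot: one column per whole-number value from min to max.

    One '.' is stacked in a column for each observation with that value.
    Returns the rows (top to bottom, then an axis line) and the value counts.
    """
    counts = Counter(data)
    low, high = min(data), max(data)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):        # build from the top row down
        row = "".join("." if counts[v] >= level else " " for v in range(low, high + 1))
        rows.append(row.rstrip())
    rows.append("-" * (high - low + 1))        # horizontal axis
    return rows, counts

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
rows, counts = dot_plot(data)
print("\n".join(rows))
print(f"smallest = {min(data)}, largest = {max(data)}")
top = max(counts.values())
print("most frequent:", sorted(v for v, c in counts.items() if c == top))  # [41, 43]
```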

22
Q

Cumulative Frequency Chart/Curve (Ogive)

A

The graph obtained by plotting the cumulative frequencies against the upper class boundaries (rather than the class marks/midpoints), so that each point shows how many observations lie below that boundary.
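A sketch of the ogive's points for the age data. It plots the cumulative frequencies against the upper class boundaries, the usual convention for a "less than" ogive (an assumption here; using the class marks instead is a one-line change), and starts the curve at zero on the lower boundary of the first class. `ogive_points` is a hypothetical name.

```python
from itertools import accumulate

def ogive_points(classes, freqs, unit=1):
    """Points (upper class boundary, cumulative frequency) for a 'less than' ogive.

    The first point (lower boundary of the first class, 0) anchors the curve
    on the horizontal axis. Assumes whole-number class limits.
    """
    half = unit / 2
    points = [(classes[0][0] - half, 0)]
    for (_, upper), cum in zip(classes, accumulate(freqs)):
        points.append((upper + half, cum))
    return points

classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [5, 13, 22, 22, 12, 5, 1]
for x, y in ogive_points(classes, freqs):
    print(x, y)
```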

23
Q

Box Plot or Box-and-Whisker Plot

A

*A graphical display of five (5) key numbers in a dataset, viz.:
–The minimum value
–The maximum value
–The first quartile
–The second quartile
–The third quartile
*The plot comprises a rectangular box (whose length equals the difference between the first and third quartiles) and two tails or whiskers (one from the minimum value to the first quartile and the other from the third quartile to the maximum value)
*The second quartile is the median, and it is marked in the plot by a line inside the box.
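The geometry described above, as a Python sketch using the five numbers from the Quantiles Calculation card (minimum 6, Q1 15, median 41, Q3 43, maximum 49). The function names are hypothetical, and the one-line text rendering is only a stand-in for a proper plotting library.

```python
def box_plot_geometry(minimum, q1, median, q3, maximum):
    """Parts of a box plot: the box from Q1 to Q3 and a whisker at each side."""
    if not (minimum <= q1 <= median <= q3 <= maximum):
        raise ValueError("the five numbers must be in ascending order")
    return {
        "box": (q1, q3),                  # rectangle from Q1 to Q3
        "box length": q3 - q1,            # difference between Q3 and Q1
        "median line": median,            # line drawn inside the box
        "lower whisker": (minimum, q1),   # tail from the minimum to Q1
        "upper whisker": (q3, maximum),   # tail from Q3 to the maximum
    }

def box_plot_line(minimum, q1, median, q3, maximum, width=45):
    """One-line text rendering: '-' whiskers, '=' box, '|' at the five numbers."""
    def col(x):  # map a data value onto a character column
        return round((x - minimum) / (maximum - minimum) * (width - 1))
    chars = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        chars[c] = "=" if col(q1) <= c <= col(q3) else "-"
    for x in (minimum, q1, median, q3, maximum):
        chars[col(x)] = "|"
    return "".join(chars)

five = (6, 15, 41, 43, 49)   # min, Q1, median, Q3, max
print(box_plot_geometry(*five)["box length"])  # 28
print(box_plot_line(*five))
```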

24
Q

Quantiles

A

Values obtained by dividing an ordered dataset into equal parts.
–Quartiles divide a data set into 4 equal parts
–Deciles divide a data set into 10 equal parts
–Percentiles divide a data set into 100 equal parts.

25
Q

Quantiles Calculation

A

Data: 6 47 49 15 43 41 7 39 43 41 36
Ordered data (n = 11): 6 7 15 36 39 41 41 43 43 47 49
*Lower quartile Q1 = the (n + 1)/4 th term = the 3rd term = 15
*Median Q2 = the (n + 1)/2 th term = the 6th term = 41
*Upper quartile Q3 = the 3(n + 1)/4 th term = the 9th term = 43
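The (n + 1) rule generalises to any quantile: the p-th quantile sits at position p(n + 1) in the ordered data (p = 1/4, 1/2, 3/4 for quartiles, k/10 for deciles, k/100 for percentiles). A minimal Python sketch, assuming linear interpolation when that position is fractional, since the card does not say how to handle that case. Python's standard-library `statistics.quantiles` uses the same (n + 1) positioning by default (its "exclusive" method), so it is included as a cross-check.

```python
import statistics

def quantile(data, p):
    """Value at position p(n + 1) in the ordered data, for 0 < p < 1.

    Fractional positions are linearly interpolated between the two
    neighbouring ordered values (an assumed convention; texts differ).
    """
    s = sorted(data)
    n = len(s)
    pos = p * (n + 1)                 # 1-based position in the ordered data
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    below = int(pos)                  # ordered value just below the position
    frac = pos - below                # how far towards the next ordered value
    return s[below - 1] + frac * (s[below] - s[below - 1])

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(sorted(data))          # [6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49]
print(quantile(data, 0.25))  # Q1: position 3, the 3rd term -> 15.0
print(quantile(data, 0.50))  # Q2: position 6, the 6th term -> 41.0
print(quantile(data, 0.75))  # Q3: position 9, the 9th term -> 43.0
print(quantile(data, 0.90))  # 90th percentile: position 10.8, between 47 and 49

# Cross-check against the standard library (default method is 'exclusive').
print(statistics.quantiles(data, n=4))   # [15.0, 41.0, 43.0]
```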

26
Q

Scatterplot

A

*Another graphical plot of the actual data in a dataset, applicable when each data point is an ordered pair
*It is a plot of two variables, say X and Y, on a Cartesian plane
*The purpose of the plot is to give a preliminary idea of whether a relationship exists between the two variables
*Each data point is recorded on the plot by a marker such as an ‘x’
*The focus is on discerning a pattern of ‘scatter’. If the variables are not related, the points plotted on the graph will appear to be scattered fairly randomly over the Cartesian plane
*If the points cluster noticeably (for example, along a line or curve), there may be a relationship between the variables

27
Q

Stem and Leaf Display

A

Pay attention to:
*the size of the dataset N
*the size of the Leaf Unit
*the heading (as in all tables & charts)
The Stem-and-Leaf Plot can be seen to be both
*a pictorial presentation
–an implicit bar chart
–an implicit histogram
*a tabular display
–an implicit frequency table
–an implicit cumulative frequency table
It provides additional information beyond the tables & charts mentioned so far:
*It is built from the actual data values; in fact, it is the only table/chart that allows the researcher to keep in touch with the actual data.

28
Q

Stem and Leaf Display Example

A

MTB > Stem-and-Leaf;
SUBC> By sex.
Stem-and-leaf of EC16A SEX = Male N = 44
Leaf Unit = 1.0
2 0 99
2 1
3 1 9
4 2 1
4 2
5 3 0
5 3
10 4 00233
12 4 79
17 5 01223
19 5 66
(8) 6 01112333
17 6 58
15 7 04
13 7 56778
8 8 011334
2 8 77

29
Q

More on Stem and Leaf Display

A

As the name suggests, the Plot is characterized by Stems and Leaves.
*Each observation from the dataset is divided into two components:
–a Stem (leading digit)
–a Leaf (trailing digit)
*For example, the observation 57 is represented by a stem 5 and a leaf 7.
The plot comprises three distinct columns, viz.:
*the leftmost column (this provides the cumulative frequencies)
*the second column (this contains the stems)
*the third column (this contains the leaves)
Each row has one and only one stem but may have several leaves. By linking the stem with the leaves in the row we can reconstruct all observations from the dataset that fall in that row. For example, the observations embedded in Row 8 are 40, 40, 42, 43 and 43, and the sole observation embedded in Row 4 is 21.

30
Q

Lastly on Stem and Leaf Display

A

Thus we can begin to see the Plot as a Frequency Table. Each row is essentially a class. Where then are the frequencies?
*The frequencies are located in the leftmost column. For example, in row #8 we encounter the number 10 in the leftmost column. What does that represent? It is the cumulative frequency of rows #1 through #8 (count the leaves from the top row to be sure).
*The frequencies are accumulated in two directions:
–from the top row
–from the last row
*One of the entries in the leftmost column has brackets around it. This is the row/class in which we can locate the observation that is at the centre of the dataset (i.e. when the dataset is listed in ascending order). Such an observation is called the median and the row/class is called the median class. The number in brackets is the frequency of that median class; it is not a cumulative frequency.
*From the Frequency Table, we can construct the Cumulative Frequency Table embedded in the plot
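The depth column can be reproduced with a short Python sketch. It rebuilds the Male data from the Minitab example by joining each stem with its leaves, then lays the rows out with two lines per stem (leaves 0–4, then 5–9), as that output does. The depths follow the rules on this card: count from the top above the median row, show the median row's own frequency in brackets, and count from the bottom below it. Minitab's own rules for choosing line widths are not reproduced, so treat `stem_and_leaf` as a hypothetical helper that assumes non-negative data.

```python
def stem_and_leaf(data, leaf_unit=1):
    """Rows (depth, stem, leaves) with two lines per stem.

    Assumes non-negative data; lines run from the one holding the smallest
    value to the one holding the largest, including empty lines in between.
    """
    values = sorted(int(x // leaf_unit) for x in data)
    n = len(values)

    def line_of(v):
        # Two lines per stem: leaves 0-4 on the first, 5-9 on the second.
        return (v // 10) * 2 + (1 if v % 10 >= 5 else 0)

    first, last = line_of(values[0]), line_of(values[-1])
    lines = {i: [] for i in range(first, last + 1)}
    for v in values:
        lines[line_of(v)].append(str(v % 10))

    m1, m2 = (n + 1) // 2, n // 2 + 1     # position(s) of the middle observation(s)
    rows, cum_top = [], 0
    for i in range(first, last + 1):
        count = len(lines[i])
        before, cum_top = cum_top, cum_top + count
        if count and before < m1 and cum_top >= m2:
            depth = f"({count})"          # median row: its own frequency in brackets
        elif cum_top < m2:
            depth = str(cum_top)          # above the median: accumulated from the top
        else:
            depth = str(n - before)       # below the median: accumulated from the bottom
        rows.append((depth, i // 2, "".join(lines[i])))
    return rows

# Male data recovered from the Minitab example (each stem joined with its leaves).
male = [9, 9, 19, 21, 30, 40, 40, 42, 43, 43, 47, 49, 50, 51, 52, 52, 53, 56, 56,
        60, 61, 61, 61, 62, 63, 63, 63, 65, 68, 70, 74, 75, 76, 77, 77, 78,
        80, 81, 81, 83, 83, 84, 87, 87]
print(f"N = {len(male)}  Leaf Unit = 1.0")
for depth, stem, leaves in stem_and_leaf(male):
    print(f"{depth:>4} {stem:>2} {leaves}")
```

The printed depths (2, 2, 3, 4, 4, 5, 5, 10, 12, 17, 19, (8), 17, 15, 13, 8, 2) match the leftmost column of the Minitab output.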

31
Q

Stem and Leaf Display-Plot

A

*What about the Charts embedded in the plot?
*Simply rotate the Plot through 90° and you will recognise that the leaves constitute a Simple Bar Chart
*With a little additional effort we can convert that bar chart to a Histogram for the dataset. (Replace the entries in the left column by the corresponding class marks.)
The Plot allows us to directly identify the following information:
*The smallest observation in the dataset
*The largest observation in the dataset
*The most frequently recurring observation in the dataset (otherwise called the mode)
*The observation which lies at the middle of the dataset (otherwise called the median)

32
Q

Continuation of Stem and Leaf Display-Plot

A

The Plot allows us to indirectly identify the following information:
*The first quartile of the dataset
*The third quartile of the dataset
*Any other percentile of the dataset
*Recall that if we have the smallest observation, the largest observation, the first quartile, the median and the third quartile we can construct the box plot.

33
Q
A