Applied Economics & Statistics 1: An Introduction to Statistics, Measurement, and Presentation of Data* Flashcards
Explain why Statistics is important
- Statistics is not just used in economics.
- Required for many business subjects, social sciences, and hard sciences.
- Needed to process and analyse data, and the amount of data available increases constantly.
- Google tracks internet usage, supermarkets track purchases, police track crime data…
- Statistical techniques are required to make sense of the data and enable personal, business, and scientific decision making.
What’s a ‘statistic’?
A statistic is a number used to communicate a piece of information. E.g.:
The inflation rate is 5.2%.
The average mark on a module is 60%.
The price of a new Toyota Supra is £50,545.
What’s the name for ‘a number used to communicate a piece of information’?
A statistic
Define ‘statistics’
The science of collecting, organizing, presenting, analysing, and interpreting data to assist in making more effective decisions.
- How does this year’s rate of inflation compare with last year’s? Is
there a trend of increasing or decreasing inflation? Is there a
relationship between inflation and interest rates?
- How does the module mark vary compared to previous years, and
other modules? Does changing the lecturer teaching the module
affect average marks?
- The price of a new Toyota Supra is £50,545. The Mazda MX-5 is
cheaper, costing £25,825. What are the differences in the cars’ specs,
and how are they related to price? What other information would you
need for a purchase decision?
What are the different types of statistics?
Descriptive and Inferential Statistics
What’s ‘Descriptive Statistics’?
- Methods of organizing, summarizing, and presenting data in an informative way.
- Organize and summarize data with graphs and tables.
- Statistical measures describe the characteristics of a distribution.
What’s the name for ‘methods of organizing, summarizing, and presenting data in an
informative way’?
Descriptive Statistics
Define ‘Inferential Statistics’
The methods used to estimate a property of a population on the basis of a
sample
What’s the name for ‘the methods used to estimate a property of a population on the basis of a
sample’?
Inferential Statistics
Define ‘population’ in statistics
The entire set of individuals / objects of interest or measurements
obtained from all individuals / objects of interest
Define ‘sample’ in statistics
A portion, or part, of the population of interest
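The relationship between a population and a sample can be illustrated with a small sketch. The marks below are made-up values, and `sample_mean_estimate` is a hypothetical helper name, not something from the module. It draws a simple random sample and uses the sample mean as an estimate of the population mean, which is the basic idea behind inferential statistics.

```python
import random
import statistics

def sample_mean_estimate(population, sample_size, seed=0):
    """Draw a simple random sample and return (sample, sample mean)."""
    rng = random.Random(seed)              # seeded so the draw is repeatable
    sample = rng.sample(population, sample_size)
    return sample, statistics.mean(sample)

# Hypothetical population: module marks for 200 students (made-up values).
rng = random.Random(42)
population = [rng.randint(30, 90) for _ in range(200)]

sample, estimate = sample_mean_estimate(population, sample_size=25)
print("population mean:", statistics.mean(population))
print("sample mean (estimate):", estimate)
```

The two printed means will usually be close but not identical; that gap is the sampling error that inferential methods try to quantify.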
Describe & explain the types of statistical variables
- Qualitative variable - The characteristic being studied is non-numeric. E.g.: gender, religion, eye colour.
- Quantitative variable - The characteristic is numerical and the numbers have a meaning. E.g.: number of children in a family, hourly wage, minutes remaining in the lecture. It can be further divided into:
1. Discrete - These can assume only certain discrete values. There are usually “gaps” between the values. E.g.: children in a family; this variable can only take on a discrete set of values: 0, 1, 2, 3, etc.
2. Continuous - These can assume any value within a range and can be measured to any required degree of precision. E.g.: weights, heights, and time. The time it takes each student to finish an exam can fall anywhere along the positive real line, i.e. from 0 to infinity, and could be measured to the millisecond.
What are the levels of measurement?
Why are they different?
- Data can be further classified into 4 levels of measurement:
1. Nominal data.
2. Ordinal data.
3. Interval data.
4. Ratio data.
Each requires different methods for summarizing and presenting, and a
different type of statistical analysis.
Describe & explain ‘nominal data’
- Nominal Data - Data represented as labels or names. They have no order. They can only be classified and counted. E.g., hair colour, religion, sexual orientation, gender.
- No other mathematical operations are permitted. E.g., even if we assign numerical values like heterosexual=1, gay=2, bisexual=3, it makes no sense to say that because 1+2=3, heterosexual+gay=bisexual.
- Labels are mutually exclusive, e.g., an individual can’t be recorded as both Christian and Muslim.
- Labels are exhaustive: every individual observation must belong to a category (even if it’s ‘Other’).
What’s the name for ‘data recorded at the nominal level of measurement represented as labels
or names; they have no order; they can only be classified and counted’?
Nominal Data
Describe & explain ‘ordinal data’
- Ordinal Data - Data based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked or counted. E.g., university league table position, educational attainment, level of satisfaction with your lecturer.
- Differences between data values are meaningless: university A being above B in the league table only says A is better than B, not by how much.
- Can be compared and ranked, as labels have relative values.
- Can also find the absolute and relative size of each label.
What’s the name for ‘data based on a relative ranking or rating of items based on a defined attribute or qualitative
variable; variables based on this level of measurement are only ranked or
counted’?
Ordinal Data
Describe & explain ‘interval data’
- Data where the interval, or distance, between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement.
- Equal differences in the values are represented by equal differences in the measurements.
- Known units of measurement, e.g. degrees Celsius, or shoe size. The difference between 10°C and 15°C is the same as that between 20°C and 25°C.
- Zero is a point on the scale, not the absence of a condition.
- Ratios don’t make sense with interval data: a size 28 dress is not twice as large as a size 14.
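A short arithmetic sketch may make the "differences yes, ratios no" point concrete. It reuses the temperatures from the card above; the conversion to kelvin (a scale whose zero does mean an absence of thermal energy) is an added illustration, and `celsius_to_kelvin` is a hypothetical helper name.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale:
diff_low = 15 - 10     # from 10°C to 15°C
diff_high = 25 - 20    # from 20°C to 25°C
print("equal differences:", diff_low == diff_high)

# Ratios are not: 20°C looks like "twice" 10°C only because of where
# the Celsius zero happens to sit.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print("ratio in Celsius:", ratio_celsius)
print("ratio in kelvin:", round(ratio_kelvin, 4))
```

The ratio changes when the zero point moves, which is why "twice as hot" is not a valid statement about Celsius readings.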
What’s the name for ‘data recorded where the interval or the distance between values is meaningful and based on a scale with a known unit of measurement’?
Interval data
Describe & explain ‘ratio data’
- Data that’s based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale.
- Zero means an absence of the characteristic.
- Most quantitative data is recorded at this level. E.g.: weight, age, number of family members, income, population, investment, distance travelled, etc.
- Ratios are meaningful: a pint holds twice as much beer as a half pint.
What’s the name for ‘data based on a scale with a known unit of measurement and a meaningful interpretation of zero on
the scale’?
Ratio data
State the ways data can be presented
- Frequency Distribution Table.
- Histogram, Frequency Polygon and Cumulative Frequency Distribution
- Bar Chart, Line Chart, Pie Chart, Scatter Plot
Describe a frequency distribution and what the point of it is
Include definition
- Suppose we have a set of data organised in tabular form. The table doesn’t give us much of an idea of how the data are distributed.
- That’s because it only presents the raw data.
- A table that’s organized in some way would be much more useful.
- A way of achieving this is with a frequency distribution, or frequency table.
- Frequency Distribution - Grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
What’s the name for ‘grouping of qualitative data into mutually exclusive and collectively
exhaustive classes showing the number of observations in each class’?
Frequency Distribution
State the key components of a frequency distribution
- Class Interval
- Class Midpoint
- Class Frequency
- Relative Frequency
- Cumulative Frequency
How do you obtain the ‘class interval’ in a frequency distribution?
Obtained by subtracting the lower limit of a class
from the lower limit of the next class.
Which component of a frequency distribution is ‘obtained by subtracting the lower limit of a class
from the lower limit of the next class’?
Class Interval
What’s the ‘class midpoint’ in a frequency distribution?
A point that divides a class into two equal parts; the average of the upper and lower class limits
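The class interval and class midpoint definitions above amount to two one-line formulas. The sketch below uses hypothetical function names and made-up class limits (10 up to 20, then 20 up to 30) purely for illustration.

```python
def class_interval(lower_limit, next_lower_limit):
    """Lower limit of the next class minus lower limit of this class."""
    return next_lower_limit - lower_limit

def class_midpoint(lower_limit, upper_limit):
    """Average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

# Hypothetical classes: 10 up to 20, followed by 20 up to 30.
print(class_interval(10, 20))   # 10
print(class_midpoint(10, 20))   # 15.0
```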
What’s the ‘class frequency’ in a frequency distribution?
The number of observations in each class
In a frequency distribution, what’s ‘a point that divides a class into two equal parts; the average of the upper and lower class limits’?
Class Midpoint
In a frequency distribution, what’s ‘the number of observations in each class’?
Class Frequency
Describe & explain how a frequency distribution is constructed
Step 1: Decide on the number of classes. A useful rule to determine the number of classes (k) is the “2 to the k rule.” This is where you choose the smallest number for k such that 2^k > n, where n is the total number of observations.
For example, let’s say n here is 30, so:
2^4 = 16 < 30
2^5 = 32 > 30
2^6 = 64 > 30.
Hence, the smallest k that satisfies the condition is 5, and so 5 classes should be enough.
Step 2: Determine the class interval. Class intervals should be the same for all classes. First calculate the range of the dataset, the difference between the highest and lowest values. Then divide this by the number of classes obtained in step 1. If the result is not a whole number, round up.
Step 3: Set the individual class limits. The lower limit of the first class needs to be lower than the lowest value in the dataset. The limits of every other class then follow by repeatedly adding the class interval from step 2.
Step 4: Determine the frequency for each class interval by categorising each data point into the appropriate class. There is some loss of detail from grouping the data like this. We can’t tell
what the lowest data point is, or which data point is which.
Step 5: Add a column for relative frequency (the fraction, or proportion, of
observations in each class) by dividing the frequency for that class by total frequency. This allows us to understand the relative size of each class.
Step 6: Add a column for cumulative frequency. This accumulates the frequencies as we move up the class intervals, allowing us to quickly see how many observations fall below the upper limit of each class.
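The six steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names and the data are made up, each class runs from its lower limit up to (but not including) its upper limit, and if the classes from Step 1 do not reach the highest value, further classes are appended.

```python
import math

def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def frequency_distribution(data, first_lower_limit):
    """Return one row per class, following Steps 1 to 6."""
    if first_lower_limit > min(data):
        raise ValueError("first lower limit must not exceed the lowest value")
    n = len(data)
    k = number_of_classes(n)
    # Step 2: range divided by the number of classes, rounded up.
    # Assumption: use at least 1 so identical values do not give width 0.
    interval = max(1, math.ceil((max(data) - min(data)) / k))
    # Step 3: class limits, each class is "lower up to (not including) upper".
    # Assumption: keep adding classes until the highest value is covered.
    limits = []
    lower = first_lower_limit
    while len(limits) < k or lower <= max(data):
        limits.append((lower, lower + interval))
        lower += interval
    rows = []
    cumulative = 0
    for lower, upper in limits:
        frequency = sum(1 for x in data if lower <= x < upper)  # Step 4
        cumulative += frequency                                  # Step 6
        rows.append({
            "class": (lower, upper),
            "frequency": frequency,
            "relative": frequency / n,                           # Step 5
            "cumulative": cumulative,
        })
    return rows

# Hypothetical data set of 10 observations (made-up values).
data = [12, 15, 21, 22, 25, 30, 31, 38, 40, 45]
table = frequency_distribution(data, first_lower_limit=10)
for row in table:
    print(row)
```

With 10 observations the rule gives k = 4 (2^4 = 16 > 10), and the range of 33 divided by 4 rounds up to a class interval of 9, so the classes start at 10, 19, 28, and 37.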
How is qualitative data usually presented?
Via a bar chart
What type of data do bar charts show?
Qualitative data
What’s a bar chart?
- A graph that shows qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.
- Horizontal axis shows the variable of interest.
- The vertical axis shows the frequency of each possible outcome.
- There is a gap between the bars.
What’s the name for ‘a graph that shows qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars’?
Bar Chart
Describe pie charts’ use
Pie charts can also be useful to present qualitative data. They are especially useful when comparing relative frequencies.
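The numbers behind both chart types can be sketched without a plotting library. The eye-colour observations below are made-up values. The class frequencies are what a bar chart would draw as bar heights, and the relative frequencies are what a pie chart would draw as slice sizes.

```python
from collections import Counter

# Hypothetical qualitative data: eye colour of 10 people (made-up values).
eye_colours = ["brown", "blue", "brown", "green", "brown",
               "blue", "brown", "hazel", "blue", "brown"]

counts = Counter(eye_colours)      # class frequencies (bar heights)
total = sum(counts.values())

for colour, frequency in counts.most_common():
    share = frequency / total      # relative frequency (pie slice)
    bar = "#" * frequency          # crude text stand-in for a bar
    print(f"{colour:<6} {bar:<6} {frequency:>2}  {share:.0%}")
```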