Intro Flashcards
What infographic do you never use
Pie charts
Statistics
Statistics is the branch of mathematics that examines ways to process and analyse data. Statistics provides procedures to collect and transform data in ways that are useful to business decision makers. To understand anything about statistics, you first need to understand the meaning of a variable.
4 fundamental terms of statistics
Population
Sample
Parameter
Statistic
Population
A population consists of all the members of a group about which you want to
draw a conclusion.
Sample
A sample is the portion of the population selected for analysis.
Parameter
A parameter is a numerical measure that describes a characteristic of a
population (a measure used to describe a population). Greek letters
(e.g. μ, σ) denote parameters.
Statistic
A statistic is a numerical measure that describes a characteristic of a sample
(a measure calculated from sample data). Roman letters (e.g. X̄, S) denote
statistics.
2 types of statistics
Descriptive statistics
Inferential statistics
Descriptive statistics
Collecting, summarising and presenting data
Inferential statistics
Drawing conclusions about a population based on sample
data/results (i.e. estimating a parameter based on a statistic)
3 steps of descriptive statistics
Collect data
Present data
Characterise data
Collect data example
Survey
Present data example
Tables and graphs
Characterise data example
Sample mean
Steps of inferential statistics
Estimation
Hypothesis Testing
Estimation example
Estimate the population mean weight (parameter) using the
sample mean weight (statistic)
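The estimation idea can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 weights, the sample size of 50 and the names `population`, `mu` and `x_bar` are all made up for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weights (kg) of 1,000 items
population = [random.gauss(100, 15) for _ in range(1000)]

# Parameter: calculated from every member of the population
mu = statistics.mean(population)

# Statistic: calculated from a sample of 50 members drawn without replacement
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.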
Hypothesis testing example
Test the claim that the population mean weight is 100 kilos
4 important sources when collecting data
Data distributed by organisation or individual
Designed experiment
Survey
Observational study
2 classifications of data sources
Primary
Secondary
2 types of data
Categorical (defined categories)
Numerical (quantitative)
2 types of numerical variables
Discrete (counted items)
Continuous (measured characteristics)
Categorical data
Simply classifies data into categories (e.g. marital status, hair
colour, gender)
Numerical discrete data e.g.
Counted items: finite number of items (e.g. number of
children, number of people who have type-O blood)
Numerical continuous data e.g.
Measured characteristics: infinitely many possible values
(e.g. weight, height)
4 levels of Measurement and Measurement Scales from highest to lowest
Ratio data
Interval data
Ordinal data
Nominal data
Ratio data
Differences between measurements are meaningful and a true zero
exists
Interval data
Differences between measurements are meaningful but no true zero
exists
Ordinal data
Ordered categories (rankings, order or scaling)
Nominal data
Categories (no ordering or direction)
Ratio data e.g.
Height, weight, age, weekly food spending
Interval data e.g.
Temperature in degrees Celsius, standardised exam score
Ordinal data e.g.
Rankings in a tennis tournament, student letter grades, Likert
scales
Nominal data e.g.
Marital status, type of car owned, gender, hair colour
What data is charted and how is this done
Categorical data through the use of summary tables
What data is graphed and how is this done
Numerical data through the use of histograms, polygons, ogives, scatter diagrams and time-series plots (bar charts and pie charts are for categorical data)
Ordered array
A sequence of data in rank order. Shows range, min to max. Provides some signals about variability within the range and may help identify outliers. If the data set is large or if the data is highly variable the ordered array is less useful.
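A minimal sketch of an ordered array in Python, using a made-up list of ten ages (the data and variable names are assumptions for illustration only):

```python
# Hypothetical raw data: ages of 10 survey respondents
ages = [34, 21, 45, 29, 21, 52, 38, 27, 41, 30]

ordered = sorted(ages)                      # the ordered array, min to max
lowest, highest = ordered[0], ordered[-1]   # first and last entries
data_range = highest - lowest               # range = max - min

print(ordered)      # [21, 21, 27, 29, 30, 34, 38, 41, 45, 52]
print(data_range)   # 31
```

With only ten values the minimum, maximum and any repeated values are easy to spot; with thousands of values the same list would be too long to read, which is the limitation noted above.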
Frequency distribution
A frequency distribution is a summary table in which data are arranged into numerically ordered classes or intervals. The number of observations in each ordered class or interval becomes the corresponding frequency of that class or interval.
Why use a frequency distribution
It is a way to summarise numerical data. It condenses the raw data into a more useful form. It allows for a quick visual
interpretation of the data and first inspection of the shape of the data.
Frequency distribution rules
Classes must be mutually exclusive and collectively exhaustive: no classes overlap, each data value belongs to exactly one class, and every data value falls into some class. Each class grouping has the same width. Usually use at least 5 but no more than 15 class groupings. Round the interval width up to get desirable endpoints.
How is width of interval determined in a frequency distribution
range/number of desired class groupings
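The width rule can be tried out on a small data set. The sketch below is one possible way to do it, not a prescribed method: the 20 data values, the choice of 5 classes and the decision to round the width up to a multiple of 10 are all assumptions made for the example.

```python
import math

# Hypothetical numerical data (20 observations)
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
k = 5  # desired number of class groupings (assumed)

raw_width = (max(data) - min(data)) / k     # range / classes = 46 / 5 = 9.2
width = math.ceil(raw_width / 10) * 10      # round up to a convenient width (10)
start = (min(data) // width) * width        # first lower boundary (10)

# Count how many values fall in each class [lower, lower + width)
freq = [0] * k
for x in data:
    freq[(x - start) // width] += 1

for i, f in enumerate(freq):
    lower = start + i * width
    print(f"{lower} to under {lower + width}: {f}")
```

Each value lands in exactly one class and the frequencies add up to the number of observations, which matches the mutually exclusive and collectively exhaustive rules.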
Histogram
A graph of the data in a frequency distribution is called a histogram. The class boundaries (or class midpoints) are shown on the horizontal axis. The vertical axis is either frequency, relative frequency, or percentage. Bars of the appropriate heights are used to represent the frequencies (number of observations) within each class or the relative frequencies (percentage) of that class.
Important note about histograms
A histogram has no gaps between adjacent bars, even though Excel draws gaps by default.
What allows you to compare two or more variables
Frequency polygons and ogives
Scatter diagrams
Scatter diagrams are used to examine possible relationships between two numerical variables. In a scatter diagram: one variable is measured on the vertical axis (Y) and the other variable is measured on the horizontal axis (X).
Time series plot
A time-series plot is used to study patterns in the values of a
variable over time. In a time-series plot: one variable is measured on the vertical
axis and the time period is measured on the horizontal axis.
Stem and leaf display
A quick and simple way to see distribution details in a data set.
Method: Separate the sorted data series into groups (the stems) and the values within each group (the leaves).
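The method can be sketched in Python as follows. The ten two-digit values are made up, and using the tens digit as the stem and the units digit as the leaf is an assumption that suits this particular data.

```python
from collections import defaultdict

# Hypothetical data: ten two-digit values
data = [27, 21, 38, 24, 30, 26, 41, 24, 32, 27]

stems = defaultdict(list)
for value in sorted(data):            # sort first so leaves come out in order
    stem, leaf = divmod(value, 10)    # tens digit = stem, units digit = leaf
    stems[stem].append(leaf)

lines = [f"{stem} | {' '.join(str(d) for d in leaves)}"
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
# 2 | 1 4 4 6 7 7
# 3 | 0 2 8
# 4 | 1
```

The length of each row shows where the values cluster, while the individual digits keep the original data visible.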
Tables and charts for numerical data
Photo 1
Stem and leaf display example
Photos 2-5
Frequency distribution example
Photos 6-10
Histogram example
Photo 11
Frequency polygon example
Photo 12
The ogive example
Photo 13
Scatter diagrams example
Photo 14
Time series plot example
Photo 15
Variables
Variables are characteristics of items or individuals.
Data
Data are the observed values of variables.
Operational definition
Defines how a variable is to be measured.
Big Data
Large data sets characterised by their volume, velocity and variety.
Statistical packages
Computer programs designed to perform statistical analysis.
Primary sources
Provide information collected by the data analyser.
Secondary sources
Provide data collected by another person or organisation.
Focus group
An observational study. A group of people who are asked about attitudes and opinions for qualitative research.
Discrete variables
Can only take a finite or countable number of values.
Continuous variables
Can take any value between specified limits.
Problems for Section 1.4; Chapter 1 review problems; Problems for Sections 2.1, 2.2, 2.3, 2.4, 2.5 and 2.6; Chapter 2 review problems
Work through problems in textbook
Summary table
Summarises categorical or numerical data; gives the frequency, proportion or percentage of data values in each category or class.
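A summary table for categorical data can be built with a simple tally. The sketch below uses made-up survey responses; the categories and names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical survey responses: preferred payment method
responses = ["card", "cash", "card", "app", "card",
             "cash", "app", "card", "card", "cash"]

counts = Counter(responses)   # frequency of each category
n = len(responses)

# category -> (frequency, percentage), most frequent category first
summary = {cat: (freq, 100 * freq / n) for cat, freq in counts.most_common()}

for category, (freq, pct) in summary.items():
    print(f"{category:<6}{freq:>4}{pct:>8.1f}%")
```

The percentages add up to 100 because every response falls into exactly one category.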
Summary table examples
Photos 16-17
Bar chart
Graphical representation of a summary table for categorical data; the length of each bar represents the proportion, frequency or percentage of data values in a category.
Pie Chart
Graphical representation of a summary table for categorical data; each category is represented by a slice of a circle whose area represents the proportion or percentage share of the category relative to the total of all categories.
Class width (frequency distribution)
Distance between upper and lower boundaries of a class.
Range
Distance measure of variation; difference between maximum and minimum data values.
Class boundaries (frequency distribution)
Upper and lower values used to define classes for numerical data.
Class midpoint
Centre of a class; representative value of class.
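As a small illustration, the midpoint of each class is halfway between its lower and upper boundaries. The boundaries and the width of 10 below are assumed values, not taken from the textbook examples.

```python
# Hypothetical classes: 10 to under 20, 20 to under 30, and so on
lower_boundaries = [10, 20, 30, 40, 50]
width = 10  # class width (upper boundary minus lower boundary)

# Midpoint = lower boundary + half the class width
midpoints = [lower + width / 2 for lower in lower_boundaries]
print(midpoints)   # [15.0, 25.0, 35.0, 45.0, 55.0]
```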
Relative Frequency Distributions and Percentage Distributions
A relative frequency distribution is obtained by dividing the frequency in each class by the total number of values. From this a percentage distribution can be obtained by multiplying each relative frequency by 100%.
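The two calculations can be sketched as below. The class labels and frequencies are made up for the example.

```python
# Hypothetical frequency distribution with 5 classes
class_labels = ["10 to under 20", "20 to under 30", "30 to under 40",
                "40 to under 50", "50 to under 60"]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)   # total number of values: 20

# Relative frequency = class frequency / total number of values
relative = [f / total for f in frequencies]

# Percentage = relative frequency x 100 (computed as 100 * f / total)
percentage = [100 * f / total for f in frequencies]

for label, rel, pct in zip(class_labels, relative, percentage):
    print(f"{label:<16}{rel:>6.2f}{pct:>7.1f}%")
```

The relative frequencies sum to 1 and the percentages sum to 100, which is a quick check that no values were lost or double counted.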
Relative frequency distribution
Summary table for numerical data which gives the relative frequency of data values in each class.
Percentage distribution
Summary table for numerical data which gives the percentage of data values in each class.
Cumulative percentage distribution
Summary table for numerical data; gives the cumulative percentage for each successive class, that is, the percentage of values that are less than a certain value.
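A minimal sketch of the running totals behind a cumulative percentage distribution. The frequencies are made up, and reporting each cumulative percentage against the upper boundary of its class is a convention assumed for this example.

```python
from itertools import accumulate

# Hypothetical classes, identified by their upper boundaries
upper_boundaries = [20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Running total of frequencies, then converted to percentages
cumulative_freq = list(accumulate(frequencies))               # [3, 9, 14, 18, 20]
cumulative_pct = [100 * c / total for c in cumulative_freq]   # [15.0, 45.0, 70.0, 90.0, 100.0]

for upper, pct in zip(upper_boundaries, cumulative_pct):
    print(f"less than {upper}: {pct:.0f}%")
```

The last cumulative percentage is always 100, since every value lies below the upper boundary of the final class.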
Percentage polygon
Graphical representation of a percentage distribution.
Cumulative percentage polygon (ogive)
Graphical representation of a cumulative percentage distribution.
Chartjunk
Unnecessary information and detail that reduces the clarity of a graph.