Marketing Analytics Flashcards
What is data?
A distinct piece of information
Two main data types
Quantitative data take on numeric values that allow us to perform mathematical operations (like the number of dogs). Quantitative data can be divided into continuous and discrete.
Categorical data are used to label a group or set of items (like dog breeds - Collies, Labs, Poodles, etc.). Categorical data can be divided into ordinal and nominal.
Categorical Ordinal vs Categorical Nominal
We can divide categorical data further into two types: Ordinal and Nominal.
Categorical Ordinal data take on a ranked ordering (like rating an interaction with a dog on a scale from Very Poor to Very Good).
Categorical Nominal data do not have an order or ranking (like dog breeds).
Continuous vs Discrete Data
Continuous data can be split into smaller and smaller units, and still a smaller unit exists. An example of this is the age of the dog - we can measure the units of the age in years, months, days, hours, seconds, but there are still smaller units that could be associated with the age.
Discrete data only takes on countable values. The number of dogs we interact with is an example of a discrete data type.
Four Aspects of Quantitative Data
There are four main aspects to analyzing quantitative data:
Measures of Center
Measures of Spread
Shape of the Data
Outliers
Measures of Center
There are three measures of center:
Mean
Median
Mode
Calculating the Mean
Sum of all values divided by the count of values
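The calculation above can be sketched in a few lines of Python. The function name `mean` and the sample ages are made up for illustration.

```python
def mean(values):
    """Return the sum of all values divided by the count of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical ages (in years) of five dogs.
ages = [2, 4, 4, 7, 8]
print(mean(ages))  # 5.0
```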
Median
The middle value of a dataset once it has been ordered from smallest to largest. When the dataset has an even number of values, the median is the average of the two middle values.
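A minimal sketch of the median calculation, assuming the usual convention that a dataset with an even number of values uses the average of its two middle values. The function name `median` and the sample data are illustrative only.

```python
def median(values):
    """Return the middle value of the dataset after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([8, 1, 5, 3]))  # 4.0
```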
Mode
The mode is the most frequently observed value in our dataset.
There might be multiple modes for a particular dataset, or no mode at all.
No Mode
If all observations in our dataset are observed with the same frequency, there is no mode. If we have the dataset:
1, 1, 2, 2, 3, 3, 4, 4
There is no mode, because all observations occur the same number of times.
Many Modes
If two (or more) numbers share the maximum frequency, then there is more than one mode. If we have the dataset:
1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9
There are two modes, 3 and 6, because these values share the maximum frequency of 3, while all other values appear only once.
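The mode rules above (no mode, one mode, many modes) can be sketched as follows. The function name `modes` is hypothetical. Returning an empty list for "no mode" is a design choice for this sketch, as is treating a dataset with only one distinct value as having that value as its mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                 # []
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))  # [3, 6]
```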
Notation
Notation is a common language used to communicate mathematical ideas
Random Variable
A random variable is a placeholder for the possible values of some process (the term ‘some process’ is intentionally loose). As was stated before, notation is useful in that it helps us take complex ideas and simplify them, often to a single letter or symbol. Random variables are represented by capital letters; X, Y, and Z are common choices.
We might have the random variable X, which is a placeholder for the possible values of the amount of time someone spends on our site, or the random variable Y, which is a placeholder for the possible values of whether or not an individual purchases a product.
X is a placeholder for the values that could possibly occur for the amount of time spent on our website: any number from 0 upward.
x1
First observed value of the random variable X
Measures of Spread
Measures of Spread are used to provide us an idea of how spread out our data are from one another. Common measures of spread include:
Range
Interquartile Range (IQR)
Standard Deviation
Variance
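The deck does not give formulas for the last two measures, so the sketch below assumes the population convention: variance is the average squared distance from the mean, and standard deviation is its square root. The sample convention divides by n - 1 instead of n. Function names and data are illustrative.

```python
import math


def variance(values):
    """Population variance: the average squared distance from the mean."""
    if not values:
        raise ValueError("the variance of an empty dataset is undefined")
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Population standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```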
Histograms
Histograms are useful for understanding the different aspects of quantitative data, such as the center, spread, and shape of the data.
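A histogram groups values into equal-width bins and shows how many values fall into each one. Below is a rough text-only sketch of that counting step. The function name, the bin width, and the time-on-site numbers are all made up for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each bin [left, left + bin_width)."""
    counts = {}
    for x in values:
        index = int((x - start) // bin_width)
        left = start + index * bin_width
        counts[left] = counts.get(left, 0) + 1
    # Sort by the left edge of each bin so the bins read in order.
    return dict(sorted(counts.items()))


# Hypothetical minutes spent on a site by nine visitors.
minutes = [1, 2, 2, 3, 5, 6, 6, 7, 12]
for left, count in histogram_counts(minutes, bin_width=5).items():
    # One '#' per observation in the bin.
    print(left, "to", left + 5, "#" * count)
```

Taller bars mark where the data cluster, and a bar far from the rest (here, the bin starting at 10) hints at a possible outlier.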
Calculating the 5 Number Summary
The five number summary consists of 5 values:
Minimum: The smallest number in the dataset.
Q1: The value such that 25% of the data fall below it.
Q2: The value such that 50% of the data fall below it.
Q3: The value such that 75% of the data fall below it.
Maximum: The largest value in the dataset.
Essentially, each quartile is just a median: Q2 is the median of the whole dataset, Q1 is the median of the lower half, and Q3 is the median of the upper half.
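One way to compute the five number summary is to take medians of halves of the sorted data. This sketch assumes that when the count is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles. All names here are illustrative.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using medians of halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, skip the middle value in both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1])


print(five_number_summary([1, 3, 5, 7, 9, 11, 13, 15]))  # (1, 4.0, 8.0, 12.0, 15)
print(five_number_summary([7, 1, 4, 2, 6, 3, 5]))        # (1, 2, 4, 6, 7)
```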
Range
The range is calculated as the difference between the maximum and the minimum.
IQR
The interquartile range is calculated as the difference between Q3 and Q1.
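Given a five number summary, both the range and the IQR are single subtractions. The sketch below assumes the summary is passed as a tuple in the order (minimum, Q1, Q2, Q3, maximum). The function names and example numbers are hypothetical.

```python
def data_range(summary):
    """Range: maximum minus minimum."""
    minimum, _q1, _q2, _q3, maximum = summary
    return maximum - minimum


def interquartile_range(summary):
    """IQR: Q3 minus Q1."""
    _minimum, q1, _q2, q3, _maximum = summary
    return q3 - q1


# Hypothetical five number summary: (minimum, Q1, Q2, Q3, maximum).
summary = (1, 4.0, 8.0, 12.0, 15)
print(data_range(summary))           # 14
print(interquartile_range(summary))  # 8.0
```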
In the upcoming sections, you will practice this with Katie and on your own.