Week 2 Flashcards
What is “Churn Rate”?
refers to the percentage of customers who stop doing business with a company over a specific period. It is a key metric, especially for subscription-based businesses, as it indicates how well the company retains its customers.
How do you calculate ‘Customer Churn’?
Churn Rate = (Number of Customers Lost in a Period / Total Customers at the Beginning of the Period) × 100
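A minimal sketch of the churn formula in Python. The function name and the sample figures (1,000 customers at the start, 50 lost) are made up for illustration.

```python
def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Return churn for a period as a percentage of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return customers_lost * 100 / customers_at_start


# Hypothetical period: 1,000 customers at the start, 50 of them lost.
monthly_churn = churn_rate(customers_lost=50, customers_at_start=1000)
print(f"{monthly_churn:.1f}%")  # 5.0%
```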
What is “Customer Tenure”?
refers to the length of time (in months) a customer has been with the company. It is a measure of customer loyalty and can indicate how long a customer has remained subscribed to the company’s services.
Tenure is often used alongside metrics like “Churn” to assess customer retention and loyalty trends.
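One possible way to compute tenure in whole months from a signup date. The function name and the dates are hypothetical, and counting only completed months is an assumption; a company may define tenure differently.

```python
from datetime import date


def tenure_in_months(signup: date, as_of: date) -> int:
    """Count the whole months elapsed between signup and as_of."""
    months = (as_of.year - signup.year) * 12 + (as_of.month - signup.month)
    if as_of.day < signup.day:
        months -= 1  # the current month is not yet complete
    return max(months, 0)


# Hypothetical customer who signed up on 15 Jan 2023, measured on 10 Mar 2024.
print(tenure_in_months(date(2023, 1, 15), date(2024, 3, 10)))  # 13
```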
What are the common techniques used in handling missing data?
Identifying Missing Data
Removing Missing Data
Imputing Missing Data
What is ‘Identifying Missing Data’?
detecting where data is absent or incomplete in a dataset.
What is ‘Removing Missing Data’?
involves deleting rows or columns that contain missing values.
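A short sketch of row-wise and column-wise removal, assuming the data sits in a pandas DataFrame. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table with two missing values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "tenure": [12, np.nan, 5, 30],
    "monthly_charges": [29.9, 56.1, np.nan, 42.0],
})

rows_dropped = df.dropna()        # drop every row that contains a missing value
cols_dropped = df.dropna(axis=1)  # drop every column that contains a missing value

print(rows_dropped["customer_id"].tolist())  # [1, 4]
print(cols_dropped.columns.tolist())         # ['customer_id']
```

Dropping rows keeps all features but loses observations. Dropping columns keeps all observations but loses features.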
What is ‘Imputing Missing Data’?
involves filling in the missing values with substituted values without deleting rows or columns.
What are the 2 Common Approaches to impute missing data?
Mean/Median/Mode Imputation and Forward/Backward Fill
What is ‘Mean/Median/Mode’ imputation?
replaces missing values with a central tendency value to ensure that the dataset can still be used for analysis or modeling. While easy to implement, these methods can distort the original distribution of the dataset and may introduce bias.
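A minimal sketch of central-tendency imputation, assuming pandas. The columns and values are hypothetical. Mean and median suit numeric columns, while mode is the usual choice for categorical ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one categorical column, each with a gap.
df = pd.DataFrame({
    "tenure": [2.0, 4.0, np.nan, 30.0],
    "contract": ["monthly", None, "monthly", "yearly"],
})

# Mean of the observed values 2, 4, 30 is 12; the median is 4.
mean_filled = df["tenure"].fillna(df["tenure"].mean())
median_filled = df["tenure"].fillna(df["tenure"].median())

# mode() ignores missing values; take the first (most frequent) entry.
mode_filled = df["contract"].fillna(df["contract"].mode()[0])

print(mean_filled.tolist())    # [2.0, 4.0, 12.0, 30.0]
print(median_filled.tolist())  # [2.0, 4.0, 4.0, 30.0]
print(mode_filled.tolist())    # ['monthly', 'monthly', 'monthly', 'yearly']
```

The gap between the mean (12) and the median (4) shows how a single large value can pull a mean-imputed entry away from typical values.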
What is ‘Forward/Backward Fill’?
involves filling missing values with the previous or next available values, respectively, based on the order of the data.
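A small sketch of both fills, assuming an ordered pandas Series. The values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with gaps in the middle and at the end.
s = pd.Series([10.0, np.nan, np.nan, 40.0, np.nan])

forward = s.ffill()   # carry the previous observed value forward
backward = s.bfill()  # pull the next observed value backward

print(forward.tolist())   # [10.0, 10.0, 10.0, 40.0, 40.0]
print(backward.tolist())  # the last entry stays NaN: no later value exists
```

A gap at the very start (for forward fill) or the very end (for backward fill) stays missing, because there is no neighbouring value to copy.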
What is ‘Visual Inspection’ in Identifying Missing Data?
uses graphical representations to identify data gaps, patterns, and abnormalities.
What are ‘Data Summary Tables’ in Identifying Missing Data?
show an overview of the missing values in the dataset.
What are ‘Box Plots’ in Identifying Missing Data?
summarize the distribution of a variable and show outliers. If data is missing, the plot may show visible gaps or unusual behavior.
What are ‘Heatmaps’ in Identifying Missing Data?
a missing data heatmap highlights where missing data exists in a dataset. It uses color to indicate missing vs. non-missing values across the entire dataset.
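A text-only stand-in for a missing-data heatmap, assuming pandas. The boolean mask built here is the same grid a plotting library would color; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one gap in each column.
df = pd.DataFrame({
    "tenure": [1.0, np.nan, 24.0],
    "monthly_charges": [29.9, 56.1, np.nan],
})

mask = df.isna()  # True where a value is missing, False otherwise

# Render each row of the mask: "X" marks a missing cell, "." a present one.
rows = ["".join("X" if missing else "." for missing in row) for row in mask.to_numpy()]
for line in rows:
    print(line)
# ..
# X.
# .X
```

With a plotting library installed, the same mask could be passed to a heatmap function (for example seaborn's `heatmap`) to get the colored version.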
What is ‘Descriptive Statistics’ in Identifying Missing Data?
summarizes and describes the basic characteristics of a dataset.
What are ‘Null’ or ‘NaN’ Counts?
count the number of missing values in each column (or feature) of the dataset. In Python, missing data typically appears as NaN (Not a Number) or as a null value such as None.
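A minimal sketch of per-column missing counts, assuming pandas. The DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table: two gaps in "tenure", one in "monthly_charges".
df = pd.DataFrame({
    "tenure": [12, np.nan, 5, np.nan],
    "monthly_charges": [29.9, 56.1, np.nan, 42.0],
})

# isna() flags each missing cell; sum() counts the flags per column.
missing_counts = df.isna().sum()
print(missing_counts)  # tenure -> 2, monthly_charges -> 1
```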
What is ‘Percentage of Missing Data’?
provides a proportion of missing data relative to the total number of entries in a column.
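A short sketch of the percentage calculation, assuming pandas. The mean of a True/False missing flag is the fraction of missing entries. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table with four rows per column.
df = pd.DataFrame({
    "tenure": [12, np.nan, 5, np.nan],
    "monthly_charges": [29.9, 56.1, np.nan, 42.0],
})

# Fraction of missing entries per column, scaled to a percentage.
missing_pct = df.isna().mean() * 100
print(missing_pct)  # tenure -> 50.0, monthly_charges -> 25.0
```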
What are the 3 ‘Patterns of Analysis of Missing Data’?
Missing Completely at Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR)
What are the 3 types of Missing Data?
Missing Completely at Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR)
What is ‘Missing Completely at Random (MCAR)’?
where the absence of data is unrelated to any other variables in the dataset. Removing this type of missing data generally won’t introduce bias.
This occurs when the reason for the absence of a value is entirely random and unrelated to any other variables in the dataset. For example, a survey respondent accidentally skips a question, resulting in a missing value in the dataset.
What is ‘Missing At Random (MAR)’?
when the missingness is related to observed data, meaning that it is not completely random but can be explained by other variables.
The absence of data isn’t completely random and can be explained by other observed variables in the dataset. For example, in a health survey, individuals working night shifts may be less likely to respond to a survey conducted during daytime hours. The missingness of their responses is related to their work schedules, an observed variable, but not directly to their health status, which is the variable of interest.
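A rough sketch of how the night-shift example might be checked, assuming pandas. The column names and values are invented. Comparing the missing rate of one variable across groups of an observed variable can suggest MAR, but it does not prove it.

```python
import numpy as np
import pandas as pd

# Hypothetical survey: health scores are missing mostly for night-shift workers.
df = pd.DataFrame({
    "shift": ["night", "night", "night", "day", "day", "day"],
    "health_score": [np.nan, np.nan, 7.0, 8.0, 6.0, 9.0],
})

# Share of missing health scores within each shift group.
missing_rate_by_shift = df["health_score"].isna().groupby(df["shift"]).mean()
print(missing_rate_by_shift)  # day -> 0.0, night -> about 0.67
```

A clear difference between groups is consistent with missingness that depends on the observed shift variable. It cannot rule out a dependence on the unobserved health values themselves.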