A-kassen Flashcards
What is a label?
What we want to predict.
What is the label in the paper?
Churn/loyal members.
Why did you use Jupyter?
Jupyter is a notebook environment that supports different programming languages, Python being one of them. It is a powerful tool that enabled us to do the data mining by specifying and programming our own process steps.
What was the main purpose of the data preparation?
To take a critical look at the dataset and make it as clean and small as possible, because this gives more predictive power.
What is the first step in the data preparation?
Exploration phase
What does the exploration phase entail?
A three-step process: missing data, data types, and outliers.
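The three exploration steps above can be sketched in a notebook with pandas. This is a minimal illustration on an invented toy dataset; the column names (`age`, `tenure_months`, `churn`) are hypothetical, not the paper's actual attributes.

```python
import pandas as pd

# Hypothetical member data; columns are illustrative only.
df = pd.DataFrame({
    "age": [27, 34, None, 45, 29],
    "tenure_months": [12, 60, 3, 120, 8],
    "churn": ["yes", "no", "yes", "no", "yes"],
})

# 1) Missing data: count nulls per column
missing = df.isna().sum()

# 2) Data types: inspect how each column was parsed
dtypes = df.dtypes

# 3) Outliers: flag values more than 1.5*IQR outside the quartiles
q1, q3 = df["tenure_months"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["tenure_months"] < q1 - 1.5 * iqr) |
              (df["tenure_months"] > q3 + 1.5 * iqr)]

print(missing["age"])   # one missing age value
print(len(outliers))
```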
Why was it important to make predictions about customers' intentions or tendencies to churn?
- It enables efficiency in planning how to retain customers, for example through targeted promotional activities
- Comparing predictions with actual results allows us to spot meaningful indicators, useful for improving performance
How are predictive modelling and data mining useful for companies?
They are instruments for improving the decision-making process within companies.
The more accurate and timely the knowledge is, the more likely the company is to improve its business performance and industry position.
Why do companies use machine learning algorithms and big data?
To reduce uncertainty about, for example, unforeseen market changes or customer behaviour.
What did you do in your paper?
We analysed customer data from Akademikernes A-kasse in an effort to create a predictive model of which attributes are relevant when customers choose to churn. However, the model we created does not predict how likely a customer is to churn; it gives a binary churn/no churn result.
What do you mean by a binary churn/no churn result?
It means that we assigned each member of a given set to one of two classes, churn or no churn, on the basis of a classification rule. The rule was built on the attribute selection in the predictive model.
What could be the next step for Akademikernes A-kasse?
To use regression to calculate the likelihood of churn within the predicted group of churners we have identified. A heat map, for example, could visualise which customers and which attributes were most associated with future churn.
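The suggested next step could be sketched with logistic regression, which outputs a churn probability per member instead of a binary label. This is a toy illustration on synthetic data, not the paper's model; the three features are invented.

```python
# Sketch of ranking members by churn probability with logistic regression.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # three hypothetical member attributes
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # 1 = churn, toy rule

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]   # P(churn) per member, not just a 0/1 label

# Members sorted from most to least likely to churn
ranking = np.argsort(proba)[::-1]
```

A retention team could then focus campaigns on the top of this ranking rather than treating all predicted churners equally.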
Why did you choose to use the CRISP model in your paper?
We decided to use it as a guiding principle for the paper, as we saw parallels between what the model presents and what we wanted to do in our paper. However, we were aware that the model is an exploratory tool that emphasises approaches and strategy rather than software design.
What are the six stages in the data mining process that the CRISP model defines?
Business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
What does the business understanding stage concern?
It concerns the practical business goals that the organisation wants to achieve. This goal was converted into a problem that the data mining seeks to solve: AK seeks to uncover the reasoning behind the churn of their members and, in turn, ultimately reduce the number of churners.
Could you mention some critical perceptions of your paper?
To begin with, we thought we would use decision trees, as they are high-quality models that generate simple rules and make it easy to understand the impact of each attribute.
Decision trees become very complex and large when there are many attributes, so we were strict in the attribute selection because we wanted few attributes. Without this initial plan, the number of attributes might have been higher.
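The trade-off above, few attributes and a shallow tree in exchange for readable rules, can be illustrated with scikit-learn. The data and attribute names (`attr_a`, `attr_b`) are made up; the paper's actual attributes are not shown here.

```python
# Minimal sketch: a small, depth-limited decision tree yields simple,
# human-readable rules, which motivated the harsh attribute selection.
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))            # two selected attributes
y = (X[:, 0] > 0.5).astype(int)          # 1 = churn, toy rule

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["attr_a", "attr_b"])
print(rules)                             # a handful of if/else rules
```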
How can AK use your results?
Knowledge of the customers. We identified attributes that are linked to previous churners, and these tendencies can be used to spot current members with the same attributes.
What is data mining?
A process used for discovering meaningful relationships, trends, and patterns in large amounts of data collected in a dataset.
What is clustering?
Clustering (also called unsupervised learning) is the process of dividing a dataset into a number K of groups such that the members of each group are as similar (close) to one another as possible, and different groups are as dissimilar (far) from one another as possible. For example, it can gather people with common attributes into K clusters.
How can clustering create value for AK?
The goal is to group together similar instances using some metric of similarity, i.e. to create groupings where the members of a given group are similar to each other. AK could, for example, group similar customers together and design different campaigns for each group.
It is akin to classification, but the groupings are not predefined.
It could offer a way to group similar customers together, which may or may not relate to the churn question.
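A minimal sketch of the idea with K-means, assuming two invented member features (think tenure vs. engagement, suitably scaled); this is illustrative, not something AK has actually done.

```python
# Cluster "members" into K groups with no predefined labels.
from sklearn.cluster import KMeans
import numpy as np

rng = np.random.default_rng(2)
# Two loose blobs of synthetic members in a 2-feature space
members = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(members)
labels = kmeans.labels_   # group id per member; groups were not predefined
```

Each resulting group could then get its own campaign, exactly as described above.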
What can you do to present results from data mining as informatively as possible?
You can sacrifice detail; this is a subjective decision.
For example, switching from ROC (receiver operating characteristic) curves with AUC (area under the curve) to lift curves or cumulative response curves.
ROC curves are not the most intuitive visualisation for many business stakeholders who really ought to understand the results. One of the most common alternatives is the cumulative response curve, which is more intuitive.
What are lift curves?
A visualisation framework that might not have all of the nice properties of ROC curves, but is more intuitive. Conceptually, as we move down the list of instances ranked by the model, we target increasingly larger proportions of all the instances.
Why is your data affected by selection bias?
We only had access to Akademikernes A-kasse's dataset, and not those of other A-kasse organisations. The dataset therefore only comprises information about AK's members and is not a representative sample of the entire population.
What makes machine learning algorithms supervised?
We know the target class, and more specifically what we are looking for. The opposite is unsupervised learning, which does not provide a purpose or target information.
Why did you choose to solely look at supervised machine learning algorithms?
We knew the target class and what we wanted to look for, namely churner/no churner.
How do you measure the performance of the classification?
Through classification accuracy. However, accuracy alone does not always capture what is important for the problem at hand.
Why did you choose classification?
As we had a label, classification was appropriate: classification involves selecting which label, out of a set of labels, should be assigned to some data according to a classification rule.
What is binary classification?
It means that we are working with two options.
Why did you disregard some attributes?
Because we found some of them to be irrelevant to predictions of the target, "will churn" versus "will not churn". In the attribute selection section of the paper, we assessed the degree to which the chosen attributes affect the target.
Why is it not enough to look at the accuracy?
Because a dataset can be imbalanced, which was the case for our dataset, and then accuracy can be misleading. In our example, 71.5% of the members were non-churners.
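Why 71.5% non-churners makes accuracy misleading can be shown with simple arithmetic: a degenerate "model" that always predicts "no churn" scores 71.5% accuracy while finding zero churners. The counts below are illustrative, scaled to the paper's class balance.

```python
# Accuracy of an always-"no churn" predictor on a 71.5% imbalanced set.
n_total = 1000
n_non_churn = 715                       # 71.5% majority class, as in the paper
y_true = [0] * n_non_churn + [1] * (n_total - n_non_churn)
y_pred = [0] * n_total                  # always predict "no churn"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total
churners_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(accuracy)        # 0.715 — looks decent
print(churners_found)  # 0 — yet not a single churner is identified
```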
What did you do to cope with the accuracy “problem”?
We decided to use stratified cross validation in order to evaluate how generalisable our model was.
What is stratified cross validation?
Stratified cross-validation is a way of splitting the entire dataset into k bins (folds) of equal size, where each fold preserves the overall class proportions, so that both training data and test data get a fair share of data points from each class.
Why is it efficient to use stratified cross validation?
It utilises the dataset more. By using stratified cross-validation, you get more out of the training and test data, that is, the best possible validation and learning results.
If we were to use only a single hold-out split, we would train on just part of the dataset and leave the rest behind for testing.
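A sketch of this with scikit-learn's `StratifiedKFold`, using synthetic labels scaled to the paper's 71.5% / 28.5% class balance: every test fold mirrors the overall churn rate.

```python
# Stratified 5-fold CV: each fold keeps the churner/non-churner proportions.
from sklearn.model_selection import StratifiedKFold
import numpy as np

y = np.array([0] * 715 + [1] * 285)      # ~71.5% non-churners, as in the paper
X = np.arange(len(y)).reshape(-1, 1)     # placeholder features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_rates = [y[test].mean() for _, test in skf.split(X, y)]
print(fold_rates)                        # every fold is ~28.5% churners
```

Across the 5 folds, every data point is used for testing exactly once and for training four times, which is the "utilise the dataset more" point above.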
Why is your dataset imbalanced?
Because the class churn_loyal has a high concentration of one value (71.5% non-churners), and as the class distribution is not equal, it can be argued that the dataset is imbalanced.
What do you need to take into consideration when your dataset is imbalanced?
It is a very normal situation, but we knew we needed to be aware that accuracy could be misleading, and therefore we chose not to focus much on it.