A-kassen Flashcards

1
Q

What is a label?

A

What we want to predict.

2
Q

What is the label in the paper?

A

Churn/loyal members.

3
Q

Why did you use Jupyter?

A

Jupyter is a notebook environment that supports different programming languages, Python being one of them. It is a powerful tool that enabled us to do the data mining by specifying and programming our own process steps.

4
Q

What was the main purpose of the data preparation?

A

To take a critical look at the dataset and make it as clean and small as possible, because this gives more predictive power.

5
Q

What is the first step in the data preparation?

A

Exploration phase

6
Q

What does the exploration phase entail?

A

A three-step process: missing data, data types, and outliers.

7
Q

Why was it important to make predictions about customers' intentions or tendencies to churn?

A
  1. It enables efficient planning of customer retention, for example through specific promotional activities
  2. Comparing predictions with actual results allows us to spot meaningful indicators that are useful for improving performance
8
Q

How are predictive modelling and data mining useful for companies?

A

They are instruments for improving the decision-making process within companies.
The more accurate and timely the knowledge is, the more likely the company is to improve its business performance and industry position.

9
Q

Why do companies use machine learning algorithms and big data?

A

To reduce uncertainty about, for example, unforeseen market changes or customer behaviour.

10
Q

What did you do in your paper?

A

We analysed customer data from Akademikernes A-kasse in an effort to create a predictive model of which attributes are relevant when customers choose to churn. However, the model we created does not predict how likely a customer is to churn; it gives a binary churn/no churn result.

11
Q

What do you mean by a binary churn/no churn result?

A

It means that we classified each instance of a given set into one of two classes, churn/no churn, on the basis of a classification rule. And we classified them on the basis of the attribute selection in the predictive model.

12
Q

What could be the next step for Akademikernes A-kasse?

A

To use regression to calculate the likelihood of churn within the predicted group of churners we have identified. A heat map, for example, could visualise which customers, and which attributes, were more likely to churn in the future.

13
Q

Why did you choose to use the CRISP-DM model in your paper?

A

We decided to use it as a guiding principle for the paper, as we saw a correspondence between what the model presents and what we wanted to do in our paper. However, we were aware that the model is an exploratory tool that emphasises approaches and strategy rather than software design.

14
Q

What are the six stages in the data mining process that the CRISP-DM model defines?

A

Business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

15
Q

What does the business understanding stage present?

A

It concerns the practical business goals that the organisation wants to achieve. This goal was converted into a problem that the data mining seeks to solve: AK seeks to uncover the reasoning behind the churn of their members and ultimately reduce the number of churners.

16
Q

Could you mention some critical reflections on your paper?

A

To begin with, we thought we would use decision trees, as they are high-quality models that generate simple rules and make it easy to understand the impact of each attribute.
Decision trees become very complex and large when there are many attributes, so we were very strict in the attribute selection because we wanted few attributes. If we had not had this initial plan, the number of attributes might have been higher.

17
Q

How can AK use your results?

A

Knowledge of the customers. We found attributes that are linked to previous churners, and these tendencies can therefore be used to spot current members who are linked to the same attributes.

18
Q

What is data mining?

A

A process used for discovering meaningful relationships, trends, and patterns within large amounts of data collected in a dataset.

19
Q

What is clustering?

A

Clustering is gathering people with common attributes into k clusters. Clustering (also called unsupervised learning) is the process of dividing a dataset into groups such that the members of each group are as similar (close) as possible to one another, and different groups are as dissimilar (far) as possible from one another.

20
Q

How can clustering create value for AK?

A

The goal is to group together similar instances using some metric of similarity, i.e., to create groupings where the members of a given group are similar to each other. For example, AK could group similar customers together and design different campaigns for each group.
It is a lighter form of classification, but the groupings are not predefined.
Clustering could find a way to group similar customers together, which may or may not relate to the churn question.

21
Q

What can you do to present results from data mining as informatively as possible?

A

You can sacrifice detail, which is a subjective decision, for example by switching from ROC (receiver operating characteristic) curves with AUC (area under the curve) to lift curves/cumulative response curves.

ROC curves are not the most intuitive visualisation for many business stakeholders who really ought to understand the results. One of the most common alternative visualisations is the cumulative response curve, which is more intuitive.

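A minimal sketch (not from the paper) of how an ROC curve and AUC could be computed with scikit-learn; the synthetic data, the KNN stand-in classifier and all variable names are illustrative assumptions.

```python
# Illustrative sketch: ROC curve and AUC with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic, imbalanced data standing in for the churn dataset.
X, y = make_classification(n_samples=1000, weights=[0.715], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]        # predicted "churn" probabilities

fpr, tpr, thresholds = roc_curve(y_test, proba)  # points on the ROC curve
print("AUC:", roc_auc_score(y_test, proba))
```
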
22
Q

What are lift curves?

A

A visualisation framework that might not have all of the nice properties of ROC curves, but is more intuitive. Conceptually, as we move down the list of instances ranked by the model, we target increasingly larger proportions of all the instances.

23
Q

Why is your data affected by selection bias?

A

We only had access to Akademikernes A-kasse's dataset, and not those of other A-kasse organisations. Therefore the dataset only comprises information about AK's members and is not a representative sample of the entire population.

24
Q

What makes machine learning algorithms supervised?

A

We know the target class, and more specifically what we are looking for. The opposite is unsupervised learning, which does not provide a purpose or target information.

25
Q

Why did you choose to look solely at supervised machine learning algorithms?

A

We knew the target class and what we wanted to look at, which is churner/no churner.

26
Q

How do you measure the performance of the classification?

A

Through classification accuracy. However, accuracy alone does not capture what is important for the problem at hand.

27
Q

Why did you choose classification?

A

As we had a label, classification was appropriate, since classification involves selecting which label, out of a set of labels, should be assigned to some data on the basis of meaningful indicators.

28
Q

What is binary classification?

A

It means that we are working with two options.

29
Q

Why did you disregard some attributes?

A

Because we found some of them to be irrelevant to predicting the target "will churn" / "will not churn". In the attribute selection section of the paper we examined the degree to which the chosen attributes affect the target.

30
Q

Why is it not enough to look at the accuracy?

A

Because a dataset can be imbalanced, which was the case for our dataset, and in this case accuracy can be misleading. In our example we had 71.5% non-churners.

31
Q

What did you do to cope with the accuracy “problem”?

A

We decided to use stratified cross validation in order to evaluate how generalisable our model was.

32
Q

What is stratified cross validation?

A

Stratified cross-validation is a way of splitting or partitioning the entire dataset into k bins of equal size, so that each class gets a fair share of data points in both the training data and the test data.

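A minimal sketch of stratified k-fold splitting with scikit-learn's StratifiedKFold; the toy labels and the 5-fold setting are only illustrative.

```python
# Illustrative sketch: stratified k-fold splitting with scikit-learn.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels with roughly 71.5% loyal (0) and 28.5% churn (1).
y = np.array([0] * 715 + [1] * 285)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder feature matrix

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each fold keeps approximately the same churn/loyal ratio as the full dataset.
    print(fold, "churn share in test fold:", round(y[test_idx].mean(), 3))
```
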
33
Q

Why is it efficient to use stratified cross validation?

A

It utilises the dataset more. By using stratified cross-validation, you get more out of the training and test data, that is, the best possible validation and learning results.
If we only used a single hold-out split, we would only train on part of the dataset and leave the rest behind for testing.

34
Q

Why is your dataset imbalanced?

A

Because one value of the class churn_loyal has a high concentration (71.5%), and as the concentration is not equal between the classes, it can be argued that the dataset is imbalanced.

35
Q

What do you need to take into consideration when your dataset is imbalanced?

A

It is very common, but we knew we needed to be aware that the accuracy could be misleading, and therefore we chose not to focus much on accuracy.

36
Q

What did you do when you were checking data types?

A

We ensured that the machine learning algorithms were able to get the right data types.

37
Q

Why did we look for outliers?

A

To make the dataset as clean as possible; we could then either remove or adjust the outliers.

38
Q

Why was your dataset poor?

A

We spotted a lot of duplicates, null values, inconsistent casing, and Danish special characters.

40
Q

How did you drop values?

A

We defined a rule to decide whether an attribute brings value to predicting churn/no churn.

41
Q

How did you drop values?

A

We used code to create some rules to decide whether an attribute brings value to predicting churn/no churn.

For example, we cleaned some of the specific instances and categorised education names and municipality into broader groups.

42
Q

What is an example of a unique value?

A

Education name, which has values such as "Ha. IT" and "Ha.it".

43
Q

Why did we have to clean each attribute in some cases?

A

Because the machine cannot distinguish between, for example, "Ha.IT" and "Ha.it". After doing this, the number of unique values dropped from 841 to 796.

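A minimal pandas sketch of the kind of clean-up described above; the column name and example values are assumptions.

```python
# Illustrative sketch (pandas): normalising casing/whitespace so values such as
# "Ha. IT" and "Ha.it" collapse into one category.
import pandas as pd

df = pd.DataFrame({"education_name": ["Ha. IT", "Ha.it", "ha.it", "Cand.merc", "cand.merc "]})
print("unique before:", df["education_name"].nunique())

df["education_name"] = (
    df["education_name"]
    .str.strip()                         # remove stray whitespace
    .str.lower()                         # unify casing
    .str.replace(" ", "", regex=False)   # unify spacing inside the value
)
print("unique after:", df["education_name"].nunique())
```
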
44
Q

Explain how you handled registration date and churn month.

A

It was not possible to use churn month on its own because it was null for most/all members in the dataset, since they have not churned, and it is not possible to calculate a time range from null values.

We converted the registration date and churn month into a new attribute, days as member, so that we could calculate how many days each instance had been a member.

The data is only representative within the given timeframe, but this is easy to change. We recommend recalculating it every day, so that AK has an accurate overview.

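A minimal pandas sketch of how days as member could be derived; the column names and the cut-off date are assumptions, not the paper's exact code.

```python
# Illustrative sketch (pandas): deriving days_as_member from registration date and churn month.
import pandas as pd

df = pd.DataFrame({
    "registration_date": ["2010-03-01", "2015-06-15", "2018-01-10"],
    "churn_month":       ["2017-05",    None,         None],   # null = has not churned
})
df["registration_date"] = pd.to_datetime(df["registration_date"])

cutoff = pd.Timestamp("2019-01-01")                        # end of the observed timeframe (assumed)
end_date = pd.to_datetime(df["churn_month"]).fillna(cutoff)

df["days_as_member"] = (end_date - df["registration_date"]).dt.days
print(df[["registration_date", "days_as_member"]])
```
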
45
Q

How did you handle outliers?

A

We introduced a new attribute, years_correct, calculated as age * 365 - days as member.
We then calculated the critical range: if a person's value was below 6570 (18 * 365 days), it meant that they had either become a member before turning 18 or the data was not correct.

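A minimal pandas sketch of the rule described above; the column names and example rows are assumptions.

```python
# Illustrative sketch (pandas): flagging members whose registration seems to predate age 18.
import pandas as pd

df = pd.DataFrame({"age": [30, 25, 20], "days_as_member": [2000, 8000, 3000]})

df["years_correct"] = df["age"] * 365 - df["days_as_member"]   # ~age in days at registration
suspicious = df["years_correct"] < 18 * 365                    # 6570 days = 18 years

print(df[suspicious])   # candidates to remove or adjust as outliers
```
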
46
Q

What did the general clean up include?

A
  1. Map Danish names to machine-friendly names
  2. Fill null values with meaningful values such as "other"
  3. Drop low-quality instances based on our rule-based attribute selection (a sketch of these steps follows below)
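
A minimal pandas sketch of the three steps above; all column names, value mappings and the drop rule are assumptions, not the paper's actual rules.

```python
# Illustrative sketch (pandas) of the general clean-up steps.
import pandas as pd

df = pd.DataFrame({
    "kommune":   ["København", "Århus", None],
    "education": ["Cand.merc", None, "Ha.it"],
})

# 1. Map Danish names to machine-friendly names.
df = df.rename(columns={"kommune": "municipality"})
df["municipality"] = df["municipality"].replace({"København": "Copenhagen", "Århus": "Aarhus"})

# 2. Fill null values with a meaningful placeholder.
df = df.fillna({"municipality": "other", "education": "other"})

# 3. Drop low-quality instances (here: rows where every attribute ended up as "other").
df = df[~(df == "other").all(axis=1)]
print(df)
```
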
47
Q

Which five evaluation metrics did you use?

A

Accuracy, variance and standard deviation, confusion matrix metrics, ROC/AUC, base rate

48
Q

What is accuracy?

A

A measure of how many instances the model has correctly guessed in total.

49
Q

What is the accuracy paradox?

A

We often see this in datasets that are skewed, which is the case for us. The accuracy paradox occurs when models with lower accuracy are preferred; in our case it is important, on the basis of our business understanding, to identify who the churners might be on the basis of their attributes, rather than simply maximising accuracy.

50
Q

What are variance and standard deviation?

A

They tell us how much we can expect our model's performance to vary when it is applied to new data. If we see a big variance between the folds, we could conclude that the model is not generalisable, and generalisability is preferred if the same model is to be used on another dataset.

51
Q

What is a confusion matrix?

A

It gives a quick overview of the results. Many metrics can be calculated from the matrix, and one of them that we used is TPR, or recall.

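A minimal sketch of a confusion matrix and TPR (recall) with scikit-learn; the label vectors are made up to show the calculation.

```python
# Illustrative sketch: confusion matrix and TPR/recall with scikit-learn.
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = churn, 0 = loyal
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("TPR / recall:", tp / (tp + fn))
print("recall_score:", recall_score(y_true, y_pred))   # same value, computed directly
```
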
52
Q

What is a confusion matrix?

A

It gives a quick overview of the results. Many metrics can be calculated from the matrix, for example FPR and TPR.

53
Q

Why is base rate used?

A

For model performance. To set a baseline for the performance of the model.

54
Q

Why is a naive classifier used?

A

To get a median over the population

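One common way to get such a naive baseline is scikit-learn's DummyClassifier, which simply predicts the majority class; whether the paper used exactly this is an assumption, and the data below is synthetic.

```python
# Illustrative sketch: a naive majority-class baseline with DummyClassifier.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.715], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy ~ base rate:", baseline.score(X_test, y_test))
```
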
56
Q

How do you limit the amount of overfitting?

A

We used stratified cross-validation with 5 folds, so we could train and test on each fold and measure the mean and variance, which can be used to estimate the expected performance on future datasets.

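A minimal sketch of 5-fold stratified cross-validation with the mean and standard deviation of the per-fold scores; the synthetic data and the KNN stand-in are assumptions.

```python
# Illustrative sketch: 5-fold stratified cross-validation and mean/variance of the scores.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, weights=[0.715], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)

print("per-fold scores:", scores.round(3))
print("mean:", scores.mean().round(3), "std:", scores.std().round(3))
```
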
57
Q

Why did we choose 5 folds instead of 10?

A

x

58
Q

Why did we want to be above the random classifier, which is the diagonal line from (0,0) to (1,1)?

A

If the model is below the line, it is not performing well; anything above the line is an improvement over random guessing.

59
Q

What do (0,0) and (1,1) represent?

A

(0,0) means that you never classify an instance as positive. (1,1) means that you classify all instances as positive.

60
Q

Why is feature engineering used?

A

Feature engineering can lead to increased performance by refining the dataset with knowledge about the attributes. An example is days as member, where we created a new feature because we saw bigger potential this way. So instead of introducing more complex models, we refined our dataset and improved performance that way.

61
Q

What is domain knowledge?

A

x

62
Q

Why is feature scaling used?

A

To limit the influence of numeric features with broader ranges. For example, days as member had a very large range, which could overshadow other numeric features. Therefore we used feature scaling to ensure that our attributes contribute approximately proportionally.

63
Q

What is a feature scaling method?

A

StandardScaler, which standardises the numeric features.

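A minimal sketch of StandardScaler; the two numeric columns are made-up stand-ins for days as member and age.

```python
# Illustrative sketch: standardising numeric features so no single feature dominates.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[3000.0, 25], [50.0, 40], [8000.0, 33]])   # [days_as_member, age] (made up)

X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(3))   # ~0 per column
print(X_scaled.std(axis=0).round(3))    # ~1 per column
```
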
64
Q

What is standardization?

A

It is applied to the dataset so that features with large variance do not have an imbalanced influence on the estimator and dominate the objective function.

65
Q

What is information gain attribute evaluation?

A

It measures how much information each attribute provides about the label. The closer to 1, the more information the attribute provides.

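In scikit-learn, mutual_info_classif gives an information-gain-style score per attribute; whether the paper used this exact function is an assumption, and the data below is synthetic.

```python
# Illustrative sketch: information-gain-style attribute ranking with mutual information.
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
for i, score in enumerate(scores):
    print(f"attribute {i}: {score:.3f}")   # higher = more information about the label
```
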
66
Q

Why did we run information gain and a correlation threshold in comparison?

A

To choose which attributes are most important with regard to the label we wanted to predict. We defined a rule that would exclude attributes that did not contribute enough.

67
Q

What is correlation attribute evaluation?

A

The correlation coefficient spans from -1 to 1, where the coefficient denotes how correlated a feature is with the label. If the coefficient is 0, the feature does not correlate with the label at all, meaning it has no prediction power.

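A minimal pandas sketch of correlating each numeric attribute with the label; the columns and values are made up.

```python
# Illustrative sketch (pandas): correlation between numeric attributes and the label.
import pandas as pd

df = pd.DataFrame({
    "days_as_member": [100, 2000, 50, 3000, 400, 2500],
    "age":            [25, 40, 22, 55, 30, 45],
    "churn":          [1, 0, 1, 0, 1, 0],   # label: 1 = churn, 0 = loyal
})

correlations = df.corr()["churn"].drop("churn")
print(correlations)   # near 0 = little prediction power; near -1 or 1 = strong relation
```
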
68
Q

What is stratified k-fold and why is it useful?

A

It is very similar to ordinary cross-validation, but stratified k-fold ensures that the class distribution is approximately the same in each fold. Training and testing are repeated k times, each time using a different fold as the test set.

69
Q

What are the two categories models can be divided into?

A

Parametric and nonparametric

70
Q

Why did you use both parametric and nonparametric models?

A

In order to utilise a broader toolset of classifiers on our dataset, it made sense to include methods from both categories.

71
Q

You tested Linear SVC, SVC Poly kernel, SVC RBF kernel and SVC Linear kernel. Why did you use Linear SVC?

A

The accuracy, TPR and TNR were identical, so we followed the advice from scikit-learn and used only Linear SVC.

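A minimal sketch of comparing LinearSVC with the SVC kernels on cross-validated accuracy; the synthetic data is a stand-in, and the paper's comparison also included TPR and TNR.

```python
# Illustrative sketch: comparing LinearSVC with SVC kernels via cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=500, weights=[0.715], random_state=0)

models = {
    "LinearSVC":  LinearSVC(max_iter=10000),
    "SVC linear": SVC(kernel="linear"),
    "SVC poly":   SVC(kernel="poly"),
    "SVC rbf":    SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```
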
72
Q

Why did we use KNN?

A

To explore where our instances fit in regards to our target value.

73
Q

The goal with KNN is to find neighbours with the shortest distance. How can the distance be calculated?

A

With Euclidean distance.

74
Q

Why did we decide on k-nearest neighbours with k = 4?

A

Based on the error rate. But how? With regard to recall we chose 5? (67%)

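A minimal sketch of inspecting the cross-validated error rate for different values of k; the synthetic data is a stand-in, and the paper's actual choice also weighed recall.

```python
# Illustrative sketch: choosing k for KNN by looking at the cross-validated error rate.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, weights=[0.715], random_state=0)

for k in range(1, 11):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: error rate {1 - acc:.3f}")
```
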
75
Q

Why did you choose to work with KNN instead of Linear SVC?

A

Because Linear SVC achieved only 14% TPR, while KNN achieved 69%.

76
Q

What have you achieved through data mining efforts?

A

We determined which attributes are contributing factors when a member chooses to churn.

77
Q

How can our results be implemented in the business?

A

Our model can look at any given customer in AK and predict with 87% certainty whether the customer will churn or not.

78
Q

What are our recommendations to AK?

A

Run the model against the customer database every day, to identify the group of customers who will stay members but also the group that is likely to churn. With this, AK can engage in several efforts to retain customers.

79
Q

Does the model predict how likely the customer is to churn?

A

No. It only gives a binary churn/no churn result.

80
Q

What did we state in our interim progress report?

A

x

81
Q

What did we state in our workplan?

A

x

82
Q

What were the three main things you did in your paper?

A
  • We defined the data set
  • We acquired and prepared the data
  • We used data mining techniques to extract value from data
83
Q

Why did you use cross validation? And what type did you use?

A

Because, on average, it is the best way to evaluate how generalisable our model is. It always evaluates on held-out data (data that was not used for training). We used stratified cross-validation.

84
Q

Why did you use stratified k-fold cross validation?

A

Stratified cross-validation is a way of splitting or partitioning the entire dataset into k bins of equal size, in order to get a fair share of data points from each class in both the training data and the test data.

85
Q

What did we think was challenging? Were we too ambitious?

A

The dataset was large and time-consuming to clean.

86
Q

Did you use any other approaches to big data that were not covered explicitly in the curriculum?

A

Python: a ten-line Python script to transform dates, understand the data, and make the instances cleaner and therefore more intuitive and usable.

87
Q

What is Linux?

A

Basis for serious data science

88
Q

Why didn't you use the results from the training data?

A

Because the important results are those on the test data, since by then the model has already been trained and ….