Lecture 13 - Feature Engineering Flashcards

Question 1

Q

List the 3 steps in machine learning.

Answer

A

1) Get labeled training data.
2) Convert your training data into n-dimensional vectors (feature selection).
3) Run the ML algorithm.
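
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy animal data, the feature names `height_cm` and `weight_kg`, and the choice of 1-nearest-neighbor as "the ML algorithm" are all hypothetical and not from the lecture.

```python
# Step 1: labeled training data (hypothetical toy examples).
training_data = [
    ({"height_cm": 30, "weight_kg": 4}, "cat"),
    ({"height_cm": 25, "weight_kg": 5}, "cat"),
    ({"height_cm": 60, "weight_kg": 30}, "dog"),
    ({"height_cm": 70, "weight_kg": 35}, "dog"),
]

# Step 2: convert each example into an n-dimensional vector (here n = 2).
FEATURES = ["height_cm", "weight_kg"]


def to_vector(example):
    """Map a raw example (a dict) to a list of floats in a fixed feature order."""
    return [float(example[name]) for name in FEATURES]


X = [to_vector(example) for example, _ in training_data]
y = [label for _, label in training_data]


# Step 3: run an ML algorithm. 1-nearest-neighbor is an assumed stand-in
# for whichever algorithm is actually used.
def predict(vector):
    """Return the label of the training vector closest to `vector`."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(range(len(X)), key=lambda i: squared_distance(X[i], vector))
    return y[nearest]


prediction = predict(to_vector({"height_cm": 28, "weight_kg": 4}))
print(prediction)
```

The point of the sketch is the ordering: the algorithm in step 3 never sees the raw examples, only the vectors produced in step 2.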

Question 2

Q

In order to do supervised machine learning we must have what?

Answer

A

Labeled training data.

Question 3

Q

How can we get labeled training data?

Answer

A

Find a dataset that includes labels
Label it ourselves
Trick users into labeling it
Hire users to label it

Question 4

Q

Why is it difficult to label data by hand?

Answer

A

Assumes you have domain expertise
Slow
Time-consuming
Expensive

Question 5

Q

Why is it difficult to trick users into labeling data by hand?

Answer

A

Takes time to collect

May take effort to create the system to record the desired behaviors.

Question 6

Q

In order to send data as input to ML algorithms we must do what?

Answer

A

We must convert it into a vector of numbers. This is straightforward for quantitative variables but harder for categorical variables.

Question 7

Q

How can we convert ordinal variables into numbers for a vector?

Answer

A

We can assign them to a sequence of integers.
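
A minimal sketch of this idea, assuming a hypothetical ordinal variable with the levels `small`, `medium`, and `large` (these names are illustrative, not from the lecture):

```python
# The levels listed from lowest to highest; the order carries the meaning.
SIZE_ORDER = ["small", "medium", "large"]

# Map each level to its rank: small -> 0, medium -> 1, large -> 2.
SIZE_TO_INT = {level: rank for rank, level in enumerate(SIZE_ORDER)}


def encode_ordinal(values):
    """Replace each ordinal level with its integer rank."""
    return [SIZE_TO_INT[value] for value in values]


encoded = encode_ordinal(["medium", "small", "large"])
print(encoded)
```

Because the integers follow the natural order of the levels, comparisons such as "large > small" still hold after encoding.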

Question 8

Q

Why can’t we just assign random numbers to nominal variables?

Answer

A

The algorithm assumes that differences between numbers are meaningful, so arbitrary numbers would impose an order and distances between categories that do not actually exist.
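
A small illustration of the problem, assuming a hypothetical nominal variable for color and an arbitrary numbering chosen only for this example:

```python
# Arbitrary integer codes for a nominal variable (no real order exists).
ARBITRARY_CODES = {"red": 1, "green": 2, "blue": 3}


def distance(a, b):
    """Distance between two categories as a numeric algorithm would see it."""
    return abs(ARBITRARY_CODES[a] - ARBITRARY_CODES[b])


red_green = distance("red", "green")
red_blue = distance("red", "blue")

# The codes claim red is closer to green than to blue, which is an
# artifact of the numbering, not a property of the colors.
print(red_green, red_blue)
```

Renumbering the categories would change which pairs look "close", which shows the distances carry no information about the data.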

Question 9

Q

Describe one-hot encoding.

Answer

A

It is the process of using a binarizer to turn a categorical value into a binary vector with one position per category: the position for the observed category is 1 and every other position is 0.
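
A minimal pure-Python sketch of one-hot encoding. The function name `one_hot_encode` and the color values are hypothetical, and this is not the binarizer used in the lecture, only an illustration of what such a tool produces.

```python
def one_hot_encode(values):
    """Return (categories, vectors), with one 0/1 vector per input value."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        vector = [0] * len(categories)  # one position per category
        vector[index[value]] = 1        # mark only the observed category
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
for vector in vectors:
    print(vector)
```

Every vector contains exactly one 1, so any two different categories are equally far apart and no artificial order is introduced.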

Question 10

Q

Describe bag of words and how it works.

Answer

A

It is an approach used to turn text documents into vectors.

Take all the distinct words in the corpus and assign each one a number (an index).
Make a vector with one position per word, so each index stands for a word.
Map every document to such a vector, marking 1 at the position of each word the document contains.
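
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: words are split on whitespace and lowercased, the two-document corpus is made up, and the helper names `build_vocabulary` and `vectorize` are hypothetical.

```python
def build_vocabulary(documents):
    """Assign each distinct word in the corpus an index."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Map a document to a 0/1 vector with one position per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:          # words outside the corpus are ignored
            vector[vocabulary[word]] = 1
    return vector


corpus = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(doc, vocabulary) for doc in corpus]

print(vocabulary)
for vector in vectors:
    print(vector)
```

This version only marks whether a word is present, as the card describes. Word order is discarded, which is where the name "bag" of words comes from.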

(10 cards)