Week 6 Flashcards

Question 1

Q

Support Vector Machine

Answer

A

Another Linear Classifier
SVMs use a boundary called a hyperplane to partition data into groups of similar values
a vector space-based machine learning method that aims to find a decision boundary between two classes that is maximally far from any point in the training data
the goal of an SVM is to create a flat boundary called a hyperplane (a straight line in two dimensions) that divides the space into fairly homogeneous partitions on either side
SVM learning combines aspects of both instance-based nearest neighbor learning (lazy learning) and linear regression
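The idea of a boundary that is "maximally far from any point" can be sketched by comparing the margins of two candidate hyperplanes. This is only an illustration, not an SVM training procedure: the four points, their labels, and the candidates named A and B are made up for the example.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    The value is positive only if every point lies on the side matching its label.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        distances.append(y * score / norm)
    return min(distances)

# Hypothetical training data: two classes in two dimensions.
points = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two hypothetical candidate boundaries, each given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -4.0),  # the line x + y = 4
    "B": ((1.0, 0.0), -2.0),  # the line x = 2
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(best, margins)
```

Both candidates separate the two classes, but an SVM would prefer the one whose closest point is farther away, which is the candidate with the larger margin.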

Question 2

Q

Applications of SVM

Answer

A

classification of microarray gene expression data in the field of bioinformatics to identify cancer or other genetic diseases
text categorization such as identification of the language used in a document or the classification of documents by subject matter
the detection of rare yet important events like combustion engine failure, security breaches, or earthquakes

Question 3

Q

Collaborative Filtering Algorithm

Answer

A

The technique used by Recommender Systems
user-based filtering system
if users A and B have purchased similar items in the past, the recommender system would recommend items purchased by user B to user A
the similarity in behavior between two users is often computed with cosine similarity or a Euclidean distance measure
the smaller the angle between the two users' vectors (that is, the higher the cosine similarity), the higher the similarity in behavior between the two users
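The user-based approach above can be sketched roughly as follows, assuming each user is represented by a 0/1 purchase vector. The item names, the three users, and the helper `recommend` are hypothetical and only meant to show the mechanics.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical catalog and purchase history (1 = purchased, 0 = not purchased).
items = ["book1", "book2", "book3", "book4"]
purchases = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 1, 0],
    "C": [0, 0, 0, 1],
}

def recommend(target, purchases, items):
    """Recommend items bought by the most similar other user but not by the target."""
    others = [user for user in purchases if user != target]
    nearest = max(
        others,
        key=lambda user: cosine_similarity(purchases[target], purchases[user]),
    )
    return [
        item
        for item, mine, theirs in zip(items, purchases[target], purchases[nearest])
        if theirs and not mine
    ]

print(recommend("A", purchases, items))
```

User B is the closest match to user A here, so the one item B bought that A has not is the recommendation.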

Question 4

Q

Content-Based Filtering Algorithm

Answer

A

another algorithm used by recommender systems
unlike collaborative filtering, content-based algorithms use features of the items, such as genre or artist
if user A has been buying Harry Potter books, it is likely that user A may purchase another fantasy book, such as 'The Hobbit'
the similarity between two items can be computed with cosine similarity or Euclidean distance
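A content-based recommendation along these lines might look like the sketch below, this time using Euclidean distance. The feature columns (fantasy, adventure, crime, romance), the feature values, and the placeholder titles "a crime novel" and "a romance novel" are assumptions made for the example.

```python
import math

# Hypothetical item features: [fantasy, adventure, crime, romance].
catalog = {
    "Harry Potter": [1, 1, 0, 0],
    "The Hobbit": [1, 1, 0, 0],
    "a crime novel": [0, 0, 1, 0],
    "a romance novel": [0, 0, 0, 1],
}

def user_profile(purchased, catalog):
    """Average the feature vectors of the items the user has already bought."""
    vectors = [catalog[title] for title in purchased]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(purchased, catalog):
    """Return the unpurchased item whose features are closest to the user's profile."""
    profile = user_profile(purchased, catalog)
    unseen = [title for title in catalog if title not in purchased]
    return min(unseen, key=lambda title: math.dist(profile, catalog[title]))

print(recommend(["Harry Potter"], catalog))
```

No other user's history is needed here: the recommendation depends only on how close each item's features are to what this user has already bought.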

Question 5

Q

Feature Engineering

Answer

A

process of extracting features from a raw dataset
a term coined to give due importance to the domain knowledge required to select sets of features for machine learning algorithms
as technology becomes more sophisticated, more datasets will be available
but do we need all the features/variables of the dataset?
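As a rough illustration of extracting features from a raw dataset, the sketch below turns one raw purchase record into a few numeric features. The record layout, the field names, and the choice of features are hypothetical; which features are actually worth keeping depends on domain knowledge.

```python
from datetime import datetime

# Hypothetical raw record, as it might arrive from a log or CSV export.
raw_record = {
    "timestamp": "2024-03-09T14:30:00",
    "price": "19.99",
    "title": "The Hobbit",
}

def extract_features(record):
    """Derive a small set of model-ready features from one raw record."""
    when = datetime.fromisoformat(record["timestamp"])
    return {
        "hour": when.hour,                   # time of day of the purchase
        "is_weekend": when.weekday() >= 5,   # Saturday = 5, Sunday = 6
        "price": float(record["price"]),     # numeric price instead of text
        "title_length": len(record["title"]),
    }

features = extract_features(raw_record)
print(features)
```

Not every derived column will help a model; a feature such as `title_length` is included mainly to show that some candidates may later be dropped.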

Question 6

Q

Advantages of feature engineering

Answer

A

Question 7

Q