Week 6 Flashcards
1
Q
Support Vector Machine
A
- another linear classifier
- SVMs use a boundary called a hyperplane to partition data into groups of similar values
- a vector space-based machine learning method that aims to find a decision boundary between two classes that is maximally far from any point in the training data
- the goal of an SVM is to create a flat boundary called a hyperplane (a straight line in two dimensions) that divides the space into fairly homogeneous partitions on either side
- SVM learning combines aspects of both instance-based nearest neighbor learning (lazy learning) and linear regression
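A minimal sketch of the hyperplane idea, not a full SVM trainer. The weights `w`, the offset `b`, and the four labeled points are hypothetical values chosen for illustration. The code only checks which side of a given hyperplane each point falls on and measures the distance to the closest point; SVM training would search for the `w` and `b` that make that smallest distance as large as possible.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical 2-D hyperplane x1 + x2 - 3 = 0 (illustrative values only).
w, b = [1.0, 1.0], -3.0

# Hypothetical training points as (features, class label in {-1, +1}).
points = [([0.0, 1.0], -1), ([1.0, 0.0], -1), ([2.0, 2.0], 1), ([3.0, 2.0], 1)]

# Classify each point by the sign of its decision value.
predictions = [1 if decision_value(w, b, x) >= 0 else -1 for x, _ in points]

# Distance from this hyperplane to its closest training point.
margin = min(distance_to_hyperplane(w, b, x) for x, _ in points)

print(predictions, round(margin, 3))
```

Here the hyperplane separates the two classes, but it is not necessarily the maximum-margin one.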
2
Q
Applications of SVM
A
- classification of microarray gene expression data in the field of bioinformatics to identify cancer or other genetic diseases
- text categorization such as identification of the language used in a document or the classification of documents by subject matter
- the detection of rare yet important events like combustion engine failure, security breaches, or earthquakes
3
Q
Collaborative Filtering Algorithm
A
- a technique used by recommender systems
- user-based filtering system
- if users A and B have purchased similar items in the past, the recommender system recommends items purchased by user B to user A
- the similarity in behavior between two users is often computed with cosine similarity or Euclidean distance
- the smaller the angle between the two users' vectors (the higher the cosine similarity), the more similar their behavior
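A small sketch of user-based collaborative filtering using cosine similarity. The item names, the purchase vectors, and the `recommend` helper are all hypothetical, and using only the single most similar user is a simplifying assumption made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical catalog and purchase history: 1 = purchased, 0 = not purchased.
items = ["book", "lamp", "mug", "pen"]
purchases = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 1, 0],
    "C": [0, 0, 0, 1],
}

def recommend(target, purchases, items):
    """Suggest items bought by the most similar user that the target lacks."""
    others = [user for user in purchases if user != target]
    nearest = max(
        others,
        key=lambda user: cosine_similarity(purchases[target], purchases[user]),
    )
    return [
        item
        for item, mine, theirs in zip(items, purchases[target], purchases[nearest])
        if theirs and not mine
    ]

recommendations = recommend("A", purchases, items)
print(recommendations)
```

User B is the closest match to user A, so the one item B bought that A has not is the item suggested to A.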
4
Q
Content-Based Filtering Algorithm
A
- another algorithm used by recommender systems
- unlike collaborative filtering, content-based algorithms use features of the items, such as genre or artist
- if user A has been buying Harry Potter books, user A is likely to purchase another fantasy book such as 'The Hobbit'
- techniques to compute the similarity between two items include cosine similarity and Euclidean distance
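A sketch of content-based filtering using Euclidean distance between item feature vectors. The feature names, the 0/1 feature values, and the third title are assumptions made up for illustration; a real system would derive them from item metadata.

```python
import math

# Hypothetical item features, in order: fantasy, adventure, school setting, nonfiction.
item_features = {
    "Harry Potter": [1, 1, 1, 0],
    "The Hobbit": [1, 1, 0, 0],
    "Cookbook": [0, 0, 0, 1],
}

def most_similar_item(liked, item_features):
    """Return the other item whose feature vector is closest to the liked item."""
    candidates = [name for name in item_features if name != liked]
    return min(
        candidates,
        key=lambda name: math.dist(item_features[liked], item_features[name]),
    )

closest = most_similar_item("Harry Potter", item_features)
print(closest)
```

Unlike the collaborative approach, no other user's history is needed here: only the features of the items themselves are compared.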
5
Q
Feature Engineering
A
- process of extracting features from a raw dataset
- a term coined to give due importance to the domain knowledge required to select sets of features for machine learning algorithms
- as technology becomes more sophisticated, more datasets will be available
- but do we need all the features/variables of the dataset?
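A sketch of extracting features from a raw record. The record fields (`timestamp`, `unit_price`, `quantity`) and the derived features are hypothetical; which features are actually worth deriving depends on domain knowledge about the problem.

```python
from datetime import datetime

def extract_features(record):
    """Turn one raw purchase record into model-ready features."""
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M")
    return {
        "hour": ts.hour,                    # time of day of the purchase
        "is_weekend": ts.weekday() >= 5,    # Saturday is 5, Sunday is 6
        "total_spent": record["unit_price"] * record["quantity"],
    }

# Hypothetical raw record (illustrative values only).
raw = {"timestamp": "2024-03-16 14:30", "unit_price": 4.5, "quantity": 2}
features = extract_features(raw)
print(features)
```

The raw timestamp string is not directly useful to most algorithms, while the derived hour and weekend flag may be.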
6
Q
Advantages of feature engineering
A
- improved predictive performance of the model
- faster and less complex machine learning process
- a better understanding of the underlying data relationships
- explainable and implementable machine learning models and solutions
7
Q
A