03 - Content-based Filtering Flashcards

Question 1

Q

How to represent textual data?

Answer

A

Question 2

Q

What does tabular data for recommender systems look like?

Answer

A

Every document/instance is represented as one row of a table/matrix or as a vector
Every column (feature) corresponds to a term
All documents are vectors in a vector space
Every term corresponds to one dimension in the vector space
Every instance represents one feature vector or point in an n-dimensional vector space
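
The points above can be illustrated with a small sketch that turns a few documents into rows of a term-count matrix. The function name build_term_matrix, the whitespace tokenization, and the sample documents are assumptions made for illustration only; they are not part of the original cards.

```python
from collections import Counter


def build_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    # Assumption: lowercase the text and split on whitespace as a minimal tokenizer.
    tokenized = [doc.lower().split() for doc in documents]
    # Every distinct term becomes one dimension (column) of the vector space.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Each document becomes a vector of term counts over the vocabulary.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical sample corpus.
docs = ["the cat sat", "the dog sat down", "the cat chased the dog"]
vocabulary, matrix = build_term_matrix(docs)
```

With this corpus the vocabulary has six terms, so every document is a point in a 6-dimensional vector space.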

Question 3

Q

How to measure the relevance of a document?

Answer

A

Question 4

Q

What are possible problems with the Euclidean distance (L2 norm)?

Answer

A

Question 5

Q

What is the Inverse Document Frequency (IDF)?

Answer

A

IDF = log((number of documents in the corpus D) / (number of documents in D that contain the searched term))
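
A minimal sketch of this formula follows. The card does not state the base of the logarithm, so the natural logarithm is assumed here; the function name idf, the whitespace tokenization, and the sample corpus are likewise illustrative assumptions.

```python
import math


def idf(term, documents):
    """IDF = log(N / n_t), where N is the corpus size and n_t is the
    number of documents that contain the term."""
    n_docs = len(documents)
    # Assumption: a document "contains" a term if it appears as a whitespace token.
    n_containing = sum(1 for doc in documents if term in doc.lower().split())
    # Assumption: natural logarithm; the base is not given in the original.
    return math.log(n_docs / n_containing)


# Hypothetical sample corpus.
idf_docs = ["the cat sat", "the dog sat down", "the cat chased the dog"]
idf_the = idf("the", idf_docs)        # term occurs in every document
idf_chased = idf("chased", idf_docs)  # term occurs in only one document
```

A term that appears in every document gets an IDF of 0, while a rarer term gets a larger value, which is why IDF down-weights very common terms.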

Question 6

Q

What is TF-IDF?

Answer

A

Question 7

Q

Why is IDF not ideal?

Answer

A

If no document in the corpus contains the term, the denominator is 0 and the IDF is undefined (division by zero)
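
The failure case can be reproduced in a few lines. The second function shows one common workaround, adding 1 to the denominator; this smoothing is not taken from the original cards and is only one of several variants in use. Both function names are hypothetical.

```python
import math


def idf_plain(n_docs, n_containing):
    """Plain IDF as defined on the card; fails when n_containing is 0."""
    return math.log(n_docs / n_containing)


def idf_smoothed(n_docs, n_containing):
    """Assumption: add-one smoothing of the denominator, not from the cards.
    Note that this yields a negative value when every document contains the term."""
    return math.log(n_docs / (1 + n_containing))


try:
    idf_plain(3, 0)
    plain_failed = False
except ZeroDivisionError:
    # No document contains the term, so the division is undefined.
    plain_failed = True

smoothed_value = idf_smoothed(3, 0)  # well defined even for unseen terms
```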

Question 8

Q

What is a possible extension for IDF?

Answer

A

If two documents are equally relevant, you can define additional criteria to rank them, e.g. the age of the document

(8 cards)