1. Introduction Flashcards
What is TF.IDF used for? How is it obtained? What does a high TF.IDF mean?
Term Frequency times Inverse Document Frequency. It is used to find the terms that best characterize the topic of a document. TF is obtained by TF_ij = f_ij / max_k(f_kj), the number of occurrences of term i in document j divided by the maximum number of occurrences of any term in that same document. IDF is obtained by IDF_i = log_2(N/n_i), where N is the total number of documents and n_i is the number of documents in which term i appears. The TF.IDF score is then TF_ij × IDF_i. The terms with the highest scores are often the best at characterizing the topic of the document.
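A minimal Python sketch of the formulas above, assuming each document is already split into a non-empty list of tokens. The function name tf_idf and the toy documents are illustrative, not part of the original card.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: TF.IDF score} dict per document.

    docs is a list of non-empty token lists (an assumption of this sketch).
    """
    n_docs = len(docs)
    # n_i: number of documents in which term i appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)                # f_ij for this document j
        max_count = max(counts.values())     # max_k f_kj
        scores.append({
            term: (count / max_count) * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus, invented for illustration
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
    ["the", "dog", "barked", "at", "the", "cat"],
    ["the", "sun"],
]
scores = tf_idf(docs)
```

In this toy corpus "the" appears in every document, so its IDF is log_2(4/4) = 0 and its score is 0 everywhere, while "mat" appears in only one document and scores highest in the first document.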
What is a hash function? (Definition, examples & applications)
A hash function is a function that maps objects from a universe U (numbers, strings, documents, etc.) into integers {0, 1, …, N-1} ("buckets") in a roughly uniform way, i.e., each bucket receives about the same number of objects. It can typically be computed in time linear in the size of the object. Example: h(x) = x mod N for integer keys. Typical applications are hash tables and hash-based indexes.
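One possible string hash, sketched in Python. The multiplier 31, the 32-bit mask, and the name bucket_of are assumptions chosen for illustration; the card does not prescribe a particular function.

```python
def bucket_of(key: str, n_buckets: int) -> int:
    """Map a string to a bucket number in {0, 1, ..., n_buckets - 1}."""
    h = 0
    # One pass over the bytes, so the cost is linear in the key length
    for byte in key.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF   # keep the running value in 32 bits
    return h % n_buckets

# Count how many of a few hypothetical keys land in each bucket
n_buckets = 7
counts = [0] * n_buckets
for i in range(1000):
    counts[bucket_of(f"key{i}", n_buckets)] += 1
```

Inspecting counts gives a rough feel for how evenly the keys spread. How uniform the spread is depends on both the keys and the function, so this is a check to run, not a guarantee.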
What is an index? (Definition, examples & applications)
An index is a data structure that makes it efficient to retrieve objects given the value of one or more elements of those objects. The most common situation is one where the objects are records, and the index is on one of the fields of that record. Given a value v for that field, the index lets us retrieve all the records with value v in that field, without having to retrieve all the records in the file.
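A small Python sketch of an index on one field, assuming the records are dictionaries held in a list. The helper build_index and the sample records are hypothetical.

```python
from collections import defaultdict

def build_index(records, field):
    """Map each value of `field` to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index

# Sample records, invented for illustration
records = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Rome"},
    {"name": "Cy", "city": "Oslo"},
]
city_index = build_index(records, "city")

# Look up v = "Oslo" through the index instead of scanning every record
matches = [records[i] for i in city_index.get("Oslo", [])]
```

Building the index scans the records once. After that, each lookup touches only the matching records.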
Why is the constant e important? How can it be approximated?
e is approximated by (1+1/x)^x for large x (it is the limit as x goes to infinity). It is important and useful because, together with its Taylor expansion, it lets us approximate many seemingly complex expressions. In particular, (1+a)^b ≈ e^(ab) when a is small.
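A quick numeric check of these approximations in Python. The helper names e_estimate and power_estimate are illustrative.

```python
import math

def e_estimate(x):
    """(1 + 1/x)^x, which approaches e as x grows."""
    return (1 + 1 / x) ** x

def power_estimate(a, b):
    """e^(ab), the approximation to (1 + a)^b for small a."""
    return math.exp(a * b)

for x in (10, 1_000, 1_000_000):
    print(x, e_estimate(x), math.e)

# Compare the exact power with its exponential approximation for a = 0.01
exact = (1 + 0.01) ** 100
approx = power_estimate(0.01, 100)
print(exact, approx)

# Taking a = -1/x and b = x gives (1 - 1/x)^x ≈ 1/e
inverse_estimate = (1 - 1 / 1_000_000) ** 1_000_000
```

The estimates get closer to e as x increases, and the approximation (1+a)^b ≈ e^(ab) gets tighter as a gets smaller.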
(1+a)^b = ?
Approximately e^(ab), when a is small.
(1+1/x)^x = ?
Approximately e, for large x. (By contrast, (1-1/x)^x is approximately 1/e.)