Chapter 1 - Term by Document Matrix Flashcards

1
Q

Define term by document matrix

A

A database of webpages containing various terms can be represented by a term by document matrix A. The (i,j) entry is 0 if term i is not in document j. The matrix can be constructed in either of two ways:
the (i,j) entry is 1 if term i occurs in document j, or the (i,j) entry is the number of times term i occurs in document j.
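A minimal sketch in Python with NumPy that builds both versions of the matrix. The terms and documents below are a hypothetical toy collection, not from the source. Rows are terms and columns are documents.

```python
import numpy as np

# Hypothetical toy collection: each document is a list of words.
terms = ["matrix", "vector", "search", "web"]
docs = [
    ["matrix", "vector", "matrix"],
    ["search", "web", "web", "web"],
    ["vector", "search"],
]

# Count version: entry (i, j) = number of times term i occurs in document j.
counts = np.array([[doc.count(t) for doc in docs] for t in terms])

# Binary version: entry (i, j) = 1 if term i occurs in document j, else 0.
binary = (counts > 0).astype(int)

print(counts)
print(binary)
```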

2
Q

Define the search vector

A

The search vector for terms i_1, i_2, ..., i_k is the vector v with a 1 in positions i_j for j = 1, 2, ..., k and zeros everywhere else.
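A short sketch of building a search vector, assuming NumPy. The helper name `search_vector` is hypothetical. Python indexes from 0, so the positions here are 0-based rather than the 1-based i_1, ..., i_k used on the card.

```python
import numpy as np

def search_vector(num_terms, term_indices):
    """Return a vector with 1 at each index in term_indices and 0 elsewhere.

    Indices are 0-based (Python convention), unlike the 1-based positions
    i_1, ..., i_k on the card.
    """
    v = np.zeros(num_terms, dtype=int)
    v[list(term_indices)] = 1
    return v

# Hypothetical example: 4 terms, searching for terms 1 and 3 (0-based).
v = search_vector(4, [1, 3])
print(v)  # [0 1 0 1]
```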

3
Q

What information does the dot product of the search vector v and column i of the term by document matrix A give?

A

v.(column i of A) > 0 means document i contains at least one of the search terms. If the dot product is 0, document i contains none of them.
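A sketch of the single-column test, assuming NumPy and a hypothetical 4-term, 3-document binary matrix. The helper `contains_a_term` is a made-up name for illustration.

```python
import numpy as np

# Hypothetical binary term by document matrix (rows = terms, columns = documents).
A = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
])
# Search vector for the last term only (0-based index 3).
v = np.array([0, 0, 0, 1])

def contains_a_term(v, A, i):
    """True if document i (column i of A) contains at least one search term."""
    return int(v @ A[:, i]) > 0

print(contains_a_term(v, A, 0))  # False: document 0 has none of the terms
print(contains_a_term(v, A, 1))  # True: document 1 contains the last term
```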

4
Q

What does the product of the search vector and the term by document matrix return?

A

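The product v^T A is a row vector whose j-th entry is the dot product of v with column j of A. Its positive entries give a list of all documents for which the dot product is positive, i.e. the documents containing at least one of the search terms.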
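A sketch, assuming NumPy, showing that one vector-matrix product computes every column's dot product at once. The matrix and the search terms are hypothetical.

```python
import numpy as np

# Hypothetical binary term by document matrix (rows = terms, columns = documents).
A = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
])
# Search for terms 0 and 3 (0-based).
v = np.array([1, 0, 0, 1])

# v @ A is the row vector whose j-th entry is the dot product of v with column j.
scores = v @ A
matching_docs = np.flatnonzero(scores > 0).tolist()

print(scores)         # [1 1 0]
print(matching_docs)  # [0, 1]
```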

5
Q

How do you find the webpages on which at least one of a list of terms appears?

A

Form the search vector v for the terms and calculate the dot product of v with each column of A. The documents for which the answer is positive are the relevant pages.
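An end-to-end sketch of the procedure, assuming NumPy. The function `find_pages`, the page labels and the count matrix are all hypothetical. It goes from a list of query terms to the names of the matching pages.

```python
import numpy as np

def find_pages(A, terms, pages, query):
    """Return the pages whose column has a positive dot product with the search vector.

    A: term by document matrix (rows = terms, columns = pages).
    terms: row labels of A; pages: column labels of A; query: list of terms.
    """
    v = np.array([1 if t in query else 0 for t in terms])  # search vector
    return [pages[j] for j in range(A.shape[1]) if v @ A[:, j] > 0]

# Hypothetical data.
terms = ["matrix", "vector", "search", "web"]
pages = ["page_a", "page_b", "page_c"]
A = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 3, 0],
])
print(find_pages(A, terms, pages, ["web", "matrix"]))  # ['page_a', 'page_b']
```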

6
Q

Define a normalised vector

A

The vector of length 1 in the same direction as a given nonzero vector u, i.e. u/||u||.
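A minimal sketch of normalising a vector with NumPy. The helper name `normalise` is hypothetical. It rejects the zero vector, which has no direction.

```python
import numpy as np

def normalise(u):
    """Return u / ||u||, the unit vector in the direction of u (u must be nonzero)."""
    norm = np.linalg.norm(u)
    if norm == 0:
        raise ValueError("cannot normalise the zero vector")
    return u / norm

u = np.array([3.0, 4.0])
print(normalise(u))  # [0.6 0.8]
print(np.linalg.norm(normalise(u)))
```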

7
Q

When does equality occur in the Cauchy-Schwarz inequality?

A

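Equality |u.v| = ||u|| ||v|| holds exactly when u and v are scalar multiples of each other (this includes the case where one of them is the zero vector).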
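A numerical illustration, assuming NumPy, with hypothetical vectors. The helper `cs_gap` is a made-up name. It measures how far a pair is from equality in Cauchy-Schwarz. The gap is zero for a multiple of u and strictly positive for a vector that is not a multiple.

```python
import numpy as np

def cs_gap(u, v):
    """Return ||u|| ||v|| - |u . v|, which is >= 0 by Cauchy-Schwarz."""
    return np.linalg.norm(u) * np.linalg.norm(v) - abs(u @ v)

u = np.array([1.0, 2.0, 2.0])
print(np.isclose(cs_gap(u, -3 * u), 0.0))        # True: -3u is a multiple of u
print(cs_gap(u, np.array([2.0, 0.0, 1.0])) > 0)  # True: not multiples, strict inequality
```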

8
Q

How do I normalise the term by document matrix?

A

Normalise each column of the matrix: replace each column c with c/||c||. Also normalise the search vector.
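A sketch of normalising the matrix and the search vector, assuming NumPy and a hypothetical count matrix. The helper name `normalise_columns` is hypothetical. Leaving an all-zero column (an empty document) as zero is an assumption. The card does not say how to handle that case.

```python
import numpy as np

def normalise_columns(A):
    """Replace each column c of A with c / ||c||; zero columns are left as zero."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)
    safe = np.where(norms == 0, 1.0, norms)  # assumption: avoid dividing by zero for empty documents
    return A / safe

# Hypothetical count matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 3, 0],
])
v = np.array([1, 0, 0, 1])        # search vector
D = normalise_columns(A)          # normalised term by document matrix
w = v / np.linalg.norm(v)         # normalised search vector
print(np.linalg.norm(D, axis=0))  # each column now has length 1
```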

9
Q

Once the term by document matrix and the search vector have been normalised, what does the dot product tell you?

A

Calculate d.w, where d is a column of the normalised term by document matrix and w is the normalised search vector.
d.w > 0 exactly when v.c > 0 for the original unnormalised vectors, so a positive value means at least one of the search terms is contained in the document.
But d.w <= 1 (by the Cauchy-Schwarz inequality, since ||d|| = ||w|| = 1), and the closer the value is to 1, the better the webpage fits the query.
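A sketch of scoring and ranking documents by d.w, assuming NumPy. The count matrix and the query are hypothetical. Every score is at most 1, and a score is positive exactly where the unnormalised v.c is positive.

```python
import numpy as np

# Hypothetical count matrix (rows = terms, columns = documents) and search vector.
A = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 3, 0],
], dtype=float)
v = np.array([1.0, 0.0, 0.0, 1.0])

D = A / np.linalg.norm(A, axis=0)  # normalise each column (no zero columns here)
w = v / np.linalg.norm(v)          # normalise the search vector

scores = w @ D                     # entry j is d_j . w
best_first = np.argsort(-scores)   # documents ordered from best to worst fit
print(np.round(scores, 3))
print(best_first)
```

Here document 1 scores higher than document 0 (about 0.671 against 0.632), even though both contain one of the query terms, because document 1 is more concentrated on the query.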

10
Q

Format of term by document matrix

A

Terms are the rows and documents are the columns.

11
Q

Define semantic content

A

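The normalised dot product d.w, where d is a column of the normalised term by document matrix and w is the normalised search vector.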

12
Q

How could a search engine using the dot product method filter the webpages it returns so that they are helpful?

A

Set a lower bound for d.w and only return webpages whose value is above that bound. This helps ensure that the webpages returned match the query well (they contain many occurrences of the query terms) and speeds up response time, but it can cause relevant documents to be missed.
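A sketch of threshold filtering, assuming NumPy. The function `filter_by_bound`, the `lower_bound` values and the data are hypothetical. Only documents with d.w strictly above the bound are kept, best match first.

```python
import numpy as np

def filter_by_bound(D, w, lower_bound):
    """Return indices of documents with d . w > lower_bound, best match first.

    D: normalised term by document matrix; w: normalised search vector;
    lower_bound: hypothetical tuning parameter chosen by the search engine.
    """
    scores = w @ D
    keep = np.flatnonzero(scores > lower_bound)
    return keep[np.argsort(-scores[keep])].tolist()

# Hypothetical count matrix (rows = terms, columns = documents) and search vector.
A = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 3, 0],
], dtype=float)
v = np.array([1.0, 0.0, 0.0, 1.0])
D = A / np.linalg.norm(A, axis=0)
w = v / np.linalg.norm(v)

print(filter_by_bound(D, w, 0.5))   # [1, 0]
print(filter_by_bound(D, w, 0.65))  # [1]
```

With the higher bound, document 0 is dropped even though it contains one of the query terms. This is the kind of missed document the card warns about.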
