Foundations Flashcards

1
Q

Example of a Transactional Query

A

“Best Budget Headphones”

2
Q

Informational Query

A

A query where the user has no specific item in mind and is searching for reading material on a given topic.

3
Q

Example of Informational Query

A

“Books on space”

4
Q

Transactional Queries

A

Queries where we know what we are looking for and want to find places where we can get it.

5
Q

Question Answering Queries

A

Queries where we ask a question we don’t know the answer to and expect an answer that is short, verifiable, and specific.

6
Q

3 Main uses of Search Engines

A

Personal Search: Finding documents on your own computer
Enterprise Search: Finding information on an external server or within an organization's firewall (e.g., a library catalog)
Vertical Search: Finding information across public websites about a specific topic (e.g., Yelp)

7
Q

Documents

A

Storage and retrieval unit of a search system. The system will always return documents or links to them.

8
Q

Example of Documents in a system

A

In Gmail, search returns links to emails, so emails are considered documents.

9
Q

Query

A

Expression of information need that the user has.
The input to a search engine.

10
Q

Tokens

A

Token: Internal representation of a word in a document or query.
Tokenization: Process of converting words into tokens.
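
A minimal sketch of tokenization, assuming tokens are lowercased runs of letters and digits. The function name `tokenize` and the normalization rule are illustrative assumptions; real systems may normalize differently.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric runs as tokens."""
    # Assumption: punctuation and whitespace only separate tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

print(tokenize("Best Budget Headphones!"))
```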

11
Q

Terms

A

Vocabulary (V): Set of unique tokens across all documents.
Term: Each element in the vocabulary set.
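
As a rough illustration, the vocabulary can be built by collecting every unique token across the documents. The helper `build_vocabulary` and the sample documents are hypothetical, and tokens are assumed to be lowercased, whitespace-separated words.

```python
def build_vocabulary(documents: dict[str, str]) -> set[str]:
    """Return the set of unique tokens across all documents."""
    vocabulary: set[str] = set()
    for text in documents.values():
        # Assumption: tokens are lowercased, whitespace-separated words.
        vocabulary.update(text.lower().split())
    return vocabulary

documents = {"d1": "books on space", "d2": "space travel books"}
print(sorted(build_vocabulary(documents)))
```

Each element of the returned set is a term.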

12
Q

Out-of-Vocabulary (OOV)

A

A token appearing in a query but not in any document. IR systems typically ignore OOV tokens.
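
A small sketch of how a system might separate OOV tokens from the rest of a query before matching. The function `split_oov` is a hypothetical helper, not part of any particular library.

```python
def split_oov(query_tokens: list[str], vocabulary: set[str]) -> tuple[list[str], list[str]]:
    """Split query tokens into those in the vocabulary and those that are OOV."""
    known = [token for token in query_tokens if token in vocabulary]
    oov = [token for token in query_tokens if token not in vocabulary]
    return known, oov

vocabulary = {"books", "on", "space"}
known, oov = split_oov(["books", "on", "mars"], vocabulary)
print(known, oov)  # only the known tokens would be used for matching
```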

13
Q

Tokenization Benefits

A

Helps find documents when users type words differently.
Removes inconsistencies from documents.

14
Q

Architecture of a Search Engine

A

Documents are added to the index.
Users send queries and receive results.
Tokens in the query are matched against terms in the index to find relevant documents.
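
The three steps can be sketched as a toy in-memory engine. The class `TinySearchEngine` and its method names are assumptions made for illustration, and it returns any document that shares at least one token with the query.

```python
from collections import defaultdict

class TinySearchEngine:
    """Toy search engine: index documents, then match query tokens."""

    def __init__(self) -> None:
        # Maps each term to the set of document ids containing it.
        self.index: dict[str, set[str]] = defaultdict(set)

    def add_document(self, doc_id: str, text: str) -> None:
        """Add a document to the index, one entry per token."""
        for token in text.lower().split():
            self.index[token].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return ids of documents containing at least one query token."""
        matches: set[str] = set()
        for token in query.lower().split():
            matches |= self.index.get(token, set())
        return sorted(matches)

engine = TinySearchEngine()
engine.add_document("d1", "books on space")
engine.add_document("d2", "budget headphones")
print(engine.search("space books"))
```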

15
Q

Term-Document Incidence Matrix

A

Matrix indicating the importance of a term for a document. Common weight choices: Boolean, Counts, tf-idf.
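
A minimal sketch of building such a matrix with Boolean or count weights. It assumes one row per term and one column per document, and the function `term_document_matrix` is hypothetical. tf-idf is left out because its exact formula varies between systems.

```python
from collections import Counter

def term_document_matrix(documents: dict[str, str], weighting: str = "boolean"):
    """Return (terms, doc_ids, matrix) with rows as terms and columns as documents."""
    doc_ids = sorted(documents)
    counts = {doc_id: Counter(documents[doc_id].lower().split()) for doc_id in doc_ids}
    terms = sorted(set().union(*counts.values()))
    matrix = []
    for term in terms:
        row = [counts[doc_id][term] for doc_id in doc_ids]
        if weighting == "boolean":
            # 1 if the term occurs in the document, 0 otherwise.
            row = [1 if count > 0 else 0 for count in row]
        matrix.append(row)
    return terms, doc_ids, matrix

docs = {"d1": "space space books", "d2": "books"}
print(term_document_matrix(docs, weighting="boolean"))
print(term_document_matrix(docs, weighting="count"))
```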

16
Q

Index

A

Associative data structure on disk, allowing quick retrieval of objects given search keys.

17
Q

Boolean Queries

A

Each term is a boolean predicate.
True if the term is in the document, false otherwise.
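
One way to picture this, as a sketch: each term selects the set of documents for which its predicate is true, and AND, OR, and NOT become set operations. The helper `docs_with` and the sample documents are illustrative assumptions.

```python
def docs_with(term: str, documents: dict[str, str]) -> set[str]:
    """Return ids of documents for which the term's predicate is true."""
    return {doc_id for doc_id, text in documents.items()
            if term in text.lower().split()}

documents = {
    "d1": "books on space",
    "d2": "space travel",
    "d3": "budget headphones",
}

# space AND books
both = docs_with("space", documents) & docs_with("books", documents)
# space OR headphones
either = docs_with("space", documents) | docs_with("headphones", documents)
# space AND NOT travel
without = docs_with("space", documents) - docs_with("travel", documents)
print(both, either, without)
```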

18
Q

Pros of Boolean Query

A

Easy to understand, implement, and explain.

19
Q

Cons of Boolean Query

A

Hard to write, and treats all documents and terms the same (no ranking or weighting).

20
Q

Querying by Ranking

A

The scoring function should prioritize documents that contain more of the query terms.
More appearances of a term in a document contribute more to the score.
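
One possible scoring function with these properties, shown only as a sketch: sum the occurrence counts of each distinct query term in the document. The names `score` and `rank` are hypothetical, and real systems usually use more refined weights.

```python
from collections import Counter

def score(query: str, document: str) -> int:
    """Sum, over distinct query terms, how often each occurs in the document."""
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in set(query.lower().split()))

def rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs, highest score first, ties broken by id."""
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

documents = {"d1": "space books", "d2": "space space travel", "d3": "budget headphones"}
print(rank("space books", documents))
```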

21
Q

Scoring with Vectors

A

Documents and queries represented as vectors.
Score computed using document and query vectors.

A document’s score is influenced by the terms it shares with the query.
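
A minimal sketch of vector scoring, assuming term-count vectors and cosine similarity. Both choices are assumptions, since the card does not fix a particular vector weighting or scoring formula.

```python
import math
from collections import Counter

def to_vector(text: str) -> Counter:
    """Represent text as a sparse vector of term counts."""
    return Counter(text.lower().split())

def cosine_score(query: str, document: str) -> float:
    """Cosine similarity between the query and document count vectors."""
    q, d = to_vector(query), to_vector(document)
    # Only terms shared by query and document contribute to the dot product.
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

print(cosine_score("books on space", "space travel books"))
```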

22
Q

Evaluating Quality of Answers

A

Use a benchmark of queries with answers judged by humans.
Precision: Fraction of returned documents that are relevant.
Recall: Fraction of relevant documents that are returned.
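
The two definitions can be computed directly from the set of returned documents and the set of relevant documents. The function name `precision_recall` and the convention of returning 0.0 for an empty set are assumptions of this sketch.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query."""
    hits = len(returned & relevant)
    # Assumption: define the metric as 0.0 when its denominator is empty.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

returned = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
print(precision_recall(returned, relevant))
```

Here two of the four returned documents are relevant, and two of the three relevant documents are returned.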

23
Q

Precision and Recall Trade-off

A

Precision and recall tend to pull in opposite directions.
Perfect recall is trivially achieved by returning every document, but at the cost of precision.

24
Q

When should you prioritize Precision vs. Recall?

A

When to Prioritize:

Precision: When a bad answer can have consequences.
Recall: When missing a good answer can have consequences.