Information Retrieval Flashcards

1
Q

What is Information Retrieval (IR)?

A

Information Retrieval (IR) refers to the process of obtaining relevant information from a large collection of unstructured data, typically in the form of text documents.

2
Q

What is IR used for?

A

The primary goal of information retrieval is to find relevant information in response to a user’s query, often over large collections such as those behind search engines, document databases, or online repositories.

3
Q

Draw the IR architecture.

A
4
Q

Explain User:

A

The process starts with a user who has an information need and formulates a query.

5
Q

Explain Query:

A

The query could be a word, phrase or question.

6
Q

Explain Corpus:

A

The collection of documents or data that the IR system searches through to retrieve relevant information. The corpus could consist of web pages, documents, or databases.

7
Q

Explain Indexing:

A

The corpus undergoes indexing: each document is processed and its contents are recorded in an index, so that relevant documents can be found quickly at query time without scanning the whole corpus.

8
Q

Explain Indexed Data Structure:

A

The indexing process produces an indexed data structure, which contains an organised representation of the corpus.
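As a rough illustration of what such a structure can look like, here is a minimal sketch of an inverted index, assuming whitespace tokenisation and lowercasing. The function name `build_inverted_index` and the tiny corpus are hypothetical and used only for this example.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        # Assumption: terms are lowercased, whitespace-separated tokens.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Convert each set of IDs into a sorted posting list.
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


# Hypothetical three-document corpus keyed by document ID.
corpus = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "mice eat cheese",
}
index = build_inverted_index(corpus)
print(index["cats"])    # [1, 2]
print(index["cheese"])  # [3]
```

Real systems usually add steps such as stemming or stop-word removal, which this sketch leaves out.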

9
Q

Explain IR System:

A

When the user submits a query, the IR system searches through the indexed data structures to find documents that match the query terms. It retrieves relevant documents and passes them back to the user.
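One simple way to picture this matching step is a boolean AND search over an inverted index. The sketch below assumes the index is a dictionary from term to a list of document IDs; the `search` function and the sample index are hypothetical.

```python
def search(index, query):
    """Return IDs of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up the posting list of each term; unknown terms match nothing.
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical inverted index: term -> IDs of documents containing it.
index = {
    "cats": [1, 2],
    "chase": [1, 2],
    "dogs": [2],
    "mice": [1, 3],
}
print(search(index, "cats chase"))  # [1, 2]
print(search(index, "dogs mice"))   # []
print(search(index, "Mice"))        # [1, 3]
```

This only decides whether a document matches; it says nothing about how relevant each match is.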

10
Q

Explain Output:

A

An output is generated based on the user’s query and could be a list of documents, ranked search results or relevant snippets.
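To make "ranked search results" concrete, here is a minimal sketch that scores each document by how many times the query terms occur in it. This scoring rule is an assumption chosen for simplicity, not a standard ranking function, and the `rank` function and corpus are hypothetical.

```python
from collections import Counter


def rank(corpus, query):
    """Return matching document IDs, highest raw term count first."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        # Assumed score: total occurrences of the query terms in the document.
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


corpus = {
    1: "cats chase mice",
    2: "cats and more cats",
    3: "dogs bark",
}
print(rank(corpus, "cats"))  # [2, 1]
```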

11
Q

What data structures are used in IR?

A

Lists
Dictionaries (also called Hash Maps/Tables)

12
Q

What is a list and its usage?

A

A list is an ordered collection of elements where each element can be accessed by its index.

Usage in IR:
Document lists: Lists are used to store collections of documents or results. For example, a list of documents that match a query.
Posting lists: In inverted indexing (common in IR systems), each word (term) is associated with a posting list, which is a list of documents where the term appears.
Ordered result lists: The ranked list of search results presented to the user is often a list, where elements are ranked by relevance.
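As an example of working with posting lists, the sketch below intersects two of them with a two-pointer merge, assuming both lists are sorted by document ID. The function name `intersect` and the sample lists are hypothetical.

```python
def intersect(postings_a, postings_b):
    """Intersect two posting lists that are sorted by document ID."""
    i, j = 0, 0
    result = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            # The document contains both terms.
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return result


print(intersect([1, 2, 4, 8], [2, 3, 4, 9]))  # [2, 4]
```

Because each pointer only moves forward, the merge takes time proportional to the combined length of the two lists.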

13
Q

What is a dictionary and its usage?

A

A dictionary is a collection of key-value pairs where each key is unique and values can be accessed via their keys.

Usage in IR:
Inverted index: A dictionary is often used to store the inverted index, where each key is a term (word) and the value is a posting list of documents containing that term.
Term frequency counts: Dictionaries can store term frequencies, where keys are terms and values are counts of how often a term appears in a document.
Document metadata: Dictionaries are used to store metadata about documents, such as document IDs and their corresponding attributes (e.g., title, URL, length).
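Two of these uses can be sketched in a few lines of Python. The example assumes whitespace tokenisation, and the document ID `doc1` and its metadata fields are hypothetical.

```python
from collections import Counter

text = "to be or not to be"

# Term frequency counts: keys are terms, values are occurrence counts.
term_frequencies = Counter(text.split())
print(term_frequencies["to"])   # 2
print(term_frequencies["not"])  # 1
print(term_frequencies["cat"])  # 0 (Counter returns 0 for missing keys)

# Document metadata: keys are document IDs, values are attribute dictionaries.
doc_metadata = {
    "doc1": {"title": "Example document", "length": len(text.split())},
}
print(doc_metadata["doc1"]["length"])  # 6
```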

14
Q

Pros and Cons of a List?

A

Pros:
Elements are stored in the order they are inserted
Dynamic sizing
Easy to iterate through
Insertion at the end is fast, amortized O(1)

Cons:
Slow search: finding an element requires scanning the list sequentially, O(n)
Insertions and deletions at the front or in the middle are expensive because the other elements must be shifted, O(n)
Not ideal for key-based access
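These trade-offs can be seen with a Python list, used here as an assumed stand-in for a dynamic array. The document IDs are hypothetical.

```python
results = ["doc3", "doc7"]

results.append("doc9")     # append at the end: amortized O(1)
results.insert(0, "doc1")  # insert at the front: O(n), shifts every element
print(results)             # ['doc1', 'doc3', 'doc7', 'doc9']

print(results[2])             # index access is O(1): doc7
print("doc9" in results)      # membership test scans the list, O(n): True
print(results.index("doc7"))  # finding a position is also O(n): 2
```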

15
Q

Pros and Cons of a Dictionary?

A

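Pros:
Fast lookup by key, O(1) on average
Efficient insertions and deletions, O(1) on average, because entries are located by hashing the key
Scales well to large data sets
No duplicate keys
Dynamic size

Cons:
Not sorted by key (some implementations, such as Python's dict since version 3.7, do preserve insertion order)
Memory overhead
Hash collisions must be handled, and many collisions degrade performance towards O(n)
More complex to implement than a list
Not suitable for sequential data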
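The key-based behaviour can be sketched with a Python dict, used here as an assumed stand-in for a hash table. The terms and posting lists are hypothetical.

```python
index = {}
index["cats"] = [1, 2]     # insertion by key: O(1) on average
index["dogs"] = [2]
index["cats"] = [1, 2, 5]  # same key again: the old value is overwritten

print(len(index))             # 2, because keys are unique
print(index["cats"])          # [1, 2, 5]
print("mice" in index)        # False, checked without scanning all entries
print(index.get("mice", []))  # [] as a default for a missing key

del index["dogs"]             # deletion by key: O(1) on average
print(list(index))            # ['cats']
```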
