508 - 533: Inverted Index, Top-k and Skyline Queries Flashcards
In an inverted index, what does the posting list store?
a) Hash values of terms
b) Document IDs containing the term
c) Scores of terms in documents
d) Aggregated top-k scores
b) Document IDs containing the term
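A minimal sketch of building an inverted index whose posting lists hold document IDs. It assumes plain whitespace tokenization with lowercasing (no stemming or stop words) and integer document IDs. The function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted posting list of the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization by whitespace after lowercasing.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make later merging and intersection straightforward.
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "top k queries",
    2: "skyline queries",
    3: "inverted index for top k",
}
index = build_inverted_index(docs)
print(index["queries"])  # [1, 2]
print(index["top"])      # [1, 3]
```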
What is the goal of a Top-k Query?
a) Retrieve all documents with the highest scores
b) Compute the k objects with the highest aggregated scores
c) Find the k smallest indexed objects
d) List all objects in sorted order
b) Compute the k objects with the highest aggregated scores
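As a baseline, a brute-force top-k query can aggregate every object's scores and keep the k best. This is a sketch, not an efficient algorithm: it reads every list completely. It assumes each index list is a dict from object ID to score and that a missing object scores 0; the name `top_k` is hypothetical.

```python
import heapq

def top_k(lists, k, agg=sum):
    """Brute-force top-k: aggregate every object's scores, then keep the k best.

    lists: one dict per index list, mapping object ID -> score.
    Assumption: an object missing from a list scores 0 there.
    """
    objects = set().union(*lists)
    scored = [(agg([l.get(o, 0.0) for l in lists]), o) for o in objects]
    return heapq.nlargest(k, scored)

L1 = {"a": 0.9, "b": 0.8, "c": 0.1}
L2 = {"a": 0.2, "b": 0.7, "c": 0.9}
result = top_k([L1, L2], k=2)
print([obj for _, obj in result])  # ['b', 'a']
```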
What does Fagin’s Algorithm do first during query processing?
a) Perform random access to all lists
b) Aggregate scores directly
c) Read all lists sequentially (sorted access) in parallel until at least k objects have been seen in every list
d) Skip lists with low scores
c) Read all lists sequentially (sorted access) in parallel until at least k objects have been seen in every list
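A minimal sketch of Fagin's Algorithm, assuming each list is a Python list of `(object_id, score)` pairs sorted by descending score, that every object appears in every list, and that random access is simulated with a dict. The function name `fagin` and the example data are hypothetical.

```python
import heapq

def fagin(lists, k, agg=sum):
    """Sketch of Fagin's Algorithm (FA).

    lists: one list per attribute of (object_id, score) pairs, sorted by score descending.
    Assumption: every object appears in every list, so all lists have the same length.
    agg: monotone aggregation function applied to a list of scores.
    """
    m = len(lists)
    random_access = [dict(l) for l in lists]  # simulates random access by object ID
    seen_count = {}  # object -> number of lists in which it was seen by sorted access
    depth = 0
    # Phase 1: sorted access to all lists in parallel until k objects were seen in every list.
    while sum(c == m for c in seen_count.values()) < k and depth < len(lists[0]):
        for l in lists:
            obj, _score = l[depth]
            seen_count[obj] = seen_count.get(obj, 0) + 1
        depth += 1
    # Phase 2: random access to fetch all scores of every object seen so far.
    scored = [(agg([ra[obj] for ra in random_access]), obj) for obj in seen_count]
    # Phase 3: the k objects with the highest aggregated score are the answer.
    return heapq.nlargest(k, scored), depth

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.1)]
L2 = [("b", 0.9), ("c", 0.8), ("a", 0.3), ("d", 0.2)]
top, depth = fagin([L1, L2], k=2)
print([obj for _, obj in top], depth)  # ['b', 'c'] 3
```

Object `d` is never seen under sorted access, so it is never fetched by random access.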
What defines a Skyline in a dataset?
a) The most frequent data points
b) Points dominated by all others
c) Points not dominated by any other point
d) The first k highest-scoring points
c) Points not dominated by any other point
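A minimal in-memory sketch in the spirit of the Block-Nested-Loops skyline algorithm, assuming smaller values are better in every dimension (e.g., hotel price and distance) and an unbounded window. The names `dominates` and `skyline_bnl` and the hotel data are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q: p <= q in every dimension and p < q in at least one
    (assumption: smaller is better in all dimensions)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_bnl(points):
    """Keep a window of candidate skyline points; the window never holds two points
    where one dominates the other."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate: discard it
        # p survives: drop the candidates it dominates, then keep p as a candidate.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

hotels = [(50, 8), (60, 2), (80, 1), (70, 3), (55, 9)]  # (price, distance)
print(skyline_bnl(hotels))  # [(50, 8), (60, 2), (80, 1)]
```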
What is the key difference between Fagin’s Algorithm and the Threshold Algorithm?
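Fagin’s Algorithm reads all lists by sorted access until k objects have been seen in every list and only then performs random accesses for all objects seen. The Threshold Algorithm performs random accesses immediately for each object it sees, maintains a threshold τ (the aggregation of the last scores read by sorted access in each list), and stops as soon as the k-th best aggregated score found so far is at least τ, because no unseen object can score higher than τ. As a result, TA stops at least as early as FA and often much earlier.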
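A minimal sketch of the Threshold Algorithm, under the same assumptions as a typical textbook setup: each list is a Python list of `(object_id, score)` pairs sorted by descending score, every object appears in every list, random access is simulated with dicts, and `agg` is monotone. The function name `threshold_algorithm` and the data are hypothetical.

```python
import heapq

def threshold_algorithm(lists, k, agg=sum):
    """Sketch of the Threshold Algorithm (TA).

    lists: one list per attribute of (object_id, score) pairs, sorted by score descending.
    Assumption: every object appears in every list, so all lists have the same length.
    agg: monotone aggregation function applied to a list of scores.
    """
    random_access = [dict(l) for l in lists]  # simulates random access by object ID
    scores = {}  # object -> aggregated score, computed as soon as the object is seen
    top = []
    for depth in range(len(lists[0])):
        last_seen = []
        for l in lists:
            obj, score = l[depth]
            last_seen.append(score)
            if obj not in scores:
                # Random access to every list to compute the full aggregated score now.
                scores[obj] = agg([ra[obj] for ra in random_access])
        tau = agg(last_seen)  # no unseen object can score above tau (agg is monotone)
        top = heapq.nlargest(k, ((s, o) for o, s in scores.items()))
        if len(top) == k and top[-1][0] >= tau:
            return top, depth + 1  # stop early: the k-th best score already reaches tau
    return top, len(lists[0])  # lists exhausted

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.1)]
L2 = [("b", 0.9), ("c", 0.8), ("a", 0.3), ("d", 0.2)]
top, depth = threshold_algorithm([L1, L2], k=1)
print([obj for _, obj in top], depth)  # ['b'] 2
```

After two rounds τ = 0.8 + 0.8, and `b` already scores 0.8 + 0.9, so TA stops without reading further.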
Explain Dominance in Skyline Queries.
A point p dominates another point q if p is at least as good as q in all dimensions and strictly better in at least one dimension.
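Written formally for d dimensions, assuming smaller values are better in every dimension (flip the inequalities when larger is better):

```latex
p \text{ dominates } q \iff
  \bigl(\forall i \in \{1,\dots,d\}:\ p_i \le q_i\bigr)
  \wedge
  \bigl(\exists j \in \{1,\dots,d\}:\ p_j < q_j\bigr)
```

For example, with (price, distance) pairs, (50, 8) dominates (55, 9), while (50, 8) and (60, 2) do not dominate each other: the first is cheaper, the second is closer.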
How does an inverted index improve search efficiency?
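It maps each term to the list of documents containing it, so a query only reads the posting lists of its own terms (for example, intersecting them for a conjunctive query) instead of scanning every document.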
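A minimal sketch of answering a conjunctive (AND) query by intersecting sorted posting lists. Only the lists of the query terms are read, and each merge step runs in time linear in the two list lengths. The names `intersect` and `and_query` and the toy index are hypothetical.

```python
def intersect(p1, p2):
    """Merge-style intersection of two sorted posting lists in O(len(p1) + len(p2))."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def and_query(index, terms):
    """Return IDs of documents containing all query terms, reading only their posting lists."""
    lists = [index.get(t, []) for t in terms]
    if not lists:
        return []
    lists.sort(key=len)  # start with the shortest list to keep intermediate results small
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result

index = {"top": [1, 3, 7], "k": [1, 3, 5, 7], "skyline": [2, 5]}
print(and_query(index, ["top", "k"]))        # [1, 3, 7]
print(and_query(index, ["top", "skyline"]))  # []
```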
What is the purpose of the k-skyband in Skyline Queries?
It consists of all points dominated by at most k−1 other points, generalizing the Skyline: the Skyline is the special case k = 1 (the 1-skyband).
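A naive O(n²) sketch of the k-skyband that counts, for each point, how many other points dominate it. It assumes smaller values are better in every dimension; the names `dominates` and `k_skyband` and the hotel data are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in at least one
    (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def k_skyband(points, k):
    """Points dominated by at most k-1 other points; k = 1 yields the skyline."""
    return [p for p in points if sum(dominates(q, p) for q in points) <= k - 1]

hotels = [(50, 8), (60, 2), (80, 1), (70, 3), (55, 9), (75, 4)]  # (price, distance)
print(k_skyband(hotels, 1))  # [(50, 8), (60, 2), (80, 1)]
print(k_skyband(hotels, 2))  # [(50, 8), (60, 2), (80, 1), (70, 3), (55, 9)]
```

Here (75, 4) is dominated by both (60, 2) and (70, 3), so it first appears in the 3-skyband.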
Why is monotonicity important in Top-k Query processing?
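A monotone aggregation function guarantees that if an object's scores are less than or equal to another object's scores in every list, its aggregated score cannot exceed the other’s. This is what allows algorithms such as Fagin’s Algorithm and the Threshold Algorithm to bound the scores of objects they have not yet seen and to stop early.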
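The property and the bound it yields can be written out as follows, where f aggregates m lists and \bar{s}_i denotes the last score read by sorted access in list i:

```latex
% Monotonicity of the aggregation function f over m lists
x_i \le y_i \quad \forall i \in \{1,\dots,m\}
  \;\Longrightarrow\; f(x_1,\dots,x_m) \le f(y_1,\dots,y_m)

% Any unseen object o satisfies s_i(o) \le \bar{s}_i in every list, hence
f\bigl(s_1(o),\dots,s_m(o)\bigr) \le f(\bar{s}_1,\dots,\bar{s}_m) = \tau
```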
Describe the role of aggregation functions in Top-k Queries.
They combine individual scores from index lists to compute an overall score for each object, used to rank results.
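A small sketch showing that the choice of aggregation function changes the ranking. It compares sum, minimum, and a weighted sum over scores from two lists. The weights 0.8/0.2, the data, and the names `rank` and `weighted` are hypothetical.

```python
def rank(objects, agg, k):
    """Return the IDs of the k objects with the highest aggregated score."""
    return sorted(objects, key=lambda o: agg(objects[o]), reverse=True)[:k]

def weighted(scores):
    """Hypothetical weighted sum that favors the first list."""
    return 0.8 * scores[0] + 0.2 * scores[1]

# Per-object scores from two index lists (hypothetical data).
objects = {"a": (0.9, 0.2), "b": (0.5, 0.5), "c": (0.3, 0.85)}

print(rank(objects, sum, 1))       # ['c']
print(rank(objects, min, 1))       # ['b']
print(rank(objects, weighted, 1))  # ['a']
```

All three functions are monotone, so each could be used with Fagin’s Algorithm or the Threshold Algorithm.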