Exam Deck Flashcards

1
Q

Basic measures of a text retrieval (TR) system and their formulas

A
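Precision = a / (a + c) - are the retrieved results relevant?
Recall = a / (a + b) - have all relevant documents been retrieved?
F-measure = 2PR / (P + R) - combines precision and recall
(a = relevant retrieved, b = relevant not retrieved, c = non-relevant retrieved)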
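A minimal sketch of the three formulas, assuming a counts relevant retrieved documents, b relevant documents that were missed, and c non-relevant documents that were retrieved. The function name and the sample counts are illustrative only.

```python
def precision_recall_f1(a, b, c):
    """a = relevant retrieved, b = relevant not retrieved, c = non-relevant retrieved."""
    precision = a / (a + c) if (a + c) else 0.0
    recall = a / (a + b) if (a + b) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative counts: 8 relevant retrieved, 2 relevant missed, 12 non-relevant retrieved
p, r, f = precision_recall_f1(a=8, b=2, c=12)
```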
2
Q

What is the ideal PR curve, what does it characterize

A
  1. Horizontal line at precision = 1 for every recall level
  2. Characterizes the overall accuracy of a ranked list

3
Q

What is average precision

A

Standard measure for comparing two ranking methods

  • Combines precision and recall
  • Sensitive to the rank of every relevant document
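A small sketch of how average precision could be computed for one query, assuming binary relevance judgments. The precision at the rank of each retrieved relevant document is summed, then divided by the total number of relevant documents, so relevant documents that were never retrieved contribute zero. The function name and the sample ranking are illustrative.

```python
def average_precision(ranked_relevance, total_relevant):
    """ranked_relevance: 1/0 flags in ranked order; total_relevant: all relevant docs for the query."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    return precision_sum / total_relevant if total_relevant else 0.0

# Relevant docs at ranks 1 and 3; a third relevant doc was never retrieved
ap = average_precision([1, 0, 1, 0, 0], total_relevant=3)
```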
4
Q

What is nDCG

A
  • Normalized Discounted Cumulative Gain: measures the utility of the top k documents
  • Utility of a lowly ranked document is discounted
  • Normalized across queries so scores are comparable
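A hedged sketch of nDCG at cutoff k, assuming graded gains and a log2(rank + 1) discount (other discount variants exist). It also assumes `gains` lists the gains of all judged documents for the query in ranked order, so the ideal ordering can be derived by sorting. Function names are illustrative.

```python
import math

def dcg(gains, k):
    """Discounted cumulative gain of the top k gains (rank 1 is undiscounted: log2(2) = 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg(gains, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal else 0.0

perfect = ndcg([3, 2, 1], k=3)   # already in ideal order
reversed_ = ndcg([1, 2, 3], k=3)  # best document ranked last
```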
5
Q

Describe all types of feedback

A

Relevance feedback - Reliable judgments from the user, but requires user effort

Pseudo feedback - Assumes the top k ranked docs are relevant; needs no user effort, but is not reliable

Implicit feedback - Uses clickthrough data as relevance signals

6
Q

What is Latent semantic indexing

A
  • Represents the term-document space in a lower-dimensional latent space
  • Reduces storage and helps search cope with ambiguous terms
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

LSI steps

A
  • Term document matrix -> Word assignment to topics -> Topic importance -> topic distribution
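One common reading of these steps, assuming they refer to a truncated singular value decomposition (SVD) of the term-document matrix. The symbols below are not from the card and are introduced only for illustration:

```latex
A \approx U_k \, \Sigma_k \, V_k^{\top}
```

Here A is the term-document matrix, U_k assigns words to k latent topics, the diagonal matrix Σ_k holds the singular values (topic importance), and V_k^T gives the topic distribution of each document.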
8
Q

Pros of using VSM

A
  • Automatic selection of index terms
  • Partial Matching of queries and documents
  • Ranking according to the similarity score
  • Term weighting schemes
  • Various extensions
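A toy sketch of partial matching and ranking by similarity score, assuming raw term counts as weights (a real system would typically use a weighting scheme such as TF-IDF). The documents, the query, and the function name are made up for illustration.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms):
    """Cosine similarity between two bags of words using raw term counts."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

query = ["web", "search", "engine"]
docs = {
    "d1": ["web", "search", "engine", "ranking"],
    "d2": ["web", "crawler"],          # matches only one query term
    "d3": ["cooking", "recipes"],      # no overlap
}
# Rank documents by decreasing similarity to the query
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
```

Note that d2 still gets a nonzero score even though it matches only one of the three query terms.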
9
Q

Problems with Lexical Semantics

A
  • Synonymy = different terms may have identical or similar meanings; true similarity is high even though the cosine is small
  • Polysemy = words often have many meanings and the VSM cannot discriminate between them; the cosine is large but should be small
10
Q

Advantages of Lexical

A
11
Q

Main idea of LSI

A

Perform a low-rank approximation of the document-term matrix
General idea
- map documents to a low-dimensional space
- represent semantic associations
- compute similarity based on the inner product in the semantic space

12
Q

Web search challenges and opportunities

A

Challenges

  • Scalability, addressed by parallel indexing and searching
  • Low-quality information and spam
  • Dynamics of the web

Opportunities
- many additional heuristics can be leveraged to improve accuracy

13
Q

What is a web crawler

A
  • An essential component of web search
  • Typically traverses links breadth-first (BFS)
  • Complete vs. focused crawling
  • Incremental crawling optimizes resource use
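A minimal sketch of breadth-first crawling, assuming an in-memory link graph stands in for real page fetching. `get_links` and `toy_web` are hypothetical placeholders; a real crawler would also need fetching, parsing, politeness delays, and robots.txt handling.

```python
from collections import deque

def bfs_crawl(seed, get_links, max_pages=100):
    """Visit pages breadth-first from seed; get_links(url) returns a page's outgoing links."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # stands in for fetching and indexing the page
        for link in get_links(url):
            if link not in seen:  # avoid re-queuing pages already discovered
                seen.add(link)
                frontier.append(link)
    return order

# Hypothetical four-page web
toy_web = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}
order = bfs_crawl("a", lambda u: toy_web.get(u, []))
```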
14
Q

What is MapReduce

A
  • Framework that minimizes the programmer's effort for simple parallel programming tasks
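A single-process sketch of the idea, using word counting as the task. The programmer writes only the map and reduce functions; the `run_mapreduce` helper here is a hypothetical stand-in for the framework, which in a real system would distribute the work and handle grouping and failures.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Emit (word, 1) for every word in the document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = run_mapreduce({"d1": "web search", "d2": "web crawler web"}, map_fn, reduce_fn)
```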
15
Q

What is the PageRank algorithm

A
  • Captures page popularity
  • Models a random surfer who follows links and occasionally jumps to a random page; a page's score reflects how often the surfer visits it
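A hedged sketch of PageRank by power iteration on a toy graph. It assumes a damping factor of 0.85 (a commonly used value), treats a page without outlinks as linking to every page, and requires every link target to appear as a key in `links`. The graph and names are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random jump: every page gets an equal share of (1 - damping)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks if outlinks else pages  # dangling page: spread evenly
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical web: "c" is linked to by three pages, "d" by none
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```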

16
Q

What is the HITS algorithm

A
  • Scores pages as authorities and hubs: a good authority is linked to by good hubs, and a good hub links to good authorities
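A small sketch of the HITS iteration on a toy graph, assuming scores are normalized to unit length after every round. The graph, the iteration count, and the function name are illustrative.

```python
import math

def hits(links, iterations=20):
    """links: dict mapping a page to the pages it links to. Returns (hub, authority) scores."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages it links to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit length
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical graph: three hub-like pages pointing at two content pages
toy_web = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"]}
hub, auth = hits(toy_web)
```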