PageRank Flashcards

1
Q

What are some of the challenges of building a Web Search Engine?

A
  • Lack of order and indexing.
  • Many duplicated or similar pages.
  • Huge number of pages.
  • Pages aren’t classified.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Give a brief description of the PageRank algorithm…

A

An algorithm that displays web pages based on their authority.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What 2 principles underpin the PageRank algorithm?

A

Hyperlink Tricks
Random Surfer Model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the time complexity of PageRank?

A

O( MN^2 )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the formula for the PageRank Vector?

A

Pn+1 = H * Pn

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What do we assign P0 on first iteration of PageRank?

A

Vector containing 1/N.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the 2 implementation issues PageRank have?

A

Sink Pages
Cycles

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Define Sink Pages…

A

A page that has no outgoing links.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How can we identify a Sink Page?

A

If a column vector contains all 0’s.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the solution to a Sink Page problem?

A

Replace the column vector 0’s with 1/N.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Define the Cycle problem…

A

Linked pages that are in a cycle, and lead to an infinite increase of authority for the pages.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the solution to the Cycle Problem?

A

Random Surfer Model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the Random Surfer Model?

A

A solution to Cycles. It’s based on the assumption that a surfer has a slight chance of starting a new session as opposed to following a link. This probability is referred to the Dampening Factor.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the Dampening Factor?

A

The probability that a browser stays on the website. Traditionally 0.85.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the formula for the Google Matrix?

A

G = d * A + ( 1 - d ) * B

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Give a step by step overview of the PageRank algorithm…

A
  1. Initialise P0 and H
  2. Eliminate any Sink Pages
  3. Solve the Cycle Pages problem
  4. Iterate until convergence
17
Q

In the Google Matrix equation, what does d, A, B stand for?

A

d = dampening factor.
A = Sink-free transition matrix.
B = Matrix containing 1/N.

18
Q

What is the formula to calculate the PageRank vecotr?

A

Pn+1 = H * Pn