How does Google rank webpages? Flashcards

Question 1

Q

Number of webpages out there

Answer

A

40-60 billion

Question 2

Q

Directional links

Answer

A

which way a link leads to
in-degree
- links point towards webpage
out-degree
- links points away from webpage

Question 3

Q

Networks are made of what elements?

Answer

A

topolgy
- graphs, matrices
functionality
- what you do on the graph

Question 4

Q

Calculate Node Importance Scores

Answer

A

add up importance scores through incoming links
normalize by the spread of importance

Question 5

Q

What is generally important in determining a node’s importance score?

Answer

A

the hyperlink graph of webpages
the out-degree of all nodes pointing to the node
the importance scores of other nodes

The hyperlink graph is key, because that gives the adjacency relationships among the nodes. This gives us parameters such as the out-degrees of all nodes pointing to i. But we must also remember that computing importance is recursive: In the end, we must solve a linear system of equations that will comput the importance scores of all nodes simultaneously.

Question 6

Q

What does Google do? What two scores determine a page’s rank?

Answer

A

crawling the web
storing and indexing the pages
computing two scores to rank pages per search
- relevant scores
  - search terms
- importance scores

Question 7

Q

Matrix Muliplication

Answer

A

vector
- matrix with one column
columns of first matrix must = rows of second matrix
- 1x2 x 2x2
π^TH = Cπ^T
- H = vector
- π₂ = eigen vector of G
- C = constant = eigen value

Question 8

Q

What is the first ROW of the matrix H for the hyperlink graph?

Answer

A

[0 0.5 0.5 0]

Question 9

Q

Dangling Node

Answer

A

node with in-degrees and no out-degrees

Question 10

Q

What are reasons why dangling nodes pose an issue to computing a set of importance scores?

Answer

A

Mathematically speaking, each requires division by 0 across one row of the H matrix.
They make the algorithm less distributed.
In the equation π’H=π’ (with dangling rows in H filled with zeros), they could cause the only solution to be a vector of zeros.
They could cause the equation π’H=π’ not to have an equilibrium.

Since a dangling node has an out-degree of 0, this technically requires division by 0 when constructing H. Also, assuming we fill these dangling rows with 0’s, we showed a case where the only solution is a vector of 0’s. Finally, we stated that a dangling node can prevent H from having a dominant eivenvector corresponding to an eigenvalue of 1, which is a requirement for equilibrium. But they have nothing to do with making the PageRank algorithm more or less distributed.

Question 11

Q

What is true of the Google matrix G?

Answer

A

G has strictly positive entries (as long as θ is not 1).
It guarantees that the PageRank algorithm will converge to a unique equilibrium.
It is composed of a large, sparse matrix and two rank-one matrices.

The randomization ingredient guarantees that G will be strictly positive. Also, G guarantees that the PageRank algorithm will converge to a unique equilibrium. Further, we formed G as the sum of H, a large sparse matrix in reality, and two rank-one matrices. But the rows of G always sum to 1.

Question 12

Q

Consider the following webgraph. Without performing any computations, what is true?

Answer

A

Node 2 will have a higher importance score than Node 1.
Node 3 is a dangling node.
Node 4 is a dangling node.
The Google matrix G will have all positive entries (as long as theta is not 1).

Node 2 must have a higher importance score than Node 1, because Node 1’s importance comes entirely from Node 4, while Node 2’s comes from Node 4 and Node 1. Node 3 is a dangling node by definition because it doesn’t point to any other nodes. And the Google matrix always has positive entries. But Node 4 is not a dangling node.

How does Google rank webpages? Flashcards

(12 cards)