How does Google rank webpages? Flashcards

1
Q

Number of webpages out there

A

40-60 billion

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Directional links

A
  • which way a link leads to
  • in-degree
    • links point towards webpage
  • out-degree
    • links points away from webpage
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Networks are made of what elements?

A
  • topolgy
    • graphs, matrices
  • functionality
    • what you do on the graph
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Calculate Node Importance Scores

A
  • add up importance scores through incoming links
  • normalize by the spread of importance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is generally important in determining a node’s importance score?

A
  • the hyperlink graph of webpages
  • the out-degree of all nodes pointing to the node
  • the importance scores of other nodes

The hyperlink graph is key, because that gives the adjacency relationships among the nodes. This gives us parameters such as the out-degrees of all nodes pointing to i. But we must also remember that computing importance is recursive: In the end, we must solve a linear system of equations that will comput the importance scores of all nodes simultaneously.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does Google do? What two scores determine a page’s rank?

A
  • crawling the web
  • storing and indexing the pages
  • computing two scores to rank pages per search
    • relevant scores
      • search terms
    • importance scores
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Matrix Muliplication

A
  • vector
    • matrix with one column
  • columns of first matrix must = rows of second matrix
    • 1x2 x 2x2
  • πTH = CπT
    • H = vector
    • π2 = eigen vector of G
    • C = constant = eigen value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the first ROW of the matrix H for the hyperlink graph?

A
  • [0 0.5 0.5 0]
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Dangling Node

A
  • node with in-degrees and no out-degrees
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are reasons why dangling nodes pose an issue to computing a set of importance scores?

A
  • Mathematically speaking, each requires division by 0 across one row of the H matrix.
  • They make the algorithm less distributed.
  • In the equation π’H=π’ (with dangling rows in H filled with zeros), they could cause the only solution to be a vector of zeros.
  • They could cause the equation π’H=π’ not to have an equilibrium.

Since a dangling node has an out-degree of 0, this technically requires division by 0 when constructing H. Also, assuming we fill these dangling rows with 0’s, we showed a case where the only solution is a vector of 0’s. Finally, we stated that a dangling node can prevent H from having a dominant eivenvector corresponding to an eigenvalue of 1, which is a requirement for equilibrium. But they have nothing to do with making the PageRank algorithm more or less distributed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is true of the Google matrix G?

A
  • G has strictly positive entries (as long as θ is not 1).
  • It guarantees that the PageRank algorithm will converge to a unique equilibrium.
  • It is composed of a large, sparse matrix and two rank-one matrices.

The randomization ingredient guarantees that G will be strictly positive. Also, G guarantees that the PageRank algorithm will converge to a unique equilibrium. Further, we formed G as the sum of H, a large sparse matrix in reality, and two rank-one matrices. But the rows of G always sum to 1.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Consider the following webgraph. Without performing any computations, what is true?

A
  • Node 2 will have a higher importance score than Node 1.
  • Node 3 is a dangling node.
  • Node 4 is a dangling node.
  • The Google matrix G will have all positive entries (as long as theta is not 1).

Node 2 must have a higher importance score than Node 1, because Node 1’s importance comes entirely from Node 4, while Node 2’s comes from Node 4 and Node 1. Node 3 is a dangling node by definition because it doesn’t point to any other nodes. And the Google matrix always has positive entries. But Node 4 is not a dangling node.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly