MapReduce and PageRank Flashcards
PageRank
The original algorithm (modified since then) that Google used to rank search results. It scores a page by the links pointing to it from other pages: more incoming links, and links from pages that themselves score highly, give a higher score. Pages with higher scores appear earlier in the search results; pages with lower scores appear further down.
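A minimal sketch of the idea, not Google's actual implementation. The function name pagerank, the damping value 0.85, the iteration count, and the toy_web graph are all assumptions made for illustration. Each page repeatedly splits its score among the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    damping=0.85 and iterations=50 are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)  # the page with the most incoming links
```

In this toy graph, c has the most incoming links and ends up with the highest score, and a scores well above b and d because its one incoming link comes from the high-scoring c.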
Chunking
Imagine you have a lot of roads going from one place to another. We want to organize these roads into groups based on where they start. Each group has only a few starting points.
For each starting point, we want to know two things: how much stuff is already there (let’s call it old credit), and how many roads are going out from there.
To make it easy for the person managing this information (let’s call them the Mapper), we make sure each group of roads is small enough to fit in their memory. This grouping process is sometimes called “sharding.”
Now, here’s the thing: the programmer can decide how to group these roads, but if they don’t, the roads are assigned to groups at random.
So, it’s like putting roads into small groups, making sure each group is easy to handle, and if we don’t say how to group them, it happens randomly. The idea is to make things easy to manage and share the work effectively.
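The grouping above can be sketched in code. This is only an illustration under assumptions: the names chunk_edges, sources_per_chunk, assign, and seed are hypothetical, and roads are represented as (start, end) pairs. Roads are grouped by starting point, each chunk holds only a few starting points, and the order is random unless a grouping rule is supplied.

```python
import random
from collections import defaultdict


def chunk_edges(edges, old_credit, sources_per_chunk=2, assign=None, seed=0):
    """Group (start, end) roads into chunks keyed by starting point.

    assign is an optional sort key chosen by the programmer; when it is
    None, starting points are shuffled, so the grouping is random.
    """
    by_source = defaultdict(list)
    for src, dst in edges:
        by_source[src].append(dst)

    sources = list(by_source)
    if assign is None:
        random.Random(seed).shuffle(sources)  # default: random grouping
    else:
        sources.sort(key=assign)  # programmer-defined grouping

    chunks = []
    for i in range(0, len(sources), sources_per_chunk):
        chunk = {}
        for src in sources[i:i + sources_per_chunk]:
            chunk[src] = {
                "old_credit": old_credit[src],      # how much is already there
                "out_degree": len(by_source[src]),  # how many roads leave it
                "targets": by_source[src],
            }
        chunks.append(chunk)
    return chunks


# Hypothetical data: five roads between four places, equal starting credit.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
credit = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
chunks = chunk_edges(edges, credit)
```

Keeping sources_per_chunk small is what bounds the size of each chunk, so a single Mapper can hold its chunk in memory.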
MapReduce
The main components that the programmer defines are:
Map: How to turn one element into a list of (key, value) pairs (often just one).
Reduce: How to combine two values into one value.
The framework turns these into the following tasks:
Mapper: Takes a chunk of elements and maps each element in the chunk into a (key, value) pair list.
Reducer: Takes a group of values (having the same key) and combines them into one value.
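A small in-memory sketch of how the four pieces fit together, assuming a simplified credit-passing step as the example job (no damping, and not a real MapReduce framework). The names map_fn, reduce_fn, mapper, reducer, and run_mapreduce are hypothetical. The programmer writes only map_fn and reduce_fn; the rest stands in for the framework.

```python
from collections import defaultdict
from functools import reduce


def map_fn(element):
    """Map: turn one element into a list of (key, value) pairs."""
    source, old_credit, targets = element
    # Each target receives an equal share of the source's old credit.
    return [(t, old_credit / len(targets)) for t in targets]


def reduce_fn(a, b):
    """Reduce: combine two values into one value."""
    return a + b


def mapper(chunk):
    """Mapper: apply map_fn to every element in one chunk."""
    pairs = []
    for element in chunk:
        pairs.extend(map_fn(element))
    return pairs


def reducer(values):
    """Reducer: fold all values that share a key into one value."""
    return reduce(reduce_fn, values)


def run_mapreduce(chunks):
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in mapper(chunk):
            groups[key].append(value)  # group values by key
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical chunks: (starting point, old credit, list of targets).
chunks = [
    [("a", 0.25, ["b", "c"]), ("b", 0.25, ["c"])],
    [("c", 0.25, ["a"]), ("d", 0.25, ["c"])],
]
new_credit = run_mapreduce(chunks)
```

Because reduce_fn only ever sees two values at a time, the Reducer can fold a group of any size, which is why Reduce is defined on pairs of values.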