Week 11 (IO patterns) Flashcards

Question 1

Q

Why would we not want to load or store data the way Hadoop does out of the box?

Answer

A

To ingest data directly from the original source without storing it in HDFS first

To feed MapReduce output directly to the next process

Question 2

Q

What are the two general ways to modify how data is loaded from disk?

Answer

A

InputFormat: configures how contiguous chunks of input are generated from blocks in HDFS

RecordReader: configures how records appear in the map phase

Question 3

Q

What are the two general ways to modify how data is stored on disk?

Question 4

Q

What are the roles of the InputFormat in Hadoop?

Answer

A

Validate the job's input (make sure the data is there)

Split the input blocks and files into logical chunks (InputSplits), each assigned to a map task

Create the RecordReader that turns a raw InputSplit into key/value pairs
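
The three roles above can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's API: `get_splits`, `read_records`, and `_next_line_start` are hypothetical names, the input is an in-memory byte string rather than HDFS blocks, and the boundary rule (a split skips its first partial line and reads past its end to finish the last one) is an assumption modeled on how line-oriented readers are commonly described.

```python
from typing import Iterator, List, Tuple


def get_splits(data: bytes, split_size: int) -> List[Tuple[int, int]]:
    """Byte-oriented view: cut the input into (start, end) byte ranges."""
    if not data:
        # Role 1: validate the input (make sure the data is there).
        raise ValueError("input is empty")
    # Role 2: logical chunks, one per (hypothetical) map task.
    return [(start, min(start + split_size, len(data)))
            for start in range(0, len(data), split_size)]


def _next_line_start(data: bytes, pos: int) -> int:
    """Return the offset just past the next newline at or after pos."""
    newline = data.find(b"\n", pos)
    return len(data) if newline == -1 else newline + 1


def read_records(data: bytes, start: int, end: int) -> Iterator[Tuple[int, str]]:
    """Role 3: record-oriented view, yielding (byte offset, line) pairs."""
    pos = start
    if start != 0:
        # The first line is finished by the previous split, so skip it here.
        pos = _next_line_start(data, start)
    # Read every line that starts at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        nxt = _next_line_start(data, pos)
        yield pos, data[pos:nxt].rstrip(b"\n").decode("utf-8")
        pos = nxt


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = get_splits(data, split_size=8)
records = [line for s, e in splits for _, line in read_records(data, s, e)]
print(splits)
print(records)
```

Splits are cut purely by byte count, so a line can straddle two of them; the reader is what restores record boundaries so that each line is emitted exactly once.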

Question 5

Q

What type of view of the split does an InputSplit represent?

Answer

A

A byte-oriented view

Question 6

Q

What is partition pruning?

Answer

A

Configuring which files are loaded into MapReduce based on the name of the file
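
A minimal sketch of the idea, assuming a hypothetical naming scheme in which each file name carries a date such as `logs/2014-03-17.log`. The function `prune_partitions` and the paths are made up for illustration; in Hadoop this selection would live in the input format rather than in a standalone helper.

```python
import re
from datetime import date
from typing import Iterable, List

# Hypothetical convention: the partition key (a date) is part of the file name.
_DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def prune_partitions(paths: Iterable[str], first: date, last: date) -> List[str]:
    """Keep only the files whose name encodes a date within [first, last]."""
    kept = []
    for path in paths:
        match = _DATE_IN_NAME.search(path)
        if match is None:
            continue  # no partition info in the name; skipped in this sketch
        year, month, day = map(int, match.groups())
        if first <= date(year, month, day) <= last:
            kept.append(path)
    return kept


paths = [
    "logs/2014-03-16.log",
    "logs/2014-03-17.log",
    "logs/2014-03-18.log",
    "logs/readme.txt",
]
selected = prune_partitions(paths, date(2014, 3, 17), date(2014, 3, 18))
print(selected)
```

The point is that files outside the requested range are never opened, so the job reads less data than a filter applied inside the mapper would.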

Question 7

Q

What is the goal of a recommendation system?

Answer

A

To predict the rating or preference that a user would give to an item

Question 8

Q

What is collaborative filtering?

Answer

A

The process of identifying similar users and recommending what those similar users like

Question 9

Q

In collaborative filtering, when are two users similar?

Answer

A

When their vectors are close according to some distance measure (e.g., Jaccard or cosine distance)
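
Both measures can be sketched as follows. The representation is an assumption: a user is a set of item names for Jaccard distance and a dict of item ratings for cosine distance, and "cosine distance" is taken here to mean 1 minus cosine similarity (some courses define it as the angle instead).

```python
import math
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A and B| / |A or B|, for users represented as sets of items."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical in this sketch
    return 1.0 - len(a & b) / len(union)


def cosine_distance(u: Dict[str, float], v: Dict[str, float]) -> float:
    """1 - cosine similarity, for users represented as sparse rating vectors."""
    dot = sum(u[item] * v[item] for item in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no ratings at all: treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


print(jaccard_distance({"a", "b"}, {"b", "c"}))
print(cosine_distance({"a": 5.0, "b": 1.0}, {"a": 4.0, "c": 2.0}))
```

Only items both users have in common contribute to the dot product, which is why sparse dicts are enough and a full vector over the catalog is not needed.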

Question 10

Q

What is the big-O cost of collaborative filtering in the worst case, and what does it end up being in practice?

M = number of customers

N = number of product/catalog items

Answer

A

O(MN) in the worst case

In practice it ends up closer to O(M + N), since most customers' vectors are very sparse

Question 11

Q

What does item-to-item collaborative filtering do?

Answer

A

Matches each of the user's purchased items to similar items

Combines those similar items into a recommendation list
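
The two steps above can be sketched as follows. This is a toy illustration under stated assumptions, not a production algorithm: purchases are binary (bought or not), item similarity is cosine similarity over the sets of buyers, candidate items are scored by summing similarities, and the names `item_similarities` and `recommend` as well as the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_similarities(purchases: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between items, based on which users bought both."""
    buyers: Dict[str, Set[str]] = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            common = len(buyers[a] & buyers[b])
            if common:
                sims[a][b] = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(user: str, purchases: Dict[str, Set[str]],
              sims: Dict[str, Dict[str, float]], top_n: int = 3) -> List[str]:
    """Match each purchased item to similar items, then merge into one list."""
    owned = purchases[user]
    scores: Dict[str, float] = defaultdict(float)
    for item in owned:
        for other, sim in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "cat": {"book", "pen"},
}
sims = item_similarities(purchases)
print(recommend("ann", purchases, sims))
```

The similarity table depends only on the catalog and purchase history, so it can be built ahead of time; producing a recommendation then only touches the items the user already owns.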

Question 12

Q

(12 cards)