11 - Scaling Up: Big Data Flashcards

1
Q

Parallel processing

A
  • parallelism: work on separate pieces of a problem at the same time
  • – challenges = coordination, mutability (shared data changing), blocking
  • distributed computing: the same idea, but the pieces run on different CPUs or machines
  • – challenges = sending instructions, fault tolerance, data storage and retrieval
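A minimal sketch of the parallelism idea, assuming the work splits into independent chunks (the function names `split`, `square_sum` and `parallel_square_sum` are hypothetical). It uses Python's standard `concurrent.futures`. Threads keep the example portable, but because of CPython's GIL, CPU-bound work like this only runs truly in parallel if you swap in `ProcessPoolExecutor`, which has the same `map` interface. A distributed version would additionally have to ship the chunks to other machines, which is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_pieces):
    """Split data into at most n_pieces non-overlapping chunks."""
    size = max(1, -(-len(data) // n_pieces))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def square_sum(chunk):
    """Work on one independent piece: no shared, mutable state."""
    return sum(x * x for x in chunk)


def parallel_square_sum(data, n_workers=4):
    chunks = split(data, n_workers)
    # Each chunk is handed to a separate worker at the same time.
    # Chunks are never modified, so no locking (blocking) is needed.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(square_sum, chunks))
    # Coordination step: combine the partial results into one answer.
    return sum(partials)


print(parallel_square_sum(list(range(1000))))  # 332833500
```

The only step that needs coordination is the final `sum(partials)`; keeping the chunks read-only is what avoids the mutability and blocking challenges listed above.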
2
Q

Programming: imperative, declarative

A
  • imperative = give direct, step-by-step orders; scheduling and data control are manual, so performance can be hand-optimized (C, C++, Java, MATLAB)
  • declarative = state the goal; data is managed and stored automatically and scheduling is automatic, but not necessarily efficient (SQL; R and Python can be used this way)
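A small illustration of the contrast, using Python's built-in `sqlite3` for the declarative side. The sales data and function names are made up for the example. The imperative version spells out the loop and the bookkeeping; the SQL version only states the desired result and lets the database engine choose how to compute it.

```python
import sqlite3

# Hypothetical example data: (region, amount)
sales = [("north", 120), ("south", 80), ("north", 50), ("east", 200)]


def total_by_region_imperative(rows):
    # Imperative: we control every step and manage the data structure ourselves.
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return dict(sorted(totals.items()))


def total_by_region_declarative(rows):
    # Declarative: we say *what* we want; SQLite decides *how*
    # (scan order, grouping strategy, memory use).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    con.close()
    return dict(result)


print(total_by_region_imperative(sales))   # {'east': 200, 'north': 170, 'south': 80}
print(total_by_region_declarative(sales))  # {'east': 200, 'north': 170, 'south': 80}
```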
3
Q

Queue computing

A
  • master (or name) node(s) = main address of the cluster, where jobs are submitted and coordinated
  • worker node = where the computation is performed
  • scheduler = decides which job runs next and on which resources
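A toy model of the roles above, assuming jobs request only CPUs and the scheduler uses a simple first-fit rule. `Job`, `Worker` and `schedule` are hypothetical names, not the API of any real queueing system (real schedulers also weigh priorities, memory, time limits and fairness). The queue stands in for what the master node holds; each `Worker` is a worker node.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int  # resources the job asks for


@dataclass
class Worker:
    name: str
    free_cpus: int
    running: list = field(default_factory=list)


def schedule(queue, workers):
    """First-fit scheduler: take jobs from the queue in order and place each
    one on the first worker with enough free CPUs. Jobs that fit nowhere are
    put back in the queue to wait for resources."""
    placements = {}
    waiting = deque()
    while queue:
        job = queue.popleft()
        target = next((w for w in workers if w.free_cpus >= job.cpus), None)
        if target is None:
            waiting.append(job)  # no worker has room yet: keep it queued
            continue
        target.free_cpus -= job.cpus
        target.running.append(job.name)
        placements[job.name] = target.name
    queue.extend(waiting)
    return placements


queue = deque([Job("a", 2), Job("b", 4), Job("c", 1), Job("d", 8)])
workers = [Worker("w1", 4), Worker("w2", 4)]
print(schedule(queue, workers))     # {'a': 'w1', 'b': 'w2', 'c': 'w1'}
print([job.name for job in queue])  # ['d']
```

Job `d` asks for more CPUs than any single worker has, so it stays in the queue.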
4
Q

Databases

A
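  • SQL = Structured Query Language (relational databases: data in tables with a fixed schema)
  • NoSQL = beyond SQL ("not only SQL"): non-relational databases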
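A side-by-side sketch of the two styles. The SQL part uses Python's built-in `sqlite3`; the NoSQL part uses a plain `dict` of JSON strings as a stand-in for a key-value/document store (an assumption for illustration, not any real NoSQL product's API). The point is the data model: SQL rows share a schema, while documents under different keys can have different fields.

```python
import json
import sqlite3

# SQL: a fixed schema; every row has the same columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.execute("INSERT INTO users VALUES (1, 'Ada', 'London')")
row = con.execute("SELECT name, city FROM users WHERE id = 1").fetchone()
con.close()

# NoSQL stand-in (hypothetical): each key maps to a self-describing
# JSON document, and documents need not share a schema.
store = {}
store["user:1"] = json.dumps({"name": "Ada", "city": "London"})
store["user:2"] = json.dumps({"name": "Alan", "languages": ["en", "de"]})
doc = json.loads(store["user:2"])

print(row)               # ('Ada', 'London')
print(doc["languages"])  # ['en', 'de']
```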

5
Q

Big data

A
  • PageRank: ranks websites by quality/importance, based on the links pointing to them (Google)
  • MapReduce
  • – map = apply a function to every element of a list
  • – reduce = aggregate the elements and summarize them
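Both ideas in a single-process sketch. `word_count` simulates the map, shuffle (group-by-key) and reduce phases of MapReduce; a real framework such as Hadoop runs the map and reduce tasks on many machines and performs the shuffle over the network. `pagerank` is the standard power-iteration formulation; the damping factor 0.85 is a conventional choice assumed here, and spreading a dangling page's rank over all pages is one common convention among several. All function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


# ---- MapReduce word count (simulated in one process) ----
def map_phase(document):
    # map: apply a function to every element (each word) -> (key, value)
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # group the values by key (the framework does this between map and reduce)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # reduce: aggregate each key's values into one summary value
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# ---- PageRank by power iteration ----
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank in equal shares along its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page (no outgoing links): spread its rank evenly
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


print(word_count(["the cat sat", "The dog"]))  # {'the': 2, 'cat': 1, 'sat': 1, 'dog': 1}
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```

Page `c` is linked to by both other pages, so it ends up with the highest rank; the ranks always sum to 1.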
6
Q

Further big data

A
  • distributed computing: Hadoop, Spark, Dask, DAGs (directed acyclic graphs of tasks)
  • cloud computing: Spark
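Frameworks like Dask and Spark represent a computation as a DAG of tasks and run every task whose inputs are ready. The sketch below illustrates that idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+); it is not the API of Dask or Spark, and the pipeline's task names are hypothetical. Tasks grouped into the same "wave" have no dependencies on each other, so a scheduler could run them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
graph = {
    "load_a": set(),
    "load_b": set(),
    "clean_a": {"load_a"},
    "clean_b": {"load_b"},
    "join": {"clean_a", "clean_b"},
    "report": {"join"},
}


def execution_waves(dependencies):
    """Group tasks into waves: every task in a wave has all of its inputs
    finished, so tasks within one wave could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend these tasks finished
    return waves


for i, wave in enumerate(execution_waves(graph)):
    print(i, wave)
# 0 ['load_a', 'load_b']
# 1 ['clean_a', 'clean_b']
# 2 ['join']
# 3 ['report']
```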
