Big Data Lecture 01 Introduction Flashcards

1
Q

What is the main learning objective of the course?

A

Learn to query gigantic amounts of data even when it is a bit messy.

2
Q

How is Data Science similar to Physics?

A

It is an epistemic science of artificial data: Data Science relates to Computer Science the way Physics relates to Mathematics.

3
Q

What was the first way humans transmitted data, what was its problem, and how was it solved?

A

People would speak or sing, but the message would get distorted over time. This was solved by writing.

4
Q

What was the first data storage format? What was its problem and how was it solved?

A

Tables on clay tablets; tables are the most natural form of storing data.

Copying was problematic; this was solved by the printing press.

5
Q

How has data been stored in computers over time?

A
  • 1960s: file-based systems
  • 1970s: relational databases
  • 2000s: NoSQL era
6
Q

What are the three Vs of Big Data?

A
  • Volume
  • Variety
  • Velocity
  • (Veracity)
7
Q

Why do we store more data?

A

  • We can: storage is cheap.
  • It carries value.
  • Combined data is worth more than the sum of its parts.
  • We need data totality: some sites only operate well if they have all the data.

8
Q

Name prefix for unit: 1 000 (3 zeros)

A

kilo (k)

9
Q

Name prefix for unit: 1 000 000 (6 zeros)

A

Mega (M)

10
Q

Name prefix for unit: 1 000 000 000 (9 zeros)

A

Giga (G)

11
Q

Name prefix for unit: 1 000 000 000 000 (12 zeros)

A

Tera (T)

12
Q

Name prefix for unit: 1 000 000 000 000 000 (15 zeros)

A

Peta (P)

13
Q

Name prefix for unit: 1 000 000 000 000 000 000 (18 zeros)

A

Exa (E)

14
Q

Name prefix for unit: 1 000 000 000 000 000 000 000 (21 zeros)

A

Zetta (Z)

15
Q

Name prefix for unit: 1 000 000 000 000 000 000 000 000 (24 zeros)

A

Yotta (Y)

16
Q

Name prefix for unit: 1 000 000 000 000 000 000 000 000 000 (27 zeros)

A

Ronna (R)

17
Q

Name prefix for unit: 1 000 000 000 000 000 000 000 000 000 000 (30 zeros)

A

Quetta (Q)
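
The prefix cards above can be checked with a small lookup table. This is only an illustrative sketch: the names `SI_PREFIXES` and `format_bytes` are made up for this example, and bytes are assumed as the unit.

```python
# SI prefixes from the cards above, mapped to their power of ten,
# ordered from largest to smallest so the first match is the best fit.
SI_PREFIXES = [
    ("Q", 30), ("R", 27), ("Y", 24), ("Z", 21), ("E", 18),
    ("P", 15), ("T", 12), ("G", 9), ("M", 6), ("k", 3),
]

def format_bytes(n: int) -> str:
    """Render a byte count using the largest SI prefix that fits."""
    for symbol, exponent in SI_PREFIXES:
        if n >= 10 ** exponent:
            return f"{n / 10 ** exponent:.1f} {symbol}B"
    return f"{n} B"

print(format_bytes(1_500))         # 1.5 kB
print(format_bytes(23 * 10**15))   # 23.0 PB
```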

18
Q

What are examples of different data shapes?

A

  • Tables
  • Trees
  • Graphs
  • Cubes
  • Text (unstructured)

19
Q

What is capacity?

A

How much data we can store.

20
Q

What is throughput?

A

How fast we can transmit data, that is, how much data we can move per unit of time.

21
Q

What is latency?

A

How long we have to wait until we start receiving data.
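
One simple way to see how throughput and latency interact is a first-order model in which the total time is the latency plus the size divided by the throughput. The sketch below assumes that model; the function name `transfer_time` and the example numbers (100 MB/s, 10 ms) are hypothetical and not taken from the lecture.

```python
def transfer_time(size_bytes: float, throughput_bytes_per_s: float, latency_s: float) -> float:
    """First-order model: wait for the first byte, then stream the rest."""
    return latency_s + size_bytes / throughput_bytes_per_s

# Hypothetical device: 100 MB/s throughput, 10 ms latency.
THROUGHPUT = 100e6
LATENCY = 0.010

small = transfer_time(1e3, THROUGHPUT, LATENCY)  # 1 kB: dominated by latency
large = transfer_time(1e9, THROUGHPUT, LATENCY)  # 1 GB: dominated by throughput

print(f"1 kB: {small:.5f} s")
print(f"1 GB: {large:.2f} s")
```

Under this model, small reads are limited by latency and large reads by throughput.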

22
Q

What progress has been made in capacity, throughput and latency in the last 70 years? What does this mean?

A

  • Capacity: 23 000 000 000x
  • Throughput: 20 800x
  • Latency: 144x

Capacity has grown far faster than throughput, and throughput far faster than latency has improved. This growing discrepancy is a big problem: we now need to parallelize.
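
The size of the discrepancy can be worked out from the three factors on this card. The snippet below is only a back-of-the-envelope sketch; the variable names are made up, and it assumes that the time to read a full device scales as capacity divided by throughput.

```python
# Improvement factors over the last 70 years, as given on the card.
capacity_growth = 23_000_000_000
throughput_growth = 20_800
latency_growth = 144

# Assumption: time to read a full device ~ capacity / throughput,
# so this ratio is how much longer a full read takes today.
full_read_slowdown = capacity_growth / throughput_growth

# How much faster throughput has improved than latency.
throughput_vs_latency = throughput_growth / latency_growth

print(f"{full_read_slowdown:,.0f}")     # 1,105,769
print(f"{throughput_vs_latency:.1f}")   # 144.4
```

Under that assumption, reading everything a single device can store takes roughly a million times longer than it used to, which is why the data has to be spread out and read in parallel.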

23
Q

What is Big Data?

A

A portfolio of technologies designed to store, manage and analyze data that is too large to fit on a single machine, while accommodating the growing discrepancy between capacity, throughput and latency.