Big Data Flashcards

Big data terms

1
Q

6 V’s of big data

A
Volume
Variety
Velocity
Veracity
Value
Vulnerability
2
Q

Veracity

A

The data must be authentic, credible, and available.

3
Q

Variety

A

Data is no longer only structured, so we can't assume that everything fits in a traditional database. We must be prepared to add new data sources in all kinds of formats, ranging from plain text to multimedia content.

4
Q

Volume

A

The amount of data collected grows enormously every minute, so our storage and processing tools must scale to that volume by using distributed solutions (multiple machines instead of one very, VERY expensive supercomputer or mainframe).
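The idea of spreading work over several machines can be shown in miniature with the map/reduce pattern. This JavaScript sketch runs in a single process and assumes each chunk stands in for a separate machine; a real cluster would use a framework such as Hadoop or Spark. Each simulated worker counts words in its own chunk, and the partial counts are then merged.

```javascript
// Split the input records into chunks, one per (simulated) machine.
function partition(records, numWorkers) {
  const chunks = Array.from({ length: numWorkers }, () => []);
  records.forEach((record, i) => chunks[i % numWorkers].push(record));
  return chunks;
}

// "Map" step: each worker counts the words in its own chunk independently.
function countWords(chunk) {
  const counts = new Map();
  for (const line of chunk) {
    for (const word of line.toLowerCase().split(/\s+/).filter(Boolean)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// "Reduce" step: merge the partial counts from all workers into one total.
function mergeCounts(partials) {
  const total = new Map();
  for (const partial of partials) {
    for (const [word, n] of partial) {
      total.set(word, (total.get(word) || 0) + n);
    }
  }
  return total;
}

// Made-up sample records for illustration.
const records = ["big data is big", "data grows fast", "big machines cost a lot"];
const result = mergeCounts(partition(records, 2).map(countWords));
console.log(result.get("big")); // 3
```

Because the workers never share state until the merge, adding machines lets each one handle a smaller slice of the data.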

5
Q

Velocity

A

The urgency with which data must be processed, driven by how frequently it is generated or acquired and by the need to use it in decision-making as quickly as possible, even in (near) real time.
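Processing data as it arrives, instead of in a later batch, can be illustrated with a moving average that is updated on every new reading. The readings and window size below are made-up values, and `SlidingAverage` is a hypothetical helper written for this sketch.

```javascript
// Keeps the last `size` values and updates a running sum on every event,
// so each new reading can be acted on immediately instead of in a later batch.
class SlidingAverage {
  constructor(size) {
    this.size = size;
    this.window = [];
    this.sum = 0;
  }

  push(value) {
    this.window.push(value);
    this.sum += value;
    // Drop the oldest value once the window is full.
    if (this.window.length > this.size) {
      this.sum -= this.window.shift();
    }
    return this.sum / this.window.length;
  }
}

const avg = new SlidingAverage(3);
const averages = [10, 20, 30, 40].map((reading) => avg.push(reading));
console.log(averages); // [10, 15, 20, 30]
```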

6
Q

Value

A

The data must have value for the business or for society.

7
Q

Vulnerability

A

The data must comply with the law, respect privacy, and be stored and accessed securely.

8
Q

2 types of classic machine learning

A

1) supervised

2) unsupervised

9
Q

Supervised ML

A

When the training data is “labeled”.
This means that, for each sample, we have the values corresponding to the observed variables (the inputs) and to the variable we want to learn to predict or classify (the output, target, or dependent variable).
Within this type we find
A) regression algorithms (those predicting a numerical value) and
B) classification algorithms (when the output is limited to certain categorical values)
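A minimal JavaScript sketch of both kinds of supervised learning on labeled toy data (the numbers and labels are invented for illustration): ordinary least squares for regression, and a 1-nearest-neighbor rule for classification. Neither is presented as the only way to do either task.

```javascript
// Regression: fit y ≈ intercept + slope * x by ordinary least squares.
function fitLinear(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const slope = num / den;
  const intercept = meanY - slope * meanX;
  return (x) => intercept + slope * x;
}

// Classification: return the label of the closest training sample (1-nearest neighbor).
function fitNearestNeighbor(xs, labels) {
  return (x) => {
    let best = 0;
    for (let i = 1; i < xs.length; i++) {
      if (Math.abs(xs[i] - x) < Math.abs(xs[best] - x)) best = i;
    }
    return labels[best];
  };
}

// Toy labeled data: the targets follow y = 1 + 2x.
const predictValue = fitLinear([1, 2, 3, 4], [3, 5, 7, 9]);
console.log(predictValue(5)); // 11

const predictSize = fitNearestNeighbor([1, 2, 8, 9], ["small", "small", "large", "large"]);
console.log(predictSize(7)); // large
```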

10
Q

Unsupervised ML

A

When the training data is not labeled (we don’t have a target variable).
The goal here is to find some kind of structure or pattern, for example by grouping (clustering) the training samples, so that future samples can be assigned to those groups.
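Grouping unlabeled samples is often done with k-means clustering. The JavaScript sketch below is a minimal one-dimensional version that assumes the first k points as starting centroids (a simplification; real implementations usually pick them randomly or with k-means++). The data points are made up.

```javascript
// Minimal 1-D k-means: groups unlabeled values into k clusters.
function kMeans(points, k, iterations = 10) {
  let centroids = points.slice(0, k); // simplified initialization
  let assignments = [];
  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: each point joins its nearest centroid.
    assignments = points.map((p) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
      }
      return best;
    });
    // Update step: move each centroid to the mean of the points assigned to it.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignments[i] === c);
      return members.length ? members.reduce((s, p) => s + p, 0) / members.length : old;
    });
  }
  return { centroids, assignments };
}

const clusters = kMeans([1, 2, 10, 11, 1.5, 10.5], 2);
console.log(clusters.assignments); // [0, 0, 1, 1, 0, 1]
console.log(clusters.centroids);   // [1.5, 10.5]
```

A future sample can then be assigned to whichever centroid is closest to it.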

11
Q

Machine Learning (modern, sophisticated)

A

Ensemble methods
Reinforcement learning
Deep learning

12
Q

Ensemble methods

A

The combined use of several models, whose predictions are merged to obtain better results than any single model.
The most common example is Random Forests, although XGBoost has become very famous because of its many wins in Kaggle competitions.
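The core combining step, majority voting, can be sketched in JavaScript as below. The three rules and the `income`, `debt`, and `yearsEmployed` fields are hypothetical; in a Random Forest the voters would be decision trees trained on random subsets of the data, which this sketch does not attempt.

```javascript
// Three deliberately simple "models", each looking at a different feature.
// (Hypothetical hand-written rules standing in for trained models.)
const models = [
  (s) => (s.income > 50 ? "approve" : "reject"),
  (s) => (s.debt < 20 ? "approve" : "reject"),
  (s) => (s.yearsEmployed >= 2 ? "approve" : "reject"),
];

// Combine the models' predictions by majority vote.
function ensemblePredict(sample) {
  const votes = new Map();
  for (const model of models) {
    const label = model(sample);
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  let best = null;
  for (const [label, count] of votes) {
    if (best === null || count > votes.get(best)) best = label;
  }
  return best;
}

console.log(ensemblePredict({ income: 60, debt: 30, yearsEmployed: 5 })); // approve (2 of 3 votes)
```

A single model's mistake (here the debt rule) is outvoted by the others, which is the intuition behind ensembles.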

13
Q

Reinforcement learning

A

The machine learns by trial and error, thanks to the feedback (rewards) it gets in response to its interactions with the surrounding environment.
You may have heard about AlphaGo (world’s best Go player) or AlphaStar (capable of crushing us in StarCraft II).
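Trial and error driven by feedback can be illustrated with a multi-armed bandit and an epsilon-greedy agent, a much simpler setting than the methods behind AlphaGo or AlphaStar. In this JavaScript sketch the payout probabilities, the exploration rate, and the seeded generator (mulberry32) are all assumptions chosen for reproducibility.

```javascript
// Seeded pseudo-random number generator (mulberry32) so every run is reproducible.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical environment: three "slot machines" that pay a reward of 1
// with these probabilities. The agent does not know these numbers.
const payoutProbabilities = [0.1, 0.3, 0.9];

function runEpsilonGreedy(steps, epsilon, random) {
  const estimates = payoutProbabilities.map(() => 0); // learned average reward per action
  const counts = payoutProbabilities.map(() => 0);    // how often each action was tried
  for (let t = 0; t < steps; t++) {
    // Trial: explore a random action with probability epsilon, otherwise exploit.
    const action =
      random() < epsilon
        ? Math.floor(random() * estimates.length)
        : estimates.indexOf(Math.max(...estimates));
    // Feedback from the environment: reward 1 or 0.
    const reward = random() < payoutProbabilities[action] ? 1 : 0;
    // Learn from the outcome: move the estimate toward the reward (incremental mean).
    counts[action] += 1;
    estimates[action] += (reward - estimates[action]) / counts[action];
  }
  return estimates;
}

const learned = runEpsilonGreedy(5000, 0.1, mulberry32(42));
// The agent should come to prefer action 2, the machine that pays most often.
console.log(learned.indexOf(Math.max(...learned)));
```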

14
Q

Deep learning

A

It’s based on the use of artificial neural networks. An artificial neural network is a computational model with a layered structure, formed by interconnected nodes (neurons) that work together.
Using graphics processing units (GPUs) has made deep learning much faster and cheaper.
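The layered structure can be seen in a forward pass through a tiny network. In this JavaScript sketch the weights are hand-picked (not learned) so that a 2-2-1 network with ReLU activations computes XOR; training by backpropagation is left out.

```javascript
const relu = (z) => Math.max(0, z);
const identity = (z) => z;

// One fully connected layer: every node combines all inputs with its own weights and bias.
function denseLayer(inputs, weights, biases, activation) {
  return weights.map((row, j) =>
    activation(row.reduce((sum, w, i) => sum + w * inputs[i], biases[j]))
  );
}

// Hand-picked weights for illustration: hidden layer (2 nodes), output layer (1 node).
const hidden = { weights: [[1, 1], [1, 1]], biases: [0, -1] };
const output = { weights: [[1, -2]], biases: [0] };

function predict(x) {
  const h = denseLayer(x, hidden.weights, hidden.biases, relu);
  return denseLayer(h, output.weights, output.biases, identity)[0];
}

for (const x of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(x, predict(x)); // outputs 0, 1, 1, 0 in turn
}
```

Each layer's outputs become the next layer's inputs; deep networks stack many such layers, which is why GPUs, built for large parallel multiplications, speed them up.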

15
Q

Programming basics

A
Data types
Strings
Arrays
Loops
Conditions
Variables
Functions
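All seven basics fit in one short JavaScript snippet. The customer name, prices, and the flat member discount are made-up values for illustration.

```javascript
// Variables holding different data types
const customer = "Ada";     // string
let total = 0;              // number
const isMember = true;      // boolean
const prices = [4, 10, 6];  // array

// Function with a condition inside
function applyDiscount(amount, member) {
  if (member) {
    return amount - 5; // hypothetical flat discount for members
  }
  return amount;
}

// Loop over the array
for (const price of prices) {
  total += price;
}

// String built from the other values
const message = `${customer} pays ${applyDiscount(total, isMember)}`;
console.log(message); // Ada pays 15
```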
16
Q

Interpreted vs compiled vs bytecode

A

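Interpreted: the source code itself is distributed, and an interpreter runs it
Compiled: only machine code is distributed (not cross-platform; it must be compiled for each platform)
Bytecode: an intermediate form, compiled ahead of time and then run by a virtual machine

Compiled: C, C++, Objective-C
Interpreted: PHP, JavaScript
Hybrid (bytecode): Java, C#, VB.NET, Python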

17
Q

JavaScript

A

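Meant for manipulating web pages, and run by the interpreter built into the web browser.
Unlike Objective-C, C++, or Java, which run outside the browser (Objective-C and C++ directly on the operating system, Java on the Java Virtual Machine).

VBA (Visual Basic for Applications) runs inside Microsoft Office
ActionScript is interpreted by Flash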

18
Q

Weakly typed language
Vs
Strongly typed language

A

Weakly typed language = variable types do not need to be declared
Vs
Strongly typed language = the variable type (e.g. integer, float, string) must be declared
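JavaScript shows both sides of this card: types are never declared (strictly, this is called dynamic typing) and values are converted between types implicitly (weak typing). In a language such as Java, assigning a string to an `int` variable is a compile-time error instead. A short JavaScript sketch:

```javascript
// No type is declared: the same variable can hold a number, then a string.
let value = 42;
console.log(typeof value); // number
value = "forty-two";
console.log(typeof value); // string

// Implicit conversions (coercion) happen without any error:
console.log("5" + 1);   // 51   (the number is converted to a string, then concatenated)
console.log("5" * 2);   // 10   (the string is converted to a number)
console.log(1 == "1");  // true (loose equality converts types before comparing)
console.log(1 === "1"); // false (strict equality does not convert)
```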

19
Q

Escape the quotes
Vs
Comment out

A

Use a backslash to escape double quotes inside a double-quoted string:
"He said, \"that's fine,\" and left."

Use \n to insert a line break (newline) in a string.

Use forward slashes to write comments:
// Single line comment
/* Multiple line
comment */
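A small JavaScript sketch putting escapes and comments side by side; the strings are invented examples.

```javascript
// Backslash escapes let special characters appear inside a string literal.
const quote = "He said, \"that's fine,\" and left.";
const twoLines = "First line\nSecond line"; // \n inserts a newline
const path = "C:\\temp";                    // \\ produces a single backslash

console.log(quote);    // He said, "that's fine," and left.
console.log(twoLines); // printed on two separate lines
console.log(path);     // C:\temp

/* Comments are ignored when the code runs;
   this block comment spans several lines. */
```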

20
Q

Operator to add a value to a variable without creating a new variable

A

Compound assignment operators, e.g. the addition assignment operator +=

score = score + 10

becomes

score += 10

Others: += -= *= /=

If the value is 1, just use the increment (++) or decrement (--) operator
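A short JavaScript sketch walking one variable through each operator, with the running value noted after every line.

```javascript
let score = 0;
score = score + 10; // long form: 10
score += 10;        // addition assignment: 20
score -= 5;         // subtraction assignment: 15
score *= 2;         // multiplication assignment: 30
score /= 3;         // division assignment: 10
score++;            // increment by 1: 11
score--;            // decrement by 1: 10
console.log(score); // 10
```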