Big Data Flashcards
Big data terms
6 V’s of big data
Volume, Variety, Velocity, Veracity, Value, Vulnerability
Veracity
the data must be
authentic,
credible, and
available
Variety
The data is no longer (only) structured, so we have to forget the idea that everything fits in a traditional database. We must be prepared to add new data sources with all kinds of formats, ranging from plain text to multimedia content
Volume
The amount of data collected grows absurdly every minute, and we need to adapt our storage and processing tools to that volume, using distributed solutions (multiple machines, instead of one very, VERY expensive supercomputer / mainframe)
Velocity
The urgency with which the data must be processed is linked to the frequency of its generation / acquisition, and to the need to use it in decision making as quickly as possible, even in real time (or almost).
Value
the data must have value for the business or for society
Vulnerability
the data must comply with legality, respect privacy, and be stored and accessed in a safe way
2 types of classic machine learning
1) supervised
2) unsupervised
Supervised ML
when the training data is “labeled”.
This means that, for each sample, we have the values of the observed variables (the inputs) and of the variable we want to learn to predict or classify (the output, target, or dependent variable).
Within this type we find
A) the regression algorithms (those predicting a numerical value) and
B) the classification algorithms (when the output is limited to certain categorical values)
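A minimal sketch of the regression side: fitting a straight line to labeled samples by ordinary least squares. The data and function name are hypothetical toy choices, not a standard library API.

```python
# Supervised learning sketch: each sample pairs an input x with a
# labeled output y, and we learn to predict y for new x values.

def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Toy labeled data, roughly y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 9.0, 11.1]
a, b = fit_line(xs, ys)
print(round(a, 1), round(b, 1))  # prints "2.0 1.0"
```

A classification algorithm would work the same way, except the learned output would be a category (e.g. "spam" / "not spam") instead of a number.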
Unsupervised ML
when the training data is not labeled (we don’t have a target variable).
The goal here is to find some kind of structure or pattern, for example to group the training samples, so we’ll be able to classify future samples.
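A minimal sketch of that idea: k-means clustering with k=2 on unlabeled 1-D points. The data and starting centroids are hypothetical toy choices.

```python
# Unsupervised learning sketch: no labels, just points; k-means finds
# group structure by alternating assignment and update steps.

def kmeans_1d(points, centroids, steps=10):
    for _ in range(steps):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[], []]
        for p in points:
            i = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids, clusters = kmeans_1d(points, [0.0, 10.0])
print(centroids)  # roughly [1.0, 8.07]
```

Once the centroids are learned, a future unlabeled sample can be classified by whichever centroid it lands closest to.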
Machine Learning (modern, sophisticated)
Ensemble methods
Reinforcement learning
Deep learning
Ensemble methods
Basically, it's the joint use of several algorithms, combining their outputs to obtain better results than any of them alone.
The most common example is Random Forests, although XGBoost has become very famous because of its victories in Kaggle
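A minimal sketch of the core idea behind ensembles: combine several weak classifiers by majority vote. The three rules below are hypothetical toys, not actual Random Forests or XGBoost.

```python
# Ensemble sketch: each weak classifier votes, and the ensemble's
# answer is whichever class gets the most votes.

def majority_vote(classifiers, x):
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# Three weak rules for classifying a number as "big" or "small".
classifiers = [
    lambda x: "big" if x > 4 else "small",
    lambda x: "big" if x > 6 else "small",
    lambda x: "big" if x > 5 else "small",
]

print(majority_vote(classifiers, 5.5))  # two of three vote "big"
```

Random Forests apply this same voting idea to many decision trees, each trained on a random slice of the data.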
Reinforcement learning
the machine learns by trial and error, thanks to the feedback it gets from its interactions with the surrounding environment.
You may have heard of AlphaGo (which beat the world's best Go players) or AlphaStar (capable of crushing us at StarCraft II)
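A minimal sketch of the trial-and-error loop: tabular Q-learning on a hypothetical toy corridor of 4 states, where only reaching the last state gives a reward (nothing like AlphaGo's scale, but the same feedback principle).

```python
# Reinforcement learning sketch: the agent acts, observes a reward,
# and updates its value estimates (the Q-table) from that feedback.
import random

random.seed(0)
N_STATES, ACTIONS = 4, [0, 1]          # 0 = step left, 1 = step right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for _ in range(200):                   # episodes of trial and error
    s = 0
    while s != N_STATES - 1:
        # Mostly exploit the best known action, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Feedback from the environment updates the value estimate.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
print(policy)  # the agent learns to always go right: [1, 1, 1]
```

No labels were ever provided; the agent discovered the "always go right" policy purely from the rewards its actions produced.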
Deep learning
It’s based on the use of artificial neural networks. An artificial neural network is a computational model, with a layered structure, formed by interconnected nodes that work together.
Using Graphics Processing Units (GPUs) has made deep learning both faster and cheaper
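A minimal sketch of that layered structure: a forward pass through one hidden layer with hand-picked weights (hypothetical values; a real network learns its weights from data, and GPUs accelerate exactly these multiply-and-sum operations).

```python
# Deep learning sketch: interconnected nodes arranged in layers, each
# node combining all its inputs and applying an activation function.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Each node weights every input, sums, then applies the activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 0.0]                                  # input layer
hidden = layer(x, [[2.0, 2.0], [-2.0, -2.0]], [-1.0, 1.0])
output = layer(hidden, [[2.0, -2.0]], [0.0])    # output layer
print(output[0])  # a value between 0 and 1
```

"Deep" networks simply stack many such layers, so the output of one layer becomes the input of the next.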
Programming basics
Data types, strings, arrays, loops, conditions, variables, functions
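A minimal sketch touching each of those basics in one hypothetical toy task: counting even numbers in a list.

```python
# Programming basics sketch: function, variable, loop, condition,
# array (a list in Python), and string, all in a few lines.

def count_evens(numbers):                 # a function
    count = 0                             # a variable (int data type)
    for n in numbers:                     # a loop over the array
        if n % 2 == 0:                    # a condition
            count += 1
    return count

values = [1, 2, 3, 4, 5, 6]               # an array (Python list)
message = f"{count_evens(values)} even numbers"   # a string
print(message)  # prints "3 even numbers"
```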