Technology and Tools Flashcards

1
Q

What is Hadoop?

A

An open-source distributed computing framework

2
Q

What is Hadoop written in?

A

Java

3
Q

What are the 4 main components of Hadoop?

A

MapReduce
YARN
HDFS
Hadoop Common

4
Q

What block size does the Hadoop Distributed File System (HDFS) use?

A

128 MB blocks (the default; the size is configurable)
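
Worked example: a minimal sketch of how the 128 MB block size translates into a block count for a file. The function name `hdfs_block_count` is made up for illustration and is not part of any Hadoop API; it only does the arithmetic.

```python
import math

BLOCK_SIZE_MB = 128  # block size stated on the card


def hdfs_block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Return how many blocks a file of the given size would be split into.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    if file_size_mb <= 0:
        return 0
    # The last block may be only partly filled, so round up.
    return math.ceil(file_size_mb / block_size_mb)


print(hdfs_block_count(1000))  # 7 full blocks plus 1 partial block
print(hdfs_block_count(128))   # fits exactly in one block
print(hdfs_block_count(1))     # a tiny file still needs its own block entry
```

The last call hints at why many small files are a poor fit: every file needs at least one block entry, however small it is.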

5
Q

In HDFS, is failure normal?

A

Yes. Hardware failure is treated as the norm rather than the exception, so HDFS is designed to be highly fault tolerant

6
Q

What is the name node?

A

The master server of an HDFS cluster

7
Q

What does the name node do?

A

Holds the file system namespace (metadata)
Carries out file and directory operations
Maps blocks to data nodes

8
Q

What is the data node?

A

A worker server that stores the blocks of files (each file is split into one or more blocks)

9
Q

What do data nodes do?

A

Serve read and write requests

Report back to the name node

10
Q

What is bad about HDFS?

A

Not good for small reads
Not good for many small files
Files can be appended to but not amended in place

11
Q

What is MapReduce?

A

A programming paradigm for processing large datasets in parallel; Hadoop's implementation is Java based

12
Q

When should you use MapReduce?

A

For problems that are embarrassingly parallel

13
Q

What does the map step of MapReduce do?

A

Performs a map function on input key-value pairs to generate intermediate key-value pairs

14
Q

What does the reduce step of MapReduce do?

A

Performs a reduce function on intermediate key-value groups to generate output key-value pairs
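
Worked example: a single-process sketch of the map and reduce steps described on the two cards above, using word counting. It only imitates the data flow (map, group by key, reduce) in plain Python and is not Hadoop's Java API; the names `map_fn`, `reduce_fn` and `run_mapreduce` are invented for illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: one input (key, value) pair -> zero or more intermediate pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: one intermediate key with its grouped values -> output pairs."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy driver: run the mapper, group intermediate values by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    output = []
    for out_key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(out_key, groups[out_key]))
    return output


lines = [(0, "big data big cluster"), (1, "data node")]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)  # [('big', 2), ('cluster', 1), ('data', 2), ('node', 1)]
```

Each call to `map_fn` is independent of the others, which is what makes the problem embarrassingly parallel: on a real cluster the calls could run on different nodes.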

15
Q

Name cases where you would use MapReduce.

A

data mining
spam detection
ad optimisation
index building in search engines
article clustering for news
statistical machine translation

16
Q

What does YARN stand for?

A

Yet Another Resource Negotiator

17
Q

What does YARN do?

A

Manages and monitors workloads

18
Q

What are the main features of YARN?
a) shared
b) fast
c) scalability
d) flexibility
e) efficiency

A

a) shared
c) scalability
d) flexibility
e) efficiency

19
Q

What is Pig?

A

Data flow language

20
Q

What is Hive/HiveQL?

A

SQL style query language

21
Q

What is HBase?

A

Column-oriented database

22
Q

What is Mahout?

A

Machine learning library

23
Q

What is Spark?

A

An in-memory processing engine

24
Q

In Hadoop, what are the data ingestion programs?
Flume
HBase
Sqoop
Storm

A

Flume
Sqoop
Storm

25
Q

In Hadoop, what are the analytics and machine learning programs?
Spark
Giraph
Mahout

A

Giraph

Mahout

26
Q

What are the NoSQL programs on Hadoop?
Tez
HBase
Cassandra
Spark

A

HBase

Cassandra

27
Q

In Hadoop, which programs are the engines?
Spark
Storm
Tez

A

Spark

Tez

28
Q

What is ZooKeeper in Hadoop?

A

Cluster and workflow management

29
Q

What does hive do?

A

Converts SQL-style queries into Java (MapReduce) jobs

30
Q

What does HBase allow you to do?

A

Perform real-time read/write operations on large datasets

31
Q

What does Spark do?

A

Analytics engine for large-scale data processing

32
Q

What is different about Spark's data sharing?

A

Data is shared in memory rather than on disk

33
Q

What is Greenplum?

A

An open-source data platform

34
Q

What is PostgreSQL?

A

An RDBMS with object-oriented features

35
Q

What is MADlib?

A

An open-source library for in-database analytics

36
Q

In Greenplum, what is the INTERSECT operation?

A

Rows that appear in all answer sets

37
Q

In Greenplum, what is the EXCEPT operation?

A

Rows from first answer set minus rows from second

38
Q

In Greenplum, what is the UNION ALL operation?

A

Rows from all answer sets, including repeated rows

39
Q

In Greenplum, what is the UNION operation?

A

Rows from all answer sets, with repeated rows removed
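
Worked example: a small Python sketch that imitates the four set operations from the cards above on two lists of values standing in for answer sets. It is an analogy, not Greenplum code; the function names are invented, and the results are sorted only so the printed output is deterministic (SQL itself guarantees no order without ORDER BY).

```python
first = [1, 2, 2, 3]   # first answer set (note the repeated 2)
second = [2, 3, 4]     # second answer set


def union_all(a, b):
    """UNION ALL: every row from both sets, repeats kept."""
    return a + b


def union(a, b):
    """UNION: rows from both sets with repeats removed."""
    return sorted(set(a) | set(b))


def intersect(a, b):
    """INTERSECT: only rows that appear in both sets."""
    return sorted(set(a) & set(b))


def except_(a, b):
    """EXCEPT: rows from the first set that are not in the second."""
    return sorted(set(a) - set(b))


print(union_all(first, second))  # [1, 2, 2, 3, 2, 3, 4]
print(union(first, second))      # [1, 2, 3, 4]
print(intersect(first, second))  # [2, 3]
print(except_(first, second))    # [1]
```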

40
Q

In Greenplum, what is the GROUP BY operation?

A

Group results based on one or more specified columns

41
Q

In Greenplum, what does GROUP BY combined with UNION ALL do?

A

Adds subtotals and grand totals

42
Q

In Greenplum, what is the ROLLUP operation?

A

Produces subtotals and grand totals in a single query, replacing the GROUP BY with UNION ALL approach

43
Q

In Greenplum, what is the CUBE operation?

A

Creates subtotals for all possible combinations of the grouping columns
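
Worked example: a Python sketch of which subtotals ROLLUP and CUBE would produce for two grouping columns. The `sales` rows, the `ALL` summary marker and the helper names are all made up for illustration; a real query would return NULL in the summarised columns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (region, product, amount)
sales = [
    ("north", "apples", 10),
    ("north", "pears", 5),
    ("south", "apples", 7),
]
ALL = "ALL"  # stand-in for the summary marker in a subtotal row


def aggregate(rows, keep):
    """Sum amounts grouped by the column positions in `keep`; other columns become ALL."""
    totals = defaultdict(int)
    for region, product, amount in rows:
        cols = (region, product)
        key = tuple(cols[i] if i in keep else ALL for i in range(2))
        totals[key] += amount
    return dict(totals)


def rollup(rows):
    """ROLLUP(region, product): detail rows, per-region subtotals, grand total."""
    out = {}
    for keep in [(0, 1), (0,), ()]:
        out.update(aggregate(rows, keep))
    return out


def cube(rows):
    """CUBE(region, product): subtotals for every combination of the two columns."""
    out = {}
    for n in range(2, -1, -1):
        for keep in combinations(range(2), n):
            out.update(aggregate(rows, keep))
    return out


print(rollup(sales)[("north", ALL)])  # 15, subtotal for one region
print(rollup(sales)[(ALL, ALL)])      # 22, grand total
print(cube(sales)[(ALL, "apples")])   # 17, per-product subtotal that ROLLUP omits
```

With two grouping columns, ROLLUP yields three grouping levels while CUBE yields four, the extra one being the per-product subtotal.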

44
Q

In Greenplum, what is the GROUPING function?

A

Distinguishes NULL values in the data from the NULLs used as summary markers

45
Q

In Greenplum, what is a window function?

A

Performs a calculation across a set of rows that are related to the current row

46
Q

In Greenplum window functions, which clause specifies the data window?

A

OVER

47
Q

In Greenplum window functions, how do you define window partitions?

A

PARTITION BY
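
Worked example: a Python sketch of what an average taken OVER a window defined with PARTITION BY computes. The `rows` data, the `dept_avg` column and the helper name `avg_over_partition` are invented for illustration; this imitates the behaviour and is not how Greenplum implements it.

```python
from collections import defaultdict

# Hypothetical table rows
rows = [
    {"dept": "ops", "name": "ann", "salary": 50},
    {"dept": "ops", "name": "bob", "salary": 40},
    {"dept": "dev", "name": "cat", "salary": 70},
]


def avg_over_partition(rows, partition_col, value_col):
    """Imitate AVG(value_col) OVER (PARTITION BY partition_col)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row[value_col])
    result = []
    for row in rows:
        values = partitions[row[partition_col]]
        # Every input row is kept; the aggregate is attached alongside it.
        result.append({**row, "dept_avg": sum(values) / len(values)})
    return result


for row in avg_over_partition(rows, "dept", "salary"):
    print(row["name"], row["dept_avg"])
# ann 45.0
# bob 45.0
# cat 70.0
```

Unlike GROUP BY, which collapses each group into one row, the window calculation keeps all three rows and adds the per-partition value to each.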

48
Q

What does MAD stand for in MADlib?

A

Magnetic
Agile
Deep

49
Q

What are the MADlib in-database analytical functions?

a) regression
b) classification
c) validation
d) text analysis
e) descriptive analytics
f) clustering and topic modelling
g) association rule mining

A

a) regression
b) classification
c) validation
e) descriptive analytics
f) clustering and topic modelling
g) association rule mining

50
Q

What does MADlib do?

A

Creates models without moving data out of the DBMS