class 1 Flashcards

1
Q

categories in big data

A

NoSQL -> "Not Only SQL"

e.g. HBase, Cassandra, MongoDB, Bigtable

Hadoop ->

HDFS: Hadoop Distributed File System, to store files on Hadoop

MapReduce: to process the data

YARN: Yet Another Resource Negotiator (new in Hadoop 2.x)

Hadoop ecosystem: tools that let you work with Hadoop without writing Java

  • Hive, Impala, Pig, Sqoop, Oozie, Flume, Spark
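
To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count in plain Python. It only imitates the map, shuffle, and reduce steps; it does not use Hadoop, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three steps together (hypothetical helper, not a Hadoop API)."""
    return reduce_phase(shuffle(map_phase(lines)))


counts = word_count(["big data on hadoop", "hadoop stores big files"])
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many machines, with the input read from HDFS.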
2
Q

types of projects

A

Analytical projects - Hive, Impala

HQL -> Hive Query Language (HiveQL)

Impala -> SQL-like querying, similar to Hive

Pig -> scripts written in Pig Latin

Sqoop -> to import data from an RDBMS (Oracle, MySQL) into Hadoop

Oozie -> to schedule the jobs

Flume and Spark Streaming -> process continuously flowing data

Kafka -> process continuously flowing data (Java)

Spark -> the newest of these; in-memory (RAM-level) processing

Transactional projects
