class 1 Flashcards
categories in big data
NoSQL -> "Not only SQL" (non-relational databases)
eg: HBase, Cassandra, MongoDB, Bigtable
hadoop ->
HDFS: Hadoop Distributed File System, stores files across the Hadoop cluster
MapReduce: processes the data stored in HDFS
YARN: Yet Another Resource Negotiator (new in Hadoop 2.x), manages cluster resources
Hadoop ecosystem: tools that let you work with Hadoop without writing Java
- Hive, Impala, Pig, Sqoop, Oozie, Flume, Spark
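To make the MapReduce card concrete, here is a minimal sketch of the map -> shuffle -> reduce idea as a word count in plain Python. It is only an illustration of the model, not Hadoop's actual API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are made up for this example, and everything runs in one process instead of across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample input standing in for lines of a file in HDFS.
    sample = ["big data on hadoop", "hadoop stores data in hdfs"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on many machines and the framework handles the shuffle. The ecosystem tools listed above (Hive, Pig, and so on) exist so this kind of job does not have to be written by hand.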
types of projects
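Analytical projects - Hive, Impala
HQL -> Hive Query Language (SQL-like)
Impala -> SQL queries on Hadoop data like Hive, but uses its own engine instead of MapReduce, so it is faster for interactive queries
Pig -> scripts written in Pig Latin
Sqoop -> imports data from an RDBMS (Oracle, MySQL) into Hadoop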
Oozie -> schedules the jobs
Flume and Spark Streaming -> process continuously flowing data
Kafka -> handles continuously flowing data (Java)
Spark -> latest, in-memory (RAM-level) processing
Transactional projects