Lecture 8 - Big Data Concepts Flashcards
Vs that define Big Data?
Volume
Variety
Velocity
Veracity
Variability
Value
Challenges of Big Data?
Effectively and efficiently capturing, storing, and analyzing big data
Critical Success Factors for Big Data?
A clear business need
Strong, committed sponsorship
Alignment between the business and IT strategy
A fact-based decision-making culture
A strong data infrastructure
The right analytics tools
The right people with the right skills
Enablers of Big Data Analytics?
In-memory analytics
In-database analytics
Grid computing and MPP
Appliances
In-memory analytics?
Storing and processing the complete data set in RAM
In-database analytics?
Placing analytic procedures close to where data is stored
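A minimal sketch of the in-database idea, assuming Python's built-in sqlite3 module stands in for a real data warehouse. The table name sales and its rows are hypothetical. The aggregate runs inside the database engine, so only the small summary leaves it; the second loop shows the contrasting approach of pulling every row into the application first.

```python
import sqlite3

# Hypothetical in-memory database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# In-database approach: the engine computes the aggregate next to the data,
# and only one summary row per region crosses the connection.
in_db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Contrast: move every row to the application, then aggregate there.
pulled_totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    pulled_totals[region] = pulled_totals.get(region, 0.0) + amount

conn.close()
```

Both dictionaries hold the same totals; the difference is how much data has to move to produce them.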
Grid computing and MPP?
Use of many machines and processors in parallel
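A rough illustration of the parallel idea, assuming worker threads on one machine stand in for the separate machines and processors of a grid or MPP system. The function names split_into_chunks and parallel_sum are made up for this sketch. The data is partitioned, each partition is summed independently, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Partition values into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
```

In a real grid the chunks would live on different machines; the partition, compute, combine pattern is the same.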
Appliances?
Combining hardware, software, and storage in a single unit for performance and stability
Challenges of Big Data Analytics?
Data volume
Data integration
Processing capabilities
Data governance
Skill availability
Solution cost
Business problems addressed by Big Data analytics?
Process efficiency and cost reduction
Brand management
Revenue maximization
Enhanced customer experience
MapReduce?
Distributes the processing of very large multi-structured data files across a large cluster of ordinary machines
Goal of MapReduce?
Achieving high performance with simple computers
Example tasks of MapReduce?
Indexing the Web for search
Graph analysis
Text analysis
Machine learning
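Text analysis, one of the tasks listed above, is often illustrated with a word count. The sketch below runs the map, shuffle, and reduce phases in a single Python process, assuming that is close enough to show the programming model; a real MapReduce job would spread these phases across a cluster. All function names are hypothetical and do not belong to any framework.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big value", "data velocity"])
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be handed to many ordinary machines at once.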
Hadoop?
An open-source framework for storing and analyzing massive amounts of distributed, unstructured data
How does Hadoop work?
Access unstructured and semi-structured data
Break the data up into parts
Each part is copied multiple times and loaded into the file system for replication and failsafe processing
One node acts as the facilitator and another as the job tracker
Jobs are distributed to the clients and, once completed, the results are collected and aggregated using MapReduce
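The steps above can be sketched as a toy simulation. This is not Hadoop's API: the node names, the round-robin placement rule, and the functions split_into_blocks, place_replicas, and run_job are all assumptions made for illustration. The replication factor of 3 matches the usual HDFS default, but the placement policy here is much simpler than the real one.

```python
def split_into_blocks(records, block_size):
    """Break the data up into fixed-size parts."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_replicas(n_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so the copies land on different nodes.
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


def run_job(blocks, placement, failed_nodes, task):
    """Run `task` once per block if a live replica exists, then aggregate."""
    results = []
    for block_id, block in enumerate(blocks):
        live = [n for n in placement[block_id] if n not in failed_nodes]
        if not live:
            raise RuntimeError(f"block {block_id} has no live replica")
        results.append(task(block))  # would execute on the node live[0]
    return sum(results)


records = list(range(1, 11))
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
blocks = split_into_blocks(records, block_size=3)
placement = place_replicas(len(blocks), nodes, replication=3)

# One node is down, yet every block still has a surviving copy.
job_total = run_job(blocks, placement, failed_nodes={"node-b"}, task=sum)
```

Losing node-b does not stop the job, because every block has copies on two other nodes; that is the failsafe processing the replication step buys.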