Midterm Flashcards
What are the data types associated with Big Data?
Structured (tabular data), semi-structured (XML files), and unstructured (text, audio,
video, images) data are all associated with Big Data.
Which statement best describes small data?
Small data is available in limited quantities that humans can easily interpret with little or no digital processing.
Which of the following capabilities are quantifiable advantages of distributed processing?
- You can add and remove execution nodes as and when required, significantly reducing infrastructure costs.
- Since problem instructions are executed on separate execution nodes, memory and processing requirements are low even while processing large volumes of data.
- Parallel processing can process Big Data in a fraction of the time compared to linear processing.
- Errors in one execution node are handled in isolation, without impacting the other nodes.
Which of these statements describes Big Data?
- Data is generated in huge volumes and can be structured, semi-structured, or unstructured.
- Big data arrives continuously at enormous speed from multiple sources.
- Big data is mostly located in storage within enterprises and data centers.
Which of the following capabilities are quantifiable advantages of parallel processing?
Parallel processing can process Big Data in a fraction of the time compared to linear processing.
What is vertical scaling and horizontal scaling?
- Vertical scaling improves the current system: a bigger computer.
- Horizontal scaling adds more systems: more computers.
What is Big data? Why does it matter?
- Everything we do increasingly leaves a digital trace (data) that we can analyze to become smarter. Big Data refers to the entire process of capturing and analyzing that data, not just the data itself.
- It matters because data is being collected all around us, all the time, and analyzing it lets us improve our decisions and our lives.
Which of the following statements about Hadoop are true?
- Collection of computers working together at the same time to perform tasks.
- Hadoop allows for running applications on clusters.
- Processes massive amounts of data in distributed file systems that are linked together.
- Set of open-source programs and procedures that can be used as the framework for Big Data operations.
MapReduce is a programming model used in Hadoop for processing Big Data. It's also a processing technique for what?
Distributed computing
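The map-shuffle-reduce flow behind that model can be sketched in plain Python (a conceptual word count, not Hadoop's actual Java API; all function names here are illustrative):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate the grouped values -- here, sum the counts
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "data everywhere"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'everywhere': 1}
```

In real Hadoop the mappers and reducers run in parallel on different cluster nodes, and the shuffle moves data between them over the network.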
Which of the following key features of HDFS ensure against data loss?
Replication
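The idea behind replication can be sketched in Python (purely illustrative; real HDFS block placement is rack-aware, and `place_replicas` is a made-up helper, though the default replication factor of 3 matches HDFS):

```python
def place_replicas(blocks, datanodes, replication=3):
    # Assign each block to `replication` distinct DataNodes, round-robin.
    # Losing any single node still leaves replication - 1 copies of each block.
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [datanodes[(i + r) % len(datanodes)]
                            for r in range(replication)]
    return placement

blocks = ["blk_1", "blk_2", "blk_3"]
nodes = ["node1", "node2", "node3", "node4"]
print(place_replicas(blocks, nodes))
# blk_1 -> node1, node2, node3; blk_2 -> node2, node3, node4; ...
```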
What are the components of the Hadoop 1 architecture (before 2014)?
HDFS and MapReduce
All of the following accurately describe Hadoop, except:
- Open source
- Java based
- Real Time
- Distributed Computing Approach
Answer:
- Real Time
Which of the following is a component of Hadoop?
- YARN
- HDFS
- MapReduce
Namenode keeps metadata in?
Memory (the NameNode holds the metadata in RAM and persists it to its local disk as the fsimage and edit log, not in HDFS itself)
Which of the following is a data processing engine for Hadoop Framework?
MapReduce
In which language can you code in Hadoop?
Java
Hadoop can be deployed on commodity servers, which provides low-cost processing and storage for huge volumes of unstructured data.
True
Which of the following manages the resources among all the applications running in a Hadoop cluster?
YARN
What are the main Hadoop components in Hadoop 2 and Hadoop 3? What functions do they perform?
- YARN - Cluster Management
- HDFS - Manages the storage of data
- MapReduce - Framework to process data.