Big Data for Dummies Flashcards
What are the three Vs of big data?
Extremely large volumes of data, extremely high velocity of data, and extremely wide variety of data.
Why is big data important?
It enables organizations to gather, store, manage, and manipulate vast amounts of data at the right speed, at the right time, to gain the right insights.
Data warehouses vs. data marts
Data warehouses were often too complex and large, and they did not offer the speed and agility that the business required. The answer was a further refinement of the data being managed through data marts. Data marts focus on specific business issues and are more streamlined, supporting the business need for speedy queries.
Data warehouses are typically fed in…
Batch intervals, such as daily or weekly. This limits their usefulness in real-time business and consumer environments.
What is a BLOB?
A binary large object (BLOB) stores an unstructured data element. An object database management system (ODBMS) stores the BLOB as an addressable set of pieces so that its contents can be inspected.
What is the advantage of an object database?
Includes a programming language and a structure for the data elements so that it is easier to manipulate various data objects without programming and complex joins.
What are some of the technologies at the heart of big data? (4)
1. Virtualization 2. Parallel processing 3. Distributed file systems 4. In-memory databases
Different approaches to handling data exist based on whether it is data in motion or data at rest. What is data in motion vs. data at rest?
Data in motion would be used if a company is able to analyze the quality of its products during the manufacturing process to avoid costly errors. Data at rest would be used by a business analyst to better understand customers’ current buying patterns.
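The contrast in this card can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names `flag_defects` and `top_products`, the tolerance bounds, and the sample data shapes are all hypothetical and do not come from the book.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def flag_defects(
    readings: Iterable[float], low: float, high: float
) -> Iterator[tuple[int, float]]:
    """Data in motion: check each reading as it arrives.

    Yields (position, value) for every reading outside the assumed
    tolerance band, without waiting for the whole stream to finish.
    """
    for position, value in enumerate(readings):
        if not (low <= value <= high):
            yield position, value


def top_products(
    purchases: list[tuple[str, str]], n: int
) -> list[tuple[str, int]]:
    """Data at rest: analyze the full stored purchase history.

    Each purchase is an assumed (customer, product) pair; returns the
    n most frequently bought products with their counts.
    """
    counts = Counter(product for _customer, product in purchases)
    return counts.most_common(n)
```

The first function can raise an alert while production is still running; the second only makes sense once the history has been collected and stored.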
Is big data a single technology? What does it help companies gain?
Big data is a combo of old and new technologies that helps companies gain actionable insight.
What are the 5 components of the cycle of big data management?
1. Capture 2. Organize 3. Integrate 4. Analyze 5. Act
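One way to picture the five stages is as a chain of small functions, each feeding the next. The sketch below is a toy example, not a real architecture: the "store,amount" line format, the store-to-region lookup, and the sales threshold are all assumptions made for illustration.

```python
from __future__ import annotations


def capture(raw_lines: list[str]) -> list[str]:
    """Capture: take in raw input, dropping blank lines."""
    return [line.strip() for line in raw_lines if line.strip()]


def organize(lines: list[str]) -> list[dict]:
    """Organize: parse assumed 'store,amount' text into records."""
    records = []
    for line in lines:
        store, amount = line.split(",")
        records.append({"store": store, "amount": float(amount)})
    return records


def integrate(records: list[dict], regions: dict[str, str]) -> list[dict]:
    """Integrate: enrich each record from a second source (store -> region)."""
    return [{**r, "region": regions.get(r["store"], "unknown")} for r in records]


def analyze(records: list[dict]) -> dict[str, float]:
    """Analyze: total the amounts per region."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


def act(totals: dict[str, float], threshold: float) -> list[str]:
    """Act: list regions whose total falls below the threshold."""
    return sorted(region for region, total in totals.items() if total < threshold)
```

Calling `act(analyze(integrate(organize(capture(raw)), regions)), threshold)` runs the whole cycle once; in practice the result of the last stage typically feeds back into what gets captured next.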
Why is validation an important issue in big data management?
If your organization is combining data sources, it is critical that you have the ability to validate that these sources make sense when combined. Also, certain data sources may contain sensitive information, so you must implement sufficient levels of security and governance.
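A minimal sketch of what such a validation check might look like when two sources are combined. The two sources (a CRM export and a billing export keyed by customer ID), the function name `validate_combined`, and the name-matching rule are hypothetical choices for illustration only.

```python
from __future__ import annotations


def validate_combined(
    crm: dict[str, str], billing: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two assumed sources that map customer ID -> customer name.

    Reports IDs present in only one source and IDs whose names disagree
    after trimming whitespace and ignoring case.
    """
    shared = crm.keys() & billing.keys()
    return {
        "missing_in_billing": sorted(crm.keys() - billing.keys()),
        "missing_in_crm": sorted(billing.keys() - crm.keys()),
        "conflicting": sorted(
            key
            for key in shared
            if crm[key].strip().lower() != billing[key].strip().lower()
        ),
    }
```

An empty report suggests the sources line up on this one field; any non-empty list is a prompt to investigate before trusting analysis built on the merged data.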
Where would you start in big data management?
Start with the problem you’re trying to solve. That will dictate the kind of data that you need and what the architecture might look like.
How do you determine what performance requirements will be when setting up a big data management system?
Your needs will depend on the nature of the analysis you are supporting. You will need the right amount of computational power and speed. Some analysis will be real time, but you will be storing some amount of data as well. Questions to ask: (1) How much data will my organization need to manage today and in the future? (2) How often will my organization need to manage data in real time or near real time? (3) How much risk can my organization afford? Is my industry subject to strict security, compliance, and governance requirements?
Why do you need redundancy in your data management system?
So that you are protected from unanticipated latency and downtime.
What is in a big data tech stack?