Chapter 3 Flashcards
Organization Prerequisites
- Data management and Big Data governance frameworks
- Sound processes and sufficient skillsets
- The quality of the data to be processed by Big Data solutions needs to be assessed
- A well-planned roadmap for using Big Data
Data Procurement
Obtaining the data needed for analysis
- open-source platforms
- external data sources, e.g., government data and commercial data markets
Privacy
Datasets that seem harmless on their own can reveal private information when they are analyzed jointly.
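As a rough illustration of joint analysis, the sketch below links two small made-up datasets on shared fields. All records, field names, and the `link` helper are hypothetical and chosen only to show the idea; real linkage attacks and defenses are more involved.

```python
# Hypothetical "anonymized" health records: no names, but zip and birth year remain.
health = [
    {"zip": "02139", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "02140", "birth_year": 1975, "diagnosis": "diabetes"},
]

# Hypothetical public list that carries names plus the same two fields.
public = [
    {"name": "A. Example", "zip": "02139", "birth_year": 1980},
    {"name": "B. Sample", "zip": "02141", "birth_year": 1990},
]


def link(health_rows, public_rows):
    """Join the two datasets on (zip, birth_year) and return name/diagnosis pairs."""
    index = {(p["zip"], p["birth_year"]): p["name"] for p in public_rows}
    linked = []
    for h in health_rows:
        key = (h["zip"], h["birth_year"])
        if key in index:
            linked.append({"name": index[key], "diagnosis": h["diagnosis"]})
    return linked


linked = link(health, public)
print(linked)
```

Neither dataset alone ties a name to a diagnosis; the join on the shared fields does.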
Telemetry data
Data transmitted automatically by devices, e.g., car diagnostics or a Fitbit
Security
Securing Big Data involves ensuring that the data networks and repositories are sufficiently secured via authentication and authorization mechanisms.
Provenance
- Provenance refers to information about the source of the data and how it has been processed.
- It helps determine the authenticity and quality of data, and it can be used for auditing purposes.
- Ultimately, the goal of capturing provenance is to be able to reason over the generated analytic results with the knowledge of the origin of the data.
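One way to picture provenance is as metadata that travels with a dataset. The sketch below is a minimal illustration, not a standard format: the `with_provenance` and `apply_step` helpers, the field names, and the source label are all assumptions made for this example.

```python
from datetime import datetime, timezone


def with_provenance(records, source):
    """Wrap records with a note of where they came from and when they were acquired."""
    return {
        "records": list(records),
        "provenance": {
            "source": source,  # hypothetical source label
            "acquired_at": datetime.now(timezone.utc).isoformat(),
            "steps": [],  # processing steps applied so far, in order
        },
    }


def apply_step(dataset, step_name, func):
    """Apply func to every record and append step_name to the provenance trail."""
    provenance = dict(dataset["provenance"])
    provenance["steps"] = provenance["steps"] + [step_name]
    return {
        "records": [func(r) for r in dataset["records"]],
        "provenance": provenance,
    }


raw = with_provenance([" 12 ", "7", " 30"], source="sensor-feed-A")
cleaned = apply_step(raw, "strip whitespace", str.strip)
numbers = apply_step(cleaned, "convert to int", int)
print(numbers["records"], numbers["provenance"]["steps"])
```

An auditor reading `numbers["provenance"]` can see both the origin and the ordered list of transformations behind the final values.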
Limited Realtime Support
Approaches that achieve near-realtime results often process transactional data as it arrives and combine it with previously summarized batch-processed data.
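The combination described above can be sketched as a batch summary plus a running tally of newly arrived transactions. The product names, quantities, and helper functions below are hypothetical and only illustrate the pattern.

```python
from collections import Counter

# Hypothetical summary produced earlier by a batch job: units sold per product.
batch_totals = Counter({"widget": 120, "gadget": 75})


def apply_transaction(live_totals, product, quantity):
    """Fold one newly arrived transaction into the running in-memory totals."""
    live_totals[product] += quantity


def current_view(batch, live):
    """Combine the batch summary with transactions seen since the batch ran."""
    return dict(batch + live)


live_totals = Counter()
for product, quantity in [("widget", 3), ("gizmo", 2), ("widget", 1)]:
    apply_transaction(live_totals, product, quantity)

view = current_view(batch_totals, live_totals)
print(view)
```

Only the small set of recent transactions is processed as it arrives; the expensive historical aggregation is reused from the batch run.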
Distinct Performance Challenges
- large datasets coupled with complex search algorithms can lead to long query times.
- Network bandwidth: the time to transfer a unit of data can exceed the time needed to process it
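A quick back-of-the-envelope calculation shows how transfer time can dominate. The dataset size, link speed, and processing throughput below are made-up figures chosen for illustration, not measurements.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_second):
    """Time to move the data over the network (8 bits per byte)."""
    return size_bytes * 8 / bandwidth_bits_per_second


def processing_seconds(size_bytes, throughput_bytes_per_second):
    """Time to process the data once it has arrived."""
    return size_bytes / throughput_bytes_per_second


size = 10 * 10**9          # assumed dataset size: 10 GB
link = 1 * 10**9           # assumed network link: 1 Gbit/s
throughput = 500 * 10**6   # assumed processing rate: 500 MB/s

transfer = transfer_seconds(size, link)
processing = processing_seconds(size, throughput)
print(f"transfer: {transfer} s, processing: {processing} s")
```

Under these assumed figures, moving the data takes four times as long as processing it.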
Distinct Governance Requirements
- standardization of how data is tagged and the metadata used for tagging
- policies that regulate the kind of external data that may be acquired
- policies regarding the management of data privacy and data anonymization
- policies for the archiving of data sources and analysis results
- policies that establish guidelines for data cleansing and filtering
Distinct Methodology
A methodology will be required to control how data flows into and out of Big Data solutions
Clouds
Clouds provide remote environments that can host IT infrastructure for large-scale storage and processing, among other things.
Big Data analytics lifecycle
- Business Case Evaluation
- Data Identification
- Data Acquisition & Filtering
- Data Extraction
- Data Validation & Cleansing
- Data Aggregation & Representation
- Data Analysis
- Data Visualization
- Utilization of Analysis Results
At different stages in the analytics lifecycle, data will be in different states, which are ___, ___, ___
data-in-motion (transmitted)
data-in-use (processed)
data-at-rest (storage)
Business Case Evaluation
- Define the goals and motivation for the analysis
- Identify KPIs; if none are available, set SMART goals (specific, measurable, attainable, relevant, and timely)
- Determine the budget and any required purchases
Data Identification
Identifying a wider variety of data sources