Chapter 3 Flashcards

1
Q

Organization Prerequisites

A
  • Data management and Big Data governance frameworks
  • Sound processes and sufficient skillsets
  • The quality of the data processed by Big Data solutions needs to be assessed
  • A well-planned roadmap for adopting Big Data
2
Q

Data Procurement

A

means acquiring the required data, which can come from:

  1. open-source platforms
  2. external data sources, e.g., government data sources and commercial data markets
3
Q

Privacy

A

Data can reveal private information when datasets are analyzed jointly, even if each dataset seems innocuous on its own.

4
Q

Telemetry Data

A

Data transmitted automatically from remote devices, e.g., vehicle diagnostics systems or a Fitbit fitness tracker.

5
Q

Security

A

Securing Big Data involves ensuring that the data networks and repositories are sufficiently secured via
authentication and authorization mechanisms.
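A minimal sketch of that idea in Python, assuming a hypothetical token table, role map, and repository (none of these names come from the source):

  # Sketch: authentication (who is calling?) and authorization (what may they
  # do?) guarding access to a data repository. All names are hypothetical.
  TOKENS = {"secret-token-1": "alice"}            # token -> user identity
  ROLES = {"alice": {"read"}}                     # user -> permitted actions
  REPOSITORY = {"orders.csv": "order data ..."}   # protected data store

  def authenticate(token):
      """Return the user for a valid token, or None if unknown."""
      return TOKENS.get(token)

  def authorize(user, action):
      """Check whether the authenticated user may perform the action."""
      return action in ROLES.get(user, set())

  def read_dataset(token, name):
      user = authenticate(token)
      if user is None or not authorize(user, "read"):
          raise PermissionError("access denied")
      return REPOSITORY[name]

  print(read_dataset("secret-token-1", "orders.csv"))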

6
Q

Provenance

A
  • Provenance refers to information about the source of the data and how it has been processed.
  • It helps determine the authenticity and quality of data, and it can be used for auditing purposes.
  • Ultimately, the goal of capturing provenance is to be able to reason over the generated analytic results with knowledge of the origin of the data (a minimal record sketch follows).
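A minimal sketch of what a provenance record could capture, assuming a simple dictionary-based representation (the field names are illustrative, not from the source):

  # Sketch: a provenance record tracking where a dataset came from and every
  # processing step applied to it, so results can be audited later.
  from datetime import datetime, timezone

  provenance = {
      "source": "sensor-feed-17",              # hypothetical origin of the data
      "acquired_at": "2024-01-01T00:00:00Z",
      "processing_steps": [],                  # grows as the data is transformed
  }

  def record_step(prov, operation, tool):
      """Append one processing step with a timestamp."""
      prov["processing_steps"].append({
          "operation": operation,
          "tool": tool,
          "timestamp": datetime.now(timezone.utc).isoformat(),
      })

  record_step(provenance, "filter corrupt records", "ingest-job-v2")
  record_step(provenance, "aggregate by day", "batch-job-v5")
  print(provenance)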
7
Q

Limited Realtime Support

A

Approaches that achieve near-realtime results often process transactional data as it arrives and combine it with previously summarized batch-processed data.
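A minimal sketch of that pattern in Python, assuming a precomputed batch summary and a handful of newly arrived transactions (all figures are made up):

  # Sketch: serve a near-realtime total by folding freshly arrived transactions
  # into a summary that a batch job computed earlier. Figures are illustrative.
  batch_summary = {"total_sales": 10_500.00}        # produced by the last batch run

  incoming_transactions = [120.00, 75.50, 310.00]   # arriving after that run

  near_realtime_total = batch_summary["total_sales"] + sum(incoming_transactions)
  print(f"near-realtime total: {near_realtime_total:.2f}")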

8
Q

Distinct Performance Challenges

A
  1. Large datasets coupled with complex search algorithms can lead to long query times.
  2. Network bandwidth: the time to transfer a unit of data can exceed its actual processing time (see the back-of-the-envelope sketch below).
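A back-of-the-envelope sketch of the bandwidth point, with illustrative numbers (a 1 TB dataset, a 1 Gbit/s link, and a node that processes 500 MB/s):

  # Sketch: compare how long it takes to move a dataset over the network with
  # how long it takes to process it. All figures are illustrative.
  dataset_bytes = 1 * 10**12                   # 1 TB dataset
  link_bits_per_second = 1 * 10**9             # 1 Gbit/s network link
  processing_bytes_per_second = 500 * 10**6    # node processes 500 MB/s

  transfer_hours = dataset_bytes * 8 / link_bits_per_second / 3600
  processing_hours = dataset_bytes / processing_bytes_per_second / 3600

  print(f"transfer:   {transfer_hours:.1f} h")    # about 2.2 h on the wire
  print(f"processing: {processing_hours:.1f} h")  # about 0.6 h of compute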
9
Q

Distinct Governance Requirements

A
  • standardization of how data is tagged and the metadata used for tagging
  • policies that regulate the kind of external data that may be acquired
  • policies regarding the management of data privacy and data anonymization
  • policies for the archiving of data sources and analysis results
  • policies that establish guidelines for data cleansing and filtering
10
Q

Distinct Methodology

A

A methodology will be required to control how data flows into and out of Big Data solutions

11
Q

Clouds

A

Clouds provide remote environments that can host IT infrastructure for large-scale storage and processing, among other things.

12
Q

Big Data analytics lifecycle

A
  1. Business Case Evaluation
  2. Data Identification
  3. Data Acquisition & Filtering
  4. Data Extraction
  5. Data Validation & Cleansing
  6. Data Aggregation & Representation
  7. Data Analysis
  8. Data Visualization
  9. Utilization of Analysis Results
13
Q

At different stages in the analytics lifecycle, data will be in different states, which are __, ___, ___

A

data-in-motion (transmitted)
data-in-use (processed)
data-at-rest (storage)

14
Q

Business Case Evaluation

A
  1. Define the goals and motivation
  2. Identify KPIs; if none exist, define SMART goals (specific, measurable, attainable, relevant and timely)
  3. Determine the budget and any required purchases
15
Q

Data Identification

A

Identifying the required datasets and their sources; a wider variety of data sources can increase the chance of finding hidden patterns and correlations

16
Q

Data Acquisition & Filtering

A
  • Gather data from all of the sources that were identified during the previous stage
  • Apply filtering to remove corrupt data (see the sketch below)
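A minimal filtering sketch, assuming records are dictionaries and a record counts as corrupt when a required field is missing (the field names and sample records are made up):

  # Sketch: during acquisition, drop records that are missing required fields.
  # The required fields and the sample records are illustrative.
  REQUIRED_FIELDS = ("id", "timestamp", "value")

  records = [
      {"id": 1, "timestamp": "2024-01-01T00:00:00Z", "value": 42.0},
      {"id": 2, "timestamp": None, "value": 17.5},         # corrupt: no timestamp
      {"id": 3, "timestamp": "2024-01-01T00:05:00Z"},      # corrupt: no value
  ]

  clean = [r for r in records
           if all(r.get(field) is not None for field in REQUIRED_FIELDS)]

  print(clean)   # only the first record survives the filter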
17
Q

Data Extraction

A

is dedicated to extracting disparate data and transforming it into a format that the underlying Big Data solution can use for analysis
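A minimal extraction sketch, assuming the incoming data is nested JSON and the target format is flat CSV-style rows (the field layout is made up):

  # Sketch: extract fields from semi-structured JSON and flatten them into
  # tabular rows the downstream solution can work with. Field names are made up.
  import csv
  import io
  import json

  raw = '[{"user": {"id": 7, "name": "Ana"}, "purchase": {"amount": 19.99}}]'

  rows = [
      {"user_id": item["user"]["id"],
       "user_name": item["user"]["name"],
       "amount": item["purchase"]["amount"]}
      for item in json.loads(raw)
  ]

  out = io.StringIO()
  writer = csv.DictWriter(out, fieldnames=["user_id", "user_name", "amount"])
  writer.writeheader()
  writer.writerows(rows)
  print(out.getvalue())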

18
Q

Data Validation & Cleansing

A

is dedicated to establishing (often complex) validation rules and removing any known invalid data
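A minimal rule-based validation sketch, with two illustrative rules (a plausible value range and a required timestamp); the rules and sample data are made up:

  # Sketch: express validation rules as small functions and drop any record
  # that violates one of them. Rules and sample data are illustrative.
  def plausible_temperature(record):
      return 0.0 <= record.get("temperature_c", -999.0) <= 60.0

  def has_timestamp(record):
      return bool(record.get("timestamp"))

  RULES = [plausible_temperature, has_timestamp]

  records = [
      {"timestamp": "2024-01-01T00:00:00Z", "temperature_c": 21.5},
      {"timestamp": "", "temperature_c": 22.0},                       # no timestamp
      {"timestamp": "2024-01-01T00:10:00Z", "temperature_c": 180.0},  # out of range
  ]

  cleansed = [r for r in records if all(rule(r) for rule in RULES)]
  print(cleansed)   # only the first record passes every rule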

19
Q

Data Aggregation & Representation

A

is dedicated to integrating multiple datasets using common fields (e.g., a shared ID field)
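A minimal sketch of joining two datasets on a common field, assuming small in-memory lists of dictionaries (the datasets and field names are made up):

  # Sketch: integrate two datasets by joining them on a common "customer_id"
  # field so the combined records can be analyzed together. Data is made up.
  customers = [
      {"customer_id": 1, "name": "Ana"},
      {"customer_id": 2, "name": "Bo"},
  ]
  orders = [
      {"customer_id": 1, "amount": 19.99},
      {"customer_id": 2, "amount": 5.00},
      {"customer_id": 1, "amount": 7.50},
  ]

  customers_by_id = {c["customer_id"]: c for c in customers}   # index the join key

  joined = [{**customers_by_id[o["customer_id"]], **o}          # merge matches
            for o in orders if o["customer_id"] in customers_by_id]
  print(joined)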

20
Q

Confirmatory data analysis

A

A deductive approach in which the cause of the phenomenon being investigated is proposed beforehand.
The proposed cause or assumption is called a hypothesis.
The data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions.
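A minimal sketch of the confirmatory style: the hypothesis is stated before looking at the data, then checked against it (the sample values and the simple threshold test are illustrative, not a prescribed method):

  # Sketch: confirmatory analysis states a hypothesis first, then tests it
  # against the data. Sample values are illustrative.
  from statistics import mean

  # Hypothesis (proposed beforehand): "average daily temperature exceeded 20 °C"
  observations_c = [21.4, 19.8, 22.1, 20.9, 23.0, 20.2, 21.7]

  observed_mean = mean(observations_c)
  hypothesis_supported = observed_mean > 20.0

  print(f"mean = {observed_mean:.1f} °C; hypothesis supported: {hypothesis_supported}")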

21
Q

Exploratory data analysis

A

is an inductive approach that is closely associated with data mining.
No hypothesis or predetermined assumptions are generated beforehand.
It may not provide definitive answers.
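A minimal exploratory sketch: summarize the data and look for patterns (here a hand-computed correlation) without any prior hypothesis; the dataset is made up:

  # Sketch: exploratory analysis summarizes the data and hunts for patterns
  # without a prior hypothesis. The dataset is made up.
  from statistics import mean, stdev

  temperature_c = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0]
  ice_cream_sales = [120, 150, 210, 260, 330, 390]

  print(f"temperature: mean {mean(temperature_c):.1f}, stdev {stdev(temperature_c):.1f}")
  print(f"sales:       mean {mean(ice_cream_sales):.1f}, stdev {stdev(ice_cream_sales):.1f}")

  # Pearson correlation, computed by hand; a value near +1 hints at a
  # relationship worth investigating further (it is not a definitive answer).
  mt, ms = mean(temperature_c), mean(ice_cream_sales)
  cov = sum((t - mt) * (s - ms)
            for t, s in zip(temperature_c, ice_cream_sales)) / (len(ice_cream_sales) - 1)
  print(f"correlation: {cov / (stdev(temperature_c) * stdev(ice_cream_sales)):.2f}")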