CHAPTER EIGHT: THE ARCHITECTURE OF ANALYTICS AND BIG DATA: ALIGNING A ROBUST TECHNICAL ENVIRONMENT WITH BUSINESS STRATEGIES (Flashcards)

1
Q

What has become technically and economically feasible over the last decade?

A

Capturing and storing huge quantities of data

2
Q

What are the data volume tiers mentioned?

A
  • Megabytes
  • Gigabytes
  • Terabytes
  • Petabytes
3
Q

What percentage of all data is estimated to be analyzed?

A

0.5 percent

4
Q

What challenges do most IT departments face regarding data?

A

They strain to meet even minimal service demands and devote their resources to support and maintenance

5
Q

What is a common issue organizations face when integrating data into analytical applications?

A

Data cleansing

6
Q

What is the role of IT departments in analytics?

A

Manage information technology for analytics and other applications

7
Q

What critical question must organizations resolve for their analytical architecture?

A

How to encourage insightful answers and prevent uncontrolled proliferation of ‘versions of the truth’

8
Q

What is necessary for determining technical capabilities for analytical competition?

A

Close collaboration between IT organizations and business managers

9
Q

What should guiding principles for technology investments reflect?

A

Corporate priorities

10
Q

What is the job of the IT architect or chief data officer?

A

To ensure the right data, technology, and processes for analytics across the enterprise

11
Q

What are the stages of analytical competition?

A
  • Stage 1: Poor-quality data and poorly integrated systems
  • Stage 2: Efficient transaction data collection but lacking the right data
  • Stage 3: Proliferation of BI tools but non-standard data
  • Stage 4: High-quality data with an enterprise-wide analytical plan
  • Stage 5: Full-fledged analytics architecture with integrated big and small data
12
Q

What does the analytics and big data architecture encompass?

A

Processes and technologies for collecting, structuring, managing, and reporting decision-oriented data

13
Q

What are the six elements of the analytics and big data architecture?

A
  • Data management
  • Transformation tools and processes
  • Repositories
  • Analytical tools and applications
  • Data visualization tools and applications
  • Deployment processes
14
Q

What is the goal of a well-designed data management strategy?

A

To ensure the organization has the right information and uses it appropriately

15
Q

What is a major challenge companies face regarding data?

A

Dirty data: inconsistent, fragmented, and out-of-context information

16
Q

What questions must IT and business experts tackle to achieve analytical competition?

A
  • Data relevance
  • Data sourcing
  • Data quantity
  • Data quality
  • Data governance
17
Q

What does data relevance pertain to?

A

What data is needed to compete on analytics

18
Q

What is the significance of having access to the right data?

A

It is crucial for competitive differentiation and business performance

19
Q

What problem commonly arises between IT and business managers?

A

Blame over collecting the wrong data or failing to make the right data available

20
Q

Which companies have improved cooperation between quantitative analysts and business leaders?

A
  • Intel
  • Procter & Gamble
21
Q

What do IT executives believe about business managers regarding data needs?

A

They believe business managers do not understand what data they need

This reflects a gap in communication and understanding between IT and business sides.

22
Q

What do surveys of business managers reveal about IT executives?

A

Business managers believe IT executives lack the business acumen to make meaningful data available

This indicates a need for better collaboration between IT and business leaders.

23
Q

What is essential for organizations to compete analytically?

A

Cooperation between business leaders and IT managers

Without this cooperation, data gathering for competitive analysis is severely limited.

24
Q

What role do quantitative analysts play in companies like Intel and Procter & Gamble?

A

They work closely alongside business leaders

This collaboration helps bridge the gap between data analysis and business needs.

25
Q

What is crucial for defining relationships among data used in analysis?

A

Considerable business expertise is required

This expertise helps IT understand potential relationships in the data.

26
Q

Identify the types of customers an insurance company may have.

A
  • Corporate customers
  • Individual subscribers
  • Members of subscribers’ families

Each type of customer has unique medical histories and relationships with service providers.

27
Q

What is necessary for data to be useful for analytics?

A

Insight into the nature of relationships among the data

Without this insight, data usefulness is extremely limited.

28
Q

Where does data for analytics originate?

A

It originates from many places and needs to be managed through an enterprise-wide infrastructure

This ensures that data is streamlined, consistent, and scalable.

29
Q

What is the importance of having common applications and data across the enterprise?

A

It helps yield a ‘consistent version of the truth’

This is essential for everyone involved in analytics.

30
Q

What are enterprise systems?

A

Integrated software applications that automate, connect, and manage information flows for business processes

They are critical for providing consistent data for tasks like financial reporting.

31
Q

What is edge analytics?

A

A paradigm where data is analyzed at the source rather than being sent to a centralized repository

This approach is becoming more common due to the growth of IoT devices.

32
Q

What is a challenge with collecting data from IoT devices?

A

It is often unfeasible to send all data to a central repository for analysis

Real-time analytics at the edge can optimize operations.

33
Q

What are some sources of external data?

A
  • Internet
  • Social media
  • External data providers
  • Government information
  • Company websites

These sources provide valuable data for analytics.

34
Q

What is a potential legal issue with data collection?

A

Sensitive customer information may be illegal to capture

Organizations must navigate legal constraints when collecting data.

35
Q

What is the significance of Progressive’s Snapshot program?

A

It offers discounts for customers who allow data collection about their driving behavior

This helps in accurate pricing and understanding risk.

36
Q

How large was Walmart’s data warehouse in 2007?

A

About 600 terabytes

This was the largest data warehouse at that time.

37
Q

What is the current trend in data volume management?

A

Hadoop clusters storing data across multiple commodity servers

This technology allows for large-scale data management.

38
Q

What two pitfalls should companies avoid in data collection?

A
  • Collecting all possible data ‘just in case’
  • Collecting data that is easy to capture but not important

Both can lead to data overload and inefficiency.

39
Q

What are some characteristics that increase the value of data?

A
  • Correctness
  • Completeness
  • Currency
  • Consistency
  • Context
  • Control
  • Analysis

These attributes ensure data is actionable and valuable.

40
Q

What is the first step in the data management life cycle?

A

Data acquisition

This involves determining what data is needed and how to integrate IT systems.

41
Q

What does data cleansing involve?

A

Detecting and removing out-of-date, incorrect, incomplete, or redundant data

This is critical for ensuring data quality.
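
The four kinds of bad records named above can be illustrated with a minimal sketch. The record fields, the validity check, and the cutoff date are all hypothetical choices made for this example, not rules taken from the chapter.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "updated": date(2024, 5, 1)},
    {"id": 1, "email": "ann@example.com", "updated": date(2024, 5, 1)},  # redundant
    {"id": 2, "email": None, "updated": date(2024, 4, 2)},               # incomplete
    {"id": 3, "email": "bob-at-example", "updated": date(2024, 3, 9)},   # incorrect
    {"id": 4, "email": "cy@example.com", "updated": date(2015, 1, 1)},   # out of date
    {"id": 5, "email": "di@example.com", "updated": date(2024, 6, 7)},
]

def cleanse(rows, cutoff):
    """Keep rows that are complete, plausible, current, and not duplicated."""
    seen = set()
    clean = []
    for row in rows:
        if row["email"] is None:        # incomplete
            continue
        if "@" not in row["email"]:     # incorrect (a deliberately crude check)
            continue
        if row["updated"] < cutoff:     # out of date
            continue
        key = (row["id"], row["email"])
        if key in seen:                 # redundant
            continue
        seen.add(key)
        clean.append(row)
    return clean

cleaned = cleanse(records, cutoff=date(2020, 1, 1))
print([row["id"] for row in cleaned])
```

Real cleansing rules depend on the business definition of each field, which is why the cards stress collaboration between IT and business experts.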

42
Q

What is the purpose of data organization and storage?

A

To systematically extract, integrate, and synthesize data for use

This ensures data is ready for analysis.

43
Q

What is ETL in data management?

A

Extract, Transform, Load

A traditional process for making data usable in a data warehouse.
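
As a rough illustration of the three steps, the sketch below extracts rows from an inline CSV, transforms them, and loads them into an in-memory SQLite table standing in for a warehouse. The table name, columns, and sample values are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source export (an inline CSV stands in for it here).
SOURCE_CSV = "order_id,amount,currency\n1, 19.99 ,usd\n2,5.00,USD\n3,,usd\n"
rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))

# Transform: parse numbers, standardize currency codes, drop rows with no amount.
transformed = [
    (int(r["order_id"]), float(r["amount"]), r["currency"].strip().upper())
    for r in rows
    if r["amount"].strip()
]

# Load: write the cleaned rows into a warehouse table (in-memory SQLite).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
warehouse.commit()

loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(loaded)
```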

44
Q

What is the role of transformation tools in data management?

A

They clean and validate data to make it decision-ready

This is necessary for accurate analytics.

45
Q

What is the role of transformation procedures in data management?

A

Transformation procedures define the business logic that maps data from its source to its destination.

46
Q

Why is significant manual effort required in data transformation?

A

Both business and IT managers must expend significant effort to transform data into usable information.

47
Q

What is the estimated labor cost for data integration according to Sohaib Abbasi?

A

For every dollar spent on integration technology, around seven to eight dollars is spent on labor for manual data coding.

48
Q

What is the challenge of defining business concepts like ‘customer’ in data transformation?

A

A ‘customer’ may be defined as a company in one system but as an individual in another, leading to inconsistent definitions.

49
Q

What can be done with missing data during transformation?

A

Missing data can sometimes be filled using inferred data or projections, or it may simply remain missing.
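
One simple way to infer a missing value is to substitute the mean of the observed values. The sketch below assumes hypothetical monthly sales figures; mean substitution is only one of many possible inference methods and is chosen here for brevity.

```python
from statistics import mean

# Hypothetical monthly sales figures; None marks a missing value.
sales = [120.0, None, 100.0, 140.0, None]

def fill_with_mean(values):
    """Replace missing values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    inferred = mean(observed)
    return [inferred if v is None else v for v in values]

# Option 1: fill the gaps with an inferred value.
filled = fill_with_mean(sales)
print(filled)

# Option 2: leave the gaps in place; `sales` is unchanged and still holds None.
print(sales)
```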

50
Q

What role do automated machine learning systems play in data standardization?

A

They help identify likely overlaps and redundancies in data.

51
Q

What is a data warehouse?

A

A data warehouse is a database that contains integrated data from different sources and is regularly updated.

52
Q

What is the purpose of a data mart?

A

Data marts support a single business function or process and usually contain predetermined analyses.

53
Q

What is contained in a metadata repository?

A

It contains technical information, data definitions, source information, and instructions on how the data should be applied.

54
Q

What are the advantages of open-source distributed data frameworks like Hadoop?

A

They allow storage of data in any format at lower costs than traditional warehouses but may require higher technical expertise.

55
Q

What is a data lake?

A

A data lake stores data in its original format and structures it as it is accessed for analysis.
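
The idea of structuring data only when it is accessed can be sketched as follows. The event shapes and field names are invented for illustration; the point is that the raw lines are stored untouched and a structure is imposed only by the reader.

```python
import json

# Raw events land in the lake exactly as produced; note the differing shapes.
raw_lake = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temperature": "68.0", "site": "north"}',
    '{"user": "u42", "action": "login"}',
]

def read_temperatures(lines):
    """Apply a structure at read time, for the one question being asked."""
    for line in lines:
        event = json.loads(line)
        if "device" not in event:
            continue  # irrelevant to this analysis, but still kept in the lake
        reading = event.get("temp_c", event.get("temperature"))
        yield event["device"], float(reading)

temps = list(read_temperatures(raw_lake))
print(temps)
```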

56
Q

What factors influence the choice of analytical tools?

A

Factors include how thoroughly decision making should be embedded in business processes and whether to use third-party applications or custom solutions.

57
Q

What is the ROI of implementing packaged analytical applications according to IDC?

A

The median ROI is 140 percent.

58
Q

What are some major players in the analytical software market?

A

Major players include SAS, IBM, SAP, R, and RapidMiner.

59
Q

What is the primary use of spreadsheets in analytics?

A

Spreadsheets are used for the ‘last mile’ of analytics before data presentation.

60
Q

What are OLAP tools used for?

A

OLAP tools are used for semistructured decisions and analyses on relational data.

61
Q

What is a key characteristic of data visualization tools like Tableau?

A

They operate on entire datasets rather than just data cubes.

62
Q

What do statistical algorithms enable managers to do?

A

They enable managers to analyze data to arrive at optimal targets such as prices or loan amounts.

63
Q

How do rule engines function?

A

Rule engines process conditional statements to address logical questions.
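
A toy version of this idea is an ordered list of condition and outcome pairs, evaluated until one fires. The loan rules and thresholds below are hypothetical and far simpler than a commercial rule engine.

```python
# Each rule pairs a condition with an outcome; the rule contents are hypothetical.
RULES = [
    (lambda app: app["credit_score"] < 600, "decline"),
    (lambda app: app["amount"] > 50_000, "refer to underwriter"),
    (lambda app: True, "approve"),  # default when no earlier rule fires
]

def decide(application):
    """Return the outcome of the first rule whose condition holds."""
    for condition, outcome in RULES:
        if condition(application):
            return outcome

print(decide({"credit_score": 720, "amount": 10_000}))
```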

64
Q

What technologies have superseded rule engines in popularity?

A

Machine learning and cognitive technologies.

65
Q

What is the objective of data mining tools?

A

To identify patterns in complex and ill-defined data sets.

66
Q

What can text mining tools help managers identify?

A

Emerging trends in near-real time.

67
Q

What is text categorization?

A

Using statistical models or rules to rate a document’s relevance to a certain topic.
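
A rule-based variant can be sketched with weighted keywords. The topic, terms, and weights are assumptions for this example; a statistical model would learn such weights from labeled documents instead.

```python
import re

# Hypothetical keyword weights for the topic "product quality".
TOPIC_TERMS = {"defect": 3, "broken": 2, "refund": 2, "quality": 1}

def relevance(document):
    """Score a document by summing the weights of the topic terms it contains."""
    words = re.findall(r"[a-z]+", document.lower())
    return sum(TOPIC_TERMS.get(word, 0) for word in words)

docs = [
    "The unit arrived broken and I want a refund.",
    "Great service, fast shipping.",
]
scores = [relevance(d) for d in docs]
print(scores)
```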

68
Q

What is the purpose of natural language processing tools?

A

To make sense of language and answer human questions.

69
Q

What is event streaming used for?

A

To analyze data as it comes in from applications like the Internet of Things.
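
Analyzing data as it arrives usually means keeping only a small window of recent values rather than the full history. The sketch below assumes hypothetical sensor readings and an arbitrary alert threshold.

```python
from collections import deque

def rolling_alerts(readings, window=3, threshold=75.0):
    """Yield (index, average) whenever the rolling average exceeds the threshold."""
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > threshold:
                yield i, avg

# Hypothetical sensor temperatures arriving one at a time.
stream = iter([70.0, 72.0, 74.0, 80.0, 82.0, 71.0])
alerts = list(rolling_alerts(stream))
print([i for i, _ in alerts])
```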

70
Q

What do simulation tools model?

A

Business processes using mathematical and scientific functions.

71
Q

What is the main focus of web or digital analytics?

A

Managing and analyzing online and e-commerce data.

72
Q

What is A/B testing in web analytics?

A

Statistical comparisons of which version of a website gets more clicks.

73
Q

What is web analytics?

A

A category of analytical tools for managing and analyzing online and e-commerce data.

74
Q

What type of information does web analytics typically provide?

A

Descriptive information such as unique visitors, time spent on a site, and conversion rates.

75
Q

What does A/B testing in web analytics entail?

A

Statistical comparisons of which version of a website gets more clicks or conversions.
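
One common statistical comparison for this purpose is a two-proportion z-test on conversion rates. The sketch below uses only the standard library; the visitor and conversion counts are hypothetical, and other tests could equally be used.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical traffic: A converts 200 of 5,000 visitors, B converts 260 of 5,000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between versions is unlikely to be due to chance alone.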

76
Q

What is social media analytics focused on?

A

Counting social activities and assessing the sentiment associated with them.

77
Q

What problem can arise from excessive technological proliferation in analytics?

A

It can lead to the use of too many analytics and data management tools without a coherent architecture.

78
Q

According to a 2015 survey, how many analytics and data management tools did marketing organizations average?

A

More than twelve tools.

79
Q

What was a key barrier to success for Equifax identified in a 2010 assessment?

A

Analytics activities took too long to complete due to organizational and data-related issues.

80
Q

What significant change has Equifax made to its analytics infrastructure?

A

It shifted to a Hadoop-based data lake for easier, more cost-effective data assembly.

81
Q

How has the speed of analytics changed at Equifax under new leadership?

A

Analytics evaluation time was reduced from about a month to just a few days.

82
Q

What analytical model does Equifax use to identify trends in consumer credit history?

A

A neural network model.

83
Q

Which tools does Equifax incorporate into its analytics technology architecture?

A

Open-source tools like R and Python.

84
Q

What is the role of business intelligence software?

A

It allows users to create reports, visualize data, and share insights.

85
Q

What do visual analytical tools enable users to do without statistical skills?

A

Manipulate data and analyses through an intuitive visual interface.

86
Q

What is critical for effective deployment of analytics?

A

Creating, managing, implementing, and maintaining data and applications.

87
Q

What is a deployment platform in analytics?

A

A structured approach to managing the deployment process of analytics.

88
Q

What are major concerns related to deployment processes in analytics?

A

Privacy, security, and the ability to archive and audit data.

89
Q

What indicates a company has its analytics act together?

A

Centralized analytical roles and some degree of central coordination.

90
Q

What must senior management establish for a robust analytical architecture?

A

Guiding principles that align architectural decisions with business strategy.

91
Q

Fill in the blank: An enterprise-wide approach to managing data and analytics is often viewed as a _______.

A

renegade activity.

92
Q

What should the analytics architecture be able to do in a fast-changing environment?

A

Be flexible and adapt to changing business needs and objectives.