Enterprise Data Management Flashcards

1
Q

Within data migration phases, Extraction, Transformation and Load scripts are deliverables of

A) Develop Programs and Testing
B) Data Extraction
C) Data Transfer
D) Data Transformation

A

A) Develop Programs and Testing

This is because ETL scripts involve developing the necessary code and logic to extract data from source systems, transform it into the required format, and load it into target systems. This development and subsequent testing ensure that the data migration process is accurate and efficient.
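As a rough illustration of this deliverable, here is a minimal ETL sketch in Python. The file name customers.csv, the column names, and the SQLite target are assumptions made for the example, not part of the question.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: trim whitespace and normalise the city name to upper case."""
    return [
        {"id": int(r["id"]), "name": r["name"].strip(), "city": r["city"].strip().upper()}
        for r in rows
    ]

def load(rows, conn):
    """Load: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, name, city) VALUES (:id, :name, :city)", rows
    )
    conn.commit()

if __name__ == "__main__":
    target = sqlite3.connect("target.db")   # stand-in for the real target system
    load(transform(extract("customers.csv")), target)
```

Developing and testing scripts of this shape (against real source extracts and the real target schema) is what the "Develop Programs and Testing" phase delivers.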

2
Q

Which of the following is used to migrate data

A) DataClean
B) ETI Extract
C) Trillium
D) Oracle Warehouse Builder

A

C) Trillium

Here's why the other options are less likely:

DataClean is typically used for data cleansing, which focuses on improving data quality by identifying and correcting errors or inconsistencies. Data cleansing can be part of a data migration, but it is not the core migration function.

ETI Extract isn't a commonly recognized standalone migration tool; "Extract" is one stage of the Extract, Transform, Load process rather than a tool name.

Oracle Warehouse Builder is primarily a data warehousing tool. While it can be used for data movement tasks, data migration is not its primary purpose.

Trillium, on the other hand, is a software suite that offers data integration and migration capabilities, making it the best fit for this scenario.

It’s important to note that there are many other data migration tools available, and the best choice depends on the specific requirements of the migration project.

3
Q

Which of the following is not a valid way for providing DQM in Source-to-Target Architecture?

A) DQM at Source
B) DQM as a part of External processes
C) DQM as a part of ETL process
D) DQM at Target

A

B) DQM as a part of External processes

Data Quality Management should be integrated directly within the source, ETL process, or target to ensure data integrity and quality throughout the data migration process. External processes that are not part of the ETL or data handling pipeline may not effectively manage and ensure data quality within the Source-to-Target Architecture.

4
Q

Architecture of the data migration solution should include (Choose all which are applicable)

A) Ability to block data access to source environments
B) Ability to handle data cleansing requirements
C) Ability to handle document regeneration
D) Ability to perform audits for Reconciliation

A

B) Ability to handle data cleansing requirements
D) Ability to perform audits for Reconciliation

Explanation:
B) Ability to handle data cleansing requirements:
Ensuring data quality is crucial in data migration. The architecture should include mechanisms for data cleansing to correct errors and inconsistencies before loading into the target system.

D) Ability to perform audits for Reconciliation:
The ability to audit and reconcile data ensures that the migration process is accurate and complete. This helps in verifying that the data in the target environment matches the source data post-migration.

Not applicable:
A) Ability to block data access to source environments:
While security and access control are important, blocking data access to source environments is not typically a requirement of the migration architecture itself. This might be more relevant to security policies rather than the architecture of the migration solution.

C) Ability to handle document regeneration:
Document regeneration is not usually a core requirement of data migration solutions. Data migration primarily focuses on moving data rather than regenerating documents, which could be a separate process outside the migration scope.

5
Q

Which of these is mainly concerned with the aggregate results of movement of data from source to target?

A) Completeness
B) Validity
C) Data Flow
D) Business Rules

A

A) Completeness

Explanation:
A) Completeness:
Completeness refers to ensuring that all the expected data has been successfully moved from the source to the target. It checks that no data is missing during the migration process, thereby focusing on the aggregate results of the data movement (a minimal count-reconciliation sketch follows this list).

Not applicable:
B) Validity:
Validity focuses on whether the data conforms to defined formats, types, and ranges.

C) Data Flow:
Data Flow refers to the movement and transformation of data through the different stages of the ETL process, but not necessarily the aggregate results.

D) Business Rules:
Business Rules are specific criteria that data must meet during the migration process, often used in the transformation phase to ensure the data aligns with business requirements.
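To make the completeness check concrete, here is a minimal sketch of an aggregate source-to-target comparison. The record layout and the amount field are invented for illustration; real reconciliation would typically compare counts and control totals per table.

```python
def completeness_check(source_rows, target_rows, amount_field="amount"):
    """Compare aggregate results (row counts and a control total) between source and target."""
    src_count, tgt_count = len(source_rows), len(target_rows)
    src_total = sum(float(r[amount_field]) for r in source_rows)
    tgt_total = sum(float(r[amount_field]) for r in target_rows)
    return {
        "row_count_match": src_count == tgt_count,
        "control_total_match": abs(src_total - tgt_total) < 0.01,
        "source_rows": src_count,
        "target_rows": tgt_count,
    }

# Example: a one-row discrepancy is immediately visible in the aggregates.
source = [{"id": 1, "amount": "100.0"}, {"id": 2, "amount": "250.5"}]
target = [{"id": 1, "amount": "100.0"}]
print(completeness_check(source, target))
# {'row_count_match': False, 'control_total_match': False, 'source_rows': 2, 'target_rows': 1}
```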

6
Q

Which are the reports generated out of data analysis and Profiling?

A) Issue Registers
B) Monitors
C) Matrices
D) Mappings

A

A) Issue Registers

Explanation:
A) Issue Registers:
Issue registers are documents that list problems or discrepancies identified during data analysis and profiling. They help track data quality issues and guide efforts to resolve them.

Not applicable:
B) Monitors:
Monitors refer to tools or processes used to continuously observe data or systems, but they are not reports generated from data analysis and profiling.

C) Matrices:
Matrices are often used for mapping and displaying relationships or comparisons, but they are not specifically reports generated from data profiling.

D) Mappings:
Mappings refer to the definition of how data fields from the source map to the target fields in data migration, not a report generated from data profiling.

7
Q

Which of these is a benefit of implementing metadata management?

A) Facilitates change management
B) Creates an agile business platform
C) Helps in preventing data lineage reporting and analysis
D) Enables Service Oriented Architecture

A

A) Facilitates change management

Explanation:
A) Facilitates change management:
Metadata management helps in understanding the structure, relationships, and dependencies of data across the enterprise. This understanding is crucial for managing changes effectively, as it allows for the assessment of impact, better planning, and smoother implementation of changes.

Not applicable:
B) Creates an agile business platform:
While metadata management can contribute to agility, it is not the primary benefit directly associated with implementing metadata management.

C) Helps in preventing data lineage reporting and analysis:
This is incorrect because metadata management actually supports and enhances data lineage reporting and analysis by providing detailed information about data origins, movements, and transformations.

D) Enables Service Oriented Architecture:
Enabling Service Oriented Architecture (SOA) is related to designing and implementing services, which is not a direct benefit of metadata management. Metadata management may support SOA by providing detailed data definitions and relationships, but it is not the primary benefit.

8
Q

Physical model is created by the following role

A) Database Administrator
B) Database Architect
C) Information Architect (IA)
D) Enterprise Architect (EA)

A

A) Database Administrator

Explanation:
A) Database Administrator (DBA):
DBAs are responsible for designing the physical data models based on logical data models created by database architects or information architects. They translate the logical data models into physical database designs, including schema definitions, indexing strategies, storage allocation, etc.

Not applicable:
B) Database Architect:
Database architects usually focus on designing logical data models and overall database structures rather than the physical implementation.

C) Information Architect (IA):
Information architects design the overall information architecture of an organization, focusing on data integration, governance, and strategy rather than physical database design.

D) Enterprise Architect (EA):
Enterprise architects design and oversee the entire IT architecture of an organization, including systems, applications, and infrastructure, but they are less involved in the detailed physical database design.

9
Q

BIDS DQM is a modular approach for building solutions to data quality problems

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
BIDS (Business Intelligence Development Studio) DQM (Data Quality Management) is indeed a modular approach for building data quality rules and handling data quality issues within Microsoft’s BI (Business Intelligence) tools environment. This approach allows for flexible customization and integration of data quality processes into BI solutions.

10
Q

In the DQM Methodology, DQM Framework governs (choose two)

A) Data Cleansing
B) Data Re-migration
C) Data Destruction
D) Data Quality Specification

A

A) Data Cleansing
D) Data Quality Specification

Explanation:
A) Data Cleansing:
The DQM Framework typically includes processes and rules for data cleansing, which is essential for ensuring data quality by correcting errors and inconsistencies.

D) Data Quality Specification:
The DQM Framework governs the specification of data quality requirements and standards, ensuring that data meets specified criteria for accuracy, completeness, consistency, etc.

Not applicable:

B) Data Re-migration:
Data re-migration refers to the process of migrating data again due to previous migration issues or changes in requirements. This is typically not directly governed by the DQM Framework.

C) Data Destruction:
Data destruction involves securely removing data that is no longer needed, which is more related to data lifecycle management and security policies rather than data quality management specifically.

11
Q

ETL and target database systems do not have access to update source data. When doing data cleansing at the source, if bad data is encountered, it has to be deleted from the source system.

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
When ETL (Extract, Transform, Load) processes and target database systems do not have direct access to update source data, and data cleansing is performed at the source:

If bad data is encountered during data cleansing, it typically needs to be deleted from the source system to ensure that only cleansed and correct data is extracted and loaded into the target system. This ensures data integrity and accuracy throughout the ETL process.

12
Q

CITY column of a table contains information such as Bangalore, Bangalore-64, Bangalore-560001, Mumbai-400002 etc. In order to have just the city information in the column, the following will need to be done

A) Data Merging
B) Data Splitting
C) Data Parsing
D) Data Mapping

A

C) Data Parsing

Explanation:
C) Data Parsing:
Data parsing involves extracting relevant parts of data from a larger string or field. In this case, parsing would be used to extract only the city name portion from entries that contain additional information like postal codes or codes after a dash. This process helps in standardizing the data format within the CITY column to contain only the city names.
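A minimal parsing sketch, assuming the extra information is always appended after a hyphen as in the sample values:

```python
def parse_city(raw):
    """Keep only the city portion of values like 'Bangalore-560001' or 'Mumbai-400002'."""
    return raw.split("-", 1)[0].strip()

values = ["Bangalore", "Bangalore-64", "Bangalore-560001", "Mumbai-400002"]
print([parse_city(v) for v in values])
# ['Bangalore', 'Bangalore', 'Bangalore', 'Mumbai']
```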

13
Q

True or False: Data Profiling is not a part of the Data Migration Methodology

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Data profiling is indeed a part of the Data Migration Methodology. It involves analyzing and assessing the source data to understand its structure, quality, and characteristics before migration. This analysis helps in planning and executing the migration process effectively, ensuring that data integrity and quality are maintained throughout. Therefore, data profiling plays a crucial role in the initial stages of data migration methodology.

14
Q

True or False: Parallel running strategy eliminates the problem of having dependencies between systems

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Parallel running strategy involves running both old and new systems simultaneously for a period during the transition phase of a system upgrade or migration. While it helps in validating the new system and ensuring continuity of operations, it does not inherently eliminate dependencies between systems. Dependencies can still exist, especially if data or processes need to synchronize or integrate between the old and new systems during parallel running. Dependency management remains crucial even with parallel running to ensure smooth transition and eventual decommissioning of the old system.

15
Q

Which Metadata spans across the BI Technical Metadata and the Business Metadata?

A) Counterpoint Metadata
B) Back-Room Metadata
C) Front-Room Metadata

A

A) Counterpoint Metadata

16
Q

Data Profiling program includes

A) Issue Register Maintenance
B) Analyzing Relationships
C) Threshold analysis of certain fields
D) Cleaning the Incorrect data

A

B) Analyzing Relationships
C) Threshold analysis of certain fields

Explanation:
B) Analyzing Relationships:
Data profiling involves examining relationships between different data elements to understand dependencies and associations within the dataset.

C) Threshold analysis of certain fields:
Threshold analysis involves setting criteria or thresholds for certain data fields to identify outliers, anomalies, or data quality issues based on predefined rules or thresholds.

Not applicable:
A) Issue Register Maintenance:
Issue register maintenance is typically a separate process for managing and tracking data quality issues identified during data profiling, rather than a direct component of data profiling itself.

D) Cleaning the Incorrect data:
Cleaning incorrect data is part of data cleansing, which is a subsequent step after data profiling identifies data quality issues. It is not typically considered part of the data profiling program itself.

17
Q

The rule-based approach to Data clean-up includes (Choose all which are applicable)

A) Suggestive rules
B) Detective rules
C) Corrective rules
D) Derivative rules

A

B) Detective rules
C) Corrective rules

Explanation:
B) Detective rules:
Detective rules are used to identify and detect data quality issues, anomalies, or inconsistencies within the dataset.

C) Corrective rules:
Corrective rules are applied to clean, correct, or standardize the data based on predefined rules or criteria (both rule types are sketched in the example after this list).

Not applicable:
A) Suggestive rules:
Suggestive rules typically provide recommendations or suggestions rather than enforcing specific actions for data clean-up.

D) Derivative rules:
Derivative rules are usually used to derive new data or metrics from existing data rather than directly related to data clean-up processes.
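A small sketch of the two rule types, using a hypothetical customer record with age and country fields: the detective rule only flags suspect values, while the corrective rule standardizes them.

```python
def detective_rule(record):
    """Detective rule: flag issues without changing the data."""
    issues = []
    if not (0 <= record.get("age", -1) <= 120):
        issues.append("age out of range")
    if record.get("country", "").strip() == "":
        issues.append("country missing")
    return issues

def corrective_rule(record):
    """Corrective rule: standardize the data based on predefined criteria."""
    fixed = dict(record)
    country = record.get("country", "").strip()
    fixed["country"] = {"IN": "India", "IND": "India", "india": "India"}.get(country, country)
    return fixed

record = {"age": 200, "country": "IND"}
print(detective_rule(record))   # ['age out of range']
print(corrective_rule(record))  # {'age': 200, 'country': 'India'}
```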

18
Q

Data Archival and Data Backup are synonymous—both are used for the same purpose of storing a primary copy of data.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Data Archival and Data Backup serve different purposes:

Data Backup:
Data backup is the process of creating copies of data to protect against data loss due to hardware failure, accidental deletion, or other disasters. Backups are typically used for recovery purposes and are often stored temporarily or periodically updated.

Data Archival:
Data archival involves moving data that is no longer actively used but needs to be retained for compliance, historical, or business reasons to a separate storage location. Archival data is stored for long-term retention and retrieval, often in a different storage tier optimized for cost-efficiency and access frequency.

While both involve storing data copies, they serve distinct purposes related to data protection and long-term storage needs.

19
Q

Select all the Data Quality and Data Profiling Tools among the following

A) First Logic
B) Informatica Data Explorer
C) Powercenter
D) Data Flux Power Studio

A

B) Informatica Data Explorer
D) Data Flux Power Studio

Data profiling tools help you understand the structure, content, and format of your data, while data quality tools cleanse and improve the accuracy of your data.

First Logic is best known for address standardization and cleansing software rather than data profiling.
PowerCenter is an ETL (Extract, Transform, Load) tool from Informatica and doesn't focus on data profiling.

20
Q

Federated Metadata Management ensures relative autonomy for local repositories

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
Federated Metadata Management allows for relative autonomy of local metadata repositories while enabling centralized management and governance. This approach supports distributed systems and organizations by allowing local repositories to maintain control over their metadata while facilitating interoperability and unified access across the enterprise.

21
Q

Which among the following is a data management practice that characterizes the content, quality, and structure of your data

A) Data Enrichment
B) Data Caving
C) Data Refinement
D) Data Profiling

A

D) Data Profiling

Explanation:
Data Profiling involves analyzing and assessing the content, quality, and structure of data within a dataset. It helps in understanding data characteristics such as completeness, consistency, accuracy, and relationships between data elements. This practice is essential for data management and ensuring data meets organizational standards and requirements.
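To make this concrete, here is a minimal column-profiling sketch; the sample records and field names are invented for illustration. It reports null counts, distinct counts, and value lengths per column.

```python
def profile(rows):
    """Summarise content, quality and structure per column: nulls, distinct values, length range."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        report[col] = {
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
            "min_len": min((len(str(v)) for v in non_null), default=0),
            "max_len": max((len(str(v)) for v in non_null), default=0),
        }
    return report

rows = [
    {"city": "Bangalore", "pin": "560001"},
    {"city": "Mumbai", "pin": ""},
    {"city": "Bangalore", "pin": "400002"},
]
for col, stats in profile(rows).items():
    print(col, stats)
```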

22
Q

The best place to implement data quality checks is

A) Source systems
B) ETL
C) Target Systems (Example: Data Warehouse)
D) All of the options

A

D) All of the options

Explanation:
Implementing data quality checks at various stages ensures comprehensive data quality management throughout the data lifecycle:

A) Source systems: Implementing data quality checks at the source ensures that data entering the system is accurate and consistent from the outset.

B) ETL (Extract, Transform, Load): Implementing data quality checks during ETL processes ensures that data transformations maintain data quality and integrity.

C) Target Systems (e.g., Data Warehouse): Implementing data quality checks in the target system ensures that data stored in the data warehouse or target database meets quality standards and is suitable for reporting and analysis.

By implementing data quality checks across all these stages, organizations can ensure that data is reliable, consistent, and accurate throughout its lifecycle, from acquisition to consumption.
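One way to picture checks at every stage is a single reusable validation applied after extraction, after transformation, and again after loading. The function, field, and stage names below are illustrative assumptions, not a prescribed implementation.

```python
def check_quality(rows, stage):
    """Reusable data quality check: no missing customer IDs, no duplicate IDs."""
    ids = [r.get("customer_id") for r in rows]
    problems = []
    if any(i in (None, "") for i in ids):
        problems.append("missing customer_id")
    if len(ids) != len(set(ids)):
        problems.append("duplicate customer_id")
    print(f"[{stage}] {'OK' if not problems else ', '.join(problems)}")
    return not problems

extracted = [{"customer_id": 1}, {"customer_id": 1}]
check_quality(extracted, "source extract")   # flags the duplicate at the source
transformed = [{"customer_id": 1}]
check_quality(transformed, "after ETL")      # passes once deduplicated
check_quality(transformed, "target load")    # re-checked in the target as well
```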

23
Q

Who is responsible for Data management and Data Quality from a business perspective

A) Data Analyst
B) Data Steward
C) Information Architect
D) Data Keeper

A

B) Data Steward

Explanation:
B) Data Steward: Data stewards are responsible for overseeing the management, quality, and governance of data within an organization. They ensure that data meets organizational standards, policies, and regulatory requirements. Data stewards collaborate closely with business users, data analysts, and IT teams to improve data quality, integrity, and usability.
While data analysts may analyze data and information architects design data structures, the primary responsibility for managing and ensuring the quality of data usually falls to data stewards within an organization.

24
Q

Which of the following is/are true about Automated data profiling tools?

A) Generates summary views
B) Enables effective decision-making
C) Can create appropriate cleansing rules
D) Result Interpretation requires the involvement of IT administrators

A

A) Generates summary views
C) Can create appropriate cleansing rules

Explanation:
A) Generates summary views: Automated data profiling tools generate summary views of data characteristics such as data distributions, patterns, completeness, and quality metrics. These summaries help users quickly understand the overall state of their data.

C) Can create appropriate cleansing rules: Many automated data profiling tools can analyze data patterns and anomalies to suggest or create appropriate cleansing rules. This capability helps in automating the data cleansing process based on identified issues.

Not applicable:
B) Enables effective decision-making: Data profiling provides insights into data quality and structure, but effective decision-making depends on how well those insights are interpreted and acted upon by business users and stakeholders; it is an indirect benefit rather than something the tool itself delivers.

D) Result Interpretation requires the involvement of IT administrators: Interpretation of data profiling results can involve various stakeholders, including business analysts, data stewards, and IT administrators. However, it’s not exclusively limited to IT administrators; it depends on the organization’s data governance and management practices.

25
Q

Which among the following is not an advantage of custom code (select all that apply)

A) Low cost
B) Easily adaptable to changes
C) Optimization of programs
D) High auditing capabilities

A

A) Low cost
D) High auditing capabilities

Here’s the breakdown of the options regarding advantages of custom code:

Low cost (A): Not necessarily an advantage. Custom code development can be expensive due to developer time and ongoing maintenance.
Easily adaptable to changes (B): An advantage. Custom code can be tailored to specific needs and easily modified as requirements evolve.
Optimization of programs (C): An advantage. Developers can fine-tune custom code for performance and efficiency.
High auditing capabilities (D): Not necessarily an advantage. While custom code can be audited, it can also be complex and time-consuming to understand compared to pre-built solutions with readily available documentation.

26
Q

Match the vendor name with the associated appliance server name

A) IBM—Balanced Configuration Unit or Balanced Warehouse
B) Ingres—IceBreaker
C) IBM—Greenplum
D) HP—Neo View

A

All of them

A) IBM - Balanced Configuration Unit or Balanced Warehouse - IBM offers a family of data warehousing servers called the Balanced Configuration Unit (BCU).
B) Ingres - IceBreaker - Ingres, a database management system vendor, offered "IceBreaker" as its appliance.
C) IBM - Greenplum - Greenplum is a data warehousing appliance designed for large datasets.
D) HP - Neo View - Hewlett-Packard (HP) offered "NeoView" as a data warehousing appliance (since discontinued).

27
Q

RAID provides fault tolerance against only the disk failures.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
RAID (Redundant Array of Independent Disks) provides fault tolerance against disk failures as well as other types of failures, depending on the RAID level used:

Disk failures: RAID protects data against disk failures by storing data redundantly across multiple disks. If a disk fails, data can be reconstructed from the remaining disks.

Other failures: Depending on the implementation, RAID is often combined with redundant controllers and power supplies, and certain levels (such as RAID 6) can tolerate multiple simultaneous disk failures.

Therefore, RAID-based storage is not limited to surviving a single disk failure; combined with redundant hardware it can enhance data availability against a broader range of potential failures within a storage system.

28
Q

Data Consolidation will have a major impact on

A) Sufficiency
B) Latency
C) Uniqueness
D) Consistency

A

D) Consistency

Explanation:
Consistency: Data consolidation involves bringing together data from different sources or systems into a single, unified repository. Ensuring consistency across this consolidated data—ensuring that data is accurate, up-to-date, and matches across all sources—is crucial for maintaining data integrity and reliability.
While data consolidation can indirectly affect sufficiency (ensuring enough data is gathered), latency (time delays in data processing), and uniqueness (ensuring data uniqueness and deduplication), consistency is directly impacted because the consolidation process aims to eliminate discrepancies and ensure uniformity across the integrated datasets.

29
Q

Which is the process of transferring data from online to offline storage

A) Backup
B) Recovery
C) Archiving
D) Purging

A

The process of transferring data from online to offline storage is:

C) Archiving

Explanation:
Archiving: Archiving involves moving data that is no longer actively used but needs to be retained for long-term storage, regulatory compliance, or historical purposes to offline or secondary storage systems. This process helps free up primary storage space while ensuring data remains accessible when needed.

30
Q

ETL and target database systems do not have access to update source data. When doing data cleansing at the source, if bad data is encountered, it has to be scrubbed or cleaned manually or by applying some rules.

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
When ETL (Extract, Transform, Load) processes and target database systems do not have direct access to update source data, any data cleansing or scrubbing required must be done at the source system. This typically involves manual intervention or applying automated rules to clean or correct the bad data before it is extracted and loaded into the target system. This ensures that only cleansed and accurate data is transferred and stored in the data warehouse or target database.

31
Q

Which of the following is not part of the Enterprise Data Management Framework?

A) Data Quality Management
B) Metadata Management
C) Corporate Performance Management
D) Master Data Management

A

C) Corporate Performance Management

Explanation:
Data Quality Management: This involves ensuring the accuracy, completeness, consistency, and reliability of data across the enterprise.

Metadata Management: This involves managing data about data, which helps in understanding, tracking, and using data effectively across the organization.

Master Data Management: This involves managing the critical data of an organization to ensure a single, consistent point of reference.

Corporate Performance Management: This is more about managing and monitoring an organization’s performance rather than managing data itself. It focuses on business metrics, performance indicators, and strategic management, which is outside the core scope of Enterprise Data Management.

32
Q

RAID stands for

A) Redundant Array of Inexpensive Disks
B) Redundant Array using Inexpensive Disks
C) Redundancy Array of Inexpensive Disks
D) Replicated Array using Inexpensive Disks

A

A) Redundant Array of Inexpensive Disks (the "I" is also commonly expanded as "Independent")

33
Q

Which of the following statements are false? (Choose all which are applicable)

A) The migration process can be reused
B) If a tool does not have source system access privileges, then PULL technology is used
C) The architecture should have the ability to handle document regeneration
D) Generally a Target Look Alike is used for metadata creation and capture

A

Statement D is false: a Target Look Alike is used mainly for testing, not for metadata creation and capture.
Statement B is true, and statements A and C depend on context, as explained below.

A) The migration process can be reused - This can still be TRUE or FALSE. Reusability depends on the data involved and the DQM practices implemented. Standardized data structures and transformations can promote reusability within the DQM framework.
B) If a tool does not have source system access privileges, then PULL technology is used - This remains TRUE. DQM tools rely on access to source data for quality assessment. If PULL isn’t possible due to permissions, PUSH would be the alternative.
C) The architecture should have the ability to handle document regeneration - For DQM, this becomes less relevant. The focus is on data quality, and document regeneration might be a secondary concern depending on the data types involved. However, the DQM architecture might need to handle metadata associated with documents.
D) Generally a Target Look Alike is used for metadata creation and capture - This is still FALSE. A TLA is primarily for testing data transformations and target system functionality, not core to DQM metadata creation. DQM tools typically have their own mechanisms for metadata capture from source systems.

34
Q

Select any two types of Data Consolidation

A) House grouping
B) Business grouping
C) House holding
D) Business holding

A

The two types of Data Consolidation are:

C) House holding
B) Business grouping

Explanation:
House holding: This involves consolidating data based on households, which is commonly used in contexts like customer data management to aggregate data at the household level rather than the individual level (see the sketch after this list).

Business grouping: This involves consolidating data based on business entities, which is used to aggregate data at the business unit or organizational level to provide a comprehensive view of business performance.

Not applicable:
House grouping: This is not a recognized term in data consolidation.
Business holding: This is not a recognized term in data consolidation.
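A rough sketch of householding, assuming hypothetical customer records with last_name and address fields: records sharing a normalized surname and address are grouped under one household key.

```python
from collections import defaultdict

def household_key(record):
    """Build a simple household key from surname and normalised address."""
    return (record["last_name"].strip().lower(), " ".join(record["address"].lower().split()))

def consolidate_households(records):
    households = defaultdict(list)
    for rec in records:
        households[household_key(rec)].append(rec["first_name"])
    return dict(households)

customers = [
    {"first_name": "Asha", "last_name": "Rao", "address": "12 MG Road,  Bangalore"},
    {"first_name": "Vikram", "last_name": "Rao", "address": "12 MG ROAD, Bangalore"},
    {"first_name": "Meera", "last_name": "Shah", "address": "7 Marine Drive, Mumbai"},
]
print(consolidate_households(customers))
# {('rao', '12 mg road, bangalore'): ['Asha', 'Vikram'], ('shah', '7 marine drive, mumbai'): ['Meera']}
```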

35
Q

Which process verifies that the source field threshold is not subject to truncation during the transformation or loading of data

A) Source to target counts
B) Source to target data reconciliation
C) Field to Field verification
D) Domain counts

A

C) Field to Field verification

Explanation:
Field to Field verification: This process involves comparing the specific fields in the source data with the corresponding fields in the target data to ensure that data has been accurately transformed and loaded without truncation or data loss. It checks the field lengths, data types, and values to verify that the transformation process has not caused any truncation issues.
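A minimal field-to-field sketch of the truncation check. The target column widths and field names are invented; in practice they would come from the target schema.

```python
TARGET_WIDTHS = {"city": 10, "customer_name": 30}   # assumed target column definitions

def truncation_check(source_rows):
    """Verify no source value is longer than its target column, i.e. nothing would be truncated."""
    violations = []
    for row in source_rows:
        for field, width in TARGET_WIDTHS.items():
            value = str(row.get(field, ""))
            if len(value) > width:
                violations.append((field, value, len(value), width))
    return violations

rows = [{"city": "Thiruvananthapuram", "customer_name": "A. Kumar"}]
print(truncation_check(rows))
# [('city', 'Thiruvananthapuram', 18, 10)] -> this value would be truncated in a 10-character column
```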

36
Q

Which among the two strategies involves greater downtime

A) Big Bang
B) Parallel
C) Both
D) None

A

A) Big Bang

Explanation:
Big Bang: This strategy involves migrating all data and switching from the old system to the new system in one go, usually over a single event or a short period. This approach typically requires greater downtime because the entire system is offline during the migration process until the new system is fully operational.

Parallel: In contrast, the parallel strategy runs both the old and new systems simultaneously for a period, allowing users to switch gradually and reducing the overall downtime. However, it requires more resources to maintain both systems during the transition period.

37
Q

Unmanaged data issues can be solved by

A) Data Profiling
B) Data Parsing
C) Data Scrubbing
D) Data Consolidation

A

D) Data Consolidation

Explanation:
Data Consolidation: This process involves combining data from different sources into a single, coherent dataset. It helps in resolving unmanaged data issues by integrating disparate data, ensuring consistency, removing duplicates, and improving overall data quality and accessibility.
While other processes like data profiling, parsing, and scrubbing also play roles in improving data quality and management, data consolidation specifically addresses the issue of unmanaged and fragmented data by bringing it together into a unified structure.

38
Q

BIDS DQM has the flexibility to adopt various technologies.

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
BIDS (Business Intelligence Development Studio) Data Quality Management (DQM) is designed to be flexible and adaptable, allowing it to integrate with and leverage various technologies. This flexibility enables organizations to implement and maintain data quality standards using the tools and technologies that best fit their existing infrastructure and business needs.

39
Q

Which among the following is not part of the steps in achieving Data Quality?

A) Data Consolidation
B) Data Cleansing
C) Data Redundancy
D) Data Matching

A

C) Data Redundancy

Explanation:
A) Data Consolidation: This process involves combining data from multiple sources into a single, coherent dataset, which helps in achieving consistency and completeness.

B) Data Cleansing: This involves identifying and correcting errors and inconsistencies in the data to ensure accuracy and reliability.

D) Data Matching: This process involves comparing and linking data from different sources to ensure consistency and eliminate duplicates.

C) Data Redundancy: Data redundancy refers to the unnecessary duplication of data, which can lead to inconsistencies and increased storage requirements. It is generally something that data quality processes aim to reduce or eliminate, not a step in achieving data quality.

40
Q

True or False: Backup is a primary copy of data

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
A backup is a secondary copy of data, created to protect against data loss. The primary copy of data is the original data actively used and stored in the primary storage system. Backups are used to restore data in case the primary copy is lost, corrupted, or otherwise inaccessible.

41
Q

Select the two obstacles for data parsing

A) Floating data
B) Misspellings
C) Outdated Information
D) Non-uniform structures

A

B) Misspellings
D) Non-uniform structures

Explanation:
B) Misspellings: Misspellings in data can make it difficult for parsing algorithms to correctly identify and extract intended information, leading to errors or incomplete parsing results.

D) Non-uniform structures: Data that does not follow a consistent or standardized format can pose challenges for parsing algorithms, as they may struggle to interpret and extract information consistently across different data instances.
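A short sketch of why non-uniform structures are an obstacle: the same naive parser that works for one layout silently produces wrong fields for another. The record layouts are invented for illustration.

```python
def naive_parse(address):
    """Assume the layout 'street, city - pin'; fragile when the structure varies."""
    street, rest = address.split(",", 1)
    city, _, pin = rest.partition("-")
    return {"street": street.strip(), "city": city.strip(), "pin": pin.strip()}

print(naive_parse("12 MG Road, Bangalore - 560001"))
# {'street': '12 MG Road', 'city': 'Bangalore', 'pin': '560001'}   <- expected layout parses fine

print(naive_parse("Bangalore-560001, 12 MG Road"))
# {'street': 'Bangalore-560001', 'city': '12 MG Road', 'pin': ''}  <- non-uniform structure, wrong fields
```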

42
Q

Which is gaining an understanding of data with respect to a Quality Specification?

A) Data Cleansing
B) Data Profiling
C) Data Quality Enhancement
D) Data Parsing

A

B) Data Profiling

Explanation:
Data Profiling: This process involves analyzing data to gain insights into its structure, content, relationships, and quality characteristics. It helps in understanding the overall quality of data against specified quality standards or specifications. Data profiling provides essential information for identifying data issues, assessing data completeness, consistency, accuracy, and other quality metrics before proceeding with data cleansing or enhancement activities.

43
Q

A materialized view is generally built using

A) 5th Normal form
B) 3rd Normal form
C) Denormalized form
D) It does not matter. It is just a view and is not required to store data

A

C) Denormalized form

Explanation:
Denormalized form: A materialized view is a database object that contains the results of a query and is stored as a table. It is typically denormalized, meaning it may contain redundant data or data from multiple normalized tables to improve query performance by reducing joins and aggregations at runtime.
Materialized views are used to precompute and store data to accelerate query performance, making them especially useful in data warehousing and reporting scenarios where query response time is critical.

44
Q

Which of the following are done in data modeling?

1) Defining entities and attributes
2) Defining relationships
3) Defining indexes and partitions
4) Defining tables and columns

A) 1, 2 and 4 are correct
B) 2,3 and 4 are correct
C) 1 and 2 are correct
D) 3 and 4 are correct

A

A) 1, 2 and 4 are correct

In data modeling, the tasks typically performed are defining entities and attributes, defining relationships, and defining tables and columns.

Explanation:
1) Defining entities and attributes: Data modeling involves identifying and defining the entities (objects or things) within the system being modeled and their attributes (properties or characteristics).

2) Defining relationships: Data modeling also includes defining the relationships between entities, specifying how entities are related or connected to each other.

4) Defining tables and columns: In relational data modeling, entities are typically mapped to tables, and attributes are mapped to columns within those tables.

Indexes and partitions (mentioned in option 3) are more related to database administration tasks rather than data modeling itself.

45
Q

True or False: Internal/external hard disks cannot be used for data storage

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Both internal and external hard disks are commonly used for data storage in computing environments. They provide storage capabilities for various types of data ranging from personal files to system backups and large-scale databases. Therefore, the statement that internal/external hard disks cannot be used for data storage is false.

46
Q

Centralized metadata Architecture involves increased complexity in Configuration Management

A) TRUE
B) FALSE

A

A) TRUE

Explanation:
Centralized metadata architecture typically involves managing all metadata in a centralized repository or system. This approach can lead to increased complexity in configuration management because:

Centralization: Managing metadata from various systems and applications in a single repository can be complex due to the need for synchronization, consistency, and governance across different data sources.

Configuration Management: Ensuring that configurations are properly managed and updated across all systems and applications accessing the centralized metadata repository can be challenging, requiring robust processes and tools to maintain accuracy and integrity.

Therefore, centralized metadata architecture often introduces complexities in configuration management to ensure that all systems and applications relying on centralized metadata operate correctly and efficiently.

47
Q

Identification and evaluation of tools is carried out in which phase of migration

A) Framework definition
B) Assess
C) Design
D) Migrate

A

Identification and evaluation of tools typically occur in the Assess phase of migration.

Explanation:
Assess phase: This phase involves evaluating the current state of the environment, identifying business requirements, assessing risks, and evaluating potential tools and technologies that will be used in the migration process. This includes identifying tools for data profiling, ETL (Extract, Transform, Load), data quality management, and other aspects necessary for successful migration.

Framework definition: Involves defining the overall approach, strategy, and methodology for the migration project.

Design phase: Involves designing the target architecture, data mappings, transformation rules, and other technical specifications required for the migration.

Migrate phase: Involves executing the migration plan, moving data, implementing transformations, and validating the migrated data.

Therefore, the correct answer is B) Assess phase.

48
Q

The process of removing misspellings, transpositions and variations is called

A) Data Consolidation
B) Data Profiling
C) Data Tracking
D) Data Correction

A

D) Data Correction

Explanation:
Data Correction: This process involves identifying and rectifying errors in data, such as correcting misspellings, fixing data transpositions (switched characters or words), and standardizing variations to ensure consistency and accuracy in the data (a small example follows this list).

Data Consolidation: Involves combining data from multiple sources into a unified dataset.

Data Profiling: Involves analyzing data to gain insights into its structure, content, and quality characteristics.

Data Tracking: Involves monitoring and tracing the movement and changes of data over time.
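A small sketch of rule-driven correction, using an invented alias table that maps common misspellings and variations to a standard value:

```python
CITY_ALIASES = {          # invented lookup table of known misspellings/variations
    "banglore": "Bangalore",
    "bangalore": "Bangalore",
    "bengaluru": "Bangalore",
    "bombay": "Mumbai",
    "mumbai": "Mumbai",
}

def correct_city(value):
    """Correct misspellings and variations by looking up a standardised value."""
    return CITY_ALIASES.get(value.strip().lower(), value.strip())

print([correct_city(v) for v in ["Banglore ", "Bengaluru", "Bombay", "Pune"]])
# ['Bangalore', 'Bangalore', 'Mumbai', 'Pune']   <- unknown values are passed through unchanged
```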

49
Q

Which among the following is not a part of Cleansing?

A) Correction
B) Parsing
C) Consolidation
D) Standardization

A

C) Consolidation

Explanation:
In the context of data cleansing:

Correction: Involves identifying and rectifying errors in data.
Parsing: Involves breaking down data into its component parts to facilitate processing.
Standardization: Involves converting data into a standard format or structure.
Consolidation: Involves combining data from multiple sources into a unified dataset, which is typically not considered a part of data cleansing but rather a separate step in data integration or management.

50
Q

Which among the following is an iterative process

A) One shot Migration
B) Two shot Migration
C) Phased Migration
D) Parallel Migration

A

C) Phased Migration

Explanation:
Phased Migration: Involves migrating systems or data in phases or stages, where each phase is completed sequentially. It is an iterative process where each phase builds upon the previous one, allowing for adjustments and improvements based on feedback and outcomes from earlier phases.

One shot Migration: Involves migrating all systems or data in a single event or over a short period, typically without breaking it down into stages or phases.

Two shot Migration: This term is not commonly used in migration strategies. It might refer to a specific scenario or approach, but it is not standard terminology in migration planning.

Parallel Migration: Involves running both old and new systems simultaneously for a period, then switching completely to the new system once it is fully operational.

51
Q

RAID provides fault tolerance against all the hardware failures.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
RAID (Redundant Array of Independent Disks) provides fault tolerance primarily against disk failures, but not against all hardware failures. While RAID configurations can protect against single or multiple disk failures (depending on the RAID level), they do not provide protection against other types of hardware failures such as power supply failures, controller failures, or motherboard failures. RAID is focused on maintaining data availability and integrity in the event of disk failures specifically.

52
Q

Data Profiling is performed as part of

A) Data Quality Enhancement
B) Source system analysis
C) User Acceptance Testing
D) Requirements gathering

A

B) Source system analysis

Explanation:
Source system analysis: Data profiling involves examining the data in the source systems to understand its structure, quality, and content. This helps in identifying data issues, understanding data distributions, and preparing for data cleansing and transformation processes. It is a critical step in analyzing the source data before it is moved or used in any data integration or migration processes.

Data Quality Enhancement: While data profiling informs data quality enhancement, it is primarily part of analyzing the current state of data rather than the actual enhancement process.

User Acceptance Testing: This is more about validating that the system meets user requirements and specifications, not about profiling the data.

Requirements gathering: This involves identifying and documenting the data needs and requirements of the stakeholders, rather than profiling the data itself.

53
Q

Which of the following is used to design the internal schema of the database

A) Database Scripts
B) Conceptual Data Model
C) Physical Data Model
D) Logical Data Model

A

C) Physical Data Model

Explanation:
Database Scripts: These are used to create and manage database objects but do not represent the design of the internal schema.

Conceptual Data Model: This is a high-level model that defines the overall structure and organization of the data without considering how the data will be physically implemented.

Physical Data Model: This model focuses on the actual implementation of the database on the storage medium. It includes details such as tables, columns, indexes, partitions, and the relationships between these elements. It represents how data will be stored, accessed, and managed in the database system, thus designing the internal schema of the database.

Logical Data Model: This model defines the structure of the data elements and the relationships between them, independent of the physical considerations. It is more detailed than the conceptual model but still does not include physical storage details.

54
Q

True or False: Purging refers to hashing using purging algorithms

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Purging refers to the process of permanently deleting or removing data that is no longer needed from a database or storage system to free up space and improve performance. It does not involve hashing or the use of purging algorithms for hashing purposes.

Hashing is a separate process that involves converting data into a fixed-size string of characters, typically for purposes like data indexing or security. Purging, on the other hand, is specifically about data deletion and cleanup.

55
Q

While performing Data Profiling, the following activity/activities will be performed

A) Data Striping
B) Data Transformation
C) Data Cleansing
D) All of the options

A

D) All of the options

Data profiling involves a comprehensive analysis of the data to understand its structure, content, and quality. This process encompasses several activities, including:

Data Cleansing: Identifying and correcting errors, inconsistencies, and missing values in the data.
Data Transformation: Transforming the data into a format suitable for analysis, which may involve formatting changes, derivations, and aggregations.
Data Striping: This term is less common in data profiling, but it could refer to extracting a subset of data for profiling purposes, especially when dealing with very large datasets.
Therefore, all of the options (A, B, and C) can be part of data profiling activities.

56
Q

Among the following, the vendor(s) that do(es) not provide an appliance server solution is/are

A) Microsoft
B) Netezza
C) Ingres
D) HP

A

C) Ingres

Here’s why:

Microsoft: While not traditionally known for appliance servers, Microsoft Azure offers cloud-based appliance services.
Netezza: Netezza is a company well-known for its data warehouse appliance solutions.
HP: As mentioned earlier, HP offered NeoView, a data warehouse appliance (since discontinued).
Ingres: While Ingres offers database software, they are not commonly associated with pre-configured appliance server solutions.

57
Q

Choose two dimensions of Data Quality

A) Uniqueness
B) Structural dimension
C) Accuracy
D) Reusability

A

A) Uniqueness
C) Accuracy

Explanation:
Uniqueness: Refers to the extent to which data entries are distinct and not duplicated. High-quality data should have minimal or no duplicates to ensure integrity and reliability (a simple measurement sketch follows this list).

Accuracy: Refers to how correctly the data represents the real-world values it is intended to model. Accurate data is essential for making informed decisions based on reliable information.

Structural dimension: This term does not typically refer to a dimension of data quality. It might relate to the structural integrity of data, but it is not a standard dimension of data quality.

Reusability: While important, reusability is more related to the utility and management of data rather than a direct measure of its quality.
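Uniqueness in particular lends itself to a simple measurement. A sketch, with an invented key field:

```python
def uniqueness_ratio(rows, key="email"):
    """Share of records whose key value is distinct; 1.0 means no duplicates."""
    values = [r[key].strip().lower() for r in rows]
    return len(set(values)) / len(values) if values else 1.0

rows = [{"email": "a@x.com"}, {"email": "A@x.com "}, {"email": "b@x.com"}]
print(uniqueness_ratio(rows))   # 0.666... -> one duplicate pair after normalisation
```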

58
Q

Which Metadata Management Architecture ensures minimal effort during integration?

A) Federated Metadata Management
B) Distributed Metadata Management
C) Centralized Metadata Management

A

A) Federated Metadata Management

Explanation:
Federated Metadata Management: This architecture allows different systems to maintain their own metadata repositories while providing a unified view and access to metadata across these systems. It ensures minimal effort during integration because it leverages existing systems and integrates them in a way that preserves their autonomy while still enabling centralized access to metadata.

Distributed Metadata Management: This approach involves multiple metadata repositories that may operate independently. While it allows for local control, it can make integration more complex due to the lack of a centralized management framework.

Centralized Metadata Management: This architecture involves consolidating metadata into a single, central repository. While it can simplify management and provide a single source of truth, it may require significant effort to integrate all metadata into one centralized system, especially in diverse environments.

59
Q

Base module within BIDS DQM framework is referred as

A) Metadata Management Framework
B) Data Quality Framework
C) DQ Framework definition
D) None of the options

A

B) Data Quality Framework

Explanation:
In the context of the BIDS (Business Intelligence Development Studio) DQM (Data Quality Management) framework, the base module typically refers to the foundational set of tools and methodologies used to manage and improve data quality. The Data Quality Framework encompasses the essential components and practices for ensuring data quality within the system.

The other options, while relevant to data quality and metadata management, are not typically referred to as the base module within the BIDS DQM framework specifically.

60
Q

Which option is not best suited for Impact Analysis?

A) Federated Metadata Management
B) Distributed Metadata Management
C) Centralized Metadata Management

A

B) Distributed Metadata Management

Explanation:
Federated Metadata Management: This approach allows different systems to maintain their own metadata repositories while providing a unified view across these repositories. It supports effective impact analysis by enabling access to metadata from multiple systems in a coherent manner.

Distributed Metadata Management: This approach involves multiple metadata repositories that may operate independently. It is not best suited for impact analysis because the lack of a centralized or unified view makes it challenging to trace and analyze the impact of changes across different systems.

Centralized Metadata Management: This approach consolidates all metadata into a single repository, providing a comprehensive and unified view, which is highly effective for impact analysis as it allows easy tracing and understanding of dependencies and relationships.

61
Q

Which of these is not a benefit of implementing Data Quality Management?

A) Reduction in number of IT systems to manage
B) Fraud Detection
C) Exposes inconsistent business processes
D) Ensures clean feeds to data mining tools

A

A) Reduction in number of IT systems to manage

Explanation:
Reduction in number of IT systems to manage: Implementing Data Quality Management (DQM) does not inherently reduce the number of IT systems to manage. DQM focuses on improving the quality of data across existing systems, rather than reducing the number of systems.

Fraud Detection: DQM can help identify anomalies and inconsistencies in data that may indicate fraudulent activities.

Exposes inconsistent business processes: By ensuring data quality, DQM can highlight areas where business processes are not consistently followed, as poor data quality often results from inconsistent processes.

Ensures clean feeds to data mining tools: DQM ensures that the data fed into data mining and analytics tools is accurate, consistent, and reliable, which is essential for generating meaningful insights.

62
Q

Inspection of Data for errors, inconsistencies and redundancies is called

A) Data Quality Enhancement
B) Data Quality Tracking
C) Data Profiling
D) Data Integration

A

C) Data Profiling

Explanation:
Data Profiling: This process involves inspecting and analyzing data to identify errors, inconsistencies, and redundancies. It helps in understanding the structure, content, and quality of data, and is a crucial step in data quality management and preparation.

Data Quality Enhancement: This refers to activities aimed at improving data quality, which can include data cleansing, transformation, and enrichment, but not the initial inspection of data.

Data Quality Tracking: This involves monitoring data quality over time to ensure it remains high, rather than inspecting data for errors.

Data Integration: This involves combining data from different sources into a unified view, but it is not specifically about inspecting data for errors or inconsistencies.

63
Q

Which Metadata contains Scheduling and Reconciliation information?

A) Process Metadata
B) Control Metadata
C) Both A and B

A

B) Control Metadata

Explanation:
Control Metadata: This type of metadata includes information related to the scheduling and management of processes, including job scheduling, workflow orchestration, and reconciliation of data. It helps manage the execution and monitoring of data processing tasks.

Process Metadata: This typically includes information about the data processing activities themselves, such as the definitions of ETL (Extract, Transform, Load) processes, data flows, and transformations. It does not specifically focus on scheduling and reconciliation.

64
Q

Data Quality Tracking involves matching, merging and linking data

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Data Quality Tracking primarily involves monitoring and assessing the quality of data over time to ensure it meets the required standards and remains accurate, complete, and consistent. It focuses on ongoing data quality management and may involve tracking metrics, issues, and improvements.

Matching, merging, and linking data are activities related to data integration and data quality improvement, rather than tracking. These activities are part of the processes used to improve data quality by resolving duplicates, consolidating records, and establishing relationships between data sets.

65
Q

Among following, which RAID option provides least protection against hard disk failure?

A) RAID 0
B) RAID 0+1
C) RAID 1
D) RAID 2
E) RAID 5

A

A) RAID 0

Explanation:
RAID 0: Provides no protection against hard disk failure. It stripes data across multiple disks for improved performance but does not offer redundancy. If any disk fails, all data in the RAID 0 array is lost.

RAID 0+1: Also known as RAID 01, this configuration mirrors the data from RAID 0 arrays. It provides some protection against disk failure by using mirroring, but it is more vulnerable to failures compared to other RAID levels that provide more robust redundancy.

RAID 1: Mirrors data across multiple disks, providing protection against a single disk failure. If one disk fails, the data is still available on the other disk.

RAID 2: Uses bit-level striping with dedicated Hamming code parity, which provides fault tolerance but is less commonly used due to its complexity and inefficiency compared to other RAID levels.

RAID 5: Provides block-level striping with distributed parity, offering protection against a single disk failure by storing parity information distributed across the disks.
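The RAID 5 idea of distributed parity can be illustrated with XOR: the parity block is the XOR of the data blocks, and any single lost block can be rebuilt from the remaining blocks plus parity. A toy sketch, with byte strings standing in for disk blocks:

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together (this is how RAID 5 parity is computed)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

disk1, disk2, disk3 = b"DATA", b"MORE", b"INFO"
parity = xor_blocks(disk1, disk2, disk3)   # stored on a fourth disk (distributed across disks in RAID 5)

# Simulate losing disk2: it can be rebuilt from the surviving data blocks plus parity.
rebuilt = xor_blocks(disk1, disk3, parity)
print(rebuilt == disk2)   # True -> a single-disk failure is tolerated
```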

66
Q

Data governance council does not provide

A) Data Quality Support
B) Support from Subject Matter Expert
C) Application Development Support
D) All of the options

A

C) Application Development Support

Explanation:
Data Quality Support: The data governance council often focuses on ensuring data quality and establishing standards and policies related to data quality.

Support from Subject Matter Expert: Data governance councils usually involve subject matter experts to provide insights and expertise on data-related issues and decisions.

Application Development Support: While the data governance council provides oversight and strategic direction for data management practices, it typically does not provide direct support for application development. Application development is usually handled by development teams and IT departments, not by the governance council.

67
Q

Which is the best approach for Data Migration

A) Custom code
B) Tool based
C) Combination of custom based and Tool based
D) Based on the migration requirements

A

D) Based on the migration requirements

Explanation:
Custom Code: This approach involves developing bespoke scripts or programs tailored to the specific needs of the migration. While it can be flexible, it might be resource-intensive and complex.

Tool-Based: Using migration tools can streamline the process with pre-built functionalities and best practices. It is often quicker and easier to implement but might have limitations based on tool capabilities.

Combination of Custom-Based and Tool-Based: This approach combines the strengths of both custom code and tools, potentially offering a more flexible and robust solution. However, it might involve more complexity in integration and maintenance.

Based on the Migration Requirements: This is the best approach because it allows you to choose the most appropriate method based on the specific needs, complexity, volume, and constraints of the migration project. It ensures that the chosen method aligns with the project’s requirements and objectives.

68
Q

Match the vendor name with the associated appliance server name (Vendor - name)

A) HP - Balanced Configuration Unit or Balanced Warehouse
B) Sun - IceBreaker
C) Sun - Greenplum
D) HP - Neo View

A

The correct pairing is:

D) HP - Neo View

Here's the breakdown of the other names:

IBM - "Balanced Configuration Unit" or "Balanced Warehouse" (not HP)
Ingres - IceBreaker (not Sun)
EMC - Greenplum (not Sun)

69
Q

What type of checks examine whether the data is complete at the micro level?

A) Business Rules
B) Structural Integrity
C) Transformation
D) Data Flow

A

B) Structural Integrity

Explanation:
Structural Integrity: Checks that focus on structural integrity examine whether the data adheres to predefined rules about its structure, including completeness at a micro level. These checks ensure that data is correctly formatted and adheres to the schema rules, which includes verifying that all necessary fields are populated.

Business Rules: These checks evaluate whether data meets specific business requirements and rules, often focusing on the correctness and relevance of data in the business context rather than its completeness at a micro level.

Transformation: These checks are concerned with whether data transformations are applied correctly, ensuring that data is accurately converted from one format or structure to another.

Data Flow: These checks examine the movement and transformation of data across systems, focusing on the paths and processes data undergoes rather than its completeness at a micro level.
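For illustration, a minimal micro-level completeness check in Python; the mandatory fields and expected types are assumptions made up for the example.

```python
# Structural-integrity sketch at the record (micro) level: verify that every
# mandatory field is populated and has the expected type.
MANDATORY = {"customer_id": int, "name": str, "country_code": str}

def check_record(record):
    issues = []
    for field, expected_type in MANDATORY.items():
        value = record.get(field)
        if value in (None, ""):
            issues.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            issues.append(f"{field} has wrong type {type(value).__name__}")
    return issues

print(check_record({"customer_id": 42, "name": "Asha", "country_code": ""}))
# -> ['missing country_code']
```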

70
Q

Which is not a challenge for data migration

A) Technical compatibility issues
B) Orphan data
C) Mapped Data
D) Source System Stabilization

A

C) Mapped Data

Explanation:

Technical Compatibility Issues: Challenges in data migration often involve ensuring that the source and target systems are compatible in terms of data formats, structures, and technologies.

Orphan Data: This refers to data that is left behind or not properly linked during the migration process, which can be a significant challenge.

Mapped Data: Mapped data refers to the data that has been successfully aligned or translated from the source to the target system according to the defined mapping rules. This is generally not a challenge but rather an outcome of the migration process.

Source System Stabilization: Ensuring that the source system remains stable and operational throughout the migration process can be a challenge, as disruptions can impact the migration.

71
Q

Match the vendor name with the associated appliance server name

A) IBM - Balanced Configuration Unit or Balanced Warehouse
B) Ingres - Ice Breaker
C) IBM - Greenplum
D) HP - Neo View

A

A) IBM - Balanced Configuration Unit or Balanced Warehouse - IBM offered a family of data warehousing configurations called the Balanced Configuration Unit (BCU), later marketed as Balanced Warehouse.
B) Ingres - Ice Breaker - Ingres offered "IceBreaker" as its appliance offering.
D) HP - Neo View - Hewlett-Packard (HP) offered "NeoView" as an enterprise data warehousing appliance (since discontinued).

C) IBM - Greenplum is not a valid pairing - Greenplum is associated with EMC, not IBM (see the earlier card on vendor/appliance pairings).

72
Q

Which among the following are spelled the same but have different meanings?

A) Synonyms
B) Homonyms
C) Linkages

A

B) Homonyms

Explanation:
Synonyms: Words that have the same or similar meanings.
Homonyms: Words that are spelled the same and sound the same but have different meanings.
Linkages: This term does not relate to words with the same spelling but different meanings; it generally refers to connections or relationships between things.

73
Q

Metadata is useful only to the technical staff creating a data warehouse.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
Metadata is useful to various stakeholders beyond just the technical staff creating a data warehouse:

Business Users: They can use metadata to understand the data, its source, and its meaning, which helps in making informed decisions.
Data Stewards: Metadata helps in managing data governance, ensuring data quality, and maintaining compliance.
Analysts: It assists in understanding the context, lineage, and transformations of data, which is crucial for accurate analysis.
Developers: Metadata is essential for developing, maintaining, and optimizing data processes and integrations.

74
Q

Which among the following is a component of Front Room Metadata?

A) Data Model
B) Data Structures
C) User documents
D) Security profiles

A

C) User documents

Explanation:
Front Room Metadata refers to metadata that is more oriented towards business users and less technical in nature. It helps end-users understand the data from a business perspective.

Data Model: Typically considered as part of back-room metadata, which is used by technical staff for database design and development.
Data Structures: Also part of back-room metadata, focusing on the technical aspects of how data is stored and organized.
User documents: These are designed to help business users understand and utilize the data effectively. They often include business glossaries, data dictionaries, and user guides.
Security profiles: These are more technical and related to access control and data security, which is part of the back-room metadata.

75
Q

Data architecture will include all of the following except

A) Data modeling
B) Data Storage
C) Data Security
D) All of these are included

A

D) All of these are included

Explanation:
Data architecture encompasses various components essential for the management, storage, and use of data within an organization. This includes:

Data Modeling: Creating visual representations of data entities, their relationships, and rules.
Data Storage: Defining how and where data is stored, including databases and data warehouses.
Data Security: Ensuring that data is protected against unauthorized access and breaches.

76
Q

The following role(s) is/are relevant in order to achieve data quality:

I) Onsite data quality developer
II) Source system expert
III) Data steward
IV) Offshore data quality developer

A) I, II and IV are correct
B) II, III and IV are correct
C) I and III are correct
D) All are correct

A

D) All are correct

Explanation:
Achieving data quality involves collaboration among various roles, each contributing unique expertise and responsibilities:

Onsite Data Quality Developer (I): Focuses on implementing data quality measures and solutions at the source location.
Source System Expert (II): Provides in-depth knowledge of the source systems, which is crucial for identifying and resolving data quality issues.
Data Steward (III): Oversees data governance and ensures that data quality standards and policies are enforced.
Offshore Data Quality Developer (IV): Works remotely to develop and implement data quality solutions, often in collaboration with the onsite team.

77
Q

In which architecture is there no need to maintain bi-directional connections between the various tools?

A) Federated Metadata Management
B) Distributed Metadata Management
C) Centralized Metadata Management

A

C) Centralized Metadata Management

Explanation:
Centralized Metadata Management: All metadata is stored and managed in a single, central repository. There is no need to maintain bi-directional connections between the various tools because every tool accesses and interacts with that single repository.
Federated Metadata Management: This involves multiple repositories that communicate with each other, often requiring bi-directional connections to ensure consistency and integration.
Distributed Metadata Management: Metadata is distributed across various locations and often requires bi-directional connections to synchronize and manage metadata across different systems.
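A conceptual Python sketch of the hub-and-spoke idea behind centralized metadata management; the repository class, tool names, and keys are purely illustrative.

```python
# With a central repository, each tool keeps one connection to the hub
# (N connections) instead of bi-directional links to every other tool
# (up to N*(N-1) connections).
class CentralRepository:
    def __init__(self):
        self.metadata = {}

    def publish(self, tool, key, value):
        self.metadata[key] = {"owner": tool, "value": value}

    def lookup(self, key):
        return self.metadata.get(key)

repo = CentralRepository()
repo.publish("etl_tool", "dim_customer.load_schedule", "daily 02:00")
print(repo.lookup("dim_customer.load_schedule"))   # a BI tool reads the same hub
```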

78
Q

Data Security covers not only the production environment but also back-up data and archived data

A) TRUE
B) FALSE

A

A) TRUE
Explanation:
Data Security encompasses protecting data across all environments and storage mediums, including:

Production Environment: The active, operational systems where data is processed and used.
Back-up Data: Copies of data created to ensure data can be restored in case of data loss or corruption.
Archived Data: Data that is no longer actively used but stored for long-term retention and future reference.

79
Q

Data Matching identifies similar data only within a particular source

A) TRUE
B) FALSE

A

B) FALSE

Explanation:

Data Matching is a process that identifies similar or duplicate data across different sources or within a particular source. It involves comparing data from multiple sources to identify records that refer to the same entity, despite variations in the data.
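A minimal cross-source matching sketch in Python, using difflib only as a stand-in for a real matching engine; the record layouts, field weights, and threshold are assumptions for the example.

```python
# Match records across two sources that likely describe the same customer,
# despite spelling and casing differences.
from difflib import SequenceMatcher

crm     = [{"id": "C1", "name": "Rohit Sharma",  "city": "Mumbai"}]
billing = [{"id": "B7", "name": "Rohit Sharmaa", "city": "MUMBAI"},
           {"id": "B9", "name": "Anita Rao",     "city": "Pune"}]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for c in crm:
    for b in billing:
        score = 0.7 * similarity(c["name"], b["name"]) + 0.3 * similarity(c["city"], b["city"])
        if score > 0.85:                      # illustrative threshold
            print(f"{c['id']} matches {b['id']} (score={score:.2f})")
```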

80
Q

Data Quality Assessment deals with

A) AS-IS state
B) TO-BE state

A

A) AS-IS state

Explanation:

Data Quality Assessment focuses on evaluating the current state (AS-IS state) of data within an organization. This involves:

Assessing the accuracy, completeness, consistency, and reliability of the data as it currently exists.
Identifying data quality issues and areas for improvement.
Establishing a baseline for measuring future improvements.
TO-BE state typically refers to the desired or future state of data quality after improvements have been made, which is the target state to achieve after addressing the issues identified in the AS-IS state assessment.

81
Q

True or False: Data Storage refers to storing structured and semi-structured data only.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:

Data Storage refers to storing all types of data, including:

Structured Data: Data that is organized in a fixed schema, such as databases.
Semi-Structured Data: Data that does not have a fixed schema but has some organizational properties, such as XML or JSON files.
Unstructured Data: Data that lacks a predefined structure, such as text files, images, videos, and social media content.

82
Q

Compressed data, being just the way in which data is stored, does not require any specific modeling technique.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:

Compressed Data does require specific techniques for both compression and decompression, and these techniques can impact how the data is modeled and managed.

Compression Techniques: Various algorithms and methods are used to reduce the size of data, such as lossless or lossy compression.
Modeling Techniques: The choice of compression technique can affect how data is organized, accessed, and queried. For instance, some compression methods might optimize for speed, while others might focus on reducing size.
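A small Python illustration that the stored (compressed) form differs from the logical form and must be decompressed before it can be used; zlib is used here only as a convenient lossless example.

```python
# Lossless compression round-trip: the compressed bytes are much smaller but
# cannot be read directly; decompression restores the original exactly.
import zlib

original = b"JHUMRI TELAIYA," * 1000          # highly repetitive sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes; lossless:", restored == original)
```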

83
Q

It is possible to minimize data quality problems in systems by having better hardware and a superior RDBMS; one only needs to purchase the Data Quality option available within the database.

A) TRUE
B) FALSE

A

B) FALSE

Explanation:
While better hardware and a superior RDBMS (Relational Database Management System) can improve overall system performance and efficiency, they do not inherently resolve data quality problems. Data quality issues are typically related to data integrity, accuracy, completeness, consistency, and timeliness, which require specific data quality management practices.

To effectively address data quality issues, organizations often need to:

Implement Data Quality Tools: Specialized tools or software solutions designed for data cleansing, profiling, validation, and enhancement.
Establish Data Governance: Define policies, procedures, and standards for managing data quality.
Perform Regular Data Quality Assessments: Continuously monitor and improve data quality through assessment and correction processes.

84
Q

Which of the following data quality violations in a name CANNOT be cleansed by using rules?

A) A space in front of a name
B) Use of all capitals
C) Name misspelling
D) Inconsistent use of middle initial

A

C) Name misspelling

Explanation:
A space in front of a name: This can be cleansed by applying rules to trim leading or trailing spaces.
Use of all capitals: This can be addressed by applying rules to convert text to proper case or title case.
Name misspelling: This is typically not easily addressed by simple rules, as it requires more advanced techniques like fuzzy matching or manual correction.
Inconsistent use of middle initial: This can be standardized using rules to ensure consistent formatting of names and initials.
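A hedged Python sketch of rule-based cleansing for the three fixable violations, showing why a misspelling slips through; the regular expression and sample names are illustrative assumptions.

```python
# Deterministic cleansing rules: trim spaces, normalise casing, and format the
# middle initial consistently. A misspelled name survives, because no rule can
# know the intended spelling without a reference list or fuzzy matching.
import re

def cleanse_name(raw):
    name = raw.strip()                                  # rule 1: trim spaces
    name = name.title()                                 # rule 2: normalise capitals
    name = re.sub(r"\b([A-Za-z])\.?\s", r"\1. ", name)  # rule 3: consistent middle initial
    return name

print(cleanse_name("  JOHN Q PUBLIC"))   # -> 'John Q. Public'
print(cleanse_name("jhon q. public"))    # -> 'Jhon Q. Public' (misspelling remains)
```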

85
Q

The City column of a table contains the following values: JHUMARI TELAIYA, JHUMARI TILAIYA, JHUMARI TILLAIA, JHUMARITALAIYA, JHUMARITELAIYA, JHUMARITELAY. In order to clean this data, the following will need to be done:

A) Data Merging
B) Data Splitting
C) Data Parsing
D) Data Standardisation

A

D) Data Standardisation

Explanation:

Data Standardisation involves aligning data to a consistent format or value to ensure uniformity. In this case, various representations of the city name need to be standardized to a single, correct format.

Data Merging: Combines data from different sources or records but does not directly address inconsistencies in data values.
Data Splitting: Divides data into multiple parts or components, which is not applicable to the issue of inconsistent naming.
Data Parsing: Breaks data into manageable pieces or components but does not address the consistency of the data itself.
Data Standardisation: Adjusts variations in data values to a common format, which is the appropriate approach to handle inconsistent representations of the same city name.
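An illustrative standardisation sketch in Python, using difflib as a stand-in for a proper standardisation tool or reference data set; the canonical city list and cutoff are assumptions.

```python
# Map spelling variants of the same city onto a single canonical value using
# fuzzy similarity against a reference list.
from difflib import get_close_matches

CANONICAL_CITIES = ["JHUMRI TELAIYA", "RANCHI", "PATNA"]

def standardise_city(value, cutoff=0.6):
    match = get_close_matches(value.upper(), CANONICAL_CITIES, n=1, cutoff=cutoff)
    return match[0] if match else value   # leave unmatched values for manual review

for v in ["JHUMARI TELAIYA", "JHUMARI TILLAIA", "JHUMARITELAY"]:
    print(v, "->", standardise_city(v))
```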

86
Q

Logical Data Issues deals with

A) Business Rules
B) Data Profile
C) Data Parsing
D) Data Storage

A

A) Business Rules

Explanation:

Logical Data Issues are concerned with how data is structured, related, and validated according to business rules and logic. They involve:

Business Rules: Ensuring that data adheres to predefined rules and constraints, such as relationships between entities, valid values, and consistency within the data model.
Data Profile: This typically deals with understanding the characteristics of data rather than resolving logical issues.
Data Parsing: Involves breaking data into components, which does not directly address logical issues.
Data Storage: Concerns the physical storage of data rather than its logical structure or rules.
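An illustrative business-rule check in Python; the ship-date rule and the sample orders are assumptions made up for the example.

```python
# Logical data issue expressed as a business rule: a shipment must not be
# dated before the order it belongs to.
from datetime import date

def violates_ship_rule(order):
    return order["ship_date"] < order["order_date"]

orders = [
    {"id": 1, "order_date": date(2024, 3, 1), "ship_date": date(2024, 3, 3)},
    {"id": 2, "order_date": date(2024, 3, 5), "ship_date": date(2024, 3, 2)},
]
print([o["id"] for o in orders if violates_ship_rule(o)])   # -> [2]
```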

87
Q

Among the following, which one provides optimal performance?

A) RAID 0
B) RAID 0+1
C) RAID 1
D) RAID 2
E) RAID 5

A

A) RAID 0

Explanation:

RAID 0 provides optimal performance among the options listed because it uses striping to split data across multiple disks, which enhances read and write speeds. However, it does not provide fault tolerance, as it offers no redundancy or data protection.

Here’s a brief overview of the other RAID levels:

RAID 0+1: Combines RAID 0 and RAID 1, providing both striping and mirroring. It offers good performance and fault tolerance but requires double the storage capacity and has higher costs.

RAID 1: Mirrors data across disks for redundancy, offering good fault tolerance but with no performance enhancement over a single disk and reduced storage efficiency.

RAID 2: Uses bit-level striping with dedicated Hamming code for error correction. It’s rarely used in practice due to complexity and lack of practical benefits over other RAID levels.

RAID 5: Uses block-level striping with distributed parity for fault tolerance. It offers good performance and redundancy but has slightly lower performance compared to RAID 0 due to the parity calculations.

88
Q

Which architecture utilizes hardware resources optimally?

A) Federated Metadata Management
B) Distributed Metadata Management
C) Centralized Metadata Management

A

B) Distributed Metadata Management

Explanation:

Distributed Metadata Management architecture optimizes hardware resources by distributing metadata across multiple repositories or systems. This approach can:

Distribute Workloads: Balance the processing and storage load across multiple servers or locations, making efficient use of hardware resources.
Scalability: Scale out by adding more resources as needed without overloading a single system.
In contrast:

Centralized Metadata Management involves a single, central repository, which can become a bottleneck and may require significant hardware resources to handle all metadata operations.
Federated Metadata Management involves multiple metadata repositories with interconnections, but does not always optimize hardware usage as effectively as a distributed approach, as it still requires maintaining connections and synchronization between repositories.