Enterprise Data Management Flashcards
Within data migration phases, Extraction, Transformation and Load scripts are deliverables of
A) Develop Programs and Testing
B) Data Extraction
C) Data Transfer
D) Data Transformation
A) Develop Programs and Testing
This is because ETL scripts involve developing the necessary code and logic to extract data from source systems, transform it into the required format, and load it into target systems. This development and subsequent testing ensure that the data migration process is accurate and efficient.
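As a concrete illustration of such a deliverable, here is a minimal ETL script sketch in Python; the file name, column names, and SQLite target are assumptions made for the example, not part of any specific methodology.

```python
# Minimal ETL sketch: extract from a CSV, transform, load into SQLite.
# The file "customers_source.csv" and its columns are assumed for illustration.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: trim whitespace and normalize the city name to upper case.
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": row["customer_id"].strip(),
            "city": row["city"].strip().upper(),
        })
    return cleaned

def load(rows, db_path="target.db"):
    # Load: write the transformed rows into the target table.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers (customer_id TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers (customer_id, city) VALUES (:customer_id, :city)", rows
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("customers_source.csv")))
```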
Which of the following is used to migrate data
A) DataClean
B) ETI Extract
C) Trillium
D) Oracle Warehouse Builder
C) Trillium
Here’s why the other options are less likely:
DataClean is typically used for data cleansing, which focuses on improving data quality by identifying and correcting errors or inconsistencies. While data cleaning can be part of the data migration process, it’s not the core function.
ETI Extract isn’t a commonly recognized term in data migration. “Extract” is a stage in the data migration process (Extract, Transform, Load) but it wouldn’t be a standalone tool name.
Oracle Warehouse Builder is a data warehousing tool, not specifically designed for data migration. While it can be used for data movement tasks, it’s not the primary purpose.
Trillium, on the other hand, is a data quality and data integration suite that also provides data migration capabilities, making it the fitting choice for this scenario.
It’s important to note that there are many other data migration tools available, and the best choice depends on the specific requirements of the migration project.
Which of the following is not a valid way for providing DQM in Source-to-Target Architecture?
A) DQM at Source
B) DQM as a part of External processes
C) DQM as a part of ETL process
D) DQM at Target
B) DQM as a part of External processes
Data Quality Management should be integrated directly within the source, ETL process, or target to ensure data integrity and quality throughout the data migration process. External processes that are not part of the ETL or data handling pipeline may not effectively manage and ensure data quality within the Source-to-Target Architecture.
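To show what "DQM as a part of the ETL process" can look like in practice, here is a minimal sketch in which the transform step routes rows that fail quality rules into a reject list. The rule set and field names are assumptions for illustration.

```python
# Hypothetical quality rules applied inside the transform step of an ETL flow.
import re

RULES = {
    "customer_id": lambda v: bool(v and str(v).strip()),            # mandatory field
    "email": lambda v: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v or "")),
    "age": lambda v: str(v).isdigit() and 0 < int(v) < 120,         # plausible range
}

def transform_with_dqm(rows):
    """Split incoming rows into (clean, rejected) based on the quality rules."""
    clean, rejected = [], []
    for row in rows:
        failures = [field for field, check in RULES.items() if not check(row.get(field))]
        if failures:
            rejected.append({"row": row, "failed_rules": failures})
        else:
            clean.append(row)
    return clean, rejected

clean, rejected = transform_with_dqm([
    {"customer_id": "C001", "email": "a@example.com", "age": "34"},
    {"customer_id": "", "email": "not-an-email", "age": "240"},
])
print(len(clean), "clean row(s),", len(rejected), "rejected row(s)")
```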
Architecture of the data migration solution should include (Choose all which are applicable)
A) Ability to block data access to source environments
B) Ability to handle data cleansing requirements
C) Ability to handle document regeneration
D) Ability to perform audits for Reconciliation
B) Ability to handle data cleansing requirements
D) Ability to perform audits for Reconciliation
Explanation:
B) Ability to handle data cleansing requirements:
Ensuring data quality is crucial in data migration. The architecture should include mechanisms for data cleansing to correct errors and inconsistencies before loading into the target system.
D) Ability to perform audits for Reconciliation:
The ability to audit and reconcile data ensures that the migration process is accurate and complete. This helps in verifying that the data in the target environment matches the source data post-migration.
Not applicable:
A) Ability to block data access to source environments:
While security and access control are important, blocking data access to source environments is not typically a requirement of the migration architecture itself. This might be more relevant to security policies rather than the architecture of the migration solution.
C) Ability to handle document regeneration:
Document regeneration is not usually a core requirement of data migration solutions. Data migration primarily focuses on moving data rather than regenerating documents, which could be a separate process outside the migration scope.
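As an illustration of the reconciliation audits called out in option D, the sketch below compares record counts and a column total between source and target after a load. The table name, column name, and SQLite databases are assumptions.

```python
# Hypothetical post-load reconciliation audit: compare record counts and an
# aggregate (sum of amount) between the source and target databases.
import sqlite3

def reconcile(source_db, target_db, table="orders", amount_col="amount"):
    src = sqlite3.connect(source_db)
    tgt = sqlite3.connect(target_db)
    checks = {}
    for name, conn in (("source", src), ("target", tgt)):
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        total = conn.execute(f"SELECT COALESCE(SUM({amount_col}), 0) FROM {table}").fetchone()[0]
        checks[name] = (count, total)
    src.close()
    tgt.close()
    return {"source": checks["source"], "target": checks["target"],
            "matched": checks["source"] == checks["target"]}

# Example usage: reconcile("source.db", "target.db")
# -> {'source': (1000, 52340.5), 'target': (1000, 52340.5), 'matched': True}
```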
Which of these is mainly concerned with the aggregate results of movement of data from source to target?
A) Completeness
B) Validity
C) Data Flow
D) Business Rules
A) Completeness
Explanation:
A) Completeness:
Completeness refers to ensuring that all the expected data has been successfully moved from the source to the target. It checks that no data is missing during the migration process, thereby focusing on the aggregate results of the data movement.
Not applicable:
B) Validity:
Validity focuses on whether the data conforms to defined formats, types, and ranges.
C) Data Flow:
Data Flow refers to the movement and transformation of data through the different stages of the ETL process, but not necessarily the aggregate results.
D) Business Rules:
Business Rules are specific criteria that data must meet during the migration process, often used in the transformation phase to ensure the data aligns with business requirements.
Which are the reports generated out of data analysis and Profiling?
A) Issue Registers
B) Monitors
C) Matrices
D) Mappings
A) Issue Registers
Explanation:
A) Issue Registers:
Issue registers are documents that list problems or discrepancies identified during data analysis and profiling. They help track data quality issues and guide efforts to resolve them.
Not applicable:
B) Monitors:
Monitors refer to tools or processes used to continuously observe data or systems, but they are not reports generated from data analysis and profiling.
C) Matrices:
Matrices are often used for mapping and displaying relationships or comparisons, but they are not specifically reports generated from data profiling.
D) Mappings:
Mappings refer to the definition of how data fields from the source map to the target fields in data migration, not a report generated from data profiling.
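To make the idea of an issue register concrete, here is a minimal sketch that writes a register out as a CSV; the fields and the sample issues are invented for illustration.

```python
# A hypothetical issue register produced during data analysis and profiling,
# written out as a CSV so issues can be tracked and worked through.
import csv

issue_register = [
    {"issue_id": "DQ-001", "table": "CUSTOMER", "column": "CITY",
     "description": "City values contain trailing PIN codes (e.g. 'Bangalore-560001')",
     "severity": "Medium", "status": "Open"},
    {"issue_id": "DQ-002", "table": "ORDERS", "column": "ORDER_DATE",
     "description": "1.2% of rows have dates in the future",
     "severity": "High", "status": "Open"},
]

with open("issue_register.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=issue_register[0].keys())
    writer.writeheader()
    writer.writerows(issue_register)
```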
Which of these is a benefit of implementing metadata management?
A) Facilitates change management
B) Creates an agile business platform
C) Helps in preventing data lineage reporting and analysis
D) Enables Service Oriented Architecture
A) Facilitates change management
Explanation:
A) Facilitates change management:
Metadata management helps in understanding the structure, relationships, and dependencies of data across the enterprise. This understanding is crucial for managing changes effectively, as it allows for the assessment of impact, better planning, and smoother implementation of changes.
Not applicable:
B) Creates an agile business platform:
While metadata management can contribute to agility, it is not the primary benefit directly associated with implementing metadata management.
C) Helps in preventing data lineage reporting and analysis:
This is incorrect because metadata management actually supports and enhances data lineage reporting and analysis by providing detailed information about data origins, movements, and transformations.
D) Enables Service Oriented Architecture:
Enabling Service Oriented Architecture (SOA) is related to designing and implementing services, which is not a direct benefit of metadata management. Metadata management may support SOA by providing detailed data definitions and relationships, but it is not the primary benefit.
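A small sketch of how metadata facilitates change management: given some hypothetical lineage metadata, a traversal lists every downstream object affected by a change to a source column, which is exactly the impact assessment described above.

```python
# Hypothetical lineage metadata: each key feeds the objects listed in its value.
LINEAGE = {
    "crm.customer.city": ["staging.dim_customer.city"],
    "staging.dim_customer.city": ["dwh.sales_by_city", "reports.regional_revenue"],
    "dwh.sales_by_city": ["dashboard.kpi_overview"],
}

def impacted_objects(changed, lineage=LINEAGE):
    """Return all downstream objects affected by a change to `changed`."""
    seen, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for dependent in lineage.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                frontier.append(dependent)
    return sorted(seen)

print(impacted_objects("crm.customer.city"))
# ['dashboard.kpi_overview', 'dwh.sales_by_city',
#  'reports.regional_revenue', 'staging.dim_customer.city']
```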
Physical model is created by the following role
A) Database Administrator
B) Database Architect
C) Information Architect (IA)
D) Enterprise Architect (EA)
A) Database Administrator
Explanation:
A) Database Administrator (DBA):
DBAs are responsible for designing the physical data models based on logical data models created by database architects or information architects. They translate the logical data models into physical database designs, including schema definitions, indexing strategies, storage allocation, etc.
Not applicable:
B) Database Architect:
Database architects usually focus on designing logical data models and overall database structures rather than the physical implementation.
C) Information Architect (IA):
Information architects design the overall information architecture of an organization, focusing on data integration, governance, and strategy rather than physical database design.
D) Enterprise Architect (EA):
Enterprise architects design and oversee the entire IT architecture of an organization, including systems, applications, and infrastructure, but they are less involved in the detailed physical database design.
BIDS DQM is a modular approach for solving problems
A) TRUE
B) FALSE
A) TRUE
Explanation:
BIDS (Business Intelligence Development Studio) DQM (Data Quality Management) is indeed a modular approach for building data quality rules and handling data quality issues within Microsoft’s BI (Business Intelligence) tools environment. This approach allows for flexible customization and integration of data quality processes into BI solutions.
In the DQM Methodology, DQM Framework governs (choose two)
A) Data Cleansing
B) Data Re-migration
C) Data Destruction
D) Data Quality Specification
A) Data Cleansing
D) Data Quality Specification
Explanation:
A) Data Cleansing:
The DQM Framework typically includes processes and rules for data cleansing, which is essential for ensuring data quality by correcting errors and inconsistencies.
D) Data Quality Specification:
The DQM Framework governs the specification of data quality requirements and standards, ensuring that data meets specified criteria for accuracy, completeness, consistency, etc.
Not applicable:
B) Data Re-migration:
Data re-migration refers to the process of migrating data again due to previous migration issues or changes in requirements. This is typically not directly governed by the DQM Framework.
C) Data Destruction:
Data destruction involves securely removing data that is no longer needed, which is more related to data lifecycle management and security policies rather than data quality management specifically.
ETL and target database systems do not have access to update source data. When doing data cleansing at the source, if bad data is encountered, it has to be deleted from the source system.
A) TRUE
B) FALSE
A) TRUE
Explanation:
When ETL (Extract, Transform, Load) processes and target database systems do not have direct access to update source data, and data cleansing is performed at the source:
If bad data is encountered during data cleansing, it typically needs to be deleted from the source system to ensure that only cleansed and correct data is extracted and loaded into the target system. This ensures data integrity and accuracy throughout the ETL process.
CITY column of a table contains information such as Bangalore, Bangalore-64, Bangalore-560001, Mumbai-400002, etc. In order to have just the city information in the column, the following will need to be done
A) Data Merging
B) Data Splitting
C) Data Parsing
D) Data Mapping
C) Data Parsing
Explanation:
C) Data Parsing:
Data parsing involves extracting relevant parts of data from a larger string or field. In this case, parsing would be used to extract only the city name portion from entries that contain additional information like postal codes or codes after a dash. This process helps in standardizing the data format within the CITY column to contain only the city names.
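A minimal sketch of this parsing step, assuming the extra detail always follows the first hyphen:

```python
# Hypothetical parsing rule: keep only the city name before the first hyphen.
def parse_city(value):
    return value.split("-", 1)[0].strip()

for raw in ["Bangalore", "Bangalore-64", "Bangalore-560001", "Mumbai-400002"]:
    print(raw, "->", parse_city(raw))
# Bangalore -> Bangalore, Bangalore-64 -> Bangalore,
# Bangalore-560001 -> Bangalore, Mumbai-400002 -> Mumbai
```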
True or False: Data Profiling is not a part of the Data Migration Methodology
A) TRUE
B) FALSE
B) FALSE
Explanation:
Data profiling is indeed a part of the Data Migration Methodology. It involves analyzing and assessing the source data to understand its structure, quality, and characteristics before migration. This analysis helps in planning and executing the migration process effectively, ensuring that data integrity and quality are maintained throughout. Therefore, data profiling plays a crucial role in the initial stages of data migration methodology.
True or False: Parallel running strategy eliminates the problem of having dependencies between systems
A) TRUE
B) FALSE
B) FALSE
Explanation:
Parallel running strategy involves running both old and new systems simultaneously for a period during the transition phase of a system upgrade or migration. While it helps in validating the new system and ensuring continuity of operations, it does not inherently eliminate dependencies between systems. Dependencies can still exist, especially if data or processes need to synchronize or integrate between the old and new systems during parallel running. Dependency management remains crucial even with parallel running to ensure smooth transition and eventual decommissioning of the old system.
Which Metadata spans across the BI Technical Metadata and the Business Metadata?
A) Counterpoint Metadata
B) Back-Room Metadata
C) Front-Room Metadata
A) Counterpoint Metadata
Data Profiling program includes
A) Issue Register Maintenance
B) Analyzing Relationships
C) Threshold analysis of certain fields
D) Cleaning the Incorrect data
B) Analyzing Relationships
C) Threshold analysis of certain fields
Explanation:
B) Analyzing Relationships:
Data profiling involves examining relationships between different data elements to understand dependencies and associations within the dataset.
C) Threshold analysis of certain fields:
Threshold analysis involves setting criteria or thresholds for certain data fields to identify outliers, anomalies, or data quality issues based on predefined rules or thresholds.
Not applicable:
A) Issue Register Maintenance:
Issue register maintenance is typically a separate process for managing and tracking data quality issues identified during data profiling, rather than a direct component of data profiling itself.
D) Cleaning the Incorrect data:
Cleaning incorrect data is part of data cleansing, which is a subsequent step after data profiling identifies data quality issues. It is not typically considered part of the data profiling program itself.
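As an illustration of the threshold analysis mentioned in option C, the sketch below profiles null rates per column and flags columns that breach an assumed 5% threshold; the sample data and the threshold are assumptions.

```python
# Hypothetical profiling pass: compute null rates per column and flag any
# column whose null rate breaches a chosen threshold (threshold analysis).
def profile_null_rates(rows, threshold=0.05):
    report = {}
    for col in rows[0].keys():
        nulls = sum(1 for r in rows if r.get(col) in (None, ""))
        rate = nulls / len(rows)
        report[col] = {"null_rate": round(rate, 2), "breaches_threshold": rate > threshold}
    return report

sample = [
    {"customer_id": "C1", "city": "Bangalore", "phone": ""},
    {"customer_id": "C2", "city": "", "phone": ""},
    {"customer_id": "C3", "city": "Mumbai", "phone": "9845000000"},
]
print(profile_null_rates(sample))
```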
The rule-based approach to Data clean-up includes (Choose all which are applicable)
A) Suggestive rules
B) Detective rules
C) Corrective rules
D) Derivative rules
B) Detective rules
C) Corrective rules
Explanation:
B) Detective rules:
Detective rules are used to identify and detect data quality issues, anomalies, or inconsistencies within the dataset.
C) Corrective rules:
Corrective rules are applied to clean, correct, or standardize the data based on predefined rules or criteria.
Not applicable:
A) Suggestive rules:
Suggestive rules typically provide recommendations or suggestions rather than enforcing specific actions for data clean-up.
D) Derivative rules:
Derivative rules are usually used to derive new data or metrics from existing data rather than directly related to data clean-up processes.
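To make the detective/corrective distinction concrete, here is a sketch in which a detective rule flags phone values that are not ten digits and a corrective rule strips non-digit characters; both rules are assumptions chosen for the example.

```python
import re

# Detective rule (assumed): a valid phone value is exactly ten digits.
def detect_bad_phone(value):
    return not re.fullmatch(r"\d{10}", value or "")

# Corrective rule (assumed): strip every non-digit character.
def correct_phone(value):
    return re.sub(r"\D", "", value or "")

raw_values = ["98450 12345", "(080) 2345-6789", "9845012345"]
for v in raw_values:
    if detect_bad_phone(v):                  # detective rule fires
        fixed = correct_phone(v)             # corrective rule applied
        print(f"{v!r} -> {fixed!r}")         # corrected values may still need review
```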
Data Archival and Data Backup are synonymous—both are used for the same purpose of storing a primary copy of data.
A) TRUE
B) FALSE
B) FALSE
Explanation:
Data Archival and Data Backup serve different purposes:
Data Backup:
Data backup is the process of creating copies of data to protect against data loss due to hardware failure, accidental deletion, or other disasters. Backups are typically used for recovery purposes and are often stored temporarily or periodically updated.
Data Archival:
Data archival involves moving data that is no longer actively used but needs to be retained for compliance, historical, or business reasons to a separate storage location. Archival data is stored for long-term retention and retrieval, often in a different storage tier optimized for cost-efficiency and access frequency.
While both involve storing data copies, they serve distinct purposes related to data protection and long-term storage needs.
Select all the Data Quality and Data Profiling Tools among the following
A) First Logic
B) Informatica Data Explorer
C) Powercenter
D) Data Flux Power Studio
B) Informatica Data Explorer
D) Data Flux Power Studio
Data profiling tools help you understand the structure, content, and format of your data, while data quality tools cleanse and improve the accuracy of your data.
First Logic is primarily an address standardization and cleansing product and is not positioned as a data profiling tool.
Powercenter is an ETL (Extract, Transform, Load) tool from Informatica and doesn’t focus on data profiling.
Federated Metadata Management ensures relative autonomy for local repositories
A) TRUE
B) FALSE
A) TRUE
Explanation:
Federated Metadata Management allows for relative autonomy of local metadata repositories while enabling centralized management and governance. This approach supports distributed systems and organizations by allowing local repositories to maintain control over their metadata while facilitating interoperability and unified access across the enterprise.
Which among the following is a data management practice that characterizes the content, quality, and structure of your data
A) Data Enrichment
B) Data Caving
C) Data Refinement
D) Data Profiling
D) Data Profiling
Explanation:
Data Profiling involves analyzing and assessing the content, quality, and structure of data within a dataset. It helps in understanding data characteristics such as completeness, consistency, accuracy, and relationships between data elements. This practice is essential for data management and ensuring data meets organizational standards and requirements.
The best place to implement data quality checks is
A) Source systems
B) ETL
C) Target Systems (Example: Data Warehouse)
D) All of the options
D) All of the options
Explanation:
Implementing data quality checks at various stages ensures comprehensive data quality management throughout the data lifecycle:
A) Source systems: Implementing data quality checks at the source ensures that data entering the system is accurate and consistent from the outset.
B) ETL (Extract, Transform, Load): Implementing data quality checks during ETL processes ensures that data transformations maintain data quality and integrity.
C) Target Systems (e.g., Data Warehouse): Implementing data quality checks in the target system ensures that data stored in the data warehouse or target database meets quality standards and is suitable for reporting and analysis.
By implementing data quality checks across all these stages, organizations can ensure that data is reliable, consistent, and accurate throughout its lifecycle, from acquisition to consumption.
Who is responsible for Data management and Data Quality from a business perspective
A) Data Analyst
B) Data Steward
C) Information Architect
D) Data Keeper
B) Data Steward
Explanation:
B) Data Steward: Data stewards are responsible for overseeing the management, quality, and governance of data within an organization. They ensure that data meets organizational standards, policies, and regulatory requirements. Data stewards collaborate closely with business users, data analysts, and IT teams to improve data quality, integrity, and usability.
While data analysts may analyze data and information architects design data structures, the primary responsibility for managing and ensuring the quality of data usually falls to data stewards within an organization.
Which of the following is/are true about Automated data profiling tools?
A) Generates summary views
B) Enables effective decision-making
C) Can create appropriate cleansing rules
D) Result Interpretation requires the involvement of IT administrators
A) Generates summary views
C) Can create appropriate cleansing rules
Explanation:
A) Generates summary views: Automated data profiling tools generate summary views of data characteristics such as data distributions, patterns, completeness, and quality metrics. These summaries help users quickly understand the overall state of their data.
C) Can create appropriate cleansing rules: Many automated data profiling tools can analyze data patterns and anomalies to suggest or create appropriate cleansing rules. This capability helps in automating the data cleansing process based on identified issues.
Not applicable:
B) Enables effective decision-making: Data profiling provides insight into data quality and structure, but effective decision-making depends on how business users and stakeholders interpret and act on those insights; it is an outcome of using the tool rather than a capability of the tool itself.
D) Result Interpretation requires the involvement of IT administrators: Interpretation of data profiling results can involve various stakeholders, including business analysts, data stewards, and IT administrators. However, it’s not exclusively limited to IT administrators; it depends on the organization’s data governance and management practices.
Which among the following is not an advantage of custom code (select as many as possible)
A) Low cost
B) Easily adaptable to changes
C) Optimization of programs
D) High auditing capabilities
A) Low cost
D) High auditing capabilities
Here’s the breakdown of the options regarding advantages of custom code:
Low cost (A): Not necessarily an advantage. Custom code development can be expensive due to developer time and ongoing maintenance.
Easily adaptable to changes (B): An advantage. Custom code can be tailored to specific needs and easily modified as requirements evolve.
Optimization of programs (C): An advantage. Developers can fine-tune custom code for performance and efficiency.
High auditing capabilities (D): Not necessarily an advantage. While custom code can be audited, it can also be complex and time-consuming to understand compared to pre-built solutions with readily available documentation.
Match the vendor name with the associated appliance server name
A) IBM—Balanced Configuration Unit or Balanced Warehouse
B) Ingres—IceBreaker
C) IBM—Greenplum
D) HP—Neo View
All of them
A) IBM - Balanced Configuration Unit or Balanced Warehouse - IBM offers a family of data warehousing servers called Balanced Configuration Unit (BCU).
B) Ingres - Ice Breaker - Ingres, a database management system, has “IceBreaker” as its data integration appliance.
C) IBM - Greenplum - IBM Greenplum is a data warehousing appliance specifically designed for large datasets.
D) HP - Neo View - Hewlett-Packard (HP) offered “NeoView” as a business intelligence and data visualization appliance (discontinued)
RAID provides fault tolerance against only the disk failures.
A) TRUE
B) FALSE
B) FALSE
Explanation:
RAID (Redundant Array of Independent Disks) provides fault tolerance against disk failures as well as other types of failures, depending on the RAID level used:
Disk failures: RAID protects data against disk failures by storing data redundantly across multiple disks. If a disk fails, data can be reconstructed from the remaining disks.
Other failures: Depending on the configuration (for example, RAID 1 with duplexed controllers, or RAID 6 and other multi-parity levels), RAID can also tolerate a controller failure or the simultaneous loss of more than one disk.
Therefore, RAID is not limited to protecting against disk failures alone but can enhance data availability and fault tolerance against a range of potential failures within a storage system.
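As a toy illustration of how redundancy allows reconstruction (the parity idea behind RAID 5), the sketch below XORs data blocks into a parity block and rebuilds a "failed" block from the survivors. It is a teaching sketch, not how a real controller works.

```python
# Toy RAID-5-style parity: parity = XOR of all data blocks, so any single
# missing block can be rebuilt by XOR-ing the parity with the remaining blocks.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, byte_tuple) for byte_tuple in zip(*blocks))

data_blocks = [b"DISK0_DA", b"DISK1_DB", b"DISK2_DC"]
parity = xor_blocks(data_blocks)

# Simulate losing disk 1 and rebuilding it from the parity and surviving disks.
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)
assert rebuilt == data_blocks[1]
print("Rebuilt block:", rebuilt)
```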
Data Consolidation will have a major impact on
A) Sufficiency
B) Latency
C) Uniqueness
D) Consistency
D) Consistency
Explanation:
Consistency: Data consolidation involves bringing together data from different sources or systems into a single, unified repository. Ensuring consistency across this consolidated data—ensuring that data is accurate, up-to-date, and matches across all sources—is crucial for maintaining data integrity and reliability.
While data consolidation can indirectly affect sufficiency (ensuring enough data is gathered), latency (time delays in data processing), and uniqueness (ensuring data uniqueness and deduplication), consistency is directly impacted because the consolidation process aims to eliminate discrepancies and ensure uniformity across the integrated datasets.
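A small sketch of why consolidation puts pressure on consistency: merging the same customer held in two assumed source systems and flagging every field where the values disagree.

```python
# Hypothetical consolidation of one customer held in two source systems;
# any field whose values disagree is reported as a consistency conflict.
crm_record = {"customer_id": "C001", "email": "asha@example.com", "city": "Bangalore"}
billing_record = {"customer_id": "C001", "email": "asha.k@example.com", "city": "Bangalore"}

def consolidate(a, b):
    merged, conflicts = {}, {}
    for field in set(a) | set(b):
        if a.get(field) == b.get(field):
            merged[field] = a.get(field)
        else:
            conflicts[field] = (a.get(field), b.get(field))
    return merged, conflicts

merged, conflicts = consolidate(crm_record, billing_record)
print("Conflicts to resolve:", conflicts)
# Conflicts to resolve: {'email': ('asha@example.com', 'asha.k@example.com')}
```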
Which is the process of transferring data from online to offline storage
A) Backup
B) Recovery
C) Archiving
D) Purging
C) Archiving
Explanation:
Archiving: Archiving involves moving data that is no longer actively used but needs to be retained for long-term storage, regulatory compliance, or historical purposes to offline or secondary storage systems. This process helps free up primary storage space while ensuring data remains accessible when needed.
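A minimal sketch of an archiving job under assumed names: rows older than a cut-off date are copied from the online SQLite table to an offline CSV file and then removed from the online table.

```python
# Hypothetical archiving job: move orders older than the cut-off date from the
# online database table to an offline CSV archive, then delete them online.
import csv
import sqlite3

def archive_old_orders(db_path="online.db", cutoff="2020-01-01",
                       archive_path="orders_archive.csv"):
    conn = sqlite3.connect(db_path)
    old_rows = conn.execute(
        "SELECT order_id, order_date, amount FROM orders WHERE order_date < ?", (cutoff,)
    ).fetchall()
    with open(archive_path, "a", newline="") as f:
        csv.writer(f).writerows(old_rows)                      # offline copy
    conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    conn.commit()
    conn.close()
    return len(old_rows)
```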
ETL and target database systems do not have access to update source data. When doing data cleansing at the source, if bad data is encountered, it has to be scrubbed or cleaned manually or by applying some rules.
A) TRUE
B) FALSE
A) TRUE
Explanation:
When ETL (Extract, Transform, Load) processes and target database systems do not have direct access to update source data, any data cleansing or scrubbing required must be done at the source system. This typically involves manual intervention or applying automated rules to clean or correct the bad data before it is extracted and loaded into the target system. This ensures that only cleansed and accurate data is transferred and stored in the data warehouse or target database.
Which of the following is not part of the Enterprise Data Management Framework?
A) Data Quality Management
B) Metadata Management
C) Corporate Performance Management
D) Master Data Management
C) Corporate Performance Management
Explanation:
Data Quality Management: This involves ensuring the accuracy, completeness, consistency, and reliability of data across the enterprise.
Metadata Management: This involves managing data about data, which helps in understanding, tracking, and using data effectively across the organization.
Master Data Management: This involves managing the critical data of an organization to ensure a single, consistent point of reference.
Corporate Performance Management: This is more about managing and monitoring an organization’s performance rather than managing data itself. It focuses on business metrics, performance indicators, and strategic management, which is outside the core scope of Enterprise Data Management.
RAID stands for
A) Redundant Array of Inexpensive Disks
B) Redundant Array using Inexpensive Disks
C) Redundancy Array of Inexpensive Disks
D) Replicated Array using Inexpensive Disks
A) Redundant Array of Inexpensive Disks
Note that option A uses the original expansion; the acronym is also commonly expanded as Redundant Array of Independent Disks.
Which of the following statements are false? (Choose all which are applicable)
A) The migration process can be reused
B) If a tool does not have source system access privileges, then PULL technology is used
C) The architecture should have the ability to handle document regeneration
D) Generally a Target Look Alike is used for metadata creation and capture
The false statements are B, C, and D; statement A is true.
A) The migration process can be reused - TRUE. With standardized data structures and transformation logic, migration components can be reused across migration waves or later projects.
B) If a tool does not have source system access privileges, then PULL technology is used - FALSE. Pull requires the tool to read the source directly; when the tool lacks access privileges, the source system has to push the data out instead.
C) The architecture should have the ability to handle document regeneration - FALSE. As noted in the earlier architecture question, document regeneration is not a core requirement of the data migration architecture.
D) Generally a Target Look Alike is used for metadata creation and capture - FALSE. A Target Look Alike is primarily used for testing data transformations and trial loads against a target-like environment, not for metadata creation and capture.
Select any two types of Data Consolidation
A) House grouping
B) Business grouping
C) House holding
D) Business holding
The two types of Data Consolidation are:
C) House holding
B) Business grouping
Explanation:
House holding: This involves consolidating data based on households, which is commonly used in contexts like customer data management to aggregate data at the household level rather than the individual level.
Business grouping: This involves consolidating data based on business entities, which is used to aggregate data at the business unit or organizational level to provide a comprehensive view of business performance.
Not applicable:
House grouping: This is not a recognized term in data consolidation.
Business holding: This is not a recognized term in data consolidation.
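To illustrate householding, the sketch below groups customer records that share the same normalized address; the sample records and the crude normalization rule are assumptions.

```python
# Hypothetical householding: customers sharing the same normalized address
# are consolidated into one household group.
from collections import defaultdict

customers = [
    {"name": "A. Rao",  "address": "12, MG Road, Bangalore"},
    {"name": "S. Rao",  "address": "12 M.G. Road Bangalore"},
    {"name": "K. Iyer", "address": "7 Hill View, Mumbai"},
]

def household_key(address):
    # Crude normalization for illustration: keep only letters and digits.
    return "".join(ch for ch in address.lower() if ch.isalnum())

households = defaultdict(list)
for c in customers:
    households[household_key(c["address"])].append(c["name"])

print(dict(households))
# {'12mgroadbangalore': ['A. Rao', 'S. Rao'], '7hillviewmumbai': ['K. Iyer']}
```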
Which process verifies that the source field threshold is not subject to truncation during the transformation or loading of data
A) Source to target counts
B) Source to target data reconciliation
C) Field to Field verification
D) Domain counts
C) Field to Field verification
Explanation:
Field to Field verification: This process involves comparing the specific fields in the source data with the corresponding fields in the target data to ensure that data has been accurately transformed and loaded without truncation or data loss. It checks the field lengths, data types, and values to verify that the transformation process has not caused any truncation issues.
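A sketch of field-to-field verification focused on truncation: it compares the length of each source value against an assumed target column width and against the loaded value.

```python
# Hypothetical field-to-field verification: flag source values that exceed the
# target column width or that arrived shorter than they left the source.
TARGET_WIDTHS = {"city": 10, "email": 30}    # assumed target column definitions

def verify_fields(source_row, target_row):
    findings = []
    for field, width in TARGET_WIDTHS.items():
        src, tgt = source_row[field], target_row[field]
        if len(src) > width:
            findings.append(f"{field}: source value exceeds target width {width}")
        if len(tgt) < len(src):
            findings.append(f"{field}: value truncated ('{src}' -> '{tgt}')")
    return findings

print(verify_fields(
    {"city": "Thiruvananthapuram", "email": "a@example.com"},
    {"city": "Thiruvanan", "email": "a@example.com"},
))
```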