Data Flashcards

1
Q

What is Data Architecture?

A

DA defines the blueprint for managing data assets. It uses artifacts to create a view of data across an organization and to promote enterprise data sharing and interoperability.

2
Q

Reason for DA Strategy

A
  • Our problem: the VA supports thousands of data stores and recognized the existence of duplicative data sources.
  • The fragmented nature of the VA’s architecture made it difficult to determine the most accurate source of information, and had the potential to cause data quality issues
  • Poor data quality exposed the VA to the risk of using bad data to make decisions, failing to comply with regulatory requirements, and allocating resource dollars to support inadequate systems and processes
  • Our assignment was to create a strategy that outlined recommended artifacts and how they linked together to help the VA identify authoritative data sources and support cost saving, data quality, and enterprise sharing and interoperability initiatives.
3
Q

What did you create in DA?

A
  • ECDM – decomposes the VA’s portfolio of data into subject areas and defines the relationship between the subject areas to create a picture of how data is organized within the VA
  • ELDM – defines data entities and attributes and creates common standard data definitions that provide a consistent data communication standard.
  • System entity CRUD matrix – identifies what systems touch the data entities
  • Reference Data Lists - map existing data standards to each system to promote data consistency & interoperability
  • Data Lineage Diagrams – document the flow of data as well as creation and update points
  • Authoritative Data Source Criteria Model – model that uses criteria such as historical data, update frequency, and accuracy to score candidate authoritative data sources.
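The system-entity CRUD matrix above can be sketched as a simple lookup table. This is a minimal illustration, not the VA's actual deliverable; the system and entity names are hypothetical placeholders.

```python
# Hypothetical system/entity CRUD matrix: which operations
# (Create, Read, Update, Delete) each system performs on each entity.
crud_matrix = {
    "SystemA": {"Patient": "CRUD", "Claim": "R",   "Provider": "R"},
    "SystemB": {"Patient": "R",    "Claim": "CRU", "Provider": ""},
    "SystemC": {"Patient": "RU",   "Claim": "R",   "Provider": "CRUD"},
}

def creators(entity):
    """Systems that create a given entity; candidate authoritative
    sources are often drawn from this set."""
    return [s for s, ops in crud_matrix.items() if "C" in ops.get(entity, "")]

print(creators("Patient"))  # SystemA is the only creator of Patient
```

Scanning the matrix for the "C" flag is one quick way to shortlist candidate authoritative sources before applying the criteria model.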
4
Q

How did you create the DA Schedule?

A

  • First, I organized the project into smaller phases, with each phase tied to the creation of a recommended artifact.
  • I then identified the tasks within each phase that needed to be performed to execute the Enterprise Data Arch Strategy.
  • Lastly, I estimated how long it would take to complete each task within each Strategy phase.
  • In total, I recommended the creation of 24 deliverables and estimated it would take 3 years to execute the strategy.

5
Q

Data Management

A

is a business function comprising 10 disciplines that support the planning, enabling, and delivery of data and information assets.

6
Q

Data Modeling

A

creates the design and blueprint of a DB. It’s the process of structuring data and defining relationships so that a database can support business processes.

7
Q

Data Modeling Process

A

1) Define Entities; 2) Define Relationships; 3) Define Attributes; 4) Define Cardinality; 5) Assign Keys; 6) Define Attribute Data Types; 7) Normalization
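The seven steps can be sketched as the DDL they produce. This is a minimal illustration using Python's sqlite3 against an in-memory DB; the customer and order entities are hypothetical.

```python
import sqlite3

# Two hypothetical entities with a one-to-many relationship,
# annotated with the modeling step each line corresponds to.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (                    -- 1) entity
    customer_id INTEGER PRIMARY KEY,       -- 5) key, 6) data type
    name        TEXT NOT NULL              -- 3) attribute
);
CREATE TABLE "order" (                     -- 1) entity
    order_id    INTEGER PRIMARY KEY,       -- 5) key
    -- 2) relationship, 4) cardinality: one customer, many orders
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL                       -- 6) data type
    -- 7) normalization: customer attributes live only on customer
);
""")
print([r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```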

8
Q

Different Types of Data Models

A
  • Conceptual Data Models: Entity Names and Relationships
  • Logical Data Models (LDMs): Entity Names and Relationships, Attribute Names, Primary and Foreign Keys
  • Physical Data Models: Physical Table Names and Relationships, Column Names, PK/FK, Column Data Types
9
Q

Index

A
  • Indexes optimize query performance by creating alternate paths for accessing data
  • Without an index, the DB will read every row in the table to retrieve the requested data.
  • Indexes should support the most frequently run queries, and use the most frequently referenced keys
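The first two bullets can be demonstrated with SQLite's query planner: without an index the optimizer scans every row; with one it searches the index. The table, column, and index names here are hypothetical.

```python
import sqlite3

# Load a hypothetical table with enough rows that access path matters.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
con.executemany("INSERT INTO orders (customer) VALUES (?)",
                [(f"cust{i}",) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer = 'cust42'"
# Without an index: the plan's detail column reports a full table scan.
print(con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3])

# Index the frequently referenced, frequently queried column.
con.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
# Now the plan reports a search using the index: the alternate path.
print(con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3])
```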
10
Q

Data Definition Language (DDL)

A

is a design deliverable for relational databases. DDL is the subset of SQL used to create tables, indexes, views, and other DB structures. Example statements are CREATE, ALTER, and DROP.
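The example statements from the card can be run against an in-memory SQLite DB via Python's sqlite3; the table, index, and view names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CREATE: tables, indexes, and views are all DDL deliverables.
con.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_patient_name ON patient (name)")
con.execute("CREATE VIEW v_patient AS SELECT id, name FROM patient")
# ALTER: change an existing structure.
con.execute("ALTER TABLE patient ADD COLUMN dob TEXT")
# DROP: remove a structure.
con.execute("DROP VIEW v_patient")

# The ALTER added a third column to the table definition.
print([r[1] for r in con.execute("PRAGMA table_info(patient)")])
```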

11
Q

Referential integrity

A

ensures that relationships between tables remain valid: a foreign key value must reference an existing row. For example: "A person can exist without working for a company, but a company cannot exist unless a person is employed by the company."
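Part of the card's rule can be sketched as a nullable foreign key in SQLite (which enforces FKs only when the pragma is on); the table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    -- nullable FK: a person can exist without working for a company
    company_id INTEGER REFERENCES company(id)
)""")
con.execute("INSERT INTO company VALUES (1, 'Acme')")
con.execute("INSERT INTO person VALUES (1, 1)")     # valid reference
con.execute("INSERT INTO person VALUES (2, NULL)")  # no company: allowed
try:
    con.execute("INSERT INTO person VALUES (3, 99)")  # no such company
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The DB itself, not application code, rejects the invalid reference, which is the point of declaring the constraint.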

12
Q

Design principles that should be considered when building a database:

A
  • Performance – what is the maximum time a query can take to return results?
  • Availability – what percentage of time can the system be used for productive work?
  • DB Size – what is the expected growth rate of the data? When can data be archived or deleted?
  • Reporting – will users be doing ad-hoc querying and canned reporting? With which tools?
  • Reusability – can multiple applications use the data? If so, what data and how?
  • Integrity – does the data have a valid business meaning and value?
13
Q

Possible reasons for poor database performance are:

A
  • The query optimizer not being updated with DB statistics about the data, and insufficient indexing
  • Poor SQL coding and having SQL embedded in application code rather than stored procedures
  • Not using views to pre-define complex table joins
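The third bullet can be sketched in SQLite: a view defines a complex join once, so applications query the view name instead of repeating (and possibly mis-writing) the join SQL. The table and view names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'Data');
INSERT INTO emp  VALUES (1, 'Ada', 1);

-- The join is defined once, in the DB, not in every application.
CREATE VIEW v_emp_dept AS
    SELECT e.name AS employee, d.name AS department
    FROM emp e JOIN dept d ON d.id = e.dept_id;
""")
print(con.execute("SELECT * FROM v_emp_dept").fetchall())  # [('Ada', 'Data')]
```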
14
Q

Difference between a Data Warehouse and an Operational Data Store

A
  • An ODS is designed for fast queries on transactional data; a DW is designed for queries on static (current/historical) data.
  • ODS data is refreshed frequently throughout the day; the DW is refreshed once at night.
  • An ODS is like short term memory because it stores recent information
  • DW is like long term memory because it stores permanent information.
15
Q

Online transaction processing (OLTP)

A

are systems that support transactional data (e.g., banking and airline systems).

16
Q

On-Line Analytic Processing (OLAP)

A

are systems that support historical data & multi-dimensional BI reporting (e.g., how is profit changing over the years across different regions?).

17
Q

DG Definition

A

DG is the governance of data. It encompasses the people, processes, and technology needed to establish ownership of and accountability for data. DG is achieved through the development, execution, and enforcement of policies, standards, processes, and metrics surrounding enterprise data. Every DG program is unique because of the unique characteristics of each organization and its culture.

18
Q

Data Governance Drivers

A
  • From a compliance perspective, defined roles determine who is accountable for regulatory data, and DG policies define minimum compliance requirements throughout the organization.
  • From an efficiency standpoint, common terminology facilitates information sharing and discovery, and the identification of authoritative data sources ensures business processes are supported by the highest-quality data (improved DQ allows better risk models to be built).
  • From a cost-reduction standpoint, defined SLAs and improved data processes will minimize the number of DQ issues and the cost to remediate them. Improved DQ will also cut costs by avoiding rework.
19
Q

5 Phases of a Data Governance Program

A
  • Identify – Leverage regulatory requirements, audit findings, & DQ issues to define DG vision and obtain funding
  • Diagnose – Conduct DG assessment to define current and target maturity levels and define the project scope
  • Design – Develop list of KDE, establish DG roles, data domains, ADS, policies, standards, processes & metrics
  • Deliver – Establish DG bodies, rollout DG policy & standards, processes & conduct training, incorporate SDLC
  • Sustain – Monitor scorecards to assess effectiveness of DG program; adjust processes and roles accordingly
20
Q

Data domains

A
  • We recommended that organizations organize their data by data domain rather than by LOB.
  • The LOB approach made determining accountability difficult because multiple LOBs usually share similar data.
  • If drink data is handled by the juice, alcohol, & water LOBs, no one LOB would have clear accountability over drink data.
  • Data domains categorize data by final destination rather than location of origination.
  • Domains are mapped to business functions, and each domain must be manageable across the enterprise.
  • Because of this, data domains are very effective in assigning data ownership across organizations.
21
Q

Data Governance Maturity Models – how to conduct assessment?

A
  • A DG assessment provides criteria & questionnaires that allow organizations to evaluate DM goals against best practices.
  • An industry-standard (EDMC) technique should be used to evaluate DM effectiveness across the organization.
  • The assessment involves interviewing stakeholders (BU heads, IT & Operations personnel) and completing questionnaires.
  • Maturity is evaluated by looking at how consistently processes are applied across business units and teams.
  • Gaps should be analyzed and leveraged to create a roadmap, and findings should be presented to key stakeholders.
  • Sample questions: To what degree are metadata creation policies enforced? To what degree are DQ policies defined?
22
Q

Maturity Levels

A
  • Level 1 – There is no formal Data Governance plan.
  • Level 2 – The need for a DG program has been recognized and is being planned.
  • Level 3 – Policies and standards are governed at the LOB level.
  • Level 4 – Roles and responsibilities have been defined.
  • Level 5 – Policies and standards are integrated across the organization.
  • Level 6 – Governance processes are adjusted based on metrics.
23
Q

Governance Roles

A
  1. Executive Committee – executive stakeholders that sponsor, fund, & define the vision for the DG program
  2. Chief Data Officer – Executive Committee member that heads the Executive Data Office
  3. DG Committee – stakeholders from all business units responsible for developing and enforcing DG functions
  4. Data Owner – individual held accountable for the data being compliant
  5. Data Steward – found, not made; a business SME that oversees DQ activity & defines KDEs, KPIs, & DQ thresholds
  6. Data Custodian – technical owner of the data, responsible for the physical maintenance of systems
  7. Data User – individual who directly accesses enterprise data
  8. Business Process Owner – approves KDEs & business rules. They also provide key
24
Q

Phases of a Data Quality Project

A
  • Plan – define in-scope processes/reports, identify SMEs, and define the DQ reporting architecture
  • Define – Identify KDE, define business rules, & map each business rule to 1 of the 7 DQ dimensions.
  • Assess – develop & execute DQ tests, generate DQ scorecard, and prioritize defects
  • Remediate – Develop remediation strategy, cleanse the data, validate fixes with SME, & build DQ controls
  • Sustain - Implement a DG framework that defines DG/DQ functions and monitors them
25
Q

Key Data Element

A

data elements that impact the financial statements, external reporting, or critical decision-making.