Data Flashcards
What is Data Architecture –
DA defines the blueprint for managing data assets. It uses artifacts to create a view of data across an organization and to promote enterprise data sharing and interoperability.
Reason for DA Strategy
- Our problem: the VA supports thousands of data stores and recognized the existence of duplicative data sources
- The fragmented nature of the VA's architecture made it difficult to determine the most accurate source of information and had the potential to cause data quality issues
- Poor data quality exposed the VA to the risk of using bad data to make decisions, failing to comply with regulatory requirements, and allocating resource dollars to support inadequate systems and processes
- Our assignment was to create a strategy that outlined recommended artifacts and how they linked together to help the VA identify authoritative data sources and support cost-saving, data quality, and enterprise sharing and interoperability initiatives.
What did you create in DA
- ECDM – decomposes the VA’s portfolio of data into subject areas and defines the relationship between the subject areas to create a picture of how data is organized within the VA
- ELDM – defines data entities and attributes and creates common standard data definitions that provide a consistent data communication standard.
- System entity CRUD matrix – identifies what systems touch the data entities
- Reference Data Lists - map existing data standards to each system to promote data consistency & interoperability
- Data Lineage Diagrams – documents the flow of data as well as creation and update points
- Authoritative Data Source Criteria Model – model that used criteria such as historical data, update frequency, and accuracy to score candidate authoritative data sources.
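The criteria scoring model above can be sketched as a simple weighted sum. This is an illustrative assumption only: the criteria names, weights, and candidate systems below are hypothetical, not the VA's actual model.

```python
# Hypothetical sketch of an authoritative-data-source scoring model.
# Weights and the 0-10 rating scale are assumptions for illustration.
CRITERIA_WEIGHTS = {
    "historical_data": 0.3,   # depth of history retained
    "update_frequency": 0.3,  # how current the data is kept
    "accuracy": 0.4,          # measured data quality
}

def score_source(ratings: dict) -> float:
    """Weighted sum of per-criterion ratings (each rated 0-10)."""
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)

# Hypothetical candidate systems and ratings.
candidates = {
    "System A": {"historical_data": 9, "update_frequency": 6, "accuracy": 8},
    "System B": {"historical_data": 5, "update_frequency": 9, "accuracy": 7},
}

# The highest-scoring candidate is the recommended authoritative source.
best = max(candidates, key=lambda name: score_source(candidates[name]))
```

The point of the sketch is that the recommendation falls out of explicit, weighted criteria rather than ad-hoc judgment.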
How did you create the DA Schedule
• First, I organized the project into smaller phases, with each phase tied to the creation of a recommended artifact.
• I then identified the tasks within each phase that needed to be performed to execute the Enterprise Data Arch Strategy
• Lastly, I estimated the duration it would take to complete each task within each Strategy Phase
• In total I recommended the creation of 24 deliverables and estimated it would take 3 years to execute this strategy.
Data Management
is a business function composed of 10 disciplines that support the planning, enabling, and delivery of data and information assets.
Data Modeling
creates the design and blueprint of a DB. It’s the process of structuring data and defining relationships so that a database can support business processes.
Data Modeling Process
1) Define Entities; 2) Define Relationships; 3) Define Attributes; 4) Define Cardinality; 5) Assign Keys; 6) Define Attribute Data Types; 7) Normalize
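The steps above can be sketched end-to-end with a tiny two-entity model. The Customer/Orders entities below are a hypothetical example, realized as SQLite DDL via Python's stdlib `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Steps 1-6 realized as DDL: entities become tables, attributes become
# typed columns, keys are assigned, and the 1:N cardinality (one customer,
# many orders) is expressed by placing the FK on the "many" side.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- step 5: assign keys
        name        TEXT NOT NULL          -- steps 3 & 6: attribute + type
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        customer_id INTEGER NOT NULL,      -- steps 2 & 4: 1:N relationship
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    )""")
```

Normalization (step 7) is reflected in the structure itself: customer details live once in `customer` rather than being repeated on every order row.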
Different Types of Data Models
- Conceptual Data Models: Entity Names and Relationships
- Logical Data Models (LDMs): Entity Names and Relationships, Attribute Names, Primary and Foreign Keys
- Physical Data Models: Physical Table Names and Relationships, Column Names, PK/FK, Column Data Types, Indexes
Indexes
- Indexes optimize query performance by creating alternate paths for accessing data
- Without an index, the DB will read every row in the table to retrieve the requested data.
- Indexes should support the most frequently run queries, and use the most frequently referenced keys
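The scan-versus-index behavior above can be observed directly in SQLite via `EXPLAIN QUERY PLAN`. The `claim` table and `member_id` column below are hypothetical; the exact plan wording varies by SQLite version, but "SCAN" versus "USING INDEX" is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)

def plan(sql: str) -> str:
    """Return SQLite's query plan description for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index: the DB reads every row (a full table scan).
before = plan("SELECT * FROM claim WHERE member_id = 7")

# Index the frequently referenced key; the same query now seeks via the index.
conn.execute("CREATE INDEX idx_claim_member ON claim (member_id)")
after = plan("SELECT * FROM claim WHERE member_id = 7")
```

The index pays off only for queries that filter or join on the indexed column, which is why indexes should target the most frequently run queries.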
Data Definition Language (DDL)
is a design deliverable for relational databases. DDL is the subset of SQL used to create tables, indexes, views, and other DB structures. Example statements are CREATE, ALTER, and DROP.
Referential integrity
ensures that relationships between tables contain valid values. For example: "A person can exist without working for a company, but a company cannot exist unless a person is employed by the company."
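The person/company rule above can be enforced with a foreign key constraint: a minimal sketch in SQLite, where FK enforcement must be switched on explicitly with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
# Every company must reference an existing person (its employee/founder);
# a person row has no FK, so a person can exist without a company.
conn.execute("""
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        name       TEXT,
        founder_id INTEGER NOT NULL REFERENCES person (person_id)
    )""")

conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.execute("INSERT INTO company VALUES (10, 'Acme', 1)")  # valid reference

try:
    # Invalid: references a person_id that does not exist.
    conn.execute("INSERT INTO company VALUES (11, 'Ghost', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The database itself rejects the invalid row, so bad references can never be stored regardless of which application writes the data.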
Design principles that should be considered when building a database:
- Performance – What is the maximum time a query can take to return results?
- Availability – What's the percentage of time the system can be used for productive work?
- DB size – What's the expected growth rate of the data? When can data be archived or deleted?
- Reporting – Will users be doing ad-hoc querying and canned reporting? With which tools?
- Reusability – Can multiple applications use the data? If so, what data and how?
- Integrity – Does the data have a valid business meaning and value?
Possible reasons for poor database performance are:
- The query optimizer not being updated with DB statistics about the data, and insufficient indexing
- Poor SQL coding and having SQL embedded in application code rather than stored procedures
- Not using views to pre-define complex table joins
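The last point can be shown with a view that pre-defines a join once, in the database, instead of repeating the join SQL in application code. The patient/visit tables are a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visit (visit_id INTEGER PRIMARY KEY,
                        patient_id INTEGER REFERENCES patient (patient_id),
                        visit_date TEXT);
    INSERT INTO patient VALUES (1, 'Ada');
    INSERT INTO visit VALUES (100, 1, '2024-01-15');

    -- The join logic lives once, in the database, not in every application.
    CREATE VIEW patient_visits AS
        SELECT p.name, v.visit_date
        FROM patient p
        JOIN visit v ON v.patient_id = p.patient_id;
""")

# Applications query the view like a table; the complex join is hidden.
rows = conn.execute("SELECT name, visit_date FROM patient_visits").fetchall()
```

If the join logic ever changes, only the view definition is updated; every application querying `patient_visits` picks up the fix automatically.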
Difference between Data Warehouse/Operational Data Store –
- ODS is designed for fast queries on transactional data, DW is designed for queries on static (current/historical) data.
- ODS data is refreshed frequently throughout the day; the DW is refreshed once at night.
- An ODS is like short term memory because it stores recent information
- DW is like long term memory because it stores permanent information.
Online transaction processing (OLTP)
refers to systems that support transactional data (e.g., banking and airline reservation systems).