L5 - Information-oriented System Integration Flashcards
What is the goal of the lecture?
To give an overview of information-oriented system integration (IoSI), explain its requirements, and present the main existing solutions.
What are the three existing IoSI solutions for enterprise system integration?
Data Warehousing
Data Federation
Data Replication
What is metadata?
Metadata is data about data: information that describes the actual data, such as column names, data types, or creation dates.
What is a schema in databases?
A schema is a blueprint that defines the structure of database components such as tables, attributes, and data types.
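A minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database. The `customer` table and its columns are hypothetical. The CREATE TABLE statement is the schema, and reading it back with SQLite's `PRAGMA table_info` returns metadata (column names and declared data types). Note that SQLite treats declared types only as affinity hints, so this illustrates the idea rather than strict typing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CREATE TABLE statement is the schema: table name, attributes, data types.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, "
    "name VARCHAR(100) NOT NULL, "
    "balance DECIMAL(10,2))"
)

# PRAGMA table_info exposes the schema as metadata, one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]
for name, declared_type in columns:
    print(name, declared_type)
```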
What is a schema in XML?
A schema is a set of rules defining allowed elements, attributes, structures, and data types in an XML document.
What are data types in databases?
Data types define the type of data that can be stored in a column, such as char, varchar, int, tinyint, float, decimal, and money.
What is latency?
Latency is the time delay between data being sent from a source and its arrival at a destination.
What is Information-oriented System Integration (IoSI)?
IoSI is an integration approach based on the exchange of simple data between systems, usually between databases.
Why is IoSI needed?
Enterprises process large amounts of data from various sources, which may use different tools, technologies, and formats, making integration necessary.
What are the advantages of IoSI?
Easy to understand and develop
Standardized API support, such as ODBC
What are the disadvantages of IoSI?
Deals only with data sharing
Does not handle business logic, states, or behaviors
Not sufficient for functional or behavior integration
What is the first requirement for IoSI?
Represent data in a canonical format, such as XML, to ensure interoperability.
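As an illustration of a canonical format, the sketch below serializes one database row to XML with Python's standard `xml.etree.ElementTree`. The element layout (one child element per column) and the `row_to_xml` helper are hypothetical choices, not a format prescribed by the lecture.

```python
import xml.etree.ElementTree as ET

def row_to_xml(table: str, row: dict) -> str:
    """Serialize one row as <table><column>value</column>...</table> (hypothetical layout)."""
    record = ET.Element(table)
    for column, value in row.items():
        field = ET.SubElement(record, column)
        field.text = "" if value is None else str(value)
    return ET.tostring(record, encoding="unicode")

xml_doc = row_to_xml("customer", {"id": 7, "name": "Ana Silva", "phone": "555-0100"})
print(xml_doc)

# Any receiving system can parse the same canonical document back.
parsed = ET.fromstring(xml_doc)
print(parsed.find("name").text)
```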
What is the second requirement for IoSI?
Describe and manage the exchanged data with metadata so that systems can interpret it and interoperate efficiently.
What is the third requirement for IoSI?
Facilitate data integration in real-time.
What are the three data latency levels in IoSI?
Real-time integration
Near real-time integration
Batch processing (non-real-time)
What is a schema conflict in data integration?
A schema conflict occurs when tables representing the same concept are structured differently, e.g., one table includes an “email” attribute while another does not.
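One common way to handle this kind of conflict is to map every source onto a unified schema. In this minimal sketch the two sources, `UNIFIED_SCHEMA`, and `to_unified` are hypothetical; a missing attribute is simply filled with `None`.

```python
# Two hypothetical sources describing the same "customer" concept.
source_a = [{"id": 1, "name": "Ana", "email": "ana@example.com"}]
source_b = [{"id": 2, "name": "Ben"}]  # this table has no "email" attribute

UNIFIED_SCHEMA = ("id", "name", "email")

def to_unified(record: dict) -> dict:
    # Fill attributes a source lacks with None so every record has the same shape.
    return {attr: record.get(attr) for attr in UNIFIED_SCHEMA}

customers = [to_unified(r) for r in source_a + source_b]
print(customers)
```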
What is a naming conflict in data integration?
A naming conflict occurs when different terms are used for the same metadata, such as “telephone” vs. “phone_number” vs. “TeleF”.
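Naming conflicts are usually resolved with a mapping from each source's attribute name to one agreed name. A small sketch, assuming a hypothetical `NAME_MAP` that picks “phone” as the canonical name:

```python
# Hypothetical mapping from source-specific attribute names to one canonical name.
NAME_MAP = {"telephone": "phone", "phone_number": "phone", "TeleF": "phone"}

def rename_attributes(record: dict) -> dict:
    # Attributes without a mapping keep their original name.
    return {NAME_MAP.get(key, key): value for key, value in record.items()}

print(rename_attributes({"id": 1, "TeleF": "555-0100"}))
```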
What is a data representation conflict in data integration?
A data representation conflict occurs when different formats are used for the same data, such as “Donald Trump” vs. “D. Trump”.
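A representation conflict can be reduced by normalizing values to one agreed form before comparing them. This sketch assumes, purely for illustration, that the abbreviated “initial. surname” form is canonical and that every name has at least two parts. Note that abbreviating discards information, so real systems may prefer the fuller form where available.

```python
def abbreviate_name(full_name: str) -> str:
    """Normalize 'Donald Trump' or 'D. Trump' to the hypothetical canonical form 'D. Trump'."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    return f"{first[0]}. {last}"

print(abbreviate_name("Donald Trump") == abbreviate_name("D. Trump"))
```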
What is a precision conflict in data integration?
A precision conflict occurs when different data types are used for the same data, such as Double vs. Money vs. Float vs. Int.
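A precision conflict can be resolved by converting every source value to one agreed type. The sketch below assumes a money-like target with two decimal places and half-up rounding (both are illustrative choices), using Python's `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_money(value) -> Decimal:
    """Convert int, float, or str amounts from different sources to a 2-place Decimal."""
    # Going through str() avoids using a float's binary approximation (e.g. 19.99).
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

amounts = [to_money(v) for v in (20, 19.99, "19.995")]
print(amounts)
```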
What is data warehousing?
A batch information integration technique that extracts, transforms, and loads data from multiple databases into a single repository.
What is data federation?
A real-time information integration technique that allows querying disparate databases through a unified schema without copying data.
What is data replication?
A near-real-time integration technique where data is copied between databases at specific intervals.
What are the key features of data warehousing?
Extracts, transforms, and loads data (ETL).
Stores data in a single database.
Uses dimensional data models with fact and dimension tables.
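The ETL steps and the fact/dimension model can be sketched in a few lines with sqlite3 as the warehouse. Everything here is hypothetical: the two sales sources, the `dim_product` dimension table, and the `fact_sales` fact table. A real warehouse would run this as a scheduled batch job.

```python
import sqlite3

# Hypothetical operational sources that name the same fields differently.
SALES_EU = [("2024-01-05", "Widget", 3)]  # (date, product, qty) tuples
SALES_US = [{"day": "2024-01-05", "item": "Widget", "units": 2}]

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, quantity INTEGER);
""")

def extract():
    # Extract: pull the raw rows from each source system.
    return SALES_EU, SALES_US

def transform(eu_rows, us_rows):
    # Transform: map both sources onto one (date, product, quantity) shape.
    rows = list(eu_rows)
    rows += [(r["day"], r["item"], r["units"]) for r in us_rows]
    return rows

def load(rows):
    # Load: add each product to the dimension table, then insert the fact row.
    with warehouse:
        for sale_date, product, quantity in rows:
            warehouse.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
            (product_id,) = warehouse.execute(
                "SELECT product_id FROM dim_product WHERE name = ?", (product,)
            ).fetchone()
            warehouse.execute(
                "INSERT INTO fact_sales VALUES (?, ?, ?)", (sale_date, product_id, quantity)
            )

load(transform(*extract()))
report = warehouse.execute(
    "SELECT p.name, SUM(f.quantity) FROM fact_sales AS f "
    "JOIN dim_product AS p USING (product_id) GROUP BY p.name"
).fetchall()
print(report)  # [('Widget', 5)]
```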
What are the advantages of data warehousing?
Standardized ETL services make it easy to implement and maintain.
Low cost and reliable, since all data is integrated into one place.
What are the disadvantages of data warehousing?
Data freshness issues due to batch updates.
Data conflicts must be resolved.
Primarily designed for batch analysis.
What are the key features of database federation?
Provides seamless access to multiple databases.
Uses a meta-database (reference database).
Queries are transformed and executed across multiple sources.
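SQLite's `ATTACH DATABASE` gives a small, single-process analogue of federation: separate databases stay separate, and a view provides the unified schema that queries run against. The `hr` and `crm` sources and the `person` view are hypothetical, and a real federation server would also translate queries for heterogeneous engines, which this sketch does not attempt.

```python
import sqlite3

# The federating connection; each source lives in its own attached database.
fed = sqlite3.connect(":memory:")
fed.execute("ATTACH DATABASE ':memory:' AS hr")
fed.execute("ATTACH DATABASE ':memory:' AS crm")

fed.execute("CREATE TABLE hr.employee (emp_id INTEGER, full_name TEXT)")
fed.execute("CREATE TABLE crm.contact (contact_id INTEGER, name TEXT, email TEXT)")
fed.execute("INSERT INTO hr.employee VALUES (1, 'Ana Silva')")
fed.execute("INSERT INTO crm.contact VALUES (10, 'Ben Lee', 'ben@example.com')")

# Unified schema as a TEMP view (TEMP views may reference attached databases).
# No rows are copied: the view reads the sources every time it is queried.
fed.execute("""
    CREATE TEMP VIEW person (source, person_id, name) AS
        SELECT 'hr', emp_id, full_name FROM hr.employee
        UNION ALL
        SELECT 'crm', contact_id, name FROM crm.contact
""")

# A later change in a source is visible through the view immediately.
fed.execute("INSERT INTO crm.contact VALUES (11, 'Cy Tan', NULL)")
people = fed.execute("SELECT source, name FROM person ORDER BY person_id").fetchall()
print(people)
```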
What are the advantages of database federation?
Provides real-time access to original data sources.
Enables users to create integrated views of data.
What are the disadvantages of database federation?
Requires resolving data conflicts.
Needs continuous synchronization and maintenance.
Requires separate tools for implementation.
What are the key features of data replication?
Copies data between databases to ensure consistency.
Uses a master-slave database system.
Can be implemented with different replication modes.
What are the advantages of data replication?
Low-cost integration method.
Simple to develop and implement.
Improves data availability and performance.
What are the disadvantages of data replication?
Increases network overhead and storage requirements.
Requires high maintenance and synchronization.
Does not resolve schema conflicts.
What is full replication?
Copies all data from the master to the slave databases, including new, updated, and existing data.
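A minimal sketch of full replication between two sqlite3 databases standing in for the master and a slave. The `customer` table and `full_replicate` helper are hypothetical. Every run clears the copy and reloads all rows, which is simple but costs more as the table grows.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return db

master, slave = make_db(), make_db()

def full_replicate(src, dst):
    # Read every row from the master, whether new, updated, or unchanged.
    rows = src.execute("SELECT id, name FROM customer").fetchall()
    with dst:  # one transaction: clear the copy, then reload it completely
        dst.execute("DELETE FROM customer")
        dst.executemany("INSERT INTO customer VALUES (?, ?)", rows)

master.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
full_replicate(master, slave)
print(slave.execute("SELECT * FROM customer ORDER BY id").fetchall())
```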
What is key-based incremental replication?
Copies only data that has changed since the last replication by scanning keys or indexes.
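Key-based incremental replication can be sketched with a replication key that grows on every write. Here a hypothetical `updated_at` integer column plays that role, and the replicator remembers the highest key it has copied (a high-water mark). One known limitation shows up directly: a deleted row leaves no key to scan, so deletes are not propagated.

```python
import sqlite3

SCHEMA = "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
master, slave = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
master.execute(SCHEMA)
slave.execute(SCHEMA)

def incremental_replicate(src, dst, last_key):
    """Copy rows whose replication key exceeds last_key; return (new_key, rows_copied)."""
    rows = src.execute(
        "SELECT id, name, updated_at FROM customer WHERE updated_at > ? ORDER BY updated_at",
        (last_key,),
    ).fetchall()
    with dst:
        # INSERT OR REPLACE upserts by primary key, so updated rows overwrite old copies.
        dst.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", rows)
    new_key = rows[-1][2] if rows else last_key
    return new_key, len(rows)

master.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Ana", 100), (2, "Ben", 101)])
mark, first_run = incremental_replicate(master, slave, 0)        # copies both rows
master.execute("UPDATE customer SET name = 'Ben Lee', updated_at = 102 WHERE id = 2")
mark, second_run = incremental_replicate(master, slave, mark)    # copies only the changed row
print(first_run, second_run, mark)
```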
What is log-based incremental replication?
Copies only changed data by scanning the master database’s log files.
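Real log-based replication reads the database engine's own transaction log, whose format is engine-specific. The toy model below only simulates the idea: the hypothetical `Master` appends every write to a log, and the `Slave` replays the entries written since its last recorded log position. Unlike key scanning, replaying the log also captures deletes.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    """Toy master: each write is appended to a log, then applied (simulation only)."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, op, key, value=None):
        self.log.append((op, key, value))
        if op == "upsert":
            self.rows[key] = value
        elif op == "delete":
            self.rows.pop(key, None)

@dataclass
class Slave:
    rows: dict = field(default_factory=dict)
    log_position: int = 0  # index of the next log entry to replay

    def replicate_from(self, master: Master):
        # Scan only the log entries written since the previous run and replay them.
        for op, key, value in master.log[self.log_position:]:
            if op == "upsert":
                self.rows[key] = value
            elif op == "delete":
                self.rows.pop(key, None)
        self.log_position = len(master.log)

master, slave = Master(), Slave()
master.write("upsert", 1, "Ana")
master.write("upsert", 2, "Ben")
slave.replicate_from(master)
master.write("delete", 1)
slave.replicate_from(master)
print(slave.rows)  # {2: 'Ben'}
```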
What is snapshot replication?
Copies a snapshot of the master database at a specific point in time, without tracking changes.
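Python's sqlite3 `Connection.backup` copies a whole database as it is at that moment, which makes it a convenient stand-in for snapshot replication. The `customer` table is hypothetical. Changes made on the master after the backup are not tracked and do not reach the snapshot until the next one is taken.

```python
import sqlite3

master = sqlite3.connect(":memory:")
with master:  # commit the writes so the snapshot includes them
    master.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    master.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])

snapshot = sqlite3.connect(":memory:")
master.backup(snapshot)  # point-in-time copy of the entire database

with master:
    master.execute("INSERT INTO customer VALUES (3, 'Cy')")  # made after the snapshot

print(snapshot.execute("SELECT COUNT(*) FROM customer").fetchone())  # (2,)
```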
What is merge replication?
Merges multiple master databases into a single database.
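Merging several masters requires a rule for rows that were changed in more than one place. This sketch assumes a last-writer-wins rule based on a per-row timestamp; that rule, the `merge` helper, and the two branch datasets are illustrative assumptions, and real products offer other conflict-resolution policies.

```python
def merge(*masters):
    """Merge {key: (value, updated_at)} maps; on conflicts keep the most recent version."""
    merged = {}
    for rows in masters:
        for key, (value, updated_at) in rows.items():
            if key not in merged or updated_at > merged[key][1]:
                merged[key] = (value, updated_at)
    return merged

branch_a = {1: ("Ana", 100), 2: ("Ben", 105)}
branch_b = {2: ("Ben Lee", 110), 3: ("Cy", 101)}
central = merge(branch_a, branch_b)
print(central)
```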