F5 Flashcards
Metadata
Data about data (information about the actual data)
Schema - In databases
A blueprint that defines the structure of database components, such as tables, attributes, and data types.
Schema - in XML
A set of rules defining the allowed elements, attributes, structures, and data types in an XML document.
Data types - in databases
The kinds of values that can be stored in a column, e.g., char, varchar, int, tinyint, float, decimal, money.
Latency
The time delay between data being sent from a source and arriving at its destination.
IoSI
Information-oriented System Integration (IoSI) is an integration approach based on the exchange of simple information (data) between systems, usually between databases.
Advantages/Disadvantages of IoSI:
Advantages:
- The approach is the easiest to understand and develop.
- Most sources/targets, especially databases, use a standardized API originating from the Open Database Connectivity (ODBC) initiative.
Disadvantages:
- Deals only with data sharing.
- Does not deal with business logic, states, behavior, etc.
- IoSI will not suffice where functional and/or behavioral integration is needed.
Requirements for IoSI:
Requirement 1: Represent data in a canonical (approved) format:
– Each system needs to be mapped only once to the canonical format.
– The canonical format should not be proprietary.
– A candidate: eXtensible Markup Language (XML).
Requirement 2: Maintain the exchanged data using their metadata. Common metadata enables efficient interoperability among the systems because generic data meanings and mappings can be established.
Requirement 3: Facilitate data integration in real time.
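Requirement 1 can be illustrated with a minimal Python sketch, assuming two hypothetical systems (a CRM and a billing system) whose field names differ. The mapping tables, the <customer> element, and the sample record are made up for illustration. Each system is mapped once to the canonical XML format, so both end up producing the same document.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-system mappings: source field name -> canonical element name.
CRM_TO_CANONICAL = {"Name": "name", "TeleF": "phone"}
BILLING_TO_CANONICAL = {"full_name": "name", "phone_number": "phone"}


def to_canonical_xml(record, mapping):
    """Map one source record to a canonical <customer> XML string."""
    customer = ET.Element("customer")
    for source_field, canonical_field in mapping.items():
        child = ET.SubElement(customer, canonical_field)
        child.text = str(record[source_field])
    return ET.tostring(customer, encoding="unicode")


crm_xml = to_canonical_xml(
    {"Name": "Ada Lovelace", "TeleF": "555-0100"}, CRM_TO_CANONICAL
)
billing_xml = to_canonical_xml(
    {"full_name": "Ada Lovelace", "phone_number": "555-0100"}, BILLING_TO_CANONICAL
)
print(crm_xml)
```

With n systems this needs n mappings to the canonical format, rather than one mapping per pair of systems.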
IoSI issues
– Schema conflicts (differences in the set of columns/attributes): (Name, Age, Sex) vs. (Name, Age, Sex, e-mail)
– Naming conflicts (telephone vs. phone_number vs. TeleF)
– Data representation conflicts (Donald Trump vs. D. Trump)
– Precision conflicts (Double vs. Money vs. Float vs. Int)
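A minimal Python sketch of how three of these conflicts might be handled, assuming a hypothetical target schema and synonym table (none of these names come from a real system). Naming conflicts are resolved by renaming, schema conflicts by filling missing attributes with None, and precision conflicts by converting amounts to a fixed two-decimal Decimal. Data representation conflicts (Donald Trump vs. D. Trump) are not handled here, since they usually need matching rules or manual review.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical synonym table for the naming conflict listed above.
NAME_SYNONYMS = {"telephone": "phone", "phone_number": "phone", "TeleF": "phone"}
# Hypothetical target schema; attributes a source lacks are filled with None.
TARGET_FIELDS = ("Name", "Age", "Sex", "e-mail", "phone", "balance")


def normalize(record):
    """Rename synonyms, align to the target schema, and fix the precision."""
    renamed = {NAME_SYNONYMS.get(key, key): value for key, value in record.items()}
    row = {field: renamed.get(field) for field in TARGET_FIELDS}
    if row["balance"] is not None:
        # Convert via str() so a float does not carry binary rounding noise.
        row["balance"] = Decimal(str(row["balance"])).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
    return row


a = normalize({"Name": "Ann", "Age": 30, "Sex": "F", "TeleF": "555-0100", "balance": 10})
b = normalize(
    {
        "Name": "Bob",
        "Age": 41,
        "Sex": "M",
        "e-mail": "bob@example.org",
        "phone_number": "555-0101",
        "balance": 10.126,
    }
)
```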
IoSI solutions
Data warehousing:
It solves the need for a unified analysis of disparate data. The information is obtained by extracting, transforming, and loading extracts of operational data from different databases into a single database (“warehouse”).
Data federation:
Data (database) federation (real-time information integration)
It solves the need for seamless access to and use of disparate data. The information is obtained by querying disparate data sources through a unified data schema (i.e. a single metadata database).
Data replication:
It solves the need for optimal performance when accessing data. The information is updated/obtained at specific intervals of time, rather than instantaneously.
Data warehousing - overview
The data warehousing integration technique is used to:
– Extract data from different data sources.
– Transform the data into a consolidated view (filter, sort, aggregate) and resolve possible data conflicts (both DF and DW have this issue).
– Load the data into a single database/data warehouse in batch mode.
The metadata schema of the DW differs from those of the source databases.
The technique typically involves collecting large amounts of data.
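The extract, transform, and load steps can be sketched in Python with the standard sqlite3 module. This is a minimal illustration, assuming two hypothetical in-memory source databases with different schemas; the table and column names are made up, and a real warehouse load would run as a scheduled batch job over far larger volumes.

```python
import sqlite3

# Two hypothetical operational sources with different schemas.
source_a = sqlite3.connect(":memory:")
source_a.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
source_a.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("Ann", 10.0), ("Bob", 5.0), ("Ann", 2.5)]
)

source_b = sqlite3.connect(":memory:")
source_b.execute("CREATE TABLE purchases (buyer TEXT, total REAL)")
source_b.executemany("INSERT INTO purchases VALUES (?, ?)", [("Ann", 4.0), ("Cid", 7.0)])


def extract():
    """Pull the raw rows out of every source."""
    rows = source_a.execute("SELECT customer, amount FROM orders").fetchall()
    rows += source_b.execute("SELECT buyer, total FROM purchases").fetchall()
    return rows


def transform(rows):
    """Aggregate to one total per customer and sort by customer name."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return sorted(totals.items())


def load(warehouse, rows):
    """Replace the warehouse table contents in one batch."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals (customer TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.execute("DELETE FROM customer_totals")
    warehouse.executemany("INSERT INTO customer_totals VALUES (?, ?)", rows)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
result = warehouse.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Note that the warehouse holds a copy: rows added to a source after the load do not appear in customer_totals until the next batch run.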
Data warehousing advantages/disadvantages
Advantages:
– Relatively easy to implement and maintain (ETL services are standardized).
– Inexpensive (packaged with many RDBMSs).
– Reliable (the needed data is integrated into one place, and it is a copy).
Disadvantages:
– The loaded data is tightly integrated by being copied into a single repository; hence, problems with the “freshness” of data can arise at query time, for example when an original data source has been updated but the warehouse still contains the older data.
– Requires resolving possible data conflicts in the extracted data.
– Difficulties also arise when applications need the full data, i.e. not only the summary data.
– It is primarily aimed at batch data analysis.
Database federation - overview
In database federation, the data stored in separate databases and tables is federated in such a way that it can be searched from a single location.
A database federation is a meta-database: a virtual database that references the source databases containing the data to be integrated.
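As a rough single-process analogy, the idea can be sketched with Python's sqlite3 module: one connection attaches two hypothetical source databases and exposes them through a unified view without copying any rows. The database, table, and column names are assumptions made for illustration; a real federation layer would sit in front of separate DBMS servers.

```python
import sqlite3

# The hub plays the role of the meta-database; it stores no customer data itself.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS crm")
hub.execute("ATTACH DATABASE ':memory:' AS billing")

# Two hypothetical sources with conflicting column names.
hub.execute("CREATE TABLE crm.customers (name TEXT, telephone TEXT)")
hub.execute("CREATE TABLE billing.clients (full_name TEXT, phone_number TEXT)")
hub.execute("INSERT INTO crm.customers VALUES ('Ann', '555-0100')")
hub.execute("INSERT INTO billing.clients VALUES ('Bob', '555-0101')")

# Unified schema: a TEMP view may reference tables in any attached database.
hub.execute(
    """
    CREATE TEMP VIEW all_customers AS
    SELECT name, telephone AS phone, 'crm' AS source FROM crm.customers
    UNION ALL
    SELECT full_name, phone_number, 'billing' FROM billing.clients
    """
)


def federated_names():
    """Query every source through the single unified view."""
    rows = hub.execute("SELECT name FROM all_customers ORDER BY name")
    return [row[0] for row in rows]


before = federated_names()
hub.execute("INSERT INTO crm.customers VALUES ('Cid', '555-0102')")
after = federated_names()  # the new row is visible at once; nothing was copied
```

Because the view reads the original tables at query time, a change in a source shows up immediately, which is the contrast with the batch-loaded warehouse.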
Database federation advantages/disadvantages
Advantages:
– Enables full access to any database (data source) in the enterprise through a single, standardized point.
– Data are provided in real time from the original sources, not from cumulative databases or duplicates.
– Users can create integrated views of data and libraries of data stores that can be reused.
Disadvantages:
– Requires resolving possible data conflicts across all data.
– Requires access to, and queries against, all physical databases.
– Requires synchronization of changes (maintenance).
– Requires trust in the reliability of all disparate and dispersed sources.
– Requires the purchase of a separate tool (i.e. not included in a core DBMS).
Data replication - overview
Data(base) replication:
– Copying data between two or more database systems to ensure consistency and improve the availability of information across those databases. But why?
On the source (master) side, the data is extracted and then placed on the target (slave) side. Database writes are first sent to the master database system and are then replicated by the slave database systems.
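The write path described above can be sketched as a toy in-memory model in Python. The Master and Replica classes and the change log are hypothetical and only illustrate the flow: writes go to the master, and the target side catches up when it synchronizes, so it can serve stale data in between.

```python
class Master:
    """Source side: accepts all writes and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered change log shipped to the target side

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Target side: serves reads and applies the master's log on each sync."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def sync(self, master):
        for key, value in master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(master.log)


master = Master()
replica = Replica()

master.write("ann", "555-0100")
stale_read = replica.data.get("ann")  # replica has not synchronized yet
replica.sync(master)
fresh_read = replica.data.get("ann")  # now matches the master
```

Calling sync on a timer corresponds to updating the data at specific intervals rather than instantaneously.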
Database replication Advantages/Disadvantages
Advantages:
– Low-cost integration type.
– Simple to develop and implement.
– Improves performance when fetching data.
– Increased (improved) data availability.
Disadvantages:
– Increased network overhead
– Increased data storage requirements
– Higher administration and maintenance effort, such as ensuring synchronization.
– Does not solve the problem of schema/conflict (i.e. metadata) integration addressed by the DW and DF approaches.