UNIT 1 Flashcards
Data silo
Data silos are information “stocks” managed by a specific sector and kept isolated from the company's other systems. A data silo is a collection of data held by one group that is not easily or fully accessible by other groups in the same organization.
This happens when collaborators choose, for example, to save information on devices or platforms used exclusively by their team, instead of uploading it to a network or other unified directory.
Data silos can hold large volumes of data in many formats, from emails to raw data that has not yet been processed or analyzed.
Some reasons why this (data silos) may happen in a company:
Cultural: Competition or animosity between departments can cause those employees to keep data from each other, rather than working together.
Structural: Especially in large organizations, data silos can exist because of a hierarchy separated by many levels of management and highly specialized staff.
Technological: Applications might not be used, or even designed, to cross-reference or add to each other. Or one department may simply not have access to a valuable app from another department because it was not purchased for their specific day-to-day tasks.
In recent years the applications of Artificial Intelligence and Big Data have become key factors in the strategy of many companies. Data offers companies the possibility to analyze their actions, learn from them and even predict and plan for the future.
–> It is necessary to break down data silos and unify information to make the most of its analytical and predictive potential
Marketing
- where you develop your target market
- website, social, D&B, 3rd party marketing tool –> integration to maintain target group information
CRM
- where you develop opportunities and customers
- IoT, social, D&B, vertical solutions –> integration to enrich existing data
Why is it so important to share data in companies?
- Data silos limit the global view of data: they prevent relevant data from being shared, so each department’s analysis is limited by its own view. How can you find hidden opportunities for operational cost savings, for example, if operations and cost data aren’t consolidated?
- Data silos threaten data integrity: When the same information is stored in different databases, inconsistencies can arise between departmental data. As data ages, it can become less accurate and therefore less useful. For example, if medical data on the same patient is stored in different systems, the copies can drift out of sync over time.
- Data silos waste resources: The same information is stored in several places, and every time users download data into their personal or group storage, more space is consumed. Keeping data in one source frees up storage and avoids paying to buy and maintain storage that may not be needed.
- Data silos discourage collaborative work: Data-driven organizations are embracing collaboration as a powerful tool to find and leverage new insights. In order to encourage collaboration, departments need a way to share their data.
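The integrity problem described above can be made concrete with a small sketch. The Python below compares two copies of the same patient records held by two departments and lists the fields that no longer agree. The department names, record IDs, field names and the helper find_conflicts are all hypothetical, chosen only for illustration.

```python
# Hypothetical copies of the same patient records held by two departments.
cardiology = {
    "P001": {"name": "Ana Ruiz", "phone": "555-0101"},
    "P002": {"name": "Luis Mora", "phone": "555-0102"},
}
billing = {
    "P001": {"name": "Ana Ruiz", "phone": "555-0199"},  # phone updated in this silo only
    "P002": {"name": "Luis Mora", "phone": "555-0102"},
}


def find_conflicts(silo_a, silo_b):
    """Return (record_id, field, value_a, value_b) for every shared field that differs."""
    conflicts = []
    # Only records and fields present in both silos can be compared.
    for record_id in sorted(silo_a.keys() & silo_b.keys()):
        shared_fields = silo_a[record_id].keys() & silo_b[record_id].keys()
        for field in sorted(shared_fields):
            value_a = silo_a[record_id][field]
            value_b = silo_b[record_id][field]
            if value_a != value_b:
                conflicts.append((record_id, field, value_a, value_b))
    return conflicts


conflicts = find_conflicts(cardiology, billing)
for record_id, field, value_a, value_b in conflicts:
    print(f"{record_id}: {field} differs ({value_a!r} vs {value_b!r})")
```

With a single shared source there would be only one copy of each record, so this kind of reconciliation would not be needed.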
Data warehouse
A Data Warehouse is an electronic store where a company or organization generally maintains a large amount of information.
The data in a data warehouse must be:
- Stored securely
- Stored reliably
- Easily retrieved
- Easily managed.
Data warehouse (2)
It is a large database, normally measured in gigabytes (billions of characters) or terabytes (trillions of characters), which collects information from multiple sources and whose activity is focused on decision-making, that is, on analyzing information rather than capturing it.
It is important that the company has a single Data Warehouse, so that all members of the organization can access the same source of information, organized according to conventions determined by management.
It serves to help analyze the data collected by the company in order to improve its performance.
Data warehousing - definition
Data Warehousing (DW) is a process for collecting and managing data from varied sources to provide meaningful business insights. It is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting.
ETL: Extract-Transform-Load
ETL is a process responsible for extracting data from source systems and placing it in a data warehouse. ETL software extracts data, transforms inconsistent data values, cleans “bad” data, filters data, and loads data into a destination data warehouse. Designing and maintaining the ETL process is often considered one of the most difficult and resource-intensive parts of a data warehouse project.
ETL involves the following tasks:
Extraction: obtaining the information from the different source systems, both internal and external, including database systems and applications.
Transformation: filtering, cleaning, purifying, homogenizing and grouping the information, using rules or lookup tables or by combining the data with other data.
Loading: writing the data to the warehouse.
Some of the most widely used ETL tools are:
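The three ETL tasks can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how any particular ETL tool works: the sample CSV, the sales table, the cleaning rules and the in-memory SQLite database standing in for the warehouse are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file, database or application.
SOURCE_CSV = """customer,country,amount
Alice,es,100.50
bob,ES,20
Carol,fr,not_a_number
"""


def extract(text):
    """Extraction: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: homogenize values and filter out rows with bad data."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        customer = row["customer"].strip().title()
        country = row["country"].strip().upper()
        cleaned.append((customer, country, amount))
    return cleaned


def load(rows, connection):
    """Loading: write the transformed rows to the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(SOURCE_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each task in its own function mirrors the E, T and L steps, so a cleaning rule can change without touching extraction or loading.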
- Oracle Warehouse Builder
- IBM WebSphere DataStage
- Microsoft SQL Server Integration Services (SSIS)
Metadata
Data that is used to describe or represent other data. For example, the index of a book serves as metadata for the contents of the book.
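A file system gives another everyday example: alongside a file's contents, it keeps data about the file, such as its size and last modification time. The sketch below, which uses a throwaway temporary file and a hypothetical metadata dictionary, reads that information with Python's os.stat.

```python
import os
import tempfile

content = b"Data silos isolate information."

# Write the data itself to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(content)
    path = handle.name

# os.stat returns data about the file, not the file's contents.
info = os.stat(path)
metadata = {"size_bytes": info.st_size, "modified_timestamp": info.st_mtime}
os.remove(path)  # clean up the temporary file

print(metadata)
```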
Summary data
Statistical records and reports derived from data on individuals.
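As a small illustration, the Python sketch below derives summary data (a count and a mean age per department) from records on individuals. The records, field names and the summarize helper are hypothetical.

```python
from statistics import mean

# Hypothetical records on individuals.
records = [
    {"name": "Ana", "department": "Sales", "age": 30},
    {"name": "Luis", "department": "Sales", "age": 40},
    {"name": "Marta", "department": "IT", "age": 26},
]


def summarize(rows):
    """Group ages by department and return a count and mean age for each group."""
    ages_by_department = {}
    for row in rows:
        ages_by_department.setdefault(row["department"], []).append(row["age"])
    return {
        department: {"count": len(ages), "mean_age": mean(ages)}
        for department, ages in ages_by_department.items()
    }


summary = summarize(records)
print(summary)
```

The summary no longer identifies any individual; it only describes each group as a whole.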
Raw data
Unprocessed data
Relational database
It is a type of database that stores and provides access to data points that are related to one another. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.
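A minimal sketch of this idea, using Python's built-in sqlite3 module with an in-memory database: two tables are related through a shared key, and a query accesses data in one table in relation to the other. The customers and orders tables and their contents are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Two tables related through orders.customer_id, which refers to customers.id.
connection.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Luis');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Join the tables on the shared key to get the total spent by each customer.
rows = connection.execute("""
    SELECT customers.name, SUM(orders.total)
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
    GROUP BY customers.name
    ORDER BY customers.name
""").fetchall()
print(rows)
```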