Data Integration, Learning from data, Supervised Learning Flashcards
Define Data Integration… Why is it needed?
The process of combining data from heterogeneous sources into a single, coherent data store.
Data sources are usually disparate and siloed. Data integration enables the access and interpretation of data from different sources and types.
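A minimal sketch of the idea, assuming two hypothetical sources (a CSV export and a JSON feed) that describe the same customers in different formats. The source names, fields and values are invented for illustration only.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "customer_id,name\n1,Ada\n2,Grace\n"

# Hypothetical source 2: a JSON feed from a billing system.
BILLING_JSON = '[{"id": 1, "balance": 12.5}, {"id": 2, "balance": 0.0}]'


def integrate(crm_csv: str, billing_json: str) -> dict:
    """Merge both sources into one coherent store keyed by customer id."""
    store = {}
    for row in csv.DictReader(io.StringIO(crm_csv)):
        store[int(row["customer_id"])] = {"name": row["name"]}
    for record in json.loads(billing_json):
        # setdefault keeps customers that appear only in the billing feed.
        store.setdefault(record["id"], {})["balance"] = record["balance"]
    return store


customers = integrate(CRM_CSV, BILLING_JSON)
```

After the merge, each customer can be read from one place regardless of which system the field originally came from.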
What are the 5 main ways of integrating data? Describe each…
Common User Interface : Manual data integration by a data manager from retrieval to presentation.
Middleware Data Integration : Middleware that facilitates integration between systems, usually legacy and new ones.
Application-Based Integration : A software application that locates, retrieves and integrates data into storage. Unlike middleware, the application carries out the entire process.
Uniform Data Access : Provides a consistent view of data from a variety of sources, but doesn’t retrieve or manipulate the data.
Common Data Storage : Copies data from the source systems into a separate, shared store, e.g. a data warehouse.
For each type of Data Integration process, give a pro and a con…
Common User Interface :
Pro = Total control and handling.
Con = Poor scaling.
Middleware Data Integration :
Pro = Automated integration.
Con = Must be maintained.
Application-Based Integration :
Pro = Automated end-to-end process.
Con = Complex setup.
Uniform Data Access :
Pro = Low storage requirements.
Con = Host systems can struggle with the volume of data requests.
Common Data Storage :
Pro = Reduces burden on host system.
Con = Increased storage costs.
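The trade-off between the last two approaches can be sketched in a few lines. This is an illustrative toy, not a real integration tool: `Source`, `UniformAccessView` and `CommonStorage` are hypothetical classes, and the request counter simply stands in for load on the host system.

```python
class Source:
    """A hypothetical host system that counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows
        self.requests = 0

    def fetch(self):
        self.requests += 1
        return list(self._rows)


class UniformAccessView:
    """Stores nothing; every read goes back to the hosts."""

    def __init__(self, sources):
        self.sources = sources

    def read(self):
        return [row for s in self.sources for row in s.fetch()]


class CommonStorage:
    """Copies the data once; later reads never touch the hosts."""

    def __init__(self, sources):
        self.rows = [row for s in sources for row in s.fetch()]

    def read(self):
        return list(self.rows)


a, b = Source([1, 2]), Source([3])
view = UniformAccessView([a, b])
for _ in range(3):
    view.read()
view_requests = a.requests  # one host request per read

c, d = Source([1, 2]), Source([3])
warehouse = CommonStorage([c, d])
for _ in range(3):
    warehouse.read()
warehouse_requests = c.requests  # one host request in total, at load time
```

The view keeps no copy of the data but queries the hosts on every read; the warehouse holds a full copy (extra storage) and leaves the hosts alone afterwards.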
What are the 3 categories of learning from data? Define each…
Supervised : Learning that has an input set and a matching output set. The goal is to establish the mapping function from inputs to outputs that predicts the output for new inputs as accurately as possible.
Unsupervised : Learning in which we only have input, and we are tasked with making sense of it.
Semi-supervised : Learning in which only a small part of the input has matching output; the rest is input only.
What is the most common type of learning?
Supervised
What are the 2 types of Supervised Learning?
Regression
Classification
Define Regression…
The process of predicting a continuous target or outcome.
Define Classification…
The process of assigning each input to one of a set of discrete categories (classes).
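The difference between the two types can be sketched on the same toy data. The numbers are made up, and the two methods used here (a least-squares line and a nearest-centroid rule) are just simple stand-ins chosen for illustration.

```python
# Made-up training data: hours studied (input) for four students.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]          # continuous output -> regression
results = ["fail", "fail", "pass", "pass"]  # discrete output -> classification


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, labels):
    """Average input value per class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def classify(x, centroids):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept   # a number

centroids = fit_centroids(hours, results)
predicted_result = classify(5.0, centroids)  # a category
```

For the same new input (5 hours), regression returns a number on a continuous scale, while classification returns one of the known labels.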
In Supervised Learning, what are we trying to find?
The mapping function.
What are the inputs of a supervised learning model?
Features, covariates, predictors etc.
What are the outputs of a supervised learning model?
Target, label, response etc.
Give an example of a usage of Supervised Learning. Define the Inputs and Outputs.
Inputs : A set of photos, some of which show a dog. Outputs : A boolean label for each photo (dog or not a dog). The model learns to predict whether a new photo shows a dog.
Define Unsupervised Learning…
The process of making sense of a data set by recognising patterns.
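As a rough illustration of pattern finding without labels, the sketch below groups one-dimensional values into two clusters with a basic k-means loop (k = 2). The function name and data are hypothetical, and the sketch assumes the values are not all identical.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters; no labels are used."""
    low, high = min(values), max(values)  # initial cluster centres
    near_low, near_high = [], []
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its cluster.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


readings = [1.0, 9.0, 1.2, 9.5, 0.8, 10.1]
small, large = two_means(readings)
```

The algorithm is never told which readings belong together; the two groups emerge from the structure of the input alone.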
Define a Regression Problem and give an example…
A problem in which we need to predict a continuous target or outcome, e.g. predicting the sale price of a house from its floor area.
Define what is meant by inputs, outputs and parameter variables of a Mapping Function…
Inputs : The feature values passed into the function.
Parameters : The values that will change as the model learns from the data.
Output : The predicted value.
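These three roles can be sketched with a one-feature linear mapping function, y = w * x + b. The training loop below uses plain gradient descent on mean squared error; the learning rate, epoch count and data are arbitrary choices for illustration, not a recommended setup.

```python
def predict(x, w, b):
    """Mapping function: x is the input, w and b the parameters,
    and the return value is the output (the prediction)."""
    return w * x + b


def train(xs, ys, lr=0.05, epochs=2000):
    """Adjust the parameters so predictions move closer to the known outputs."""
    w, b = 0.0, 0.0  # the parameters start at arbitrary values
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
```

The inputs and known outputs stay fixed throughout; only `w` and `b` change as the model learns, ending close to 2 and 1 for this data.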