Learning From Data Flashcards
What is Structured Data?
Data that is organised in a predefined schema, for example data stored in a Relational Database.
What is Semi-Structured Data?
Data that has some structure, but not fixed rows and columns. For example: JSON or XML documents, or records in a NoSQL document store.
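As a rough illustration of "some structure, but not fixed rows or columns", the sketch below parses two JSON records that do not share the same fields. The records, field names and values are made up for this example.

```python
import json

# Hypothetical records: both are valid JSON objects, but they do not
# share a fixed set of fields, and one of them contains nested data.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Grace", "tags": ["admin", "staff"], "address": {"city": "Leeds"}}
]
"""

records = json.loads(raw)

# Collect the field names used by each record to show that the structure varies.
keys_per_record = [sorted(record.keys()) for record in records]
all_keys = sorted(set().union(*keys_per_record))

for keys in keys_per_record:
    print(keys)
print(all_keys)
```

Forcing these records into one table would mean a column for every key in all_keys, with gaps wherever a record lacks that field.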
What is Unstructured Data?
Data without a predefined structure, which doesn’t fit neatly in tables. For example: text, images, videos or PDFs.
What is Data Integration?
The practice of combining data from different sources into a single, coherent data store.
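What is a Common User Interface?
A manual approach to data integration: users access all the relevant sources through one interface and combine the data themselves. Controlled, but not scalable.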
What is Middleware Data Integration?
Uses middleware software to bridge and facilitate communication between different systems. Consistent, but needs maintenance.
What is Application-based integration?
Software applications locate, retrieve and integrate data by making data from different sources and systems compatible with one another.
What is uniform data access?
Provides a consistent view of data from diverse sources while leaving the data in its original location (virtual integration, with no moving or altering of data). Lighter, but may affect data integrity.
What is Common data storage?
Retrieves and presents data uniformly while also creating and storing a duplicate copy, often in a central repository. Great for analysis, but expensive.
What is the difference between supervised and unsupervised learning?
Supervised learning uses data with labelled outcomes, while unsupervised learning algorithms use data without labelled outcomes.
What is supervised learning?
Learning a mapping function from inputs x to outputs y, where x is the features and y is a label or target.
What is unsupervised learning?
Learning that aims to make sense of unlabelled data and uncover patterns in it.
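A minimal sketch of the contrast, using made-up one-dimensional data. The supervised half uses a 1-nearest-neighbour rule and the unsupervised half splits the values at the largest gap. Both are simple stand-ins chosen for illustration, not methods named in these cards.

```python
# Supervised: hypothetical labelled examples, each a (feature x, label y) pair.
labelled = [(1.0, "low"), (1.2, "low"), (4.8, "high"), (5.1, "high")]

def predict_1nn(x_new):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest_x, nearest_y = min(labelled, key=lambda pair: abs(pair[0] - x_new))
    return nearest_y

# Unsupervised: the same features, but with no labels attached.
unlabelled = [1.0, 1.2, 4.8, 5.1]

def split_at_largest_gap(values):
    """Group sorted values into two clusters by cutting at the widest gap."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

print(predict_1nn(4.5))
print(split_at_largest_gap(unlabelled))
```

The supervised function can only answer because labels were supplied. The unsupervised one finds two groups but has no names for them.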
In supervised learning, what are features?
The input variables (columns in the dataset).
What are model parameters?
Values that the model learns (e.g., coefficients in linear regression).
What’s the difference between ŷ and y?
ŷ is the model’s prediction; y is the true observed value.
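To make the distinction concrete, the sketch below computes predictions from an assumed line and compares them with observed values. The slope, intercept and data points are hypothetical.

```python
# Hypothetical model parameters for a line: prediction = a * x + b.
a, b = 2.0, 1.0

xs = [1.0, 2.0, 3.0]   # features
ys = [3.5, 4.5, 7.0]   # true observed values (y)

# Predictions (y-hat): what the model outputs for each x.
y_hats = [a * x + b for x in xs]

# Residuals: the gap between each observed value and its prediction.
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]

print(y_hats)     # [3.0, 5.0, 7.0]
print(residuals)  # [0.5, -0.5, 0.0]
```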
When is Regression used in Supervised Learning?
When the target is a quantitative value.
When is Classification used in Supervised Learning?
When the target is qualitative or a class.
What is the equation for a (the slope) in ax+b for linear regression?
a = Σ((x - x̄)(y - ȳ)) / Σ((x - x̄)²)
What is the equation for b (the intercept) in ax+b for linear regression?
b = ȳ - ax̄
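The slope and intercept formulas can be checked with a short sketch. The function name fit_line and the sample data are assumptions made for this example. The data lie exactly on y = 2x + 1, so the fit should recover a = 2 and b = 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y

    # Numerator: sum of (x - x_bar)(y - y_bar); denominator: sum of (x - x_bar)^2.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    a = s_xy / s_xx        # slope
    b = y_bar - a * x_bar  # intercept
    return a, b

# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

The denominator is zero when every x is identical, so this sketch assumes at least two distinct x values.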