Learning From Data Flashcards
(407 cards)
What is Structured Data?
Data that is organised in a predefined schema, for example data stored in a Relational Database.
What is Semi-Structured Data?
Data that has some structure, but not in fixed rows and columns. For example: JSON, XML or documents in a NoSQL store.
What is Unstructured Data?
Data without a predefined structure, which doesn’t fit neatly in tables. For example: text, images, videos or PDFs.
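As a rough illustration of the three categories, the sketch below holds similar information in each form. The records, field names and review text are invented for the example.

```python
import json

# Structured: every row follows the same predefined schema (id, name, age).
columns = ("id", "name", "age")
rows = [
    (1, "Ada", 36),
    (2, "Alan", 41),
]

# Semi-structured: JSON records carry their own keys, and the keys may
# differ from one record to the next.
raw_json = (
    '[{"id": 1, "name": "Ada", "tags": ["maths"]},'
    ' {"id": 2, "name": "Alan", "age": 41}]'
)
records = json.loads(raw_json)

# Unstructured: free text with no fields at all.
review = "Great course, although the second week felt rushed."

# Every structured row has exactly one value per column.
rows_fit_schema = all(len(row) == len(columns) for row in rows)

# The semi-structured records do not all share the same set of keys.
key_sets = {frozenset(record) for record in records}
records_share_keys = len(key_sets) == 1

print(rows_fit_schema, records_share_keys)  # True False
```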
What is Data Integration?
The practice of combining data from different sources into a single, coherent data store.
What is Common User Interface?
A manual, controlled form of data integration that does not scale.
What is Middleware Data Integration?
Uses middleware software to bridge and facilitate communication between different systems. Consistent, but needs maintenance.
What is Application-based integration?
Software applications locate, retrieve and integrate data by making data from different sources and systems compatible with one another.
What is uniform data access?
Provides a consistent view of data from diverse sources without moving or altering it; the data stays in its original location. This virtual integration is lighter, but may affect data integrity.
What is Common data storage?
Retrieves and presents data uniformly while also storing a duplicate copy, often in a central repository. Great for analysis, but expensive.
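A minimal sketch of how uniform data access differs from common data storage, assuming two hypothetical in-memory sources. The source names, field names and helper functions are all invented for illustration; a real system would sit on top of databases or APIs.

```python
# Two hypothetical sources describing customers with different field names.
crm_source = [{"customer_id": 1, "full_name": "Ada"}]
billing_source = [{"id": 2, "name": "Alan"}]


def from_crm(record):
    """Map a CRM record onto the shared (id, name) layout."""
    return {"id": record["customer_id"], "name": record["full_name"]}


def from_billing(record):
    """Billing records already use the shared layout; copy them as they are."""
    return {"id": record["id"], "name": record["name"]}


def uniform_view():
    """Uniform data access: read from the sources on demand, store nothing."""
    for record in crm_source:
        yield from_crm(record)
    for record in billing_source:
        yield from_billing(record)


def build_central_store():
    """Common data storage: copy every record into one central list."""
    return list(uniform_view())


central_store = build_central_store()

# A later change at a source shows up in the view straight away, while the
# stored copy stays as it was until it is rebuilt.
crm_source.append({"customer_id": 3, "full_name": "Grace"})

print(len(list(uniform_view())))  # 3
print(len(central_store))         # 2
```

The stored copy is convenient to analyse because it lives in one place, but it costs extra storage and has to be refreshed.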
What is the difference between supervised and unsupervised learning?
Supervised learning uses data with labelled outcomes, while unsupervised learning algorithms use data without labelled outcomes.
What is supervised learning?
Learning a mapping function from inputs x to outputs y, where x is the features and y is the label or target.
What is unsupervised learning?
Learning that aims to make sense of unlabelled data and uncover patterns in it.
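One way to picture the difference, as a toy sketch rather than a recommended method: the supervised part uses a 1-nearest-neighbour rule on labelled points, and the unsupervised part splits unlabelled points at their widest gap. Both techniques, the data and the function names are assumptions chosen for the example.

```python
# Supervised: each input x comes with a labelled outcome y.
labelled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.5, "large")]


def predict_nearest(x_new):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - x_new))
    return label


# Unsupervised: the same inputs, but with no labels attached.
unlabelled = [1.0, 1.2, 8.0, 8.5]


def split_at_largest_gap(values):
    """Group sorted 1-D values into two clusters at the widest gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


print(predict_nearest(7.0))              # large
print(split_at_largest_gap(unlabelled))  # ([1.0, 1.2], [8.0, 8.5])
```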
In supervised learning, what are features?
The input variables (columns in the dataset).
What are model parameters?
Values that the model learns (e.g., coefficients in linear regression).
What’s the difference between ŷ and y?
ŷ is the model’s prediction; y is the true observed value.
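A small sketch tying features, parameters, predictions and observed values together. The numbers are made up, and the parameters are set by hand here rather than learned.

```python
# Features x (the inputs) and the observed targets y; values are made up.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]

# Model parameters. A learning algorithm would normally estimate these from
# the data; they are fixed by hand here to keep the example short.
beta_1 = 2.0  # slope
beta_0 = 0.0  # intercept

# y_hat holds the model's predictions; y holds what was actually observed.
y_hat = [beta_1 * x_i + beta_0 for x_i in x]

print(y_hat)  # [2.0, 4.0, 6.0]
```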
When is Regression used in Supervised Learning?
When the target is a quantitative value.
When is Classification used in Supervised Learning?
When the target is qualitative or a class.
What is the equation for β₁ (the slope) in β₁x + β₀ for OLS linear regression?
β₁ = Σ((xᵢ - x̄)(yᵢ - ȳ)) / Σ((xᵢ - x̄)²)
What is the equation for β₀ (the intercept) in β₁x + β₀ for OLS linear regression?
β₀ = ȳ − β₁x̄
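The two formulas for β₁ and β₀ can be sketched in plain Python as follows. The function name `fit_ols` and the sample data are assumptions made for the example; the points lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_ols(x, y):
    """Return (beta_0, beta_1) for the line y_hat = beta_1 * x + beta_0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of (x - x_bar)(y - y_bar) over sum of (x - x_bar)^2.
    numerator = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    denominator = sum((x_i - x_bar) ** 2 for x_i in x)
    # The denominator is zero when every x is identical; no slope exists then.
    beta_1 = numerator / denominator
    # Intercept: y_bar - beta_1 * x_bar.
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

beta_0, beta_1 = fit_ols(x, y)
print(beta_0, beta_1)  # 1.0 2.0
```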
What is the error in OLS regression?
The difference between actual value (y) and predicted value (ŷ): Error = y − ŷ
What is the Sum of Squared Errors (SSE)?
The sum of all squared differences between predicted and actual values.
How is Mean Squared Error (MSE) calculated?
MSE = SSE divided by the number of observations.
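Error, SSE and MSE build on one another, which a short sketch with made-up observations and predictions can show.

```python
# Made-up observed values and model predictions.
y = [3.0, 5.0, 7.0]
y_hat = [2.0, 5.0, 9.0]

# Error for each observation: actual minus predicted.
errors = [y_i - y_hat_i for y_i, y_hat_i in zip(y, y_hat)]

# SSE: the sum of the squared errors.
sse = sum(e ** 2 for e in errors)

# MSE: SSE divided by the number of observations.
mse = sse / len(y)

print(errors)  # [1.0, 0.0, -2.0]
print(sse)     # 5.0
```

Here the MSE is 5/3, roughly 1.67.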
What does R² (Coefficient of Determination) tell us?
The proportion of total variation in the dependent variable explained by the model.
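R² is commonly computed as 1 − SSE / SST, where SST (a name introduced here, not used elsewhere in this deck) is the total sum of squared deviations of y from its mean. A sketch with made-up values:

```python
# Made-up observed values and model predictions.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]

y_bar = sum(y) / len(y)

# SSE: variation left unexplained by the model.
sse = sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(y, y_hat))

# SST: total variation of y around its mean.
sst = sum((y_i - y_bar) ** 2 for y_i in y)

# R squared: the share of the total variation that the model explains.
r_squared = 1 - sse / sst

print(sse, sst)  # 1.0 5.0
```

With SSE = 1.0 and SST = 5.0, R² comes out at about 0.8, so the predictions account for roughly 80% of the variation in y.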
What is the goal of Ordinary Least Squares (OLS)?
To minimize the sum of squared prediction errors.
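For the single-feature line ŷ = β₁x + β₀, this goal leads to the slope and intercept formulas. The derivation below is a standard sketch; the symbol S for the sum of squared errors is introduced here for convenience.

```latex
% Objective: the sum of squared prediction errors.
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Setting the partial derivative with respect to \beta_0 to zero:
\frac{\partial S}{\partial \beta_0}
  = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
  \quad\Longrightarrow\quad
  \beta_0 = \bar{y} - \beta_1 \bar{x}

% Setting the partial derivative with respect to \beta_1 to zero
% and substituting \beta_0 = \bar{y} - \beta_1 \bar{x}:
\frac{\partial S}{\partial \beta_1}
  = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
  \quad\Longrightarrow\quad
  \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                 {\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The last step uses the fact that Σ(xᵢ − x̄) = 0 and Σ(yᵢ − ȳ) = 0, so xᵢ can be replaced by (xᵢ − x̄) in both sums.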