Lecture 1 Flashcards
Data
A set of discrete, objective facts about events
Dataset
a collection of data with a defined structure
Data point
a single instance in the dataset
Attribute
A single property of the dataset
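Example (a minimal sketch, not from the lecture): one way to picture the three terms above is a list of records. The borrower names, credit scores, and interest rates are made-up illustration values.

```python
# Hypothetical toy dataset: the whole list is the dataset,
# each dict is one data point, and each key is one attribute.
dataset = [
    {"borrower": "A", "credit_score": 500, "interest_rate": 9.0},
    {"borrower": "B", "credit_score": 600, "interest_rate": 8.0},
    {"borrower": "C", "credit_score": 700, "interest_rate": 7.0},
]

# A data point: a single instance (row) in the dataset.
data_point = dataset[0]

# An attribute: a single property, read here across every data point.
credit_scores = [row["credit_score"] for row in dataset]
```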
Data science
a collection of techniques used to extract value from data
process of building a representative model that fits the observational data
Model
a representation of the relationship between variables in a dataset
Modeling
process in which a representative abstraction is built from the observed dataset
A data science model serves two purposes
- it predicts the output (for example, an interest rate) from a new, unseen set of input variables
- it can be used to understand the relationship between the output variable and all the input variables
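Example (a minimal sketch, assuming a straight-line model fitted by ordinary least squares, which the lecture does not prescribe): the fitted line predicts an interest rate for an unseen credit score, and its slope describes the relationship between the two variables. The function name `fit_line` and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations: input variable and output variable.
credit_scores = [500, 600, 700, 800]
interest_rates = [9.0, 8.0, 7.0, 6.0]

slope, intercept = fit_line(credit_scores, interest_rates)

# Purpose 1: predict the output for a new, unseen input.
predicted_rate = slope * 650 + intercept

# Purpose 2: understand the relationship. A negative slope means the
# interest rate falls as the credit score rises in this toy data.
rate_falls_with_score = slope < 0
```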
Techniques used in the steps of a data science process
- descriptive statistics, exploratory visualization, dimensional slicing, hypothesis testing, data engineering, business intelligence
Supervised model
supervised data science infers a function or relationship from labeled training data and uses this function to map new, unlabeled data to outputs
Unsupervised data science
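Example (a minimal sketch, assuming a 1-nearest-neighbor rule as the learned function; the lecture names no specific algorithm): labeled examples are used to assign a label to a new, unlabeled point. The function name, feature values, and labels are hypothetical.

```python
def nearest_neighbor_predict(training_data, new_point):
    """Return the label of the labeled example closest to new_point."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, closest_label = min(
        training_data,
        key=lambda example: squared_distance(example[0], new_point),
    )
    return closest_label

# Labeled training data: (input features, known label). Values are made up.
labeled = [
    ((1.0, 1.0), "low risk"),
    ((1.2, 0.8), "low risk"),
    ((4.0, 4.2), "high risk"),
    ((4.5, 3.9), "high risk"),
]

# Map a new, unlabeled data point to a label.
prediction = nearest_neighbor_predict(labeled, (4.1, 4.0))
```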
uncovers hidden patterns in unlabeled data
Classification and regression techniques
predicting a target variable based on input variables
Clustering
the process of identifying the natural groupings in a dataset
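Example (a minimal sketch, assuming one-dimensional k-means with hand-picked starting centroids; the lecture does not name a clustering algorithm): values are grouped around the nearest centroid, then each centroid moves to the mean of its group. The function name and data are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around centroids by repeated assign-and-update."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up values with two visible groupings.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(values, centroids=[1.0, 10.0])
```

No labels are supplied here; the groupings come only from the values themselves.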
Recommendation engines
systems that recommend items to users based on individual user preferences
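Example (a minimal sketch, assuming a simple "most similar user" approach with Jaccard similarity, which is only one of many possible designs): a user is recommended the items that the most similar other user liked. The function name, user names, and item names are hypothetical.

```python
def recommend(target_user, preferences):
    """Recommend items liked by the user most similar to target_user."""
    target_items = preferences[target_user]

    def jaccard(a, b):
        # Share of items two users have in common, from 0.0 to 1.0.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    others = [user for user in preferences if user != target_user]
    most_similar = max(others, key=lambda u: jaccard(target_items, preferences[u]))
    # Suggest only items the target user has not already chosen.
    return sorted(preferences[most_similar] - target_items)

# Made-up preference data: user -> set of liked items.
preferences = {
    "ana": {"item_a", "item_b", "item_c"},
    "ben": {"item_a", "item_b", "item_d"},
    "cat": {"item_e"},
}

suggestions = recommend("ana", preferences)
```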
Anomaly or outlier detection
identifies the data points that are significantly different from other data points in a dataset
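Example (a minimal sketch, assuming a z-score rule with a threshold of 2 standard deviations; both the rule and the threshold are illustrative choices, not from the lecture): a value is flagged when it lies far from the mean relative to the spread of the data. The function name and readings are hypothetical.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up readings: seven similar values and one that is very different.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(readings)
```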