1: Intro to ML Flashcards
What is meant by Machine Learning?
Building a mathematical model of the relationship between inputs and outputs, then using that model to predict outputs for new inputs or to generate insights about new data.
What are parameters?
Constants that define specific characteristics of the system (e.g., growth rate, decay constant).
What does ML use?
Machine learning uses a computing device to process data and integrates many distinct mathematical tools such as probability and statistics, optimization, and control theory.
What are the possible outcomes in any dataset (generally speaking)?
Either continuous or discrete.
What kind of model is used for continuous outputs?
Regression.
What kind of model is used for discrete (finite, categorical) outputs?
Classification.
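The contrast between the two cards above can be sketched in a few lines. This is a minimal illustration on made-up data (hours studied versus exam result), not a recommended modelling approach; the helper names `fit_line`, `fit_centroids`, and `classify` are hypothetical.

```python
# Toy data, invented for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]            # continuous target: regression
passed = ["fail", "fail", "fail", "pass", "pass"]  # discrete target: classification


def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, labels):
    """Nearest-centroid classifier: remember the mean input of each class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def classify(centroids, x):
    """Return the class whose mean input is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6.0 + intercept   # a real number

centroids = fit_centroids(hours, passed)
predicted_label = classify(centroids, 6.0)  # one label from a finite set
```

The regression model returns a number on a continuous scale, while the classifier can only return one of the labels it saw during training.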
What ML techniques are used to discover the hidden patterns within the data (i.e., there are no target outputs provided)?
Clustering Analysis.
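As a rough sketch of clustering, the snippet below runs k-means on one-dimensional data. K-means is only one of many clustering techniques and is chosen here for brevity; the function name `kmeans_1d`, the data, and the starting centers are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# No target outputs are given: only the inputs themselves.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

The algorithm never sees a label; the two groups emerge purely from how the values are distributed.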
What are the four major machine learning paradigms?
Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
What is unsupervised learning about?
Discovering hidden patterns in the input data. Clustering is an important sub-domain.
What is supervised learning about?
Learning tasks where both the data inputs and the corresponding target outputs are provided. It includes classification and regression approaches.
What is semi-supervised learning about?
Covers problems where only partial label information exists. A basic classification model is designed on the few labeled data instances, which is called the semi-supervised classification step. A semi-supervised clustering step is then performed, where the model is tuned up to operate without supervision on the remaining large unlabeled data instances, and assigns them to the classes from the first step.
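The two steps described above can be sketched as follows. This is a simplified stand-in that uses a nearest-class-mean rule for both steps; the data, the class names, and the helpers `class_means` and `nearest_class` are hypothetical.

```python
# A few labeled instances and a larger pool of unlabeled ones (made-up values).
labeled = [(1.0, "low"), (2.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.5, 8.0, 10.0]


def class_means(pairs):
    """Mean input value per class."""
    groups = {}
    for x, label in pairs:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def nearest_class(means, x):
    """Class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


# Step 1: build a basic classifier from the few labeled instances.
means = class_means(labeled)

# Step 2: assign each unlabeled instance to one of the classes from step 1.
pseudo_labeled = [(x, nearest_class(means, x)) for x in unlabeled]

# Optionally refine the class means using the newly assigned instances.
refined_means = class_means(labeled + pseudo_labeled)
```

Only three instances carry labels here, yet all seven end up assigned to a class.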
What is reinforcement learning about?
The learning setup where an agent must find an action policy that achieves a given goal. It follows a "cause and effect" method: a reward function acts as feedback to the agent.
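A minimal sketch of the reward-as-feedback idea, assuming a hypothetical two-action environment with fixed rewards. The agent here acts greedily on optimistic starting estimates, which is one simple strategy among many and not the only way to learn a policy.

```python
def reward(action):
    """Hypothetical environment: action 1 pays 1.0, action 0 pays nothing."""
    return 1.0 if action == 1 else 0.0


# Optimistic starting guesses make the agent try every action at least once.
estimates = [5.0, 5.0]
counts = [0, 0]
total_reward = 0.0

for step in range(10):
    # Policy: pick the action with the highest estimated reward so far.
    action = max(range(len(estimates)), key=lambda a: estimates[a])
    r = reward(action)  # feedback from the environment
    counts[action] += 1
    # Running average of the rewards observed for this action.
    estimates[action] += (r - estimates[action]) / counts[action]
    total_reward += r

best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which action is correct. It only observes the reward that follows each choice and adjusts its estimates accordingly.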
What are the two datasets used for building the model?
The training set, used to develop the classification model, and the testing set, used to evaluate the accuracy of the developed model.
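One way to produce the two sets is a shuffled split, sketched below. The function name `train_test_split`, the 25% test fraction, and the seed are assumptions for this example; libraries such as scikit-learn provide their own version with more options.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))  # stand-in for 20 data instances
train, test = train_test_split(rows)
```

Every instance lands in exactly one of the two sets, so the model is evaluated on data it did not see while it was being developed.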
What questions do precision and recall answer?
Precision: Of all the predicted positive cases, how many are actually positive?
Recall: Of all the actual positive cases, how many did the model identify correctly?
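Both questions reduce to counting true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below computes them on made-up labels; the function name `precision_recall` and the data are illustrative only.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives
    return precision, recall


actual = [1, 1, 1, 1, 1, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 1, 0, 0]
precision, recall = precision_recall(actual, predicted)
```

Here the model predicts four positives, of which three are correct (precision 0.75), and it finds three of the five actual positives (recall 0.6).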
How do you validate the best classification model?
Several techniques may be tested in parallel, and the technique that returns the highest evaluation performance is selected.
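The selection step can be sketched as scoring each candidate on the same held-out data and keeping the top scorer. The candidate rules, the data, and the choice of accuracy as the evaluation metric are all assumptions made for this example.

```python
def accuracy(model, inputs, labels):
    """Fraction of instances the model labels correctly."""
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


# Hypothetical candidate techniques, each a simple rule for illustration.
candidates = {
    "threshold_at_3": lambda x: 1 if x >= 3 else 0,
    "threshold_at_5": lambda x: 1 if x >= 5 else 0,
    "always_positive": lambda x: 1,
}

# Held-out data shared by every candidate (made-up values).
test_inputs = [1, 2, 4, 5, 6, 7]
test_labels = [0, 0, 0, 1, 1, 1]

scores = {
    name: accuracy(model, test_inputs, test_labels)
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Scoring every candidate on identical data keeps the comparison fair. Another metric, such as precision or recall, could replace accuracy when the cost of each kind of error differs.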