V: Machine Learning Flashcards
What is machine learning?
Machine learning (ML) refers to systems that build models from data sets. It involves training a model on a dataset to identify patterns and relationships, allowing it to generalize and make predictions on unseen data. Machine learning techniques include supervised learning, where the model learns from labeled examples, unsupervised learning, where the model identifies patterns without labeled data, and reinforcement learning, where the model learns by interacting with an environment and receiving feedback. Machine learning has applications in various domains, including image recognition, natural language processing, and predictive analytics. ML is not learning in the human sense, which involves “interacting with the surroundings and acquiring new knowledge and experiences to, thereby, modify behavior and values”.
What is data mining?
Data mining is the process of sorting and categorizing already available information in order to extract useful knowledge from it. It is similar to machine learning in that both work with data sets and both aim to discover patterns and relationships in large data volumes.
Explain the seven steps in the development process for machine learning:
Collect data
Prepare the data set: label the data, make sure it meets an acceptable quality level, and make sure there is a sufficient amount of it.
Choose an algorithm
Train the model: methods and algorithms are applied. Depending on the nature of the data set and the intended purpose, machine learning systems use one of several types of learning: supervised, unsupervised and reinforcement learning.
Evaluate the model
Tune the model (adjust its parameters)
Make predictions
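The seven steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, the algorithm (a line through the origin with an L2 penalty) is an arbitrary choice, and the names `fit`, `mse` and `lam` are made up for this example. The tuning step is interpreted here as choosing a model setting on a validation split.

```python
# 1. Collect data: synthetic (x, y) pairs that follow y = 3x plus a small,
#    deterministic wobble (assumed data, for illustration only).
data = [(float(x), 3.0 * x + (0.5 if x % 2 else -0.5)) for x in range(1, 13)]

# 2. Prepare the data set: split it into training, validation and test parts.
train = [row for i, row in enumerate(data) if i % 4 in (0, 1)]
val = [row for i, row in enumerate(data) if i % 4 == 2]
test = [row for i, row in enumerate(data) if i % 4 == 3]

# 3. Choose an algorithm: a line through the origin, y = w * x, with an
#    L2 penalty `lam` on w (one-dimensional ridge regression).
def fit(rows, lam):
    """Return the weight w that minimizes squared error plus lam * w**2."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

def mse(w, rows):
    """Mean squared error of the model y = w * x on the given rows."""
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)

# 4. Train and 6. tune: fit one model per candidate penalty and keep the
#    penalty whose model has the lowest error on the validation split.
candidates = [0.0, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: mse(fit(train, lam), val))
w = fit(train, best_lam)

# 5. Evaluate the model on data it has never seen.
test_error = mse(w, test)

# 7. Make predictions for a new input.
prediction = w * 20.0
print(best_lam, test_error, prediction)
```

The test split is touched only once, after tuning, so the reported error is not influenced by the choice of `lam`.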
When was machine learning invented? Give an example of machine learning.
The term was coined in 1959 by Arthur Samuel. The popularity of machine learning has fluctuated over time. During the 1950s, several researchers worked with machine learning and applied it to decision making. During the 1990s, machine learning shifted from a knowledge-driven to a data-driven approach: computers began analyzing large amounts of data to draw conclusions or learn from the results. In the 2010s, deep learning and machine learning enjoyed a real boom.
Example: machine learning can be used to find patterns in big data sets, such as satellite images, and to find strategies that have never been tried before.
What is underfitting and overfitting?
Underfitting and overfitting are two problems plaguing ML systems. Both concern how accurately a model represents a data set. Underfitting occurs when the model cannot fit the data and thus does not adequately and accurately represent the data set. This typically happens when the model is too simple or the data set is too small, which makes the model unable to capture the relationships between the input examples and the target values.
Overfitting happens when the model matches the data in the training set too closely, including its noise. Such a model will not generalize to new data: it only recognizes the data it has already seen in training, so it performs poorly on unseen examples.
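The difference can be made concrete by comparing training error with error on unseen data. The sketch below uses assumed toy data (true relationship y = 2x, with a +1/-1 wobble standing in for noise in the training targets) and three hypothetical models: one that is too simple, one that fits reasonably, and one that just memorizes the training set.

```python
def mse(predict, rows):
    """Mean squared error of a prediction function on (x, y) rows."""
    return sum((predict(x) - y) ** 2 for x, y in rows) / len(rows)

# Assumed toy data: training targets are noisy, test targets are the true values.
train = [(float(i), 2.0 * i + (1.0 if i % 2 == 0 else -1.0)) for i in range(10)]
test = [(i + 0.25, 2.0 * (i + 0.25)) for i in range(9)]

mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)

# Underfitting: a constant model ignores the input entirely.
def constant_model(x):
    return mean_y

# Reasonable fit: an ordinary least-squares line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

def linear_model(x):
    return slope * x + intercept

# Overfitting: return the stored target of the closest training point,
# so the noise in the training set is reproduced exactly.
def memorizing_model(x):
    return min(train, key=lambda row: abs(row[0] - x))[1]

models = {
    "constant": constant_model,
    "linear": linear_model,
    "memorizing": memorizing_model,
}
results = {name: (mse(m, train), mse(m, test)) for name, m in models.items()}
for name, (train_error, test_error) in results.items():
    print(f"{name:>10}: train={train_error:.3f} test={test_error:.3f}")
```

The constant model has a high error on both sets (underfitting). The memorizing model has zero training error but a clearly higher test error than the line (overfitting).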
What is a classification task?
The task involves training a model using labeled data, where each input example is associated with a known class. The trained model learns patterns and relationships in the data and develops a decision boundary or rule to classify new, unseen data points into the appropriate classes. Classification algorithms include decision trees, logistic regression, support vector machines, and neural networks. The output of a classification task is a predicted class label for each input instance, enabling the model to classify new data based on learned patterns.
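As a minimal sketch of learning a decision rule from labeled data, the example below trains a one-level decision tree (a "decision stump") on a single numeric feature. The data (hours studied, passed or not) and the function names `train_stump` and `classify` are assumptions made for this illustration, not part of any library.

```python
def train_stump(examples):
    """Learn a threshold rule from (value, label) pairs with labels 0 or 1.

    Returns (threshold, label_above): values above the threshold are
    classified as label_above, all other values as the opposite label.
    """
    values = sorted({value for value, _ in examples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None  # (errors, threshold, label_above)
    # Candidate thresholds are the midpoints between neighbouring values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for label_above in (0, 1):
            errors = sum(
                1
                for value, label in examples
                if (label_above if value > threshold else 1 - label_above) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]

def classify(stump, value):
    """Predict the class label of a new, unseen value."""
    threshold, label_above = stump
    return label_above if value > threshold else 1 - label_above

# Hypothetical labeled data: (hours studied, 1 = passed / 0 = failed).
examples = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
stump = train_stump(examples)
print(stump, classify(stump, 2.5), classify(stump, 7.5))
```

Here the learned threshold plays the role of the decision boundary, and `classify` outputs one predicted class label per input.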
What are the different machine learning types? When are they used?
Supervised learning is used when the categories (labels) are available. It is about mapping an input to an output. In classification the outcome is one of two or more classes, that is, a discrete value (such as 0 or 1). It can also be used for regression problems, where the outcome is a quantity.
Unsupervised learning is used when the data set lacks categories. The system learns the structure or distribution of the data set without knowing the corresponding output variables. It can group the data into clusters, and this clustering is a great advantage of unsupervised learning. It is also suited to association rule problems, that is, discovering rules that describe the data.
Reinforcement learning is used for making a sequence of decisions. It works by trial and error, with rewards and penalties, in order to reach a solution.
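The trial-and-error idea behind reinforcement learning can be sketched with a very small example, an epsilon-greedy "multi-armed bandit". Everything here is an assumption for illustration: the three actions, their average rewards, the noise range and the function name `run_bandit`. Real reinforcement learning problems usually also involve states, which this sketch leaves out.

```python
import random

def run_bandit(true_means, steps=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action gives the highest reward."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions       # how often each action was tried
    estimates = [0.0] * n_actions  # running average reward per action
    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once at the start
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            # exploit: pick the action that looks best so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment answers with a noisy reward (feedback).
        reward = true_means[action] + rng.uniform(-0.1, 0.1)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

true_means = [0.2, 0.5, 0.8]  # assumed average reward of each action
estimates, counts = run_bandit(true_means)
best_action = max(range(len(true_means)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is best. It finds out from the rewards it receives, and over time it chooses the best action most often.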
What is a K-nearest neighbors algorithm?
The K-nearest neighbors (KNN) algorithm is different from other algorithms since it does not create a model. Instead, it is an instance-based learning algorithm: it stores the labeled training examples and classifies a new example by looking at the k stored examples closest to it and choosing the most common class among them. This makes it useful for assigning a label to an example whose label is missing and which is difficult to classify with explicit rules.
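A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote (the function name `knn_predict` and the toy points are made up for this example):

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    training_data is a list of (point, label) pairs, where each point is a
    tuple of numbers with the same length as `query`.
    """
    # No model is built: the stored examples are simply sorted by distance.
    neighbors = sorted(
        training_data, key=lambda item: math.dist(item[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples forming two groups, "A" and "B".
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_predict(training_data, (1.8, 1.6)))
print(knn_predict(training_data, (6.2, 6.4)))
```

There is no training step here. All the work happens at prediction time, which is why KNN is often called a lazy learner.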