Data Science Algorithms Flashcards

Question 1

Q

Linear Regression

Answer

A

Linear regression is a statistical method for modeling the relationship between one or more input variables (X) and a single continuous output variable (Y). It assumes that Y can be expressed as a linear combination of the inputs, plus an error term.
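
As a minimal sketch of the idea, the snippet below fits a straight line to one input variable with ordinary least squares. The function name `fit_line` and the toy data are illustrative assumptions, not part of the original card.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With several input variables the same idea applies, but the fit is usually computed with matrix methods rather than these scalar formulas.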

Question 2

Q

Logistic Regression

Answer

A

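Logistic regression is a classification algorithm used when the response variable is categorical, most commonly binary. It models the probability that an input belongs to a particular category by passing a linear combination of the features through the logistic (sigmoid) function.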
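
The prediction step for the binary case can be sketched as follows. The weights and bias are assumed to be already learned, and the names `predict_proba` and `predict_label` are hypothetical helpers chosen for this illustration.

```python
import math


def predict_proba(weights, bias, features):
    """Probability of the positive class: sigmoid of a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0
```

A score of zero maps to a probability of 0.5, so the default threshold puts the decision boundary where the linear score changes sign.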

Question 3

Q

Decision Trees

Answer

A

Decision trees recursively split the data into subsets based on conditions on the feature values, with each leaf of the tree giving a prediction. They are used in both classification and regression tasks and are easy to understand and interpret.
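
To illustrate how a single split might be chosen, the sketch below searches one numeric feature for the threshold with the lowest weighted Gini impurity. Gini impurity is one common splitting criterion and is an assumption here; the card itself does not name one. A full tree would apply this search recursively to each resulting subset.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, score) with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send data to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```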

Question 4

Q

Random Forest

Answer

A

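A Random Forest is an ensemble technique that trains many decision trees, each on a random sample of the data and features. It produces a robust prediction by combining the individual trees: averaging their outputs for regression, or taking a majority vote for classification.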
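
The aggregation step can be sketched as below. Each "tree" is assumed to be any callable that maps an input to a prediction; the three threshold rules are hypothetical stand-ins for trained trees, and the sketch leaves out how the trees themselves are trained.

```python
from collections import Counter


def forest_classify(trees, x):
    """Majority vote over the class predicted by each tree."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


def forest_regress(trees, x):
    """Average of the numeric predictions of each tree."""
    preds = [tree(x) for tree in trees]
    return sum(preds) / len(preds)


# Three hypothetical "trees", each a simple threshold rule
trees = [
    lambda x: "spam" if x > 3 else "ham",
    lambda x: "spam" if x > 5 else "ham",
    lambda x: "spam" if x > 4 else "ham",
]
label = forest_classify(trees, 4.5)  # two of the three rules vote "spam"
```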

Question 5

Q

Naive Bayes

Answer

A

Naive Bayes is a probabilistic classifier that applies Bayes’ theorem under the strong (naive) assumption that the features are conditionally independent given the class.
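
Under the independence assumption, the score for each class is the class prior multiplied by the per-feature likelihoods. A minimal sketch follows; the function name and all probabilities in the example are made-up illustrative values, not estimates from real data.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """priors: {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    observed: list of features present in the input.
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # Independence assumption: likelihoods simply multiply
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Made-up numbers for illustration only
priors = {"spam": 0.5, "ham": 0.5}
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.6},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["free"])
```

Practical implementations usually sum log-probabilities instead of multiplying, to avoid numerical underflow when there are many features.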

Question 6

Q

K-Nearest Neighbors (KNN)

Answer

A

KNN is a simple algorithm that stores all available cases and classifies a new case by a majority vote among its k most similar stored cases, where similarity is measured by a distance function (e.g., Euclidean distance).
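
A minimal sketch of the classification step, assuming Euclidean distance and a plain majority vote (the function name `knn_classify` and the default `k=3` are illustrative choices):

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest stored points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["a", "a", "a", "b", "b"]
```

There is no training phase here: all the work happens at query time, which is why KNN can be slow on large data sets.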

Question 7

Q

Support Vector Machines (SVM)

Answer

A

SVMs are a set of supervised learning methods used for classification and regression. For classification, they aim to find the hyperplane in an N-dimensional feature space (N being the number of features) that separates the classes with the largest possible margin.
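
The sketch below shows how a linear SVM would use a hyperplane once it has been found; it does not perform the training itself. The weight vector `w` and offset `b` are assumed to be given, and the helper names are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin boundaries w . x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Training amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points stay on the correct side.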

Question 8

Q

Principal Component Analysis (PCA)

Answer

A

PCA is a dimensionality reduction technique often used on large data sets. It transforms a large set of variables into a smaller set of uncorrelated variables (principal components) that still retains most of the variance in the original data.
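
As a rough sketch, the first principal component can be found as the dominant eigenvector of the covariance matrix. The version below uses power iteration, which is a simplification chosen to keep the example dependency-free; it assumes at least two points with nonzero variance, and the function names are illustrative.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (means, unit vector along the direction of greatest variance)."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centered data
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * dims
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return means, v


def project(points, means, component):
    """Reduce each point to a single coordinate along the component."""
    return [
        sum((p[d] - means[d]) * component[d] for d in range(len(component)))
        for p in points
    ]


# Points on the line y = x: all variance lies along the diagonal
data = [(0, 0), (1, 1), (2, 2), (3, 3)]
means, component = first_principal_component(data)
reduced = project(data, means, component)
```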

Question 9

Q

K-Means Clustering

Answer

A

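K-Means is an unsupervised learning algorithm used to partition a dataset into K clusters. Each observation belongs to the cluster with the nearest mean (centroid), and the algorithm alternates between assigning observations to clusters and recomputing the means.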
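
A minimal sketch of the two alternating steps, assuming Euclidean distance and initial centroids supplied by the caller (real implementations usually pick them randomly and stop once assignments no longer change):

```python
import math


def kmeans(points, centroids, iterations=10):
    """Return (centroids, clusters) after a fixed number of iterations."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]  # keep the old centroid if its cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```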

Question 10

Q

Hierarchical Clustering

Answer

A

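Hierarchical clustering is another unsupervised learning algorithm. It builds a hierarchy of nested clusters, either by repeatedly merging the most similar clusters (agglomerative) or by repeatedly splitting them (divisive), so that objects within a cluster are similar to each other and dissimilar to objects in other clusters.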
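
One common bottom-up form of this idea starts with every object in its own cluster and repeatedly merges the two closest clusters. The sketch below works on plain numbers and uses single linkage (distance between the closest members), both of which are simplifying assumptions for the example.

```python
def agglomerative(values, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Recording the order and distance of each merge, rather than stopping at a fixed count, gives the tree structure (dendrogram) that this family of methods is named for.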

Question 11

Q

Gradient Boosting Algorithms

Answer

A

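These are machine learning techniques for regression and classification problems that produce a prediction model in the form of an ensemble of weak prediction models, typically shallow decision trees. The weak models are added sequentially, each one fitted to correct the errors of the ensemble built so far.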
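
A small sketch of boosting for regression on a single feature, assuming squared-error loss, so that each weak model is fitted to the current residuals. The weak model here is a one-split "stump"; all names, the learning rate, and the number of rounds are illustrative assumptions, and the input needs at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regressor under squared error (a weak model)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.5):
    """Return (base prediction, learning rate, list of stumps)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Each new stump is fitted to what the ensemble still gets wrong
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base prediction and the scaled output of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= t else right) for t, left, right in stumps
    )


model = boost([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```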

Question 12

Q

Deep Learning Algorithms

Answer

A

These are a set of algorithms that use artificial neural networks with multiple layers between the input and output. They can model complex non-linear relationships and are particularly powerful for large-scale and high-dimensional data.
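
To make "multiple layers" concrete, the sketch below runs a forward pass through a tiny fully connected network with a ReLU non-linearity between layers. The weights are hand-picked for illustration rather than learned, and the function names are hypothetical.

```python
def relu(values):
    """Non-linearity: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] feeds output unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Apply each layer in turn, with ReLU after every layer but the last."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations


# Hand-picked weights: the hidden layer computes relu(x) and relu(-x),
# and the output layer adds them, giving the absolute value of x
layers = [
    ([[1.0], [-1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
```

The absolute value is not a linear function of its input, so even this two-layer example computes something that a single linear layer cannot.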

Question 13

Q

Algorithmic Process

Answer

A

At its core, an algorithm is a step-by-step procedure or set of rules to be followed in calculations or other problem-solving operations. It’s the logic behind the analysis and how the objective will be achieved.

Question 14

Q

Data Inputs

Answer

A

These are the datasets that the algorithm will use. The quality and nature of the data can significantly affect the outcome. This can include structured data (like numerical data in databases) or unstructured data (like text or images).

Question 15

Q

Data Processing

Answer

A

The data usually needs to be cleaned and preprocessed before the algorithm can use it. This can involve handling missing values, treating outliers, or scaling features into a range the algorithm can work with.
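
Two of these steps can be sketched for a single numeric column as follows. Missing values are assumed to be represented as `None`, and mean imputation and min-max scaling are only one possible choice for each step.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the present ones."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


column = impute_mean([1.0, None, 3.0])
scaled = min_max_scale(column)
```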

Question 16

Q

Analytical Model

Answer

A

This is the specific model that the algorithm uses to analyze the data. There are many types of models, including statistical models, machine learning models, deep learning models, etc.

Question 17

Q

Training Process

Answer

A

In machine learning algorithms, a model is trained on a subset of data, allowing it to learn patterns or relationships in the data. This training process is an iterative procedure where the model makes predictions and then adjusts its parameters based on the error of the predictions.
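
The predict-then-adjust loop can be sketched with gradient descent on a one-variable linear model. The model, the mean-squared-error loss, the learning rate, and the epoch count are all assumptions made for this illustration.

```python
def train(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y = w * x + b by repeatedly reducing the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1. Predict with the current parameters and measure the errors
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # 2. Gradient of the mean squared error with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # 3. Adjust the parameters a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data lying on y = 2x + 1
w, b = train([0, 1, 2, 3], [1, 3, 5, 7])
```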

Question 18

Q

Evaluation

Answer

A

After the model is trained, it’s important to evaluate its performance on unseen data. This is done using a variety of metrics, depending on the specific task.
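
Two common metrics and a simple way of holding data back can be sketched as follows. The function names and the 25% test fraction are illustrative assumptions; in practice the data is usually shuffled before it is split.

```python
def holdout_split(rows, test_fraction=0.25):
    """Keep the last part of the data unseen during training (no shuffling)."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def accuracy(y_true, y_pred):
    """Classification metric: share of predictions that match the truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Regression metric: average squared difference from the truth."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


train_rows, test_rows = holdout_split(list(range(8)))
```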

Question 19

Q

Prediction/Decision Making

Answer

A

Once the model has been trained and evaluated, it can be used to make predictions on new data or to support decision-making processes.

Question 20

Q

Iterative Improvement

Answer

A

As new data becomes available or as the model is used, the model can be retrained or updated, so that its predictions improve over time.

(20 cards)