Data Science Algorithms Flashcards

1
Q

Linear Regression

A

Linear regression is a statistical method for modeling the relationship between one or more input variables (X) and a single continuous output variable (Y). It assumes that Y is a linear function of the inputs, plus noise.
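
As a minimal sketch of fitting such a linear relationship, the block below computes the ordinary least squares line for a single input variable. The function name `fit_line` and the toy data are illustrative assumptions, not part of the card.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```

On this noise-free data the fit recovers a slope of 2 and an intercept of 1.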

2
Q

Logistic Regression

A

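Logistic regression is a classification algorithm used when the response variable is categorical, most commonly binary. It models the probability that an input belongs to a particular category by passing a linear combination of the features through the logistic (sigmoid) function.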
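
The probability model can be sketched for the binary, one-feature case as follows. The sketch assumes plain gradient descent on the log-loss; the learning rate, epoch count, and toy data are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in xs]
        # Gradient of the mean log-loss with respect to w and b
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data: small x values are class 0, large x values are class 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
p_low = sigmoid(w * 0.0 + b)   # estimated probability of class 1 at x = 0
p_high = sigmoid(w * 5.0 + b)  # estimated probability of class 1 at x = 5
```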

3
Q

Decision Trees

A

Decision trees recursively split the data into subsets based on conditions on the feature values, such as thresholds. They are used in both classification and regression tasks and are easy to understand and interpret.
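
A single split of this kind can be sketched as a one-level tree (a "stump"). The sketch assumes one numeric feature and uses Gini impurity to score candidate thresholds; the card does not name an impurity measure, so that choice is an assumption.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels in the subset agree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Try midpoints between sorted feature values; return (threshold, impurity)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Size-weighted impurity of the two subsets
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy data (made up): class "a" has small values, class "b" has large ones
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
```

A full tree would apply the same search recursively to each resulting subset.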

4
Q

Random Forest

A

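A Random Forest is an ensemble technique that trains many decision trees, each on a random sample of the data. It produces a robust prediction by combining the predictions of the individual trees: averaging them for regression, or taking a majority vote for classification.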
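
The ensemble step can be sketched without training any trees. The helpers below are hypothetical names that show bootstrap sampling (one common way to give each tree a different view of the data) and how individual tree outputs are combined. For classification, forests commonly take a majority vote rather than an average.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree would train on one such sample."""
    return rng.choices(rows, k=len(rows))


def aggregate_classification(tree_votes):
    """Majority vote over the class labels predicted by the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Average of the numeric predictions of the individual trees."""
    return mean(tree_outputs)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [1, 2, 3, 4, 5]
sample = bootstrap_sample(rows, rng)
# Made-up outputs standing in for five classification trees and three regression trees
label = aggregate_classification(["spam", "ham", "spam", "spam", "ham"])
value = aggregate_regression([2.0, 3.0, 4.0])
```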

5
Q

Naive Bayes

A

Naive Bayes is a probabilistic classifier that applies Bayes’ theorem with a strong (naive) assumption that the features are conditionally independent given the class.
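
A minimal sketch for categorical features follows. It assumes add-one (Laplace) smoothing and works in log space; the function names and the tiny weather-style dataset are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count classes and per-class feature values from categorical rows."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)               # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, feature_counts, vocab


def predict_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Independence assumption: per-feature likelihoods simply multiply.
            # Add-one smoothing keeps unseen values from zeroing the product.
            num = feature_counts[(label, i)][value] + 1
            den = count + len(vocab[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
prediction_a = predict_nb(model, ("rainy", "cool"))
prediction_b = predict_nb(model, ("sunny", "hot"))
```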

6
Q

K-Nearest Neighbors (KNN)

A

KNN is a simple algorithm that stores all available cases and classifies a new case by a vote among its k most similar stored cases, with similarity defined by a distance function such as Euclidean distance.
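
The idea can be sketched in a few lines. The example assumes Euclidean distance and a majority vote among the k nearest stored cases; the points and labels are toy data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]
near_red = knn_predict(train_points, train_labels, (2, 2))
near_blue = knn_predict(train_points, train_labels, (7, 8))
```

There is no training step: all the work happens at prediction time, which is why KNN has to keep every stored case.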

7
Q

Support Vector Machines (SVM)

A

SVMs are a set of supervised learning methods used for classification and regression. For classification, they aim to find the hyperplane in an N-dimensional feature space that separates the classes with the largest possible margin.
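
The sketch below does not train an SVM. It only shows how a given hyperplane classifies points and how its margin is measured. The weights `w` and bias `b` are picked by hand for this toy data and are assumptions, not the output of a solver.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))


points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = (0.4, 0.4), -2.2  # hand-picked hyperplane: 0.4*x1 + 0.4*x2 - 2.2 = 0
predictions = [predict(w, b, x) for x in points]
margin = geometric_margin(w, b, points, labels)
```

Training an SVM amounts to searching for the `w` and `b` that make this margin as large as possible.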

8
Q

Principal Component Analysis (PCA)

A

PCA is a dimensionality reduction technique that transforms a large set of variables into a smaller set of uncorrelated components that still retains most of the variance (information) in the original data.
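
As a sketch of the reduction, the block below takes 2-D points down to 1-D. It assumes the closed-form eigen-decomposition of a 2x2 covariance matrix, which only works in two dimensions; real datasets would use a linear algebra library. The function name and data are illustrative.

```python
import math


def pca_2d(points):
    """Return (first principal component, 1-D projections, explained variance ratio)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    top, bottom = half_trace + spread, half_trace - spread
    # Eigenvector belonging to the larger eigenvalue
    if abs(b) > 1e-12:
        vx, vy = b, top - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)
    projected = [x * component[0] + y * component[1] for x, y in centered]
    explained = top / (top + bottom) if top + bottom > 0 else 0.0
    return component, projected, explained


# Toy data lying exactly on the line y = 2x, so one component keeps all the variance
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
component, projected, explained = pca_2d(points)
```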

9
Q

K-Means Clustering

A

K-Means is an unsupervised learning algorithm used to partition a dataset into K clusters. Each observation belongs to the cluster with the nearest mean.
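
The assign-then-update loop can be sketched in one dimension. The example assumes fixed starting centers so the run is repeatable; practical implementations usually pick them randomly.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Lloyd's algorithm on 1-D points: returns (final centers, clusters)."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest mean
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # keep the old center if empty
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 2.0])
```

Even from the poor starting centers 1.0 and 2.0, the loop settles on means of 2.0 and 11.0 for this data.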

10
Q

Hierarchical Clustering

A

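Hierarchical clustering is another unsupervised learning algorithm. It builds a hierarchy of nested clusters, either by repeatedly merging the two most similar clusters (agglomerative) or by repeatedly splitting clusters (divisive), so that objects within a cluster are similar to each other and dissimilar to objects in other clusters.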
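
One common variant can be sketched as bottom-up (agglomerative) merging. The sketch assumes 1-D points and single linkage (distance between the closest members of two clusters); both choices are assumptions made for brevity.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []                       # record of (cluster_a, cluster_b, distance)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


points = [1.0, 2.0, 9.0, 10.0, 25.0]
clusters, merges = agglomerative(points, n_clusters=3)
```

The `merges` list is the hierarchy itself: reading it in order shows which groups were joined and at what distance.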

11
Q

Gradient Boosting Algorithms

A

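These are machine learning techniques for regression and classification problems that produce a prediction model in the form of an ensemble of weak prediction models, typically shallow decision trees. The models are added one at a time, each fitted to the errors of the ensemble built so far.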
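
A minimal sketch for regression follows. It assumes squared-error loss, one-split regression stumps as the weak models, and a fixed learning rate; none of these specifics come from the card.

```python
def fit_stump(xs, residuals):
    """Weak learner: best single-threshold regressor under squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient of the loss
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy step-like data (made up)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.0, 3.1, 5.0, 5.2]
model = gradient_boost(xs, ys)
baseline = sum(ys) / len(ys)
baseline_mse = mse(lambda x: baseline, xs, ys)
boosted_mse = mse(model, xs, ys)
```

Each stump alone is a poor model, but the sum of many small corrections fits the training data far better than the constant baseline.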

12
Q

Deep Learning Algorithms

A

These are a set of algorithms that use artificial neural networks with multiple layers between the input and output. They can model complex non-linear relationships and are particularly powerful for large-scale and high-dimensional data.
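
The non-linear modeling claim can be illustrated with a tiny network that computes XOR, a function no single linear model can represent. The sketch shows only the forward pass, uses one hidden layer for brevity (deep networks stack many), and the weights are hand-picked assumptions rather than learned values.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + bias
        for row, bias in zip(weights, biases)
    ]


# Hand-picked weights (not learned) that make the network compute XOR
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
b2 = [0.0]


def forward(x):
    hidden = relu(dense(x, W1, b1))   # non-linearity between the layers
    return dense(hidden, W2, b2)[0]   # single output unit


outputs = [forward(x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

Without the `relu` between the two layers, the network would collapse into a single linear map and could not produce this output pattern.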

13
Q

Algorithmic Process

A

At its core, an algorithm is a step-by-step procedure or set of rules to be followed in calculations or other problem-solving operations. It’s the logic behind the analysis and how the objective will be achieved.

14
Q

Data Inputs

A

These are the datasets that the algorithm will use. The quality and nature of the data can significantly affect the outcome. This can include structured data (like numerical data in databases) or unstructured data (like text or images).

15
Q

Data Processing

A

Before an algorithm can use the data, the data usually needs to be cleaned and preprocessed. This can involve handling missing values, treating outliers, and scaling features so that they are suitable for the algorithm to process.
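
Two of these steps can be sketched directly. The example assumes missing values are marked with `None`, fills them with the column mean, and applies min-max scaling; other imputation and scaling strategies are equally valid.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


raw = [2.0, None, 4.0, 6.0]  # toy column with one missing value
filled = impute_mean(raw)
scaled = min_max_scale(filled)
```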

16
Q

Analytical Model

A

This is the specific model that the algorithm uses to analyze the data. There are many types of models, including statistical models, machine learning models, deep learning models, etc.

17
Q

Training Process

A

In machine learning algorithms, a model is trained on a subset of data, allowing it to learn patterns or relationships in the data. This training process is an iterative procedure where the model makes predictions and then adjusts its parameters based on the error of the predictions.
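
The predict, measure, adjust loop can be sketched with a one-parameter model. The example assumes a model of the form y = w * x, mean squared error, and gradient descent; the split into training and held-out data is a toy illustration.

```python
def train(train_xs, train_ys, learning_rate=0.05, epochs=100):
    """Fit y = w * x by repeatedly predicting, measuring error, and adjusting w."""
    w = 0.0
    losses = []
    n = len(train_xs)
    for _ in range(epochs):
        errors = [w * x - y for x, y in zip(train_xs, train_ys)]  # prediction errors
        losses.append(sum(e * e for e in errors) / n)              # mean squared error
        gradient = sum(2 * e * x for e, x in zip(errors, train_xs)) / n
        w -= learning_rate * gradient                              # adjust the parameter
    return w, losses


# Toy data from y = 3x; the last point is held out and never used for training
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
train_xs, train_ys = xs[:3], ys[:3]
test_xs, test_ys = xs[3:], ys[3:]
w, losses = train(train_xs, train_ys)
test_error = abs(w * test_xs[0] - test_ys[0])
```

The recorded `losses` shrink from one iteration to the next, which is the "learning" the card describes.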

18
Q

Evaluation

A

After the model is trained, it’s important to evaluate its performance on unseen data. This is done using a variety of metrics, depending on the specific task.
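
For a binary classification task, three common metrics can be sketched as below. The labels and predictions are made-up stand-ins for a held-out test set, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of the cases predicted positive, the fraction that really are positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Of the truly positive cases, the fraction the model found."""
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # held-out labels (toy data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions on the same cases
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Regression tasks would use different metrics, such as mean squared error.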

19
Q

Prediction/Decision Making

A

Once the model has been trained and evaluated, it can be used to make predictions on new data or to support decision-making processes.

20
Q

Iterative Improvement

A

As new data becomes available or as the model is used, the model can be retrained or updated so that its predictions improve over time.