lv. 3 - Copy of CS35 Flashcards

1
Q

Series of tasks, activities, or operations to achieve a goal or an outcome

A

Process

2
Q

Combination of hardware and software to facilitate or automate processes

A

Technology

3
Q

Discrete measurement, fact, or observation representing a real-world process

A

Data

4
Q

the mathematical discipline that studies the methods of collecting, analyzing, and interpreting data.

A

Statistics

5
Q

specific collection of items of interest

A

Population

6
Q

subset or subcollection of the population

A

Sample

7
Q

two scopes of data

A

Sample & Population

8
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

9
Q

Logic is built by modelling and training data

A

Machine Learning

10
Q

Input and sometimes output data are provided to a machine which will build a logic based on mathematical rules

A

Machine Learning

11
Q

Machine learning algorithms in which the training data includes both input and output

A

Supervised Machine Learning

12
Q

Inputs are called

A

feature values

13
Q

outputs are called

A

label values

14
Q

the label predicted by the model is a numeric value

A

Regression

15
Q

the model predicts whether a record is an instance of a specific class or category

A

Binary Classification

16
Q

the model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

17
Q

Training data consists only of input without any known output

A

Unsupervised Machine Learning

18
Q

the model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering

19
Q

A model that groups existing customers into clusters based on age, location, gender, social media usage, and purchasing behavior.

A

Clustering

20
Q

A model that classifies whether a social media post is positive, negative, or neutral.

A

Multiclass Classification

21
Q

A model that predicts whether a customer will cancel their subscription.

A

Binary Classification

22
Q

A model that predicts the price of an apartment based on the size, number of rooms, barangay, and date of building.

A

Regression

23
Q

Data used to train the model; the algorithm learns patterns from it

A

Training Data

24
Q

Used to evaluate the model

A

Test Data
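
Study note: a minimal sketch of splitting records into training and test data. The function name `train_test_split`, the 20% test fraction, and the seed are assumptions chosen for illustration, not something the deck specifies.

```python
import random


def train_test_split(records, test_fraction, seed):
    """Shuffle a copy of the records, then hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records become test data, the rest training data.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: ten row ids.
train, test = train_test_split(range(10), test_fraction=0.2, seed=7)
```

The model learns only from `train`; `test` is kept aside so the evaluation uses records the model has not seen.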

25
Proportion of predictions that the model got right
Accuracy
26
Proportion of predicted positive cases where the true label is actually positive
Precision
27
Proportion of positive cases that the model identified correctly
Recall
28
Overall metric combining Recall and Precision (their harmonic mean)
F1 Score
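
Study note: the four metrics in cards 25 to 28 can be sketched from the counts of true/false positives and negatives. The function name and the labels below are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)                 # all correct / all predictions
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # found among actual positives
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```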
29
a lazy learning algorithm that predicts the class of a data point based on the majority class among its k nearest neighbors
k-NN classifier
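
Study note: a minimal k-NN sketch using Euclidean distance. The function name, the points, and the choice of k=3 are assumptions for illustration. "Lazy" here means nothing is fitted in advance; all the work happens at prediction time.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points in two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)
```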
30
predicts the probability that a given data point belongs to a particular class, using the logistic function
Logistic Regression
31
an S-shaped (sigmoid) curve that maps any real number to a value between 0 and 1; used in logistic regression
logistic function
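
Study note: the logistic function is 1 / (1 + e^(-z)). The sketch below shows how a linear score becomes a probability; the weight and bias values are hypothetical, not fitted to any data.

```python
import math


def logistic(z):
    """Map a real-valued score z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model: probability = logistic(weight * x + bias).
weight, bias = 1.2, -3.0
probability = logistic(weight * 4.0 + bias)
predicted_class = 1 if probability >= 0.5 else 0
```

A score of 0 maps to exactly 0.5, which is why 0.5 is the usual decision threshold.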
32
occurs when one class is significantly more frequent than the other
Class Imbalance
33
reducing the number of instances in the majority class by removing samples until the classes are balanced.
Undersampling
34
increasing the number of instances in the minority class by duplicating samples or generating new synthetic examples.
Oversampling
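
Study note: a sketch of random undersampling and random oversampling (cards 33 and 34). The function names and the toy records are assumptions for illustration.

```python
import random


def random_undersample(majority, minority, rng):
    """Drop majority records at random until both classes are the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Duplicate minority records at random until both classes are the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


rng = random.Random(42)
majority = list(range(100, 110))  # 10 hypothetical majority-class records
minority = [1, 2, 3]              # 3 hypothetical minority-class records

under_major, under_minor = random_undersample(majority, minority, rng)
over_major, over_minor = random_oversample(majority, minority, rng)
```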
35
Generates synthetic samples for the minority class by interpolating between existing samples
SMOTE (Synthetic Minority Oversampling Technique)
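
Study note: the interpolation step at the heart of SMOTE, sketched for one pair of points. This is a simplification: actual SMOTE picks the neighbor from the k nearest minority-class neighbors of the base sample, while here the neighbor is chosen by hand. The function name and the points are hypothetical.

```python
import random


def interpolate_sample(sample, neighbor, rng):
    """Create a synthetic point on the line segment between two minority samples."""
    gap = rng.random()  # random position along the segment, in [0, 1)
    return tuple(s + gap * (n - s) for s, n in zip(sample, neighbor))


rng = random.Random(0)
minority = [(1.0, 2.0), (2.0, 3.0), (1.5, 2.5)]
base, neighbor = minority[0], minority[1]
synthetic = interpolate_sample(base, neighbor, rng)
```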
36
Cons of Oversampling
Oversampling can cause overfitting, especially with random oversampling.
37
Cons of Undersampling
Important information from the majority class may be lost, potentially causing the model to underfit.
38
a measure of the relationship between two variables. If one variable increases when the other also increases, the correlation is positive; if one decreases when the other increases, the correlation is negative.
Correlation
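
Study note: one common way to measure correlation is the Pearson coefficient, which ranges from -1 to 1. The deck does not name a specific measure, so using Pearson here is an assumption, and the data is made up.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data: scores that rise or fall with hours.
hours = [1, 2, 3, 4, 5]
scores_rising = [52, 55, 61, 64, 70]
scores_falling = [70, 64, 61, 55, 52]

r_positive = pearson_correlation(hours, scores_rising)
r_negative = pearson_correlation(hours, scores_falling)
```

A coefficient near 1 or -1 only describes how the two variables move together; it says nothing about whether one causes the other.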
39
means that changes in one variable cause another variable to change. It means one variable directly influences the other.
Causation
40
Measures the average magnitude of the errors in a set of predictions without considering their direction
Mean Absolute Error
41
Measures the average squared difference between actual and predicted values. Larger errors are penalized more.
Mean Squared Error
42
The square root of the Mean Squared Error; a popular metric because it has the same units as the target variable, making it easier to interpret
Root Mean Squared Error
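
Study note: the three error metrics in cards 40 to 42, computed side by side. The function name and the values are made up for illustration.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)  # ignores error direction
    mse = sum(e ** 2 for e in errors) / len(errors)  # squares punish big misses
    rmse = math.sqrt(mse)                            # back in the target's units
    return mae, mse, rmse


# Hypothetical values; the errors are -1, 0, 2, and -4.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 24.0]
mae, mse, rmse = regression_errors(actual, predicted)
```

The single error of 4 contributes 16 of the 21 total squared error, which shows how MSE and RMSE weight large errors more heavily than MAE does.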
43
standardizes features by making sure that each feature has a mean of 0 and a standard deviation of 1.
StandardScaler
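
Study note: the arithmetic behind standardization, written by hand instead of calling scikit-learn. The function name and the values are hypothetical. The sketch divides by the population standard deviation (divide by n), which is also what scikit-learn's StandardScaler uses.

```python
import math


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    # Assumes the values are not all identical (std would be 0).
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
```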
44
a model used for regression tasks, where the goal is to predict a continuous target variable based on input features. It works by splitting the data into different regions based on feature values, making predictions by averaging the target values in each region.
DecisionTreeRegressor
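
Study note: a one-split "stump" that illustrates the idea on this card: split the data into regions by a feature value, then predict the average target in each region. This is a hand-written sketch, not scikit-learn's implementation; a full tree repeats the split inside each region. Function names and data are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the single threshold on x that minimizes total squared error."""
    best = None
    values = sorted(set(xs))  # needs at least two distinct x values
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    """Predict the mean target of the region that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical apartments: size in square meters and price.
sizes = [30, 35, 40, 80, 85, 90]
prices = [100, 110, 120, 300, 310, 320]
stump = fit_stump(sizes, prices)
```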
45
an ensemble model that averages the predictions from multiple different regression models to make a final prediction.
VotingRegressor
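
Study note: the averaging idea behind a voting regressor, sketched with three stand-in models. The model functions are hypothetical placeholders for already-trained regressors; this is not the scikit-learn API.

```python
def model_a(x):
    return 2.0 * x + 1.0


def model_b(x):
    return 1.8 * x + 2.0


def model_c(x):
    return 2.2 * x


def voting_predict(models, x):
    """Average the predictions of several regression models."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


models = [model_a, model_b, model_c]
ensemble_prediction = voting_predict(models, 10.0)
```

For x = 10 the individual predictions are about 21, 20, and 22, so the ensemble lands near 21.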
46
this modeling technique trains multiple binary classifiers, each focusing on one class versus all others.
One Vs. Rest Classifier
47
? is used for classification problems while ? is used for regression. The approach of both techniques is similar.
Random Forest Classifier, Decision Tree Regressor
48
a natural language processing approach used to determine whether the emotional tone of a piece of text is positive, negative, or neutral.
Sentiment Analysis
49
an automated technological process that converts an image of text into a machine-readable format. It is traditionally known as text recognition.
Optical Character Recognition
50
Layers of a Convolutional Neural Network
- Convolutional Layer
- Pooling Layer
- Flatten Layer
- Fully Connected (Dense) Layer
51
compares the performance of two versions of something (such as a page, feature, or action) to see which one performs better with users or viewers.
A/B Testing
52
the process of creating, sharing, and utilizing knowledge and information within an organization.
Knowledge Management
53
knowledge that can be easily codified into formats such as text, diagrams, or figures
Explicit knowledge
54
knowledge that is not formally documented but can be inferred from explicit knowledge and transferred into practical skills
Implicit knowledge
55
personal and often difficult to articulate, consisting of insights, experiences, and "know-how."
Tacit knowledge
56
facilitates the knowledge management of an organization by capturing and organizing knowledge.
Knowledge Management Software
57
In-House or Captive Operations Pros and Cons
Pros:
- Intellectual Property Protection
- Ultimate Control
- Long-term Cost Savings
- Internal Expertise
Cons:
- High Initial Investment
- Operational Complexity
- Inflexibility
58
Outsourcing Pros and Cons
Pros:
- Flexibility
- Access to Varied Expertise
- Risk Mitigation
Cons:
- Quality Control
- Coordination Effort
- Costs can balloon if not managed well
59
How Cloud Computing and Big Data enable Machine Learning
- Cloud Computing: provides the necessary infrastructure and computational power to process large datasets efficiently
- Big Data: supplies the enormous and complex datasets that are crucial for training ML models
61
delivers resources over the internet, making it possible for any organization or user to access systems and services.
Public Cloud
62
the exact opposite of the public cloud deployment model: a one-on-one environment dedicated to a single customer or organization
Private Cloud
63
combines both private and public cloud models. With a hybrid solution, an organization may host applications in a safe environment while taking advantage of the cost savings of the public cloud.
Hybrid Cloud
64
a distributed system that is created by integrating the services of different clouds to address the specific needs of a community, industry, or business
Federated Cloud or Community Cloud
65
delivers on-demand infrastructure resources, such as compute, storage, networking, and virtualization
Infrastructure as a Service / IAAS
66
delivers and manages hardware and software resources for developing, testing, delivering, and managing cloud applications
Platform as a Service / PAAS
67
provides a full application stack as a service that customers can access and use.
Software as a Service / SAAS
68
Big Data characteristics
- Volume: sheer quantity of the data
- Velocity: speed at which the data is gathered
- Variety: type, nature, and source of data
- Veracity: data quality, pertaining to accuracy and reliability
- Value: data has actionable insights and patterns
69
the connection of devices with an on/off switch to the internet, enabling them to collect and share data.
Internet of Things
70
How Big Data and IoT revolutionized modern-day machine learning:
- Accuracy: Larger datasets enable machine learning algorithms to identify more intricate patterns and relationships.
- Reduced Overfitting: With more data, models are less likely to overfit.
- Discovering Hidden Patterns: Big data enables the discovery of subtle correlations and trends that might be missed in smaller datasets.
- Deep Learning: Deep learning models such as neural networks require massive amounts of data to learn complex representations.
- Natural Language Processing: NLP models, such as those used for language translation and sentiment analysis, benefit from large datasets of text and speech data.
71
the practice of protecting digital information from unauthorized access, corruption, or theft.
Data Security
72
A regulation of the European Union that establishes rules for the protection of personal data. It requires organizations to protect the privacy of EU residents and provides them with greater control over their personal data.
General Data Protection Regulation (GDPR)
73
the process of removing or altering personal information from data so that individuals cannot be easily identified.
De-identification
74
unsupervised learning task where the model groups similar data points together based on their features or attributes
Clustering
75
Applications of Clustering
- Customer Segmentation - Image Segmentation - Anomaly Detection
76
a widely used clustering algorithm that partitions a dataset into K clusters based on the similarity of data points. It is used in data mining and image processing applications.
K-Means Clustering
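
Study note: a minimal K-Means sketch on 1-D data, alternating between assigning points to the nearest centroid and moving each centroid to the mean of its points. The function name, the starting centroids, and the fixed iteration count are assumptions for illustration; library implementations choose starting centroids and stopping rules more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around len(centroids) centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data with two obvious groups, K = 2.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```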
77
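builds a hierarchy of clusters. In the agglomerative (bottom-up) form, each data point starts as its own cluster, and the two closest clusters are merged at each iteration until all data points belong to a single cluster. The divisive (top-down) form instead recursively partitions the data into smaller clusters.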
Hierarchical Clustering
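
Study note: a sketch of the agglomerative (bottom-up) form on 1-D data. Every point starts as its own cluster and the two closest clusters merge at each step. Measuring cluster distance by the closest pair of points (single linkage) is an assumption; other linkage rules exist. The function name and data are hypothetical.

```python
def agglomerate(points):
    """Return the clusters formed by each merge, in merge order."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return merges


merges = agglomerate([1, 2, 9, 10, 25])
```

The merge order is what a dendrogram draws: nearby points join early, and the outlier 25 joins last.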
78
a widely used dimensionality reduction technique in machine learning and feature extraction.
Principal Component Analysis (PCA)
79
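a measure of how well the data points are clustered around the centroids: the sum of squared distances from each point to its cluster's centroid (lower means tighter clusters)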
Inertia
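
Study note: inertia computed by hand as the sum of squared distances from each point to its cluster's centroid. The function name and the two groupings below are made up to contrast a tight clustering with a loose one.

```python
def inertia(clusters, centroids):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(
        (point - centroid) ** 2
        for cluster, centroid in zip(clusters, centroids)
        for point in cluster
    )


# Hypothetical 1-D data grouped two different ways.
tight = inertia([[1, 2, 3], [10, 11, 12]], [2.0, 11.0])
loose = inertia([[1, 2], [3, 10, 11, 12]], [1.5, 9.0])
```

The tighter grouping has the lower inertia, which is why lower values indicate points sitting closer to their centroids.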
80
measures how well each data point is assigned to its cluster by comparing its similarity to points in its own cluster (cohesion) versus points in the nearest other cluster (separation)
Silhouette Score
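
Study note: the silhouette value for one point is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster (cohesion) and b is the mean distance to the points in the nearest other cluster (separation). The overall score averages this over all points. The function names and data are hypothetical.

```python
def mean_distance(point, others):
    """Average absolute distance from point to each value in others."""
    return sum(abs(point - o) for o in others) / len(others)


def silhouette(point, own_others, other_clusters):
    """Silhouette value of one 1-D point, between -1 and 1."""
    a = mean_distance(point, own_others)                      # cohesion
    b = min(mean_distance(point, c) for c in other_clusters)  # separation
    return (b - a) / max(a, b)


# The point 2 sits in cluster [1, 2, 3]; the other cluster is [10, 11, 12].
score = silhouette(2, own_others=[1, 3], other_clusters=[[10, 11, 12]])
```

Here a = 1 and b = 9, so the value is 8/9: close to 1, meaning the point fits its own cluster far better than the neighboring one.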