lv. 3 - Copy of CS35 Flashcards

1
Q

Series of tasks, activities, or operations to achieve a goal or an outcome

A

Process

2
Q

Combination of hardware and software to facilitate or automate processes

A

Technology

3
Q

Discrete measurement, fact, or observation representing a real-world process

A

Data

4
Q

the mathematical discipline that studies the methods of collecting, analyzing, and interpreting data.

A

Statistics

5
Q

specific collection of items of interest

A

Population

6
Q

subset or subcollection of the population

A

Sample

7
Q

two scopes of data

A

Sample & Population
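
As an illustration of the two scopes, the sketch below draws a sample from a population and compares their means. The population (the integers 1 to 100), the sample size, and the seed are assumptions chosen only for the example.

```python
import random

# Hypothetical population: the integers 1..100.
population = list(range(1, 101))

# Draw 10 distinct items without replacement; the seed only makes the run repeatable.
rng = random.Random(42)
sample = rng.sample(population, 10)

# A sample statistic estimates the matching population value.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(population_mean, sample_mean)
```

The sample mean will usually differ somewhat from the population mean, and the gap tends to shrink as the sample grows.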

8
Q

Logic is built based on business rules

A

Traditional Rule-Based AI

9
Q

Logic is built by modelling and training data

A

Machine Learning

10
Q

Input and sometimes output data are provided to a machine which will build a logic based on mathematical rules

A

Machine Learning

11
Q

Machine learning algorithms in which the training data includes both input and output

A

Supervised Machine Learning

12
Q

Inputs are called

A

feature values

13
Q

outputs are called

A

label values

14
Q

the label predicted by the model is a numeric value

A

Regression

15
Q

the model predicts whether a record is an instance of a specific class or category

A

Binary Classification

16
Q

the model predicts whether a record is an instance of one of multiple classes or categories

A

Multiclass Classification

17
Q

Training data consists only of input without any known output

A

Unsupervised Machine Learning

18
Q

the model identifies similarities between observations based on their features and groups them into discrete clusters

A

Clustering

19
Q

A model that groups existing customers into clusters based on age, location, gender, social media usage, and purchasing behavior.

A

Clustering

20
Q

A model that classifies whether a social media post is positive, negative, or neutral.

A

Multiclass Classification

21
Q

A model that predicts whether a customer will cancel their subscription.

A

Binary Classification

22
Q

A model that predicts the price of an apartment based on the size, number of rooms, barangay, and date of building.

A

Regression

23
Q

Used to train the model; the data from which the algorithm learns patterns

A

Training Data

24
Q

Used to evaluate the model

A

Test Data
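
A minimal sketch of how one dataset might be divided into training and test data. The function name `train_test_split`, the 25% test fraction, and the stand-in records are assumptions for illustration, not a reference to any particular library's implementation.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-in for 20 labeled rows
train, test = train_test_split(records)
print(len(train), len(test))
```

Keeping the two sets disjoint matters: evaluating on rows the model already saw during training would overstate its performance.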

25
Q

Proportion of predictions that the model got right

A

Accuracy

26
Q

Proportion of predicted positive cases where the true label is actually positive

A

Precision

27
Q

Proportion of positive cases that the model identified correctly

A

Recall

28
Q

Overall metric combining Recall and Precision

A

F1 Score
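
The four metrics above can be computed from a list of true labels and a list of predicted labels. The sketch below assumes binary labels with 1 as the positive class; the function name and the sample labels are made up for the example. F1 is computed as the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the model gets 6 of 8 predictions right (accuracy 0.75), while precision and recall are both 2/3 because there is one false positive and one false negative.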

29
Q

a lazy learning algorithm that predicts the class of a data point based on the majority class of its k nearest neighbors

A

k-NN classifier
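
A small sketch of the idea, assuming Euclidean distance and a plain majority vote. The function name and the toy points are illustrative. "Lazy" shows up in the fact that there is no training step: all the work happens when a prediction is requested.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (1.2, 1.1)))
print(knn_predict(points, labels, (6.8, 7.0)))
```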

30
Q

predicts the probability that a given data point belongs to a particular class, using the logistic function

A

Logistic Regression

31
Q

an S-shaped curve, used in logistic regression to map any real number to a value between 0 and 1

A

logistic function
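
A short sketch of the logistic function and how a one-feature logistic regression might use it. The weight and bias values are hypothetical, picked only to show the shape of the output.

```python
import math

def logistic(z):
    """Map any real number to the open interval (0, 1)."""
    # Two branches avoid overflow in math.exp for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(x, weight, bias):
    """Probability of the positive class for a single feature value x."""
    return logistic(weight * x + bias)

# Hypothetical fitted parameters.
weight, bias = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_probability(x, weight, bias), 3))
```

At x = 2.0 the linear term is exactly zero, so the predicted probability is 0.5, the midpoint of the S-curve.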

32
Q

occurs when one class is significantly more frequent than the other

A

Class Imbalance

33
Q

reducing the number of instances in the majority class by removing samples until the classes are balanced.

A

Undersampling

34
Q

increasing the number of instances in the minority class by duplicating samples or generating new synthetic examples.

A

Oversampling

35
Q

Generates synthetic samples for the minority class by interpolating between existing samples

A

SMOTE (Synthetic Minority Oversampling Technique)
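
The interpolation step can be sketched as below. This is a simplification: SMOTE proper interpolates between a minority sample and one of its k nearest minority-class neighbors, whereas this sketch picks a random pair of minority samples. Function names and data are illustrative.

```python
import random

def interpolate_sample(a, b, rng):
    """Return a random point on the line segment between a and b."""
    gap = rng.random()  # fraction of the way from a to b, in [0, 1)
    return tuple(x + gap * (y - x) for x, y in zip(a, b))

def oversample_by_interpolation(minority, n_new, seed=0):
    """Create n_new synthetic points from random pairs of minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # simplification: random pair, not nearest neighbors
        synthetic.append(interpolate_sample(a, b, rng))
    return synthetic

minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5)]
new_points = oversample_by_interpolation(minority, 4)
print(new_points)
```

Because each synthetic point lies between two real ones, it stays inside the region the minority class already occupies rather than being an exact duplicate.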

36
Q

Cons of Oversampling

A

Oversampling can cause overfitting, especially with random oversampling.

37
Q

Cons of Undersampling

A

Important information from the majority class may be lost, potentially causing the model to underfit.

38
Q

a measure of the relationship between two variables. If one variable increases when the other one also increases, the correlation is positive.

A

Correlation
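
One common way to quantify correlation is the Pearson coefficient, which ranges from -1 to 1. The sketch below is a plain-Python version under the assumption that neither variable is constant; the data are made up.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes both variables vary, so the denominator is not zero.
    return covariation / math.sqrt(spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # increases with hours: positive correlation
falling = [10, 8, 6, 4, 2]  # decreases with hours: negative correlation
print(pearson_correlation(hours, rising), pearson_correlation(hours, falling))
```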

39
Q

means that changes in one variable cause another variable to change. It means one variable directly influences the other.

A

Causation

40
Q

Measures the average magnitude of the errors in a set of predictions without considering their direction

A

Mean Absolute Error

41
Q

Measures the average squared difference between actual and predicted values. Larger errors are penalized more.

A

Mean Squared Error

42
Q

A popular metric because it has the same units as the target variable, making it easier to interpret

A

Root Mean Squared Error
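
The three error metrics above differ only in how each error is treated before averaging. A minimal sketch with made-up values:

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|; the sign of each error is ignored."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of squared errors; large errors weigh more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of MSE, back in the units of the target variable."""
    return math.sqrt(mean_squared_error(actual, predicted))

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
print(mean_absolute_error(actual, predicted))      # 1.0
print(mean_squared_error(actual, predicted))       # 1.5
print(root_mean_squared_error(actual, predicted))
```

The errors here are 1, 0, -2, and -1. The single error of size 2 contributes 4 of the 6 total squared error, which is why MSE exceeds MAE.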

43
Q

standardizes features by making sure that each feature has a mean of 0 and a standard deviation of 1.

A

StandardScaler
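
What the scaler does to a single feature can be sketched in plain Python. This is not scikit-learn's code; it only mirrors the idea, using the population standard deviation (dividing by n), and assumes the feature is not constant.

```python
import math

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n); assumed non-zero.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
scaled = standardize(feature)
print(scaled)
```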

44
Q

a model used for regression tasks, where the goal is to predict a continuous target variable based on input features. It works by splitting the data into different regions based on feature values, making predictions by averaging the target values in each region.

A

DecisionTreeRegressor

45
Q

an ensemble model that averages the predictions from multiple different regression models to make a final prediction.

A

VotingRegressor
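
The averaging idea can be sketched without any library. The three "models" below are hypothetical stand-ins written as simple functions; in practice each would be a fitted regression model.

```python
def voting_predict(models, x):
    """Average the predictions of several regression models for input x."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical already-fitted models, each mapping a feature value to a prediction.
models = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 3.0,
    lambda x: 1.0 * x + 6.0,
]
print(voting_predict(models, 3.0))  # (6 + 9 + 9) / 3 = 8.0
```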

46
Q

this modeling technique trains multiple binary classifiers, each focusing on one class versus all others.

A

One Vs. Rest Classifier

47
Q

? is used for classification problems while ? is used for regression. The approach of both techniques is similar.

A

Random Forest Classifier, Decision Tree Regressor

48
Q

a natural language processing approach used to determine whether the emotional tone of a piece of text is positive, negative, or neutral.

A

Sentiment Analysis

49
Q

an automated technological process that converts an image of text into a machine-readable format. It is traditionally known as text recognition.

A

Optical Character Recognition

50
Q

Layers of a Convolutional Neural Network

A

Convolutional Layer
Pooling Layer
Flatten Layer
Fully Connected (Dense) Layer

51
Q

compares the performance of two versions of something (for example, a page or feature) to see which one performs better with users or viewers.

A

A/B Testing

52
Q

the process of creating, sharing, and utilizing knowledge and information within an organization.

A

Knowledge Management

53
Q

knowledge that can be easily codified into formats such as text, diagrams, or figures

A

Explicit knowledge

54
Q

knowledge that is not formally documented but can be inferred from explicit knowledge and transferred into practical skills

A

Implicit knowledge

55
Q

personal and often difficult to articulate, consisting of insights, experiences, and “know-how.”

A

Tacit knowledge

56
Q

facilitates the knowledge management of an organization by capturing and organizing knowledge.

A

Knowledge Management Software

57
Q

In-House or Captive Operations Pros and Cons

A
  • Pros
    • Intellectual Property Protection
    • Ultimate Control
    • Long-term Cost Savings
    • Internal Expertise
  • Cons
    • High Initial Investment
    • Operational Complexity
    • Inflexibility
58
Q

Outsourcing Pros and Cons

A
  • Pros
    • Flexibility
    • Access to Varied Expertise
    • Risk Mitigation
  • Cons
    • Quality Control
    • Coordination Effort
    • Costs can balloon if not managed well
59
Q

How Cloud Computing and Big Data enable Machine Learning

A
  • Cloud Computing
    • provides the necessary infrastructure and computational power to process large datasets efficiently
  • Big Data
    • supplies the enormous and complex datasets that are crucial for training ML models
61
Q

delivers resources over the internet, making it possible for any organization or user to access systems and services.

A

Public Cloud

62
Q

the exact opposite of the public cloud deployment model, where a one-on-one environment is dedicated for a single customer or organization

A

Private Cloud

63
Q

combines both private and public cloud models. With a hybrid solution, an organization may host applications in a safe environment while taking advantage of the cost savings of the public cloud.

A

Hybrid Cloud

64
Q

a distributed system that is created by integrating the services of different clouds to address the specific needs of a community, industry, or business

A

Federated Cloud or Community Cloud

65
Q

delivers on-demand infrastructure resources, such as compute, storage, networking, and virtualization

A

Infrastructure as a Service (IaaS)

66
Q

delivers and manages hardware and software resources for developing, testing, delivering, and managing cloud applications

A

Platform as a Service (PaaS)

67
Q

provides a full application stack as a service that customers can access and use.

A

Software as a Service (SaaS)

68
Q

Big Data characteristics

A
  • Volume
    • Sheer quantity of the data
  • Velocity
    • Speed at which the data is gathered
  • Variety
    • Type, nature, and source of data
  • Veracity
    • Data quality, pertaining to accuracy and reliability
  • Value
    • Data has actionable insights and patterns
69
Q

the connection of devices with an on/off switch to the internet, enabling them to collect and share data.

A

Internet of Things

70
Q

How Big Data and IoT revolutionized modern-day machine learning:

A
  • Accuracy
    • Larger datasets enable machine learning algorithms to identify more intricate patterns and relationships
  • Reduced Overfitting
    • With more data, models are less likely to overfit.
  • Discovering Hidden Patterns
    • Big data enables the discovery of subtle correlations and trends that might be missed in smaller datasets
  • Deep Learning
    • Deep learning models such as neural networks require massive amounts of data to learn complex representations
  • Natural Language Processing
    • NLP models, such as those used for language translation and sentiment analysis, benefit from large datasets of text and speech data.
71
Q

the practice of protecting digital information from unauthorized access, corruption, or theft.

A

Data Security

72
Q

A regulation of the European Union that establishes rules for the protection of personal data. It requires organizations to protect the privacy of EU residents and provides them with greater control over their personal data.

A

General Data Protection Regulation (GDPR)

73
Q

the process of removing or altering personal information from data so that individuals cannot be easily identified.

A

De-identification
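
A toy sketch of the two moves in the definition: removing direct identifiers and altering values so they are less specific. The field names, the list of identifiers, and the record are all hypothetical, and real de-identification usually needs more than this (for example, checking whether combinations of the remaining fields still single someone out).

```python
# Hypothetical set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def de_identify(record):
    """Drop direct identifiers and generalize age into a ten-year bracket."""
    cleaned = {key: value for key, value in record.items()
               if key not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        decade = (cleaned["age"] // 10) * 10
        cleaned["age"] = f"{decade}-{decade + 9}"  # e.g. 34 becomes "30-39"
    return cleaned

record = {"name": "Ana Cruz", "email": "ana@example.com", "age": 34, "city": "Cebu"}
safe_record = de_identify(record)
print(safe_record)
```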

74
Q

unsupervised learning task where the model groups similar data points together based on their features or attributes

A

Clustering

75
Q

Applications of Clustering

A
  • Customer Segmentation
  • Image Segmentation
  • Anomaly Detection
76
Q

a widely used clustering algorithm that partitions a dataset into K clusters based on the similarity of data points. It is used in data mining and image processing applications.

A

K-Means Clustering
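
A compact sketch of the usual iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). The starting centroids, the fixed iteration count, and the toy points are assumptions for the example; library implementations choose starting centroids more carefully and stop when assignments settle.

```python
import math

def kmeans(points, centroids, n_iterations=10):
    """Run a fixed number of assign/update rounds; return (centroids, clusters)."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

With K = 2 and these well-separated points, the centroids settle at the centers of the two groups after the first round.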

77
Q

builds a hierarchy of clusters. In the agglomerative (bottom-up) form, each data point starts as its own cluster, and the two closest clusters are merged at each iteration until all data points belong to a single cluster.

A

Hierarchical Clustering

78
Q

a widely used dimensionality reduction technique in machine learning and feature extraction.

A

Principal Component Analysis (PCA)

79
Q

a measure of how well the data points are clustered around the centroids

A

Inertia
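
Inertia is commonly computed as the sum of squared distances from each point to the centroid of its cluster, so lower values mean tighter clusters. A sketch with made-up clusters and centroids:

```python
def inertia(clusters, centroids):
    """Sum of squared distances from every point to its cluster's centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(point, centroid))
        for cluster, centroid in zip(clusters, centroids)
        for point in cluster
    )

# Hypothetical clustering result: two clusters and their centroids.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)],
    [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)],
]
centroids = [(1.5, 1.5), (8.5, 8.5)]
print(inertia(clusters, centroids))  # 8 points, each 0.5 squared distance away: 4.0
```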

80
Q

measures how well each data point is assigned to its cluster by comparing its similarity to points in its own cluster (cohesion) versus points in the nearest other cluster (separation)

A

Silhouette Score
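
For a single point, the score is usually written as (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster (cohesion) and b is the mean distance to the points of the nearest other cluster (separation). The sketch below assumes Euclidean distance and uses toy clusters; returning 0 for a one-point cluster is a convention adopted here.

```python
import math

def mean_distance(point, others):
    """Average Euclidean distance from point to each item in others."""
    return sum(math.dist(point, o) for o in others) / len(others)

def silhouette(index, own_cluster, other_clusters):
    """Silhouette value of own_cluster[index]; ranges from -1 to 1."""
    point = own_cluster[index]
    rest = own_cluster[:index] + own_cluster[index + 1:]
    if not rest:
        return 0.0  # convention for a cluster containing a single point
    a = mean_distance(point, rest)                            # cohesion
    b = min(mean_distance(point, c) for c in other_clusters)  # separation
    return (b - a) / max(a, b)

own = [(0.0, 0.0), (0.0, 2.0)]
others = [[(4.0, 0.0), (4.0, 3.0)]]
score = silhouette(0, own, others)
print(round(score, 3))
```

Here a = 2.0 and b = 4.5, giving a score of about 0.556. Values near 1 indicate a point that sits well inside its own cluster; values near -1 suggest it may belong to a neighboring one.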