Final Terms & Concepts List Flashcards
Covers the bulk of the terms and concepts taught in the class, based on the readings.
Data Science
An interdisciplinary field using scientific methods, processes, algorithms, and systems to extract knowledge or insights from data in various forms.
Big Data
The collection and storage of very large volumes of data, which data science then refines to produce insights.
Qualitative Data
A descriptive piece of information. Example: ‘What a nice day it is’.
Quantitative Data
Numerical information. Examples: ‘1’, ‘3.65’. Can be further divided into discrete and continuous data.
Discrete Data
Quantitative data that can take only specific, countable values. Example: ‘Number of months in a year’.
Continuous Data
Quantitative data that can take any value within an interval. Example: ‘The amount of oxygen in the atmosphere’.
Artificial Intelligence (AI)
Creating machines that perform tasks which would require intelligence if performed by humans.
Machine Learning (ML)
A field of AI where systems learn from data without explicit programming.
Supervised Learning
A type of machine learning where an algorithm learns from a labeled dataset.
Unsupervised Learning
A type of machine learning where an algorithm learns from an unlabeled dataset.
Regression Algorithms
Algorithms that predict the numerical value of a variable based on historical data.
Classification Algorithms
Algorithms that predict which class a data point belongs to.
Anomaly Detection Algorithms
Algorithms used to find outliers or anomalies in data.
Cloud Computing
The on-demand delivery of computing resources via the internet with pay-as-you-go pricing.
Public Cloud
Services provided over a public network and available to anyone.
Private Cloud
Infrastructure dedicated to a single organization, located on-premises or off-premises.
Hybrid Cloud
Combines public and private cloud environments, allowing data sharing between them.
Community Cloud
Infrastructure and services shared among a specific community or group of organizations.
Infrastructure as a Service (IaaS)
Provides virtualized computing resources over the internet.
Platform as a Service (PaaS)
Offers a platform for developing, testing, and deploying applications.
Software as a Service (SaaS)
Delivers software applications over the internet on a subscription basis.
Total Cost of Ownership (TCO)
A financial estimate that helps identify the direct and indirect costs of a system.
AWS Cloud Adoption Framework (CAF)
Provides guidance and best practices to build a comprehensive approach to cloud computing.
Industry 4.0
The trend toward greater automation and data exchange in manufacturing technologies.
Cognitive Biases
Systematic patterns of deviation from norm or rationality in judgment.
Data Visualization
The graphical representation of information and data.
Agile Process
Focuses on delivering the highest business value in the shortest time through iterative development.
Scrum
An agile framework for managing and controlling complex projects.
Sprint
A short, time-boxed period (typically 2-4 weeks) during which a Scrum Team works to complete a set amount of work.
Product Owner
Responsible for maximizing the value of the product and managing the Product Backlog.
ScrumMaster
A facilitator for the Scrum Team, ensuring they follow Scrum practices and removing impediments.
Scrum Team
A self-organizing, cross-functional group responsible for delivering the product increment.
Product Backlog
An ordered list of everything that might be needed in the product.
Sprint Backlog
The set of Product Backlog items selected for a Sprint, plus a plan for delivering the Sprint Goal.
Burndown Chart
A visual representation of the progress of work remaining in a Sprint or Project.
AI Governance
Policies and frameworks ensuring AI development aligns with ethical and legal guidelines.
AWS Billing & Cost Management
AWS tools for tracking and optimizing service costs.
AWS Organizations
A management service for grouping AWS accounts and applying governance policies.
AWS Pricing Calculator
A tool to estimate AWS costs before deployment.
Agile Manifesto
A set of values prioritizing individuals and interactions, working software, customer collaboration, and responding to change.
Artificial Neural Networks (ANNs)
Computing systems inspired by biological neural networks, used in deep learning.
Association Rule Learning
Identifying relationships or patterns in large datasets, commonly used in market basket analysis.
Bias-Variance Tradeoff
The balance between error from overly simple models (high bias, underfitting) and error from overly complex models (high variance, overfitting).
Blockchain in AI
The integration of blockchain for secure, verifiable AI applications.
Chart Types
Various data visualization charts like bar charts, scatter plots, line graphs, pie charts, etc.
Clustering
A technique that groups data points based on similarity, commonly used in unsupervised learning.
Confirmation Bias
The tendency to search for, interpret, or recall information that confirms pre-existing beliefs.
Contrast Principle in Visualization
Using contrast (before/after, with/without) to improve clarity in data visualizations.
Cyber-Physical Systems (CPS)
Systems integrating computation with physical processes, essential in Industry 4.0.
Daily Scrum (Standup Meeting)
A short daily meeting where a Scrum team discusses progress and obstacles.
Decision Trees
A tree-like structure used for classification and regression tasks in ML.
Declarative Visualization
Presenting data insights clearly and effectively to inform decision-making.
Deep Learning
A subset of ML using neural networks with multiple layers to learn representations of data.
Edge Computing
Processing data near the source rather than relying on centralized cloud servers.
Ethical AI
Ensuring AI systems are fair, accountable, and do not reinforce biases.
Explainable AI (XAI)
AI models designed to provide transparency and interpretability in decision-making.
Exploratory Visualization
Used to discover insights in data rather than just presenting information.
Fog Computing
A decentralized computing model that extends cloud services closer to IoT devices.
Frequency Illusion
The tendency to notice something more often after first becoming aware of it, creating the impression that it occurs more frequently.
Generative AI
AI models that create new content, such as images, text, and audio (e.g., ChatGPT, DALL·E).
Groupthink
A group dynamic in which the desire for consensus discourages independent thinking and dissent.
Halo Effect
The tendency to let an overall impression of someone influence how we judge their specific traits.
Hyperparameter Tuning
The process of optimizing the settings that govern an algorithm's training (e.g., learning rate, tree depth) to improve performance.
Instance-Based Learning
Algorithms that make predictions based on stored instances rather than general models.
K-Means Clustering
A clustering algorithm that partitions data into K clusters by assigning each point to its nearest centroid.
Mist Computing
A decentralized computing model where data processing occurs directly on devices (e.g., sensors).
Model Drift
The degradation of an AI model’s performance over time due to changing data patterns.
Neuromarketing & Visualization
How the brain processes visual information for decision-making in marketing and design.
Optimism Bias
The belief that we are less likely to experience negative events compared to others.
Pattern Recognition Bias
The tendency to see patterns where none exist.
Predictive Maintenance
AI-driven maintenance strategies that anticipate and prevent equipment failures.
Reinforcement Learning
A type of ML where an agent learns by interacting with an environment and receiving rewards.
Reptilian Brain & Visualization
The role of fast, instinctive visual processing in human cognition.
Robotic Process Automation (RPA)
Software that automates repetitive tasks previously done by humans.
Supply Chain Management (SCM) & AI
The use of AI to optimize supply chains, logistics, and demand forecasting.
Scaling Scrum (Scrum of Scrums)
A method for coordinating multiple Scrum teams in larger organizations.
Serverless Computing
A cloud model where developers deploy code without managing infrastructure.
Smart Factories
Manufacturing environments that leverage AI, IoT, and automation for efficiency.
Sprint Planning
A Scrum meeting where the team selects work to complete in an upcoming sprint.
Sprint Retrospective
A Scrum meeting to reflect on a completed sprint and improve future work.
Sprint Review
A Scrum meeting where the team presents what they accomplished during the sprint.
What is RapidMiner and what is it used for?
RapidMiner is a data science platform used for machine learning, data preprocessing, and predictive analytics. It allows users to design workflows using a visual, drag-and-drop interface without extensive programming knowledge.
Explanation:
* Used for data cleaning, model training, validation, and deployment.
* Supports both supervised (classification, regression) and unsupervised learning (clustering, anomaly detection).
What are the main components of a RapidMiner process?
Operators, Connections, Repositories, Parameters Panel, Results View
* Operators → The building blocks that perform tasks (e.g., “Read CSV,” “Normalize Data,” “Apply Model”).
* Connections → The arrows linking operators that define workflow execution order.
* Repositories → Where datasets, models, and results are stored.
* Parameters Panel → Where you set options for each operator.
* Results View → Displays output after execution (tables, charts, performance metrics).
What is the difference between “Operators” and “Processes” in RapidMiner?
- Operators are individual tasks (e.g., “Normalize”, “Decision Tree”).
- Processes are full workflows consisting of multiple operators linked together.
How do you import and load data into RapidMiner?
- Using “Read CSV” or “Read Excel” operators for files.
- Connecting to databases using the “Read Database” operator, or loading stored datasets from the repository with the “Retrieve” operator.
- Manually entering data via the “Create ExampleSet” operator.
What are common data preprocessing steps in RapidMiner?
- Handling Missing Values → Use “Replace Missing Values” to fill with mean, median, or mode.
- Feature Selection → Remove irrelevant attributes using “Select Attributes.”
- Normalization & Standardization → Use “Normalize” to scale numerical data.
- Handling Categorical Variables → Use “Nominal to Numerical” for model compatibility.
- Splitting Data → Use “Split Data” to create training and test sets.
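RapidMiner chains these steps as visual operators; purely as a rough illustration, here is a minimal Python sketch of the same pipeline (pandas and scikit-learn assumed as stand-ins, with a hypothetical churn.csv file and customer_id column):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load data (hypothetical file, stands in for "Read CSV")
df = pd.read_csv("churn.csv")

# "Replace Missing Values": fill numeric gaps with the column mean
df = df.fillna(df.mean(numeric_only=True))

# "Select Attributes": drop an irrelevant column (hypothetical name)
df = df.drop(columns=["customer_id"])

# "Nominal to Numerical": one-hot encode categorical variables
df = pd.get_dummies(df)

# "Normalize": min-max scale every feature into [0, 1]
scaled = MinMaxScaler().fit_transform(df)

# "Split Data": 70/30 train/test split
train, test = train_test_split(scaled, train_size=0.7, random_state=42)
```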
What is the difference between Normalization and Standardization in RapidMiner?
- Normalization (Min-Max Scaling) → Rescales data between 0 and 1.
- Standardization (Z-Score Scaling) → Centers data around mean 0 with unit variance.
Explanation:
* Normalization is best for bounded data (e.g., images with pixel values 0-255).
* Standardization is useful for normally distributed data.
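A small numeric illustration of the two rescalings (NumPy assumed, not part of the course tooling):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalization (min-max): rescale into the [0, 1] interval
x_norm = (x - x.min()) / (x.max() - x.min())  # [0.0, 0.333..., 0.666..., 1.0]

# Standardization (z-score): mean 0, unit variance
x_std = (x - x.mean()) / x.std()              # mean(x) = 5, std(x) ≈ 2.236

print(x_norm, x_std)
```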
How do you handle missing values in RapidMiner?
- Remove missing values if they are few and not critical.
- Replace missing values using mean, median, mode, or predictive models.
- Interpolate missing values using nearest neighbor or regression.
Why is this important: Missing data can bias models, so it must be handled appropriately.
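A sketch of the three strategies using pandas as an assumed analogue to RapidMiner's “Replace Missing Values”:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "income": [50, 60, np.nan, 55]})

dropped = df.dropna()                             # remove rows with missing values
filled = df.fillna(df.median(numeric_only=True))  # replace with column median
interpolated = df.interpolate()                   # estimate from neighboring values
```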
What does the “Generate Attributes” operator do?
It creates new features based on existing data using mathematical or logical expressions.
Example: if([Age] > 30, “Senior”, “Junior”)
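The same derived attribute, sketched with pandas/NumPy as an assumed equivalent:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, 42, 31, 58]})

# Equivalent of Generate Attributes: if([Age] > 30, "Senior", "Junior")
df["Seniority"] = np.where(df["Age"] > 30, "Senior", "Junior")
```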
What is the purpose of splitting data into training and testing sets?
- Training Set → Used to train the model.
- Testing Set → Used to evaluate performance on unseen data.
Explanation: Prevents overfitting by ensuring the model generalizes well to new data.
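A minimal scikit-learn sketch of the split (the iris dataset is just a placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing; stratify keeps class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
```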
What does the “Cross-Validation” operator do?
It divides the data into K subsets, trains on K-1 of them, and tests on the remaining one, repeating K times so that each subset serves as the test set once.
Explanation:
* Prevents overfitting.
* More robust evaluation than a single train/test split.
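A minimal K-fold cross-validation sketch in scikit-learn (assumed analogue; K = 10 here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10-fold CV: train on 9 folds, test on the 10th, repeat 10 times
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=10)
print(scores.mean(), scores.std())
```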
What is the difference between Supervised and Unsupervised Learning?
- Supervised Learning → Uses labeled data (e.g., classification, regression).
- Unsupervised Learning → Finds patterns in unlabeled data (e.g., clustering, anomaly detection).
What is K-Means Clustering and when should you use it?
K-Means is an unsupervised algorithm that groups data into K clusters based on similarity.
Explanation: Used for customer segmentation, anomaly detection, and exploratory data analysis.
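A minimal K-Means sketch in scikit-learn (assumed; toy 2-D data):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two visually separated groups
X = np.array([[1, 1], [1.5, 2], [1, 1.5],
              [8, 8], [9, 8.5], [8.5, 9]])

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # the two learned centroids
```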
What is a Decision Tree and how do you interpret its output?
A Decision Tree splits data into branches based on feature conditions to make predictions.
* Root Node: Initial split based on the most informative feature.
* Leaf Node: Final classification/prediction.
Explanation: Simple, interpretable models useful for classification and regression.
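A sketch of training and inspecting a small tree in scikit-learn (assumed), showing the root-to-leaf structure described above:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Print the learned splits: the root split comes first; leaves carry the class
print(export_text(tree, feature_names=load_iris().feature_names))
```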
What does a Confusion Matrix show in classification?
It shows the number of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN).
* Accuracy = (TP + TN) / Total Predictions
* Precision = TP / (TP + FP)
* Recall = TP / (TP + FN)
Explanation: Helps evaluate classification performance beyond just accuracy.
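The three formulas, checked on a tiny hand-made example (scikit-learn assumed):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, tn, fn)                  # 3, 1, 3, 1
print(accuracy_score(y_true, y_pred))  # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred)) # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))    # TP / (TP + FN) = 3/4
```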
BIG IDEA: What is Overfitting and how do you prevent it?
Overfitting → Model learns noise instead of patterns, leading to poor performance on new data.
Prevention:
* Pruning Decision Trees (limit depth).
* Regularization (L1/L2 penalties).
* Using simpler models or ensemble learning.
Explanation: Overfit models perform well on training data but fail on unseen data.
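One way to see the effect of pruning, sketched with scikit-learn (assumed): compare the cross-validated scores of an unconstrained tree against a depth-limited one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# An unconstrained tree can memorize noise; pruning (max_depth) constrains it
for depth in [None, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    print(depth, cross_val_score(tree, X, y, cv=5).mean())
```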
What is the ROC Curve and how do you interpret it?
- ROC Curve plots True Positive Rate vs. False Positive Rate.
- AUC (Area Under the Curve):
  - AUC = 1.0 → Perfect model.
  - AUC = 0.5 → Random guessing.
Explanation: Higher AUC means better classification performance.
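A minimal AUC computation in scikit-learn (assumed; four hand-made predictions):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # the model's predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # 0.75 here; 1.0 = perfect, 0.5 = random
```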
What is the difference between Bagging and Boosting?
- Bagging (e.g., Random Forest) → Reduces variance by training multiple models independently and averaging results.
- Boosting (e.g., XGBoost) → Reduces bias by training models sequentially, improving weak learners.
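A side-by-side sketch in scikit-learn (assumed; GradientBoostingClassifier stands in for XGBoost, which is a separate library):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging: many independent trees on bootstrap samples, results averaged
bagger = RandomForestClassifier(n_estimators=100, random_state=42)

# Boosting: trees built sequentially, each correcting the previous one's errors
booster = GradientBoostingClassifier(n_estimators=100, random_state=42)

for model in (bagger, booster):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())
```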