Terminology Flashcards

1
Watermarking
embedding unique, identifiable signals into AI-generated content that are invisible to humans
2
System card
explains how a group of models work together to form a system (similar to model cards)
3
Synthetic data
artificially created data that mimics the statistical properties of real-world data and minimizes privacy risks
4
Retrieval-augmented generation (RAG)
framework that enhances LLMs by supplementing outputs with reference material, generally not included in the training data, to provide more accurate outputs (think uploading a doc for summary)
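A minimal sketch of the RAG idea in Python. Everything here is illustrative: a real system would retrieve with vector embeddings and send the assembled prompt to an LLM; this toy version retrieves by word overlap and stops at prompt assembly.

```python
# Toy RAG: retrieve the most relevant reference text, then prepend it to the prompt.
documents = [
    "Watermarking embeds identifiable signals in AI-generated content.",
    "Synthetic data mimics the statistical properties of real data.",
    "RAG supplements model outputs with retrieved reference material.",
]

def retrieve(query, docs, k=1):
    # Naive relevance score: number of words the query shares with the document.
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

query = "what does watermarking embed?"
context = "\n".join(retrieve(query, documents))
prompt = f"Answer using this reference material:\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt would then be sent to the LLM
```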
5
Prompt engineering
intentional process of structuring detailed instructions, sequences, and keywords to obtain specific outputs
6
Prompt
user input or instruction to generate an output
7
Adaptive learning
ML model that learns a student's strengths and weaknesses to tailor personalized instruction and content
8
Variance
statistical measure of the spread of numbers from the average value
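A quick worked example in Python; the numbers are arbitrary.

```python
# Variance: the average squared distance of each value from the mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                               # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # population variance
print(mean, variance)  # 5.0 4.0
```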
9
Random forest
supervised ML algorithm that builds and merges multiple decision trees, each trained on a random subset of the data, to get more accurate and stable predictions (useful for data sets with missing data)
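A minimal sketch with scikit-learn (assumed available); the bundled iris dataset stands in for real data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 decision trees, each built from a random bootstrap sample, votes merged.
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```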
10
Greedy algorithm
makes the locally optimal choice at each step for the immediate objective, which can miss the long-term optimal solution
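A classic illustration in Python: greedy coin change always grabs the largest coin that fits. It works for US denominations but can miss the overall optimum, e.g. with coins [1, 3, 4] and amount 6 it returns 4+1+1 instead of 3+3.

```python
def greedy_change(amount, coins=(25, 10, 5, 1)):
    # At each step, take the biggest coin that still fits (the immediate objective).
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(63))  # [25, 25, 10, 1, 1, 1]
```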
11
Entropy
measure of unpredictability or randomness in an ML dataset (higher entropy = greater uncertainty in predictions)
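A quick Shannon-entropy computation in Python over toy label sets: a 50/50 split maximizes uncertainty, while a nearly pure set has low entropy.

```python
import math

def entropy(labels):
    # H = -sum(p * log2(p)) over the proportion p of each class.
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels)
    )

print(entropy(["spam"] * 8 + ["ham"] * 8))   # 1.0  -> maximum uncertainty
print(entropy(["spam"] * 15 + ["ham"] * 1))  # ~0.34 -> nearly pure, low uncertainty
```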
12
Bootstrap aggregating
ML method that aggregates multiple versions of a model, each trained on a random subset of the data, to make predictions more stable and accurate. Also called "bagging"
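A hand-rolled sketch of the mechanism (NumPy and scikit-learn assumed): bootstrap-sample the data, train one tree per sample, and aggregate by majority vote. The synthetic data is illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.mean([t.predict(X) for t in trees], axis=0)  # aggregate the versions
print(((votes > 0.5).astype(int) == y).mean())          # majority-vote accuracy
```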
13
Active learning
subfield of ML where the algorithm chooses the data it learns from. Also called "query learning"
14
Algorithm
set of instructions and rules designed to perform a task
15
Corpus
large collection of texts and data AI uses to find patterns and make predictions
16
Inference
ML model outputs (predictions or decisions)
17
Input data
data provided to the model, which is the basis of ML "learning"
18
Labeled data
data with labels, tags, or classes that provide context or meaning for the model
19
ML model
learned representation of patterns and relationships underlying the data
20
Training data
subset of data used to train an ML model, exposing it to patterns and relationships it learns to identify and make predictions from (60-80% of the data set)
21
Supervised learning
pre-labeled data used to train a model. Example: spam or ham
22
Data labeling
enriching data for training, validating, and testing
23
Semi-supervised learning
using both labeled and unlabeled datasets to train a model to improve reliability while keeping costs down
24
Unsupervised learning
no pre-labeled data, so features are extracted from the data by grouping. Example: group animals by type, color, tails
25
Reinforcement learning
create a model that learns through set goals and rules that reward and punish its decisions, without pre-labeled data. A notable variant is Reinforcement Learning from Human Feedback (RLHF), where the reward signal comes from human feedback
26
Clustering
group data that is similar or identical
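A minimal clustering sketch with scikit-learn's KMeans (assumed available); the two synthetic blobs of points stand in for real data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points; k-means should recover the grouping.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # similar points share a label
```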
27
Association rule learning
data mining to identify objects that are associated. Example: toothbrush, toothpaste, and mouthwash
28
Discriminative models
determine how to group or categorize data through decision boundaries. Example: classify animals
29
Generative models
generate new content based on the underlying characteristics learned
30
Foundation models
large scale neural networks trained on massive amounts of data
31
Neural network
an ML model with at least one hidden layer that captures complex nonlinear relationships and patterns
32
Transfer learning
a foundation model trained to perform one task is applied to different but related tasks. Think adapting a general-purpose LLM to the legal domain
33
Fine tuning
further training of a model on specific data for a specific purpose. Ex: LLM focused on poetry
34
Artificial Narrow Intelligence (ANI)
weak AI that performs a single function under narrow constraints. Ex: Deep Blue
35
Broad AI
subset of Artificial Narrow Intelligence that combines multiple narrow AI systems working together
36
Artificial General Intelligence (AGI)
strong, full, or deep AI: human-level intelligence with the capacity for generalization, considering multiple possibilities and carrying out discrete tasks to achieve a larger goal (turn $1k into $2k). DOES NOT CURRENTLY EXIST
37
Artificial Super Intelligence (ASI)
intellectual capabilities that far exceed humanity's, including showing consciousness and emotions. DOES NOT CURRENTLY EXIST
38
Expert systems
systems designed to emulate human decision-making
39
Fuzzy logic
a method of reasoning that adds in some vagueness to mimic degrees of uncertainty. It relies on linguistic variables (cold, cool, warm, hot) and fuzzy rules (if-then statements)
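A tiny Python illustration of the linguistic-variable idea; the temperature ranges are made up for the example.

```python
# Fuzzy membership: how "warm" is a temperature, on a 0..1 scale?
def membership_warm(temp_f):
    if temp_f <= 60 or temp_f >= 90:
        return 0.0
    if temp_f <= 75:
        return (temp_f - 60) / 15  # ramp up from 60F to 75F
    return (90 - temp_f) / 15      # ramp back down from 75F to 90F

# Fuzzy rule (if-then): IF temperature is warm THEN run the fan at that degree.
for t in (55, 68, 75, 85):
    print(t, "warm to degree", round(membership_warm(t), 2))  # 0.0, 0.53, 1.0, 0.33
```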
40
Robotics
multidisciplinary field that designs, constructs, operates, and programs robots that use AI to interact with the physical world
41
Industry 4.0
4th industrial revolution, marked by increased interconnectivity and automation through robotics in manufacturing
42
Machine perception
convergence of AI and robotics through sensors that allow machines to understand their surroundings
43
Robotic Process Automation (RPA)
software robots that automate repetitive tasks through natural language processing (NLP). Example: data entry or forms processing
44
Linear and Statistical Models
models that map a relationship between two variables (temp vs ice cream sales). The model is explainable
45
Decision tree
model that determines the flow of decisions. The model is explainable
46
Deep learning
subfield of AI using a neural network with more than 3 hidden layers. A black box: the path to the output cannot be retraced
47
Large Language Model
foundation model that uses deep learning and a large number of parameters to understand and interact using language. A black box
48
Parameter
internal variable or value that the model learns from the training data; the basis of ML predictions on new data
49
Variable
measurable attribute that can take on different values, including quantitative or qualitative
50
Hyperparameter
variables that are set before training and control the learning process
51
Features
input variables and attributes that are characteristics of the data used to make predictions
52
Weight
type of parameter that determines the strength of the connection between nodes and can be adjusted during training to optimize predictions
53
Generative Pre-trained Transformer (GPT)
type of neural network that learns the context and relationship between words in a sentence
54
Attention technique
looks at each word in a sentence to gauge its relative importance and meaning ("he went to the bank...")
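A bare-bones sketch of the underlying mechanism (scaled dot-product self-attention) in NumPy; the random vectors stand in for word embeddings.

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant each word is to each other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                       # each output is a weighted mix of all words

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))      # 5 "words", each an 8-dim embedding
print(attention(x, x, x).shape)  # (5, 8): every word re-expressed in context
```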
55
Multi-modal
can process and produce diverse inputs or outputs
56
Modality
medium of input or output, such as text, speech, image, and video
57
Chatbot
AI designed to simulate human-like conversations or interactions
58
Small Language Model (SLM)
Specialized NLP model with few parameters (<1 billion) that is faster to train and more secure
59
Recommendation (AI use case)
proposes suggestions and new content. Think Netflix recommendations
60
Recognition (AI use case)
discern different images, speech, faces, and palm geometry. Think plagiarism detection
61
Detection (AI use case)
identify statistical anomalies. Think credit card fraud detection
62
Forecasting (AI use case)
prediction of future changes. Think weather forecasting
63
Goal-driven optimization (AI use case)
determines the best steps from start to finish. Think travel route optimization
64
Interaction support (AI use case)
customer support assistance. Think support chatbots
65
Personalization (AI use case)
tailor user experience to specific preferences through profiling. Think news feeds
66
Super computer
very powerful computer used for LLMs
67
FLOPs
Floating Point Operations, a count of total compute used to measure model training requirements (NOT FLOPS, which is operations per second)
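A back-of-the-envelope estimate using the commonly cited ~6 × parameters × tokens rule of thumb for training compute; the model size and token count below are hypothetical.

```python
# Rough training-compute estimate (rule of thumb: FLOPs ~= 6 * N * D).
params = 7e9    # hypothetical 7B-parameter model
tokens = 1e12   # hypothetical 1 trillion training tokens
print(f"{6 * params * tokens:.1e} FLOPs")  # ~4.2e+22 floating point operations
```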
68
Compute
all the computers processing resources, including CPU, GPU, memory, storage, and data processing
69
Serverless compute
processing is not confined to a single server. Features: loose coupling (data from multiple sources) and scaling (multiple instances of code)
70
High-performance compute
isolated cluster of computers in close proximity that utilizes high-speed networking and specialized chips
71
Trusted execution environments compute
secure area of processor that preserves data and code confidentiality as well as privacy
72
Application
how an AI system is used
73
Platform
software system used to provide an AI system. Functions: data analysis, streamlined development and workflows, collaboration, task automation, and monitoring. Examples: AWS, Microsoft Azure, Google Cloud
74
Open-source
decentralized development model where the public can contribute and use the code or model
75
Data transformation
processing data into a format that will support a model
76
Data pre-processing
prepare data for an ML model by improving its quality through cleaning, filling in missing data, normalizing, etc. Think "cleaning data"
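A minimal "cleaning" sketch with pandas (assumed available); the column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40, 31], "income": [50_000, 62_000, None, 48_000]})
df["age"] = df["age"].fillna(df["age"].median())      # fill missing data
df["income"] = df["income"].fillna(df["income"].mean())
df["income"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min())          # normalize to the 0-1 range
print(df)
```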
77
Data post-processing
adjusting model outputs to improve fairness and meet business objectives. Think correcting an image generator that depicts historical groups inaccurately
78
Data integrity
ensuring data accuracy and consistency affecting performance
79
Data drift
statistical properties or attributes of data can change over time. Think before and after COVID
80
Data observability
monitor the health of the system by comparing pre-determined indices and metrics
81
Validation data
subset of data used for validating the model during the training phase, to assess performance, tune hyperparameters, and prevent overfitting (10-20% of the data set)
82
Testing data
subset of the data used for the final evaluation of the trained model, to assess performance, ensure real-world readiness, and measure accuracy (10-20% of total data)
83
Overfitting
when the model learns the training data too well and fails to generalize to new, unseen data. Results in poor performance and limited real-world applicability
84
Underfitting
when the model fails to capture the complexity of the application due to too few parameters, excessive regularization, or insufficient features. Results in poor predictions, low accuracy, and weak performance
85
Unseen data
data that the model hasn't seen, such as that when the system is implemented
86
Ground truth
known, verified facts that serve as reference data for measuring AI performance
87
Accuracy
Primary indicator of model performance, measuring correctness and success
88
Aggregation
combining data into large datasets
89
Data Subject Rights (DSR)
the personal privacy rights of individuals
90
Secondary use
using data for a purpose other than what it was collected for
91
Hallucination
generative AI that creates contradictory or factually inaccurate content
92
Deep fakes
synthetic content intentionally manipulated to cause harm or spread disinformation, such as images, audio, or video
93
Disinformation
deliberately deceptive information meant to confuse or mislead
94
Misinformation
incorrect information which was a mistake and NOT deliberate
95
Echo chamber
individuals are exposed only to ideologically similar content or lack exposure to differing viewpoints
96
AI exceptionalism
assumption that computer systems are infallible and better than humans, causing people to be less likely to challenge outputs
97
Alignment
ability of AI systems to pursue and achieve goals that match the operator's intended objectives
98
Bias
a preference or inclination that inhibits impartiality, stems from prejudice, and impacts outcomes or creates risks to individuals' rights and liberties
99
Adversarial attack
deliberate manipulation of an AI model that causes it to malfunction
100
Model inversion
attacker reverse engineers model to extract information
101
Model extraction
attacker gains access to a model's parameters
102
Data poisoning
intentionally altering training data to adversely impact model performance
103
Data leakage
unintentional exposure of data
104
Data loss
irretrievable data (lost laptop)
105
Filter bubble
Same as echo chamber
106
Model evasion
attacker designs inputs so that the outputs are wrongly classified (noise introduction)
107
Model poisoning
attacker manipulates model parameters to cause model to misbehave
108
Malicious algorithm
model that is trained to attack other systems
109
Data persistence
data outlives the data subject
110
Data repurposing
data used beyond its intended purpose
111
Spillover data
incidental collection of data
112
Harms taxonomy
mapping of harms and negative consequences that could affect data subjects or organizations
113
Risk Management Framework (RMF)
framework to help in the identification, assessment and mitigation of risks
114
Representational harm
reinforcement of unjust societal biases. Example: "thugs"
115
Allocative harm
unfair distribution of resources. Example: hiring algorithms
116
Quality-of-Service harm
disproportionate underperformance for certain social groups. Example: facial recognition for darker skin
117
Interpersonal harm
systems adversely shape relations between people or communities. Example: enabling stalking
118
Social system harm
macro-level effects that destabilize social systems. Example: misinformation
119
Interpretability
ability to explain or present the AI's reasoning for an output or decision
120
Explainability
providing an explanation after a decision is made
121
Opacity
lack of transparency into an ML model's decisions, raising concerns about trust, bias, and fairness
122
Autonomy
degree of decision-making oversight of the system, described as "human in/on/out of the loop"
123
Human in the loop
humans are directly involved in the system's decisions and review outputs. Example: autonomous vehicle
124
Humans on the loop
humans oversee the system and can step in to guide it. Example: recommendation algorithm
125
Humans out of the loop
humans are not involved in the decision-making process. Example: misalignment
126
Accountability
obligation and responsibility of creators and regulators to ensure systems operate in an ethical, fair way
127
Contestability
Ensure AI system output and actions can be questioned and challenged, including an appeal to a human for review
128
Reliability
Ensuring a system behaves as expected, including consistency and accuracy
129
Robustness
system performs accurately in a variety of circumstances
130
Transparency
Extent to which info is made available about a system
131
Developer
individual or org that develops or significantly modifies an AI system
132
Deployer
uses an AI system under its authority
133
User
end user or consumer of an AI system
134
Training people
teaching skills to targeted groups
135
Awareness
focus attention on an issue or set of issues for everyone
136
Privacy by Design (PbD)
embedding privacy and data protection into design and operation of IT systems
137
Privacy by Default
highest level of protection is applied automatically; combined with PbD this becomes PbDD
138
Privacy Impact Assessment (PIA)
privacy assessment for processing activities that include personal data
139
Data Protection Impact Assessment (DPIA)
privacy assessment for high risk processing activities that include personal data
140
Data disposition
the length of time data is retained and the manner in which it is deleted
141
General Data Protection Regulation (GDPR)
comprehensive privacy law that went into effect in 2018 for the EU and EEA
142
Anonymization
transforming data into a non-identifiable form, especially making data non-personally identifiable
143
Pseudonymization
protecting data by transforming it, but it could still be identifiable through re-identification
144
Special categories
high risk data that requires enhanced protection as defined by GDPR
145
Sensitive data
high risk data that requires enhanced protection as defined by CCPA
146
Bayesian Improved Surname Geocoding
inferring sensitive attributes (such as race or ethnicity) from less sensitive proxy data like surname and geography
147
Data controller
organization that processes personal data and determines the purposes and means of processing
148
Intangible assets
inventions, brands, new tech and source code
149
Patents
time limited, exclusive rights protection for inventions
150
Trademark
logo, slogan, or brand name
151
Copyright
protects technology (data, code) from unauthorized use and reproduction
152
Trade secrets
confidential information that provides a competitive advantage (secret sauce)
153
Derivative work
expressive creation based on an underlying work that is substantial and shows the author's personality
154
Fair use
certain cases that allow copyrighted works to be used without the author's permission. Examples: criticism, satire, comment, reporting...
155
Licensing
official permission to do, use, or own something. Licensor: gives permission; licensee: receives permission
156
Indemnification
contractual obligation by one party to pay for a loss incurred by the other party
157
Adverse impact
policies or procedures that result in negative effects on a protected group, focusing on outcomes, even if unintentional. Also "disparate impact"
158
Automated Decision Making (ADM)
decision making, in part or whole, by means of technology without human involvement
159
Algorithmic disgorgement
when a model is built on data the company should not have had, the model and data must be deleted
160
Federal Reserve Bank
central banking system of the US
161
Article (law)
sets out substantive rules, rights and obligations
162
Annex (law)
provides additional details and technical specifications
163
Recital (law)
Explain reasons, context, and objectives of legislation
164
Extraterritorial
applies both to companies that exist in a given region or country and to those that do not but do business there
165
Provider (EU)
developer of an AI system or GPAI that is on the market. Most heavily regulated
166
Deployer (EU)
org that uses an AI system under its authority
167
Importer (EU)
an org that is located or established in the EU and places an AI system on the market (under the developer's name)
168
Distributor (EU)
org that makes an AI system available in the EU as a follow-on action to importation and placement on the market
169
Product Manufacturer (EU)
puts an AI system on the market OR operationalizes it together with their own product. Three types: 1. solely product manufacturer (no obligations); 2. becomes a provider; 3. becomes a deployer
170
Authorized rep (EU)
person located in the EU that receives a mandate from the AI system or GPAI provider and carries out the provider's obligations
171
Principles
guidelines that provide consistency, standards, and ethical use of AI
172
Framework
guidance on operationalizing principles and values
173
AI actor
org or individuals that play an active role in an AI system
174
Risk management
coordinated activities to direct and control an org's risk
175
Risk tolerance
readiness to bear risk to achieve objectives
176
Use-case profile
implementation of an AI RMF for a specific setting or application
177
TEVV
testing, evaluation, verification (internal), and validation (external), for risk
178
Verification
meets internal stakeholder requirements
179
Validation
meets external stakeholder requirements
180
Red teaming
adversarial approach to testing the safety, security and performance
181
Data wrangling
Data preparation for AI
182
Veracity
accuracy and trustworthiness
183
Feature engineering
Pre-processing step for transforming raw data into relevant info to create predictive models
184
Base data pile (3 parts)
Final set of data that includes: training data; validation data (used during the training phase to fine-tune); and testing data (used to assess the final model)
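One common way to produce the three subsets in Python (scikit-learn assumed), using a 70/15/15 split consistent with the percentages in these cards; the data is a stand-in.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)  # stand-in features and labels

# 70% train, then split the remaining 30% evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 70 15 15
```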
185
Model cards
Description of the model for external stakeholders about how it works and considerations for its use
186
Type 1 error (Confusion matrix)
False positive (unauthorized access)
187
Type 2 error (Confusion matrix)
False negative (inconvenience)
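A small illustration with scikit-learn (assumed available), using toy access-control labels where 1 = grant access and 0 = deny.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 1]  # actual authorization
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]  # model decisions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Type 1 / false positive (unauthorized access):", fp)  # 1
print("Type 2 / false negative (inconvenience):", fn)        # 1
```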
188
Operational controls
systematic measures for day-to-day management of AI systems
189
Benchmarking
evaluation method to assess and compare performance of an AI system
190
Data governance
framework for managing data assets throughout the data lifecycle
191
Data provenance
origin and authenticity of data
192
Data lineage
tracking the flow of data through systems, including transformations and dependencies
193
Data localization
requirement that data must be stored and processed in geographical borders where it was collected
194
Know Your Customer (KYC) regulations
financial institutions must verify customer identity and assess risk in doing business with them
195
Structured data
predefined schema that is searchable and analyzable. Example: spreadsheets
196
Unstructured data
data that has no predefined format or organization. Example: social media, video
197
Semi-structured data
organizational properties without rigid structure. Example: email, medical records
198
Oversight
monitoring of AI systems
199
Assurance
frameworks, policies, processes and controls to measure, evaluate and promote trustworthy AI. Think assessments and certifications
200
Audit
assessment of AI systems for compliance with laws, regulations, and standards
201
Large Language Model (LLM)
GenAI model that trains on large amounts of data to analyze, understand, and generate new content
202
Diffusion model
image generation through the refinement of noise into an image
203
Edge case
an input or situation that falls outside of the expectations of the system
204
De-identification
removal of some personal identifiers
205
Obfuscation / Masking
modifying sensitive data
206
Minimization
collecting only the data needed
207
Encryption
mathematical process to encode data
208
Homomorphic encryption
enables computation and training on encrypted data so it is never exposed (not scalable)
209
Secure multi-party computation
orgs compute on a combined data set without revealing information about their own data
210
Federated learning
parties train a shared ML model without aggregating the data by training on the edge and then updating the global model
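A toy FedAvg-style sketch in NumPy: each party takes a local gradient step on its own linear-regression data, and only the model weights are averaged centrally. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
parties = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w_global = np.zeros(3)

def local_update(w, X, y, lr=0.1):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient for a linear model
    return w - lr * grad

for _ in range(10):                        # each round: train locally, average centrally
    local_ws = [local_update(w_global, X, y) for X, y in parties]
    w_global = np.mean(local_ws, axis=0)   # raw data never leaves a party
print(w_global)
```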
211
Differential privacy
"noise" is added to the data on a sliding scale to increase privacy OR increase accuracy
212
Repeatable assessment
same team gets the same results with same inputs over time
213
Threat modeling
identify and list threats with their countermeasures
214
Brittleness
minor tweaks to inputs can ruin a system's performance
215
Catastrophic forgetting
training on new data overwrites or weakens previously learned weights in LLMs
216
Proprietary models
owned and controlled in a closed nature and inaccessible to the public