AIP Logic & LLMs (Ontologize) Flashcards
Two ways of using LLMs in Python Transforms
- External API endpoints
- Palantir-provided models
Palantir-provided models
- Part of AIP
- Make working with LLMs from code more ergonomic
Considerations when using LLMs in Python transforms
- Easier to rack up compute costs
- Subject to rate limits
Without the AIP library
- Need to ensure you stay within token limits (see the sketch below)
- More configuration is needed to loop through datasets
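A minimal sketch of the manual bookkeeping this implies, assuming a tiktoken tokenizer and an arbitrary token budget (neither is a Foundry-specific API); it groups row texts into batches that stay under a context-window limit:

```python
import tiktoken

MAX_PROMPT_TOKENS = 4000  # illustrative budget; the real limit depends on the model you call
enc = tiktoken.get_encoding("cl100k_base")

def batch_rows_by_token_budget(row_texts, budget=MAX_PROMPT_TOKENS):
    """Group row texts into batches whose combined token count stays under the budget."""
    batch, used = [], 0
    for text in row_texts:
        n_tokens = len(enc.encode(text))
        if batch and used + n_tokens > budget:
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += n_tokens
    if batch:
        yield batch
```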
With the transforms-aip Library
- The model is an input from the model library
- Processes datasets well: each row can serve as its own prompt (see the sketch below)
- Maximizes throughput given rate limits
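Conceptually, the row-per-prompt pattern looks like the sketch below; `call_model`, the prompt text, and the rate limit are hypothetical placeholders, not the actual transforms-aip interface (which takes the model as a transform input and manages rate limiting for you).

```python
import time

REQUESTS_PER_MINUTE = 60  # illustrative rate limit

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for the Palantir-provided model supplied to the transform."""
    raise NotImplementedError("Replace with the platform-provided model client")

def run_row_prompts(row_texts):
    """Send each row as its own prompt, naively pacing requests to respect the rate limit."""
    delay = 60.0 / REQUESTS_PER_MINUTE
    responses = []
    for text in row_texts:
        responses.append(call_model(f"Classify the sentiment of this record:\n{text}"))
        time.sleep(delay)
    return responses
```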
Using LLMs in Pipeline Builder
- Quick to implement, but less fine-grained control
- Can do:
- Classification
- Sentiment Analysis
- Summarization
- Translation
- Entity Extraction
- Use “Empty Prompt” for more open-ended problems
What is Retrieval-Augmented Generation (RAG)?
Used to augment the capabilities of LLMs by allowing them to generate responses that incorporate information they were not trained on.
Why is RAG useful?
- A cheaper, faster, and less risky way to enable LLMs to do useful things with your data
- No need to fine-tune or retrain the model on your own data
- Avoids the risk of leaking your data
What are embeddings?
Embeddings are vector representations (arrays of numbers) of text that capture semantic meaning.
Why do we need embeddings?
They let us compute relevance.
If you ask an LLM a question, how does it know what data (the text you created embeddings for) is the most relevant for generating a response?
It creates an embedding from your question and then finds the data that is closest in high-dimensional vector space, and therefore most likely to be relevant. It uses the most relevant data to generate a response (see the sketch below).
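A minimal sketch of that retrieval step, assuming the question and the text chunks have already been embedded (the embedding model itself is not shown):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors: closer to 1.0 means closer in vector space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_relevant_chunks(question_embedding, chunk_embeddings, chunks, k=3):
    """Return the k chunks whose embeddings are closest to the question embedding."""
    scores = [cosine_similarity(question_embedding, e) for e in chunk_embeddings]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]

# The retrieved chunks are then inserted into the prompt as context, e.g.:
# prompt = "Answer using only this context:\n" + "\n".join(relevant) + "\nQuestion: " + question
```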
How do we create embeddings in Foundry?
- Ingest your data
- Make your data machine-readable (if needed)
- Chunk the text
- Create embeddings (see the sketch below)
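In Foundry, the embedding model for the final step would typically be a Palantir-provided model passed into the transform; the open-source sentence-transformers call below is only an illustrative stand-in for "create embeddings".

```python
from sentence_transformers import SentenceTransformer

# Illustrative stand-in for a platform-provided embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Refund policy: customers may return items within 30 days.",
    "Standard shipping usually takes 3 to 5 business days.",
]

# Each chunk becomes a fixed-length vector that captures its semantic meaning.
embeddings = model.encode(chunks)  # shape: (len(chunks), embedding_dim)
```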
Use Media Sets
To store PDFs, images, audio files, and other non-tabular data
Use Datasets
To store tabular data (e.g., a field that stores free-text responses from customer interactions)
Optical Character Recognition (OCR)
Perform OCR to extract the text, e.g., when text is stored as images
If you have audio files
First transcribe them to text (see the sketch below)
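A sketch of this preprocessing using common open-source tools (pytesseract for OCR, openai-whisper for transcription); the file paths are placeholders, and inside Foundry these steps would read from a media set within a transform.

```python
from PIL import Image
import pytesseract
import whisper

# OCR: extract text from a page stored as an image (placeholder path).
page_text = pytesseract.image_to_string(Image.open("scanned_page.png"))

# Transcription: convert an audio recording to text (placeholder path).
speech_model = whisper.load_model("base")
call_text = speech_model.transcribe("customer_call.mp3")["text"]
```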
Context Preservation
By dividing texts into logical chunks (such as paragraphs or sections), the embeddings better capture the specific context of each part.
Improved Retrieval
When queries are matched against smaller, more focused chunks, the system is more likely to retrieve the most relevant text segments rather than entire documents.
Scalability
Chunking allows parallel processing of text chunks, speeding up the embedding process
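A minimal sketch of paragraph-based chunking; the character limit is an arbitrary illustrative choice (real pipelines often chunk by token count and add overlap between chunks).

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text on blank lines into paragraphs, then merge paragraphs up to max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```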
Actions
- Can edit the Ontology and interact with external systems
- Modify objects/links in the Ontology
- Notifications
- Webhooks / API calls
Functions
- Can only accept inputs and return outputs
- Can’t directly edit the Ontology
4 components of the Use LLM block
- System Prompt
- Provided Tools (optional)
- Task Prompt
- Output + Model/Prompting Strategy Configuration
The Use LLM Block: System Prompt
- Tells the LLM what its “role” is
- Provides high-level context on the “frame of mind” it should adopt (example below)
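An illustrative (made-up) example of how the two prompt types divide the work:
- System Prompt: “You are a support analyst. You classify customer tickets by urgency and always answer with a single word: Low, Medium, or High.”
- Task Prompt: “Classify the urgency of the following ticket: {ticket_text}”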
The Use LLM Block: Tools
- Explicitly provided to the LLM block to use during processing
- Apply Actions – existing Action Types to apply Ontology edits
- Calculator Tool – LLMs’ capabilities with math are still developing; provide this to improve calculation reliability
- Call function – existing functions published on Foundry
- Current date
- Query objects – provision additional, but controlled, access to certain Object Types + Link Types