Implement content moderation solutions (10–15%) Flashcards
Which feature of Azure AI Content Safety helps protect large language models from document injection attacks?
Prompt Shields
What is the purpose of the Groundedness detection feature in Azure AI Content Safety?
To verify AI-generated text is based on provided source materials.
Groundedness detection includes a reasoning option in the API response. This adds a reasoning field that explains any ungroundedness detection. However, reasoning increases processing time and costs.
Which social media issues does Azure AI Content Safety address?
The growth of inappropriate online content including bullying and hate speech.
How does Azure AI Content Safety help businesses to protect their brand image?
By moderating comments and messages from customers.
What does moderate text do?
scans text across four categories: violence, hate speech, sexual content, and self-harm. A severity level from 0 to 6 is returned for each category. This level helps to prioritize what needs immediate attention by people, and how urgently. You can also create a blocklist to scan for terms specific to your situation.
Define prompt shields
a unified API to identify and block jailbreak attacks from inputs to LLMs. It includes both user input and documents. These attacks are prompts to LLMs that attempt to bypass the model’s in-built safety features. User prompts are tested to ensure the input to the LLM is safe. Documents are tested to ensure they don’t contain unsafe instructions embedded within the text.
What does protected material do?
checks AI-generated text for protected text such as recipes, copyrighted song lyrics, or other original material.
What image content moderation options exist?
- moderate images
- Moderate multimodal content
What happens in moderate images?
scans for inappropriate content across four categories: violence, self-harm, sexual, and hate. A severity level is returned: safe, low, or high. You then set a threshold level of low, medium, or high. The combination of the severity and threshold level determines whether the image is allowed or blocked for each category.
How does Moderate multimodal content differ from moderate images?
scans both images and text, including text extracted from an image using optical character recognition (OCR). Content is analyzed across four categories: violence, hate speech, sexual content, and self-harm.
Describe custom categories
enables you to create your own categories by providing positive and negative examples, and training the model. Content can then be scanned according to your own category definitions.
What is the purpose of the safety system message?
Safety system message helps you to write effective prompts to guide an AI system’s behavior.
How should content moderation be used?
Azure AI Content Safety works best to support human moderators who can resolve cases of incorrect identification. When people add content to a site, they don’t expect posts to be removed without reason. Communicating with users about why content is removed or flagged as inappropriate helps everyone to understand what is permissible and what isn’t.