Using NSFW & Profanity filter for content moderation
FastPix’s moderation feature is designed to detect and flag harmful or inappropriate content, enabling you to maintain a safe and respectful user environment. Once flagged, you can take corrective actions, such as filtering content or addressing the behavior of users creating offensive material.
Example of NSFW Detection
We tested the NSFW filter using a sample video with diverse content. Here are the results:
- Violence: 0.94
- Graphic Violence: 0.85
- Self-Harm: 0.49
These scores range from 0 to 1, where higher scores indicate a stronger confidence in the detected category. The results demonstrate the filter's precision in identifying potentially harmful content.
For more insights, explore our blog: AI Content Moderation Using NSFW and Profanity Filters.
Getting started with content moderation
To implement content moderation, use FastPix’s Create Media from URL or Upload Media from Device API endpoints. These endpoints analyze video or audio files for harmful content.
Steps for detecting new media
Step 1: Collect the URLs of the media files you wish to analyze for moderation.
Step 2: Create a JSON request body and send a POST request to the /on-demand endpoint (a script sketch follows the parameter descriptions below):
{
  "inputs": [
    {
      "type": "video",
      "url": "https://static.fastpix.io/sample.mp4"
    }
  ],
  "moderation": true,
  "accessPolicy": "public",
  "maxResolution": "1080p"
}
Request parameters
- type: Specify whether the media is a "video" or an "audio" file.
- url: Provide the HTTPS URL of the media file to analyze.
- moderation: Set this to true to enable content moderation.
- accessPolicy: Determines the visibility of the media (e.g., "public").
- maxResolution: Specify the resolution for processing (e.g., "1080p").
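If you want to script this step, the sketch below sends the request body above using Python's requests library. It is a minimal illustration rather than an official SDK example: the base URL and the use of HTTP basic auth with an access token ID and secret key are assumptions, so confirm the exact endpoint and authentication scheme in the API reference before relying on it.

import requests

# Placeholder values -- replace with your own credentials.
BASE_URL = "https://v1.fastpix.io"        # assumed base URL; confirm in the API reference
ACCESS_TOKEN_ID = "your-access-token-id"  # placeholder credential
SECRET_KEY = "your-secret-key"            # placeholder credential

payload = {
    "inputs": [
        {"type": "video", "url": "https://static.fastpix.io/sample.mp4"}
    ],
    "moderation": True,        # enables content moderation for this media
    "accessPolicy": "public",
    "maxResolution": "1080p",
}

# POST the request body from Step 2 to the /on-demand endpoint.
response = requests.post(
    f"{BASE_URL}/on-demand",
    json=payload,
    auth=(ACCESS_TOKEN_ID, SECRET_KEY),  # assumed HTTP basic auth
)
response.raise_for_status()
print(response.json())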
Categories for moderation
The NSFW filter analyzes content based on the type of media (audio or video) and flags it under specific categories:
For audio files: If the content is audio-only, the following categories will be considered for moderation detection:
- Harassment
- Harassment/Threatening
- Hate
- Hate/Threatening
- Illicit
- Illicit/Violent
- Sexual/Minors
- Self-Harm
- Self-Harm/Intent
- Self-Harm/Instructions
- Sexual Content
- Violence
- Violence/Graphic
For video files: If the content is video-only, the following categories will be considered for moderation detection:
- Self-Harm
- Self-Harm/Intent
- Self-Harm/Instructions
- Sexual Content
- Violence
- Violence/Graphic
Confidence score
Each category is assigned a confidence score between 0 and 1, where:
- A score close to 1 indicates a high confidence that the content belongs to the detected category, meaning the moderation detection is strong.
- A score close to 0 suggests low confidence, indicating that the content is less likely to fall under that category.
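In practice, you decide what score counts as "flagged" for your application. The snippet below is a purely illustrative sketch of that decision: the 0.8 cutoff is an arbitrary assumption, not a recommended value, and the category names mirror the sample results above.

def flag_categories(category_scores, threshold=0.8):
    """Return the categories whose confidence score meets or exceeds the threshold."""
    return [entry["category"] for entry in category_scores if entry["score"] >= threshold]

# Scores from the sample video in the example above.
scores = [
    {"category": "violence", "score": 0.94},
    {"category": "violence/graphic", "score": 0.85},
    {"category": "self-harm", "score": 0.49},
]
print(flag_categories(scores))  # ['violence', 'violence/graphic']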
Detecting harmful content in existing media
If you wish to analyze media that has already been uploaded, follow these steps:
- Step 1: Retrieve the media ID of the existing media you want to analyze for moderation.
- Step 2: Create a JSON request and send a PATCH request to the /on-demand/<mediaId>/moderation endpoint. Replace <mediaId> with the ID of the media you want to analyze for moderation.
Example request body:
{
  "moderation": {
    "type": "video/audio"
  }
}
Input parameters:
- type: Specify whether the media is a "video" or an "audio" file.
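As with the POST sketch earlier, the following is a minimal, illustrative way to send this PATCH request. The base URL and basic-auth credentials are the same assumptions as before; swap in your real values.

import requests

BASE_URL = "https://v1.fastpix.io"  # assumed base URL; confirm in the API reference
media_id = "eb56a668-0354-40c2-9233-f3197e1baabd"  # ID of the existing media to analyze

# PATCH /on-demand/<mediaId>/moderation with the media type to analyze.
response = requests.patch(
    f"{BASE_URL}/on-demand/{media_id}/moderation",
    json={"moderation": {"type": "video"}},  # use "audio" for audio-only files
    auth=("your-access-token-id", "your-secret-key"),  # assumed HTTP basic auth
)
response.raise_for_status()
print(response.json())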
Accessing moderation results
Moderation results, including categories and confidence scores, can be accessed through the video.mediaAI.moderation.ready event.
Example of the event data:
{
  "type": "video.mediaAI.moderation.ready",
  "object": {
    "type": "media",
    "id": "eb56a668-0354-40c2-9233-f3197e1baabd"
  },
  "id": "1ba2b9dc-4a2f-4472-a8e8-3ee3b4c44b35",
  "workspace": {
    "name": "Development",
    "id": "c788be40-91a5-4d2d-abf7-47398a6276a1"
  },
  "status": "ready",
  "data": {
    "thumbnail": "https://venus-images.fastpix.dev/cf41c9f7-ece3-4efe-8d31-c6e000dc422b/thumbnail.png",
    "id": "eb56a668-0354-40c2-9233-f3197e1baabd",
    "workspaceId": "c788be40-91a5-4d2d-abf7-47398a6276a1",
    "metadata": {
      "key1": "value1"
    },
    "maxResolution": "1080p",
    "sourceResolution": "1080p",
    "playbackIds": [
      {
        "id": "cf41c9f7-ece3-4efe-8d31-c6e000dc422b",
        "accessPolicy": "public",
        "accessRestrictions": {
          "domains": {
            "defaultPolicy": "allow",
            "allow": [],
            "deny": []
          },
          "userAgents": {
            "defaultPolicy": "allow",
            "allow": [],
            "deny": []
          }
        }
      }
    ],
    "tracks": [
      {
        "id": "344fd5bc-82af-4d11-bc1c-785d9e6f9aef",
        "type": "video",
        "width": 1920,
        "height": 1080,
        "frameRate": "30/1",
        "closedCaptions": false
      }
    ],
    "moderationResult": {
      "categoryScores": [
        {
          "category": "harassment/threatening",
          "score": 0.83
        },
        {
          "category": "harassment",
          "score": 0.81
        },
        {
          "category": "violence",
          "score": 1
        }
      ]
    },
    "duration": "00:00:10",
    "frameRate": "30/1",
    "aspectRatio": "16:9",
    "createdAt": "2024-12-06T03:47:26.489888Z",
    "updatedAt": "2024-12-06T03:47:47.593400Z"
  },
  "createdAt": "2024-12-12T10:39:37.671899Z",
  "attempts": []
}
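When this event arrives at your webhook endpoint, the scores live under data.moderationResult.categoryScores. The handler below is a rough sketch using Flask; the /webhooks/fastpix path, the 0.8 threshold, and the follow-up action are all assumptions you would replace with your own routing, policy, and signature verification.

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/fastpix", methods=["POST"])  # hypothetical webhook path
def handle_fastpix_event():
    event = request.get_json()
    if event.get("type") != "video.mediaAI.moderation.ready":
        return "", 204  # ignore other event types

    media_id = event["data"]["id"]
    scores = event["data"]["moderationResult"]["categoryScores"]

    # Flag any category that crosses an arbitrary 0.8 threshold.
    flagged = [s["category"] for s in scores if s["score"] >= 0.8]
    if flagged:
        print(f"Media {media_id} flagged for: {', '.join(flagged)}")
        # e.g., unpublish the media or queue it for human review here

    return "", 200

if __name__ == "__main__":
    app.run(port=8080)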
Content classifications
Below is an overview of the categories detected by the moderation API, along with their descriptions and supported input types:
Category | Description | Inputs |
---|---|---|
Harassment | Content promoting harassment toward any individual or group. | Audio only |
Harassment/Threatening | Harassment combined with threats of violence or harm. | Audio only |
Hate | Content expressing hate based on protected attributes (e.g., race, gender). | Audio only |
Hate/Threatening | Hate speech that includes violent threats. | Audio only |
Illicit | Instructions or advice on committing illegal acts. | Audio only |
Illicit/Violent | Illicit content with references to violence or weapons. | Audio only |
Self-Harm | Content depicting or promoting acts of self-harm. | Audio, video |
Self-Harm/Intent | Expressions of intent to engage in self-harm. | Audio, video |
Self-Harm/Instructions | Instructions on committing self-harm acts. | Audio, video |
Sexual | Explicit content intended to arouse or promote sexual services. | Audio, video |
Sexual/Minors | Explicit content involving individuals under 18 years old. | Audio only |
Violence | Depictions of physical harm or injury. | Audio, video |
Violence/Graphic | Graphic depictions of physical harm or injury. | Audio, video |
How NSFW detection works
Spritesheets are a core feature in FastPix’s NSFW detection, providing a structured and efficient method to analyze video frames for harmful or inappropriate content. By capturing representative snapshots of a video, spritesheets help streamline the detection process.
We’re evolving how we use spritesheets for NSFW detection to address varying content moderation needs. Here's an in-depth look at how we are approaching it across different phases:
Phase 1: What we currently have
In the first phase of spritesheet-based NSFW detection, a fixed number of thumbnails is generated based on the video duration. This method provides a balance between processing efficiency and accuracy:
For videos shorter than 15 minutes:
A total of 50 fixed thumbnails are generated. For example:
- A 6-minute video (360 seconds) will produce a thumbnail every 7.2 seconds.
- A 10-minute video (600 seconds) will produce a thumbnail every 12 seconds.
For videos longer than 15 minutes:
A total of 100 fixed thumbnails are generated. For example:
- A 1-hour video (3,600 seconds) will produce a thumbnail every 36 seconds.
- A 2-hour video (7,200 seconds) will produce a thumbnail every 72 seconds.
This approach ensures that key frames are captured without overwhelming the processing system. However, it may lead to reduced detection accuracy for longer videos, as fewer frames per second are analyzed.
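To make the Phase 1 arithmetic concrete, the spacing between thumbnails is simply the video duration divided by the fixed thumbnail count (50 for videos under 15 minutes, 100 otherwise). The helper below is an illustrative sketch of that calculation, not FastPix's internal implementation.

def phase1_thumbnail_interval(duration_seconds: float) -> float:
    """Return the spacing (in seconds) between spritesheet thumbnails in Phase 1."""
    thumbnail_count = 50 if duration_seconds < 15 * 60 else 100
    return duration_seconds / thumbnail_count

print(phase1_thumbnail_interval(360))   # 6-minute video  -> 7.2
print(phase1_thumbnail_interval(600))   # 10-minute video -> 12.0
print(phase1_thumbnail_interval(3600))  # 1-hour video    -> 36.0
print(phase1_thumbnail_interval(7200))  # 2-hour video    -> 72.0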
Phase 2: Upcoming enhancements (Beta)
Recognizing the limitations of Phase 1, our upcoming Phase 2 introduces significant enhancements to spritesheet generation to improve accuracy and granularity in detecting NSFW content.
Increased thumbnail granularity
- Thumbnails will be generated at a rate of one per second, regardless of video duration.
- This approach ensures that every second of the video is analyzed, capturing nuanced frames that may contain harmful content.
- Example:
- A 6-minute video will have 360 thumbnails (one for each second).
- A 2-hour video will have 7,200 thumbnails, providing far greater coverage than the fixed 100 in Phase 1.
Enhanced audio analysis
- For audio content, timestamps will be captured whenever explicit or inappropriate language is detected.
- These timestamps can be used for precise interventions, such as censoring or beeping specific words.
Improved processing algorithms
- Advanced algorithms will be integrated to analyze higher volumes of data while maintaining processing speed.
- This will allow the system to scale effectively for longer and more complex videos.