Amazon Rekognition
Computer vision service for image and video analysis
Rekognition gives your application the ability to see and interpret images and videos. You give it a photo, and it tells you what's in it: people, objects, text, scenes. It can detect faces, recognize celebrities, identify unsafe content, and track people across video frames. It's pre-trained on millions of images, so you don't need ML expertise to use it. Typical applications include photo tagging, content moderation, security surveillance, and identity verification.
Rekognition provides APIs for image and video analysis. Image APIs: detect labels (objects, scenes), detect faces (age range, gender, emotions), recognize celebrities, detect text (OCR), and detect unsafe content. Video APIs: track people, detect activities, and recognize celebrities in videos. Rekognition also supports Custom Labels, which lets you train models to detect your own specific objects (logos, products).
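As a rough illustration of the image APIs described above, the sketch below calls DetectLabels through boto3 (the AWS SDK for Python) on an image stored in S3. The helper name `label_names`, the thresholds, and the bucket and key in the usage comment are assumptions chosen for illustration. The client is passed in as a parameter so the function can be tried against a stub without AWS credentials.

```python
from __future__ import annotations

from typing import Any


def label_names(client: Any, bucket: str, key: str, min_confidence: float = 80.0) -> list[str]:
    """Return the names of labels Rekognition reports for an image in S3.

    `client` is expected to behave like boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,                  # cap on the number of labels returned
        MinConfidence=min_confidence,  # drop labels below this confidence (0-100)
    )
    return [label["Name"] for label in response["Labels"]]


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the bucket and key are hypothetical):
#   import boto3
#   client = boto3.client("rekognition")
#   print(label_names(client, "my-example-bucket", "photos/beach.jpg"))
```

Each entry in the response's `Labels` list also carries a `Confidence` score, so a caller that needs ranking rather than names could return the full entries instead.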
Key Capabilities
- DetectLabels identifies thousands of objects, scenes, and activities in images with confidence scores, requiring no model training
- DetectFaces returns facial attributes (age range, emotions, landmarks, pose, quality) and CompareFaces measures similarity between two faces for verification use cases
- Face Collections store and index facial embeddings for SearchFacesByImage, enabling face search across millions of stored faces
- Custom Labels trains Rekognition on your own image dataset to recognize domain-specific objects or scenes not covered by the general-purpose models
- Video analysis supports both stored video (async S3-based jobs) and live streaming video via Kinesis Video Streams with real-time event alerting
- DetectModerationLabels identifies unsafe or explicit content for content moderation pipelines; DetectText performs OCR on images for text extraction
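The face capabilities listed above (CompareFaces and SearchFacesByImage against a face collection) can be sketched with boto3 roughly as follows. The function names, the 90% threshold, and the use of `ExternalImageId` as a user identifier are assumptions for illustration, not a recommended configuration; the client is injected so the code can be exercised with a stub.

```python
from __future__ import annotations

from typing import Any, Optional


def best_similarity(client: Any, selfie: bytes, id_photo: bytes,
                    threshold: float = 90.0) -> Optional[float]:
    """Compare the face in `selfie` with faces in `id_photo`.

    Returns the highest similarity score Rekognition reports at or above
    `threshold`, or None when there is no match.
    """
    response = client.compare_faces(
        SourceImage={"Bytes": selfie},
        TargetImage={"Bytes": id_photo},
        SimilarityThreshold=threshold,  # matches below this are not returned
    )
    scores = [match["Similarity"] for match in response["FaceMatches"]]
    return max(scores) if scores else None


def find_known_face(client: Any, collection_id: str, photo: bytes,
                    threshold: float = 90.0) -> Optional[str]:
    """Search a face collection for the face in `photo`.

    Returns the ExternalImageId stored when the matching face was indexed,
    or None when no indexed face matches (or no ID was stored).
    """
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": photo},
        FaceMatchThreshold=threshold,
        MaxFaces=1,  # only the single best match is needed here
    )
    matches = response["FaceMatches"]
    if not matches:
        return None
    return matches[0]["Face"].get("ExternalImageId")
```

Faces must be added to the collection beforehand (for example with `index_faces`) before a search can return anything.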
Gotchas & Constraints
- Gotcha #1: Rekognition charges per image analyzed and per minute of video analyzed, so costs can add up for high-volume applications.
- Gotcha #2: Accuracy varies with image quality; low-resolution or poorly lit images may produce less accurate results.
- Constraints: maximum image size of 15 MB when the image is read from S3 or 5 MB when it is passed as raw bytes in the API call, maximum video size of 10 GB, and a limit of 20 million faces per face collection.
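The image size constraints above can be checked before a request is sent. The following is a minimal sketch that picks between the two `Image` parameter forms boto3 accepts (`Bytes` or `S3Object`) based on file size. The helper name `build_image_param` is hypothetical, and treating "MB" as binary megabytes (1024 * 1024 bytes) is an assumption.

```python
from __future__ import annotations

# Limits from the constraints above; binary megabytes are assumed.
MAX_INLINE_BYTES = 5 * 1024 * 1024   # image passed as raw bytes in the call
MAX_S3_BYTES = 15 * 1024 * 1024      # image referenced from an S3 object


def build_image_param(size_bytes: int, bucket: str, key: str,
                      data: bytes | None = None) -> dict:
    """Build the `Image` argument for a Rekognition image API call.

    Small images with in-memory data are sent inline; anything larger is
    referenced from S3. Images over the S3 limit are rejected up front.
    """
    if size_bytes > MAX_S3_BYTES:
        raise ValueError(f"image is {size_bytes} bytes; limit is {MAX_S3_BYTES}")
    if data is not None and size_bytes <= MAX_INLINE_BYTES:
        return {"Bytes": data}
    return {"S3Object": {"Bucket": bucket, "Name": key}}
```

Rejecting oversized images locally avoids paying the upload and request round trip for a call that would fail anyway.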
A social media platform needs to moderate user-uploaded content: detect inappropriate images, identify celebrities, and tag photos automatically. Manually reviewing millions of images a day is impossible, so they use Rekognition. When a user uploads an image, the platform calls DetectModerationLabels; if unsafe content (nudity, violence) is detected, the image is flagged for review. DetectLabels tags images automatically ('beach', 'sunset', 'dog'), making photos searchable. RecognizeCelebrities identifies famous people in photos so the platform can suggest tags. During account creation, CompareFaces verifies the user's identity by comparing a selfie to an ID photo. The platform processes 10 million images/day, and Rekognition handles it all with sub-second latency.
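The upload flow in this scenario could look roughly like the sketch below: moderate first, then tag and suggest celebrity names only for images that pass. This is an illustration under stated assumptions, not the platform's actual implementation; the function name `process_upload`, the thresholds, and the shape of the returned dictionary are all hypothetical. The boto3 calls used are `detect_moderation_labels`, `detect_labels`, and `recognize_celebrities`.

```python
from __future__ import annotations

from typing import Any


def process_upload(client: Any, bucket: str, key: str,
                   moderation_threshold: float = 60.0,
                   label_threshold: float = 80.0) -> dict:
    """Moderate an uploaded image, then tag it if nothing unsafe was found."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Step 1: content moderation. Any returned label means "flag for review".
    moderation = client.detect_moderation_labels(
        Image=image, MinConfidence=moderation_threshold
    )
    reasons = [m["Name"] for m in moderation["ModerationLabels"]]
    if reasons:
        return {"status": "needs_review", "reasons": reasons,
                "tags": [], "suggested_people": []}

    # Step 2: automatic tagging for search.
    labels = client.detect_labels(
        Image=image, MaxLabels=10, MinConfidence=label_threshold
    )
    tags = [label["Name"].lower() for label in labels["Labels"]]

    # Step 3: celebrity recognition, used only to suggest tags.
    celebrities = client.recognize_celebrities(Image=image)
    people = [c["Name"] for c in celebrities["CelebrityFaces"]]

    return {"status": "approved", "reasons": [],
            "tags": tags, "suggested_people": people}
```

Running moderation before the other calls means flagged images incur one API charge instead of three, which matters at the volumes described here.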
The Result
95% reduction in manual moderation, automatic photo tagging, and improved user experience.