Have I Been Trained?
Product: Check if your image has been used to train popular AI art models.
Capabilities (6 decomposed)
reverse-image-lookup-against-training-datasets
Medium confidence: Accepts an image file and performs reverse-lookup queries against indexed snapshots of popular AI art model training datasets (LAION, Stable Diffusion, Midjourney, DALL-E, etc.) using perceptual hashing and semantic embedding matching. The system likely maintains pre-computed hash tables and vector indices of known training data, then compares incoming images against these indices to detect matches or near-duplicates, returning provenance metadata if found.
Specializes in detecting whether images appear in AI model training datasets by maintaining indexed snapshots of LAION, Stable Diffusion, and other public training corpora, using perceptual hashing to match images even after compression or minor modifications, rather than generic reverse-image search
More targeted than Google Images reverse search because it specifically indexes AI training datasets rather than the general web, and more comprehensive than individual model documentation because it aggregates multiple training sources in one query
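The hash-table probe described above can be sketched as a simple exact-match lookup. The index layout, hash values, and provenance fields below are illustrative assumptions, not the service's actual schema:

```python
# Sketch of a reverse lookup against a pre-computed perceptual-hash index.
# Keys are hex-encoded perceptual hashes; values are provenance records
# for every indexed training image that produced that hash.
INDEX = {
    "a3f1c0de99b2e471": [
        {"dataset": "LAION-5B", "source_url": "https://example.com/cat.jpg"},
    ],
}

def lookup(phash: str) -> list[dict]:
    """Return provenance records whose stored hash exactly matches the query.

    A production system would also probe near-duplicate hashes (see the
    tolerance-matching capability below); this shows only the index probe.
    """
    return INDEX.get(phash, [])
```

In practice the exact-match table would be paired with an approximate-nearest-neighbor index over embeddings, so that the probe degrades gracefully when the query hash is a few bits off.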
multi-model-training-dataset-aggregation
Medium confidence: Maintains a unified index across multiple popular generative AI model training datasets (Stable Diffusion, DALL-E, Midjourney, etc.) and exposes a single query interface to check an image against all indexed datasets simultaneously. This likely involves periodic crawling or partnership access to dataset metadata, normalization of dataset schemas, and a federated search architecture that queries multiple indices in parallel and aggregates results.
Aggregates training dataset indices from multiple competing generative AI models into a single queryable interface, rather than requiring users to check each model's dataset separately or use disparate tools
Broader coverage than checking individual model documentation or using model-specific tools, and more efficient than manual searches across multiple platforms
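A federated search of this kind can be sketched as parallel fan-out over per-dataset backends with merged results. The backend functions here are hypothetical stand-ins for real index queries:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-dataset query backends; each takes a perceptual hash
# and returns a list of match records in a normalized schema.
def query_laion(phash: str) -> list[dict]:
    return [{"dataset": "LAION-5B", "hash": phash}]

def query_stable_diffusion(phash: str) -> list[dict]:
    return []  # no match in this index

BACKENDS = [query_laion, query_stable_diffusion]

def federated_search(phash: str) -> list[dict]:
    """Query every dataset index in parallel and merge the match lists."""
    with ThreadPoolExecutor() as pool:
        per_backend = pool.map(lambda q: q(phash), BACKENDS)
    return [match for results in per_backend for match in results]
```

The schema normalization step matters as much as the fan-out: each dataset publishes different metadata, so the backends must map their results into one shared record shape before aggregation.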
perceptual-image-matching-with-tolerance
Medium confidence: Uses perceptual hashing algorithms (likely pHash, dHash, or similar) to match images even when they have been slightly modified (compressed, cropped, color-shifted, watermarked). The system computes a compact hash fingerprint of the query image and compares it against pre-computed hashes of training dataset images, using a configurable similarity threshold to determine matches. This enables detection of images that are visually identical or near-identical to training data despite minor transformations.
Implements perceptual hashing with configurable tolerance thresholds to detect training dataset images even after compression, cropping, or minor modifications, rather than requiring exact pixel-level matches
More robust than cryptographic hashing (MD5, SHA) which fails on any modification, and more practical than deep learning-based similarity because it's faster and doesn't require GPU resources
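The dHash variant named above is small enough to sketch in full. This is a minimal, dependency-free version that assumes the image has already been scaled to a 9x8 grayscale grid (real implementations use an image library for that step); the threshold value is an illustrative default, not the product's:

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash: one bit per horizontal neighbour comparison.

    `pixels` is a pre-scaled 9x8 grayscale grid (8 rows of 9 values),
    yielding a 64-bit fingerprint that survives compression and
    minor colour shifts.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_match(a: int, b: int, threshold: int = 10) -> bool:
    """Two images 'match' when their 64-bit hashes differ in few bits."""
    return hamming(a, b) <= threshold
```

Because a one-pixel tweak flips at most a couple of gradient bits, near-duplicates land within a small Hamming distance of the original, which is exactly what the configurable threshold trades off against false positives.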
training-dataset-provenance-reporting
Medium confidence: When a match is detected, generates a detailed report showing which dataset(s) contain the image, metadata about the dataset (size, creation date, model association), and links to source documentation or dataset repositories. The system aggregates metadata from multiple sources and formats it into a human-readable report that provides context about how the image entered the training pipeline.
Aggregates and formats provenance metadata from multiple training dataset sources into a structured report suitable for legal or research purposes, rather than just returning a binary match result
More actionable than raw dataset indices because it contextualizes matches with model associations and source documentation, and more comprehensive than individual model transparency reports
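The report-generation step can be sketched as a small formatter over normalized match records. The record fields and report layout here are assumptions about what such a report might contain:

```python
from dataclasses import dataclass

@dataclass
class Match:
    dataset: str      # e.g. "LAION-5B"
    snapshot: str     # which indexed snapshot contained the image
    model: str        # model(s) associated with the dataset
    source_url: str   # where the image was scraped from

def format_report(image_name: str, matches: list[Match]) -> str:
    """Render match records into a human-readable provenance report."""
    if not matches:
        return f"{image_name}: no matches in indexed datasets"
    lines = [f"{image_name}: found in {len(matches)} dataset(s)"]
    for m in matches:
        lines.append(
            f"  - {m.dataset} (snapshot {m.snapshot}), "
            f"associated model: {m.model}, source: {m.source_url}"
        )
    return "\n".join(lines)
```

For legal or research use, the key design choice is recording the snapshot identifier per match, so a claim can name exactly which dataset version contained the image.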
batch-image-dataset-scanning
Medium confidence: Accepts multiple images (via file upload, URL list, or API) and processes them in parallel or queued batches against the training dataset indices. The system likely implements job queuing, rate limiting, and asynchronous processing to handle multiple images without blocking, returning results as a consolidated report or per-image breakdown. This enables artists or platforms to audit large collections of images efficiently.
Implements batch processing with job queuing and asynchronous result delivery to handle multiple image scans efficiently, rather than requiring sequential single-image uploads
More scalable than manual per-image uploads for large portfolios, and more practical than building custom batch infrastructure for individual artists or small platforms
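The queued, bounded-concurrency batch flow described above can be sketched with a semaphore-gated worker pool. The scan function is a hypothetical stand-in for the real index query, and the concurrency limit plays the role of rate limiting:

```python
import asyncio

async def scan_one(image_id: str) -> dict:
    """Hypothetical single-image scan; a real system would hit the index."""
    await asyncio.sleep(0)  # stand-in for I/O-bound index lookup
    return {"image": image_id, "matched": image_id.endswith(".jpg")}

async def scan_batch(image_ids: list[str], concurrency: int = 4) -> list[dict]:
    """Run scans through a bounded pool, preserving input order in results."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(image_id: str) -> dict:
        async with sem:  # at most `concurrency` scans in flight at once
            return await scan_one(image_id)

    return await asyncio.gather(*(bounded(i) for i in image_ids))

results = asyncio.run(scan_batch(["a.jpg", "b.png"]))
```

A production version would persist the job queue and deliver results via callback or polling, but the ordering guarantee from `gather` is what makes a clean per-image breakdown easy to return.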
training-dataset-index-maintenance
Medium confidence: Periodically crawls, ingests, and updates indices of public training datasets (LAION snapshots, Stable Diffusion dataset releases, etc.) to keep the searchable corpus current. This likely involves automated pipelines that detect new dataset releases, download metadata, compute perceptual hashes for new images, and update the search indices. The system must handle versioning to track which dataset snapshot was used for each match.
Maintains versioned indices of multiple training dataset snapshots with automated update pipelines, enabling users to understand which dataset version was queried and track how training data evolves over time
More transparent than static indices because it tracks versions and update dates, and more comprehensive than relying on individual model documentation which may lag behind actual training data releases
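The versioning requirement above can be sketched as a registry that appends each ingested snapshot rather than overwriting the previous one. The snapshot identifiers and registry shape are illustrative assumptions:

```python
from datetime import date

# dataset name -> ordered list of ingested snapshot records
INDEX_VERSIONS: dict[str, list[dict]] = {}

def ingest_snapshot(dataset: str, snapshot_id: str, hashes: set[int]) -> None:
    """Append a new dataset snapshot, keeping earlier versions queryable."""
    INDEX_VERSIONS.setdefault(dataset, []).append({
        "snapshot": snapshot_id,
        "ingested": date.today().isoformat(),
        "hashes": hashes,
    })

def current_snapshot(dataset: str) -> str:
    """The snapshot a fresh query would run against (latest ingested)."""
    return INDEX_VERSIONS[dataset][-1]["snapshot"]

# Illustrative snapshot ids, not real release names:
ingest_snapshot("LAION", "laion-2b-en-2022-03", {0xA1, 0xB2})
ingest_snapshot("LAION", "laion-5b-2023-09", {0xA1, 0xB2, 0xC3})
```

Keeping old snapshots around is what lets a match report say "this image was in snapshot X, queried on date Y" instead of a claim that silently shifts as the index updates.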
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Have I Been Trained?, ranked by overlap. Discovered automatically through the match graph.
Qwen: Qwen3 VL 235B A22B Thinking
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....
Qwen: Qwen VL Max
Qwen VL Max is a visual understanding model with a 7,500-token context length. It excels in delivering optimal performance across a broad spectrum of complex tasks.
ShareGPT4V
1.2M image-text pairs with GPT-4V captions.
LLaVA-Instruct 150K
150K visual instruction examples for multimodal model training.
MS COCO (Common Objects in Context)
330K images with object detection, segmentation, and captions.
Best For
- ✓ artists and photographers concerned about unauthorized use in AI training
- ✓ legal teams investigating copyright violations in generative AI
- ✓ content creators wanting to audit their digital footprint across ML datasets
- ✓ artists wanting a one-stop verification tool across all major models
- ✓ legal/compliance teams needing comprehensive training data audits
- ✓ platforms building content moderation features around training data transparency
- ✓ artists verifying their work against training datasets with tolerance for compression artifacts
- ✓ copyright investigators needing to match images despite minor modifications
Known Limitations
- ⚠ Only detects images that were actually included in indexed training snapshots; cannot detect images used in private or proprietary training runs
- ⚠ Matching accuracy depends on image quality and whether the exact image or only similar variants were in the training data
- ⚠ Dataset indices are static snapshots and may not reflect real-time training data collection
- ⚠ Cannot distinguish between legitimate licensed use and unauthorized scraping
- ⚠ Coverage is limited to publicly documented or accessible training datasets; proprietary models with closed training data cannot be queried
- ⚠ Index freshness varies by dataset; some may be months or years old depending on update frequency
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Check if your image has been used to train popular AI art models.