Image deduplication
for AI and social platforms

We work with large pretraining and social network teams to flag near-duplicate images across massive datasets and real-time streams, and to report whether an image is AI-generated.

In the deepfake age, images can be instantly copied, edited, and weaponized. Provenance is how platforms verify where content came from and what changed along the way—reducing misinformation, protecting creators, and enforcing policy at scale. Proteus brings practical provenance and deduplication to production without exposing user data.

Why Proteus?

Fast, private perceptual hashing with robust matching — deploy via API, batch, or on‑prem.

Lightning Fast

Process millions of images per hour with our optimized models. Real-time deduplication for streaming workloads, optimized on both CPU and GPU.

Adversarially Robust

Built to withstand adversarial attacks on the hash, so provenance holds as images move across social media platforms.

State of the Art

12% higher bit accuracy than state-of-the-art closed-source image matching systems (Apple's NeuralHash). Resistant to filters, compression, crops, and other transformations.

Try It Yourself

Upload an original image and its edited version (JPEG or PNG) to compare their perceptual hashes, computed with our DinoHash algorithm and with earlier algorithms. The closer the hashes match, the more similar the two images are.
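
As a rough illustration of what "how closely the hashes match" can mean, here is a minimal sketch that scores two hashes by the fraction of bits they share. It assumes the hashes are equal-length byte strings; the function name `bit_accuracy` and the 96-bit example values are hypothetical and are not taken from the Proteus or DinoHash code.

```python
def bit_accuracy(hash_a: bytes, hash_b: bytes) -> float:
    """Fraction of bit positions on which two equal-length hashes agree."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    if not hash_a:
        raise ValueError("hashes must be non-empty")
    # XOR leaves a 1 bit wherever the two hashes differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(hash_a, hash_b))
    return 1.0 - differing / (8 * len(hash_a))


# Hypothetical 12-byte (96-bit) hashes; real DinoHash output may differ.
original = bytes.fromhex("a3f1c29e5b7d004488ff12ab")
edited = bytes.fromhex("a3f1c29e5b7d004488ff12a9")  # one bit flipped

print(f"{bit_accuracy(original, edited):.4f}")  # prints 0.9896
```

A score near 1.0 suggests the edited image is a near-duplicate of the original, while unrelated images would be expected to score close to 0.5 if their hash bits are roughly independent.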

Research

Proteus is an open-source platform for AI content provenance, leveraging perceptual hashing, digital signatures, and MPC/FHE to create incorruptible, private, and robust watermarks. The Proteus paper was presented at the CODEML Workshop at ICML 2025. The DinoHash perceptual hashing algorithm can be used independently of the Proteus system.

Key Innovations

  • DinoHash: A perceptual hashing algorithm robust to common image transformations such as filters, compression, and crops. It achieves 12% higher bit accuracy than state-of-the-art methods.
  • Provenance Verification: Perceptual hash values are signed by the content generator, establishing provenance.
  • Privacy-Preserving Queries: Provenance lookups use multi-party fully homomorphic encryption (FHE), which keeps both user queries and registry data private, with a fallback to MPC if the database is too large.
  • Failsafe Detection: A backup classifier identifies synthetic images that are not in the registry with state-of-the-art accuracy, showing 25% better classification accuracy on images from real-world AI generators.
  • Adversarial Defense: DinoHash is adversarially trained against both hash collision and hash evasion attacks. This limits the attack surface: an attacker cannot change an image's provenance without visibly changing the image.
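
To make the registry idea concrete, here is a minimal plaintext sketch of a near-duplicate lookup: a query hash is matched to the closest registered hash within a Hamming-distance threshold. It deliberately leaves out the signing, FHE, and MPC layers described above, and it does not reflect the actual Proteus API. The class name `HashRegistry`, the `max_distance` threshold of 8 bits, and the example hash values are all assumptions made for illustration.

```python
from typing import Optional


class HashRegistry:
    """Plaintext stand-in for a provenance registry (hypothetical, not the Proteus API)."""

    def __init__(self, max_distance: int = 8) -> None:
        # Largest Hamming distance (in bits) still treated as a match; an assumed value.
        self.max_distance = max_distance
        self._entries: dict[int, str] = {}  # hash value -> source label

    def register(self, image_hash: int, source: str) -> None:
        self._entries[image_hash] = source

    def lookup(self, image_hash: int) -> Optional[str]:
        """Return the source of the closest registered hash within the threshold, else None."""
        best_source, best_distance = None, self.max_distance + 1
        # Linear scan for clarity; a production system would use an index.
        for known_hash, source in self._entries.items():
            distance = bin(known_hash ^ image_hash).count("1")
            if distance < best_distance:
                best_source, best_distance = source, distance
        return best_source


registry = HashRegistry(max_distance=8)
registry.register(0xA3F1C29E5B7D004488FF12AB, "generator-A")

print(registry.lookup(0xA3F1C29E5B7D004488FF12A9))  # 1 bit away: generator-A
print(registry.lookup(0x5C0E3D61A482FFBB7700ED54))  # every bit differs: None
```

The threshold trades false matches against missed matches: a looser value tolerates heavier edits but raises the chance of linking unrelated images.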

Ready to get started?

Join large pretraining and image verification teams using Proteus for image deduplication.

View Pricing