Semantic Asset Classification and Data Cleanser Enable Safe AI Deployment by Automatically Identifying and Protecting Sensitive Documents

Metomic, a leading data security platform, today announced two AI-powered solutions designed to help enterprises deploy artificial intelligence tools safely while protecting sensitive data. The new products, Semantic Asset Classification and Data Cleanser, address security vulnerabilities that emerge when organizations integrate AI agents and large language models into their workflows.

Addressing the AI Security Challenge

As enterprises increasingly adopt AI tools such as Gemini Gems, Dust, and Microsoft Copilot, they risk inadvertently exposing sensitive information. Metomic's research shows that AI agents given unredacted datasets can readily extract confidential data, including employee emails, financial documents, and intellectual property.

"The magic trick we demonstrated shows the core problem every company faces with AI deployment," said Ben van Enckevort of Metomic. "When you ask an AI tool to 'give me all the emails referenced in this dataset,' it will comply without hesitation, exposing sensitive information that should never be accessible."

Semantic Asset Classification: Intelligent Document Labeling

The first solution automatically identifies and labels entire documents based on their content, supporting categories such as board documents, financial data, HR data, and intellectual property. The technology combines keyword detection with AI model validation, using multiple frontier models to confirm each classification with high confidence.

"Rather than looking for individual detections within documents, we examine the document as a whole and attach appropriate labels," explained Dane Stevens, who led the development. "This gives organizations unprecedented visibility into what types of documents they have, where they're shared, and who can access them."
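Metomic has not published how its classifier is built. The following is a minimal Python sketch of the two-stage idea described above, under stated assumptions: the keyword lists, the `min_agreement` threshold, and the `Validator` type are hypothetical, and each "frontier model" is reduced to an opaque function that says whether it agrees with a candidate label. A real system would replace those stubs with calls to actual model APIs.

```python
from collections.abc import Callable, Sequence

# Hypothetical keyword lists; the real category signals are not published.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "Board documents": ("board meeting", "board minutes", "resolution"),
    "Financial data": ("revenue", "balance sheet", "invoice"),
    "HR data": ("salary", "performance review", "payroll"),
    "Intellectual property": ("patent", "trade secret", "proprietary"),
}

# Hypothetical stand-in for one frontier model: given the whole document and
# a candidate label, return True if the model agrees the label applies.
Validator = Callable[[str, str], bool]


def candidate_labels(document: str) -> list[str]:
    """Stage 1: a cheap keyword pass over the document as a whole."""
    text = document.lower()
    return [label for label, words in KEYWORDS.items()
            if any(word in text for word in words)]


def classify(document: str, validators: Sequence[Validator],
             min_agreement: float = 1.0) -> list[str]:
    """Stage 2: keep a candidate only if enough validators confirm it."""
    confirmed = []
    for label in candidate_labels(document):
        votes = sum(1 for validate in validators if validate(document, label))
        # With no validators, nothing can be confirmed.
        if validators and votes / len(validators) >= min_agreement:
            confirmed.append(label)
    return confirmed


if __name__ == "__main__":
    def always_agree(doc: str, label: str) -> bool:
        return True

    def finance_only(doc: str, label: str) -> bool:
        return label == "Financial data"

    doc = "Q3 board minutes: revenue grew 12%; see attached balance sheet."
    print(classify(doc, [always_agree, finance_only]))  # ['Financial data']
```

Here the keyword pass proposes both "Board documents" and "Financial data", but with the default `min_agreement=1.0` only the label every validator confirms survives. Requiring unanimous agreement trades recall for precision; lowering the threshold (for example to `0.5`) keeps both labels.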

Data Cleanser: Sanitizing Data for Safe AI Use

The Data Cleanser redacts sensitive information before data is fed to AI tools. It processes data from multiple sources, including Google Drive and Slack channels, automatically removing email addresses, phone numbers, and other personally identifiable information. In demonstrations, the tool sanitized more than 7,000 messages from Metomic's internal development support channel while preserving the contextual information needed for AI training.

"We can now take sensitive data sources like internal Slack channels and safely prepare them for AI tools," said Sandro Dolidze. "The Data Cleanser redacts all sensitive information while maintaining the utility of the data for AI training and analysis."
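The release does not describe how the Data Cleanser detects PII. As a rough illustration of "redact while maintaining utility", here is a minimal Python sketch that assumes simple regular expressions for the two PII types named above (email addresses and phone numbers). It replaces each value with a stable placeholder so that repeated mentions of the same person stay linked. The patterns, the `Redactor` class, and `clean_messages` are hypothetical and far less thorough than a production detector.

```python
import re
from dataclasses import dataclass, field

# Simplified, hypothetical detector for email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone-like runs of 9 to 15 digits, optionally led by "+" or "(", with up to
# two separator characters (space, dot, dash, parentheses) between digits.
PHONE_RE = re.compile(r"(?<!\w)\+?\(?\d(?:[\s().-]{0,2}\d){8,14}(?!\w)")


@dataclass
class Redactor:
    """Swaps PII for stable placeholders such as [EMAIL_1] or [PHONE_2].

    The same value always maps to the same placeholder, so an AI tool can
    still tell that two messages involve the same person without seeing
    who it is. `mapping` holds the original values and must stay on the
    trusted side; it should never be sent to the AI tool.
    """
    mapping: dict[str, str] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def _placeholder(self, kind: str, value: str) -> str:
        if value not in self.mapping:
            self.counts[kind] = self.counts.get(kind, 0) + 1
            self.mapping[value] = f"[{kind}_{self.counts[kind]}]"
        return self.mapping[value]

    def redact(self, text: str) -> str:
        # Emails first, so digits inside an address are never redacted as a phone.
        # Emails are lowercased before lookup so case variants share a placeholder.
        text = EMAIL_RE.sub(
            lambda m: self._placeholder("EMAIL", m.group().lower()), text)
        return PHONE_RE.sub(lambda m: self._placeholder("PHONE", m.group()), text)


def clean_messages(messages: list[str]) -> list[str]:
    """Redact a batch (e.g. an exported channel) with one shared mapping."""
    redactor = Redactor()
    return [redactor.redact(message) for message in messages]


if __name__ == "__main__":
    for line in clean_messages([
        "Ping alice@example.com or call +44 20 7946 0958.",
        "ALICE@example.com fixed the deploy on 2025-07-01.",
    ]):
        print(line)
```

Run as a script, this prints `Ping [EMAIL_1] or call [PHONE_1].` followed by `[EMAIL_1] fixed the deploy on 2025-07-01.`. The date survives because it has too few digits to look like a phone number, and both mentions of Alice collapse to the same placeholder. Names, addresses, and free-text identifiers would need additional detectors.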

Availability and Future Development

Both solutions are being released in beta to select enterprise customers, with general availability expected in July 2025. Future roadmap items include custom classification labels, batch processing capabilities, and advanced truncation strategies. The platform already supports integration with Google Workspace, Microsoft 365, Zendesk, and various cloud storage platforms.

Early access can be requested on the Metomic website.

