CTO Abhijit Sharma Explains Why AI Discovery Is Hard


Discovering the use of AI in the enterprise is a complex problem because AI adoption is multi-dimensional. AI is the fastest-growing enterprise technology we have ever seen, and it's everywhere. So where do you look for it? The answer: everywhere. The distributed nature of AI-related activity, the sheer scale of that activity and its data volume, and the need for precise correlation across multiple sources compound the difficulty. Unlike traditional IT asset discovery, which relies on relatively standardized inventory tracking, AI discovery requires advanced techniques to profile assets, link entities across systems, manage highly connected datasets, and assess signal fidelity. Below, we explore the key challenges that make AI discovery hard to solve.

Siloed Data and Entity Linking

AI assets, including models, datasets, applications, and logs, are scattered across internal platforms, third-party repositories and storage (e.g., Hugging Face, Amazon S3), and deployed AI workloads. Establishing a unified view of an organization's generative AI posture requires robust entity linking. However, the same asset often carries different identifiers in different systems, which makes it hard to track lineage and maintain an accurate AI inventory.

Example:

An LLM is downloaded from Hugging Face, stored in an internal S3 bucket, processed through a data pipeline, and fine-tuned before deployment in Amazon SageMaker. Tracking this lineage requires linking the model's records across every platform it passes through, so that it stays visible throughout its lifecycle.
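To make the linking step more concrete, here is a minimal Python sketch. It assumes each system can report its own identifier, an optional SHA-256 digest of the model weights, and an optional parent pointer; the record fields, identifiers, and digest values are all hypothetical. Records that share a digest are merged as the same artifact, and parent pointers carry lineage across the fine-tuning step, where the weights (and therefore the digest) change. A union-find structure groups everything connected into one lineage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AssetRecord:
    source: str                      # system that reported the asset (hypothetical labels)
    asset_id: str                    # identifier native to that system
    weights_sha256: Optional[str]    # content digest of the weights, if the system exposes one
    parent_id: Optional[str] = None  # explicit lineage pointer, if the system records one


class UnionFind:
    """Disjoint-set structure that merges records belonging to the same lineage."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_assets(records: List[AssetRecord]) -> List[List[str]]:
    uf = UnionFind()
    first_seen_by_digest: Dict[str, str] = {}
    for r in records:
        uf.find(r.asset_id)  # register every record, even one that links to nothing
        # Identical weights stored under different names are the same artifact.
        if r.weights_sha256:
            if r.weights_sha256 in first_seen_by_digest:
                uf.union(first_seen_by_digest[r.weights_sha256], r.asset_id)
            else:
                first_seen_by_digest[r.weights_sha256] = r.asset_id
        # Fine-tuning changes the weights, so lineage must come from parent pointers.
        if r.parent_id:
            uf.union(r.parent_id, r.asset_id)
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        groups[uf.find(r.asset_id)].append(r.asset_id)
    return list(groups.values())


# Hypothetical records mirroring the Hugging Face -> S3 -> pipeline -> SageMaker example.
records = [
    AssetRecord("huggingface", "hf:example-org/example-llm", "sha-aaa"),
    AssetRecord("s3", "s3://ml-artifacts/example-llm/model.safetensors", "sha-aaa"),
    AssetRecord("pipeline", "pipeline:finetune-run-42", None,
                parent_id="s3://ml-artifacts/example-llm/model.safetensors"),
    AssetRecord("sagemaker", "sagemaker:example-llm-ft", "sha-bbb",
                parent_id="pipeline:finetune-run-42"),
    AssetRecord("s3", "s3://ml-artifacts/unrelated/model.bin", "sha-ccc"),
]
linked = link_assets(records)
for group in linked:
    print(group)
```

Content digests alone would miss the fine-tuned model, and parent pointers alone would miss copies made outside any tracked pipeline; combining both is one reasonable design choice, not the only one.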

Varied and Highly Connected Data

AI discovery involves correlating multiple data types, such as runtime logs, unstructured prompt exchanges, data flows, user activity, and framework-specific telemetry. These data sources have different update rates, schemas, and identifiers, making it challenging to build a continuously evolving and unified view of AI utilization.

Example:

An XDR tool detects access activity on an endpoint but provides only a minimal representation of the user. Later, the SSO provider links that user to a corporate identity and to AI-related actions, enriching visibility into who is accessing AI.
A network log identifies AI model interactions via API calls, prompting a targeted Docker image scan to uncover embedded AI libraries and their security risks.
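The sketch below illustrates the kind of incremental correlation these examples describe. It assumes events from the XDR, SSO, and network sources have already been normalized to a shared host key and a timestamp; the event fields, the host name, and the short AI_API_HOSTS list are illustrative assumptions, not a real product schema. Each event enriches a per-host view, and an AI API call that originates in a container queues that image for a targeted scan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Illustrative list of generative AI API hostnames (assumption); a real
# deployment would maintain a much larger, curated catalog.
AI_API_HOSTS = {"api.openai.com", "api.anthropic.com"}


@dataclass
class HostView:
    host: str
    local_user: Optional[str] = None          # minimal identity reported by the XDR
    corporate_identity: Optional[str] = None  # enriched identity from SSO
    ai_destinations: Set[str] = field(default_factory=set)
    images_to_scan: List[str] = field(default_factory=list)


def correlate(events: List[dict]) -> Dict[str, HostView]:
    views: Dict[str, HostView] = {}
    # Sources report at different rates, so replay events in timestamp order.
    for ev in sorted(events, key=lambda e: e["ts"]):
        view = views.setdefault(ev["host"], HostView(host=ev["host"]))
        if ev["source"] == "xdr":
            view.local_user = ev["local_user"]
        elif ev["source"] == "sso":
            view.corporate_identity = ev["identity"]
        elif ev["source"] == "netflow" and ev["dest"] in AI_API_HOSTS:
            view.ai_destinations.add(ev["dest"])
            # An AI API call from a container triggers a targeted image scan.
            image = ev.get("container_image")
            if image and image not in view.images_to_scan:
                view.images_to_scan.append(image)
    return views


# Hypothetical events, deliberately out of order to mimic late-arriving data.
events = [
    {"ts": 3, "source": "netflow", "host": "dev-vm-03",
     "dest": "api.openai.com", "container_image": "registry.local/chatbot:1.4"},
    {"ts": 1, "source": "xdr", "host": "dev-vm-03", "local_user": "jdoe"},
    {"ts": 2, "source": "sso", "host": "dev-vm-03", "identity": "jane.doe@example.com"},
    {"ts": 4, "source": "netflow", "host": "dev-vm-03", "dest": "intranet.example.com"},
]
view = correlate(events)["dev-vm-03"]
print(view.corporate_identity, view.ai_destinations, view.images_to_scan)
```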

Massive Data Volume

Enterprise AI discovery requires processing vast amounts of data from endpoints, SASE, cloud environments, authentication providers, and telemetry logs. A naive approach to comprehensive logging and analysis can quickly become cost-prohibitive. Efficient AI discovery solutions must dynamically filter and optimize data collection and quickly focus on relevant assets and activities.

Example:

A system initially drops all network flows until it detects a generative AI interaction, which it then traces to a workload. Once the workload is identified, the system begins collecting data on the workload VM and its immediate network neighbors, assessing AI-related activity without overwhelming the collection pipeline.
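A minimal sketch of this adaptive collection policy follows. The Flow and AdaptiveCollector names, the GENAI_DESTINATIONS set, and the availability of a topology map listing each workload's immediate neighbors are all hypothetical. Flows are dropped by default; once a flow to a generative AI destination is traced to a workload, that workload and its neighbors join a watch list and their flows are retained.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Illustrative set of generative AI destinations (assumption).
GENAI_DESTINATIONS = {"api.openai.com", "api.anthropic.com"}


@dataclass(frozen=True)
class Flow:
    src: str  # source workload (VM) identifier
    dst: str  # destination host or workload


class AdaptiveCollector:
    def __init__(self, topology: Dict[str, Set[str]]) -> None:
        self.topology = topology       # workload -> immediate network neighbors (assumed known)
        self.watched: Set[str] = set()
        self.kept: List[Flow] = []
        self.dropped = 0

    def ingest(self, flow: Flow) -> None:
        if flow.dst in GENAI_DESTINATIONS and flow.src not in self.watched:
            # Gen AI interaction traced to a workload: widen the collection scope.
            self.watched.add(flow.src)
            self.watched.update(self.topology.get(flow.src, set()))
        if flow.src in self.watched or flow.dst in self.watched:
            self.kept.append(flow)
        else:
            self.dropped += 1  # default: discard flows unrelated to AI activity


collector = AdaptiveCollector(topology={"vm-a": {"vm-b"}, "vm-c": {"vm-d"}})
for f in [Flow("vm-a", "vm-b"), Flow("vm-c", "cdn.example.com"),
          Flow("vm-a", "api.openai.com"), Flow("vm-b", "db-1"),
          Flow("vm-c", "cdn.example.com")]:
    collector.ingest(f)
print(len(collector.kept), collector.dropped, sorted(collector.watched))
```

The trade-off shows in the usage: the vm-a to vm-b flow that arrived before the trigger is gone. A design that needs that history might keep a short rolling buffer instead of dropping flows outright.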

Signal Fidelity & Confidence Levels

AI-related signals range from high-fidelity indicators (e.g., confirmed AI workloads in a cloud environment) to lower-fidelity signals (e.g., network traffic suggesting AI use). The system must dynamically infer confidence levels, refine its understanding over time, and prioritize high-confidence insights for security and compliance enforcement.

Example:

A virtual machine may lack explicit AI indicators yet show telltale signs such as high GPU usage, AI-related API call patterns, and unusual resource consumption. By correlating these weak signals, the system can infer that the VM is hosting an ML model and adjust its monitoring strategy accordingly.
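One common way to combine weak, roughly independent signals into a single confidence score is a noisy-OR: if signal i alone implies AI hosting with probability w_i, the combined confidence is 1 - ∏(1 - w_i). The sketch below uses that technique purely as an illustration; the signal names, weights, and tier thresholds are hypothetical, and this is not presented as Singulr's actual scoring model.

```python
from typing import Dict, Iterable

# Hypothetical per-signal reliabilities: the probability that the signal
# alone indicates the VM is hosting an ML model.
SIGNAL_WEIGHTS: Dict[str, float] = {
    "high_gpu_utilization": 0.4,
    "ai_api_call_pattern": 0.5,
    "unusual_resource_consumption": 0.2,
    "confirmed_ai_workload": 0.95,
}


def noisy_or(observed: Iterable[str]) -> float:
    """Combine independent signals: P(AI) = 1 - prod(1 - w_i)."""
    miss = 1.0  # probability that every observed signal is a false alarm
    for name in observed:
        miss *= 1.0 - SIGNAL_WEIGHTS[name]
    return 1.0 - miss


def monitoring_tier(confidence: float) -> str:
    # Hypothetical thresholds mapping confidence to a monitoring strategy.
    if confidence >= 0.8:
        return "enforce"
    if confidence >= 0.5:
        return "deep_monitoring"
    return "baseline"


for observed in (["high_gpu_utilization"],
                 ["high_gpu_utilization", "ai_api_call_pattern",
                  "unusual_resource_consumption"],
                 ["confirmed_ai_workload"]):
    c = noisy_or(observed)
    print(observed, round(c, 2), monitoring_tier(c))
```

No single weak signal crosses the 0.5 threshold here, but the three together reach about 0.76. In practice the weights would be refined as signals are confirmed or refuted, and noisy-OR overstates confidence when signals are correlated (high GPU usage and unusual resource consumption often are), so it is a starting point rather than a final model.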

Conclusion

Good AI governance starts with comprehensive discovery. AI discovery in the enterprise is far more challenging than traditional IT asset discovery due to the need for entity linking, diverse and dynamic data sources, massive data volumes, and variable signal fidelity. Addressing these challenges requires a purpose-built approach that leverages dynamic inference, optimized data collection, and real-time correlation to build a comprehensive and cost-effective AI posture for enterprises. At Singulr, we are excited to solve this problem.

Additional Resources

Look for our next blog from CEO Shiv Agarwal on how to vet the risk of the three kinds of AI you might discover.
