February 16, 2025

Why Enterprise AI Discovery is Hard


Discovering AI use in the enterprise is a complex problem because AI adoption is multi-dimensional. AI is the fastest-growing enterprise technology we have ever seen, and it's everywhere. So where do you look for it? Everywhere. The distributed nature of AI-related activity, the sheer volume of activity and data, and the need for precise correlation across multiple sources make the problem harder still. Unlike traditional IT asset discovery, which relies on relatively standardized inventory tracking, AI discovery requires advanced techniques to profile assets, link entities, manage highly connected datasets, and assess signal fidelity. Below, we explore the key challenges that make AI discovery a hard problem to solve.

Siloed Data and Entity Linking

AI assets, including models, datasets, applications, and logs, are scattered across internal platforms, third-party repositories and cloud storage (e.g., Hugging Face, S3), and deployed AI workloads. Establishing a unified Generative AI posture requires robust entity linking mechanisms. However, the same asset often carries a different identifier in each system, which makes it hard to track lineage and maintain an accurate AI inventory.

Example:

An LLM is downloaded from Hugging Face, stored in an internal S3 bucket, processed through a data pipeline, and fine-tuned before deployment in AWS SageMaker. Tracking this lineage requires advanced entity linking across platforms to ensure visibility into its entire lifecycle.
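
One way to picture the linking step in the example above is as grouping records that share an identifier. The sketch below is a minimal illustration, assuming each system can report a content digest for the artifact it holds and that the pipeline records the digest of its input. The record fields, IDs, and digest values are hypothetical and do not reflect any real schema.

```python
from collections import defaultdict

# Hypothetical asset records as four different systems might report them.
# Field names, IDs, and digests are illustrative assumptions only.
records = [
    {"system": "huggingface", "id": "org/model-7b", "sha256": "abc123"},
    {"system": "s3", "id": "s3://ml-artifacts/base/model.bin", "sha256": "abc123"},
    {"system": "pipeline", "id": "job-42", "input_sha256": "abc123", "sha256": "def456"},
    {"system": "sagemaker", "id": "endpoint/support-bot", "sha256": "def456"},
]


def link_entities(records):
    """Group records that share any digest, using a small union-find."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # digest -> index of the first record that mentioned it
    for idx, rec in enumerate(records):
        for key in ("sha256", "input_sha256"):
            digest = rec.get(key)
            if digest is None:
                continue
            if digest in first_seen:
                union(first_seen[digest], idx)
            else:
                first_seen[digest] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(f'{rec["system"]}:{rec["id"]}')
    return sorted(groups.values())


lineage = link_entities(records)
for group in lineage:
    print(" -> ".join(group))
```

Here the fine-tuning job acts as the bridge: its input digest matches the downloaded model and its output digest matches the deployed one, so all four records fall into a single lineage group. In practice digests are often unavailable or change between steps, which is why linking usually needs several kinds of evidence rather than one key.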

Varied and Highly Connected Data

AI discovery involves correlating multiple data types, such as runtime logs, unstructured prompt exchanges, data flows, user activity, and framework-specific telemetry. These data sources have different update rates, schemas, and identifiers, making it challenging to build a continuously evolving and unified view of AI utilization.

Example:

  • An XDR tool detects access activity on an endpoint but provides only a minimal representation of the user. Later, SSO data links that activity to a full user identity, improving visibility into who is accessing AI.
  • A network log identifies AI model interactions via API calls, prompting a targeted Docker image scan to uncover embedded AI libraries and their security risks.
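
The first correlation above can be sketched as a join between endpoint events and SSO sessions. This is a minimal illustration, assuming both sources report a common host name and comparable timestamps. The record layouts, host names, and the `genai.example.com` destination are hypothetical; real XDR and SSO schemas differ by vendor.

```python
from datetime import datetime

# Hypothetical, simplified records; these are not a real vendor schema.
xdr_events = [
    {"host": "laptop-017", "local_user": "jdoe",
     "dest": "genai.example.com", "ts": datetime(2025, 2, 10, 9, 15)},
    {"host": "laptop-099", "local_user": "svc",
     "dest": "genai.example.com", "ts": datetime(2025, 2, 10, 9, 20)},
]
sso_sessions = [
    {"host": "laptop-017", "identity": "jane.doe@example.com",
     "start": datetime(2025, 2, 10, 8, 0), "end": datetime(2025, 2, 10, 17, 0)},
]


def enrich_with_identity(events, sessions):
    """Attach an SSO identity to each event whose host and time match a session."""
    by_host = {}
    for session in sessions:
        by_host.setdefault(session["host"], []).append(session)

    enriched = []
    for event in events:
        identity = None  # stays None when no session covers the event
        for session in by_host.get(event["host"], []):
            if session["start"] <= event["ts"] <= session["end"]:
                identity = session["identity"]
                break
        enriched.append({**event, "identity": identity})
    return enriched


enriched = enrich_with_identity(xdr_events, sso_sessions)
for event in enriched:
    print(event["host"], event["local_user"], "->", event["identity"])
```

Events with no matching session keep an identity of `None` rather than being dropped, so they can be resolved later when a slower-updating source catches up.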

Massive Data Volume

Enterprise AI discovery requires processing vast amounts of data from endpoints, SASE, cloud environments, authentication providers, and telemetry logs. A naive approach to comprehensive logging and analysis can quickly become cost-prohibitive. Efficient AI discovery solutions must dynamically filter and optimize data collection and quickly focus on relevant assets and activities.

Example:

A system initially drops all network flows until it detects a Gen AI interaction, then traces that interaction to a workload. Once the workload is identified, the system begins collecting data on the workload VM and its immediate network neighbors to assess AI-related activity without overwhelming the system.
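
The trigger-then-collect behavior in this example can be sketched as a small stateful filter. This is a simplified illustration, assuming flows arrive as source/destination pairs and that a neighbor map is available from some other inventory. The class name, host names, and the `genai.example.com` endpoint are hypothetical.

```python
class AdaptiveCollector:
    """Drop flows by default; start collecting once a Gen AI interaction is seen."""

    def __init__(self, genai_hosts, neighbors):
        self.genai_hosts = set(genai_hosts)  # assumed list of known Gen AI endpoints
        self.neighbors = neighbors           # assumed map: workload -> adjacent workloads
        self.watched = set()                 # workloads currently under collection
        self.collected = []

    def observe(self, flow):
        """Return True if the flow was kept, False if it was dropped."""
        src, dst = flow["src"], flow["dst"]
        if dst in self.genai_hosts:
            # Trigger: trace the interaction back to its workload and
            # widen collection to that workload's immediate neighbors.
            self.watched.add(src)
            self.watched.update(self.neighbors.get(src, ()))
        if src in self.watched or dst in self.watched:
            self.collected.append(flow)
            return True
        return False


collector = AdaptiveCollector(
    genai_hosts={"genai.example.com"},
    neighbors={"vm-a": {"vm-b"}},
)
flows = [
    {"src": "vm-a", "dst": "db-1"},               # before any trigger
    {"src": "vm-a", "dst": "genai.example.com"},  # Gen AI interaction
    {"src": "vm-b", "dst": "db-1"},               # neighbor of vm-a
    {"src": "vm-c", "dst": "db-1"},               # unrelated workload
]
decisions = [collector.observe(flow) for flow in flows]
print(decisions)
```

Only flows touching the triggering workload or its neighbors are retained, so collection cost grows with the amount of AI activity rather than with total traffic. A production design would also need to expire watched workloads and decide how much pre-trigger context to buffer, which this sketch leaves out.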

Signal Fidelity & Confidence Levels

AI-related signals range from high-fidelity indicators (e.g., confirmed AI workloads in a cloud environment) to lower-fidelity signals (e.g., network traffic suggesting AI use). The system must dynamically infer confidence levels, refine its understanding over time, and prioritize high-confidence insights for security and compliance enforcement.

Example:

A virtual machine may lack explicit AI indicators but still exhibit telltale signs such as high GPU usage, AI-related API call patterns, and unusual resource consumption. By correlating multiple weak signals, the system can infer that the VM is likely hosting an ML model and adjust its monitoring strategy accordingly.
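
One simple way to combine weak signals like these is a noisy-OR rule, which treats each signal as independent evidence. The sketch below is an illustration under that assumption, not a description of any particular product's scoring. The signal names, weights, and thresholds are hypothetical values chosen for the example.

```python
# Hypothetical per-signal confidences: the assumed probability that a VM
# hosts an ML model given that this signal alone is present.
SIGNAL_WEIGHTS = {
    "high_gpu_usage": 0.4,
    "ai_api_call_pattern": 0.5,
    "unusual_resource_use": 0.2,
}


def combined_confidence(observed_signals):
    """Noisy-OR: 1 minus the product of (1 - weight) over observed signals."""
    miss_probability = 1.0
    for name in observed_signals:
        miss_probability *= 1.0 - SIGNAL_WEIGHTS[name]
    return 1.0 - miss_probability


def monitoring_tier(confidence):
    """Map a confidence score to a monitoring strategy (thresholds are assumed)."""
    if confidence >= 0.7:
        return "deep"
    if confidence >= 0.4:
        return "sampled"
    return "baseline"


all_three = combined_confidence(SIGNAL_WEIGHTS)
gpu_only = combined_confidence(["high_gpu_usage"])
print(round(all_three, 2), monitoring_tier(all_three))
print(round(gpu_only, 2), monitoring_tier(gpu_only))
```

No single signal here crosses the highest threshold on its own, but the three together do, which mirrors the idea of escalating monitoring only after weak signals corroborate each other. The independence assumption is a simplification, since signals such as GPU usage and resource consumption are likely correlated.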

Conclusion

Good AI governance starts with comprehensive discovery. AI discovery in the enterprise is far more challenging than traditional IT asset discovery due to the need for entity linking, diverse and dynamic data sources, massive data volumes, and variable signal fidelity. Addressing these challenges requires a purpose-built approach that leverages dynamic inference, optimized data collection, and real-time correlation to build a comprehensive and cost-effective AI posture for enterprises. At Singulr, we are excited to solve this problem.
