Model provenance graph
A model provenance graph is a visual and data-driven record that traces the complete origin and lineage of an AI model: where its training data came from, what base model it was derived from, what fine-tuning or modifications were applied, and how it has changed over time. It is essentially the supply chain map for an AI model.

Model provenance matters because organizations deploying AI need to know what they're running. A model fine-tuned on biased data carries that bias into production. A model derived from a base model with restrictive licensing might create legal exposure. A model modified by multiple teams without documentation becomes impossible to audit. Provenance provides the traceability needed to identify and manage these risks.

A model provenance graph typically captures several layers of information: the original training dataset and its sources, the base or foundation model and its version, any fine-tuning datasets and processes applied, the team or vendor responsible for each modification, evaluation results at each stage, and the deployment history showing where the model has been used. The graph format is valuable because model lineage isn't linear: a single base model might branch into multiple fine-tuned variants, each with its own downstream derivatives.

For enterprises, model provenance is essential to governance and compliance. Regulators and auditors want to see that organizations can trace how a model was built and what data influenced it. Frameworks like the EU AI Act expect documentation of model lineage, and internal risk teams need provenance data to assess whether a model meets organizational standards before it enters production.
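
The layers of information and the branching lineage described above can be sketched as a small directed graph. The following is a minimal illustration in Python, not a reference implementation: the class names (Node, ProvenanceGraph), the relation labels (trained_on, derived_from, fine_tuned_on), and every model, dataset, team, and license value are hypothetical and chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One artifact in the graph: a dataset or a model version (hypothetical schema)."""
    name: str
    kind: str                                        # "dataset" or "model"
    owner: str = "unknown"                           # team or vendor responsible
    license: str = "unknown"
    evals: dict = field(default_factory=dict)        # evaluation results at this stage
    deployments: list = field(default_factory=list)  # where the artifact has been used


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}    # name -> Node
        self.parents = {}  # name -> list of (parent_name, relation)

    def add(self, node, parents=()):
        """Register a node and its direct inputs as (parent_name, relation) pairs."""
        for parent_name, _ in parents:
            if parent_name not in self.nodes:
                raise KeyError(f"unknown parent: {parent_name}")
        self.nodes[node.name] = node
        self.parents[node.name] = list(parents)

    def lineage(self, name):
        """Return every upstream artifact of `name`, nearest first (breadth-first)."""
        seen, order, queue = set(), [], [name]
        while queue:
            current = queue.pop(0)
            for parent_name, _ in self.parents[current]:
                if parent_name not in seen:
                    seen.add(parent_name)
                    order.append(parent_name)
                    queue.append(parent_name)
        return order

    def descendants(self, name):
        """Return every artifact derived, directly or indirectly, from `name`."""
        return sorted(n for n in self.nodes if n != name and name in self.lineage(n))


# Example graph: one base model branching into two fine-tuned variants.
# All names and values below are made up for illustration.
g = ProvenanceGraph()
g.add(Node("web-corpus", "dataset", license="mixed"))
g.add(Node("base-v1", "model", owner="vendor-a", license="research-only"),
      parents=[("web-corpus", "trained_on")])
g.add(Node("support-tickets", "dataset", owner="support-team"))
g.add(Node("support-bot-v1", "model", owner="support-team", deployments=["helpdesk"]),
      parents=[("base-v1", "derived_from"), ("support-tickets", "fine_tuned_on")])
g.add(Node("legal-summarizer-v1", "model", owner="legal-team"),
      parents=[("base-v1", "derived_from")])

# Audit question: does anything upstream of the support bot carry a restrictive license?
restricted = [n for n in g.lineage("support-bot-v1")
              if g.nodes[n].license == "research-only"]
print(restricted)
print(g.descendants("base-v1"))
```

Storing only parent links keeps each record local to the team that produced it, while lineage() and descendants() answer the two questions an auditor is likely to ask: what influenced this model, and what else is affected if an upstream artifact turns out to be a problem. A production system would more likely persist this in a metadata store or graph database than in memory.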