Operations and Deployment · Intermediate · Draft · pending human review

LLMOps

The operational discipline for keeping generative AI systems reliable, safe, and cost-controlled after they go live.

LLMOps — large language model operations — is the set of practices for running generative AI systems reliably after they go live. Where traditional software versioning tracks code changes, LLMOps tracks what else can shift: the prompts that shape model behavior, the retrieval sources it draws from, the model itself when a provider updates it, and the guardrails meant to prevent unsafe outputs. A generative AI system has more moving parts than most deployed software, and many of those parts are outside the organization's direct control.
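One way to make "tracking what else can shift" concrete is to record the four components named above in a single release manifest and fingerprint it, so any change to a prompt, retrieval source, model, or guardrail shows up as a different hash. The sketch below is illustrative only: `ReleaseManifest`, its field names, and the sample identifiers are hypothetical and not part of any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ReleaseManifest:
    """Hypothetical record of everything that shapes the system's behavior."""
    prompt_template: str      # the prompt text shipped with this release
    retrieval_snapshot: str   # identifier of the indexed document set
    model_id: str             # model identifier as pinned by the team
    guardrail_config: str     # identifier of the safety filter configuration

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical content always hashes the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def changed_fields(old: ReleaseManifest, new: ReleaseManifest) -> list[str]:
    """Return the names of the fields that differ between two manifests."""
    old_d, new_d = asdict(old), asdict(new)
    return [name for name in old_d if old_d[name] != new_d[name]]


# Placeholder values, assumed for illustration.
baseline = ReleaseManifest(
    prompt_template="Answer using only the provided context.",
    retrieval_snapshot="docs-snapshot-a",
    model_id="example-model-v1",
    guardrail_config="filters-v3",
)

# Simulate a model update with everything else held constant.
candidate = replace(baseline, model_id="example-model-v2")

if candidate.fingerprint() != baseline.fingerprint():
    print("Behavior-relevant change in:", changed_fields(baseline, candidate))
```

Storing the fingerprint alongside logs and evaluation results lets a team tie a later change in behavior back to the component that moved. It only covers changes the team can observe, though: a provider that updates a model behind an unchanged identifier would not alter the hash.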

Launching a generative AI system is not the hard part; keeping it working is. Prompts drift, retrieval quality degrades as documents go stale, model providers push updates that change behavior without notice, and costs climb faster than expected as usage grows. Organizations that treat a generative AI launch as a one-time deployment rather than an ongoing service tend to discover these problems through failures: a customer-facing assistant presenting outdated information as current, a sensitive use case suddenly producing unsafe outputs after a model update, or infrastructure costs that were never budgeted. LLMOps is what prevents each of those discoveries from becoming a crisis.
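The cost point can be illustrated with simple arithmetic. Spend under token-based pricing is roughly traffic multiplied by tokens per request, so growth in both at once compounds. The sketch below assumes per-token pricing; the function name, the prices, and the traffic figures are hypothetical placeholders, not any provider's actual rates.

```python
def monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend under simple per-token pricing (an assumption)."""
    tokens_in = requests_per_day * input_tokens * days
    tokens_out = requests_per_day * output_tokens * days
    total = tokens_in * input_price_per_million + tokens_out * output_price_per_million
    return total / 1_000_000


# Hypothetical prices per million tokens, chosen only to keep the math readable.
IN_PRICE, OUT_PRICE = 1.0, 2.0

# Launch: modest traffic, short retrieved context.
launch = monthly_cost(1_000, 2_000, 500, IN_PRICE, OUT_PRICE)

# Later: 20x the traffic at the same prompt size.
more_traffic = monthly_cost(20_000, 2_000, 500, IN_PRICE, OUT_PRICE)

# Later still: same traffic, but retrieved context has tripled.
bigger_prompts = monthly_cost(20_000, 6_000, 500, IN_PRICE, OUT_PRICE)

print(launch, more_traffic, bigger_prompts)  # 90.0 1800.0 4200.0
```

Under these assumed figures, tripling the retrieved context on top of the traffic growth more than doubles the bill again, which is why per-request token counts are worth monitoring alongside request volume.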

Related concepts

Generative AI
