Learn how to containerize and deploy LangChain applications in production. Best practices for scaling, monitoring, and maintaining AI-powered services.

Building Production-Ready AI Applications with LangChain and Docker

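Deploying AI applications to production requires careful consideration of scalability, reliability, and maintainability. This guide covers containerizing a LangChain service with Docker, hardening it with configuration handling, error handling, and rate limiting, and then running it with Docker Compose and Kubernetes with basic monitoring in place.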

Why Containerize AI Applications? #

Containerization provides:

Consistency - The same image runs in dev, staging, and production
Isolation - Each service ships its own dependencies, so versions don't conflict
Scalability - Stateless containers scale horizontally by adding replicas
Portability - Runs anywhere Docker runs

Setting Up LangChain with Docker #

Basic Dockerfile #

dockerfile.dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code (add a .dockerignore so .env files and local
# artifacts are not baked into the image)
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=8000

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Requirements File #

txt.txt

langchain==0.1.0
langchain-openai==0.0.2
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
python-dotenv==1.0.0
# Needed by the rate-limiting example; pin to a version you have tested
slowapi

Production Considerations #

1. Environment Variables #

python.python

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
LANGCHAIN_TRACING_V2 = os.getenv("LANGCHAIN_TRACING_V2", "false")
LANGCHAIN_ENDPOINT = os.getenv("LANGCHAIN_ENDPOINT")
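
Because `os.getenv` returns `None` for an unset variable, a missing API key would otherwise only surface on the first request. One way to catch this at startup is a small validation step. The sketch below is an illustration only: `Settings` and `load_settings` are hypothetical names, not part of LangChain or python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values read above."""
    openai_api_key: str
    tracing_enabled: bool
    langchain_endpoint: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from `env`, raising early if a required value is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        # Treat any casing of "true" as enabled; everything else is disabled
        tracing_enabled=env.get("LANGCHAIN_TRACING_V2", "false").lower() == "true",
        langchain_endpoint=env.get("LANGCHAIN_ENDPOINT"),
    )
```

Calling `load_settings()` once at import time, after `load_dotenv()`, makes the container exit immediately when the key is absent, which an orchestrator will report as a failed start rather than a healthy pod that fails every request.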

2. Error Handling #

python.python

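import logging

from fastapi import FastAPI, HTTPException
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

logger = logging.getLogger(__name__)
app = FastAPI()

# Build the chain once at startup instead of on every request
llm = ChatOpenAI(temperature=0.7)
prompt = PromptTemplate(
    input_variables=["query"],
    template="Answer the following question: {query}"
)
chain = LLMChain(llm=llm, prompt=prompt)

@app.post("/chat")
async def chat_endpoint(query: str):
    try:
        result = await chain.ainvoke({"query": query})
        return {"response": result["text"]}
    except Exception:
        # Log the full traceback server-side; return a generic message so
        # provider errors and internal details are not leaked to clients
        logger.exception("Chat request failed")
        raise HTTPException(status_code=500, detail="Internal server error")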

3. Rate Limiting #

python.python

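from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

app = FastAPI()  # or reuse the instance from the previous example

# Key each request by the client's IP address
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("10/minute")
async def chat_endpoint(request: Request, query: str):
    # slowapi needs the `request` argument to identify the caller
    # Your logic here
    pass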
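
To make the `"10/minute"` rule concrete, here is a minimal, self-contained sketch of a per-key sliding-window limiter. It is meant only to illustrate the idea of counting recent requests per client. It is not how slowapi is implemented, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

This version keeps state in process memory, so each replica would count separately. With several replicas behind a load balancer, a shared store such as the Redis service in the Compose file is the usual place to keep the counters.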

Docker Compose for Development #

yaml.yaml

version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - LANGCHAIN_TRACING_V2=true
    volumes:
      - ./:/app
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:

Kubernetes Deployment #

yaml.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: langchain-api
  template:
    metadata:
      labels:
        app: langchain-api
    spec:
      containers:
      - name: api
        image: myregistry/langchain-api:v1.0.0
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: openai-key
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-api
spec:
  selector:
    app: langchain-api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

Monitoring and Observability #

Health Checks #

python.python

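from datetime import datetime, timezone

@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "version": "1.0.0",
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated as of Python 3.12)
        "timestamp": datetime.now(timezone.utc).isoformat()
    }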

Logging #

python.python

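import logging
from langchain.callbacks import StdOutCallbackHandler

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

# Prints chain start/end events to stdout. Pass it per call, for example:
# chain.ainvoke({"query": query}, config={"callbacks": [handler]})
handler = StdOutCallbackHandler()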
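
Log aggregators generally parse structured output more reliably than free-form text. As a hedged alternative to the format string above, the following sketch emits one JSON object per line using only the standard library. `JsonFormatter` and `configure_json_logging` are hypothetical names chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Include the traceback when the record was logged with exc_info
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(level: int = logging.INFO) -> None:
    """Replace any existing root handlers with a JSON stream handler."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

The field names here are an assumption. Match them to whatever your log pipeline expects.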

Best Practices #

Use connection pooling for database and external APIs
Implement caching with Redis for frequent queries
Set timeouts on all external API calls
Monitor token usage and costs
Use async/await for better concurrency
Implement retries with exponential backoff
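
Two of the items above, timeouts and retries with exponential backoff, can be combined in one small async helper. The sketch below uses only `asyncio` from the standard library. `call_with_retries` and its default values are assumptions for illustration, and it retries on any exception, whereas a real service would likely restrict retries to transient errors such as timeouts and rate-limit responses.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    func: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout: float = 30.0,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await func() with a per-attempt timeout, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(func(), timeout=timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles on each attempt, capped at max_delay, with random
            # jitter so that many clients do not retry in lockstep
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("attempts must be at least 1")
```

Inside the chat endpoint this could wrap the chain call, for example `await call_with_retries(lambda: chain.ainvoke({"query": query}))`.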

Conclusion #

Building production-ready AI applications requires careful attention to infrastructure, error handling, and observability. Docker and Kubernetes provide the foundation for scalable, reliable deployments.

Production Notes 1 #

Before each release of a LangChain service, define pre-deploy checks, rollout gates, and rollback triggers. Track p95 latency, error rate, and cost per request for at least 24 hours after deployment. If any of these regresses from its baseline, revert quickly and document the decision in the runbook.
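
A rollback trigger is easier to act on when it is written down as code. The following sketch compares p95 latency and error rate against a baseline. The function names and the tolerance values (20% latency headroom, one percentage point of extra errors) are assumptions for illustration, not recommended thresholds.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_roll_back(
    baseline_ms: Sequence[float],
    current_ms: Sequence[float],
    baseline_error_rate: float,
    current_error_rate: float,
    latency_tolerance: float = 1.2,
    error_tolerance: float = 0.01,
) -> bool:
    """Return True if p95 latency or error rate regressed beyond the tolerances."""
    latency_regressed = (
        percentile(current_ms, 95) > latency_tolerance * percentile(baseline_ms, 95)
    )
    errors_regressed = current_error_rate > baseline_error_rate + error_tolerance
    return latency_regressed or errors_regressed
```

Cost per request can be checked the same way by comparing the current average against the baseline with its own tolerance.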

Keep the operating model simple under pressure: one owner per change, one decision channel, and clear stop conditions. Review alert quality regularly to remove noise and ensure on-call engineers can distinguish urgent failures from routine variance.

Repeatability is the goal. Convert successful interventions into standard operating procedures and version them in the repository so future responders can execute the same flow without ambiguity.

About Kiril Urbonas