Evaluation Overview
This page consolidates evaluation requirements and maps them to corresponding LiteLLM capabilities, documentation links, and examples.
- Core Platform
- Cost & Efficiency Optimization
- Enterprise
- Monitoring, Logging & Observability
- Performance
- Security & Compliance
Core Platform
Caching
| Title | Description | Documentation |
|---|---|---|
| Prompt Caching | Cache repeated prompts to reduce cost and latency across providers. | View Docs |
| Response Caching | Cache model responses with configurable TTL and cache-bypass options. | View Docs |
| Per-Route Cache TTLs | Define different TTLs per route or model for prompt/response cache entries. | View Docs |
| Cache Bypass Controls | Allow clients or rules to skip cache reads/writes for sensitive calls. | View Docs |
| Semantic/Content-Aware Caching | Reduce re-computation by caching semantically similar requests. | View Docs |
| Cache Invalidation Controls | Clear stale cache entries during rollouts or policy changes. | View Docs |
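As a concrete illustration, response caching and TTLs are typically enabled in the proxy's config.yaml. A minimal sketch (key names follow the LiteLLM proxy config format; the Redis host and TTL values here are placeholders to verify against the linked docs):

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis            # backends include local, redis, redis-semantic, s3
    host: os.environ/REDIS_HOST
    port: 6379
    ttl: 600               # default TTL (seconds) for cached responses
```

Clients can then skip cache reads on sensitive calls by sending cache-control options in the request body; see the cache-bypass docs above for the exact parameters.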
Routing
| Title | Description | Documentation |
|---|---|---|
| Unified API Gateway for Multiple LLM Providers | Single endpoint to access locally hosted and multi-cloud LLMs across providers. | View Docs |
| Supported Endpoints Catalog | Core: /chat/completions • /completions • /v1/messages<br>Audio: /audio/transcriptions • /audio/speech<br>Images: /images/generations • /images/edits<br>Embeddings & Search: /embeddings • /rerank • /vector_stores • /search<br>Assistants & Batch: /assistants • /threads • /batches • /responses<br>Other: /fine_tuning/jobs • /moderations • /ocr • /mcp/tools • /realtime • /files | View Entire List |
| Advanced Routing Strategies | Route based on budget, use-case, availability, rate limits, and lowest cost. | View Docs |
| Reliable Completions | Automatic retries with exponential backoff and jitter, plus provider fallbacks, for resilient completions. | View Docs |
| Cost-Based Routing | Automatically select lowest-cost viable provider/model. | View Docs |
| Rate-Limit-Aware Routing | Choose providers based on available request/token headroom, with fallback to alternate models when nearing RPM/TPM caps. | View Docs |
| Availability-Based Routing | Reroute during provider outages to sustain uptime. | View Docs |
| Budget-Aware Routing | Select routes based on remaining budget headroom, falling back to alternate deployments as per-team/key budgets are exhausted. | View Docs |
| Latency-Aware Routing | Prefer providers with lower observed latency. | View Docs |
| Error-Rate-Aware Routing | Avoid providers showing elevated error rates. | View Docs |
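The strategies above are selected in router_settings, with multiple deployments sharing one model alias. A minimal sketch (model names and environment-variable names are placeholders; verify strategy names against the routing docs):

```yaml
model_list:
  - model_name: gpt-4o                      # one alias, two deployments
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY

router_settings:
  routing_strategy: latency-based-routing   # alternatives include usage-based-routing, least-busy
  num_retries: 3                            # retries before falling back to another deployment
```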
Cost & Efficiency Optimization
Budgets
| Title | Description | Documentation |
|---|---|---|
| Usage & Cost Tracking | Track spend and tokens per model, key, user, team, and environment. | View Docs |
| Budget Enforcement Policies | Set and enforce budgets for teams, users, and API keys. | View Docs |
| Budget Refresh Schedules | Support monthly/daily automatic budget refresh windows with configurable duration (seconds, minutes, hours, days). | View Docs |
| Per-Key Budgets | Budget caps for individual API keys. | View Docs |
| Per-User Budgets | Limit spend at the user account level. | View Docs |
| Per-Model Budgets | Assign budgets by model family or provider. | View Docs |
| Team Budgets | Assign budgets and quotas scoped to a team. | View Docs |
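Budget enforcement centers on two knobs, max_budget and budget_duration. A minimal sketch of a global default in config.yaml (the dollar amount and refresh window are placeholders):

```yaml
litellm_settings:
  max_budget: 100          # USD cap before requests are rejected
  budget_duration: 30d     # budget resets on this schedule (also accepts e.g. 30s, 30m, 30h)
```

Per-key, per-user, and per-team budgets use the same fields when keys and teams are created; see the linked docs for the management endpoints.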
Enterprise
Alerting
| Title | Description | Documentation |
|---|---|---|
| LLM Performance Alerts | Detect model/provider outages, slow API calls (exceeding alerting_threshold), hanging requests, failed API calls, and sudden error spikes. | View Docs |
| Budget & Spend Alerts | Daily/weekly spend summaries per team or tag, soft budget threshold notifications at X% consumption, and budget limit alerts. | View Docs |
| Daily Health Reports | Automated daily status summaries including top 5 slowest deployments, top 5 deployments with most failed requests, and system health metrics. | View Docs |
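Alerting is typically wired to Slack (or a similar webhook) in general_settings, with alerting_threshold controlling when a call counts as slow. A minimal sketch (the threshold value is a placeholder):

```yaml
general_settings:
  alerting: ["slack"]           # reads SLACK_WEBHOOK_URL from the environment
  alerting_threshold: 300       # seconds before an in-flight request is flagged as slow
```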
Deployment
| Title | Description | Documentation |
|---|---|---|
| Deployment Options | Deploy via Docker, Kubernetes, Helm, Terraform, AWS CloudFormation, Google Cloud Run, Render, Railway, or Docker Compose with support for database, Redis, and production-ready configurations. | View Docs |
| Control Plane & Data Plane | Separate planes for global management and regional execution with multi-region/multi-cloud failover for high availability. | View Docs |
| Timeout Configuration | Global and per-model/provider timeouts to avoid hung requests. | View Docs |
| Concurrent Usage Testing | Simulate load to validate throughput targets. | View Docs |
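Timeouts can be layered: a global ceiling plus tighter per-deployment limits. A minimal sketch (all values are placeholders to tune per provider):

```yaml
litellm_settings:
  request_timeout: 600        # global ceiling, seconds

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      timeout: 30             # per-deployment request timeout
      stream_timeout: 10      # tighter limit for streaming responses
```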
Monitoring, Logging & Observability
Integrations
| Title | Description | Documentation |
|---|---|---|
| Datadog Integration | Publish metrics and traces to Datadog for dashboards and alerts, including pre-built panels for latency, errors, and usage. | View Docs |
| Prometheus Metrics | Expose proxy metrics for scrape and alert rules. | View Docs |
| SIEM & Tooling Integrations | Forward logs and events to external observability stacks. | View Docs |
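These integrations are usually enabled as callbacks in config.yaml; Prometheus metrics are then scraped from the proxy's /metrics endpoint. A minimal sketch:

```yaml
litellm_settings:
  callbacks: ["prometheus"]         # exposes /metrics for scraping
  success_callback: ["datadog"]     # forwards request logs/metrics to Datadog
```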
Logging
| Title | Description | Documentation |
|---|---|---|
| Request/Response Logging | Enable or disable logging to capture payloads, identifiers, and outcomes for auditing with structured logging fields (user_id, call_id, model, tokens, latency). | View Docs |
| Logging Payload Specification | Reference documentation for all available fields and data captured in LiteLLM logging payloads. | View Docs |
| Custom Callbacks | Integrate custom logging hooks to process and forward structured logs to external systems (SIEM, observability platforms, databases) with real-time token usage, cost tracking, and event handling. | View Docs |
| PII-Safe Logging Practices | Use Presidio guardrails to mask or block PII, PHI, and sensitive data before logging. | View Docs |
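Custom callbacks are registered by pointing the config at a Python object that implements LiteLLM's logging hooks. A minimal sketch (the module and instance names are hypothetical examples):

```yaml
litellm_settings:
  # custom_callbacks.py would define proxy_handler_instance, an instance of a
  # CustomLogger subclass implementing success/failure event hooks
  callbacks: custom_callbacks.proxy_handler_instance
```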
Metrics & Dashboards
| Title | Description | Documentation |
|---|---|---|
| Latency, Error Rate, Token Usage | Track p50/p95 latency, error counts, and token consumption with dashboards for latency percentiles over time. | View Docs |
| Request Throughput Metrics | Dashboard visualizations showing the number of requests processed over time, broken down by API route, provider, or model. | View Docs |
| Error Rate Panels | Track HTTP status codes and failures. | View Docs |
| Budget & Spend Metrics | Monitor spend/budget usage per team/key with metrics and visualize budget burn-down per team/key. | View Docs |
| Daily Summary Reports | Automated daily summaries of usage and health. | View Docs |
| Cache Metrics | Export cache hit/miss metrics for dashboards. | View Docs |
Performance
Reliability
| Title | Description | Documentation |
|---|---|---|
| Production Best Practices | Production deployment recommendations including configuration, machine specifications, Redis optimization, worker management, and database connection pooling. | View Docs |
| Gateway Overhead P50/P90/P99 | LiteLLM proxy adds minimal latency overhead compared to direct provider API calls. | View Docs |
| Provider Latency Comparison | Compare observed latencies across providers. | View Docs |
| Load Test Toolkit | Use mock requests and scenarios to validate SLOs. | View Docs |
Security & Compliance
Identity
| Title | Description | Documentation |
|---|---|---|
| RBAC & Team Segmentation | Enforce permissions by roles; segment teams and models. | View Docs |
| User/Team Rate Limits | Set RPM/TPM per user/team/model/key. | View Docs |
| SSO & OAuth | Integrate identity providers via SSO/OAuth. | View Docs |
| MCP Permission Management | Constrain model control permissions by user/team. | View Docs |
| Virtual Keys & Rotation | Create, rotate, and revoke virtual keys at scale with configurable rotation strategy (schedule/events). | View Docs |
| Team-Scoped Keys | Create keys scoped to specific teams for isolation. | View Docs |
| TLS Encryption Policy | Require TLS 1.2+ between clients and the gateway for all inbound connections. | View Docs |
| Self-Hosted Data Policy | Ensure no persistent storage of prompts/responses when self-hosted. | View Docs |
| IP Allow/Deny Lists | Enforce network-level access using IP-based policies and prevent lateral movement between teams and models. | View Docs |
| AWS Secrets Manager | Store and rotate provider secrets via AWS Secrets Manager with automation. | View Docs |
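Several of these controls live in general_settings; for example, resolving provider credentials from AWS Secrets Manager and restricting inbound IPs. A sketch (the CIDR value is a placeholder, and key names should be verified against the linked docs):

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  key_management_system: "aws_secret_manager"   # resolve provider keys from AWS Secrets Manager
  allowed_ips: ["10.0.0.0/8"]                   # reject requests from outside this range
```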
Guardrails
| Title | Description | Documentation |
|---|---|---|
| Guardrails Suite | Configure content filtering, prompt injection detection, PII masking, and security guardrails with support for multiple providers (Presidio, Lakera, Aporia, Bedrock, Pangea, and more). | View Docs |
| PII/PHI Masking | Mask or block personally identifiable information and protected health information using Presidio with configurable entity types and actions. | View Docs |
| Prompt Injection Detection | Detect and block prompt injection attacks and jailbreak attempts using similarity checks, LLM API calls, or third-party services. | View Docs |
| Secret Detection | Detect and mask secrets, API keys, and sensitive credentials in prompts and responses. | View Docs |
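Guardrails are declared as named entries and attached to the request lifecycle by mode (e.g. pre_call, during_call, post_call). A minimal Presidio sketch (the entity types and actions shown are illustrative):

```yaml
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"            # run before the request reaches the provider
      pii_entities_config:
        CREDIT_CARD: "MASK"       # mask card numbers rather than blocking the call
        EMAIL_ADDRESS: "MASK"
```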