|Docs

Deploy an AI API Gateway

gatewayapiproxypython

As AI applications scale, managing multiple LLM providers becomes complex. Different services may use different providers, API keys rotate, costs are hard to track, and a single provider outage can take down your entire application.

An AI API gateway sits between your services and LLM providers. It provides a single endpoint with a unified API, model routing, cost tracking, rate limiting, and provider failover.

LiteLLM Proxy is the most widely used open-source option. It exposes an OpenAI-compatible API that routes requests to 100+ LLM providers.

Architecture

  • LiteLLM Proxy runs as a Railway service with no public domain. Other services in the project call it over private networking.
  • Redis (optional) provides response caching to reduce API costs and latency.
  • Postgres (optional) stores request logs and cost data for analytics.

Your application services send requests to the proxy's internal URL instead of directly to OpenAI, Anthropic, or other providers.

Prerequisites

  • A Railway account
  • API keys for one or more LLM providers

1. Create the proxy repository

Create a new repository with two files:

litellm_config.yaml:

Dockerfile:

2. Deploy the proxy

  1. Create a new project on Railway.
  2. Click + New > GitHub Repo and select your proxy repository.
  3. Set the following environment variables on the proxy service:
VariableValue
OPENAI_API_KEYYour OpenAI API key
ANTHROPIC_API_KEYYour Anthropic API key
LITELLM_MASTER_KEYA secret key for proxy admin access
  1. Railway builds the Dockerfile and starts the proxy.

3. Keep the proxy internal

The proxy should not be publicly accessible. Do not generate a public domain for it. Other services in the same project reach it via private networking at:

Replace litellm-proxy with your service name and PORT with the port number shown in the service's networking settings.

4. Connect your services

Update your application services to point at the proxy instead of directly at LLM providers. Since LiteLLM exposes an OpenAI-compatible API, you only need to change the base URL:

Any OpenAI SDK client (Python, Node.js, Go) works with the proxy by changing base_url.

5. Add Redis for caching

Response caching reduces costs by returning cached results for identical requests:

  1. Add Redis to your project.
  2. Add the Redis connection to the proxy's environment variables:
VariableValue
REDIS_HOST${{Redis.REDISHOST}}
REDIS_PORT${{Redis.REDISPORT}}
REDIS_PASSWORD${{Redis.REDISPASSWORD}}
  1. Enable caching in your litellm_config.yaml:

6. Track costs

LiteLLM logs request metadata including token counts and estimated costs. To persist this data, add Postgres and set:

VariableValue
DATABASE_URL${{Postgres.DATABASE_URL}}

LiteLLM automatically creates its logging tables on first connection. Access cost data through the LiteLLM admin UI or query the database directly.

Next steps