Deploy an AI API Gateway

gatewayapiproxypython

As AI applications scale, managing multiple LLM providers becomes complex. Different services may use different providers, API keys rotate, costs are hard to track, and a single provider outage can take down your entire application.

An AI API gateway sits between your services and LLM providers. It provides a single endpoint with a unified API, model routing, cost tracking, rate limiting, and provider failover.

LiteLLM Proxy is the most widely used open-source option. It exposes an OpenAI-compatible API that routes requests to 100+ LLM providers.

Architecture

LiteLLM Proxy runs as a Railway service with no public domain. Other services in the project call it over private networking.
Redis (optional) provides response caching to reduce API costs and latency.
Postgres (optional) stores request logs and cost data for analytics.

Your application services send requests to the proxy's internal URL instead of directly to OpenAI, Anthropic, or other providers.

Prerequisites

A Railway account
API keys for one or more LLM providers

1. Create the proxy repository

Create a new repository with two files:

litellm_config.yaml:

Dockerfile:

2. Deploy the proxy

Create a new project on Railway.
Click + New > GitHub Repo and select your proxy repository.
Set the following environment variables on the proxy service:

Variable	Value
`OPENAI_API_KEY`	Your OpenAI API key
`ANTHROPIC_API_KEY`	Your Anthropic API key
`LITELLM_MASTER_KEY`	A secret key for proxy admin access

Railway builds the Dockerfile and starts the proxy.

3. Keep the proxy internal

The proxy should not be publicly accessible. Do not generate a public domain for it. Other services in the same project reach it via private networking at:

Replace litellm-proxy with your service name and PORT with the port number shown in the service's networking settings.

4. Connect your services

Update your application services to point at the proxy instead of directly at LLM providers. Since LiteLLM exposes an OpenAI-compatible API, you only need to change the base URL:

Any OpenAI SDK client (Python, Node.js, Go) works with the proxy by changing base_url.

5. Add Redis for caching

Response caching reduces costs by returning cached results for identical requests:

Add Redis to your project.
Add the Redis connection to the proxy's environment variables:

Variable	Value
`REDIS_HOST`	`${{Redis.REDISHOST}}`
`REDIS_PORT`	`${{Redis.REDISPORT}}`
`REDIS_PASSWORD`	`${{Redis.REDISPASSWORD}}`

Enable caching in your litellm_config.yaml:

6. Track costs

LiteLLM logs request metadata including token counts and estimated costs. To persist this data, add Postgres and set:

Variable	Value
`DATABASE_URL`	`${{Postgres.DATABASE_URL}}`

LiteLLM automatically creates its logging tables on first connection. Access cost data through the LiteLLM admin UI or query the database directly.

Next steps

Deploy an AI-Powered SaaS App: Build a product that uses the gateway.
Private Networking: How services communicate within a project.
Redis on Railway: Persistence settings and memory management.
PostgreSQL on Railway: Connection pooling, backups, and configuration.