Deploy an AI API Gateway
As AI applications scale, managing multiple LLM providers becomes complex. Different services may use different providers, API keys rotate, costs are hard to track, and a single provider outage can take down your entire application.
An AI API gateway sits between your services and LLM providers. It provides a single endpoint with a unified API, model routing, cost tracking, rate limiting, and provider failover.
LiteLLM Proxy is the most widely used open-source option. It exposes an OpenAI-compatible API that routes requests to 100+ LLM providers.
Architecture
- LiteLLM Proxy runs as a Railway service with no public domain. Other services in the project call it over private networking.
- Redis (optional) provides response caching to reduce API costs and latency.
- Postgres (optional) stores request logs and cost data for analytics.
Your application services send requests to the proxy's internal URL instead of directly to OpenAI, Anthropic, or other providers.
Prerequisites
- A Railway account
- API keys for one or more LLM providers
1. Create the proxy repository
Create a new repository with two files:
litellm_config.yaml:
Dockerfile:
2. Deploy the proxy
- Create a new project on Railway.
- Click + New > GitHub Repo and select your proxy repository.
- Set the following environment variables on the proxy service:
| Variable | Value |
|---|---|
OPENAI_API_KEY | Your OpenAI API key |
ANTHROPIC_API_KEY | Your Anthropic API key |
LITELLM_MASTER_KEY | A secret key for proxy admin access |
- Railway builds the Dockerfile and starts the proxy.
3. Keep the proxy internal
The proxy should not be publicly accessible. Do not generate a public domain for it. Other services in the same project reach it via private networking at:
Replace litellm-proxy with your service name and PORT with the port number shown in the service's networking settings.
4. Connect your services
Update your application services to point at the proxy instead of directly at LLM providers. Since LiteLLM exposes an OpenAI-compatible API, you only need to change the base URL:
Any OpenAI SDK client (Python, Node.js, Go) works with the proxy by changing base_url.
5. Add Redis for caching
Response caching reduces costs by returning cached results for identical requests:
- Add Redis to your project.
- Add the Redis connection to the proxy's environment variables:
| Variable | Value |
|---|---|
REDIS_HOST | ${{Redis.REDISHOST}} |
REDIS_PORT | ${{Redis.REDISPORT}} |
REDIS_PASSWORD | ${{Redis.REDISPASSWORD}} |
- Enable caching in your
litellm_config.yaml:
6. Track costs
LiteLLM logs request metadata including token counts and estimated costs. To persist this data, add Postgres and set:
| Variable | Value |
|---|---|
DATABASE_URL | ${{Postgres.DATABASE_URL}} |
LiteLLM automatically creates its logging tables on first connection. Access cost data through the LiteLLM admin UI or query the database directly.
Next steps
- Deploy an AI-Powered SaaS App: Build a product that uses the gateway.
- Private Networking: How services communicate within a project.
- Redis on Railway: Persistence settings and memory management.
- PostgreSQL on Railway: Connection pooling, backups, and configuration.