Deploy an AI-Powered SaaS App on Railway

This guide covers deploying a SaaS application that integrates LLM APIs on Railway. The pattern applies to any product that takes user input, processes it through an LLM, and returns or stores the result: resume builders, content generators, code reviewers, data analyzers, and similar tools.

Railway is a CPU-based platform. Your application calls external LLM APIs (OpenAI, Anthropic, etc.) over HTTP. No models run locally.

Architecture overview

A typical AI SaaS app on Railway uses three components:

  • API service (Express, FastAPI, etc.) handles user requests, calls the LLM API, and returns results.
  • Postgres stores user data, cached LLM responses, and job status for async tasks.
  • Redis (optional) provides job queuing for tasks that take longer than a few seconds.

For tasks that complete in under 30 seconds, a synchronous request/response pattern works well. For longer tasks, use the async workers pattern.

Prerequisites

  • A Railway account.
  • Your application code in a GitHub repository.
  • An API key from an LLM provider (e.g., OpenAI or Anthropic).

Set up the project

1. Create the project and database

  1. Create a new project on Railway.
  2. Add PostgreSQL: click + New > Database > PostgreSQL.
  3. The database will be accessible to all services in the project via reference variables.

2. Deploy the API service

  1. Push your code to a GitHub repository.
  2. In your project, click + New > GitHub Repo and select your repository.
  3. Set the start command to: node app.js
  4. Set environment variables under the Variables tab:
    • Reference DATABASE_URL from Postgres (see the example after this list).
    • Add your LLM API key (e.g., OPENAI_API_KEY or ANTHROPIC_API_KEY).
  5. Generate a public domain under Settings > Networking > Public Networking.
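
A reference variable points at another service's variable by name. Assuming the database service is named Postgres, the entries in the Variables tab would look something like:

```
DATABASE_URL=${{Postgres.DATABASE_URL}}
OPENAI_API_KEY=sk-...
```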

3. Structure LLM API calls

Wrap your LLM calls in a function that handles retries and errors. LLM APIs return rate limit errors (HTTP 429) under load:
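Here is a minimal sketch using the official openai Node SDK; the callLLM name, model choice, and backoff schedule are illustrative, not fixed:

```js
// llm.js
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Call the LLM, retrying rate limits (429) and transient server errors
// with exponential backoff before giving up.
export async function callLLM(messages, { retries = 3 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages,
      });
      return response.choices[0].message.content;
    } catch (err) {
      const retryable = err.status === 429 || err.status >= 500;
      if (!retryable || attempt === retries) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt.
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
}
```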

4. Cache LLM responses

LLM API calls are slow (1-10 seconds) and expensive. Cache responses for identical inputs to reduce latency and cost:
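One approach, sketched below with the pg client: hash the request payload and check Postgres before calling the LLM. The llm_cache table is created in the next step; callLLM is the retry wrapper from step 3.

```js
import crypto from "node:crypto";
import pg from "pg";
import { callLLM } from "./llm.js"; // retry wrapper from step 3

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Return the cached response for an identical input, or call the LLM
// and store the result for next time.
export async function cachedCallLLM(messages) {
  const inputHash = crypto
    .createHash("sha256")
    .update(JSON.stringify(messages))
    .digest("hex");

  const hit = await pool.query(
    "SELECT response FROM llm_cache WHERE input_hash = $1",
    [inputHash]
  );
  if (hit.rows.length > 0) return hit.rows[0].response;

  const response = await callLLM(messages);
  await pool.query(
    `INSERT INTO llm_cache (input_hash, response)
     VALUES ($1, $2) ON CONFLICT (input_hash) DO NOTHING`,
    [inputHash, response]
  );
  return response;
}
```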

Create the cache table:
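A schema along these lines is enough for exact-match caching:

```sql
CREATE TABLE llm_cache (
  input_hash TEXT PRIMARY KEY,
  response   TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```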

5. Handle longer tasks asynchronously

If some requests take more than a few seconds (batch processing, multi-step generation), return a job ID immediately and process in the background. See Deploy an AI Agent with Async Workers for the full pattern with Redis.

For moderate workloads, process jobs in the background within the same service:
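A sketch of that pattern with Express, reusing pool and callLLM from the earlier steps; the /jobs routes and the { messages } request body shape are assumptions:

```js
import express from "express";

const app = express();
app.use(express.json());

// Accept the job, respond immediately with its ID, and process it
// after the response has been sent.
app.post("/jobs", async (req, res) => {
  const { rows } = await pool.query(
    "INSERT INTO jobs (input) VALUES ($1) RETURNING id",
    [req.body]
  );
  res.status(202).json({ jobId: rows[0].id });
  processJob(rows[0].id).catch(console.error);
});

// Clients poll this endpoint until the job is done or failed.
app.get("/jobs/:id", async (req, res) => {
  const { rows } = await pool.query(
    "SELECT status, result FROM jobs WHERE id = $1",
    [req.params.id]
  );
  if (rows.length === 0) return res.status(404).end();
  res.json(rows[0]);
});

async function processJob(jobId) {
  const { rows } = await pool.query(
    "SELECT input FROM jobs WHERE id = $1",
    [jobId]
  );
  try {
    const result = await callLLM(rows[0].input.messages);
    await pool.query(
      "UPDATE jobs SET status = 'done', result = $2 WHERE id = $1",
      [jobId, result]
    );
  } catch (err) {
    await pool.query("UPDATE jobs SET status = 'failed' WHERE id = $1", [jobId]);
  }
}

app.listen(process.env.PORT || 3000);
```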

Create the jobs table:
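A minimal schema matching the sketch above:

```sql
CREATE TABLE jobs (
  id         BIGSERIAL PRIMARY KEY,
  input      JSONB NOT NULL,
  status     TEXT NOT NULL DEFAULT 'pending',
  result     TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```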

This works for single-replica services. For multiple replicas or heavy workloads, use Redis-backed workers instead.

6. Manage costs

LLM API costs scale with usage. Track spending by logging token counts per request:
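One way to do this is to write a usage row per request. The llm_usage table below is an assumption for illustration, not part of the schema above:

```sql
CREATE TABLE llm_usage (
  user_id      TEXT NOT NULL,
  total_tokens INTEGER NOT NULL,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

```js
// After each LLM call (client from step 3, pool from step 4; userId is
// whatever identifier your auth layer provides).
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
});
await pool.query(
  "INSERT INTO llm_usage (user_id, total_tokens) VALUES ($1, $2)",
  [userId, response.usage.total_tokens]
);
```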

Set a per-user or per-request token budget to prevent runaway costs. Consider using smaller models (GPT-4o mini, Claude Haiku) for tasks that do not require the most capable model.
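
To enforce such a budget, a per-user daily check against the hypothetical llm_usage table might look like:

```js
const DAILY_TOKEN_BUDGET = 200_000; // tune to your pricing tier

async function withinBudget(userId) {
  const { rows } = await pool.query(
    `SELECT COALESCE(SUM(total_tokens), 0) AS used
       FROM llm_usage
      WHERE user_id = $1 AND created_at > now() - interval '1 day'`,
    [userId]
  );
  return Number(rows[0].used) < DAILY_TOKEN_BUDGET;
}
```

Call withinBudget before each LLM request and reject or downgrade the request once the budget is spent.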

Next steps