Uptime and Reliability

RevKeen is built on infrastructure designed for high availability. This page covers how we keep the platform running, what happens when things go wrong, and what uptime you can expect.

Infrastructure Overview

RevKeen's core services run on AWS in the eu-central-1 (Frankfurt) region.

Component	Technology	Availability Design
API and application server	AWS Fargate (ECS)	Multiple containers across availability zones
Database	AWS RDS PostgreSQL	Automated backups, point-in-time recovery
Background jobs	Trigger.dev Cloud	Managed execution with automatic retries
Edge and CDN	Cloudflare / Vercel	Global edge network with automatic failover
Real-time messaging	Upstash Redis	Managed Redis with replication
DNS	Cloudflare	Anycast DNS with DDoS protection

RevKeen's Fargate services run across multiple AWS Availability Zones. If one AZ experiences an outage, traffic is automatically routed to healthy containers in other zones. There is no single point of failure at the application tier.

Auto-scaling

Application containers scale automatically based on CPU and memory utilization. During traffic spikes -- such as a large batch of invoices being sent -- additional containers are launched to handle the load without degrading response times.
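
As a rough illustration of utilization-based scaling, the sketch below computes a desired container count from average CPU utilization, in the style of a target-tracking policy. The 60% target, the bounds of 2 and 20 containers, and the function name are assumptions for illustration; they are not RevKeen's actual scaling configuration.

```python
import math


def desired_container_count(
    current_count: int,
    avg_utilization: float,
    target_utilization: float = 60.0,  # assumed target, in percent
    min_count: int = 2,                # assumed lower bound
    max_count: int = 20,               # assumed upper bound
) -> int:
    """Scale the fleet proportionally so average utilization moves toward the target."""
    if current_count < 1:
        raise ValueError("current_count must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = current_count * avg_utilization / target_utilization
    # Round up so the fleet is never undersized, then clamp to the allowed range.
    return max(min_count, min(max_count, math.ceil(raw)))
```

Under these assumed values, four containers averaging 90% CPU would scale out to six, while the same four containers at 30% would scale in to the minimum of two.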

Database Reliability

RevKeen's primary database runs on AWS RDS PostgreSQL 18 in the eu-central-1 (Frankfurt) region:

Automated daily backups with configurable retention.
Point-in-time recovery (PITR) allows restoring the database to any second within the retention window.
Connection pooling ensures stable performance under high connection counts.
Read replicas can be provisioned for read-heavy workloads without impacting write performance.

Database maintenance, such as PostgreSQL version upgrades, is handled through AWS RDS's managed process. It is typically scheduled during low-traffic windows and involves minimal downtime.

Status Page

RevKeen publishes real-time platform status and historical uptime at:

status.revkeen.com

The status page shows:

Current operational status for all services (API, dashboard, checkout, webhooks).
Active and resolved incidents with timestamps and impact descriptions.
Scheduled maintenance windows.
Historical uptime metrics.

You can subscribe to status updates via email or RSS to receive notifications when incidents are reported or resolved.
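
If you prefer to consume the RSS feed programmatically, a minimal sketch might look like the following. It assumes the feed is standard RSS 2.0 with `item`, `title`, `pubDate`, and `link` elements; the feed's exact URL and structure are not documented here, so the function takes the XML text as input and the sample values in the test data are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_status_feed(rss_xml: str) -> list[dict[str, str]]:
    """Extract title, publication date, and link from each RSS <item>."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        entries.append(
            {
                "title": item.findtext("title", default=""),
                "published": item.findtext("pubDate", default=""),
                "link": item.findtext("link", default=""),
            }
        )
    return entries
```

Fetching the feed text (for example with `urllib.request`) and polling on an interval are left to the caller.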

Incident Response

When an issue is detected, RevKeen follows a structured incident response process:

Detection

Incidents are detected through multiple channels:

Automated monitoring -- Grafana alerts on error rates, latency spikes, and infrastructure anomalies.
Synthetic checks -- Periodic health checks against critical endpoints (API, checkout, webhooks).
Customer reports -- Issues reported through support channels.
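
To make the idea of a synthetic check concrete, here is a minimal sketch of a periodic health probe. It is not RevKeen's monitoring code: the function names are illustrative, and any endpoint URLs you pass in are your own choice. The fetcher is injectable so the check logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return exc.code
    except OSError:
        # Covers URLError, connection failures, and timeouts.
        return 0


def run_checks(
    endpoints: dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> dict[str, bool]:
    """Map each endpoint name to True when it answered with a 2xx status."""
    return {name: 200 <= fetch(url) < 300 for name, url in endpoints.items()}
```

A scheduler would call `run_checks` at a fixed interval and raise an alert when any value is `False` for several consecutive runs.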

Response Timeline

Severity	Definition	Response Target	Update Frequency
Critical	Payment processing or checkout is unavailable	15 minutes	Every 30 minutes
High	Major feature degraded (dashboard, webhooks)	30 minutes	Every hour
Medium	Non-critical feature impacted	2 hours	As progress is made
Low	Minor issue, no customer impact	Next business day	On resolution
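
The response targets in the table above can be turned into concrete deadlines. The sketch below is illustrative only: the function names are hypothetical, and the reading of "next business day" as the end of the next weekday in UTC, ignoring public holidays, is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed response targets taken from the severity table.
RESPONSE_TARGETS = {
    "critical": timedelta(minutes=15),
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=2),
}


def next_business_day(day: date) -> date:
    """Return the next Monday-to-Friday date after `day` (holidays are ignored)."""
    candidate = day + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time a first response is due; `detected_at` must be timezone-aware."""
    key = severity.lower()
    if key in RESPONSE_TARGETS:
        return detected_at + RESPONSE_TARGETS[key]
    if key == "low":
        # Assumption: "next business day" means by the end of that day in UTC.
        due = next_business_day(detected_at.astimezone(timezone.utc).date())
        return datetime.combine(due, time(23, 59, 59), tzinfo=timezone.utc)
    raise ValueError(f"unknown severity: {severity!r}")
```

For example, a Critical incident detected at 12:00 UTC has a response deadline of 12:15 UTC, while a Low incident detected on a Friday is due by the end of the following Monday under this reading.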

Process

Acknowledge -- The on-call engineer acknowledges the alert and begins investigation.
Communicate -- A status page update is posted describing the issue and estimated impact.
Mitigate -- The immediate priority is restoring service, even if the root cause is not yet identified.
Resolve -- The underlying issue is fixed and verified.
Review -- A post-incident review identifies root cause, contributing factors, and preventive measures.

Post-incident reviews for Critical and High severity incidents are shared with affected merchants upon request.

Planned Maintenance

RevKeen schedules maintenance windows to minimize disruption:

Routine maintenance is performed during low-traffic periods, typically weekday mornings (UTC).
Advance notice is provided at least 48 hours before any maintenance that may cause downtime.
Zero-downtime deployments are the default for application updates. New containers are started and verified before old containers are drained.
Database maintenance follows AWS RDS's managed upgrade process, which typically involves seconds of downtime rather than minutes.

Scheduled maintenance is announced on the status page and via email to account administrators.

SLA Overview

RevKeen targets the following service levels:

Metric	Target
API availability (monthly)	99.9%
Checkout availability (monthly)	99.9%
Webhook delivery (first attempt)	Within 30 seconds of event
Webhook delivery (with retries)	Within 24 hours, with exponential backoff
Dashboard availability	99.5%
Planned maintenance downtime	Less than 1 hour per month

Availability is measured as the percentage of time the service responds to valid requests with non-error responses, excluding scheduled maintenance windows.
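
To put these percentages in concrete terms, the sketch below converts an availability target into a monthly downtime budget and computes measured availability with maintenance windows excluded, following the definition above. The 30-day month and the function names are assumptions for illustration.

```python
def allowed_downtime_minutes(target_percent: float, days_in_month: int = 30) -> float:
    """Minutes of unplanned downtime a monthly availability target permits."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def measured_availability(
    total_minutes: float,
    downtime_minutes: float,
    maintenance_minutes: float = 0.0,
) -> float:
    """Availability in percent, with scheduled maintenance removed from the measured period."""
    measured = total_minutes - maintenance_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return 100 * (measured - downtime_minutes) / measured
```

In a 30-day month, a 99.9% target allows roughly 43 minutes of unplanned downtime, and a 99.5% target allows roughly 3.6 hours.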

For merchants on enterprise plans, custom SLAs with financial commitments are available. Contact sales@revkeen.com for details.
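
To give a feel for how exponential backoff fits inside the 24-hour webhook retry window listed in the table above, the sketch below generates one possible retry schedule. The 30-second base delay and the doubling factor are assumptions chosen for illustration; RevKeen's actual retry intervals are not specified on this page.

```python
def retry_schedule(
    base_seconds: int = 30,          # assumed delay before the first retry
    factor: int = 2,                 # assumed multiplier between retries
    window_seconds: int = 24 * 3600,
) -> list[int]:
    """Return offsets, in seconds after the first attempt, of each retry inside the window."""
    if base_seconds <= 0 or factor < 1:
        raise ValueError("base_seconds must be positive and factor at least 1")
    offsets = []
    delay = base_seconds
    elapsed = 0
    while elapsed + delay <= window_seconds:
        elapsed += delay
        offsets.append(elapsed)
        delay *= factor
    return offsets
```

With these assumed values, the schedule contains 11 retries: the first fires 30 seconds after the initial attempt and the last about 17 hours after it, because one more doubling would overshoot the 24-hour window.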

What Happens During an Outage

If RevKeen experiences downtime, here is what you can expect:

Checkout -- If the checkout service is unavailable, customers will see an error page. No partial charges will be created.
Webhooks -- Events that occur during an outage are queued and delivered with retries once service is restored. You will not miss events.
Subscriptions -- Renewal attempts that fail due to a RevKeen outage are automatically retried. No subscriptions are cancelled due to platform downtime.
Dashboard -- The dashboard may be temporarily unavailable, but no data is lost. All transactions continue to be recorded and will appear once the dashboard is restored.
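
Because queued webhook events are delivered with retries, your endpoint may occasionally receive the same event more than once. A common way to handle this is to deduplicate by event identifier, as in the sketch below. The `id` field name and the in-memory set are assumptions for illustration; check the webhook payload reference for the actual identifier, and use durable storage in production.

```python
from typing import Any, Callable


class WebhookDeduplicator:
    """Process each webhook event at most once, keyed by its identifier."""

    def __init__(self) -> None:
        # In-memory only; a real service would persist seen IDs in a database.
        self._seen: set[str] = set()

    def handle(self, event: dict[str, Any], process: Callable[[dict[str, Any]], None]) -> bool:
        """Run `process` for a new event and return True; skip duplicates and return False."""
        event_id = event["id"]  # hypothetical field name
        if event_id in self._seen:
            return False
        process(event)
        # Record the ID only after processing succeeds, so a failed attempt can be retried.
        self._seen.add(event_id)
        return True
```

Recording the identifier after processing means an event whose handler raised an exception is processed again on the next delivery rather than silently dropped.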