Use Cases

The pattern is always the same: tag your traffic by priority, let the reflex engine react to real backend health, and let the SDK make single-digit microsecond decisions at the edge. No static rate limits. No manual intervention. Your existing infrastructure gets smarter.

GraphQL API

GraphQL lets clients ask for anything, which means a single query can be cheap or devastating. Think of APIs like the Discogs music catalog (60 authenticated requests/minute), the GitHub GraphQL API, or the Shopify Storefront API. These all enforce static rate limits. WaitState layers adaptive gating on top, based on what your backend can actually handle right now.

The key insight: tag by API tier. Anonymous traffic gets the lowest weight. Authenticated tiers get progressively higher weights. Create reflex rules that block tiers by name under load — anonymous bots and free-tier scrapers are shed before a single paying customer is affected.

Tags

anonymous weight: 1
free weight: 5
pro weight: 10

Integration

import express from 'express';
import { WaitState } from '@waitstate/sdk';

const app = express();
const ws = new WaitState({
  publishKey: process.env.WAITSTATE_PUBLISH_KEY,
  secretKey: process.env.WAITSTATE_SECRET_KEY,
});

// Tier determines both the tag and the gate weight.
// Create matching tags in the dashboard with these weights.
const tierWeights: Record<string, number> = {
  anonymous: 1,
  free: 5,
  pro: 10,
};

app.use('/graphql', express.json(), (req, res, next) => {
  const tier = (req.headers['x-api-tier'] as string) || 'anonymous';
  const weight = tierWeights[tier] ?? 1;

  const decision = ws.gate(tier, weight);
  if (!decision.allowed) {
    return res.status(429).json({
      errors: [{ message: 'Rate limited', extensions: { code: 'RATE_LIMITED' } }],
    });
  }

  // Track latency and errors per tier for reflex rules
  const stop = ws.startTimer(tier);
  res.on('finish', () => {
    stop();
    if (res.statusCode >= 500) ws.reportError(tier);
  });

  next();
});

Reflex rules

When latency > 300ms
Then block anonymous
When latency > 500ms
Then block free
When errors > 20
Then block anonymous
When errors > 50
Then block free

A bot starts hammering /graphql with deeply nested queries to map your schema. Latency rises to 350ms. Rule #1 fires and blocks all anonymous traffic. Your authenticated users are unaffected. If latency keeps climbing past 500ms, rule #2 blocks free too — only pro users get through. If errors spike past 20, rule #3 reinforces the anonymous block. Past 50 errors, rule #4 blocks free-tier as well. Pro users keep flowing throughout the incident.

Compare this to Discogs' static 60 req/min. That limit applies equally whether your database is idle or on fire. WaitState lets you serve 10,000 req/min when healthy and gracefully shed to 100 req/min when your Postgres connection pool is saturated.

Apollo Router integration

If you run Apollo Router, you don't need any middleware code. The WaitState Rust agent includes a /coprocess route that implements Apollo's external coprocessor protocol at the SupergraphRequest stage. Deploy the agent as a DaemonSet sidecar and point your router config at it.

# router.yaml — Apollo Router external coprocessor config
# The WaitState agent runs as a DaemonSet sidecar on port 9000.
# Point the coprocessor at its /coprocess route.
coprocessor:
  url: http://waitstate-agent:9000/coprocess
  router:
    request:
      headers: true
      body: true

The agent reads the tag from the x-waitstate-tag header first, falling back to operationName from the GraphQL body. Allowed requests pass through unchanged (control.break: null). Denied requests return a 429 with a GraphQL-formatted error body:

// Denied response from the agent coprocessor
{
  "control": { "break": 429 },
  "body": {
    "errors": [{
      "message": "Request denied: over_weight",
      "extensions": { "code": "RATE_LIMITED" }
    }]
  }
}

The agent also accepts optional latencyMs and error fields in the request body, feeding observed performance data back into the telemetry pipeline. No SDK initialization, no Express middleware — just a YAML config change and a sidecar.
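
For illustration, a coprocessor request carrying those telemetry fields might look like the following. The top-level placement of latencyMs and error shown here is an assumption for this sketch, not a guarantee of the exact protocol shape:

```json
{
  "headers": { "x-waitstate-tag": ["pro"] },
  "body": { "operationName": "GetOrders" },
  "latencyMs": 240,
  "error": false
}
```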

AI / LLM API Gateway

You're building something like OpenRouter or an internal AI proxy that routes requests to multiple LLM providers. Each provider has different capacity, latency, and failure modes. A single WaitState instance can't capture that. Use site sharding to give each provider its own health domain.

Tags

free weight: 1
pro weight: 5
enterprise weight: 10

Integration

import express from 'express';
import { WaitState } from '@waitstate/sdk';

const app = express();

// Separate site per upstream provider - independent health tracking
const gates = {
  openai: new WaitState({
    publishKey: process.env.WAITSTATE_PUBLISH_KEY,
    secretKey: process.env.WAITSTATE_SECRET_KEY,
    siteId: 'openai',
  }),
  anthropic: new WaitState({
    publishKey: process.env.WAITSTATE_PUBLISH_KEY,
    secretKey: process.env.WAITSTATE_SECRET_KEY,
    siteId: 'anthropic',
  }),
};

// express.json() is required here: the provider is chosen from req.body.model.
app.post('/v1/chat/completions', express.json(), (req, res, next) => {
  // req.user is populated by your auth middleware upstream of this handler.
  const tier = req.user?.plan ?? 'free';
  const weight = { free: 1, pro: 5, enterprise: 10 }[tier] ?? 1;
  const provider = req.body?.model?.startsWith('claude') ? 'anthropic' : 'openai';

  const decision = gates[provider].gate(tier, weight);
  if (!decision.allowed) {
    return res.status(429).json({ error: 'rate_limited', provider });
  }
  next();
});

Reflex rules

When latency > 5s
Then block free
When latency > 10s
Then block pro
When errors > 10
Then block free

OpenAI starts returning slow responses (latency > 5s). The openai site's reflex engine fires rule #1 and blocks free-tier traffic to that provider. Pro and enterprise users still get through. Meanwhile, the anthropic site is completely unaffected because each site has its own aggregator with independent health tracking. If latency climbs past 10s, rule #2 blocks pro too — only enterprise flows through. If OpenAI starts returning 500s, rule #3 reinforces the free-tier block on errors. Enterprise users always flow through because no rule targets them.

E-Commerce

Flash sales, product drops, and seasonal spikes can overwhelm checkout, inventory, and payment services. The worst outcome is losing customers who already have items in their cart. Tag by funnel stage so the people closest to buying are the last to be shed.

Tags

browse weight: 1
cart weight: 5
checkout weight: 10

Integration

import express from 'express';
import { WaitState } from '@waitstate/sdk';

const app = express();
const ws = new WaitState({
  publishKey: process.env.WAITSTATE_PUBLISH_KEY,
  secretKey: process.env.WAITSTATE_SECRET_KEY,
  siteId: 'storefront',
});

// Map routes to traffic tiers (paths relative to mount point)
const routeWeights: Record<string, [string, number]> = {
  '/products': ['browse', 1],
  '/search': ['browse', 1],
  '/cart': ['cart', 5],
  '/checkout': ['checkout', 10],
  '/payment': ['checkout', 10],
};

app.use('/api', (req, res, next) => {
  const [tag, weight] = routeWeights[req.path] ?? ['browse', 1];

  const decision = ws.gate(tag, weight);
  if (!decision.allowed) {
    return res.status(429).json({ error: 'rate_limited', retry_after: 5 });
  }
  next();
});

Use separate sites for storefront vs. admin so a merchant refreshing their dashboard never throttles a buyer.

Reflex rules

When latency > 300ms
Then block browse
When latency > 800ms
Then block cart
When errors > 20
Then block browse
When errors > 50
Then block cart

A flash sale drives 50x normal browsing traffic. Latency hits 400ms. Rule #1 fires and blocks all browse traffic — product listing pages show a "try again in a moment" message. Cart and checkout are unaffected. If latency keeps climbing past 800ms, rule #2 blocks cart operations too. Checkout flows through at full speed — customers mid-purchase complete their orders. If errors spike past 50, rule #4 blocks cart on the error signal as well. The fail-open guarantee means checkout is never gated unless you add a rule that explicitly targets it.

Multi-Tenant SaaS

Platforms like Vercel, PlanetScale, or any multi-tenant service serve hundreds of customers on shared infrastructure. A single noisy tenant running an expensive migration can degrade the platform for everyone. Tag by tenant plan tier so that paying customers are prioritized.

Tags

free weight: 1
starter weight: 3
team weight: 5
enterprise weight: 10

Integration

import express from 'express';
import { WaitState } from '@waitstate/sdk';

const app = express();
const ws = new WaitState({
  publishKey: process.env.WAITSTATE_PUBLISH_KEY,
  secretKey: process.env.WAITSTATE_SECRET_KEY,
});

// Tenant plan determines tag and weight
const planWeights = { free: 1, starter: 3, team: 5, enterprise: 10 };

app.use('/api', (req, res, next) => {
  // req.tenant is populated by your tenant-resolution middleware.
  const plan = req.tenant?.plan ?? 'free';
  const weight = planWeights[plan] ?? 1;

  const decision = ws.gate(plan, weight);
  if (!decision.allowed) {
    res.set('Retry-After', '30');
    return res.status(429).json({
      error: 'rate_limited',
      message: 'Your plan is temporarily throttled due to high platform load.',
      upgrade_url: '/billing/upgrade',
    });
  }
  next();
});

Reflex rules

When latency > 300ms
Then block free
When latency > 800ms
Then block starter
When errors > 50
Then block free

A free-tier tenant runs a bulk import that spikes latency to 400ms. Rule #1 fires and blocks all free-tier traffic. Paying tenants are unaffected. If the spike continues and latency hits 900ms, rule #2 blocks starter tier too. Team and enterprise tenants never see degradation. The response includes a Retry-After header and an upgrade link.

Serverless / Edge

Serverless functions auto-scale compute, but they don't scale the things they depend on. Your Lambda or Cloudflare Worker can spin up 10,000 instances, but the database behind it has the same connection pool, and the third-party APIs you call still have the same rate limits. Without gating, a traffic spike burns through your compute budget processing requests that will fail downstream anyway.

The problem with putting the SDK inside a serverless function is lifecycle. The SDK assumes a long-running process with a pulse timer. In serverless, the function might handle one request and freeze. The pulse never fires. Telemetry is lost.

The better pattern: gate at the edge, before the function spins up. The WASM SDK (@waitstate/wasm) gives you the same Rust gate logic and HMAC signing compiled to WebAssembly. Import gate() and sign_pulse(), pass the cached policy, and you get identical behavior to the full SDK — without initializing a client, starting a pulse timer, or paying cold start overhead. Denied requests never reach your function. Zero compute waste.

Tags

anonymous weight: 1
free weight: 3
paid weight: 10

Integration

This example shows a Cloudflare Worker acting as an edge gateway. It caches the policy, gates inbound requests using the WASM SDK's gate(), accumulates latency and error metrics in module-scoped state, and flushes a batched pulse every 20 seconds using sign_pulse() and waitUntil (fire-and-forget, no added latency).

// Edge gateway using the WaitState WASM SDK
// gate() and sign_pulse() run as compiled WASM — no JS reimplementation needed
import { gate, sign_pulse } from '@waitstate/wasm';

const POLICY_URL = 'https://api.waitstate.io/v1/policy/org_xxx';
const PULSE_URL = 'https://api.waitstate.io/v1/pulse';

// Module-scoped state persists across requests in the same isolate
let cachedPolicy = { globalMaxWeight: null, tagMaxWeights: {}, killSignal: false };
let policyFetchedAt = 0;

// Accumulate telemetry and flush periodically instead of per-request
let pending = { usage: 0, bounced: 0, latencySum: 0, latencyCount: 0, errors: 0 };
let lastPulseAt = 0;

export default {
  async fetch(request, env, ctx) {
    // Refresh cached policy every 5s
    if (Date.now() - policyFetchedAt > 5000) {
      const res = await fetch(POLICY_URL, {
        headers: { authorization: 'Bearer <jwt>' },
      });
      if (res.ok) cachedPolicy = await res.json();
      policyFetchedAt = Date.now();
    }

    // gate() is the same Rust gate logic compiled to WASM
    // Returns { allowed: bool, reason: "allowed" | "kill_signal" | ... }
    const tag = request.headers.get('x-api-tier') || 'anonymous';
    const weights = { anonymous: 1, free: 3, paid: 10 };
    const decision = gate(cachedPolicy, tag, weights[tag] ?? 1);
    if (!decision.allowed) {
      pending.bounced += 1;
      return new Response('Rate limited', { status: 429 });
    }

    // Forward to origin and measure latency/errors
    const start = Date.now();
    const response = await fetch(request);
    const latencyMs = Date.now() - start;

    pending.usage += 1;
    pending.latencySum += latencyMs;
    pending.latencyCount += 1;
    if (response.status >= 500) pending.errors += 1;

    // Flush batched telemetry every 20s — matches the SDK's default pulse interval.
    // sign_pulse() is HMAC-SHA256 compiled to WASM.
    if (Date.now() - lastPulseAt > 20_000 && pending.latencyCount > 0) {
      const avgLatency = Math.round(pending.latencySum / pending.latencyCount);
      const body = JSON.stringify({
        instanceId: 'edge-gw',
        usageDelta: pending.usage,
        bouncedUnits: pending.bounced,
        metrics: { latency: avgLatency, errors: pending.errors },
        ts: Date.now(),
      });
      const timestamp = Date.now().toString();
      const signature = sign_pulse(body, timestamp, env.WAITSTATE_SECRET_KEY);

      ctx.waitUntil(
        fetch(PULSE_URL, {
          method: 'POST',
          headers: {
            'content-type': 'application/json',
            'x-waitstate-id': env.WAITSTATE_PUBLISH_KEY,
            'x-waitstate-signature': signature,
            'x-waitstate-timestamp': timestamp,
          },
          body,
        })
      );

      pending = { usage: 0, bounced: 0, latencySum: 0, latencyCount: 0, errors: 0 };
      lastPulseAt = Date.now();
    }

    return response;
  },
};

Reflex rules

When latency > 500ms
Then block anonymous
When latency > 1s
Then block free
When errors > 20
Then block free
When errors > 50
Then block paid

A viral spike sends 50x normal traffic to your API. The edge middleware reads the cached policy and gates against the last known health state. If the backend was already showing stress, low-priority requests are shed before your serverless function even spins up. Paid-tier requests flow through normally.

As the spike continues, the middleware reports rising latency and errors back to the control plane. Within seconds, the reflex engine updates the policy: rule #1 blocks anonymous traffic, rule #3 blocks free tier. The updated policy propagates to the edge cache on the next cycle. Your paid customers see normal performance throughout the incident. When load subsides, the reflex engine relaxes the policy and traffic recovers automatically.

Important: the edge gate makes decisions based on a cached policy, not live health. There's a lag of seconds between conditions changing and the policy updating. For gradual degradation this works well. For instant spikes, the first few seconds of traffic will flow through on the stale policy before the reflex engine catches up.

Why not put the SDK inside the function?

  • Cold start overhead. The SDK needs to authenticate, fetch policy, and start a pulse timer. That adds 50-200ms to the first invocation.
  • Lost telemetry. If the function freezes before the pulse fires, metrics are lost. Calling shutdown() on every request adds a network round-trip.
  • Wasted compute. If you're going to deny the request anyway, spinning up the function first wastes money. Gating at the edge costs nearly nothing.

For serverless architectures, the edge middleware handles both gating and telemetry. The function itself never needs to know about WaitState.

Internal Microservices

Large organizations run dozens of internal services calling each other. A shared dependency — database, cache, message queue — degrades, and suddenly every service that depends on it is queueing requests and burning resources on calls that will fail.

The distinct pattern here: you're not tagging by the request type, you're tagging by the caller's criticality. A batch job and a customer-facing service might hit the same endpoint. The difference is which one you shed first. Put gate() at each service boundary, tag by who's calling, and let the reflex engine decide which callers get through based on the downstream service's actual health.

Use siteId per service so each one gets its own health domain. A latency spike in the orders service doesn't affect the policy for the users service.

Tags

batch_job weight: 1
internal_tool weight: 3
customer_facing weight: 10

Integration

import express from 'express';
import { WaitState } from '@waitstate/sdk';

const app = express();

// Each service gets its own WaitState instance with a unique siteId.
// This gives every service independent health tracking via its own
// PulseAggregator — a latency spike in the orders service won't
// affect the policy for the users service.
const ws = new WaitState({
  publishKey: process.env.WAITSTATE_PUBLISH_KEY,
  secretKey: process.env.WAITSTATE_SECRET_KEY,
  siteId: 'orders-service',
});

// Tag by the calling service's criticality, not the request itself.
// A batch job hitting the same endpoint as a customer-facing service
// should be shed first.
const callerWeights: Record<string, number> = {
  batch_job: 1,
  internal_tool: 3,
  customer_facing: 10,
};

app.use((req, res, next) => {
  const caller = (req.headers['x-service-name'] as string) ?? 'unknown';
  const weight = callerWeights[caller] ?? 1;

  const decision = ws.gate(caller, weight);
  if (!decision.allowed) {
    return res.status(429).json({ error: 'shed', reason: decision.reason });
  }

  // Report latency back so the reflex engine sees this service's health
  const start = Date.now();
  res.on('finish', () => ws.reportLatency(Date.now() - start, caller));

  next();
});

Reflex rules

When latency > 300ms
Then block batch_job
When latency > 500ms
Then block internal_tool
When errors > 20
Then block batch_job

The shared Postgres instance starts lagging. The orders service reports 350ms latency via its pulse. Within seconds, the reflex engine fires rule #1 and blocks batch_job callers. Nightly data exports and report generators get a 429 and retry later. Internal tooling is blocked once latency passes 500ms. Customer-facing checkout and order status calls flow through at full speed.

If errors start spiking past 20, rule #3 reinforces the batch block on the error signal. Now only customer-facing and internal traffic reaches the database. The connection pool recovers, latency drops, and the reflex engine progressively relaxes the policy. Batch jobs resume automatically.

No service mesh configuration. No manual runbook. Each service independently adapts based on its own downstream health, and the callers with the lowest business value are always shed first.

Fintech / Payments

Transaction processing can't drop requests silently. Tag by operation type: webhook (weight 1), balance_check (weight 3), transfer (weight 10). When latency spikes, shed inbound webhooks and background reconciliation before touching live transfers. The fail-open guarantee means WaitState itself never blocks a payment.

Tags

webhook weight: 1
balance_check weight: 3
transfer weight: 10
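
Integration

The earlier sections show full Express middleware; the sketch below covers just the route-to-tag mapping for a payments API. The endpoint paths and the classifyOperation helper are illustrative assumptions; only the tag names and weights come from the table above.

```typescript
// Hypothetical route-to-tag mapping for a payments API.
// Tag names and weights match the table above; the paths are illustrative.
type GateArgs = { tag: string; weight: number };

const operationWeights: Record<string, GateArgs> = {
  '/webhooks/inbound': { tag: 'webhook', weight: 1 },
  '/accounts/balance': { tag: 'balance_check', weight: 3 },
  '/transfers': { tag: 'transfer', weight: 10 },
};

// Unknown routes default to the lowest-priority tag, so they are shed first.
function classifyOperation(path: string): GateArgs {
  return operationWeights[path] ?? { tag: 'webhook', weight: 1 };
}
```

In middleware you would pass the result to ws.gate(tag, weight) exactly as in the earlier examples; because no reflex rule names transfer, that tag is never blocked.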

Reflex rules

When latency > 300ms
Then block webhook
When latency > 500ms
Then block balance_check
When errors > 50
Then block balance_check

Transfers pass through because no rule targets the transfer tag. Webhooks can be replayed by the sender. Balance checks can retry. But a failed transfer means lost money.

Gaming Backend

Launch events and tournaments create sudden load spikes on matchmaking, leaderboards, and in-game purchase APIs. Tag by action criticality: leaderboard (weight 1), matchmaking (weight 5), purchase (weight 10). Under pressure, leaderboard reads are shed first. Matchmaking degrades gracefully. Purchases always complete.

Tags

leaderboard weight: 1
matchmaking weight: 5
purchase weight: 10

Reflex rules

When latency > 300ms
Then block leaderboard
When latency > 600ms
Then block matchmaking
When errors > 20
Then block leaderboard
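
Shedding leaderboard reads doesn't have to mean a blank page. One hypothetical degradation pattern: when the gate denies a leaderboard request, serve the last cached snapshot instead of a bare 429. The leaderboardResponse helper and its cache below are illustrative, not part of the SDK.

```typescript
// Hypothetical graceful-degradation sketch: when the gate denies a
// leaderboard read, fall back to the last snapshot instead of failing.
let cachedLeaderboard: { rows: string[]; fetchedAt: number } | null = null;

function leaderboardResponse(
  allowed: boolean,
  fetchLive: () => { rows: string[]; fetchedAt: number },
): { stale: boolean; rows: string[] } {
  if (allowed) {
    cachedLeaderboard = fetchLive();
    return { stale: false, rows: cachedLeaderboard.rows };
  }
  // Denied: serve the stale snapshot if we have one, empty otherwise.
  return { stale: true, rows: cachedLeaderboard?.rows ?? [] };
}
```

Purchases never touch this fallback path, since no reflex rule targets the purchase tag.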

Healthcare / Telehealth

Patient-facing systems serve real-time consultations alongside background data syncs. Tag by interaction type: sync (weight 1) for medical record replication, appointment (weight 5) for scheduling, live_session (weight 10) for active video consultations. Under load, background syncs pause while live sessions are never interrupted.

Tags

sync weight: 1
appointment weight: 5
live_session weight: 10

Reflex rules

When latency > 400ms
Then block sync
When latency > 800ms
Then block appointment

Media / Encoding Pipeline

Video platforms process uploads, transcode files, and serve streams simultaneously. Tag by job type: transcode (weight 1), upload (weight 5), live_stream (weight 10). When latency spikes from a transcoding backlog, live streams maintain quality while batch jobs are deferred. Separate sites for each content pipeline prevent a viral upload from degrading live broadcasts.

Tags

transcode weight: 1
upload weight: 5
live_stream weight: 10

Reflex rules

When latency > 500ms
Then block transcode
When latency > 1s
Then block upload

API Marketplace / Aggregator

You resell or aggregate third-party APIs (like RapidAPI) to downstream customers. Tag by customer tier. Use site sharding to give each downstream customer an independent aggregator so one customer's traffic spike won't affect another's policy. The per-plan site cap scales with your pricing: hobby customers get one site, pro gets ten.

Tags

free weight: 1
basic weight: 5
premium weight: 10
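
Integration

Per-customer site sharding can be wired with a small lazy factory. This is a sketch under assumptions: createGate stands in for constructing new WaitState({ publishKey, secretKey, siteId }), and your plan's site cap is enforced by the control plane, not by this code.

```typescript
// Hypothetical per-customer site sharding: one gate per downstream customer,
// created lazily so each customer gets an independent health domain.
// `createGate` stands in for `new WaitState({ publishKey, secretKey, siteId })`.
function siteSharder<T>(createGate: (siteId: string) => T): (customerId: string) => T {
  const gates = new Map<string, T>();
  return (customerId) => {
    let gateInstance = gates.get(customerId);
    if (gateInstance === undefined) {
      gateInstance = createGate(customerId);
      gates.set(customerId, gateInstance);
    }
    return gateInstance;
  };
}
```

In middleware, gateFor(customerId).gate(tier, weight) then gives each customer its own independent policy.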

Reflex rules

When errors > 20
Then block free
When latency > 500ms
Then block basic

Logistics / Delivery

Delivery platforms serve real-time driver tracking, route optimization, and back-office analytics from the same infrastructure. Tag by latency sensitivity: analytics (weight 1), route_planning (weight 5), live_tracking (weight 10). During peak delivery hours, analytics queries are shed first. Drivers always see accurate ETAs.

Tags

analytics weight: 1
route_planning weight: 5
live_tracking weight: 10

Reflex rules

When latency > 400ms
Then block analytics
When latency > 800ms
Then block route_planning

Your next incident is coming

Will your rate limiter know before your customers do?

Free tier. No credit card. Add the SDK in under 5 minutes.