RESILIENCE AND RATE LIMITS

Neura combines Redis-backed rate limiting, Opossum circuit breakers, and structured logging to keep the API responsive under load. This document summarizes the protections and offers guidance for client-side retry logic.

Rate limiting

  • Global budget: 300 requests per second per agent, tracked per route group.

  • Penalty window: Exceeding the budget blocks the offending agent for 5 seconds.

  • HTTP response: 429 Too Many Requests with JSON body:

    {
      "success": false,
      "error": "Rate limit exceeded",
      "retryAfter": 5,
      "timestamp": "2025-01-15T12:34:56.789Z"
    }

  • Headers: The middleware does not set a Retry-After header; read the retryAfter field from the JSON payload instead (value in seconds).
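
To make the contract explicit, here is a minimal TypeScript sketch of the 429 body, assuming the global fetch API; the RateLimitBody name and helper are illustrative, not part of the Neura SDK:

// Shape of the 429 rejection documented above.
interface RateLimitBody {
  success: false;
  error: string;
  retryAfter: number;  // seconds until the penalty window ends
  timestamp: string;   // ISO 8601
}

// Illustrative helper: suggested wait in milliseconds, or null if the
// response is not a rate limit rejection.
async function rateLimitDelayMs(res: Response): Promise<number | null> {
  if (res.status !== 429) return null;
  const body = (await res.json()) as RateLimitBody;
  return body.retryAfter * 1000;
}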

Best practices

  • Implement exponential backoff with jitter; start with a 1 second base delay and cap near the retryAfter value (see the sketch after this list).

  • Throttle discovery calls; invoice responses count towards rate limits even if unpaid.

  • Spread polling across /solana and /base prefixes if you operate separate clients per network.
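
As a concrete example of the first bullet, here is a full-jitter backoff loop in TypeScript; it is a sketch assuming the global fetch API, and the base delay and attempt cap are illustrative defaults, not Neura requirements:

// Full jitter: wait a random amount between 0 and the current cap, where
// the cap doubles per attempt but never exceeds the server's retryAfter.
async function fetchWithBackoff(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    const body = (await res.json()) as { retryAfter?: number };
    const capMs = (body.retryAfter ?? 5) * 1000;  // cap near retryAfter
    const expMs = 1000 * 2 ** attempt;            // 1 s base, doubling
    const delayMs = Math.random() * Math.min(capMs, expMs);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts: ${url}`);
}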

Circuit breakers

Each upstream data call is protected by an Opossum breaker with the following configuration:

Setting             Value
Timeout             30 seconds
Error threshold     50 percent
Volume threshold    10 requests
Reset timeout       30 seconds

When the breaker opens, the API immediately returns a 503-style error, which the router surfaces as a 500 with a descriptive message. The breaker transitions to half-open after the reset window and closes once calls succeed again.
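
For reference, the table above maps onto an Opossum configuration along these lines; this is a sketch, and fetchUpstream is a hypothetical stand-in for the protected data call rather than the actual Neura source:

import CircuitBreaker from 'opossum';

// Hypothetical upstream call protected by the breaker.
async function fetchUpstream(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Upstream responded ${res.status}`);
  return res.json();
}

const breaker = new CircuitBreaker(fetchUpstream, {
  timeout: 30_000,               // fail calls that run past 30 seconds
  errorThresholdPercentage: 50,  // open once 50 percent of calls fail...
  volumeThreshold: 10,           // ...but only after 10 calls in the window
  resetTimeout: 30_000,          // probe again (half open) after 30 seconds
});

breaker.on('open', () => console.warn('circuit opened'));
breaker.on('halfOpen', () => console.info('circuit half open, probing'));

// Invoke the protected call through the breaker.
const data = await breaker.fire('https://example.com/upstream');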

Client guidance

  • Treat repeated 500 errors with the same timestamp as transient, and retry after a 10 to 30 second delay (see the sketch after this list).

  • Tune your monitoring to alert if breaker related errors exceed normal baselines.
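
A small sketch of that delay in TypeScript; the uniform window simply spans the 10 to 30 second range suggested above:

// Sleep for a random delay inside the suggested 10 to 30 second window
// before retrying a suspected breaker-related 500.
function breakerRetryDelay(): Promise<void> {
  const waitMs = 10_000 + Math.random() * 20_000;
  return new Promise((resolve) => setTimeout(resolve, waitMs));
}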

Retries

The API retries upstream calls up to three times with exponential backoff (1 to 10 seconds). If all attempts fail, the error cascades to the client with success: false.
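
The server-side pattern looks roughly like the following sketch; the names and structure are illustrative, not the actual Neura implementation:

// Up to three attempts with exponential backoff clamped to 1-10 seconds.
async function withRetries<T>(call: () => Promise<T>, attempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= attempts) throw err;  // cascades to the client
      const delayMs = Math.min(10_000, 1000 * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}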

Client applications should:

  • Avoid immediate retries; a failed response already reflects the backend's own retry attempts.

  • Surface error messages to operators; they include enough context to determine whether the issue is user input or infrastructure.

Observability

Structured logs

  • Logging uses Pino. Set LOG_LEVEL=debug to capture retry and breaker events locally.

  • Production deployments commonly use info to reduce noise while retaining key resilience events.
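
A minimal Pino setup matching that convention, assuming Node and the pino package; the event fields shown are illustrative:

import pino from 'pino';

// Default to info in production; LOG_LEVEL=debug surfaces retry and
// breaker events during local development.
const logger = pino({ level: process.env.LOG_LEVEL ?? 'info' });

logger.debug({ attempt: 2, delayMs: 2000 }, 'retrying upstream call');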

Health endpoints

Endpoint                  Description
GET /health               Returns uptime, memory usage, and deployment version. Use for liveness checks.
GET /health/resilience    Returns rate limiter counters for tracked keys in the form rateLimit_neura:key, along with remaining points and msBeforeNext.

Sample GET /health/resilience response:

{
  "status": "healthy",
  "timestamp": "2025-01-15T12:34:56.789Z",
  "resilience": {
    "rateLimit_neura:token": {
      "consumedPoints": 42,
      "remainingPoints": 258,
      "msBeforeNext": 1200
    }
  }
}

The key prefixes follow internal naming conventions. Treat them as opaque identifiers and focus on the remaining point totals.
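
A sketch of a poller that acts on those totals, assuming the global fetch API; the 10 percent threshold is an assumption, not a Neura default:

interface ResilienceEntry {
  consumedPoints: number;
  remainingPoints: number;
  msBeforeNext: number;
}

// Warn when any tracked key is close to exhausting its budget.
async function checkResilience(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/health/resilience`);
  const body = (await res.json()) as {
    status: string;
    resilience: Record<string, ResilienceEntry>;
  };
  for (const [key, entry] of Object.entries(body.resilience)) {
    if (entry.remainingPoints < 30) {  // 10 percent of the 300 point budget
      console.warn(`rate limiter ${key} nearly exhausted`, entry);
    }
  }
}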

Production checklist

  • Monitor 429 rates and circuit breaker warnings.

  • Scale Redis with sufficient throughput to handle spikes.

  • Mirror resilience configuration across /solana and /base if you run separate clusters.

  • Automate invoice replay with retries, respecting the guidance above to avoid flapping breakers.
