
FIX · PERFORMANCE · HIGH

Base44 Function Timeout Error — 4 Causes, 25-Second Budget Fix

Base44 functions time out for four reasons: a synchronous waterfall of database queries that should run in parallel, unbatched external API calls firing one HTTP request per row, AI-agent-generated nested loops that quietly multiply work, and cron tasks contending with live request load on the same isolate. The rule is simple — get the function under the per-request execution budget by parallelizing what is independent, batching what is bulk, or moving long work off-platform.

Last verified
2026-05-08
Category
PERFORMANCE
Difficulty
MODERATE
DIY possible
YES

What's happening

A user clicks a button. The spinner runs. Twenty seconds pass. The SDK throws a timeout, or the function quietly returns null. The dashboard shows the function started. Nothing shows that it finished. Refreshing does not help. Retrying hangs the same way.

Base44 function timeouts happen because the function exceeded the platform's per-request execution budget, which is documented as 60 seconds but is closer to 25-30 seconds under real production load. The four causes, listed in rough order of frequency, are: a synchronous waterfall of database queries that should run in parallel, external API calls that fire one HTTP request per row instead of a single batched call, AI-agent-generated nested loops that quietly multiply the work, and scheduled cron tasks contending with live request load on the same isolate.

In 11 of the last 30 timeout audits we ran for clients, the function code looked clean on first read. The timeout was not a bug in any single line — it was an architectural pattern that was fine at 50 rows and broke at 5,000.

Why functions time out (the four root causes)

Almost every base44 function timeout we have diagnosed maps to one of these four patterns. They show up in different combinations, but the diagnosis order below holds in roughly 90 percent of cases.

1. Synchronous database query waterfall

The most common cause. The agent generates code that loops over a list and awaits a database query inside the loop. Each query waits for the previous one to finish. Twenty queries at 80ms each run as 1,600ms instead of 80ms. Two hundred queries blow the budget.

// BEFORE — sequential, blows budget at scale
const enriched = [];
for (const order of orders) {
  const customer = await db.customers.get(order.customerId);
  const items = await db.items.list({ orderId: order.id });
  enriched.push({ ...order, customer, items });
}

This is the pattern the agent emits because it reads naturally. It is wrong by default for any list longer than a dozen items.

2. Unbatched external API calls

Functions that enrich data with external APIs — Stripe, SendGrid, Shopify, OpenAI — frequently fire one HTTP request per row. Each external call adds 100-400ms of network latency. A function looping over 80 customers to fetch their Stripe subscriptions can spend 30+ seconds on network round trips alone.

Provider rate limits compound this. When the provider throttles, the SDK retries with backoff, so a 429 at row 30 can stretch the function past the timeout even though your code is correct.

3. AI-agent-generated nested loops

The agent generates a clean-looking outer loop, then inside the body adds a "for each related item" loop that nobody noticed in review. Outer loop 100 rows, inner loop 12 items, and the function silently does 1,200 operations instead of 100.

This compounds with causes #1 and #2 — a nested loop with awaited queries inside is how a function goes from fast in preview to always-timing-out in production. The agent will regenerate this exact pattern on every nearby prompt, which is the broader regression issue documented at /fix/ai-agent-regression-loop-breaks-code.

4. Cron and request-load contention

Base44 runs scheduled tasks on the same isolate pool that serves user requests. When a cron job kicks off at 9:00am and processes 10,000 rows, every user request in that window queues behind it. Your code has not changed — it is just waiting for an isolate.

This is the same isolate-pressure pattern documented at /fix/functions-stop-working-after-hours. If your timeouts cluster at predictable times of day, this is the cause and code optimization will not fix it.

The diagnostic checklist

Run these in order. Each step rules out one of the four causes before moving to the next.

  1. Capture the actual duration. Add console.time('fn') at entry and console.timeEnd('fn') at every return path. Reproduce the timeout and read the elapsed ms. If you do not have a real number, you are guessing. A minimal instrumentation sketch follows this list.
  2. Check duration vs input size. Run with 10, 100, and 500 rows. Linear growth points to cause #1 or #2. Quadratic growth is cause #3. Constant duration with time-of-day clustering is cause #4.
  3. Inspect every await inside a loop body. Grep the function for for and while, read each body for await. Each one is a parallelization candidate.
  4. Count external HTTP calls per invocation. If the count scales with input size, you are in cause #2.
  5. Trace operations per input row. If one row triggers more than one query or HTTP call, write down the multiplier. This catches cause #3.
  6. Check failure timing across 24 hours. Same-time-daily clusters point to cron contention.
  7. Check the SDK retry config. Default retries can triple the credit burn and mask the real duration. Confirm whether the timeout count is real or amplified.
  8. Test via direct curl. If curl is fast and SDK is slow, the bottleneck is in the SDK layer.
  9. Diff against the last known-good commit. Slow this week but fast last week means an agent edit introduced the regression. The diff will show it.
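
A minimal sketch of step 1, assuming your handler follows the same shape as the other examples in this guide; enrichOrders is a placeholder for your own logic.

export default async function handler(payload) {
  console.time("fn"); // entry
  try {
    const result = await enrichOrders(payload.orderIds); // placeholder for the real work
    console.timeEnd("fn"); // success return path
    return result;
  } catch (err) {
    console.timeEnd("fn"); // error return path still prints the elapsed ms
    throw err;
  }
}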

By the end you should have one dominant cause and a measured baseline. Do not start fixing until you do.

The fix — by root cause

Each cause has a specific fix. Apply only the fix for the cause you confirmed. Applying all four blindly will introduce its own regressions.

Fix for cause #1 — parallelize independent queries

Wrap independent queries in Promise.all. Only parallelize calls that do not depend on each other's results. Most enrichment loops are independent and can be flattened.

// AFTER — parallel, holds budget at any list size
const enriched = await Promise.all(
  orders.map(async (order) => {
    const [customer, items] = await Promise.all([
      db.customers.get(order.customerId),
      db.items.list({ orderId: order.id }),
    ]);
    return { ...order, customer, items };
  })
);

The round trips still happen, but they now overlap, so wall-clock time drops from O(n) sequential waits to roughly one round-trip time in the best case. In one client audit a 24-second order-enrichment function dropped to 1.8 seconds with this single change.

Watch for connection-pool exhaustion. Parallelizing 1,000 queries at once can exhaust the database connection limit. Cap concurrency with a semaphore or chunking when the list is over 200 items.
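
One way to cap concurrency is chunking, sketched below for the same enrichment loop; the CHUNK_SIZE of 50 is an assumption, tune it to your database's connection limit.

// Chunked variant: parallel inside each chunk, sequential across chunks
const CHUNK_SIZE = 50;
const enriched = [];

for (let i = 0; i < orders.length; i += CHUNK_SIZE) {
  const chunk = orders.slice(i, i + CHUNK_SIZE);
  const results = await Promise.all(
    chunk.map(async (order) => {
      const [customer, items] = await Promise.all([
        db.customers.get(order.customerId),
        db.items.list({ orderId: order.id }),
      ]);
      return { ...order, customer, items };
    })
  );
  enriched.push(...results);
}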

Fix for cause #2 — batch external API calls

Most external APIs expose batch endpoints the agent does not know about. Replace the per-row loop with a single bulk call.

// BEFORE — one Stripe call per customer
const subs = [];
for (const customerId of customerIds) {
  const sub = await stripe.subscriptions.list({ customer: customerId, limit: 1 });
  subs.push(sub.data[0]);
}

// AFTER — one Stripe call total via search
const query = customerIds.map((id) => `customer:"${id}"`).join(" OR ");
const result = await stripe.subscriptions.search({ query, limit: 100 });
const subs = customerIds.map((id) => result.data.find((s) => s.customer === id));

For APIs without a true batch endpoint, parallelize with Promise.all and a pLimit(5) semaphore — that keeps you under provider rate limits while cutting wall-clock by 5x. If the provider throttles anyway, move to a queue pattern below.
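
A minimal sketch of the semaphore variant, assuming the p-limit npm package and reusing the Stripe call from the BEFORE block above:

import pLimit from "p-limit";

const limit = pLimit(5); // at most 5 requests in flight at any moment

const results = await Promise.all(
  customerIds.map((customerId) =>
    limit(() => stripe.subscriptions.list({ customer: customerId, limit: 1 }))
  )
);
const subs = results.map((res) => res.data[0]);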

Fix for cause #3 — flatten nested loops or precompute

Nested iteration that needs both lists is usually solvable by precomputing a lookup map outside the loop.

// BEFORE — O(n*m), times out at production scale
for (const order of orders) {
  for (const item of allItems) {
    if (item.orderId === order.id) { /* ... */ }
  }
}

// AFTER — O(n+m), one pass to build map then one pass to use it
const itemsByOrder = new Map<string, typeof allItems>();
for (const item of allItems) {
  const list = itemsByOrder.get(item.orderId) ?? [];
  list.push(item);
  itemsByOrder.set(item.orderId, list);
}
for (const order of orders) {
  const items = itemsByOrder.get(order.id) ?? [];
  // ...
}

If the nested loop also contains awaited work, the duration multiplies — refactor the inner loop into a parallel operation first, then question whether it needs to be nested at all. After fixing, pin the file with a comment noting the duration target. The agent will re-introduce the nested loop on its next pass unless the constraint is visible in the code itself.
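
One way to make the constraint visible is a short pinned comment at the top of the file; the wording below is only a suggestion.

// PERF CONSTRAINT: must stay under ~20s wall clock at 5,000 rows.
// Build the itemsByOrder Map first; do not reintroduce a nested loop
// or an awaited query inside the loop body.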

Fix for cause #4 — separate cron from request-load

If timeouts cluster at predictable times of day, push cron work off the user-request isolate. Two options.

Option A — chunked self-rescheduling cron. Break the cron into 20-second slices and have it re-enqueue itself for the next slice.

export default async function processBatch() {
  const start = Date.now();
  const BUDGET_MS = 20_000;

  while (Date.now() - start < BUDGET_MS) {
    const batch = await db.queue.next({ limit: 25 });
    if (batch.length === 0) return { done: true };
    await processItems(batch);
  }

  // Out of budget — enqueue continuation and return.
  await scheduler.enqueue("processBatch", { delay: 5 });
  return { done: false };
}

Option B — move the cron off-platform entirely. Run scheduled work on Vercel Cron, Inngest, or a Cloudflare Worker, calling back into base44 only for data writes. This is the right answer for any cron that consistently runs over 30 seconds.
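
As an illustration, a Cloudflare Worker version of Option B could look like the sketch below; BASE44_WRITE_URL, BASE44_API_KEY, and buildDailyReport are placeholders for your own callback endpoint, secret, and job logic.

// Scheduled via wrangler.toml: [triggers] crons = ["0 9 * * *"]
export default {
  async scheduled(event, env, ctx) {
    // Heavy work runs here, outside the base44 request budget.
    const report = await buildDailyReport(env);

    // Call back into base44 only to persist the result.
    await fetch(env.BASE44_WRITE_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.BASE44_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ report }),
    });
  },
};

// Placeholder for the long-running job itself.
async function buildDailyReport(env) {
  return [];
}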

Architecture-level escapes

Some workloads will never fit a 25-second synchronous budget no matter how clean the code is — PDF generation over many pages, image pipelines, multi-step LLM chains, large report exports. For those, move the heavy work off-platform and use base44 only as the trigger and the final result reader.

Three escape shapes work well:

  • Queue + worker. User clicks a button, the base44 function enqueues a job and returns in under a second, an external worker (Inngest, Trigger.dev, a Cloudflare Worker) processes the job, a webhook writes the result back. The user polls a status field or receives a notification. A minimal trigger sketch follows this list.
  • Edge function for pure compute. If the work is stateless and CPU-bound (image transforms, PDF rendering, streaming LLM calls), a Cloudflare or Vercel edge function with a longer execution budget can take the workload while base44 keeps the data tier.
  • Separate microservice. When the workload needs heavy dependencies (puppeteer, ffmpeg, ML models), neither base44 nor an edge function will fit. A small Node or Go service on Fly.io or Railway with a base44-callable HTTP API is the right shape.
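
A minimal sketch of the trigger side of the queue + worker shape, assuming an Inngest queue; db.jobs is a hypothetical status collection the UI can poll while the worker runs.

import { Inngest } from "inngest";

const inngest = new Inngest({ id: "base44-app" });

// Trigger: runs inside base44 and returns in well under a second.
export default async function requestReport({ userId }) {
  const job = await db.jobs.create({ userId, status: "queued" }); // hypothetical status row

  // Hand the heavy work to the external worker, then return immediately.
  await inngest.send({
    name: "report/generate.requested",
    data: { jobId: job.id, userId },
  });

  return { jobId: job.id, status: "queued" };
}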

The full migration path is at /migrate. The decision rule: if you have applied the four root-cause fixes and the function is still timing out, the workload does not belong on the platform.

For surrounding observability and error patterns, see /blog/base44-performance-optimization-guide and /blog/base44-error-reference. For rate-limit failures that masquerade as timeouts, see /fix/rate-limit-429-production-throttle.

When to call us

If you have run the checklist, identified the dominant cause, applied the matching fix, and the function still times out — the next step is an audit. We have run this exact diagnostic on 30+ base44 function fleets and the patterns repeat enough that we identify the dominant cause within the first hour of an audit call. Start at /audit for a structured engagement, or /fix for a 48-72 hour fix-sprint that hardens a single function or a small fleet.

Start a fix sprint for timing-out functions

QUERIES

Frequently asked questions

Q.01 What is the actual base44 function timeout limit?
A.01

Base44's documentation lists a 60-second per-request execution budget for backend functions, and that is the hard ceiling at the platform's edge. Observed behavior under real load is shorter. In 11 of the last 30 timeout audits we ran, functions started returning timeouts at roughly 28-35 seconds during peak hours, well before the documented 60. The reason is that the documented number measures the isolate's wall clock, while real users see the round trip including queue time, isolate cold-start, and the SDK's own retry layer. Treat 25 seconds as your practical budget for any synchronous response. Anything heavier needs to be queued, streamed, or moved off-platform before it ever approaches the 60-second number.

Q.02 Why do timeouts happen in production but not in preview?
A.02

Preview runs against a single warm isolate with no concurrent traffic. The SDK's internal connection pool is already established, the database has no contention, and external APIs you call are not rate-limited because no other tenant is sharing the path. Production hits the same code with concurrent users, cold isolates, throttled outbound HTTP, and a connection pool that has to warm up under load. A function that returns in 4 seconds in preview can take 32 seconds in production at peak. We see this constantly — teams ship from preview, get paged at 9am, and assume the platform broke overnight. It did not. The latency was always there; preview was hiding it.

Q.03 Can I split a long function into smaller ones?
A.03

Yes, with a queue pattern. Break the work into a fast trigger function that enqueues work and a worker function that drains the queue on a schedule. The trigger returns to the user in under a second; the worker processes batches of 10-50 items per invocation and chains itself until empty. This sidesteps the per-request budget entirely because no single invocation does more than a slice. The catch is that base44's scheduled tasks have known reliability problems documented at /fix/functions-stop-working-after-hours — the worker can stall silently after isolate recycling. Always pair the queue pattern with an external cron monitor that can re-invoke the worker if it goes quiet for more than 10 minutes.

Q.04 Do timeouts cost me credits even on failure?
A.04

Yes. Partial execution is billed. The platform charges for compute time consumed up to the timeout, not for the response delivered. A function that runs 27 seconds and then times out costs roughly the same as a function that runs 27 seconds and returns successfully. We have seen this pattern blow through monthly credit budgets in days when a single buggy function timed out on every retry. The SDK's default retry policy compounds the bleed — three retries of a 25-second timeout is 75 seconds of billed compute for one user click. Cap your retries, log every timeout with its duration, and read /fix/excessive-credit-burn-minor-changes for the broader credit-burn pattern.

Q.05 How do I monitor timeouts before users report them?
A.05

Log every function entry and exit with a duration and a request id, and ship those logs somewhere queryable. Base44's dashboard alone is insufficient — it shows error counts but not duration distributions, and it groups timeouts under the same generic 500 bucket as other failures. The pattern we use is: emit a structured log line at function entry, another at exit with elapsed milliseconds, and a third on any catch path. Pipe those to BetterStack, Logtail, or a free Vercel log drain, then alert when the p95 duration crosses 18 seconds for any function. By the time you see a real timeout in production, the p95 has usually been climbing for hours. See /blog/base44-production-readiness-guide for the full instrumentation pattern.
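
A minimal sketch of that entry/exit/error pattern, assuming a hypothetical withTiming wrapper applied to each handler:

function withTiming(name, handler) {
  return async (args) => {
    const requestId = crypto.randomUUID();
    const start = Date.now();
    console.log(JSON.stringify({ event: "fn.start", fn: name, requestId }));
    try {
      const result = await handler(args);
      console.log(JSON.stringify({ event: "fn.end", fn: name, requestId, elapsedMs: Date.now() - start }));
      return result;
    } catch (err) {
      console.log(JSON.stringify({ event: "fn.error", fn: name, requestId, elapsedMs: Date.now() - start, message: err.message }));
      throw err;
    }
  };
}

// Usage: wrap the existing handler body, one wrapper per function.
export default withTiming("enrichOrders", async (payload) => {
  // ... existing function body
});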

Q.06 When should I move work off the platform entirely?
A.06

Move work off-platform when the timeout pattern is architectural rather than tactical. Tactical means one function is slow because of a bad query or an unbatched call — fixable inside base44. Architectural means the work itself does not fit a 25-second synchronous budget no matter how clean the code is: PDF generation over 50 pages, image processing pipelines, multi-step LLM chains, scheduled report generation across thousands of rows. For those, the right move is a small worker service on Vercel, Cloudflare Workers, or a lightweight queue like Inngest, with base44 only handling the user-facing trigger and the final result. We document the migration shape at /migrate — it is the pattern we run for clients whose timeout pages keep recurring no matter how hard their function code is tuned.

NEXT STEP

Need this fix shipped this week?

Book a free 15-minute call or order a $497 audit. We will respond within one business day.