What's happening
A user clicks a button. The spinner runs. Twenty seconds pass. The SDK throws a timeout, or the function quietly returns null. The dashboard shows the function started. Nothing shows that it finished. Refreshing does not help. Retrying hangs the same way.
Base44 function timeouts happen because the function exceeded the platform's per-request execution budget, which is documented as 60 seconds but is closer to 25-30 seconds under real production load. Four causes drive this, in roughly descending order of frequency: a synchronous waterfall of database queries that should run in parallel, external API calls firing one HTTP request per row instead of batched, AI-agent-generated nested loops that quietly multiply the work, and scheduled cron tasks contending with live request load on the same isolate.
In 11 of the last 30 timeout audits we ran for clients, the function code looked clean on first read. The timeout was not a bug in any single line — it was an architectural pattern that was fine at 50 rows and broke at 5,000.
Why functions time out (the four root causes)
Almost every base44 function timeout we have diagnosed maps to one of these four patterns. They show up in different combinations, but the diagnosis order below holds in roughly 90 percent of cases.
1. Synchronous database query waterfall
The most common cause. The agent generates code that loops over a list and awaits a database query inside the loop. Each query waits for the previous one to finish. Twenty queries at 80ms each run as 1,600ms instead of 80ms. Two hundred queries blow the budget.
// BEFORE — sequential, blows budget at scale
const enriched = [];
for (const order of orders) {
  const customer = await db.customers.get(order.customerId);
  const items = await db.items.list({ orderId: order.id });
  enriched.push({ ...order, customer, items });
}
This is the pattern the agent emits because it reads naturally. It is wrong by default for any list longer than a dozen items.
2. Unbatched external API calls
Functions that enrich data with external APIs — Stripe, SendGrid, Shopify, OpenAI — frequently fire one HTTP request per row. Each external call adds 100-400ms of network latency. A function looping over 80 customers for their Stripe subscriptions can sit at 30+ seconds before any of your own logic runs.
Provider rate limits compound this. When the provider throttles, the SDK retries with backoff, so a 429 at row 30 can stretch the function past the timeout even though your code is correct.
3. AI-agent-generated nested loops
The agent generates a clean-looking outer loop, then inside the body adds a "for each related item" loop that nobody noticed in review. Outer loop 100 rows, inner loop 12 items, and the function silently does 1,200 operations instead of 100.
This compounds with causes #1 and #2 — a nested loop with awaited queries inside is how a function goes from fast in preview to always-timing-out in production. The agent will regenerate this exact pattern on every nearby prompt, which is the broader regression issue documented at /fix/ai-agent-regression-loop-breaks-code.
4. Cron and request-load contention
Base44 runs scheduled tasks on the same isolate pool that serves user requests. When a cron job kicks off at 9:00am and processes 10,000 rows, every user request in that window queues behind it. Your code has not changed — it is just waiting for an isolate.
This is the same isolate-pressure pattern documented at /fix/functions-stop-working-after-hours. If your timeouts cluster at predictable times of day, this is the cause and code optimization will not fix it.
The diagnostic checklist
Run these in order. Each step rules out one of the four causes before moving to the next.
- Capture the actual duration. Add console.time('fn') at entry and console.timeEnd('fn') at every return path (a minimal instrumentation sketch follows this list). Reproduce the timeout and read the elapsed ms. If you do not have a real number, you are guessing.
- Check duration vs input size. Run with 10, 100, and 500 rows. Linear growth points to cause #1 or #2. Quadratic growth is cause #3. Constant duration with time-of-day clustering is cause #4.
- Inspect every await inside a loop body. Grep the function for for and while loops, then read each loop body for await. Each one is a parallelization candidate.
- Count external HTTP calls per invocation. If the count scales with input size, you are in cause #2.
- Trace operations per input row. If one row triggers more than one query or HTTP call, write down the multiplier. This catches cause #3.
- Check failure timing across 24 hours. Same-time-daily clusters point to cron contention.
- Check the SDK retry config. Default retries can triple the credit burn and mask the real duration. Confirm whether the timeout count is real or amplified.
- Test via direct curl. If curl is fast and SDK is slow, the bottleneck is in the SDK layer.
- Diff against the last known-good commit. Slow this week but fast last week means an agent edit introduced the regression. The diff will show it.
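To make the first two checklist items concrete, here is a minimal instrumentation sketch. The handler shape, the doWork helper, and the orders input are placeholders for your own function; console.time and console.timeEnd are standard.
// SKETCH: wrap the whole handler so the timer fires on every return path.
export default async function handler(input: { orders: unknown[] }) {
  console.time("fn");
  console.log("input size:", input.orders.length); // compare the 10 / 100 / 500 runs
  try {
    return await doWork(input.orders); // your existing logic, unchanged
  } finally {
    console.timeEnd("fn"); // logs elapsed ms even if doWork throws
  }
}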
By the end you should have one dominant cause and a measured baseline. Do not start fixing until you do.
The fix — by root cause
Each cause has a specific fix. Apply only the fix for the cause you confirmed. Applying all four blindly will introduce its own regressions.
Fix for cause #1 — parallelize independent queries
Wrap independent queries in Promise.all. Only parallelize calls that do not depend on each other's results. Most enrichment loops are independent and can be flattened.
// AFTER — parallel, holds budget at any list size
const enriched = await Promise.all(
  orders.map(async (order) => {
    const [customer, items] = await Promise.all([
      db.customers.get(order.customerId),
      db.items.list({ orderId: order.id }),
    ]);
    return { ...order, customer, items };
  })
);
This collapses wall-clock time from O(n) sequential round trips to roughly a single round trip's worth in the best case. In one client audit a 24-second order-enrichment function dropped to 1.8 seconds with this single change.
Watch for connection-pool exhaustion. Parallelizing 1,000 queries at once can exhaust the database connection limit. Cap concurrency with a semaphore or chunking when the list is over 200 items.
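A minimal chunking sketch for that cap, reusing the orders example above; the chunk size of 50 is an arbitrary starting point, not a platform limit.
// Chunked Promise.all: at most CHUNK_SIZE enrichment queries in flight at a time.
const CHUNK_SIZE = 50; // tune against your database connection limit
const enriched = [];
for (let i = 0; i < orders.length; i += CHUNK_SIZE) {
  const chunk = orders.slice(i, i + CHUNK_SIZE);
  const results = await Promise.all(
    chunk.map(async (order) => {
      const customer = await db.customers.get(order.customerId);
      return { ...order, customer };
    })
  );
  enriched.push(...results);
}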
Fix for cause #2 — batch external API calls
Most external APIs expose batch endpoints the agent does not know about. Replace the per-row loop with a single bulk call.
// BEFORE — one Stripe call per customer
const subs = [];
for (const customerId of customerIds) {
  const sub = await stripe.subscriptions.list({ customer: customerId, limit: 1 });
  subs.push(sub.data[0]);
}
// AFTER — one Stripe call total via search
const query = customerIds.map((id) => `customer:"${id}"`).join(" OR ");
const result = await stripe.subscriptions.search({ query, limit: 100 });
const subs = customerIds.map((id) => result.data.find((s) => s.customer === id));
For APIs without a true batch endpoint, parallelize with Promise.all and a pLimit(5) semaphore — that keeps you under provider rate limits while cutting wall-clock by 5x. If the provider throttles anyway, move to a queue pattern below.
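A sketch of that semaphore, assuming pLimit refers to the p-limit npm package; the Stripe call is the same one from the BEFORE block above.
import pLimit from "p-limit";

// At most 5 Stripe requests in flight; the rest queue locally instead of tripping 429s.
const limit = pLimit(5);
const subs = await Promise.all(
  customerIds.map((customerId) =>
    limit(async () => {
      const sub = await stripe.subscriptions.list({ customer: customerId, limit: 1 });
      return sub.data[0];
    })
  )
);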
Fix for cause #3 — flatten nested loops or precompute
Nested iteration that needs both lists is usually solvable by precomputing a lookup map outside the loop.
// BEFORE — O(n*m), times out at production scale
for (const order of orders) {
  for (const item of allItems) {
    if (item.orderId === order.id) { /* ... */ }
  }
}
// AFTER — O(n+m), one pass to build the map, then one pass to use it
const itemsByOrder = new Map<string, typeof allItems>();
for (const item of allItems) {
  const list = itemsByOrder.get(item.orderId) ?? [];
  list.push(item);
  itemsByOrder.set(item.orderId, list);
}
for (const order of orders) {
  const items = itemsByOrder.get(order.id) ?? [];
  // ...
}
If the nested loop also contains awaited work, the duration multiplies — refactor the inner loop into a parallel operation first, then question whether it needs to be nested at all. After fixing, pin the file with a comment noting the duration target. The agent will re-introduce the nested loop on its next pass unless the constraint is visible in the code itself.
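The pin can be as simple as a comment block at the top of the function; the wording and numbers below are illustrative, not a platform convention.
// PERF CONSTRAINT: this function must finish in under ~20s at 5,000 orders.
// Do NOT reintroduce a per-order loop over allItems; use the itemsByOrder map
// built above. Regenerated code that violates this will time out in production.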
Fix for cause #4 — separate cron from request-load
If timeouts cluster at predictable times of day, push cron work off the user-request isolate. Two options.
Option A — chunked self-rescheduling cron. Break the cron into 20-second slices and have it re-enqueue itself for the next slice.
export default async function processBatch() {
  const start = Date.now();
  const BUDGET_MS = 20_000;
  while (Date.now() - start < BUDGET_MS) {
    const batch = await db.queue.next({ limit: 25 });
    if (batch.length === 0) return { done: true };
    await processItems(batch);
  }
  // Out of budget — enqueue continuation and return.
  await scheduler.enqueue("processBatch", { delay: 5 });
  return { done: false };
}
Option B — move the cron off-platform entirely. Run scheduled work on Vercel Cron, Inngest, or a Cloudflare Worker, calling back into base44 only for data writes. This is the right answer for any cron that consistently runs over 30 seconds.
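A minimal sketch of Option B as a Vercel Cron route handler; the route path, the buildReport helper, and the BASE44_WRITE_URL callback endpoint are assumptions to replace with your own wiring.
// app/api/nightly-report/route.ts on Vercel, scheduled via the "crons" entry in vercel.json.
export async function GET() {
  // The heavy work runs here, on the external platform's longer execution budget.
  const report = await buildReport(); // your long-running logic
  // Call back into base44 only to persist the result (endpoint and auth are assumptions).
  await fetch(process.env.BASE44_WRITE_URL!, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.BASE44_API_KEY}`,
    },
    body: JSON.stringify({ report }),
  });
  return Response.json({ ok: true });
}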
Architecture-level escapes
Some workloads will never fit a 25-second synchronous budget no matter how clean the code is — PDF generation over many pages, image pipelines, multi-step LLM chains, large report exports. For those, move the heavy work off-platform and use base44 only as the trigger and the final result reader.
Three escape shapes work well:
- Queue + worker. User clicks a button, the base44 function enqueues a job and returns in under a second, an external worker (Inngest, Trigger.dev, a Cloudflare Worker) processes the job, and a webhook writes the result back. The user polls a status field or receives a notification. A sketch of the base44 side of this shape follows the list.
- Edge function for pure compute. If the work is stateless and CPU-bound (image transforms, PDF rendering, streaming LLM calls), a Cloudflare or Vercel edge function with a longer execution budget can take the workload while base44 keeps the data tier.
- Separate microservice. When the workload needs heavy dependencies (puppeteer, ffmpeg, ML models), neither base44 nor an edge function will fit. A small Node or Go service on Fly.io or Railway with a base44-callable HTTP API is the right shape.
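Here is the base44 side of the queue + worker shape, as a sketch only: db.jobs, the startExport name, and the worker URL are assumptions, not base44 APIs.
// SKETCH: enqueue fast, return fast, let the external worker do the heavy lifting.
export default async function startExport({ userId }: { userId: string }) {
  // 1. Record the job and hand back an id in well under a second.
  const job = await db.jobs.create({ userId, status: "pending" });
  // 2. Notify the external worker (Inngest, Trigger.dev, a Cloudflare Worker).
  await fetch("https://worker.example.com/jobs", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jobId: job.id }),
  });
  // 3. The worker's webhook later flips status to "done"; the UI polls that field.
  return { jobId: job.id, status: "pending" };
}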
The full migration path is at /migrate. The decision rule: if you have applied the four root-cause fixes and the function is still timing out, the workload does not belong on the platform.
For surrounding observability and error patterns, see /blog/base44-performance-optimization-guide and /blog/base44-error-reference. For rate-limit failures that masquerade as timeouts, see /fix/rate-limit-429-production-throttle.
When to call us
If you have run the checklist, identified the dominant cause, applied the matching fix, and the function still times out — the next step is an audit. We have run this exact diagnostic on 30+ base44 function fleets and the patterns repeat enough that we identify the dominant cause within the first hour of an audit call. Start at /audit for a structured engagement, or /fix for a 48-72 hour fix-sprint that hardens a single function or a small fleet.
Start a fix sprint for timing-out functions
Related problems
- Base44 functions stop working after a few hours — the sibling failure mode where isolate recycling and cron contention overlap with the timeout pattern.
- Production throttle from 429 rate-limit errors — when external API rate limits push your function past the timeout boundary.
- Editor hangs and crashes on large projects — the editor-side analog of the same isolate-pressure problem.
- Excessive credit burn from minor changes — what happens to your credit budget when timeouts retry repeatedly.