BASE44DEVS


Base44 Production Readiness Audit Checklist (35 Tests)

A 35-item Base44 production readiness audit checklist across auth, RLS, schema, errors, observability, performance, backups, and security headers. Test, pass criteria, and fix cost for every item.

Last verified: 2026-05-24 · Published: 2026-05-24 · Read time: 18 min · Words: 3,408
  • PRODUCTION
  • AUDIT
  • CHECKLIST
  • SECURITY
  • OBSERVABILITY
  • RLS

What a Base44 production readiness audit actually covers

A Base44 production readiness audit checklist runs 35 numbered tests, plus five bonus items, across eight categories: authentication hardening, row-level security coverage, schema and entity validation, error handling, observability, performance budget, data backups, and security headers. Each item names the test, the pass-or-fail criterion you can verify in under five minutes, and a fix-cost estimate in engineer-hours. In 12 of our last 30 engagements the same five items showed up in the top ten failures: RLS not actually applied on a critical entity, no external log shipping, no scheduled backup, missing webhook signature validation, and stored XSS in a user-supplied field. Run the checklist before launch, after any AI-generated schema change, and quarterly thereafter. Six items are launch-blockers. The other 29 range from important to nice-to-have but should not be skipped without a written reason.

Most Base44 apps reach a soft launch (friends and family, a closed beta, a Product Hunt post) without anyone running a structured audit. The platform makes it easy to ship and hard to verify. Authentication appears to work, the editor confirms RLS rules exist, and the AI has generated a backend function that talks to Stripe. The app feels production-ready because the surface area looks complete. Underneath, across the last 30 audits we ran on Base44 apps, the median number of failed checklist items was 17 out of 35. The worst was 28.

This is the checklist we use internally on every paid audit engagement. We are publishing it for two reasons. First, the platform documentation does not enumerate these items in one place, and most operators do not know what to test for. Second, an honest, runnable checklist is the cheapest way to set a quality bar before a launch. If you run it yourself and pass cleanly, you do not need us. If you run it and you have 15 failures and three weeks to launch, you should hire us — or hire someone like us — because the remediation work is the real cost, not the audit.

The structure of every item is the same: what to test, what passes, what fails, and how long the fix takes. No vague guidance like "improve security." Either the test passes or it does not.

Category 1: Authentication hardening

Authentication failures in Base44 apps tend to be structural rather than cosmetic. The platform handles login UI, JWT issuance, and session storage, but it leaves the harder questions — token expiry, OAuth callback validation, account enumeration — to the operator.

1. Session token expiry handled in long-running backend functions. Test: write a backend function that takes 90 seconds to run and calls the SDK in its final step. Pass: the function returns a 200 or an explicit 401 with a session-expired body. Fail: the function returns an opaque 500 or hangs. Fix cost: 1 to 2 hours per function. See the backend functions routing fix for the related routing pattern.

2. OAuth callback domain whitelist matches production. Test: in Google Cloud (or whichever IdP), inspect the allowed redirect URIs. Pass: only production and explicitly approved staging domains are listed. Fail: localhost, ngrok URLs, or wildcards are still allowed. Fix cost: 15 minutes.

3. Email verification cannot be bypassed. Test: register a new account with an email you do not control, then attempt to call any user-scoped backend function before clicking the verification link. Pass: function returns 401 or 403. Fail: function succeeds with the unverified account. Fix cost: 30 minutes to 1 hour. See the email verification loop fix for the related symptom.

4. SSO domain check enforced after signup. Test: on an app meant for @yourcompany.com users, register with a @gmail.com address. Pass: the account is rejected or quarantined. Fail: the Gmail account is admitted with full permissions. Fix cost: 1 to 2 hours. The pre-July 2025 platform default did not enforce this; see the SSO bypass fix.

5. Password reset rate-limited. Test: trigger the password reset flow 20 times in 60 seconds for the same email. Pass: subsequent requests are throttled or queued. Fail: every request fires a fresh email. Fix cost: 1 hour.

6. No account enumeration in error messages. Test: attempt a login with a known-good email and a wrong password, then with a known-bad email. Pass: both return the same generic error. Fail: the responses differ enough to reveal which emails are registered. Fix cost: 30 minutes.
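Item 1's pass condition, an explicit 401 instead of an opaque 500, can be sketched as a small wrapper around a backend function's work. This is a minimal sketch under stated assumptions, not Base44's API: it assumes the SDK rejects with an error object carrying a numeric status of 401 when the session has expired, and HandlerResult, withSessionGuard, and the session_expired body are hypothetical names.

```typescript
// Hypothetical result shape; in a Deno backend function it would become
// new Response(JSON.stringify(body), { status }).
interface HandlerResult {
  status: number;
  body: Record<string, unknown>;
}

// Assumption: the SDK rejects with an error object carrying a numeric `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Runs the handler's work and maps an expired session to an explicit 401
// with a machine-readable body, so callers never see an opaque 500 for it.
async function withSessionGuard(
  work: () => Promise<Record<string, unknown>>,
): Promise<HandlerResult> {
  try {
    return { status: 200, body: await work() };
  } catch (err) {
    if (statusOf(err) === 401) {
      return { status: 401, body: { error: "session_expired" } };
    }
    // Other failures are still 500s, but with an explicit body rather than a raw crash.
    return { status: 500, body: { error: "internal_error" } };
  }
}
```

The guard only changes how the failure is reported. It does not stop a function from hanging, which is the other fail state in item 1.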

A clean pass on category 1 takes roughly 90 minutes of test time and 3 to 5 hours of fix time on a typical app. About 70 percent of the audits we run fail at least two items here.
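Item 5's throttle can be sketched as a sliding-window limiter keyed by the normalized email address. The limit used below (three requests per 60 seconds) is an illustrative assumption. The state also lives in memory, which only holds within a single process, so a deployed function would need a shared store. SlidingWindowLimiter is a hypothetical name.

```typescript
// Sliding-window limiter keyed by email address. State is in memory, which
// only holds within one process; a deployed function needs a shared store.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Records and allows the request if the key is under its limit for the
  // trailing window; otherwise returns false without recording it.
  allow(email: string, now: number = Date.now()): boolean {
    const key = email.trim().toLowerCase();
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

With three requests allowed per 60 seconds, firing the reset flow 20 times in a minute for the same address lets only the first three through, which matches the pass condition. A throttled request should still return the same generic response as an allowed one. Otherwise the limiter reintroduces the enumeration leak that item 6 tests for.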

Category 2: Row-level security coverage

RLS is the most miscalibrated layer of Base44 apps. The IDE shows that rules exist for an entity, the operator assumes that means RLS is enforced, and in practice 4 of every 10 apps we audit have at least one user-scoped entity where the rule is incomplete, contradicted by a backend function bypass, or has been overwritten by a later AI edit.

7. Every user-scoped entity has an explicit created_by = currentUser.id rule. Test: open each entity in the IDE, read its RLS rules. Pass: every entity that should be user-scoped has a rule referencing the current user. Fail: any entity is public-by-default with no rule. Fix cost: 30 minutes per entity.

8. RLS holds when tested with a second account. Test: create a record as User A, log in as User B, attempt to list and to read the record directly by ID. Pass: User B sees nothing. Fail: User B can see or fetch the record. Fix cost: 1 to 3 hours per entity, depending on whether the leak is in the rule or in a backend function that bypasses it. See the deeper analysis in the RLS out-of-sync fix.

9. RLS holds after an AI-generated schema change. Test: regenerate the entity via the agent (add a field) and re-run test 8. Pass: rule survives. Fail: rule was overwritten. Fix cost: rebuild rule, plus institute the snapshot-before-prompt habit. The platform does not preserve RLS rules across some regeneration paths.

10. Backend functions respect RLS even when called with a service token. Test: identify any backend function that bypasses RLS via an admin token. Pass: such functions exist only where genuinely required, and the operator can explain each one. Fail: the operator does not know which functions bypass RLS. Fix cost: 2 to 4 hours to audit and document.

11. List endpoints do not leak across users via filter manipulation. Test: call Entity.list({ filter: { user_id: "<other-user-id>" } }) while logged in as User A. Pass: empty result. Fail: data leak. Fix cost: 1 to 2 hours per affected entity.

12. Admin role is binary, not implicit. Test: search the codebase for any if (user.email === ...) or if (user.role === "admin") checks. Pass: a single, documented admin check pattern is used. Fail: ad-hoc email checks scattered across the codebase. Fix cost: 2 to 4 hours to centralize.
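Tests 8 and 11 are easy to script once you can act as two users. The sketch below is a hypothetical harness. EntityClient stands in for however your app calls one entity while signed in as a given user, and the check names are our own. Run it against a staging copy, since it creates a record.

```typescript
// Hypothetical shapes: adapt EntityClient to however your app calls one
// entity while signed in as a particular user.
type Row = { id: string; [field: string]: unknown };

interface EntityClient {
  create(data: Record<string, unknown>): Promise<Row>;
  get(id: string): Promise<Row | null>;
  list(filter?: Record<string, unknown>): Promise<Row[]>;
}

// Returns the names of the checks that leaked; an empty array is a pass.
async function checkIsolation(
  asUserA: EntityClient,
  asUserB: EntityClient,
  sample: Record<string, unknown>,
  ownerField: string, // e.g. "created_by"
  userAId: string,
): Promise<string[]> {
  const leaks: string[] = [];
  const created = await asUserA.create(sample);

  // Item 8: B must not see A's record in a list...
  const listed = await asUserB.list();
  if (listed.some((row) => row.id === created.id)) leaks.push("list");

  // ...or fetch it by ID. A thrown error counts as "not visible".
  const fetched = await asUserB.get(created.id).catch(() => null);
  if (fetched !== null) leaks.push("get-by-id");

  // Item 11: filtering on A's ID while signed in as B must return nothing.
  const filtered = await asUserB.list({ [ownerField]: userAId });
  if (filtered.length > 0) leaks.push("filter");

  return leaks;
}
```

An empty array is a pass. Any entry names the path that leaked, which tells you whether to look at the rule itself or at a backend function that bypasses it.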

Category 2 has the highest leverage in the audit. A single failure here is usually a launch-blocker; failures in other categories can often ship with a known gap. RLS leaks cannot.
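For item 12, the pass state is one documented admin check that every code path calls. This is a minimal sketch that assumes the role is stored on the user record as a role field, as in the user.role === "admin" pattern quoted in the test. The function and error names are hypothetical.

```typescript
// Hypothetical user shape; match it to whatever your auth layer returns.
interface AppUser {
  id: string;
  email: string;
  role?: string;
}

const ADMIN_ROLE = "admin";

// The one documented place that decides who is an admin. Callers never
// compare emails or role strings themselves.
function isAdmin(user: AppUser | null | undefined): boolean {
  return user?.role === ADMIN_ROLE;
}

class ForbiddenError extends Error {
  readonly status = 403;
}

// Guard for admin-only code paths: throws instead of silently continuing.
function requireAdmin(user: AppUser | null | undefined): AppUser {
  if (!user || !isAdmin(user)) {
    throw new ForbiddenError("admin role required");
  }
  return user;
}
```

Once this exists, the grep in item 12 should find user.role and user.email comparisons only inside isAdmin.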

Category 3: Schema and entity validation

The Base44 production readiness audit catches schema drift in three flavors: fields referenced in code that no longer exist on the entity, fields that exist but have changed type, and entities with no validation on user-supplied input.

13. No code references fields that have been removed from entities. Test: run a grep across the codebase for every field name, cross-reference against the live entity schema. Pass: zero orphan references. Fail: any orphan reference. Fix cost: 30 minutes per drift. See the hallucinated fields fix for how AI-generated drift happens.

14. Required fields are required at the entity level, not only in the UI. Test: bypass the UI and call the entity create endpoint directly without a required field. Pass: server rejects the create. Fail: record is created with the field missing. Fix cost: 15 minutes per field.

15. String fields have explicit length limits. Test: attempt to write a 1 MB string into any user-input field. Pass: rejected with a clear error. Fail: accepted and stored. Fix cost: 15 minutes per field. Storing unbounded strings is one of the cheaper ways to blow your database budget.

16. Enum fields enforce their enum values. Test: write an invalid enum value through the SDK. Pass: rejected. Fail: stored as a string. Fix cost: 30 minutes per field.

17. Foreign-key references are validated before write. Test: write a record referencing a non-existent parent ID. Pass: rejected with a foreign-key error. Fail: orphan record created. Fix cost: 1 to 2 hours per relationship.

18. The 5,000-record list cap is handled with pagination. Test: count records in your largest entity. If over 4,000, verify that all list calls paginate. Pass: paginated cursor pattern in place. Fail: bare Entity.list() calls. Fix cost: 1 hour per call site.
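Items 14 to 16 share one rule: the server decides what is valid, not the form. If your entity definitions cannot express a constraint, a check like the one below can run at the top of a backend function before the write. The rule format and the function name are hypothetical, not Base44's schema syntax.

```typescript
// Hypothetical rule format for one entity's user-supplied fields.
interface FieldRule {
  required?: boolean;
  maxLength?: number;          // string fields only
  allowed?: readonly string[]; // enum fields only
}

type Schema = Record<string, FieldRule>;

// Returns human-readable problems; an empty array means the payload may be written.
function validate(schema: Schema, payload: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [field, rule] of Object.entries(schema)) {
    const value = payload[field];
    if (value === undefined || value === null || value === "") {
      if (rule.required) problems.push(`${field} is required`); // item 14
      continue;
    }
    if (rule.maxLength !== undefined) {
      if (typeof value !== "string") {
        problems.push(`${field} must be a string`);
      } else if (value.length > rule.maxLength) {
        problems.push(`${field} exceeds ${rule.maxLength} characters`); // item 15
      }
    }
    if (rule.allowed !== undefined) {
      if (typeof value !== "string" || !rule.allowed.includes(value)) {
        problems.push(`${field} must be one of ${rule.allowed.join(", ")}`); // item 16
      }
    }
  }
  return problems;
}
```

If the returned array is not empty, reject the write with a 400 and the listed problems. That turns the 1 MB string from item 15 into a clear error instead of a stored record.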

The schema category looks dry on paper and is consistently the second-highest source of post-launch incidents, behind RLS. AI-generated schema changes are the dominant cause.
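For item 18, the pass state is a loop that keeps requesting pages until a short page comes back, instead of a single bare list call. This sketch uses offset-based paging as a stand-in. fetchPage is a hypothetical adapter around whatever list or filter call your SDK version exposes, and the 500-record page size is an assumption. If that call supports a cursor, the loop keeps the same shape.

```typescript
// Hypothetical adapter around your SDK's list or filter call.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>;

// Reads every record page by page instead of trusting one list() call,
// which may stop at the platform's record cap.
async function fetchAll<T>(
  fetchPage: FetchPage<T>,
  pageSize = 500,
  maxRecords = 1000000,
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; offset < maxRecords; offset += pageSize) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page);
    if (page.length < pageSize) break; // a short page means no more records
  }
  return all;
}
```

The maxRecords bound is a safety stop in case the adapter ignores the offset and keeps returning full pages.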

Category 4: Error handling and error surfaces

Error handling in Base44 apps is rarely catastrophic, but it is consistently below the bar that real users will accept. The audit checks three things: errors are caught, errors are surfaced usefully to users, and errors are recorded somewhere queryable.

19. Every SDK call is wrapped in try/catch. Test: grep for "Entity." and "functions." call sites; confirm each is inside a try/catch or has an explicit error handler. Pass: 100 percent coverage. Fail: any raw call. Fix cost: 15 minutes per call site.

20. User-facing error messages are sanitized. Test: trigger a backend error and read what the user sees. Pass: a friendly message with a unique error ID. Fail: the raw stack trace, a Deno error message, or a leaked internal hostname. Fix cost: 1 to 2 hours to wrap a unified error surface.

21. Network errors are retried with exponential backoff. Test: simulate a network failure on a non-critical call. Pass: the call retries 2 to 3 times with increasing delay before surfacing. Fail: instant failure with no retry. Fix cost: 1 to 2 hours to ship a wrapper.

22. 401 responses trigger re-authentication, not silent failure. Test: artificially expire the user's session and trigger a call. Pass: user is redirected to login with the prior URL preserved. Fail: silent failure or generic error. Fix cost: 1 to 2 hours.

23. Critical actions have idempotency keys. Test: identify any function that creates a payment, sends an email, or writes to an external system. Pass: each accepts an idempotency key from the client. Fail: duplicate submissions cause duplicate side effects. Fix cost: 2 to 4 hours per function.
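Item 21 asks for 2 to 3 retries with increasing delay. Below is a minimal sketch of such a wrapper using exponential backoff with full jitter, where each wait is a random value up to base * 2^attempt. withRetry and its options are hypothetical names, and sleep is injectable so tests do not actually wait.

```typescript
interface RetryOptions {
  retries: number;     // extra attempts after the first one
  baseDelayMs: number; // upper bound of the first wait
  sleep?: (ms: number) => Promise<void>;  // injectable for tests
  isRetryable?: (err: unknown) => boolean; // defaults to retrying everything
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn; on failure waits a random delay below baseDelayMs * 2^attempt
// ("full jitter") and tries again, up to `retries` extra times.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? realSleep;
  const isRetryable = opts.isRetryable ?? (() => true);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries || !isRetryable(err)) throw err;
      const cap = opts.baseDelayMs * 2 ** attempt;
      await sleep(Math.random() * cap); // jitter keeps clients from retrying in lockstep
    }
  }
}
```

Wrap only calls that are safe to repeat. Anything with a side effect, such as a payment or an email, also needs the idempotency key from item 23, or a retry can duplicate it.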

If category 4 is fully red, your launch will be loud — every user error becomes a support ticket because the surface is not self-diagnostic.
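Item 23's pass state is that a duplicate submission with the same key returns the first result instead of running the side effect again. Below is a minimal in-memory sketch. IdempotencyStore is a hypothetical name. A deployed function would persist keys, for example in a dedicated entity, so a retry that lands on a different invocation still finds the earlier result.

```typescript
// Runs a side effect at most once per client-supplied idempotency key.
// In-memory only: a deployed function should persist keys and results.
class IdempotencyStore<T> {
  private readonly results = new Map<string, Promise<T>>();

  run(key: string, action: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing; // duplicate submission: reuse the first result

    const pending = action().catch((err: unknown) => {
      // Forget failures so the client can retry with the same key.
      this.results.delete(key);
      throw err;
    });
    // Store the promise, not just the result, so concurrent duplicates share it.
    this.results.set(key, pending);
    return pending;
  }
}
```

Storing the pending promise also covers two identical requests that arrive at the same moment. When the side effect is a Stripe call, Stripe's API also accepts an Idempotency-Key header, so the same key can be forwarded there.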

Category 5: Observability and external logging

Base44's built-in function logs are useful but volatile: they roll over quickly and disappear on a fresh deploy. Production observability requires shipping logs out.

24. Every backend function emits a structured log line per invocation. Test: read three random function files; look for console.log with at least an event name and a user ID. Pass: structured logging present. Fail: silent functions. Fix cost: 30 minutes per function.

25. Logs are shipped to an external destination. Test: trigger an error, redeploy the function, then attempt to find that error in your log destination 24 hours later. Pass: log survived. Fail: log is gone. Fix cost: 4 to 6 hours to wire a logger to BetterStack, Logtail, Axiom, or similar.

26. Errors include a correlation ID returned to the client. Test: trigger an error and inspect the response. Pass: response includes an error_id field. Fail: opaque error. Fix cost: 2 hours to retrofit.

27. A synthetic monitor pings the app every 1 to 5 minutes. Test: open your monitor dashboard. Pass: a check from outside your network hits a representative URL on the cadence above. Fail: no external check exists. Fix cost: 30 minutes to set up Better Uptime or Cronitor.

28. Status page reflects user-facing health, not just platform health. Test: simulate a function outage and check your status page. Pass: status changes within 5 minutes. Fail: status page shows green during a real outage. Fix cost: 2 to 4 hours to wire synthetic results to a public status page.
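Item 24 only asks for console.log with an event name and a user ID. Emitting one JSON object per line makes item 25 much easier, because external log tools can parse the fields instead of grepping free text. This sketch assumes console output is what the platform captures as function logs. The helper and field names are hypothetical.

```typescript
// Hypothetical field names; keep `event` and `userId` on every line (item 24).
interface LogFields {
  event: string;
  userId?: string;
  [extra: string]: unknown;
}

// Emits one JSON object per line. Assumption: the platform captures console
// output as function logs, and an external shipper (item 25) forwards it.
function logEvent(fields: LogFields, level: "info" | "error" = "info"): string {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, ...fields });
  if (level === "error") {
    console.error(line);
  } else {
    console.log(line);
  }
  return line;
}

// Wraps one invocation so it always produces exactly one summary line.
async function logged<T>(
  event: string,
  userId: string | undefined,
  work: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  try {
    const result = await work();
    logEvent({ event, userId, outcome: "ok", durationMs: Date.now() - started });
    return result;
  } catch (err) {
    logEvent(
      { event, userId, outcome: "error", durationMs: Date.now() - started, message: String(err) },
      "error",
    );
    throw err;
  }
}
```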

The single most common failure in this category is item 25. Roughly 65 percent of the apps we audit have no external log destination, which means post-incident debugging is guesswork.
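Items 20 and 26 fit together. The user sees a friendly message with a unique ID, and the raw detail, which may include a stack trace or an internal hostname, is logged server-side under the same ID. This sketch uses randomUUID from node:crypto, which is available in Node and through Deno's Node compatibility layer. toClientError and the message wording are hypothetical; error_id is the field name from item 26.

```typescript
import { randomUUID } from "node:crypto";

interface ClientError {
  status: number;
  body: { error: string; error_id: string };
}

// Converts any thrown value into a response that is safe to show a user. The
// full detail is logged server-side under the same ID so support can find it.
function toClientError(
  err: unknown,
  newId: () => string = randomUUID,
  log: (line: string) => void = console.error,
): ClientError {
  const errorId = newId();
  const detail = err instanceof Error ? (err.stack ?? err.message) : String(err);
  log(JSON.stringify({ level: "error", error_id: errorId, detail }));
  return {
    status: 500,
    body: {
      error: "Something went wrong. If this keeps happening, contact support and quote this error ID.",
      error_id: errorId,
    },
  };
}
```

When a user reports an error ID, searching your external log destination for that ID returns the full server-side detail.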

Category 6: Performance budget and rate-limit posture

The Base44 production readiness audit treats performance as a budget, not a benchmark. Pass and fail are framed against a stated threshold the operator can defend.

29. First-byte-to-interactive under 3 seconds on a representative page. Test: run Lighthouse mobile on a real device or a throttled emulator on the homepage. Pass: TTI under 3000 ms. Fail: longer. Fix cost: 2 to 8 hours depending on payload. The CSR default makes this hard — see the SEO/CSR analysis.

30. Largest entity query under 500 ms server time. Test: time the largest list call from a cold-start function. Pass: under 500 ms. Fail: longer. Fix cost: 1 to 4 hours to add an index or paginate.

31. Rate limits not exceeded under expected peak load. Test: simulate 2x your expected concurrent traffic for 10 minutes against the slowest endpoint. Pass: no 429s. Fail: any 429s. Fix cost: 1 to 3 hours to add throttling or backoff. Underlying behavior covered in the rate-limit fix.

32. Scheduled tasks fire even when no users are active. Test: turn off all client traffic for 30 minutes and confirm a scheduled task runs. Pass: it runs. Fail: it does not. Fix cost: 1 to 2 hours to wire an external cron. See the webhooks-require-active-users fix for the underlying constraint.
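Item 31 can be approximated with a small concurrency-limited probe that tallies status codes. It is count-based rather than time-based to keep the sketch short, so size total to cover the 10-minute window at twice your expected rate. SendRequest and probe are hypothetical names.

```typescript
// Hypothetical: performs one request against the endpoint under test and
// resolves to its HTTP status code.
type SendRequest = () => Promise<number>;

// Sends `total` requests with at most `concurrency` in flight and tallies
// the resulting status codes (0 stands for a network-level failure).
async function probe(
  send: SendRequest,
  total: number,
  concurrency: number,
): Promise<Map<number, number>> {
  const byStatus = new Map<number, number>();
  let started = 0;

  async function worker(): Promise<void> {
    while (started < total) {
      started++;
      let status: number;
      try {
        status = await send();
      } catch {
        status = 0;
      }
      byStatus.set(status, (byStatus.get(status) ?? 0) + 1);
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, total) }, () => worker());
  await Promise.all(workers);
  return byStatus;
}
```

In practice send could be async () => (await fetch(url)).status. Any count under 429 is a fail, and a count under 0 means requests failed before reaching the server.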

Category 7: Data backups and snapshots

Backups are the test most operators skip because the platform implies they happen. They do not happen at the granularity you want for recovery.

33. A scheduled export of every critical entity runs at least daily. Test: open your backup destination and confirm a fresh export exists from within the last 24 hours. Pass: present. Fail: missing or stale. Fix cost: 4 to 6 hours to ship a backend-function-driven export to S3 or equivalent.

34. A restore from backup has been performed in the last 90 days. Test: ask the operator the date of the last restore drill. Pass: within 90 days. Fail: never or unknown. Fix cost: 2 to 4 hours per drill. An untested backup is not a backup.

35. A platform snapshot exists from the last 24 hours. Test: open the IDE and confirm a snapshot from today. Pass: present. Fail: missing. Fix cost: 30 seconds. This is the cheapest item on the checklist and one of the most commonly missing.
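Item 33's export and its freshness test can be sketched as below. listAll and putObject are hypothetical dependencies, not Base44 APIs. listAll would page through one entity (item 18), and putObject would write to S3 or any storage you control. Trigger the export from a schedule that does not depend on active users (item 32).

```typescript
// Hypothetical dependencies, not Base44 APIs.
interface ExportDeps {
  listAll: (entity: string) => Promise<unknown[]>;         // pages through one entity
  putObject: (key: string, body: string) => Promise<void>; // S3 or similar
  now?: () => Date;
}

// Writes one dated JSON file per entity, such as "backups/2026-05-24/Invoice.json",
// and returns the keys it wrote.
async function exportEntities(entities: string[], deps: ExportDeps): Promise<string[]> {
  const day = (deps.now ?? (() => new Date()))().toISOString().slice(0, 10);
  const written: string[] = [];
  for (const entity of entities) {
    const rows = await deps.listAll(entity);
    const key = `backups/${day}/${entity}.json`;
    await deps.putObject(key, JSON.stringify({ entity, day, count: rows.length, rows }));
    written.push(key);
  }
  return written;
}

// Item 33's pass condition: the newest export is less than 24 hours old.
function isFresh(lastExportAt: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastExportAt.getTime() < 24 * 60 * 60 * 1000;
}
```

An export alone does not pass item 34. The files still need to be restored somewhere at least once every 90 days.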

The audit failure rate on category 7 is the highest of any category: north of 80 percent of the apps we see have no tested restore path. This is the category that turns a recoverable incident into a six-figure one.

Category 8: Security headers and surface hygiene

Security headers are cheap, additive, and uniformly underused on Base44 apps because the platform does not set them by default on custom domains.

Bonus item A. Content-Security-Policy header present. Test: curl your homepage and inspect headers. Pass: a CSP header restricts script sources. Fail: no CSP. Fix cost: 2 to 4 hours to ship and test.

Bonus item B. Strict-Transport-Security with a max-age of at least 6 months. Test: same curl. Pass: HSTS present with a max-age of at least six months (about 15,768,000 seconds). Fail: absent, or max-age shorter than that. Fix cost: 15 minutes.

Bonus item C. X-Content-Type-Options: nosniff. Test: same curl. Pass: present. Fail: absent. Fix cost: 15 minutes.

Bonus item D. Webhook signature validation on every external integration. Test: send a fake webhook payload to your endpoint with no signature. Pass: 401 or 403. Fail: accepted. Fix cost: 1 to 2 hours per webhook. The Stripe and Twilio integrations are the most commonly unsigned. See the Stripe integration breaks fix for the recurring pattern.

Bonus item E. No secrets in client-side code. Test: grep the compiled bundle for known API key prefixes (sk_, xoxb-, AIza). Pass: zero hits. Fail: any hit; rotate the key immediately. Fix cost: 1 to 4 hours per leaked secret, including key rotation and external system updates.
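Bonus item D, sketched for a Stripe-style signature. Stripe sends a Stripe-Signature header of the form t=<timestamp>,v1=<hex signature>. The signature is an HMAC-SHA256, keyed with the endpoint's signing secret, over the timestamp, a period, and the raw request body. The function below is a hand-rolled illustration with a hypothetical name. In practice the official stripe package's webhooks.constructEvent performs this check for you. Twilio uses a different scheme, so check its documentation rather than reusing this code.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a Stripe-style header "t=<unix seconds>,v1=<hex>". The signed
// string is `${t}.${rawBody}`, hashed with HMAC-SHA256 and the endpoint
// secret. rawBody must be the exact text received, before JSON parsing.
function verifyStripeStyleSignature(
  rawBody: string,
  header: string | null,
  secret: string,
  toleranceSec = 300,
  nowSec: number = Math.floor(Date.now() / 1000),
): boolean {
  if (!header) return false; // unsigned payloads are rejected outright

  let timestamp: string | undefined;
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t") timestamp = value;
    if (key === "v1" && value) candidates.push(value);
  }
  const ts = Number(timestamp);
  if (!timestamp || !Number.isFinite(ts) || candidates.length === 0) return false;
  // Reject old timestamps so a captured request cannot be replayed later.
  if (Math.abs(nowSec - ts) > toleranceSec) return false;

  const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest();
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    // timingSafeEqual throws on unequal lengths, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

A false result should map to the 401 or 403 that item D expects, before any part of the payload is trusted.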

Headers cost almost nothing in calendar time and are uniformly the easiest items on the checklist to fix. Operators skip them because they are invisible, not because they are expensive.
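Bonus items A to C can be checked together. curl -sI against your production URL prints the response headers, and the sketch below applies the same pass or fail logic to a headers map. auditHeaders is a hypothetical name. The CSP check is deliberately shallow: it only confirms that a script-src or default-src directive exists, not that the policy is strict.

```typescript
// Roughly six months in seconds (182.5 days), the bar bonus item B sets.
const SIX_MONTHS_SEC = 15768000;

// Returns the failed bonus items; an empty array means A to C pass.
function auditHeaders(headers: Record<string, string>): string[] {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const h = new Map(Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v] as const));
  const failures: string[] = [];

  // A: a CSP exists and has a directive governing scripts.
  const csp = h.get("content-security-policy") ?? "";
  if (!/(^|;)\s*(script-src|default-src)\s/.test(csp)) {
    failures.push("A: no CSP restricting script sources");
  }

  // B: HSTS exists with a long enough max-age.
  const hsts = h.get("strict-transport-security") ?? "";
  const maxAge = Number(/max-age=(\d+)/i.exec(hsts)?.[1] ?? 0);
  if (maxAge < SIX_MONTHS_SEC) {
    failures.push("B: HSTS missing or max-age under six months");
  }

  // C: MIME sniffing is disabled.
  if ((h.get("x-content-type-options") ?? "").toLowerCase() !== "nosniff") {
    failures.push("C: X-Content-Type-Options is not nosniff");
  }
  return failures;
}
```

With fetch, the map can come from Object.fromEntries(response.headers).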

How to run the Base44 production readiness audit yourself

The honest order is roughly the order of the categories above, with two adjustments. Run category 7 (backups) and bonus item E (leaked secrets) first, because they are cheap and unblock everything else. A leaked secret on the same day as your launch is a worst case that costs nothing to prevent. A missing snapshot during your first incident is the same.
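Bonus item E is one of the two items to run first. Here is a first-pass sketch of its grep over compiled bundle files, using the three prefixes the checklist names. The patterns are loose heuristics and the names are hypothetical, so every hit still needs a human look. Only an 8-character prefix of each match is reported, so the scan output does not leak the key a second time.

```typescript
// Loose patterns for the prefixes bonus item E names. Any hit deserves review.
const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["sk_ key", /\bsk_[0-9A-Za-z_]{10,}/g],
  ["xoxb- token", /\bxoxb-[0-9A-Za-z-]{10,}/g],
  ["AIza key", /\bAIza[0-9A-Za-z_-]{20,}/g],
];

interface Finding {
  file: string;
  kind: string;
  preview: string;
}

// Scans file contents (file name -> text) and reports each match with only a
// short prefix, so the scan output does not leak the key a second time.
function scanForSecrets(files: Record<string, string>): Finding[] {
  const findings: Finding[] = [];
  for (const [file, text] of Object.entries(files)) {
    for (const [kind, pattern] of SECRET_PATTERNS) {
      for (const match of text.matchAll(pattern)) {
        findings.push({ file, kind, preview: match[0].slice(0, 8) + "..." });
      }
    }
  }
  return findings;
}
```

Feed it the JavaScript files your production domain actually serves, since the compiled bundle is what an attacker sees.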

Budget yourself one focused half-day for the audit itself and a second half-day for the obvious quick wins. The medium items will take a working week distributed across two engineers. The heavy items — RLS coverage gaps, backup pipeline, external observability — are 1 to 3 weeks of work if you are starting from zero. We have run this checklist on apps that needed 2 days of remediation and on apps that needed 6 weeks. The variance is real.

Three failure patterns are worth naming. First, operators who fix only the items that fail and skip the items that pass-but-barely; the latter are usually the next failures. Second, operators who fix the items in isolation and never re-run the checklist as a whole; remediation work introduces new failures roughly 20 percent of the time. Third, operators who fix everything but never institute a recurring audit cadence; the audit drifts back to red within 60 days on an actively developed app.

A clean baseline is worth more than a clean snapshot. We re-audit our clients quarterly and after any AI-generated schema change. The schema-change trigger has caught real regressions in 4 of our last 12 client engagements.

FAQ

Should I run this checklist on a pre-launch app or a live one? Both, in different ways. On a pre-launch app the checklist is a launch gate — six items must pass before you ship. On a live app the checklist is a quarterly regression check, with an out-of-cycle run after any AI-generated schema change. The pre-launch run is more thorough; the live run focuses on the categories most likely to have drifted.

What is the difference between a snapshot and a backup? A snapshot is a platform-level point-in-time copy maintained by Base44. It rolls back code, schema, and sometimes data. A backup is your independent export, controlled by you, restorable to another system. You need both: the snapshot for fast rollback during an editing mistake, the backup for survival of a platform-level data-loss incident or an account lockout.

Why do I need external log shipping if Base44 has function logs? The platform's function logs are present but volatile. They roll on redeploys, and they cannot be queried programmatically with the depth you need for post-incident analysis. External log shipping to a destination you control survives platform changes, supports structured search, and gives you per-correlation-ID traces. It is the single highest-leverage observability investment.

Want us to run this audit for you?

Our $497 production readiness audit applies all 35 items on the checklist to your Base44 app and ships a written report with the failing items ranked by blast radius, the fix-cost estimate per item, and a remediation plan you can hand to your team or to ours. Median turnaround is 48 hours. If the audit surfaces items you do not want to fix in-house, we run a follow-on fix sprint that closes the top items in another 5 to 10 business days. Order the production readiness audit or book a 15-minute call to scope the work first.


Frequently asked questions

What does a Base44 production readiness audit actually check?

A Base44 production readiness audit walks 35 numbered tests plus five bonus items across eight categories: authentication hardening, row-level security coverage, schema and entity validation, error handling and surfaces, observability and external logging, performance and rate-limit posture, data backup and restore, and security headers. Each test has a clear pass or fail criterion you can verify in under five minutes. The output is a prioritized fix list, ranked by blast radius. In 12 of our last 30 engagements, the same five items appeared in the top ten: RLS not actually applied on a critical entity, no external log shipping, no scheduled backup, missing webhook signature validation, and stored XSS in a user-supplied field. The checklist exists so you find these before users or attackers do.

How is this different from the security hardening checklist?

The security checklist focuses on the OWASP attack surface — XSS, auth bypass, SSRF, token leakage. The production readiness audit is broader and includes operational readiness: do you have logs that survive a redeploy, do scheduled backups actually run, can you roll back a bad release without losing data, are you under your rate-limit budget. Security is a subset. In practice the audit incorporates the security checklist as its second category and adds the operational items that turn a hardened app into a survivable one. Roughly 18 percent of the failures we catch are not security findings at all — they are missing rollback paths or broken observability.

How long does it take to run this checklist on my own app?

Plan three to six hours if you have never done it before and your app is mid-size, meaning 8 to 20 entities and 5 to 15 backend functions. Most of the time is in the RLS verification and the schema drift detection. The auth and security header checks take 30 minutes combined. The observability and backup checks take an hour because you have to actually trigger a failure and watch whether the log made it out. Budget another two to four hours for the quick-win fixes; the heavier items take longer. If you are running the audit pre-launch and the answer to most items is missing, the remediation work is the real cost, and the audit itself is fast.

Which items in the Base44 production checklist are non-negotiable for launch?

Six items are launch-blockers in our judgement: RLS verified with a second test account on every user-scoped entity, webhook signature validation on every external integration that mutates data, an external log destination that retains errors after a redeploy, a scheduled backup that has been restored at least once, a current-state snapshot taken in the last 24 hours, and a minimal incident playbook with the steps for the most likely failure modes. The other 29 items range from important to nice-to-have. We have seen launches succeed with several yellow items and recover. We have not seen launches survive a missing RLS check on a sensitive entity.

What does a fix-cost estimate look like on the checklist?

Each item has a rough fix-cost in engineer-hours, scoped for a developer who knows the app. Cheap items — adding security headers, enabling structured error responses — are 15 to 30 minutes each. Medium items — wiring external log shipping, instrumenting performance budgets — are 2 to 6 hours. Heavy items — closing an RLS coverage gap across 12 entities, retrofitting webhook signature validation, building a restore-tested backup pipeline — can be a full day each. The total remediation budget for an app that has skipped all 35 items is typically 24 to 40 engineer-hours. We try to surface the cost up front so you can prioritize, not just discover, the gaps.

Can I skip this audit if my Base44 app is internal-only or low-stakes?

You can skip about a third of it. Internal-only apps with no external traffic and no PII can skip the security headers section, most of the observability section, and the rate-limit posture work. The non-skippable items are the RLS checks, schema drift detection, the backup-and-restore test, and the snapshot policy. Internal apps fail more often on the schema drift item than external apps do, because internal apps see less testing pressure and schema changes ship without proper migration. We have rescued internal apps from data loss caused by an AI-generated schema change that overwrote a column on day 90 of use.
