00 /FRAMEWORK · AUDIT
What a base44 ai builder audit actually covers.
An audit of an AI-built base44 app is not a code review. It is a structured walk through 12 components (security and RLS, auth, performance, SEO, function timeouts, credit-burn, RBAC, webhooks, DNS, Open Graph, sitemap, error monitoring) that catch the failure modes the AI builder reliably leaves behind. This page explains each component, why AI-generated apps fail it, and how to score your own app before you decide to outsource.
Short answer
Last reviewed · 2026-05-08
A base44 ai builder audit is a structured review of an AI-built base44 app against twelve components: security and Row-Level Security, the auth surface, performance, SEO, function timeouts, credit-burn rate, role-based access control, webhook reliability, DNS, Open Graph metadata, sitemap, and error monitoring. The framework targets the failure modes specific to AI-generated code — orphan policies, missing connecting handlers, untested error paths, prompt regressions — that a generic code review will miss. Each component is checked pass / fail / not-applicable, and the output is a written report with reproduction steps. Apply the framework yourself by walking each section below, or order the productized $497 audit at /audit to have a senior engineer apply it for you in one business day.
01 /THE PROBLEM
Why AI-built ≠ verified built.
Base44's AI builder ships features that look complete in the editor but are partially built in the codebase. The page renders, the button clicks, the toast fires — and then the function call lands in a handler that was never wired into a route, or hits an RLS policy the AI forgot to update when it added the new column.
This is the gap an audit fills. The AI does not lie about what it did; it just does not know what it did not do. A code review judges code on style, structure, and idiom. An audit verifies that the code does what the AI claims it built: that the RLS policies actually match the queries, that the webhook handler is wired into a route, that the type definitions match the runtime schema, and that auth context propagates into every function the AI added.
The twelve components in the rest of this page define what “checked” means. Each section explains the check and why AI-builder apps fail it specifically, and links to the matching deep-dive remediation guide.
02 /FRAMEWORK
The twelve components every audit covers.
Each component is a focused check, a known AI-builder failure mode, and a remediation path. Read each one and score your app pass / fail / unknown. The components are listed in the order we walk them during the productized audit at /audit.
Row-level security is the single highest-stakes thing the base44 AI builder will touch and silently get wrong. Every time the AI adds a column, renames a table, or introduces a new tenancy boundary, the existing RLS policies need to follow, and most of the time they do not. The audit enumerates every policy on every table and cross-references it against every read and write the app issues. The common AI-generated failure modes are the orphan policy, a rule pointing at a column that no longer exists, and the table that has no policy at all and is therefore wide open to any authenticated user. Eight of the eleven AI-built apps we have audited had at least one of these. The check is not about whether RLS is enabled (the platform tells you that); it is about whether the policies actually constrain what the SDK returns.
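The cross-reference described above can be sketched in a few lines. This is a minimal illustration, not the audit tooling itself: the `Policy` shape, the `schema` dictionary, and the table names are all hypothetical, and it assumes you have already exported the live schema and the columns each policy references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    table: str
    columns: tuple[str, ...]  # columns the policy expression references


def rls_gaps(schema: dict[str, set[str]], policies: list[Policy]) -> dict[str, list[str]]:
    """Return orphan policies and tables that no policy covers."""
    orphans: list[str] = []
    covered: set[str] = set()
    for policy in policies:
        columns = schema.get(policy.table)
        if columns is None:
            orphans.append(f"{policy.table}: table does not exist")
            continue
        covered.add(policy.table)
        for column in policy.columns:
            if column not in columns:
                orphans.append(f"{policy.table}.{column}: column does not exist")
    unprotected = sorted(set(schema) - covered)
    return {"orphan_policies": orphans, "unprotected_tables": unprotected}


# Hypothetical export: the AI renamed projects.user_id to owner_id
# and added a notes table without any policy.
schema = {
    "projects": {"id", "owner_id", "name"},
    "invoices": {"id", "tenant_id", "total"},
    "notes": {"id", "body"},
}
policies = [Policy("projects", ("user_id",)), Policy("invoices", ("tenant_id",))]
report = rls_gaps(schema, policies)
```

A clean result here only shows that policies point at real columns and that every table has one. Whether each policy actually constrains the rows returned still has to be tested with real queries.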
AI-generated authentication code passes superficial review and fails on edges. The login page mounts, the OAuth callback redirects, the session cookie is set — and then there is a route the AI added that reads the request without actually validating the token, or a function that runs as the service role because the auth context never propagated into it. The audit traces auth context from the edge to the data layer for every authenticated route, walks the OAuth callback for PKCE and state validation, and tests session expiry behaviour. AI builders are particularly weak at PKCE — the spec is fiddly, the AI ships handlers that decode tokens but do not verify them, and the gap is invisible until somebody mints a token by hand.
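The difference between decoding a token and verifying it is easy to show. The sketch below assumes an HS256-signed JWT and a shared secret; your app may use a different algorithm or an identity provider's library, and the function names are ours, not part of any base44 API. It deliberately omits expiry and audience checks to stay short.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a token; used here only to build test inputs."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def decode_only(token: str) -> dict:
    """The failure mode: reads the claims, never checks the signature."""
    return json.loads(_b64url_decode(token.split(".")[1]))


def verify_hs256(token: str, secret: bytes):
    """Return the claims only if the signature matches, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, signature):
            return None
        return json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed structure, base64, or JSON
        return None
```

A token minted by hand with any other secret passes `decode_only` and is rejected by `verify_hs256`, which is exactly the gap the auth check looks for.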
Performance covers TTFB, LCP, INP, the render path, and CSR fall-back. The AI builder optimises for what compiles, not what is fast. The render path the AI shipped probably round-trips the database three times before painting first content because each component fetches its own slice. The fix is one parent fetch with the data passed down as props, but the AI does not refactor in that direction unprompted. Missing indexes are the second silent failure: queries look fine in dev with 200 rows and fall over at 10,000. The audit runs the page against a representative production-shaped dataset and measures actual response times, not synthetic ones. Any route whose 95th-percentile response time exceeds 200ms is flagged with the offending query and the index that would resolve it.
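The 95th-percentile flag can be reproduced from raw timings. The sketch below uses the nearest-rank method, which is one of several percentile definitions and an assumption on our part; the route names and sample values are made up.

```python
def p95(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def flag_slow_routes(timings: dict[str, list[float]], budget_ms: float = 200.0) -> dict[str, float]:
    """Return routes whose p95 response time exceeds the budget."""
    flagged = {}
    for route, samples in timings.items():
        value = p95(samples)
        if value > budget_ms:
            flagged[route] = value
    return flagged


# Hypothetical measurements: one route in ten requests is slow on /dashboard.
timings = {
    "/dashboard": [80.0] * 90 + [450.0] * 10,
    "/settings": [40.0] * 100,
}
slow = flag_slow_routes(timings)
```

Note that the average for `/dashboard` in this example is 117ms, comfortably under budget, which is why the check looks at the tail rather than the mean.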
AI-generated apps default to client-side rendering, which means Google indexes a blank shell. The audit checks whether the public routes ship server-rendered HTML with the actual content visible to view-source, whether the meta tags are dynamic per route or static and inherited from a layout, and whether structured data (Organization, BreadcrumbList, the canonical entity for each page) is present and valid. The dominant AI failure is the blank-shell pattern: the home page has a perfect <title> but every product/article/listing page inherits it because the AI did not wire dynamic metadata generation. The second failure is hallucinated schema — JSON-LD that validates as syntactically correct but references properties that do not exist, which Google silently ignores.
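The inherited-title failure is straightforward to detect once you have the raw server response for each public route. The sketch below assumes `pages` maps each route to the HTML returned without executing JavaScript; how you fetch it is up to you, and the helper names are illustrative.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    parser = _TitleParser()
    parser.feed(html)
    return parser.title.strip()


def inherited_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group routes by <title>; any title shared by 2+ routes is suspect."""
    by_title = defaultdict(list)
    for route, html in pages.items():
        by_title[page_title(html)].append(route)
    return {title: sorted(routes) for title, routes in by_title.items() if len(routes) > 1}
```

A shared title is a signal, not proof: two routes can legitimately share one. The report should list the groups and let a human confirm which ones are inherited from the layout.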
Base44 functions have a hard execution ceiling. The AI builder writes synchronous code that walks a list, calls an external API for each item, and waits — fine for ten items, a guaranteed timeout at five hundred. The audit enumerates every function that does I/O inside a loop, every function that fans out to a third-party service, and every function whose execution time grows linearly with input size. Each gets a projected timeout threshold (the input volume at which the function begins to fail) and a recommended pattern (queue + worker, or batch processing). This is the failure mode users see as 'works for me, breaks for the customer with the big account' — and it is invisible until someone with a real-shaped dataset hits the route.
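The projected timeout threshold is simple arithmetic if you assume execution time grows linearly with input size. In the sketch below the 60-second ceiling is a placeholder, not base44's documented limit, and the per-item timing is something you would measure yourself. The `chunk` helper shows the batching half of the recommended pattern.

```python
from typing import Iterator


def projected_timeout_threshold(ceiling_s: float, fixed_overhead_s: float, per_item_s: float) -> int:
    """Largest item count that finishes under the ceiling, assuming linear cost."""
    if per_item_s <= 0:
        raise ValueError("per_item_s must be positive")
    budget = ceiling_s - fixed_overhead_s
    return max(0, int(budget // per_item_s))


def chunk(items: list, size: int) -> Iterator[list]:
    """Split work into batches small enough to finish inside one invocation."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder numbers: 60 s ceiling, 0.5 s startup, 0.25 s per external API call.
threshold = projected_timeout_threshold(60.0, 0.5, 0.25)
```

With these placeholder numbers the function survives 238 items and fails beyond that, so the ten-item demo passes and the five-hundred-item customer does not. In practice you would batch well below the threshold to leave headroom for slow third-party responses.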
Every AI prompt costs credits, and the prompt patterns the AI builder ships when it scaffolds your app are not the cheap ones. The audit measures the cached/uncached ratio across the prompts your app issues, projects monthly credit cost at three traffic levels, and flags the prompts that are most aggressively re-evaluating context that could be cached. The AI tends to re-fetch reference data on every request rather than cache it, embed entire JSON blobs in prompts that could be summarised, and call the model for trivial transformations that string functions would handle for free. We have audited apps spending $2,800/month on prompts that should have cost $180. Credit burn is the kind of bug that looks like a feature until the bill arrives.
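The projection at three traffic levels can be sketched as a blended-cost calculation. Every number below is a placeholder: the per-prompt costs, the traffic levels, and the prompts-per-request figure are assumptions you would replace with your own billing data.

```python
def monthly_credit_cost(
    requests_per_month: int,
    prompts_per_request: float,
    cached_ratio: float,
    uncached_cost: float,
    cached_cost: float,
) -> float:
    """Blend cached and uncached prompt cost over a month of traffic."""
    if not 0.0 <= cached_ratio <= 1.0:
        raise ValueError("cached_ratio must be between 0 and 1")
    prompts = requests_per_month * prompts_per_request
    blended = cached_ratio * cached_cost + (1.0 - cached_ratio) * uncached_cost
    return prompts * blended


# Placeholder traffic levels and unit costs, for illustration only.
TRAFFIC_LEVELS = (10_000, 100_000, 1_000_000)
projection = {
    level: monthly_credit_cost(level, 2, cached_ratio=0.5, uncached_cost=0.01, cached_cost=0.001)
    for level in TRAFFIC_LEVELS
}
```

Running the same projection with a higher `cached_ratio` shows how much of the bill comes from re-evaluating context that could have been cached.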
RBAC is what RLS becomes when you have more than two roles. The AI builder ships the happy path — owner reads everything, member reads their own — and silently misses the four edge cases between them: invited-but-not-yet-accepted users, suspended users, role transitions mid-session, and admin impersonation. The audit enumerates every role the app supports, every route, and walks each route as each role. The matrix is what surfaces the gaps: a route an unauthenticated user can hit that mutates data; a route a suspended user can still access; an admin route that does not log who took which action. Multi-tenant scoping is checked at the same time — every query either filters by tenant or is flagged.
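The role-by-route matrix can be expressed as a comparison between what should be allowed and what actually is. In this sketch `probe` is a hypothetical callable that makes the request as a given role and reports whether it succeeded; the roles and routes are examples only.

```python
from typing import Callable


def rbac_gaps(
    expected: dict[tuple[str, str], bool],
    probe: Callable[[str, str], bool],
) -> list[tuple[str, str, str]]:
    """Walk every (role, route) pair and report mismatches with the intended policy."""
    gaps = []
    for (role, route), should_allow in sorted(expected.items()):
        allowed = probe(role, route)
        if allowed and not should_allow:
            gaps.append((role, route, "unexpected access"))
        elif should_allow and not allowed:
            gaps.append((role, route, "unexpected denial"))
    return gaps


# Intended policy versus observed behaviour (both hypothetical).
expected = {
    ("owner", "/admin"): True,
    ("member", "/admin"): False,
    ("suspended", "/dashboard"): False,
}
observed = {
    ("owner", "/admin"): True,
    ("member", "/admin"): False,
    ("suspended", "/dashboard"): True,  # the gap: suspension is not enforced
}
gaps = rbac_gaps(expected, lambda role, route: observed[(role, route)])
```

Writing the expected matrix down is most of the value. The edge-case roles only get tested once somebody has to fill in their row.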
Webhooks fire whether or not anyone is logged in, and the AI-generated handler typically reads the payload, writes a row, and returns 200. What is missing is idempotency, retries, and the active-user constraint that the platform sometimes silently imposes. Stripe retries failed webhooks for three days; the AI handler writes a duplicate row each time. The audit verifies signature validation, idempotency keys, dead-letter handling, and (for base44 specifically) the bug-class where webhook handlers only execute when a user is actively in the workspace. Every webhook is exercised against a retry storm and a signature-mismatch attack to confirm the failure mode is graceful rather than silent.
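A handler that survives a retry storm needs two things the generated version usually lacks: a signature check and an idempotency guard. The sketch below uses a generic HMAC-SHA256 signature over the raw body, which is an assumption and not Stripe's exact signing scheme, and it keeps seen event IDs in memory where a real handler would use a table with a unique constraint.

```python
import hashlib
import hmac
import json


class WebhookHandler:
    def __init__(self, secret: bytes):
        self._secret = secret
        self._seen: set[str] = set()  # stand-in for a unique-constrained table
        self.rows: list[dict] = []    # stand-in for the database

    def handle(self, body: bytes, signature_hex: str) -> int:
        """Return the HTTP status the endpoint would send."""
        expected = hmac.new(self._secret, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected.encode(), signature_hex.encode()):
            return 400  # reject before parsing anything
        event = json.loads(body)
        event_id = event["id"]
        if event_id in self._seen:
            return 200  # acknowledge the retry without writing a duplicate
        self._seen.add(event_id)
        self.rows.append(event)
        return 200


def sign(secret: bytes, body: bytes) -> str:
    """Test helper that signs a body the way the handler expects."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()
```

Delivering the same signed event three times should leave exactly one row, and a body signed with the wrong secret should return 400 and write nothing.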
DNS covers the custom domain, the SSL certificate, www / apex routing, and MX records for transactional email. The AI builder does not touch DNS, so the typical failure is the team that points an A record at base44 and assumes it is done. Six weeks later they discover that email from no-reply@theirdomain is going to spam because there is no SPF record, the DKIM signature does not validate, and DMARC is unconfigured. The audit checks the live DNS for the deployed domain, validates the SSL chain, confirms www and apex both resolve, and checks the email authentication trio (SPF, DKIM, DMARC) so transactional mail actually arrives. This is the cheapest and most overlooked component of the framework.
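The email authentication trio can be checked from TXT records alone. The sketch below assumes you have already looked the records up (for example with `dig TXT`) and passed them in as a dictionary; the DKIM selector name depends on your mail provider, and the function checks presence only, not whether the records are valid.

```python
def email_auth_gaps(domain: str, txt: dict[str, list[str]], dkim_selector: str) -> list[str]:
    """Return which of SPF, DKIM, DMARC have no plausible TXT record."""

    def has_prefix(name: str, prefix: str) -> bool:
        return any(r.strip().lower().startswith(prefix) for r in txt.get(name, []))

    gaps = []
    if not has_prefix(domain, "v=spf1"):
        gaps.append("SPF")
    dkim_name = f"{dkim_selector}._domainkey.{domain}"
    if not any("p=" in record for record in txt.get(dkim_name, [])):
        gaps.append("DKIM")  # a DKIM record must publish a public key in p=
    if not has_prefix(f"_dmarc.{domain}", "v=dmarc1"):
        gaps.append("DMARC")
    return gaps


# Hypothetical lookup results: SPF is present, DKIM and DMARC were never set up.
records = {"example.com": ["v=spf1 include:_spf.mailer.example -all"]}
missing = email_auth_gaps("example.com", records, dkim_selector="mail")
```

An empty result means the records exist, not that mail will pass. Sending a test message and reading the authentication results in the received headers is still the final check.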
Open Graph is what makes a link preview look like a link preview. The AI builder ships a single static og:image inherited from the layout, which means every link to every page on the app — products, articles, dashboards — previews with the same image and the same generic description. The audit checks whether og:title, og:description, and og:image are dynamic per route, whether twitter:card is set, whether the og:image dimensions match the 1200×630 social card spec, and whether the tags actually render in the server response (not injected by client-side JavaScript after the crawler has left). Most AI-built apps fail the static-image test and the SSR test simultaneously.
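The server-response half of this check can be run against raw HTML. The sketch below parses the response as a crawler would see it, without executing JavaScript. It treats declared `og:image:width` and `og:image:height` tags as the evidence of image size, which is a simplification; a fuller check would download the image and measure it.

```python
from html.parser import HTMLParser

REQUIRED = ("og:title", "og:description", "og:image", "twitter:card")


class _MetaParser(HTMLParser):
    """Collects <meta property=...> and <meta name=...> content values."""

    def __init__(self):
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key and attributes.get("content"):
            self.meta[key] = attributes["content"]


def og_problems(server_html: str) -> list[str]:
    """List missing social tags in the HTML the server actually returned."""
    parser = _MetaParser()
    parser.feed(server_html)
    problems = [f"missing {key}" for key in REQUIRED if key not in parser.meta]
    size = (parser.meta.get("og:image:width"), parser.meta.get("og:image:height"))
    if "og:image" in parser.meta and size != ("1200", "630"):
        problems.append("og:image is not declared as 1200x630")
    return problems
```

Run it on two different routes and compare the `og:image` and `og:title` values as well. Identical values across unrelated pages point to the static-layout pattern described above.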
Google is slow to discover pages that are missing from the sitemap, especially when little else links to them. The AI builder occasionally generates a sitemap.xml but rarely keeps it current: the file lists routes that no longer exist, omits routes that were added last week, and uses lastmod values from the day the project was scaffolded. The audit walks the sitemap, validates that every URL resolves to a 200, cross-references it against the app's actual public route table, and checks robots.txt for accidental Disallow rules. The robots.txt failure mode is particularly common: the AI scaffolds a Disallow: /api line and accidentally blocks the marketing pages because they happen to live under a path the rule matches.
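Both the sitemap drift and the accidental Disallow can be checked offline once you have the two files and a list of public routes. The sketch below uses only the standard library; the origin and route names are placeholders, and it skips the step of confirming that each URL returns a 200, since that needs network access. Python's `urllib.robotparser` uses plain prefix matching, which is enough to reproduce the `/api` case.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_paths(sitemap_xml: str) -> set[str]:
    """Extract the URL paths listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        urlparse(loc.text.strip()).path or "/"
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def sitemap_drift(sitemap_xml: str, public_routes: set[str]) -> dict[str, list[str]]:
    """Compare the sitemap against the app's real public route table."""
    listed = sitemap_paths(sitemap_xml)
    return {
        "stale": sorted(listed - public_routes),    # listed but no longer a route
        "missing": sorted(public_routes - listed),  # a route but never listed
    }


def blocked_routes(robots_txt: str, public_routes: set[str], origin: str = "https://example.com") -> list[str]:
    """Return public routes that robots.txt forbids a crawler from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return sorted(r for r in public_routes if not parser.can_fetch("Googlebot", origin + r))
```

Because `Disallow: /api` is a prefix rule, a marketing page at a path such as `/api-docs` is blocked along with the API itself. Writing the rule as `Disallow: /api/` avoids that.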
If you cannot see errors, you cannot fix them. The AI builder rarely wires Sentry, Datadog, or any external observability — it relies on base44's own runtime error view, which captures the obvious crashes and misses the silent ones. The audit checks whether errors thrown inside try-catch blocks are forwarded to a monitoring service or swallowed, whether front-end errors (the kind that produce a white screen for one customer and nobody else) are captured, and whether request-correlation IDs flow from edge to function to database query. The most common AI failure is the catch block that returns a default value and never reports the underlying error — the UI shows success, the database is empty, and there is no log line anywhere.
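The swallowed-error pattern and its fix differ by one line. In the sketch below `report` stands in for whatever monitoring client you use (it is not a real Sentry or Datadog call), and `correlation_id` is a hypothetical request identifier passed down from the edge.

```python
from typing import Callable


def swallowing(fetch: Callable[[], int], default: int = 0) -> int:
    """The failure mode: the caller sees a normal value and nothing is recorded."""
    try:
        return fetch()
    except Exception:
        return default


def reporting(
    fetch: Callable[[], int],
    report: Callable[[str, Exception], None],
    correlation_id: str,
    default: int = 0,
) -> int:
    """Same fallback, but the error is forwarded before the default is returned."""
    try:
        return fetch()
    except Exception as exc:
        report(correlation_id, exc)
        return default


def failing_fetch() -> int:
    raise ConnectionError("database unreachable")


reports: list[tuple[str, str]] = []
value = reporting(
    failing_fetch,
    lambda cid, exc: reports.append((cid, type(exc).__name__)),
    correlation_id="req-123",
)
```

Both versions return the same default to the UI. Only the second leaves a record that ties the failure to a specific request.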
Total: 12 components, 110 individual pass/fail checks, approximately 340 minutes of scoped engineer time when applied end-to-end. The same framework is applied to every audit, so reports are comparable across apps and across snapshots of the same app over time.
03 /SELF-SCORE
Score your own app against the twelve components.
The framework is published so any team can apply it. Walk the twelve components in order, score each one pass / fail / unknown, and budget two to four hours per component for an honest pass.
A./STEP
Walk each component in order
Read the component explainer above and the linked deep-dive remediation guide. For each one, write down whether your app passes, fails, or you do not yet know.
B./STEP
Reproduce every fail
A finding without a reproduction recipe is a guess. For every component you marked fail, write the exact steps that surface the bug. If you cannot reproduce, downgrade to unknown.
C./STEP
Decide: self-fix or outsource
Three or fewer unknowns and zero fails on security and auth? You probably do not need an outsourced audit. More than that? Order the $497 audit at /audit and a senior engineer applies the same framework in one business day.
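The decision rule above is mechanical enough to write down. The sketch below assumes you recorded one verdict per component in a dictionary; the component keys are our own labels, and the rule is exactly the one stated: three or fewer unknowns and no fail on security or auth.

```python
CRITICAL = ("security and RLS", "auth")


def recommend(scores: dict[str, str]) -> str:
    """Apply the self-score rule to a dict of component -> pass / fail / unknown."""
    unknowns = sum(1 for verdict in scores.values() if verdict == "unknown")
    critical_fail = any(scores.get(component) == "fail" for component in CRITICAL)
    if unknowns <= 3 and not critical_fail:
        return "self-fix"
    return "outsource"
```

A fail on a non-critical component, such as sitemap, does not by itself trigger the outsource recommendation under this rule. It still needs a reproduction recipe and a fix.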
04 /PROVENANCE
About the framework.
The twelve-component framework was assembled by Base44Devs across eleven independent audits run between November 2025 and April 2026. We started with the OWASP top ten and the Wiz / Imperva July 2025 disclosures, then added the components that kept surfacing as 'the AI did not know it had to do this': function timeouts, credit-burn projection, webhook idempotency under retry storms, RLS drift after AI edits, and the silent CSR-fallback pattern that hides marketing pages from Google.
Every component is grounded in a real failure observed in a real app, not in a theoretical threat model. The framework is published in full because publishing it is what makes it auditable — anyone can apply it, anyone can argue with it, anyone can extend it. The version on this page is rev. 2026-05. Read about the team that built the framework.
If you want a senior engineer to apply the framework against your specific app and ship a written PDF report inside one business day, that is the productized audit at /audit — fixed price, $497, refundable against any subsequent fix or build engagement.
05 /FAQ
Frequently asked questions
Q.01 What does an audit actually check on a base44 app?
An audit walks 12 components — security and RLS, the auth surface, performance, SEO, function timeouts, credit-burn rate, RBAC and multi-tenant scoping, webhook reliability, DNS, Open Graph, sitemap, and error monitoring — and produces a written report with reproduction steps for every issue. The framework is the same for every audit so the report is comparable across apps and across time. Findings are ranked critical / high / medium and shipped with a remediation path for each one.
Q.02 Why do AI-generated apps need a different audit than hand-written ones?
Because AI-generated code fails in a different shape. Hand-written code typically fails on the parts the engineer thought hardest about — the AI fails on the parts it did not realise were connected. RLS policies that did not get updated when a column was renamed, webhook handlers that were generated without idempotency, OAuth callbacks that decode tokens but do not verify them. The audit framework is designed to catch the integrity gaps the AI does not flag, not the style issues a code review would.
Q.03 Can I score my own app against this framework without buying an audit?
Yes — that is what this page is for. Walk the 12 components, read the matching /fix deep-dive for each one, and score your app pass / fail / unknown on every line. If you finish with three or fewer 'unknown' verdicts and zero 'fail' on the security and auth components, you probably do not need an outsourced audit. If you finish with more, the productized audit at /audit runs the same framework against your specific app in one business day for $497.
Q.04 How long does the framework take to apply?
A senior engineer applying the framework end-to-end against a typical 30-table app takes approximately 340 minutes of focused work — 110 individual pass/fail checks across the 12 components. A team applying it to their own codebase should budget two to four hours per component, more if any of them surface issues that need investigation. The Base44Devs productized audit at /audit ships the result in one business day because the framework is rehearsed and the tooling is preloaded.
Q.05 How is this different from a generic security audit?
A generic security audit assumes a hand-written codebase and looks for the OWASP top ten. The base44-specific framework looks for the AI-builder failure modes — RLS drift after schema edits, webhook handlers without idempotency, function timeouts on operations the AI scaffolded for ten items, credit-burn from prompts that should have been cached. Security and RLS is one of the twelve components, but the other eleven are platform-specific and AI-generation-specific. A generic auditor will miss most of them.
Q.06 What if I find issues during self-scoring?
Each of the 12 components on this page links to the matching deep-dive remediation guide under /fix. Read the relevant /fix page, follow the reproduction steps, and ship the patch. If the issue is structural enough that you do not want to fix it yourself, the productized audit at /audit produces a written remediation plan with effort estimates so you can hand the work to any competent engineer (us, your in-house team, a freelancer) without re-explaining the framework.
Q.07 Where did the 12-component framework come from?
The framework was assembled from 11 independent audits Base44Devs ran against AI-built apps between November 2025 and April 2026. We started with the OWASP top ten and the Wiz / Imperva July 2025 disclosures, then added the components that kept surfacing as 'the AI did not know it had to do this': function timeouts, credit-burn, webhook idempotency, and RLS drift after AI edits. The framework is published in full so it is auditable and so anyone can apply it. See the framework page in the /about section for the rationale.
06 /NEXT STEP
Want us to run this on your app?
The same framework, applied by a senior engineer, written PDF report in one business day. $497 fixed-fee. Refundable against any subsequent fix or build engagement.