A screening call that filters 90 percent of the market
How to vet a Base44 developer in one forty-five-minute call: ask seven specific questions that test for production experience on the platform, not for generic vibe-coding skill. The questions cover portfolio with real production apps, Row Level Security rule knowledge, backend function debugging history, SDK decoupling experience, schema migration scars, async tool integration patterns, and verifiable references. Each question has a green-flag answer pattern to listen for and a red-flag answer that should end the call. The seven-question format filters out roughly nine in ten candidates who claim Base44 experience, because most have built one personal project and have never shipped production code that survived contact with real users, scale, or the platform's known failure modes.
This post is the working interview script — the seven questions, the green-flag answers, the red-flag answers. It is part of our larger Base44 developer vetting guide, which is the full framework around the call: the five-gate vetting model, the portfolio-review checklist, the paid trial task brief, the six red flags, the rate benchmarks, the engagement-letter must-haves, and the reference-check protocol. Read this post for the call itself; read the hub for everything around it.
We have run roughly 180 candidate screening calls for our internal bench at base44devs.com over the last fourteen months. We extend offers to about 8 percent. The screening call is where most of the filtering happens. Skill assessments and paid tests happen after — they are too expensive to run on the volume of candidates who reach out claiming to be Base44 specialists.
The seven questions below are what the call covers. Each one targets a specific failure mode that we have seen kill real engagements. The green-flag answer pattern is what a candidate who has actually shipped production Base44 apps will say. The red-flag pattern is what a candidate who has built one weekend personal project and is overclaiming will say. The patterns are stable across roles, rates, and seniority levels.
Founders who use this format report cutting bad hires by roughly two-thirds. The unit economics work: a forty-five-minute screening call that prevents a fifteen-thousand-dollar bad engagement is the highest-ROI block of time in the hiring process.
Question 1: "Can you share the URL of a production Base44 app where you were the primary developer?"
This is the first question because it is the highest-signal question. About 35 percent of candidates fail it on the first ask.
Green flag. The candidate gives you a URL within ten seconds. The URL resolves to an app with real user traffic — sign-up flow that actually creates an account, a homepage that mentions specific customers or use cases, a pricing page with a real Stripe checkout. The candidate walks you through which parts of the app they built, where they used Base44 entities versus where they wrote backend functions, and which decisions they would make differently now. The walkthrough is specific and unrehearsed. They are comfortable pulling up the IDE and showing you the structure of a backend function or the schema of an entity. If they signed an NDA on the app, they say so directly and offer to walk through a different app or to do a recorded screen-share where they describe the work without showing client-identifying details.
Red flag. The candidate hesitates, then sends you a Figma file, a personal portfolio site that is not on Base44, or a tutorial app they built following the platform's onboarding flow. They describe the app in generic terms — "a SaaS app," "a project management tool" — without showing you specific features they built. They say the app is "no longer live" or "in private beta with no public URL" for every example. They send you a generated app that obviously came out of a single prompt with no follow-up work.
We track this question's hit rate carefully. About 12 of our last 30 final hires gave a clean answer to question one within twenty seconds. The rest needed prompting and gave smaller portfolios. We have never extended an offer to a candidate who could not produce a single production URL.
Question 2: "Walk me through a Row Level Security rule you shipped to production. What was the rule, who was it protecting, and how did you test it?"
Row Level Security is the single most common production gap we see in Base44 apps. About 60 percent of the candidates who claim Base44 experience cannot answer this question concretely.
Green flag. The candidate names a specific entity — Orders, Documents, TeamMembers — and describes the exact rule expression in plain English. They explain who the rule was protecting from whom (typically: a tenant's data from being read or modified by other tenants in a multi-tenant app, or a user's private records from being read by other users on the same account). They describe how they tested it — opening a second browser as a different user, attempting to call Entity.list or Entity.get(otherUserId) from the console, and confirming the call returned an empty result or a 403. They mention the gotcha that RLS rules apply differently to read versus write operations and that the IDE separates them. They mention that AI-generated code frequently bypasses RLS by using admin SDK calls inside backend functions and that they audit for this.
Red flag. The candidate describes RLS in textbook terms — "row level security restricts which rows a user can see" — without naming a specific rule they shipped. They cannot tell you where in the Base44 IDE the rule editor lives. They have never tested a rule with a second user account. They say they "use roles instead of RLS" without being able to explain that Base44's role system runs on top of RLS, not as a replacement. They confuse RLS with frontend conditional rendering ("I just hide the button if the user does not own the record") which is not security at all.
This question maps directly to the RLS out-of-sync fix, which is the single fix we ship most often in audit engagements. If your developer cannot answer this question, your app probably has an RLS hole right now.
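The second-user test from the green-flag answer can be written down as a repeatable check. The sketch below is a minimal illustration and does not call the Base44 SDK: `FakeDocumentStore`, `crossUserLeaks`, and the owner-based read rule are all hypothetical stand-ins, chosen only to show the shape of the test (create a record as one user, try to read it as another, expect nothing back).

```typescript
// Hypothetical in-memory stand-in for an entity store with an owner-based
// read rule. None of these names come from the Base44 SDK.
type Doc = { id: string; ownerId: string; title: string };

class FakeDocumentStore {
  private rows: Doc[] = [];

  create(asUser: string, id: string, title: string): Doc {
    const row: Doc = { id, ownerId: asUser, title };
    this.rows.push(row);
    return row;
  }

  // Read rule: a user only sees rows they own.
  list(asUser: string): Doc[] {
    return this.rows.filter((r) => r.ownerId === asUser);
  }

  // Returns null instead of the row when the caller is not the owner.
  get(asUser: string, id: string): Doc | null {
    const row = this.rows.filter((r) => r.id === id)[0];
    return row !== undefined && row.ownerId === asUser ? row : null;
  }
}

// The cross-user check: nothing owned by `owner` may be visible to `intruder`.
function crossUserLeaks(store: FakeDocumentStore, owner: string, intruder: string): string[] {
  const leaks: string[] = [];
  const ownedIds = store.list(owner).map((r) => r.id);
  const intruderVisibleIds = store.list(intruder).map((r) => r.id);
  for (const id of ownedIds) {
    if (intruderVisibleIds.indexOf(id) !== -1) leaks.push(`list exposed ${id}`);
    if (store.get(intruder, id) !== null) leaks.push(`get exposed ${id}`);
  }
  return leaks;
}

const docStore = new FakeDocumentStore();
docStore.create("alice", "doc-1", "Alice's contract");
docStore.create("bob", "doc-2", "Bob's invoice");

const foundLeaks = crossUserLeaks(docStore, "alice", "bob");
console.log(foundLeaks.length === 0 ? "no cross-user leaks" : foundLeaks.join("\n"));
```

A candidate with real RLS experience will describe something equivalent against the live app: the same check run with two real accounts, covering both list-style and get-by-id reads.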
Question 3: "Tell me about the most painful backend function bug you have shipped a fix for in production."
Backend functions are where Base44 apps actually break. ISOLATE_INTERNAL_FAILURE, the 405 routing bug, token expiry inside long-running operations, Deno import incompatibilities — none of these show up in tutorials. Developers learn them by shipping and recovering.
Green flag. The candidate names a specific platform error or symptom. The story has a clear arc: the symptom users reported, what the function logs showed, what the candidate hypothesized, what they tried first that did not work, what they tried second that did work, and how they prevented the class of bug from recurring. They mention the platform-specific gotchas — that function logs roll quickly so they ship logs out to an external service, that they wrap every SDK call in try/catch because default platform error messages leak implementation details, that they snapshot before deploying any function change because revert is the fastest recovery path. They acknowledge the platform's debugging limitations and have built workarounds for them.
Red flag. The candidate gives a generic "I had a bug and I fixed it" story with no specifics. They cannot name a single Base44 error message they have personally debugged. They claim they have never had a backend function bug in production, which is either a lie or evidence they have never run a backend function in production. They describe debugging by deleting the function and asking the AI to regenerate it, which is the pattern we see in roughly 18 percent of cases that escalate to us as emergency rescues.
The Base44 error reference catalogs the errors a real production developer should have seen. If their answer does not include at least one error from that list, you are talking to someone who has not shipped to production on this platform.
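Two of the habits in the green-flag answer, wrapping calls in try/catch and shipping logs to an external service, fit in one small helper. This is a sketch under stated assumptions: `safeCall`, `shipLog`, and the `shippedLogs` array are hypothetical names, and the array stands in for whatever external log destination the developer actually uses.

```typescript
// Hypothetical helper names; the array below stands in for an external log service.
type LogEntry = { at: string; label: string; detail: string };
type Result<T> = { ok: true; value: T } | { ok: false; publicMessage: string };

const shippedLogs: LogEntry[] = [];

// Records the full error detail where only the team can see it.
function shipLog(label: string, err: unknown): void {
  const detail = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  shippedLogs.push({ at: new Date().toISOString(), label, detail });
}

// Wraps any async call: full detail goes to the log sink, while the caller
// only ever receives a generic message that leaks no implementation details.
async function safeCall<T>(label: string, fn: () => Promise<T>): Promise<Result<T>> {
  try {
    return { ok: true, value: await fn() };
  } catch (err) {
    shipLog(label, err);
    return { ok: false, publicMessage: `Something went wrong (${label}). Please try again.` };
  }
}
```

A call site would pass its data call as the second argument, for example `safeCall("loadOrders", () => fetchOrders())`, where `fetchOrders` is whatever function performs the real request, and then branch on `result.ok`.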
Question 4: "Have you ever decoupled an app from the Base44 SDK so the data layer could be swapped or migrated? Walk me through the approach."
SDK decoupling is the test for senior-tier judgment. Base44's SDK is convenient but creates lock-in: Entity.list(), User.me(), functions.x() calls scattered across hundreds of components become a migration cliff. Developers who have only built greenfield apps have never had to think about this.
Green flag. The candidate describes a specific pattern. The most common one we hear from strong candidates is a thin repository layer — a repositories/orders.ts module that wraps Entity calls behind a stable interface, so the rest of the app calls OrdersRepo.list() instead of Entity.list("Order"). They explain that the wrapper layer pays off when you need to swap Base44 for Supabase or Postgres, when you need to add caching, when you need to add observability, and when you need to mock the data layer for tests. They mention that they introduce the wrapper at the start of any engagement that might exceed six months because retrofitting it later is expensive. They have read or can describe the patterns in the SDK reference and vendor lock-in deep dive.
Red flag. The candidate has never heard of the problem. They say SDK decoupling is "premature optimization" and the SDK calls "are easy to find and replace later." This is wrong, and any developer who has actually tried to migrate a Base44 app will tell you it is wrong. They confuse SDK decoupling with general code organization. Or they swing the other way and describe an over-engineered wrapper with five layers of abstraction that no real engagement would ship.
```typescript
// Green-flag pattern: a thin wrapper that buys migration optionality
// without adding complexity to call sites.
// repositories/orders.ts
import { Entity } from "@base44/sdk";

export type Order = {
  id: string;
  customerId: string;
  total: number;
  createdAt: string;
};

export const OrdersRepo = {
  async list(filter: { customerId?: string } = {}): Promise<Order[]> {
    return Entity.list<Order>("Order", filter);
  },
  async get(id: string): Promise<Order | null> {
    return Entity.get<Order>("Order", id);
  },
  async create(input: Omit<Order, "id" | "createdAt">): Promise<Order> {
    return Entity.create<Order>("Order", input);
  },
};
```
If they hand-wave this question, they will leave your app with hundreds of direct SDK calls and a six-figure migration bill when you outgrow the platform.
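The payoff the strong candidates mention, swapping the backend or mocking the data layer for tests, is easier to see with a second implementation behind the same shape. The sketch below is an assumption-laden illustration: `OrdersRepository` and `createInMemoryOrdersRepo` are hypothetical names, and the `Order` type simply mirrors the one in the repository module above.

```typescript
// Mirrors the Order type from repositories/orders.ts. The interface and
// factory names are hypothetical, introduced only for this illustration.
type Order = { id: string; customerId: string; total: number; createdAt: string };

interface OrdersRepository {
  list(filter?: { customerId?: string }): Promise<Order[]>;
  get(id: string): Promise<Order | null>;
  create(input: Omit<Order, "id" | "createdAt">): Promise<Order>;
}

// An in-memory implementation, usable as a test double or a migration target.
function createInMemoryOrdersRepo(seed: Order[] = []): OrdersRepository {
  const rows: Order[] = seed.slice();
  let nextId = rows.length + 1;
  return {
    async list(filter: { customerId?: string } = {}): Promise<Order[]> {
      return rows.filter(
        (o) => filter.customerId === undefined || o.customerId === filter.customerId
      );
    },
    async get(id: string): Promise<Order | null> {
      const match = rows.filter((o) => o.id === id)[0];
      return match === undefined ? null : match;
    },
    async create(input: Omit<Order, "id" | "createdAt">): Promise<Order> {
      const order: Order = {
        ...input,
        id: `order-${nextId++}`,
        createdAt: new Date().toISOString(),
      };
      rows.push(order);
      return order;
    },
  };
}

// Call sites depend on the interface, so they cannot tell which backend is behind it.
async function totalSpend(repo: OrdersRepository, customerId: string): Promise<number> {
  const orders = await repo.list({ customerId });
  return orders.reduce((sum, o) => sum + o.total, 0);
}
```

Because `totalSpend` only knows about the interface, the same function runs unchanged against the SDK-backed repository in production and the in-memory one in tests.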
Question 5: "Tell me about a Base44 schema migration where you had to add or rename a field on an entity with live production data."
Schema migrations are where data loss happens. Base44 does not have first-class migrations the way Rails or Django do — adding a non-nullable field to an entity with existing rows requires careful coordination, and rolling back is often impossible.
Green flag. The candidate describes a specific migration. They walk through the sequence: snapshot the entity first (because Base44's revert system is the only real rollback path), add the new field as nullable, backfill existing rows in a backend function with explicit logging, deploy the UI code that reads the new field, only then add validation or make the field required. They mention that they always run the migration on a test workspace first by copying the entity. They mention that they have hit the hallucinated fields bug where the AI generates code referencing a field that does not exist, and that they audit for this after every AI edit. They describe the data loss return-to-app bug and how they have shipped fixes for it. The whole answer is grounded in specifics; the candidate has the scars.
Red flag. The candidate has never done a schema migration with live data. They describe schema changes in greenfield terms ("I just delete the old field and add a new one"), which loses data and produces blank screens for users whose UI was reading the old field. They have never used the snapshot system. They have never tested a migration on a separate workspace. They suggest using the AI agent to "just regenerate the schema," which is how 23 percent of the data-loss incidents we audit begin.
The schema migration best practices guide covers the safe pattern. A developer who cannot recite it is one bad prompt away from corrupting your production data.
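The backfill step in that sequence (new field added as nullable, existing rows filled in with explicit logging) can be sketched as a small, re-runnable function. Everything here is hypothetical: the `OrderRow` shape, the `legacyState` to `status` mapping, and the in-memory array that stands in for the entity's rows. It is meant to show the pattern, not any platform API.

```typescript
// Hypothetical row shape: `status` is the new field, added as nullable first.
type OrderRow = { id: string; legacyState?: string; status: string | null };

type BackfillReport = { updated: string[]; skipped: string[]; failed: string[] };

// Derives the new field from existing data; returns null when it cannot decide.
function deriveStatus(row: OrderRow): string | null {
  if (row.legacyState === "done") return "completed";
  if (row.legacyState === "open") return "pending";
  return null;
}

// Idempotent backfill: rows that already have a value are skipped, so the
// function can be re-run safely after a partial failure.
function backfillStatus(rows: OrderRow[], log: (line: string) => void): BackfillReport {
  const report: BackfillReport = { updated: [], skipped: [], failed: [] };
  for (const row of rows) {
    if (row.status !== null) {
      report.skipped.push(row.id);
      continue;
    }
    const derived = deriveStatus(row);
    if (derived === null) {
      report.failed.push(row.id);
      log(`FAILED ${row.id}: cannot derive status`);
      continue;
    }
    row.status = derived;
    report.updated.push(row.id);
    log(`UPDATED ${row.id} -> ${derived}`);
  }
  return report;
}

const sampleRows: OrderRow[] = [
  { id: "o1", legacyState: "done", status: null },
  { id: "o2", legacyState: "open", status: "pending" },
  { id: "o3", status: null },
];
const firstRun = backfillStatus(sampleRows, (line) => console.log(line));
console.log(`rows still needing manual review: ${firstRun.failed.length}`);
```

The field should only be made required once a run reports an empty `failed` list; until then the nullable column keeps the app working for rows the backfill could not resolve.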
Question 6: "What is your approach when a Base44 integration with a third-party tool — Stripe, Zapier, Twilio — fails silently in production?"
Integration failure is where many Base44 apps lose revenue or user trust. Stripe subscription webhooks that fire only when users are active (a documented platform bug), Zapier triggers that drop on rate limits, Twilio sends that succeed at the API layer but fail at the carrier — these are the bugs that take down conversion funnels.
Green flag. The candidate's answer starts with observability. They describe shipping logs from every integration point to an external service like Logtail, Axiom, or a free Loki instance, because Base44's platform logs roll too quickly for post-incident forensics. They describe instrumenting every webhook with idempotency keys and replay handling. They mention specific failure modes — Stripe signature validation failures, Zapier deduplication windows, Twilio status-callback handling — and what their pattern is for each. They describe synthetic monitors — a five-minute external cron that hits a known integration endpoint and pages them on failure — because the platform's status page does not reflect per-tenant integration breakage. They mention they have read or written the equivalent of the Stripe integration guide and the webhooks complete guide.
Red flag. The candidate's answer is "I check the logs in the IDE." This is not an answer. Platform logs in the Base44 IDE expire too fast to be useful for any incident that started more than a few hours ago. They have never set up an external log destination. They have never shipped a synthetic monitor. They believe webhook signature validation is optional. They describe a pattern where every webhook handler creates a database record on every call without idempotency, which produces duplicate charges and duplicate sends under retry conditions.
Integration debugging is where senior developers earn their rate. A candidate who cannot answer this question with a coherent observability story is one outage away from a refund-and-apology email to your largest customer.
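The duplicate-charge failure described in the red-flag answer comes from handlers that act on every delivery, including retries. A minimal sketch of the idempotent alternative follows. The `WebhookEvent` shape and `IdempotentHandler` class are hypothetical and not tied to any provider's real payload format, and the in-memory set stands in for durable storage, which a real handler would need so the record survives restarts.

```typescript
// Hypothetical event shape; real providers send richer payloads.
type WebhookEvent = { eventId: string; kind: string; amountCents: number };

class IdempotentHandler {
  // In production this would be durable storage, not process memory.
  private seen = new Set<string>();
  public charges: number[] = [];

  // Returns "processed" the first time an event id is seen and "duplicate"
  // on every retry, so the side effect happens at most once per event.
  handle(evt: WebhookEvent): "processed" | "duplicate" {
    if (this.seen.has(evt.eventId)) return "duplicate";
    this.seen.add(evt.eventId);
    this.charges.push(evt.amountCents);
    return "processed";
  }
}

const webhookHandler = new IdempotentHandler();
const paymentEvent: WebhookEvent = { eventId: "evt_1", kind: "payment", amountCents: 4900 };

const firstDelivery = webhookHandler.handle(paymentEvent);
const retriedDelivery = webhookHandler.handle(paymentEvent); // provider retry of the same event

console.log(firstDelivery, retriedDelivery, webhookHandler.charges.length);
```

The design choice worth probing on the call is where the "seen" record lives and whether it is written in the same transaction as the side effect; candidates who have been burned by retries usually have an opinion on both.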
Question 7: "Give me the URL or contact details of one production app you built where I can ask the founder whether they would hire you again."
This is the reference question. It is last because the candidates who fail the first six rarely make it to this one.
Green flag. The candidate gives you a URL and the founder's email within thirty seconds. They tell you up front what the engagement was — duration, scope, hourly rate or fixed fee. They warn you about any context the reference might mention — "the project ran two weeks over because we changed scope" — rather than letting you discover it. They have done this many times and are comfortable with the process. When you contact the reference, the founder responds within forty-eight hours and gives you a specific, unhedged answer: yes, with reasoning, naming what the developer did well and what they would have wanted differently.
Red flag. The candidate offers references but cannot produce a URL or contact within twenty-four hours. The reference list is a list of friends, fellow freelancers, or other developers — not founders or product owners who paid for the work. The reference, when contacted, gives a hedged answer ("they were... fine," "I would consider them for smaller projects") which is industry code for no. The candidate becomes defensive when you ask for references at all, which is the strongest single negative signal we have observed.
About 25 percent of the references we contact give a hedged answer. We treat hedged answers as a hard no. We have never extended an offer to a candidate whose primary reference hedged.
How to score the call
Use a binary pass-fail per question. Five out of seven passes is the cutoff. Below that, the candidate is not a Base44 specialist regardless of how strong they are on adjacent skills.
Questions 1, 2, and 7 are the highest-weighted. A fail on any of those three is a fail for the whole screen, even if every other question passes. The reason: you cannot fix a candidate who has not shipped production Base44 code, you cannot fix a candidate who does not know RLS, and you cannot afford to skip the reference check.
Questions 3, 4, 5, and 6 are graduated. A junior-tier hire might fail one of those four and still be a fit for a junior engagement under supervision. A senior-tier hire should pass all four cleanly.
Pricing should track the result. A candidate who passes all seven cleanly is in the top decile of the market and prices like it — typically eighty to a hundred and fifty dollars per hour for project-based work, more for retainer arrangements. A candidate who passes five or six is mid-tier and should price in the forty to seventy-five dollar range. Below five passes, the candidate is not worth hiring at any rate; the cost of cleanup will exceed the savings.
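The scoring rules above are simple enough to write down as a function, which helps when several people run calls and need to score them the same way. This is a sketch of one reading of those rules: the `scoreCall` name, the tier labels, and the seven-element boolean array (index 0 is question 1) are assumptions made for the example.

```typescript
type Tier = "top" | "mid" | "no-hire";
type ScreenResult = { passes: number; tier: Tier; reason: string };

// Questions 1, 2, and 7 are gates: failing any one fails the whole screen.
const GATE_QUESTIONS = [1, 2, 7];
const CUTOFF = 5;

// `passed[i]` is the pass/fail result for question i + 1.
function scoreCall(passed: boolean[]): ScreenResult {
  if (passed.length !== 7) {
    throw new Error("expected exactly seven pass/fail results");
  }
  const passes = passed.filter((p) => p).length;
  const failedGates = GATE_QUESTIONS.filter((q) => !passed[q - 1]);

  if (failedGates.length > 0) {
    return { passes, tier: "no-hire", reason: `failed gate question(s) ${failedGates.join(", ")}` };
  }
  if (passes < CUTOFF) {
    return { passes, tier: "no-hire", reason: `only ${passes} of 7 passed` };
  }
  return passes === 7
    ? { passes, tier: "top", reason: "clean pass on all seven" }
    : { passes, tier: "mid", reason: `${passes} of 7 passed, all gates cleared` };
}

// Example: fails questions 3 and 5, passes everything else.
const verdict = scoreCall([true, true, false, true, false, true, true]);
console.log(verdict.tier, verdict.reason);
```

Note that a candidate can pass six of seven and still be a no-hire if the one miss is a gate question, which is the case the plain five-of-seven cutoff would otherwise let through.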
What this format does not catch
Three things the seven questions cannot test for in a single call.
Communication style under stress. The candidate is on their best behavior on the call. How they communicate when a production incident is live, when scope is changing, when the AI agent has shipped a regression — that is invisible. The only fix is a paid trial of at least three working days before extending a longer engagement.
Code quality at scale. A developer can describe good patterns on a call and still ship code with poor separation of concerns, missing error handling, and hidden coupling. The only fix is a code review of their actual work — either via a paid test or via the reference's working repository if they grant access.
Cultural fit. Whether the developer's working style matches yours — async versus sync, written versus verbal, structured versus improvisational — is not a technical question and the seven questions do not cover it. Run a separate fifteen-minute fit conversation after the technical screen passes.
When to skip the screening entirely
If your engagement is under a thousand dollars or under one working week, the seven-question screen is overkill. For tiny engagements, the right move is to hire from a pre-vetted bench where the screening has already happened, accept that the rate will be higher than you would pay on the open market, and treat the premium as insurance against a bad-hire blowup.
If your engagement is over fifty thousand dollars or over three months, the screening is necessary but not sufficient. Add a paid test, a reference check on at least two prior clients, and a thirty-day trial period before committing to the full scope.
Get a vetted Base44 developer without running the calls yourself
We have run roughly 180 of these screening calls over the last fourteen months, and we place candidates on our internal bench at base44devs.com when they pass. The bench is currently nineteen developers across timezones, with a median of forty-two months of production React experience and a median of nineteen months of Base44 experience specifically. Average time from request to a vetted candidate assigned to your project is four working days. Hourly rates run from sixty to one hundred eighty dollars depending on seniority and engagement type.
Hire a vetted Base44 developer — we match you to a developer from the bench within four working days, and the matched developer has already cleared the seven-question screen plus a paid technical test.
Related
- Base44 developer vetting guide — the hub guide this post sits inside: five-gate framework, twelve technical questions, six red flags, rate benchmarks, and engagement-letter checklist.
- Base44 developer vetting checklist — the longer-form 32-point vetting protocol for engagements over fifty thousand dollars.
- Red flags when hiring a Base44 developer — the warning signs that should kill a candidate before the screening call.
- Base44 developer job description template — the JD that filters most non-specialists out of your inbound funnel.