The $300/Hour Cross-Reference Problem
A VP of Regulatory at a Series B oncology biotech sent me a stack of six invoices last quarter and asked a question I get a lot: "Tell me what I should actually be paying for."
Here's the answer up front, because it's the only line that matters: AI doesn't replace your consultant. It replaces the part of your consultant's bill that's making you wince. The rest of this post is the math behind that sentence.
The invoices ranged from $8,400 to $52,000. Different firms, different scope, mostly preclinical compliance reviews ahead of an IND. We sat down with a printout and a yellow highlighter. The exercise took an hour. Here's what we found.
What the line items actually say
Consulting invoices follow a standard format: one line per task, with hours, a rate, and a total. Rates ran from $295/hr to $510/hr depending on the consultant's title. The line descriptions, though, fall into three buckets.
Bucket 1: Cross-referencing. Phrases like "review tox study reports against ICH M3(R2)," "cross-check genotoxicity battery against ICH S2(R1)," "verify GLP compliance per 21 CFR Part 58." Hours: 4-12 per task. Rate: full senior rate.
Bucket 2: Strategic judgment. "Pre-IND meeting strategy session," "FDA interaction approach," "argue dose-justification position with division." Hours: 2-6 per task. Rate: full senior rate or higher.
Bucket 3: Document drafting. "Draft pre-IND briefing book section 4.2," "revise nonclinical overview." Hours: 8-24 per task. Rate: mixed senior/associate.
I ran the math across all six invoices. Bucket 1 was 58-64% of total billed hours. Call it 60%.
That's the cross-reference problem.
What cross-referencing actually is
When a consultant "reviews your tox package against ICH M3," here's what happens. They open your repeat-dose toxicology report. They open M3(R2). They check whether the species selection meets the M3 criteria. They check whether the dose levels meet the M3 50×-AUC rule (and its footnote — see our weekend read of all 24 ICH guidelines). They check whether the duration matches the planned clinical duration per the M3 table. They flag gaps.
That's it. That's the work. It's important work — miss a gap and you eat a clinical hold — but the work itself is mechanical. It's structured comparison between a structured document (the study report) and a structured rule (the guideline).
This is exactly the kind of thing computers do well. Pattern-match a finding against a rule, return a citation, flag the delta. We built RegFo's rules engine around 373 of these comparisons. It does the bucket-1 work in 30 seconds for the cost of a few API calls.
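If "structured comparison between a structured document and a structured rule" sounds abstract, here's roughly what one such check looks like as code. This is a minimal sketch, not RegFo's engine: the field names, thresholds, and citation strings are simplified for illustration, and the real guideline text carries qualifiers this toy version ignores.

```python
from dataclasses import dataclass

@dataclass
class ToxStudy:
    species: str             # e.g. "rat", "cynomolgus monkey"
    duration_weeks: int      # repeat-dose study duration
    exposure_margin: float   # animal AUC at the high dose / projected clinical AUC

@dataclass
class Finding:
    rule_id: str
    citation: str
    message: str

def check_against_m3(study: ToxStudy, planned_clinical_weeks: int) -> list[Finding]:
    """Two illustrative M3(R2)-style cross-checks: high-dose margin and duration."""
    findings: list[Finding] = []

    # High-dose selection: a 50x clinical exposure (AUC) margin is one accepted basis.
    if study.exposure_margin < 50:
        findings.append(Finding(
            rule_id="M3-HIGH-DOSE",
            citation="ICH M3(R2), high-dose selection",
            message=(f"{study.species} high-dose margin is {study.exposure_margin:.0f}x "
                     "clinical AUC; below 50x, so another justification "
                     "(MTD, MFD, limit dose) needs to be documented."),
        ))

    # Duration: the repeat-dose study should cover the planned clinical dosing duration.
    if study.duration_weeks < planned_clinical_weeks:
        findings.append(Finding(
            rule_id="M3-DURATION",
            citation="ICH M3(R2), Table 1",
            message=(f"{study.duration_weeks}-week {study.species} study does not cover "
                     f"the planned {planned_clinical_weeks}-week clinical dosing."),
        ))

    return findings

# A 4-week rat study supporting a 12-week trial with a 30x margin: both checks fire.
for f in check_against_m3(ToxStudy("rat", 4, 30.0), planned_clinical_weeks=12):
    print(f.rule_id, "-", f.message)
```

The point isn't this specific rule. It's that every one of those comparisons reduces to the same shape: read a structured fact out of the study report, compare it against a threshold or a table in the guideline, emit a finding with a citation.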
This isn't hypothetical. We've run side-by-side comparisons against real consultant deliverables (with permission, redacted) on five preclinical packages. Overlap on findings was 90%+ for the mechanical gaps. The cases where the consultant caught something we didn't were almost all bucket-2 strategic judgment, which we don't claim to do.
What's actually worth $400 an hour
Bucket 2. Strategic judgment is what you're paying senior consultants for, and it's worth every dollar.
A few examples from the same invoices:
- "Recommend dose-justification framing for FDA division X based on prior interactions with reviewer Y." That's tribal knowledge. Nobody automates that.
- "Negotiate scope of additional juvenile tox study requirement during pre-IND meeting." That's a live conversation with a regulator. AI doesn't sit in the meeting.
- "Argue for non-rodent biologic chronic study at 6 months instead of 9 per ICH S6(R1)." That's a defensible regulatory position requiring judgment about how a specific reviewer will respond. Worth the rate.
Bucket 3 is a mixed bag. Document drafting is partially automatable, partially not. The structured sections — the parts that are basically templated narrative around your data — those compress nicely. The strategic narrative around your dose justification or your benefit-risk position is judgment work, same as bucket 2.
The fix isn't "fire your consultant"
When I show this analysis to RA leads, the first reaction is usually "great, I'll cancel the consultant." That's not the move.
The right move is to shift the ratio. If your consultant is currently spending 60% of billed hours on bucket-1 cross-referencing and 40% on bucket-2 judgment, you're paying $300+/hr for work that should cost $5 in API calls. Swap it: do the bucket-1 work in software, then bring the consultant in for bucket 2.
I've watched teams do this, and the consultant relationship gets better. The consultant stops being the person you call to verify your tox table matches M3. They become the person you call when you need to argue with FDA about a specific reviewer's quirks. They like this better too: most senior regulatory consultants didn't get into the field to spend their evenings cross-referencing tables.
The reframe: AI doesn't replace your consultant. It replaces the part of your consultant's bill that's making you wince.
What this looks like in numbers
Take a typical pre-IND preclinical review. Old model: $25,000 consultant engagement, 60% bucket 1 ($15,000), 40% bucket 2 ($10,000). Ten weeks elapsed time.
New model: software check ($399/mo), 30 seconds, catches the bucket-1 gaps. Then a $10,000 consultant engagement scoped to bucket-2 judgment only. Total: $10,400. Three weeks elapsed time.
Saved: $14,600 per submission. About 7 weeks of clock. The consultant gets paid for the work they're actually best at.
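If you want to rerun that arithmetic with your own invoice numbers, it's a one-liner. Here's a sketch using the figures from the example above; the function name and the assumption that one month of software covers the check are mine, not a quote.

```python
def consulting_savings(engagement_cost: float,
                       bucket1_share: float,
                       software_monthly: float) -> float:
    """Back-of-the-envelope savings from moving bucket-1 work into software.

    Assumes the consultant engagement is re-scoped to bucket-2 work only and
    that one month of the software fee covers the check. Inputs are illustrative.
    """
    bucket2_cost = engagement_cost * (1 - bucket1_share)
    return engagement_cost - (bucket2_cost + software_monthly)

# The example above: $25,000 engagement, 60% bucket 1, $399/mo software.
print(consulting_savings(25_000, 0.60, 399))  # 14601.0, the ~$14,600 in the text
```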
That's the math we run for buyers. It's also why VPs of Regulatory are the easiest people on a biotech leadership team to sell to right now: they've been staring at these invoices for years.
The fastest test: pull up your most recent consultant invoice. Pick out the bucket-1 lines (cross-reference, verify, check against guideline). Then paste the same protocol into a free workspace. 30 seconds. Compare what the engine flags against what you paid for. If the overlap is what we see in our side-by-sides, you've just found 60% of next quarter's consulting budget.
For the broader question of when each model makes sense, our RegFo vs. consultants post goes deeper.