
One IND Per Workspace: Why We Made Isolation the Default

Most AI compliance tools treat your data like a shared bucket. We don't. Here's why workspace isolation is the only architecture that survives a pre-IND IP question.

Rebecca Lauren
Head of Regulatory Intelligence

Pre-IND data is the most sensitive thing a Series B biotech owns. It tells a competitor your dose-ranging strategy, your tox findings, your CMC vendor, and the exact moment your IND clock starts. If that leaks before you file, you don't get to un-leak it.

The damage isn't theoretical. A leaked CMC vendor list shaves months off a competitor's manufacturing path. A leaked tox finding hands them a free read on whether your modality survives 28-day repeat-dose. And a leaked IND timeline tells the patent strategist down the street exactly when your composition-of-matter clock starts ticking.

So when we sat down to design how RegFo stores customer documents, the first question was: what's the smallest unit of isolation we can offer without making the product unusable? The answer turned out to be one workspace per program. Not per company. Per program.

This post is about why.

The shared-bucket problem

Most B2B SaaS handles tenancy by putting everyone in a shared database with a tenant_id column. Row-level security filters the rows. It works for CRMs and project trackers because the cost of a leak is embarrassing, not existential.

For pre-IND data, that math doesn't work. One bug in the filter logic, one misconfigured cache, or one stray query missing its WHERE tenant_id clause, and you've handed Sponsor B a copy of Sponsor A's tox package. There's no patch that fixes that. The data is out.
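To make the failure mode concrete, here's a minimal sketch of the shared-bucket model described above, using an in-memory SQLite table as a stand-in for the shared database. The table and column names are illustrative, not any vendor's actual schema; the point is that tenant separation hangs on a single WHERE clause being present in every query.

```python
import sqlite3

# Every sponsor's documents share one table, separated only by tenant_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (tenant_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("sponsor_a", "28-day repeat-dose tox package"),
     ("sponsor_b", "CMC vendor shortlist")],
)

def docs_for(tenant_id):
    # The safety of the whole system rests on this one WHERE clause.
    return conn.execute(
        "SELECT title FROM documents WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()

def docs_buggy():
    # One forgotten filter and every tenant's rows come back together.
    return conn.execute("SELECT title FROM documents").fetchall()

print(len(docs_for("sponsor_a")))  # 1 row: correct
print(len(docs_buggy()))           # 2 rows: Sponsor A now holds Sponsor B's data
```

Row-level security policies harden this, but the blast radius of any bug is still every tenant in the table.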

We've watched it happen elsewhere. Samsung's ChatGPT incident in March 2023 was three engineers pasting source code into a browser tab over 20 days (CIO Dive, April 2023). The leak wasn't a sophisticated attack. It was the default data-retention behavior of a consumer AI tool meeting the default behavior of a paste shortcut.

If your vendor's architecture diagram has one big database at the bottom, you're one query away from being Samsung.

Why per-program, not per-company

Here's the part that surprised our beta users. We don't isolate at the company level. We isolate at the program level.

Two reasons.

First, biotech teams routinely run two or three programs in parallel. The CMC team for Program A talks to a different CRO than Program B. The preclinical lead on Program B might be a contractor who has no business reading Program A's tox data. Putting them in one shared "company workspace" recreates the problem at a smaller scale.

Second, regulatory findings travel. A finding flagged in your gene therapy program's genotoxicity package shouldn't auto-suggest itself in your small-molecule program's review. They're different drugs. Different rules. Different histories. Cross-contamination of findings is its own kind of leak — not of data, but of conclusions.

So one workspace = one program = one IND track. Members are invited explicitly. Documents live in their own object-storage prefix. The vector index for retrieval is scoped to the workspace ID at query time, not filtered post-hoc.
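The difference between query-time scoping and post-hoc filtering is easiest to see in code. Here's a minimal sketch under illustrative assumptions (the `WorkspaceIndex` class and a substring match standing in for vector similarity are hypothetical, not RegFo's implementation): each workspace gets its own index, and retrieval never touches any other workspace's index at all.

```python
from collections import defaultdict

class WorkspaceIndex:
    """One index per workspace: a query can only ever see this workspace's chunks."""
    def __init__(self):
        self._chunks = []

    def add(self, text):
        self._chunks.append(text)

    def search(self, query):
        # Toy similarity: substring match stands in for vector search.
        return [c for c in self._chunks if query.lower() in c.lower()]

indexes = defaultdict(WorkspaceIndex)  # workspace_id -> that workspace's own index
indexes["program-a"].add("Genotoxicity findings for AAV candidate")
indexes["program-b"].add("Genotoxicity assay for small-molecule lead")

def retrieve(workspace_id, query):
    # Scoped at query time: the other workspace's index is never consulted,
    # so there is no cross-workspace result set to filter (or to leak).
    return indexes[workspace_id].search(query)

print(retrieve("program-a", "genotoxicity"))
```

The post-hoc alternative searches one pooled index and then drops out-of-scope results; a bug in that drop step returns another program's findings. Here, the same bug class simply can't reach them.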

What we don't do

I'll be specific about the limits, because the marketing pages of every AI vendor I've seen blur this part.

We don't train on your data. Your documents go to the LLM (currently Anthropic's API and Google's Gemini for extraction) under the no-training terms in their enterprise agreements. Those vendors process the data; they don't keep it. We log requests for debugging, but the document text isn't part of any training corpus we control or contribute to.

We don't have SOC 2. We're a small team, and a Type 2 audit takes 6-12 months plus a vendor budget we don't have yet. If your security questionnaire has SOC 2 as a hard gate, we can't pass it today. We can pass most of the underlying controls — encryption at rest, encryption in transit, access logs, MFA, least privilege — and we'll show you the architecture. We just don't have the audit report.

We don't run on-prem. Pre-IND data lives in our infrastructure, not yours. If your security policy is on-prem only, we're not the right fit yet.

This is the part where most vendor blog posts pivot to a closing pitch. I'm going to do something different and tell you what to ask before you sign anything.

Five questions to ask any AI compliance vendor

These are the questions our biotech buyers actually ask, in the order they ask them.

  1. What's the tenancy model — shared database with row filters, or hard isolation per workspace? If they say "multi-tenant with strict isolation," ask which one.
  2. Where does my document text go after I upload it? Object storage prefix? Vector DB? Which third-party APIs see the raw text?
  3. If I delete a document, what gets deleted — the file, the embeddings, the cache, all three? "All three" is the only correct answer.
  4. Do your LLM providers retain prompt data? For how long? Anthropic's enterprise terms are public. Ask the vendor to point at the specific clause.
  5. What's the blast radius of a single misconfigured query? If the answer involves more than one customer's data, you have a shared-bucket problem.

If a vendor can't answer those in writing, that's the answer.
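Question 3 deserves a sketch, because "all three" is an architectural property, not a policy. In this hypothetical illustration (the three dicts stand in for object storage, a vector store, and an extraction cache; none of the names are any vendor's actual API), deletion is one operation that walks every store that ever touched the document:

```python
# Stand-ins for the three places document-derived data can live.
files, embeddings, cache = {}, {}, {}

def upload(doc_id, text):
    files[doc_id] = text
    embeddings[doc_id] = [len(w) for w in text.split()]  # stand-in for vectors
    cache[doc_id] = text.upper()                         # stand-in for cached extraction

def delete(doc_id):
    # All three stores in one operation, or the deletion is incomplete:
    # an orphaned embedding or cache entry is still your document text.
    for store in (files, embeddings, cache):
        store.pop(doc_id, None)

upload("tox-pkg", "28 day repeat dose report")
delete("tox-pkg")
assert all("tox-pkg" not in store for store in (files, embeddings, cache))
```

If a vendor's delete path only touches the file store, the embeddings remain queryable, which is the wrong answer to question 3 wearing a compliant-looking UI.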

The honest tradeoff

Per-workspace isolation costs us money. Each workspace gets its own vector index, its own storage prefix, its own retrieval scope. We can't pool embeddings to save on infra. The first time a user uploads a document, we re-extract it, even if a near-identical document is sitting in another workspace.

That's a real cost. We pay it because the alternative — sharing infrastructure across customer programs to save on bills — is the architecture that produces the kind of leak you can't recover from. For our buyers (Series A-C biotechs running two or three IND programs at once), the math works the other way: the cost of isolation is much smaller than the cost of one cross-tenant incident.

If you're evaluating us against a vendor with a different answer, put the five questions above to them in writing. Their answers will tell you whether you're buying isolation or a shared bucket.


Whether or not you end up using us, take those five questions to your current AI vendor this week. If they pass, great — you've stress-tested the relationship. If they don't, you've found something worth fixing before your pre-IND data is the test case.

When you're ready to see what isolated retrieval actually looks like, paste a protocol into a new workspace. First workspace is free, no credit card. If you want to go deeper, our vendor security framework and our take on whether AI belongs in regulatory work at all cover the rest.

Check your studies against 1,054 rules

Upload your preclinical reports. Get a compliance score in 3 minutes.

Try RegFo free