As an investor who’s met hundreds of B2B AI startups, I can tell you with confidence that one trend is crystal clear: modern AI products almost always depend on incumbents’ data, accessed via APIs — Slack, Salesforce, Zendesk, QuickBooks, Jira, LinkedIn and beyond. The typical AI-native app needs the valuable information stored in various systems of record to automate a given set of workflows (thus solving the Messy Inbox Problem). But what happens when these platforms start to hoard their data instead of allowing permissioned third parties to access it?
Salesforce’s recent tightening of access to Slack data is the canary in the coal mine. As of mid-2025, third-party tools can no longer bulk index Slack messages. Non-Salesforce-Marketplace apps are rate limited and cannot store historical data long-term. This move disrupts enterprise copilots, knowledge graphs, and AI agents that were built on this foundational access.
But this isn’t just about Slack. Permissioned data access appears to be under siege more broadly. JPMorgan has threatened to charge leading fintech account aggregators $300 million per year for data access. Microsoft is asserting stricter control over platforms like Bing and GitHub. Other incumbents are likely to follow suit. If data access becomes a closed-loop system, what breaks, and what is lost?
As my partner Seema recently wrote, “The platform wars are just beginning — and they won’t be fought with product, they’ll be fought at the API level.” I agree. And while API access may not be cut off completely, the growing restrictions, gating, and monetization pressures will radically reshape how startups build, grow, and price their AI products.
Why are incumbents doing this?
The driving motivations for these restrictions typically include:
- Strengthening data privacy and compliance amid tightening regulations (e.g., GDPR, DORA)
- Protecting strategic assets and paving the way for incumbents’ own AI and platform ambitions
- Restricting market competition by locking out rivals from valuable platform and data access
By definition, incumbents are already in the driver’s seat with their customers — why make it easier to get booted to the back?
Will customers rebel or adapt?
Enterprises crave integration. They’ve built internal copilots that ingest Slack, Salesforce, and Jira to answer questions, draft emails, summarize threads, and coordinate incidents. If those taps run dry, productivity drops and internal teams scramble. There are entire ops teams whose responsibility is to prevent data and decision-making from becoming siloed. So we expect these blockages will not occur without a fight.
The truth is, while many customers technically own their data, incumbents still control the pipes. As my partner Alex likes to say, “the best companies [i.e., systems of record] have hostages, not customers.” However, in most cases, even “hostages” won’t tolerate a full-on API blockade. They’ll revolt if they can’t use tools they love. That’s why, as Seema put it, “access stays open — but becomes more expensive.” Incumbents are unlikely to flip the kill switch. They’ll squeeze the ecosystem with rate limits, clunky sandboxes, higher access fees, opaque review processes, and stricter gating behind marketplace policies. This is API war by attrition, not revocation.
We can look to open banking for precedent here. Plaid was founded to let users securely link their bank accounts to prominent fintech applications. Despite Section 1033 of the Dodd-Frank Act, which established that consumers have a right to access their own financial data, many banks attempted to block Plaid’s access, fearing that they were being disintermediated from their customers. But something important happened: in many cases, customers had grown so loyal to their fintech apps that they were more likely to leave the bank blocking Plaid than the fintech app that could no longer connect.
The implications here are twofold: 1) it might be prudent for regulators to more clearly delineate ownership of data between customer and software provider (ideally affirming the customer’s right to access it wherever they’d like), and 2) it’s possible that the early magic and clear utility offered by B2B AI applications will inspire customers to seriously rebel against any incumbent attempts to restrict access.
If incumbents do block access, what happens to startups that rely on the data?
The startups most exposed are those offering unified search, enterprise copilots, automated summarization, and knowledge graph aggregation: basically, anything that needs to pull from one or more systems of record to deliver on its product promise. If your product assumes broad, persistent, multi-platform indexing, your model may be at risk. The good news is, there are options:
- Lean into RPA 2.0: use computer-use models to circumvent APIs, simulating human behavior to go into the system of record and pull the necessary data. Clunkier for sure, but an option.
- Go to market via marketplaces owned by the incumbents (though that comes with a huge tax).
- Pivot to embed within the enterprise and negotiate access on a per-customer basis. We expect to see many consultative AI automation providers.
- Rebuild from the ingestion layer up, helping customers re-own their own data.
None of these strategies is bulletproof, but each provides some protection against sudden API access changes. The most immediate risk? Margins will compress. Platform fees + rate limits = weaker unit economics + longer sales cycles.
The open-source opportunity: reclaiming data sovereignty
Yes, open source offers a powerful counter-narrative, but it’s not a silver bullet. While open-source LLMs (Mistral, LLaMA), orchestration frameworks (LangChain, LlamaIndex), and vector databases (Milvus, Qdrant, Weaviate; Pinecone is a managed, closed-source counterpart) don’t magically grant access to proprietary incumbent data, they provide a critical mechanism for enterprises to regain control over their own data assets.
When an enterprise can extract its data (or parts of it) from a proprietary platform, open-source tools allow it to process, store, and leverage that data independently. Vector databases, in particular, are vital here: they efficiently store and index the numerical representations (embeddings) of complex, unstructured data (like text from Slack or documents from Salesforce), enabling fast, AI-powered similarity search and retrieval outside the incumbent’s ecosystem.
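To make the mechanics concrete, here is a minimal sketch of similarity search over exported text, using only the Python standard library. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical stand-in for a real embedding model (it builds normalized word-count vectors rather than learned embeddings), and `TinyVectorIndex` is a brute-force toy, not how any particular vector database is implemented.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model.

    Returns an L2-normalized term-count vector, stored sparsely as
    {token: weight}. A real model would return a dense learned vector.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


class TinyVectorIndex:
    """Brute-force in-memory index: stores vectors, ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, dict[str, float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._rows.append((doc_id, text, toy_embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = toy_embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)  # highest similarity first
        return scored[:k]


if __name__ == "__main__":
    index = TinyVectorIndex()
    # Illustrative records, as if exported from different systems of record.
    index.add("slack-1", "postmortem for the database outage last night")
    index.add("sfdc-7", "renewal pricing notes for the acme account")
    for score, doc_id, text in index.search("database outage postmortem", k=2):
        print(f"{score:.2f}  {doc_id}  {text}")
```

The point of the sketch is the shape of the workflow: once the text is in the enterprise’s hands, embedding, storage, and retrieval can all run on infrastructure the enterprise controls. A production setup would swap in a real embedding model and an approximate nearest-neighbor index.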
This means an enterprise can build its own knowledge graphs and AI copilots on its data, deployed on its infrastructure, with open-source models, ultimately breaking free from existing vendor lock-in and dictating its own data destiny. Where open source truly shines is in portability: it allows you to deploy your solutions wherever the data lives, offering a crucial bulwark against future data clampdowns. Absent any regulatory action (like the aforementioned hypothetical equivalent of Dodd-Frank Section 1033 for enterprise data), open source may serve as the key unlock for enterprises reclaiming control.
Will startups build the entire stack?
Absolutely. We’re already seeing early signals:
- Horizontal infrastructure providers like Databricks, Reducto, and Pinecone are powering ingestion-to-inference pipelines.
- Vertical players like Harvey embed tightly with partners, customizing stack and model in lockstep with customer workflows.
- More and more companies are becoming mini–consulting/SaaS hybrids: services upfront to secure the data, productized layers behind the scenes to scale.
In an access-restricted world, “full-stack AI startup” stops being a trope and starts becoming the most defensible go-to-market strategy.
How can each participant in the ecosystem plan for this potential future?
Founders and startups:
- Abstract your ingestion layers: design for API failure, and include scrapers, embeds, forward-deployed agents, computer-use models, and RPA.
- Negotiate early partnerships: join marketplaces, secure data-sharing contracts, become embedded infrastructure.
- Own at least part of the data stack: build import tools, host within the customer’s firewall, or shift to BYO-data deployments.
- Open-source strategically: contribute or maintain ingestion infrastructure as insurance against lockouts.
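One way to picture “abstract your ingestion layers” and “design for API failure” is a thin interface with ordered fallbacks. The sketch below is an assumption-laden illustration, not a reference design: `Connector`, `CallableConnector`, `FallbackIngestor`, `SourceUnavailable`, and `Record` are all hypothetical names, and the loaders stand in for whatever access paths a startup actually has (official API, customer-provided export, computer-use agent).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol, Sequence


class SourceUnavailable(Exception):
    """Raised when an access path is blocked, throttled, or revoked."""


@dataclass(frozen=True)
class Record:
    source: str
    record_id: str
    body: str


class Connector(Protocol):
    """Anything that can produce records; the product depends only on this."""

    name: str

    def fetch(self) -> Iterable[Record]: ...


@dataclass
class CallableConnector:
    """Wraps a loader function (API client, export reader, RPA job, ...)."""

    name: str
    loader: Callable[[], list[Record]]

    def fetch(self) -> list[Record]:
        return self.loader()


class FallbackIngestor:
    """Tries each access path in priority order; moves on when one is unavailable."""

    def __init__(self, connectors: Sequence[Connector]) -> None:
        if not connectors:
            raise ValueError("at least one connector is required")
        self._connectors = list(connectors)

    def ingest(self) -> tuple[str, list[Record]]:
        failures = []
        for connector in self._connectors:
            try:
                records = list(connector.fetch())
            except SourceUnavailable as exc:
                failures.append(f"{connector.name}: {exc}")
                continue  # fall through to the next access path
            return connector.name, records
        raise SourceUnavailable("all access paths failed: " + "; ".join(failures))


if __name__ == "__main__":
    def throttled_api() -> list[Record]:
        # Simulates the official API path being rate limited.
        raise SourceUnavailable("rate limited")

    def customer_export() -> list[Record]:
        return [Record("export", "1", "thread exported by the customer")]

    ingestor = FallbackIngestor([
        CallableConnector("official-api", throttled_api),
        CallableConnector("customer-export", customer_export),
    ])
    used, records = ingestor.ingest()
    print(used, len(records))
```

The design choice worth noting is that the rest of the product only sees `Record` objects and never learns which path produced them, so a sudden change in API terms becomes a connector swap rather than a rewrite.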
Enterprise software buyers:
- Control your data destiny: insist on software where you own the index.
- Be cautious of native AI lock-in: if an LLM can only see what one vendor shows it, it’s not your copilot — it’s theirs.
- Favor partners who offer portability: can you swap infra? Move models? Preserve auditability?
Incumbents:
- Be careful not to overreach — platforms that gate too aggressively may slow innovation and trigger quiet customer churn.
- Offer tiered access, transparent terms, and “data export” pathways to avoid regulatory backlash or mass defection.
- Understand that you’re no longer just competing with startups; you’re competing for your own ecosystem’s trust.
Final word
We’re entering an era where data access is the single most important strategic input for AI. If the data gets locked down, the future bifurcates:
Path 1: Closed-stack incumbents, whose AI features are lackluster — but they own the customer’s workflows end-to-end. A true hostage situation!
Path 2: Full-stack challengers, who go deeper, integrate tighter, and offer customers freedom, control, and superior outcomes.
Founders building in B2B AI must ask themselves now, not later:
“If every API I use disappeared tomorrow, would I still have a business?”
If the answer is no, it’s time to rebuild. Because in this next chapter, data could very well be the moat, the product, and the leverage.