Foundation Models Explained: What They Are and Why They Matter

The term "foundation model" was coined by Stanford researchers in 2021 to describe large neural networks trained on vast datasets that can be fine-tuned or prompted to perform a wide range of downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models: each is trained once, then adapted to many different uses.

What makes foundation models different from previous AI systems is their generality. Earlier machine learning models were trained for specific tasks: a model that classified images couldn't write code. Foundation models break this constraint. The same underlying model can write essays, analyze code, and answer medical questions, and multimodal variants can also interpret or generate images, often at a quality that rivals or exceeds systems built for a single task.

How Foundation Models Are Trained

Large language models (LLMs), the most prominent class of foundation models, are trained using self-supervised learning on massive text corpora. The training objective is simple: predict the next token in a sequence. No human labeling is needed, because the text itself supplies the answer at every position. Despite its simplicity, this objective, applied at sufficient scale, produces models that develop sophisticated world knowledge, reasoning capabilities, and language understanding.
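To make the next-token objective concrete, here is a minimal sketch in Python. It is not how production LLMs are built: it swaps the neural network for a simple bigram count model (an assumption made purely for illustration) and uses a made-up one-sentence corpus. What it does share with real training is the setup: the labels come from the text itself, and the quantity being minimized is the average negative log-probability assigned to the true next token.

```python
import math
from collections import Counter, defaultdict


def make_pairs(tokens):
    """Self-supervised pairs: each token is the label for the token before it."""
    return list(zip(tokens[:-1], tokens[1:]))


def fit_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in make_pairs(tokens):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev, vocab, alpha=1.0):
    """Add-alpha smoothed probability distribution over the next token."""
    total = sum(counts[prev].values()) + alpha * len(vocab)
    return {tok: (counts[prev][tok] + alpha) / total for tok in vocab}


def average_loss(counts, tokens, vocab):
    """Average negative log-likelihood of the true next token (cross-entropy)."""
    pairs = make_pairs(tokens)
    nll = 0.0
    for prev, nxt in pairs:
        nll -= math.log(next_token_probs(counts, prev, vocab)[nxt])
    return nll / len(pairs)


# Hypothetical toy corpus; real corpora contain trillions of tokens.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
vocab = sorted(set(corpus))

model = fit_bigram(corpus)
uniform_loss = math.log(len(vocab))  # loss of guessing uniformly at random
trained_loss = average_loss(model, corpus, vocab)

print(f"uniform guess loss: {uniform_loss:.3f}")
print(f"bigram model loss:  {trained_loss:.3f}")
```

The bigram model scores a lower loss than uniform guessing because it has picked up regularities in the text, such as "sat" tending to follow "cat". An LLM pursues the same objective with a far more expressive model and far more data.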

The key insight from scaling research is that model loss falls as a predictable power law in parameters, data, and compute. Each multiplicative increase in scale buys a roughly constant proportional improvement, and the trend has shown few signs of breaking down at the scales tested so far. This "scaling hypothesis" drove the race to train ever-larger models that defined the 2020-2024 era of AI development.
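A short sketch shows what "predictable power law" means in practice. It assumes loss takes the form L(C) = a * C^(-b), where C is training compute. The constants a = 10 and b = 0.05 are invented for illustration and are not measured values from any published scaling study. Because a power law is a straight line on log-log axes, a fit on smaller runs can be extrapolated to a larger one.

```python
import math


def power_law_loss(compute, a=10.0, b=0.05):
    """Hypothetical scaling curve: loss = a * compute**(-b)."""
    return a * compute ** (-b)


def fit_power_law(xs, ys):
    """Least-squares line in log-log space; returns (a, b) for y = a * x**(-b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    slope = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly)) / sum(
        (u - mean_x) ** 2 for u in lx
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Illustrative compute budgets (in FLOPs) for a series of smaller training runs.
budgets = [10.0 ** k for k in range(18, 25)]
losses = [power_law_loss(c) for c in budgets]

a_hat, b_hat = fit_power_law(budgets, losses)

# Extrapolate one order of magnitude beyond the largest observed run.
predicted = a_hat * (1e25) ** (-b_hat)

# Every 10x in compute multiplies the loss by the same factor, 10**(-b).
ratios = [later / earlier for earlier, later in zip(losses, losses[1:])]

print(f"fitted exponent b: {b_hat:.4f}")
print(f"loss ratio per 10x compute: {ratios[0]:.4f}")
print(f"predicted loss at 1e25 FLOPs: {predicted:.4f}")
```

Note what the constant ratio implies: each tenfold increase in compute yields the same proportional reduction in loss, so the absolute improvement per dollar shrinks even while the trend remains perfectly predictable.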

The Business Model: APIs, Fine-tuning, and Deployment

Foundation models have created a new layer of the AI stack: companies like Anthropic, OpenAI, and Mistral train foundation models and offer access through APIs. Businesses build applications on top, paying per token. Below the foundation model sits inference infrastructure (Groq, Together AI, Cerebras), and above it sits the application layer — the actual products end users interact with.
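Per-token pricing makes the unit economics of an application easy to estimate. The sketch below assumes input and output tokens are billed at separate rates per million tokens, which is a common arrangement. The class and function names, the prices, and the traffic figures are hypothetical placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-token price sheet, in USD per million tokens."""
    input_usd_per_million: float
    output_usd_per_million: float


def request_cost(pricing, input_tokens, output_tokens):
    """Cost in USD of a single API request."""
    return (
        input_tokens * pricing.input_usd_per_million
        + output_tokens * pricing.output_usd_per_million
    ) / 1_000_000


def monthly_cost(pricing, requests_per_day, input_tokens, output_tokens, days=30):
    """Cost in USD of a steady workload over a month."""
    return request_cost(pricing, input_tokens, output_tokens) * requests_per_day * days


# Placeholder numbers for illustration only.
example = TokenPricing(input_usd_per_million=3.0, output_usd_per_million=15.0)

per_request = request_cost(example, input_tokens=1_500, output_tokens=500)
per_month = monthly_cost(example, requests_per_day=10_000,
                         input_tokens=1_500, output_tokens=500)

print(f"cost per request: ${per_request:.4f}")
print(f"cost per month:   ${per_month:,.2f}")
```

Under these assumed numbers a request costs about 1.2 cents and the workload about $3,600 a month. For an application builder, that figure is cost of goods sold, and it moves whenever the model provider changes prices.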

At StarX Capital, we invest across this stack. The foundation model layer itself requires enormous capital and is dominated by a handful of well-funded labs. The more accessible investment opportunities are in the infrastructure and application layers — though identifying durable application moats in a world of rapidly improving foundation models remains the central challenge.
