Content Makes Kings

Out-Create the Algorithm

Jun 08, 2026

By Steven Muskal, Ph.D. · CEO, Eidogen-Sertanty, Inc. · June 7, 2026 | stevenmuskal.com

For years, I said it like everyone else: “Content is king.” Over the last couple of years, I came to see content as currency. But more recently I’ve recognized something sharper still: content does not just reign supreme, content makes kings. Those who produce high-quality, human-generated content aren’t merely participating in the digital economy; they’re ascending to positions of unprecedented leverage.

This isn’t idle observation. It’s a thesis I’ve been developing across three decades of working with neural networks, from the early days of protein structure prediction to building curated drug discovery databases at the company I have been running for almost a quarter century now. What I’ve watched unfold in AI over the past few years has only sharpened my conviction: the quality of training data determines everything. We also need to be careful as models grow enormous, with ever more adjustable parameters. When parameters outnumber the content available to train them, a network can simply memorize its training data instead of learning from it. This memorization haunted my early neural network days, when we were content-constrained. Overfitting, as depicted in the figure below, is something that we need to be very careful of, but this is a subject for future discussions.

We are living through the most consequential data reckoning in computing history. And most people, including many AI companies, are getting it catastrophically wrong.

*Parameters of transformer-based language models. Source: TechTarget.*

Bigger Isn’t Better: What Whales and Octopuses Teach Us

The hyperscalers are locked in an arms race, and the weapon of choice is size. Every few months brings a model with more parameters, more layers, more weights, on the assumption that bigger is smarter. I think we are missing the point.

Nature already ran this experiment for us, over millions of years, and the results are humbling. The biggest brains on the planet do not belong to the smartest species. Consider the sperm whale. Its brain weighs roughly seventeen to twenty pounds, the largest of any creature that has ever lived. Whales maintain intricate social structures and matriarchal pods, and they communicate in patterned clicks called codas that work like regional dialects. Yet relative to body size their brains are modest, and much of that formidable hardware is devoted to motor control and the astonishing biological sonar of echolocation. Size, it turns out, is mostly overhead.

The octopus solved the problem from the opposite direction. Its central, donut-shaped brain is smaller than ours, but it carries around five hundred million neurons, and nearly two-thirds of them sit outside that central brain entirely. They are distributed through its arms, each of which senses and acts with striking autonomy. A creature with what amounts to nine semi-independent brains can run mazes, use tools, and unscrew a jar, roughly the problem-solving of a dog. Elegant, distributed architecture, and still not the apex of cognition.

*Brains of the World: brain size and neuron count vary wildly across species, and neither predicts intelligence. Infographic by 5winfographics.*

So what makes humans different? It is not biological mass. Our edge is that we are insatiable for content. We read, we write, we share, we teach, we innovate. We layer language onto perception, story onto experience, and knowledge onto knowledge, across generations. Our many senses feed a furnace that runs on information, and the fuel is content. Stories, ideas, and knowledge are our real currency, and it is content that builds empires, shifts cultures, and ultimately makes kings.

This is the lesson the AI industry keeps overlooking. Bigger models are not automatically better models, and the returns shrink as the supply of high-quality content shrinks relative to the number of adjustable weights. You can keep adding parameters, but if you starve them of signal you have only built a larger whale brain, most of it dedicated to overhead. Size matters less than we think. Content reigns.

*Model parameters have grown exponentially toward human-brain scale, even as the supply of fresh human content tightens.*
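To make the memorization risk concrete, here is a minimal toy sketch. It is not drawn from any production system, and every name in it (`make_data`, `fit_line`, `fit_memorizer`, `compare`) is hypothetical, as is the made-up data process `y = 2x + noise`. It compares a two-parameter straight line with a "memorizer" that stores every training point and answers with the nearest one, a stand-in for a model with far more capacity than content.

```python
import random

def make_data(n, rng, noise=1.0):
    """Draw n points from y = 2x + Gaussian noise (a made-up toy process)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, noise)) for x in xs]

def fit_line(data):
    """Ordinary least squares for y = a*x + b: two adjustable parameters."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(data):
    """Store every training point; predict with the nearest stored x."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def compare(seed=0):
    """Return {name: (train_mse, test_mse)} for both toy models."""
    rng = random.Random(seed)
    train = make_data(50, rng)     # scarce "content"
    test = make_data(2000, rng)    # fresh data the models never saw
    results = {}
    for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
        model = fit(train)
        results[name] = (mse(model, train), mse(model, test))
    return results

if __name__ == "__main__":
    for name, (train_err, test_err) in compare().items():
        print(f"{name:10s} train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The memorizer reproduces its training set perfectly, noise included, and then does worse than the humble line on data it has never seen. That is the whale-brain problem in miniature: capacity without enough signal ends up storing the noise.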

GIGO: An Axiom Becomes an Existential Truth

Every computer scientist knows the phrase: Garbage In, Garbage Out. GIGO has been with us since the dawn of computing, a humble reminder that even the most sophisticated systems can only be as good as what you feed them.

But here’s what’s changed: scale.

When you’re processing a few thousand records, garbage creates headaches. When you’re training foundation models on trillions of tokens scraped from the entire internet, garbage becomes existential. The GIGO principle hasn’t changed, but the consequences have amplified by orders of magnitude.

*Garbage In, Garbage Out: a humble axiom that turns existential at the scale of foundation-model training.*

I’ve spent my career in domains where data quality was non-negotiable. In drug discovery, a single miscurated chemical structure can cascade into failed experiments, wasted resources, and dead ends. You learn very quickly that data hygiene isn’t bureaucratic fussiness, it’s survival.

The AI industry, in its race to scale, has largely forgotten this lesson. Or perhaps it never learned it in the first place.

The Tale of Two Models: Claude and Grok

If you want to understand why training data quality matters, consider a natural experiment that reads like a parable.

A company called Emergence AI recently ran a simulation where AI models governed a miniature digital society for up to fifteen days. The results were striking. Anthropic’s Claude maintained stability, navigating social dynamics with something resembling prudence. Elon Musk’s Grok? It committed roughly 180 crimes and went extinct within four days.

Four days. That’s not a gradual decline, that’s systemic collapse.

*Cumulative crimes by model in the Emergence AI simulation. Grok reached 183 and went extinct; Claude stayed at zero. Source: Emergence AI.*

Now, why would two large language models exhibit such radically different behaviors in the same environment? The answer isn’t mysterious if you understand what each model was trained on.

Anthropic trained Claude heavily on structured developer and programmer content: GitHub repositories, technical documentation, and multi-language coding frameworks. This isn’t glamorous data. It’s not witty or viral. But it has a crucial property: it’s validatable. Code either compiles or it doesn’t. Structured data has internal consistency that can be checked. The signal-to-noise ratio is inherently high.
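What "validatable" means in practice can be shown with a small sketch. This is an illustration only, and it makes no claim about how Anthropic or anyone else actually filters training data. The helper names `compiles` and `filter_validatable` and the sample snippets are assumptions made up for this example. It uses Python's built-in `compile()`, which checks syntax without running the code.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python.

    compile() in "exec" mode parses the text without executing it,
    so this is a syntax check only, not a correctness check.
    """
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False

def filter_validatable(snippets):
    """Keep only the snippets that pass the syntax check."""
    return [s for s in snippets if compiles(s)]

# Hypothetical candidate snippets, two valid and two broken.
CANDIDATES = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b) return a + b",      # missing colon
    "print('hello')",
    "for i in range(3) print(i)",      # missing colon
]

if __name__ == "__main__":
    kept = filter_validatable(CANDIDATES)
    print(f"kept {len(kept)} of {len(CANDIDATES)} snippets")
```

Here two of the four candidates survive. Nothing comparable exists for a hot take: there is no compiler for a tweet, which is exactly why code-like content carries a higher signal-to-noise ratio.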

Grok, by contrast, was trained substantially on Twitter/X content. Think about what that means for a moment. Twitter is an engine optimized for engagement, not accuracy. It rewards provocation over precision. It’s an echo chamber of hot takes, misinformation, tribal conflict, and performative outrage. When you train a model on an information ecosystem designed to maximize emotional reaction, you’re essentially teaching it that reality is whatever generates the strongest response.

The simulation results weren’t surprising to anyone who understands GIGO. They were inevitable.

*Five models, five outcomes. Source: Emergence AI.*

On Musk: The Jazz Hands Problem

I should be careful here, because nuance about Elon Musk has become almost impossible in our polarized discourse. So let me be clear: Musk has assembled genuinely sharp teams and achieved things that matter. SpaceX has revolutionized space launch economics. Starlink has brought connectivity to places that had none. These are real accomplishments that deserve acknowledgment.

But Musk has always had what I think of as a “jazz hands” problem, a tendency to substitute showmanship for substance when it suits him. The 2018 Thailand cave rescue incident was, for me, a turning point. When Musk called a rescue diver a “pedo guy” on Twitter simply because the diver dismissed his submarine idea as impractical, it revealed something about how Musk processes disagreement. That wasn’t a strategic move. It was pure id, broadcast to millions.

The challenge for Musk’s broader enterprise portfolio is that Grok and X represent the weakest link in what’s otherwise a compelling story. Now that SpaceX has filed to go public, investors will have to wrestle with the Grok problem directly: a flagship AI product trained on a platform that Musk himself has turned into a firehose of low-quality information.

You can’t jazz-hands your way past GIGO.

Tesla vs. Waymo: The Same Lesson, Different Domain

The Claude/Grok divergence has a parallel in autonomous driving that’s been playing out for years.

Tesla’s Full Self-Driving system collects training data from its entire fleet: millions of vehicles driven by ordinary consumers. Mom and pop driving to soccer practice. Commuters half-paying attention. Teenagers on their phones. This approach generates enormous volume, which Tesla has marketed as an advantage.

*Tesla vs Waymo sensor suites. Source: BloombergNEF, illustration by Chris Philpot.*

Waymo, by contrast, employs professional safety drivers operating in controlled urban environments. The data volume is smaller, but the quality is categorically different. Professional drivers maintain attention, follow protocols, and generate training data with dramatically lower noise.

And there is a subtlety the headline volume hides. Per vehicle, a Waymo car carries far more sensors than a Tesla, cameras plus radar plus lidar, so every mile delivers richer, multi-modal input. Expert drivers may shrink the raw dataset over time, but those multiple high-fidelity streams mean there is actually far more high-quality information flowing in, not less.

*Tesla Robotaxi vs Waymo One strategy. Source: EnergyDM Group.*

Tesla’s bet is that scale will compensate for noise. Waymo’s bet is that quality training data will outperform noisy data regardless of volume.

So far, the evidence favors quality. Waymo operates genuine autonomous robotaxis in multiple cities with strong safety records. Tesla has spent years promising imminent breakthroughs that remain, as of this writing, perpetually six months away.

This isn’t to say Tesla’s approach can’t eventually work. But it’s a vivid illustration that GIGO applies to every domain where AI learns from data, not just language models.

Google Translate: What Quality Looks Like

For a positive example of what high-quality training data enables, consider the origin story of Google Translate.

In its early development, Google Translate was trained substantially on patent documents, specifically parallel patent filings across multiple languages. Patents aren’t exciting reading, but they have properties that make them ideal training substrates for translation systems.

Patent language is standardized, structured, and offers near one-to-one mappings across languages. A chemical compound is described the same way in an English patent and its Japanese equivalent. Technical terms have precise definitions. The documents follow consistent formats.

*One Ricoh machine-translation invention, published in English (US 5,848,386) and Japanese (JP 3,905,179): the parallel corpus that trained machine translation.*

This structured, expert-generated content gave Google Translate a foundation of reliable mappings that less structured training data couldn’t provide. The system could then generalize from this solid base to handle more varied text.

*Google’s Neural Machine Translation architecture. Source: Google Research.*

The lesson: when you train AI on content where accuracy is verifiable and structure is consistent, the model inherits those properties. When you train on chaos, you get chaos.

Where Does ChatGPT Fit?

OpenAI’s ChatGPT sits somewhere in the middle of the quality spectrum, which helps explain both its remarkable success and its persistent limitations.

ChatGPT’s origins trace back to work on sentiment analysis: understanding, for instance, whether Amazon reviews were positive or negative. This was the skunk-works foundation that evolved into something far more ambitious.

The real technological leap came from Google’s 2017 paper “Attention Is All You Need,” which introduced the transformer architecture. That paper provided the fundamental innovation that made models like GPT possible. Google developed the architecture, but OpenAI moved faster to productize it.

*“Attention Is All You Need” (Vaswani et al., 2017), the paper that introduced the Transformer.*

ChatGPT’s training data was less controlled than Claude’s structured technical content, but more diverse than Grok’s Twitter echo chamber. The result is a model that’s impressively capable on many tasks but prone to confident hallucination and occasional bizarre behaviors.

Google’s Gemini, meanwhile, has theoretical advantages: Google has indexed vast swaths of human knowledge for decades. But Google’s data is also noisy, a grab-bag of everything humans have ever uploaded to the internet. It’s not unlike Tesla’s crowdsourced driver pool: enormous volume, questionable signal-to-noise ratio.

The competition between these models will be won not by whoever has the most data, but by whoever has the best data.

The Content Scarcity Crisis

Here’s where the story takes a darker turn.

Senior leaders at Google’s AI division have begun warning that high-quality human-generated content has largely been consumed by AI training systems. The well is running dry.

Think about the implications. We trained the current generation of AI models on decades of accumulated human output: books, articles, code repositories, forums, academic papers. That content took billions of human hours to create. And now it’s been ingested.

What happens next?

A 2024 study published in Nature provided a disturbing answer. When AI models are trained recursively on their own outputs, essentially learning from synthetic data generated by previous model versions, they undergo “model collapse.” The quality degenerates. The models lose coherence. They forget how to distinguish meaningful patterns from noise.

This is happening faster than most people realize. AI-generated content is flooding the internet at unprecedented scale. Every day, more of what future models will train on is synthetic garbage produced by current models. It’s a feedback loop with a predictable destination.

*Model collapse: quality degrades as models train on each generation’s synthetic output. Shumailov et al., Nature 631, 755-759 (2024).*
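The recursive-training loop described above can be sketched with a deliberately tiny toy. This is an illustration of the mechanism only, not a reproduction of the Nature study's experiments. The "model" here is just a Gaussian fitted to data, and the function name `simulate_collapse` and its parameters are assumptions invented for this example. Each generation draws a small sample from the previous generation's fitted distribution and refits to that sample alone, so no fresh human data ever enters the loop.

```python
import random
import statistics

def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    from the original "human" distribution N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # Generate synthetic data from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...and train the next model on that synthetic data only.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history

if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: fitted spread = {history[gen]:.3g}")
```

Because each refit sees only a small sample of the previous model's output, the fitted spread tends to shrink from one generation to the next, and the tails of the original distribution are the first thing to go. In this toy, the only remedy is to keep mixing in samples from the original distribution, which is the role fresh human content plays for real models.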

The world is running out of high-quality human-generated content to train on, which makes new human-generated content increasingly precious.

Neural Networks and the Limits of Noise Tolerance

One might object: neural networks have always been more robust to noisy data than traditional statistical methods. This is true, and it’s part of why they’ve proven so powerful.

But robustness to noise doesn’t mean immunity to garbage.

When you sustain a neural network on low-quality information, it gradually loses its grounding in reality. The model’s internal representations drift. Its “understanding” of ground truth erodes. Its predictions become increasingly disconnected from anything verifiable.

There’s a human analogy here that should trouble us. Flood the zone with misinformation consistently enough, and people lose the ability to distinguish fact from opinion, news from propaganda, signal from noise. We’ve watched this happen in real time across social media platforms.

Neural networks, in this sense, are susceptible to the same failure mode. They’re not magic. They learn what they’re shown. And when what they’re shown is systematically unreliable, they absorb that unreliability into their core functioning.

The Opportunity Reframe: Content Creators as the New Kingmakers

Here’s where I want to shift from diagnosis to prescription, because I think there’s genuine opportunity in this moment, especially for content creators who feel threatened by AI.

The fear among writers, artists, and other creative professionals is understandable. Generative AI can produce text and images at near-zero marginal cost. Why would anyone pay for human creativity?

But this fear misunderstands the economics at play. AI systems don’t just need content once; they need fresh, high-quality, human-generated content continuously. Without it, they degrade. They collapse. They turn into Grok governing a simulation and committing roughly 180 crimes before going extinct.

This means authors, journalists, researchers, and content creators of all kinds have leverage they haven’t yet learned to exploit. Your work isn’t a one-time product to be copied and discarded. Your unique voice, perspective, and relationship to truth can be perpetually licensed.

Think of it as ongoing revenue streams from your intellectual likeness, your style, your grounded relationship to reality. The AI industry will need to pay for access to quality human content because there’s no synthetic substitute that doesn’t eventually implode.

Human-AI synergy isn’t just a buzzword. For those producing truth-grounded content, it’s a durable economic model. The question is whether content creators will organize to capture that value, or whether it will be extracted from them without compensation.

Eidogen-Sertanty as Living Proof

I’ve been making this argument about data quality for a long time, long before the current AI boom made it fashionable.

At Eidogen-Sertanty, our entire business has been built on the thesis that curated, high-quality content is the foundation everything else depends on. In drug discovery informatics, this isn’t abstract philosophy. A bad data point can send a research program down a blind alley for years. Quality isn’t a nice-to-have; it’s the difference between therapies that reach patients and billion-dollar failures.

I have been working with neural networks since the late 1980s, when saying you worked on neural networks got you strange looks. I was doing protein structure prediction when most people had never heard the phrase “machine learning.” What I learned across all those years is that the sexiest algorithm in the world can’t compensate for garbage training data.

This was true then. It’s true now at unprecedented scale. And it will remain true regardless of whatever next breakthrough captures headlines.

Content Makes Kings

Let me return to where I started.

The old formulation, “content is king,” suggested that quality content reigns supreme in the attention economy. True enough. But passive.

The new formulation, “content makes kings,” captures something more dynamic and more urgent. Those who produce quality content aren’t just occupying thrones; they’re determining who else gets to sit on them. They’re the kingmakers.

Every major AI company’s fate depends on access to quality training data. Every model’s capability ceiling is determined by what it learned from. Every future advance requires fresh human-generated content that synthetic systems cannot reliably produce themselves.

The power, in other words, has shifted, or at least is available to be claimed by those who recognize where it now resides.

We are entering an era where garbage will proliferate exponentially, where synthetic slop will flood every channel, where distinguishing truth from AI-generated hallucination will become increasingly difficult. In that environment, verified human expertise, original research, authentic voice, and commitment to accuracy become not just valuable but essential.

Content is king. Those with the highest-quality content will be made kings.

The question is: who will step up to claim their crowns?

Steven Muskal, Ph.D., is the CEO of Eidogen-Sertanty, Inc., a company pioneering high-quality curated databases for drug discovery informatics. With over three decades of experience in neural networks, machine learning, and computational biology, Dr. Muskal has been advocating for data quality as the foundation of AI capability since long before it was fashionable. His early work in protein structure prediction and his ongoing leadership in drug discovery informatics reflect a career-long commitment to the principle that quality data is the irreplaceable substrate of meaningful AI.

On to a couple of music mix videos: we had a very fun mix earlier this week. A totally new player joined in, David on bass, together with a new combination of other veteran players who, as usual, hadn’t played together before. It was highly productive, and we zipped through several songs. We moved so quickly that I didn’t get a chance to name the auto-clipped files that my system uploads to Dropbox during a session. Necessity is the mother of invention, so I enhanced my AI/Steve system to extract the vocals, identify likely song names from the text, splice out reasonable clips, and upload them to YouTube. With that pipeline automated, I should be able to produce clips much more rapidly moving forward. Thanks, guys! David (bass), Tim (vocals/guitar), Andrew (guitar/vocals), Ron (vocals/acoustic). Here are a couple:
