Why Science Gets Stuck — The Thinking Archive

I was doing nothing.

Not reading a paper. Not running an experiment. I was letting my mind drift across worm biology, swarm robotics, warfare, neuromodulation, when something clicked that months of structured work hadn't produced. A connection between fields I had no reason to juxtapose. An insight that arrived not from effort but from its absence.

That daydream became the spine of my research.

This bothered me. It should bother you too.

Because if the most important cognitive event in months of serious work happened during the one hour nobody was measuring (working alone, no lab, no grant, rented compute), then we have a structural problem. Not a personal one. A systemic one.

The central bottleneck in science may not be intelligence, effort, or funding. It may be search. We have built institutions highly capable of exploiting known directions. We have yet to master exploring unknown ones.

I.

The $12,000,000 Problem

In recent conflicts, Iranian drones costing approximately $5,000 each have required THAAD interceptor missiles costing approximately $12,000,000 per shot to neutralise.

Read that again. Five thousand versus twelve million dollars.

A 2,400 to one cost asymmetry. The defender cannot win this game through optimisation. Better missiles cost more. Cheaper drones cost less. The direction that looks like improvement leads straight into a trap. This is not a failure of engineering. It is a failure of search strategy.

The emerging solution is laser-based directed energy defence. Cost per intercept: less than $1. A 12,000,000× cost reduction achieved not by improving the existing missile but by returning to first principles. Someone stopped climbing the missile hill long enough to ask a genuinely different question. They found a different mountain entirely.
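The arithmetic behind these ratios is small enough to check in a few lines. This is a minimal sketch using only the figures quoted above; treating the laser's "less than $1" as exactly $1 is an assumption, so the reduction it reports is a lower bound, and the function name is illustrative.

```python
# Figures quoted in the text (USD). LASER_COST uses the stated upper bound.
DRONE_COST = 5_000
THAAD_COST = 12_000_000
LASER_COST = 1  # assumption: "<$1" treated as exactly $1


def cost_ratio(defender_cost: float, attacker_cost: float) -> float:
    """Dollars the defender spends for every dollar the attacker spends."""
    return defender_cost / attacker_cost


missile_ratio = cost_ratio(THAAD_COST, DRONE_COST)
laser_ratio = cost_ratio(LASER_COST, DRONE_COST)
reduction = THAAD_COST / LASER_COST

print(f"THAAD vs drone: defender pays {missile_ratio:,.0f}x the attacker's cost")
print(f"Laser vs drone: defender pays {laser_ratio:.4f}x the attacker's cost")
print(f"THAAD to laser: at least {reduction:,.0f}x cheaper per intercept")
```

The second line is the point: the laser does not shrink the asymmetry, it reverses it, so the attacker becomes the one paying more per engagement.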

This is called a phase transition: a system under pressure reorganises into a qualitatively different state. Not better, but structurally different. Water doesn't become "more liquid" as temperature drops. It becomes ice. The THAAD-to-laser shift is the same kind of event: same problem, completely different solution space.^[1]

Incumbents optimise the known. Constrained outsiders search for something else. When the landscape is deceptive, meaning the most direct path leads away from the real solution, the searcher tends to beat the optimiser.

This pattern now appears in science.

One underappreciated feature of constrained environments is that they change where systems search, not just how much they produce, redirecting effort toward regions of the solution space that abundance leaves unexplored.

Phase transitions are described by bifurcation theory. A simple example:

V(x, c) = x⁴ + c·x²

When c is positive: one minimum, the missile.
When c crosses zero: the bifurcation point, where the landscape reorganises.
When c is negative: two minima emerge. The second one is the laser.

The constraint acts as the control parameter c. It changes the shape of the solution space, opening up minima that were latent in the form of V but absent from the landscape until c crossed zero, and so invisible to anyone climbing the first hill. The same mathematics describes phase transitions in magnetism, protein folding, and the emergence of collective behaviour in biological swarms.
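The bifurcation can be checked directly. Setting dV/dx = 4x³ + 2c·x = 0 gives x = 0 and, when c is negative, x = ±√(−c/2). The second derivative 12x² + 2c is positive at those two new points and negative at x = 0. The sketch below encodes that analysis; the function names are illustrative, not from any library.

```python
import math


def potential(x: float, c: float) -> float:
    """V(x, c) = x^4 + c * x^2, the example potential from the text."""
    return x**4 + c * x**2


def minima(c: float) -> list[float]:
    """Local minima of V, from solving dV/dx = 0 and checking d2V/dx2 > 0."""
    if c >= 0:
        return [0.0]              # a single well at the origin
    x = math.sqrt(-c / 2)         # new wells appear once c is negative
    return [-x, x]


for c in (1.0, 0.0, -2.0):
    wells = minima(c)
    depths = [potential(x, c) for x in wells]
    print(f"c = {c:+.1f}: minima at {wells}, depth {depths}")
```

One detail worth noticing: once c is negative, x = 0 stops being a minimum and becomes a local maximum. In the terms of the analogy, the old solution does not merely gain a rival; under enough pressure it stops being a stable place to stand.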

[Figure: Cost per intercept, log₁₀ scale (each gridline is 10× the previous). Laser defence: <$1. Iranian drone: $5,000. THAAD interceptor: $12,000,000. Drone to THAAD: 1 : 2,400. THAAD to laser: 12,000,000× cheaper.]

II.

The Same Thing Just Happened in AI

In January 2025, DeepSeek published a model matching GPT-4 on most benchmarks. Training cost: $6 million. GPT-4's estimated cost: $80–100 million. GPU hours used: 2.78 million versus approximately 60 million, a difference of more than 21×.^[2]

The reason was not superior talent. It was a constraint. The United States government had banned export of NVIDIA's most powerful chips to China. DeepSeek couldn't buy the weapon. So they had to search differently.

What they found: a way to run only a fraction of the model's parameters for each calculation, a lower-precision arithmetic method nobody had validated at this scale, and a new algorithm that hid communication delays behind computation. None of these existed as standard practice before the constraint forced the search.

The export ban was not an obstacle that slowed progress. It was a reorganising pressure that revealed a different solution space that well-funded labs had no incentive to explore.

Same problem. Different mountain.

What made this possible was not more effort but a change in the shape of the search. The constraint didn't reduce the solution space; it revealed a different region of it.

The pattern holds across domains: war, technology, medicine, agriculture. The question is whether it holds for science itself, and whether we have the institutional honesty to look.

The Aravind Eye Care System in India couldn't afford Western hospital models, so it invented assembly-line cataract surgery at 1/10th the cost with equivalent outcomes, performing over 300,000 surgeries annually. M-Pesa in Kenya had no banking infrastructure, so its creators built mobile money from scratch, now used by 53 million people globally. Yuan Longping developed hybrid rice in China under severe resource constraints, feeding hundreds of millions without increasing inputs.

The common structure: constraint closes the obvious path, forcing search into territory that abundance never visits. The solution found there is often not just cheaper but structurally superior.

III.

Science Is Climbing the Wrong Hill

Modern science has more researchers, more data, more instruments, and more computational power than any previous generation. By many measures it is thriving. We are producing more activity, but not always proportionally more discovery, particularly in fields where progress is hard to measure and metrics are the primary signal.

Yet the same pattern, optimising the known while neglecting the possible, may now be operating inside science itself.

The distinction that matters is between exploitation and exploration. Exploitation: refining what works, scaling proven tools, extending known theories. Exploration: searching uncertain territory, combining distant ideas, following anomalies into unfamiliar terrain. Every adaptive system needs both. A system that only exploits becomes efficient at climbing the nearest hill while missing taller mountains elsewhere.

Bloom, Jones, Van Reenen, and Webb (2020) put a number on this.^[3] The number of researchers required to produce a given rate of improvement, the inverse of research productivity, has doubled approximately every 13 years since the 1930s across multiple fields. Moore's Law is the sharpest example: by 2014, sustaining its pace took roughly 18 times as many researchers as it did in 1971. Independently of that analysis, Michael Nielsen and Patrick Collison have argued that science is delivering less per unit input than in earlier eras. A 2023 analysis of 45 million papers found that academic work is becoming less disruptive over time, with more research building on existing trajectories rather than redirecting them.^[4]
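It helps to translate these doubling figures into annual rates. The sketch below uses only the two numbers quoted above (a 13-year doubling time, and an 18-fold increase between 1971 and 2014) and assumes smooth exponential growth, which is a simplification of the underlying data. The function names are illustrative.

```python
import math


def annual_rate_from_doubling(doubling_years: float) -> float:
    """Yearly growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time_from_factor(factor: float, years: float) -> float:
    """Doubling time implied by growing `factor`-fold over `years` years."""
    return years * math.log(2) / math.log(factor)


# Researcher effort doubling every 13 years.
effort_growth = annual_rate_from_doubling(13)

# Equivalent yearly fall in output per researcher.
productivity_decline = 1 - 2 ** (-1 / 13)

# Moore's Law: 18x more researchers between 1971 and 2014.
moore_doubling = doubling_time_from_factor(18, 2014 - 1971)

print(f"Effort growth per year:        {effort_growth:.1%}")
print(f"Productivity decline per year: {productivity_decline:.1%}")
print(f"Moore's Law effort doubling:   {moore_doubling:.1f} years")
```

Under these assumptions, a 13-year doubling means each researcher yields roughly 5% less per year, and the Moore's Law figure implies effort doubling about every 10 years, somewhat faster than the cross-field average.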

These trends converge from independent directions (economics, bibliometrics, and grant studies), and they point at the same structural diagnosis. In fields where measurable outputs dominate evaluation, this is precisely what an over-exploitative system produces.

[Figure: Research productivity decline, after Bloom et al. (2020). By 2020: ~18× more researchers needed to sustain the same rate of progress as 1930.]

Part of the problem is what gets counted. Papers filed, citations accumulated, grants won, benchmarks moved: useful signals, but not identical to discovery. When proxies become targets, behaviour adapts around them. Institutions rarely ban originality. They simply make caution cheaper. Grant committees and peer reviewers consistently show caution toward novel or interdisciplinary proposals, not from bad faith, but because a genuinely strange idea is genuinely hard to evaluate before it succeeds. Studies of grant allocation have found that highly novel proposals are systematically scored lower in initial review, even when they are later judged more important.^[3]

Here is how the trap forms in practice. A young researcher chooses between two projects. The first extends established work: familiar reviewers, predictable timeline, low career risk. The second crosses disciplines, may fail entirely, and earns little credit from current metrics even when it works. Even in institutions that publicly celebrate innovation, the first project is often the prudent choice. Across thousands of such decisions, prudence compounds into conservatism.

I have watched talented researchers narrow their curiosity, not because they lacked imagination, but because imagination carried a career cost.

The failure modes are specific. Working on brain-inspired AI, I spent approximately 250 hours repeating experiments that others had almost certainly already abandoned invisibly, because there is no shared log of dead ends. Negative results stay in private notebooks. Interdisciplinary work falls between funding categories. The most interesting questions get deferred because no review panel owns them.

I am not unique in this.

Henri Poincaré, one of the greatest mathematicians of the nineteenth century, spent weeks attacking a problem in Fuchsian functions without progress. Focused work and daily effort. Nothing materialised. Then, as he stepped onto a bus during a geology excursion, thinking about nothing in particular, the connection appeared, complete and certain. He had no time to verify it. He simply knew. The insight had not come from the desk. It came from the transit between ideas: the geology readings, the unfamiliar landscape, the mind finally released from its own urgency.

The focused search failed both of us. The wandering mind delivered.

This is not a funding problem. It is a search problem.

In reinforcement learning and decision theory, this tension has a formal name: the exploration-exploitation trade-off. The Gittins index theorem (Gittins, 1979) shows that, in the multi-armed bandit setting, the optimal policy scores each option above its estimated payoff by a margin that grows with uncertainty: in effect, a systematic exploration reward for trying what hasn't been tried.^[12] Pure exploitation of the best-known option can lock onto an inferior choice and keep underperforming over time.

Science funding behaves like a system with the exploration bonus removed. Grants reward proximity to known objectives. Reviews reward legibility. Careers reward measurable output. The mathematics suggests that such a system will converge to a local maximum.
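A toy example makes the missing bonus concrete. The sketch below does not compute Gittins indices, which are considerably more involved; it swaps in the simpler UCB1 rule, which adds a bonus that shrinks as an option is tried more often. The two options and their payoffs are hypothetical: a "safe" option that always pays 0.5, and a "risky" one that pays nothing on its first trial and 1.0 afterwards, a deterministic stand-in for an unlucky first impression.

```python
import math


def safe(prior_pulls: int) -> float:
    """Hypothetical known option: mediocre but reliable."""
    return 0.5


def risky(prior_pulls: int) -> float:
    """Hypothetical unknown option: bad first impression, better afterwards."""
    return 0.0 if prior_pulls == 0 else 1.0


ARMS = [safe, risky]


def run(bonus_weight: float, steps: int = 100):
    """Play `steps` rounds; bonus_weight=0 is pure exploitation (greedy)."""
    counts = [0] * len(ARMS)
    sums = [0.0] * len(ARMS)
    total = 0.0
    for t in range(1, steps + 1):
        untried = [a for a in range(len(ARMS)) if counts[a] == 0]
        if untried:
            arm = untried[0]  # try every option once before comparing
        else:
            def score(a: int) -> float:
                mean = sums[a] / counts[a]
                # UCB1-style bonus: large when an option has few trials.
                bonus = bonus_weight * math.sqrt(2 * math.log(t) / counts[a])
                return mean + bonus
            arm = max(range(len(ARMS)), key=score)
        reward = ARMS[arm](counts[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


greedy_total, greedy_counts = run(bonus_weight=0.0)
ucb_total, ucb_counts = run(bonus_weight=1.0)
print("greedy:", greedy_total, greedy_counts)
print("with exploration bonus:", ucb_total, ucb_counts)
```

The greedy policy tries the risky option once, sees a zero, and never returns. The policy with the bonus goes back, because its uncertainty about a once-tried option stays high, and it ends up with the larger total.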

Wuchty, Jones, and Uzzi (2007) analysed millions of scientific papers and found that team science has grown dominant.^[13] Later bibliometric work on team size (Wu, Wang, and Evans, 2019) found that small teams nonetheless continue to produce disproportionately disruptive findings.

IV.

The Brain Already Solved This

The brain is the only known system that has solved continuous learning, generalisation, and adaptation simultaneously on 20 watts, across a human lifetime.

It does this through two systems, not one.

The focused system handles execution in a precise, targeted and fast manner. Running the experiment. Writing the paper. Climbing the known hill.

The second system is stranger. Neuroscientists call it the Default Mode Network. It activates when you are doing nothing — resting, walking, staring out a window. It consumes approximately 20% of total brain energy at rest.^[5] The brain pays that price because this system performs a function the focused one cannot: connecting things that have no obvious connection. It is where the worm biology meets the phase transition. Where the geology excursion meets the mathematics problem.

Jung-Beeman et al. (2004) captured the moment of insight in real time.^[6] The "aha moment" is preceded by the brain actively suppressing its focused system so the other can work.

This is not a metaphor.

It is the actual mechanism that biological intelligence runs on. Five hundred million years of evolution decided both modes were worth the cost. The wandering mind is not a distraction from the work. It is half the work.

What the neuroscience suggests, and it is only a suggestion, is that the cognitive infrastructure for discovery may not be the same as the cognitive infrastructure for productivity. We cannot run a controlled trial on Poincaré's bus journey. But we can observe that the brain pays a significant metabolic price to keep the exploration system running, and ask whether our institutions pay any equivalent cost at all. Modern research runs almost entirely on the focused system. Deadlines. Deliverables. Quarterly reports. Visible output. We have formalised execution down to the hour. We have barely formalised discovery at all.

A civilisation should not rely on evenings, showers, and chance walks to host part of its search function.

The Default Mode Network was initially dismissed as neural noise. Raichle et al. (2001) established its existence and baseline metabolic significance.^[5] Subsequent research linked it to creative cognition, future simulation, social reasoning, and autobiographical memory consolidation.

The practical implication is uncomfortable: the cognitive state most associated with insight and novel connection-making is precisely the state that modern institutional design treats as unproductive. Protected thinking time, unstructured conversation, and cross-disciplinary wandering are not inefficiencies to be eliminated. They are the substrate of discovery.

V.

The Algorithm Science Is Missing

The brain's solution points toward an algorithmic one. Kenneth Stanley and Joel Lehman kept hitting the same wall building robot controllers: the more precisely they defined the goal, the worse their search algorithms performed.^[7,8] The algorithm climbed confidently and got permanently stuck.

So they removed the objective entirely.

Instead of rewarding progress toward a goal, they rewarded only one thing: being different from everything explored before. They called it novelty search. Each candidate is scored by how far it is from an ever-growing archive of everything the search has already visited. The most novel candidate wins. Always.

The result was counterintuitive and reproducible.

On deceptive landscapes, where the direct path toward the goal leads away from it, ignoring the objective outperforms chasing it.

The archive is what makes this work. Without it, the search is a random walk. With it, the search maps the unknown systematically, drawn toward unexplored territory the way water finds the lowest ground. The archive is the shared memory of every dead end, every abandoned direction, every result too strange to publish.

Stepping stones don't look like progress. Vacuum tubes don't look like computers. Cowpox doesn't look like a smallpox vaccine. A geology excursion doesn't look like a mathematics breakthrough. On deceptive landscapes, the gradient is the enemy.

The simulation that accompanies this essay runs both strategies on the same landscape simultaneously.

[Interactive simulation: Novelty Search vs Objective Search. Legend: objective agent, novelty agent, archive (visited), goal, false peak, barrier. Objective search follows the gradient toward the goal; novelty search maximises distance from the archive. The objective agent climbs toward the false peak, with fast progress and then stuck. The novelty agent builds its archive and explores where it has never been.]
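For readers without the interactive version, the same contrast can be reproduced in a few lines. This is a deliberately tiny sketch, not Stanley and Lehman's implementation: the landscape is a hypothetical one-dimensional line of 31 positions, the fitness function is invented to be deceptive, and the novelty agent expands from its archive one step at a time rather than evolving a population.

```python
GOAL = 25        # where the real solution sits
FALSE_PEAK = 5   # the nearby hill that fitness points toward
START = 10
SIZE = 31        # positions 0..30


def fitness(x: int) -> float:
    """Deceptive objective: near the start it rises toward the false peak."""
    if x >= 20:
        return 100 - 10 * abs(x - GOAL)   # the real mountain, far away
    return 10 - abs(x - FALSE_PEAK)       # the local hill


def neighbours(x: int) -> list[int]:
    return [n for n in (x - 1, x + 1) if 0 <= n < SIZE]


def objective_search(start: int, max_steps: int = 100) -> int:
    """Hill climbing: always move to the fitter neighbour, stop when none is."""
    x = start
    for _ in range(max_steps):
        best = max(neighbours(x), key=fitness)
        if fitness(best) <= fitness(x):
            break                         # local maximum reached
        x = best
    return x


def novelty(x: int, archive: list[int], k: int = 3) -> float:
    """Mean distance from x to its k nearest archive entries."""
    nearest = sorted(abs(x - a) for a in archive)[:k]
    return sum(nearest) / len(nearest)


def novelty_search(start: int, max_steps: int = 100):
    """Ignore fitness; always add the most novel unvisited neighbour."""
    archive = [start]
    for step in range(1, max_steps + 1):
        frontier = {n for a in archive for n in neighbours(a)} - set(archive)
        if not frontier:
            break
        # sorted() keeps tie-breaking deterministic
        pick = max(sorted(frontier), key=lambda x: novelty(x, archive))
        archive.append(pick)
        if pick == GOAL:
            return step, archive
    return None, archive


stuck_at = objective_search(START)
steps, visited = novelty_search(START)
print("objective search stops at", stuck_at, "with fitness", fitness(stuck_at))
print("novelty search reaches the goal after", steps, "steps")
```

In one dimension the novelty agent simply widens the explored interval until it covers the goal, so this shows the principle rather than the performance. What carries over is the failure of the climber: it follows the gradient faithfully and ends up on the wrong hill.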

The shared record of dead ends and abandoned paths does not exist in science. Negative results stay in private notebooks. Cross-disciplinary work falls between journal scopes. We are navigating without a map, toward destinations that the straightest path will never reach.

The question is whether this algorithm can be applied not to robots but to research institutions.

The novelty score of a candidate solution x is:

ρ(x) = (1/k) Σᵢ dist(x, μᵢ)

Here μᵢ is the i-th of the k nearest neighbours of x in the archive, and dist is behavioural distance: how differently the solutions actually behave, not how different their parameters are.
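Written as code, the score is a short function. The sketch below assumes each solution's behaviour has already been summarised as a vector of numbers (for a robot, say, where it ended up) and uses Euclidean distance between those vectors; both choices are illustrative assumptions, since the right behaviour descriptor depends on the domain.

```python
import math
from typing import Sequence

Behaviour = Sequence[float]  # assumed: a numeric summary of what a solution did


def behavioural_distance(a: Behaviour, b: Behaviour) -> float:
    """Euclidean distance between two behaviour descriptors."""
    return math.dist(a, b)


def novelty_score(x: Behaviour, archive: Sequence[Behaviour], k: int = 3) -> float:
    """rho(x): mean distance from x to its k nearest neighbours in the archive."""
    if not archive:
        return float("inf")  # nothing explored yet, so anything is novel
    nearest = sorted(behavioural_distance(x, mu) for mu in archive)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical archive of behaviours already seen.
archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
familiar = novelty_score((0.5, 0.5), archive)
unusual = novelty_score((3.0, 4.0), archive)
print(f"familiar candidate: {familiar:.2f}, unusual candidate: {unusual:.2f}")
```

Scoring against the k nearest neighbours rather than the whole archive keeps the measure local: a candidate counts as novel when its own neighbourhood is sparse, even if distant regions have been heavily explored.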

Stanley and Lehman's key finding: on deceptive problems, this strategy not only avoids local maxima but can also find the global solution faster than direct objective-based search, precisely because it does not commit to a direction prematurely.^[7,8]

Applied to science: the archive is the shared failure log. The novelty scorer asks "has anyone tried this before?" The exploration bonus is protected time. The behavioural distance is cross-disciplinary difference. Each component already exists in some form. What doesn't exist is the system that connects them.

VI.

The Experiment

Theory without a test is just a preference. Here is what to actually build.

Select 50 early-career researchers across neuroscience, physics, machine learning, materials science, and adjacent fields. Early-career matters: they have not yet calcified around a single paradigm. Randomly assign to two cohorts of 25.
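The random assignment itself is simple to make reproducible and auditable. A minimal sketch, assuming participants are identified by anonymous IDs and that a fixed seed is published in advance; the IDs, the seed, and the function name are all hypothetical.

```python
import random


def assign_cohorts(researchers: list[str], seed: int) -> tuple[list[str], list[str]]:
    """Shuffle with a fixed seed and split into two equal cohorts."""
    if len(researchers) % 2 != 0:
        raise ValueError("need an even number of researchers for equal cohorts")
    rng = random.Random(seed)       # private generator: same seed, same split
    shuffled = list(researchers)    # copy so the input order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


researcher_ids = [f"R{i:02d}" for i in range(1, 51)]   # hypothetical IDs
control, search = assign_cohorts(researcher_ids, seed=2025)
print(len(control), "control,", len(search), "search")
```

A real trial would probably stratify by field before shuffling, so that neither cohort ends up dominated by one discipline. That refinement is left out here to keep the sketch short.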

Control cohort: Standard grant expectations, meaning milestones, quarterly reports, specialist evaluation, and conventional outputs.

Search cohort: Three structural modifications.

Protected exploration time. 40% of funded hours. No deliverables. No hypothesis required. The only requirement: when something clicks, log the cognitive state that produced it. Not the idea, but the state. Were you reading? Walking? Connecting unrelated fields? This normalises wandering as legitimate work and makes the exploration fingerprint visible for measurement.

Shared failure intelligence. Every experiment, successful or failed, is logged in a structured, AI-queryable database. Hypothesis, method, result, what it rules out, what to try next. My 250 hours of repeated failed experiments become visible. The next researcher finds the log before repeating the same work. Poincaré's geology excursion gets recorded, not lost.

Monthly cross-disciplinary salons with novelty scoring. Each researcher presents their most surprising anomaly: not their best result, but the result that doesn't fit. An LLM-assisted novelty scorer surfaces the highest-novelty combinations for structured discussion.
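The shared failure log is the most concrete of these to prototype. Below is one possible shape for a record, using the five fields named above plus a success flag and free-form tags, which are my additions. The in-memory store and keyword search are placeholders for a real database and an AI-assisted query layer, and the two example entries are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentRecord:
    hypothesis: str
    method: str
    result: str
    rules_out: str          # what this outcome eliminates
    try_next: str           # what the author would attempt next
    succeeded: bool         # assumed extra field: did the experiment work?
    tags: List[str] = field(default_factory=list)  # assumed extra field


class FailureLog:
    """Minimal in-memory stand-in for the shared, queryable archive."""

    def __init__(self) -> None:
        self._records: List[ExperimentRecord] = []

    def add(self, record: ExperimentRecord) -> None:
        self._records.append(record)

    def search(self, keyword: str, failures_only: bool = True) -> List[ExperimentRecord]:
        """Case-insensitive keyword match across every text field and tag."""
        needle = keyword.lower()
        hits = []
        for r in self._records:
            if failures_only and r.succeeded:
                continue
            text = " ".join(
                [r.hypothesis, r.method, r.result, r.rules_out, r.try_next, *r.tags]
            ).lower()
            if needle in text:
                hits.append(r)
        return hits


log = FailureLog()
log.add(ExperimentRecord(  # invented example entry
    hypothesis="Sparse spiking layer improves continual learning",
    method="Swap dense layer for sparse layer, five seeds",
    result="No gain over dense baseline",
    rules_out="Sparsity alone as the fix",
    try_next="Combine with replay",
    succeeded=False,
    tags=["sparsity", "continual-learning"],
))
log.add(ExperimentRecord(  # invented example entry
    hypothesis="Neuromodulated learning rate reduces forgetting",
    method="Gate learning rate by a surprise signal",
    result="Forgetting reduced on two of three tasks",
    rules_out="Fixed learning rate as sufficient",
    try_next="Test on longer task sequences",
    succeeded=True,
    tags=["neuromodulation"],
))
print([r.hypothesis for r in log.search("sparse")])
```

Searching failures by default is a deliberate choice in this sketch: the question the log exists to answer is "has someone already tried this and watched it fail?"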

[Figure: The Search Cohort: Four Structural Modifications]

Bell Labs combined deep specialisation with deliberate proximity: mathematicians, physicists, chemists, and engineers sharing corridors and problems. The transistor, information theory, the laser, and Unix all emerged from that collision.^[9] DARPA has funded high-variance programs since 1958 with explicit tolerance for failure: program managers are given a mission, three to five years, and no publication requirements.^[10] Xerox PARC generated the graphical user interface, Ethernet, and object-oriented programming before conventional metrics could have justified them.^[11] Breakthroughs are not only products of brilliant individuals. They are also products of environments that preserve room for search.

Where AI changes the equation

Bell Labs needed physical corridors because proximity was the only way to create accidental collisions between distant ideas. That constraint no longer holds.

Large language models can now read across the entire published record (neuroscience papers, physics preprints, materials science datasets, failed engineering reports) and could surface non-obvious connections that no individual researcher could find by browsing. An LLM does not replace the wandering mind. It extends its reach. Where Poincaré needed a geology excursion to stumble into the right analogy, a researcher today can query an AI-assisted archive and ask what has been tried in adjacent fields that nobody in their own field has noticed.

The archive, made queryable by AI, becomes a systematic instrument for cross-domain serendipity at scale, available to everyone, running continuously. This is not AI doing science. It is AI doing what the Default Mode Network does: holding the entire space of known ideas simultaneously and noticing what doesn't yet connect.

Measure over 24 months. How many researchers produce genuinely novel framings, as assessed by reviewers blind to cohort assignment? How often does the failure database prevent repeated work? Do breakthroughs trace back through the exploration logs? Plus standard outputs: papers, collaborations, follow-on grants.

For this to scale: the archive must be a public good — open infrastructure, closer in spirit to arXiv than a proprietary database. Novelty scoring must be field-sensitive. Evaluation must reward process, not just output. If search cohort researchers are ultimately measured by the same benchmarks as the control cohort, the modification collapses from within.

What could make it fail: researchers may not adopt the logging format; the failure database may be too heterogeneous; the salons may calcify into performance rather than collision; the exploration time may produce anxiety rather than insight. These are real failure modes. Each one tells us something important about what the actual solution needs to look like. Even a null result would be useful as it would show that protected exploration time matters less than many assume, and that reform efforts should look elsewhere.

That is how science should study science.

If no gains appear after 24 months, learn and stop. If gains appear, scale what worked. Publish everything including the failures.

VII.

Three Objections Worth Taking Seriously

Most speculative ideas fail. Correct. And a related objection: perhaps science is simply harder now. The remaining unsolved problems are genuinely more complex than anything previous generations faced, and slower progress reflects difficulty, not search failure. Both points may be true. But if remaining problems are harder, that strengthens the case for exploration rather than weakening it. Optimising within existing paradigms is least likely to work precisely when the problem space is most unfamiliar. Exploration should be evaluated as a portfolio, not by the success rate of any single attempt. The question is not whether individual bets fail (they will), but whether the portfolio generates returns that narrow search cannot.

Incremental work is essential. Absolutely. The problem is not exploitation. It is monoculture: when an entire field converges on the same hill, the same methods, the same metrics, with nobody assigned to look elsewhere.

Exploration can be gamed. A system that rewards novelty without discipline risks producing noise rather than insight. The challenge is not simply increasing exploration but structuring it so that signal survives, which is exactly what the archive does. Every visited location is recorded. Distance from prior search is measured precisely. You cannot fake distance from everything already tried.

Modern science has built powerful machinery for productivity. It can specialise, refine, benchmark, and optimise with extraordinary competence.

What it has built less deliberately is machinery for discovery.

I found the insight that drives my research in a daydream. Poincaré found his on a bus. The settings were different. The cognitive state was identical: the focused mind released, the wandering mind finally free to connect what the desk could not.

Neither of us was being productive by any measurable standard.

But both of us were doing the most important work of the day.

Science has become remarkably good at climbing hills it already knows exist. But the hardest discoveries are often not on those hills at all. They lie in regions we have not yet learned how to search.

The next advance may come not from climbing faster, but from learning how to look differently.

References

[1] Strogatz, S.H. (2003). Sync: The Emerging Science of Spontaneous Order. Hyperion.
[2] DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437. arxiv.org/abs/2412.19437
[3] Bloom, N., Jones, C.I., Van Reenen, J. & Webb, M. (2020). Are ideas getting harder to find? American Economic Review, 110(4), 1104–1144. doi:10.1257/aer.20180338
[4] Park, M., Leahey, E. & Funk, R.J. (2023). Papers and patents are becoming less disruptive over time. Nature, 613(7942), 138–144. doi:10.1038/s41586-022-05543-x
[5] Raichle, M.E. et al. (2001). A default mode of brain function. PNAS, 98(2), 676–682. doi:10.1073/pnas.98.2.676
[6] Jung-Beeman, M. et al. (2004). Neural activity when people solve verbal problems with insight. PLOS Biology, 2(4), e97. doi:10.1371/journal.pbio.0020097
[7] Stanley, K.O. & Lehman, J. (2015). Why Greatness Cannot Be Planned. Springer.
[8] Lehman, J. & Stanley, K.O. (2011). Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation, 19(2), 189–223. doi:10.1162/EVCO_a_00025
[9] Gertner, J. (2012). The Idea Factory: Bell Labs and the Great Age of American Innovation. Penguin Press.
[10] Weinberger, S. (2017). The Imagineers of War. Knopf.
[11] Markoff, J. (2005). What the Dormouse Said. Viking.
[12] Gittins, J.C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B, 41(2), 148–164.
[13] Wuchty, S., Jones, B.F. & Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science, 316(5827), 1036–1039. doi:10.1126/science.1136099
