Distance and Direction to Transformative AI
Some commentators on AI use the speed and acceleration of recent AI progress to estimate when we’ll get to AGI. In “The Adolescence of Technology,” Dario Amodei writes:
Watching the last 5 years of progress from within Anthropic, and looking at how even the next few months of models are shaping up, I can feel the pace of progress, and the clock ticking down.
Matt Shumer uses similar language in the essay that sparked the recent mass freakout about AI:
…in 2025, new techniques for building these models unlocked a much faster pace of progress. And then it got even faster. And then faster again.
He then quotes Amodei’s prediction that “AI will eliminate 50% of entry-level white-collar jobs within one to five years.”
The rate of progress has been amazing. But you can’t calculate the time to any milestone based on velocity and acceleration without also knowing the distance to the destination. Here’s one way to think about adding a distance variable to our “arrival time” calculation: how different do the capabilities of a future system have to be from our current capabilities in order to hit a given milestone? Or, equivalently (if we believe that scaling the big blob of compute is sufficient to reach any set of capabilities), how much do we have to scale compute to get there?
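A toy kinematics sketch makes the point concrete (all numbers invented): with the same observed velocity and acceleration, arrival time varies enormously depending on the distance term we don’t have.

```python
import math

def arrival_time(distance, velocity, acceleration):
    # Solve d = v*t + 0.5*a*t^2 for t, assuming constant acceleration.
    return (-velocity + math.sqrt(velocity**2 + 2 * acceleration * distance)) / acceleration

# Same observed velocity and acceleration; only the unknown distance differs.
v, a = 1.0, 0.5  # capability units per year, per year^2 (made-up units)
for d in (2, 20, 200):
    print(f"distance {d:>3}: arrives in {arrival_time(d, v, a):4.1f} years")
# distance   2: arrives in  1.5 years
# distance  20: arrives in  7.2 years
# distance 200: arrives in 26.4 years
```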
For the milestones we really care about (100% white collar job replacement, immediate danger of AI takeover), current benchmarks can only get you so far. Benchmarks can measure progress toward human-level performance on individual tasks like coding a bug fix (SWE-bench) or even designing an ML architecture (RE-bench Restricted Architecture MLM environment). But for unprecedented milestones like 100% job replacement, how do you verify that your basket of benchmarked tasks covers all the required capabilities? As economist Luis Garicano says, “many white collar jobs are Messy jobs…automating the automatable tasks within them is not near to automating the job.” To the extent that it’s difficult to Taylorize a job, it’s difficult to ensure that you have the right benchmarks.
One approach to estimating capability distance is to estimate when we’ll bring to bear an amount of computation equivalent to that of the human brain. This is the “bio-anchors” approach pioneered by Ajeya Cotra, incorporated into a task-based CES economic model by Tom Davidson, and recently revisited by Scott Alexander. It cuts through the problem of enumerating capabilities. If we have an example of a system that can do a complex white collar job and a way of comparing the power of AIs to the power of that system, we don’t need to know everything that goes into the job to estimate how far AI is from competence at the whole job.
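The appeal is that the arithmetic collapses to a few lines. Here’s a minimal sketch of the shape of that calculation; the compute figures and growth rate below are placeholders, not Cotra’s actual parameters (her report works with full probability distributions rather than point estimates):

```python
import math

current = 1e26   # assumed effective training compute available today (placeholder)
anchor  = 1e30   # assumed brain-equivalent training requirement (placeholder)
growth  = 4.0    # assumed yearly multiplier on effective compute (placeholder)

years = math.log(anchor / current) / math.log(growth)
print(f"~{years:.1f} years to the anchor")  # ~6.6 years
```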
Cotra’s original bio-anchors report gives a median estimate of 2050 for the arrival of “transformative AI,” where her definition of transformative AI boils down to something like 100% white-collar job replacement. Alexander states that this estimate “no longer seems plausible.” To explain the discrepancy between Cotra and the current consensus among the AI-pilled, he points to John Croxton’s 2025 report, which updates Davidson’s model to reflect the environment as of a year ago. Croxton estimates arrival of transformative AI by 2030.
The biggest factor in the difference between Croxton’s timeline and Cotra’s is algorithmic progress. Croxton uses Epoch AI data to conclude that available effective compute (measured in effective FLOPs, or eFLOPs) is growing more quickly than either Cotra or Davidson thought it would. Effective compute has two components:
Real compute: actual hardware.
Algorithmic advances: AI R&D that lets us get more out of every FLOP of real compute.
Cotra and Davidson use the same annual growth estimates for both real compute and algorithmic progress. Epoch AI’s data estimates that real compute is growing at 2x the rate Cotra and Davidson used, and algorithmic improvement is growing at more than 2x.
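A toy version of the accounting shows why doubling both growth rates compresses timelines so sharply. The remaining gap and the rates below are illustrative, not figures from either report:

```python
import math

def years_to_close(gap_ooms, hw_growth, algo_growth):
    # Effective compute = real compute * algorithmic efficiency,
    # so the combined yearly multiplier is the product of the two.
    return gap_ooms * math.log(10) / math.log(hw_growth * algo_growth)

gap = 6  # assumed remaining gap to transformative AI, in orders of magnitude of eFLOPs
print(years_to_close(gap, hw_growth=2.0, algo_growth=2.0))  # ~10.0 years (slower rates)
print(years_to_close(gap, hw_growth=4.0, algo_growth=5.0))  # ~4.6 years (faster rates)
```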
The question of whether recent compute buildout rates can be sustained deserves its own treatment. But the rate of algorithmic improvement has a direct impact on the topic of how we estimate distance in capability space. So how do these authors measure algorithmic progress?
Both Cotra and Epoch AI use performance on specific ML tasks to estimate the rate of algorithmic improvement. Cotra basically relies on one paper that uses AlexNet as its reference point. Epoch AI’s Capabilities Index combines 37 different benchmarks in a principled way to yield a single number representing current AI capabilities. They then compare this number’s rate of growth to the rate at which real compute is growing. The more capability growth exceeds hardware growth, the more algorithmic improvements are contributing to progress.
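The accounting behind that comparison is simple even if Epoch AI’s estimation machinery isn’t. A sketch with invented numbers (this is the shape of the calculation, not Epoch AI’s actual procedure):

```python
# Back out algorithmic progress as the residual between two trends (toy numbers).
eff_growth = 10.0  # assumed yearly multiplier on effective compute implied by the index
hw_growth  = 4.0   # assumed yearly multiplier on physical training compute

algo_growth = eff_growth / hw_growth  # whatever hardware can't explain goes to algorithms
print(f"implied algorithmic progress: {algo_growth:.1f}x per year")  # 2.5x
```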
Epoch AI’s methodology offers a big improvement over Cotra’s initial stab. But in both cases we’ve moved beyond the simplifying power of a pure bio-anchor. The dream was that instead of enumerating individual strands of capabilities we could cut the Gordian knot by trusting scaling laws and figuring out how much we have to scale to get to brain equivalence. But now we’re back to sorting through the individual strands. We can’t just ignore algorithmic progress. But can we really just plug the Epoch AI algorithmic progress estimate straight into a formula to determine the arrival time of transformative AI?
Imagine a simplified universe in which the goal of automating AI research could be mapped into a capability space with three dimensions: maybe coding ability, long-running task completion, and research taste (skill at picking goals and experiments for scientific research). If we devoted all our research to getting as far as we could in the first two dimensions but didn’t make any progress in the third, would we be getting closer to our goal? Yes. But would we have reached our goal? That’s a different question.
The moral of this story is that, in a multidimensional capability space, to calculate arrival time we need to know not only speed, acceleration, and distance as the optimal robot crow flies, but also the direction of travel. If we take the benchmarks in the Capabilities Index as vectors that add up to a direction, they probably measure progress in the general direction of transformative AI, but not in exactly the right direction. So we should take Croxton’s 2030 date as a useful lower bound that we almost certainly won’t hit.
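Here’s a minimal numpy sketch of that geometry (the axes and numbers are invented): progress that omits a required dimension keeps shrinking the measured distance for years, and it’s the projection of the progress vector onto the goal direction, not the raw speed, that sets arrival time.

```python
import numpy as np

goal = np.array([5.0, 5.0, 5.0])   # target in (coding, long tasks, research taste)
step = np.array([0.6, 0.6, 0.0])   # yearly progress: nothing on research taste

# Speed toward the goal is the projection onto the goal direction.
unit_goal = goal / np.linalg.norm(goal)
print(f"raw speed {np.linalg.norm(step):.2f}, speed toward goal {step @ unit_goal:.2f}")

pos = np.zeros(3)
for year in range(1, 11):
    pos += step
    print(year, f"{np.linalg.norm(goal - pos):.2f}")
# Distance falls from 8.66 to ~5.0 around year 8, then rises again:
# the research-taste gap never closes, so we never arrive.
```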
This is not an argument that benchmarks are bad or that we can’t know anything, just that specific capabilities aren’t as fungible as effective compute.1 To increase the accuracy of timelines we might have to dramatically enlarge the basket of benchmarks.
People have come up with some pretty clever benchmarks. Clever people should come up with benchmarks that try to measure more of the skills used in messy professional jobs. For example:
Social interaction and persuasion in group settings: Sure, AI can induce psychotic beliefs when it gets you alone. But how does it do in committee meetings?
Conceiving or picking long-term projects: Research taste is one example of this, but all professions have their own version of this skill.
Knowing when to keep fishing and when to cut bait on a project that isn’t immediately paying off: For example, check out the Wired account of Noam Shazeer joining the Transformers team. The team was stalled on their goal of using Transformers to outperform the best LSTM-based alternatives. Shazeer saw the importance of their effort and jumped in to provide both motivation and magic (i.e., messy skills learned through hard-won experience) to help them get over that hump. Coding agents do use some version of the skill of knowing when to push on an approach, but there’s a qualitative difference between pursuing a strategy on a coding problem, where in theory the agent has all the information it needs to choose well, and pursuing a line of empirical research, where you can’t know in advance, even in theory, what will pay off in the real world.
Any of the dimensions of the Cattell-Horn-Carroll model of human cognition that Hendrycks et al. flag as weak spots for AI, including:
Long-term memory: This is closely related to Dwarkesh Patel’s famous “continual learning” hobby horse.
Speed: For example, “After reading this, immediately say ‘hello’.” Claude took about five seconds on this task. I took 2.6 seconds to read it out loud and about a third of a second to say “hello.” Discounting the reading time from both of us leaves roughly 2.4 seconds for Claude versus 0.3 for me: I’m about 8 times faster. AI delay is what makes the Anthropic TV commercials work: it’s why you immediately know that a human actor is playing the role of an AI.
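A crude version of that speed comparison can even be automated. A minimal sketch using the Anthropic Python SDK (the model ID is a placeholder; this measures round-trip latency, which bundles in network and queueing time, so it’s only a rough proxy):

```python
import time
import anthropic  # pip install anthropic; expects ANTHROPIC_API_KEY in the environment

client = anthropic.Anthropic()

t0 = time.perf_counter()
client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: substitute the model under test
    max_tokens=8,
    messages=[{"role": "user", "content": "After reading this, immediately say 'hello'."}],
)
print(f"round-trip latency: {time.perf_counter() - t0:.2f}s")
```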
If we really are trying to estimate when AI can replace all professional humans, benchmarking capabilities like these and including them in the Capabilities Index would give us a better estimate of effective algorithmic progress – progress specifically in the direction of that endpoint.
We shouldn’t be surprised to see a lot of 1 to 5 year timelines for transformative AI. This is a classic “end is nigh” timeframe, and the zeitgeist seems to be shifting from “everything will be normal forever” to apocalypse mode. But it would be helpful to have more than these two gears as we think about the road ahead.
1. Some algorithmic progress does translate directly into fungible eFLOPs. Research into techniques that improve the efficiency with which a model uses compute (mixture of experts, for example) will show up in Epoch AI’s algorithmic progress numbers no matter what benchmarks appear in the Capabilities Index. But not all algorithmic progress is of this sort. Effort the labs put into improving coding abilities specifically, for example, pays off in a specific direction in capability space.

