
Estimated Reading Time: 2 minutes

Every neural network is, at its core, a mathematical function: billions of weights and biases, floating-point numbers arranged in layers and connected through carefully designed architectures. The transformer blocks, attention heads, residual connections, and activation functions provide the scaffolding. But the architecture alone is hollow. A transformer without trained weights is basically a very expensive random number generator with a PhD.

It is the specific numerical values of those weights, shaped through gradient descent across vast oceans of data, that breathe capability into the structure.
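To make "a very expensive random number generator" concrete, here is a toy sketch (the shapes and names are mine, not from any real model): the same scaffolding computes a completely different function depending only on which numbers you plug into it.

```python
import numpy as np

def mlp(x, weights):
    """A tiny two-layer network. The architecture is fixed;
    the function it computes is entirely determined by `weights`."""
    W1, b1, W2, b2 = weights
    h = np.maximum(0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

rng = np.random.default_rng(0)
shapes = [(4, 8), (8,), (8, 2), (2,)]

# Two sets of random weights: identical scaffolding, different functions.
weights_a = [rng.standard_normal(s) for s in shapes]
weights_b = [rng.standard_normal(s) for s in shapes]

x = rng.standard_normal(4)
print(mlp(x, weights_a))  # one expensive random number generator
print(mlp(x, weights_b))  # a completely different one, same architecture
```

Swap the weight values and you swap the function. Everything the network "knows" lives in those numbers, not in the wiring diagram.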

Here is the thought that interests me, and it may be a victim of my overthinking 😄 (given I have a startup, there are already many such thoughts queuing patiently): somewhere in high-dimensional weight space, there already exists a precise configuration of numbers that crosses the threshold into general intelligence.

Not narrow task performance. Not pattern matching dressed up as reasoning, wearing a trench coat. But genuine abstraction, causal understanding, and generalisation across any domain: general intelligence.

These weights are not waiting to be invented. They already exist as a mathematical possibility, the same way a prime number exists, whether or not anyone has discovered it yet. We are not building AGI. We are on an extraordinarily expensive treasure hunt, and the map is written in calculus.

The right architecture creates the space in which that configuration can live. The right training signal points the optimiser toward it. And sufficient computing power, approximately the GDP of a small nation, gives us the steps to get close.

The reason we are not there yet is not a lack of imagination. It is the sheer scale of the search problem. The weight space of a modern large language model is so vast that our best optimisers (Adam, Adafactor, and their descendants) are effectively blind hikers, using gradient signals as a compass in a landscape of incomprehensible dimensionality. Every training run is a search expedition. Every architectural innovation narrows the territory we need to explore. Every failed run is just a very expensive “nope, not that way.”
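The blind-hiker picture can be sketched in a few lines (a toy least-squares problem of my own invention, nothing like a real LLM training run): the optimiser never sees the landscape, only the local slope beneath its feet.

```python
import numpy as np

# Toy "blind hiker": minimise ||Xw - y||^2 by following only the
# local gradient. The target configuration w_true already exists;
# gradient descent is just the search procedure that finds it.
rng = np.random.default_rng(42)
X = rng.standard_normal((32, 5))
w_true = rng.standard_normal(5)   # the "treasure": it already exists
y = X @ w_true

w = np.zeros(5)                   # start the hike somewhere arbitrary
lr = 0.5 / np.linalg.eigvalsh(X.T @ X).max()  # a safely small step size
for step in range(500):
    grad = 2 * X.T @ (X @ w - y)  # the compass reading
    w -= lr * grad                # one step downhill

print(np.allclose(w, w_true, atol=1e-3))  # True: the hike found the treasure
```

In this convex toy world the compass always points the right way, which is exactly what a real loss landscape does not guarantee; hence the scale of the problem.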

This reframes how we should think about progress toward AGI. It is not purely an engineering problem, a matter of stacking more layers, hiring more researchers, or willing it into existence through sheer Silicon Valley optimism. It is fundamentally a search problem. And how well we search depends on three things working in concert:

  1. Expressiveness → Is the architecture capable of representing the right weight configuration at all? (Are we even looking in the right room?)
  2. Signal quality → Does the training objective create a loss landscape that leads the optimiser toward intelligence, not just toward confidently wrong answers?
  3. Search efficiency → Can we navigate weight space intelligently enough, fast enough, to find what we are looking for before the next funding round?

We are making progress on all three. Slowly. Expensively. With occasional moments of genuine brilliance sandwiched between a lot of “why is the loss spiking again?”

But here is the beautiful part: the coefficients for AGI are not locked behind some door we need to build. They exist. They are sitting quietly somewhere in mathematical space, completely unbothered, waiting for our optimisers to stumble across them. Almost quantum in nature: only once you touch them do you know exactly what those coefficients are.

The question is not whether AGI can exist. Mathematically, it already does.

The question is whether we are searching in the right neighbourhood and whether we can afford the compute bill when we find it.


Manpreet & Renaira



The Tokens

Bonding through words... Manpreet & Renaira
