NoCard1571

This is making the assumption that, in place of scaling a single model to hundreds of trillions or quadrillions of parameters, you can just stack millions of smaller models, and I don't think there's really any evidence that would work.


siovene

Isn't this what we do in our human society? Accomplish tasks working together that no single individual could accomplish by themselves?


Cryptizard

Do you think a million preschoolers can work together and come up with general relativity? It doesn’t scale that way.


WG696

The answer to your question is not obvious to me at all.


HalfSecondWoe

I think it would depend on the organization involved. If they could be organized in such a way that tasks could be broken down and distributed, there's a shot. You'd also have to account for their lower-bandwidth communication, both in error-correction mechanisms and in the time they take to process. I doubt you could get them to self-organize, but with an intelligent guiding hand I bet the odds are good.


Unfocusedbrain

That's the thing with biological entities: if any one of those children just happens to have the potential of Gauss, Terence Tao, or Ramanujan, it's probable they solve the question. That's the benefit of evolutionary algorithms; every once in a while they spit out something extraordinary.


PerfectEmployer4995

But this isn't preschoolers tackling physics. It's PhD-level LLMs tackling concepts that are relatively complex, but within their grasp.


Cryptizard

Where are these PhD-level LLMs, let alone at the million-parameter level?


Rofel_Wodring

Depends on the provisions you make for education, task distribution and oversight, and most tellingly direct mind augmentation. Things which may not apply to preschoolers, but would to AI. Yours is an epistemological standpoint which simply has no room for controlled recursion or even the simple progression of time. And like most people, you see the past, present, and future as mostly indistinguishable. Meaning: unduly weighted towards the past, which is why you thought your analogy was compelling.


Cryptizard

What does "direct mind augmentation" have to do with what OP was talking about? You seem to be countering an argument I didn't make.


Rofel_Wodring

It's why the analogy is inapplicable. Directly upgrading the minds of preschoolers is tricky (though possible: neurofeedback, nootropics, etc.), but that's much less the case with software-based AI/LLMs.


Cryptizard

Ok, in that case I understand your comment now, but it is completely vacuous. You basically just accuse me of not being imaginative enough. I think, in this case, you just have no idea wtf you are talking about. It is completely possible for AI to work recursively; I never said it wasn't. But we know from decades of computer science research that you need a minimum level of reliability in your sub-components in order to bootstrap a larger system that achieves manageable levels of error. What OP suggests is likely not possible precisely because of that: error rate is known to scale with parameter count, and LLMs with only millions of parameters are never going to achieve the required reliability.

Moreover, there is another missing piece to this puzzle. We would need to show a version of a [threshold theorem](https://en.wikipedia.org/wiki/Threshold_theorem) for LLMs, which no one ever has, and we don't know if it is even possible. And there is a third problem: we can't even begin to show a threshold theorem at the moment because there is no way to efficiently check whether the output of an LLM is an error or not, and so no way to begin applying error correction.

So maybe it would be possible, but it also might fundamentally not be. It is not one of those things where, if we just work hard enough and put enough money into it, *we can make it happen.* Sometimes there are hard limits on what is possible.
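
A quick back-of-the-envelope illustration of the error-amplification point being made here; the per-step reliability figures are assumptions for illustration, not measured values for any model:

```python
# Toy illustration: if each small model in a pipeline is independently
# correct with probability r, a chain of n of them is only correct
# end-to-end with probability r**n (absent any error correction).

def chain_success(per_step_reliability: float, steps: int) -> float:
    """Probability that every step in the chain is correct."""
    return per_step_reliability ** steps

for r in (0.99, 0.95, 0.90):   # assumed per-model reliabilities
    for n in (10, 100, 1000):  # assumed chain lengths
        print(f"r={r}, n={n}: end-to-end success = {chain_success(r, n):.6f}")
```

Even at 99% per-step reliability, a thousand-step chain succeeds less than 0.005% of the time, which is why some error-correction threshold matters for the stacking idea.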


Rofel_Wodring

Humping present exigencies REALLY, REALLY thoroughly, and then insisting that any road to the outcome must tackle those exigencies, is a kind of tunnel vision that completely misses the broader point. Don't feel too bad, AI does this too. Even when you get LLMs to acknowledge that time can change the impossibility of certain premises, they conduct their analysis as though the premises will hold indefinitely anyway. History, especially technological history, has shown so many exceptions that if your prediction of impossibility relies on a specific implementation, you are going to be proven wrong sooner or later. In other words, you could have avoided typing that holistically pointless analysis by agreeing with me that you, like most humans, do not understand the broader progression of time on reality, but insist on emphasizing the present extra-thoroughly anyway.


Cryptizard

You, like most humans, do not understand computer science.


SyntaxDissonance4

How is human society akin to the human brain? And how is a human neuron akin to an LLM? It's just a bunch of false equivalencies. Although yeah, let's do it and see what happens.


Dayder111

That's likely how our brain works: it has many "modules", albeit they are likely very highly interconnected and affect each other (with some limitations, mostly for the organism's own safety). And they all work in parallel, "discussing" intermediate results occasionally, which you may perceive as thoughts, not even fully formed in words, zooming around your mind. And occasionally something "clicks" and you get the idea/solution that seems workable to you. Not sure about being able to fit much into 500-million-parameter model parts, though, unless the current paradigm changes and they become very interconnected too, compensating for each other's individual weaknesses and lack of understanding of specific tasks?


Fringolicious

I can see it: instead of thinking about one model that has to know every domain, think about many models with smaller domains. Even for something like coding, you could have multiple models that specialize in different languages, etc. On their own they will of course be bad outside their knowledge domain, but link them all together and you might get a network of highly specialized models that are each experts in their little silo, queried as necessary. You'd just need some sort of orchestrator to know which specialized model to rely on. I don't think that's too big of a leap, and it might be the way forward to solve the scalability problems.
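
A minimal sketch of the orchestrator idea described above, assuming hypothetical specialists behind a common call interface; the specialist names and the keyword-based routing are made up for illustration:

```python
# Hypothetical router that forwards a query to the most relevant
# specialist model. The specialists here are stand-in stubs; in a real
# system they would be small fine-tuned models behind an API.

from typing import Callable, Dict

Specialist = Callable[[str], str]

def python_expert(query: str) -> str:
    return f"[python specialist] handling: {query}"

def sql_expert(query: str) -> str:
    return f"[sql specialist] handling: {query}"

SPECIALISTS: Dict[str, Specialist] = {
    "python": python_expert,
    "sql": sql_expert,
}

def route(query: str) -> str:
    """Pick a specialist by naive keyword match; fall back to the first one."""
    for keyword, model in SPECIALISTS.items():
        if keyword in query.lower():
            return model(query)
    return next(iter(SPECIALISTS.values()))(query)

print(route("Write a SQL query that joins two tables"))
```

The open question, raised elsewhere in the thread, is how reliable the orchestrator itself has to be for the network of silos to be usable.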


Ne_Nel

You are describing how the brain works.


Fringolicious

Yep. Sounds like it might work well for AI as well, does it not?


Ne_Nel

I have spent the last month studying that, using information from experts plus Claude, Gemini, and GPT-4 as critics, and reached very concrete conclusions. But it burns brains. The kind you're not sure you want others to know about.


Fringolicious

What do you mean it burns brains? Kinda lost me there, not gonna lie


Ne_Nel

The thing is that analyzing some theories involves touching very uncomfortable conceptions of humans and their nature. Particularly when these ideas are considerably plausible. Sometimes I wonder why I want to get into those deep conflicts, but then I remember that I'm just trying to know how plausible it is to create intelligence like ours.


Cryptizard

I don't think the analogy works. Logic gates are minimal units of computation that give guaranteed correct results. When you string together many LLMs like a circuit, each of which has some probability of giving a wrong answer, it amplifies the probability that the output will be wrong to near 1. You would need to show some error-correction threshold like we have for existing circuits and quantum computers. But that requires some way of detecting the error, which is fundamentally difficult in the world of LLMs because we don't know what the correct output is supposed to look like for novel queries. It's ironic that you can't even correct the output of the AI in this case to realize that what it is saying doesn't work.
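
A toy numerical sketch of why redundancy plus majority voting works for gate-style outputs but doesn't transfer directly to free-form LLM text; the 5% error rate is an arbitrary assumption:

```python
# Redundancy works for bits: run a noisy gate three times and take a
# majority vote, and the error rate drops from p to roughly 3p^2.
# There is no analogous vote for free-form LLM text, because two
# "correct" answers rarely match symbol-for-symbol.

import random
from collections import Counter

def noisy_and(a: int, b: int, p_error: float = 0.05) -> int:
    out = a & b
    return out ^ 1 if random.random() < p_error else out  # flip the bit on error

def majority_and(a: int, b: int) -> int:
    votes = Counter(noisy_and(a, b) for _ in range(3))
    return votes.most_common(1)[0][0]

trials = 100_000
raw_errors = sum(noisy_and(1, 1) != 1 for _ in range(trials)) / trials
voted_errors = sum(majority_and(1, 1) != 1 for _ in range(trials)) / trials
print(f"raw error rate:   {raw_errors:.4f}")   # ~0.05
print(f"voted error rate: {voted_errors:.4f}") # ~0.007 (3p^2 - 2p^3)
```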


LyPreto

Agreed that LLMs by themselves lack error correction, but that's where a knowledge graph of the current task/approach/solution could be leveraged; think of backpropagation/gradient descent, basically creating a feedback loop for the network.
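
A rough sketch of what such a feedback loop might look like, with a hypothetical critic scoring each attempt; as the reply below notes, building that critic without ground truth is exactly the hard part:

```python
# Hypothetical feedback loop over a tiny "knowledge graph" of
# task -> attempts -> scores. The critic is a stub; a real one would
# need some notion of correctness, which is the open problem here.

graph = {"task": "parse a date string", "attempts": []}

def propose(graph: dict) -> str:
    return f"attempt {len(graph['attempts']) + 1}: try strptime with a format guess"

def critic(attempt: str) -> float:
    return 0.5  # stub score standing in for an unavailable ground-truth check

for _ in range(3):
    attempt = propose(graph)
    graph["attempts"].append({"text": attempt, "score": critic(attempt)})

best = max(graph["attempts"], key=lambda a: a["score"])
print(best["text"])
```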


Cryptizard

That all requires ground truth, which you only have during training not evaluation.


tomqmasters

You lost me at the logic gates part.


LyPreto

LLMs are decision engines; much like logic gates choosing 0 or 1, LLMs make more complex choices.


sdmat

This is like asking whether, if we could shrink jet engines down a thousandfold, we could make magic carpets. No, because it's an incoherent premise. We will never have models that work like 3.5 or 4o with a thousandfold fewer parameters, because it is impossible for such a model to contain anywhere near the same knowledge.


Noetic_Zografos

Um yes? No? Maybe? It is the output that will ultimately tell us if something should be considered an AGI. So if this hypothetical system gave us wonderful results, then sure.


hapliniste

Mostly yes, but there are some things to keep in mind: what representation these models would pass to each other, what their context windows would be, etc. If you see it as "agents" communicating with messages, I'm not sure how it would be orchestrated at a higher level.

Maybe a better approach would be to have a big model but train some parts independently (have input-output pairs and a training loop on a subset of the weights), so that we can train fast and still keep the global context of the full model.

I think we'll soon switch to "pool of experts" instead of mixture-of-experts models: have a router choose experts/submodels from a global pool instead of choosing from the experts available at a single level of the model, like we currently do. With such an architecture, we could make the cost of running an expert higher so the model runs fewer of them from input to output; the training gradients would then only flow through a small subset of experts, allowing us to train faster.

It kind of meets your idea, but I feel like there's a path to my idea, whereas having hundreds of separate models and orchestrating them would be hard, and I don't really see a clear path to achieving that.
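
A very rough sketch of the "pool of experts" idea as described: a router scores a global pool and only the selected experts participate in the forward pass. All shapes, the top-k choice, and the softmax combination are assumptions for illustration:

```python
# Toy "pool of experts" forward pass: a router scores every expert in a
# global pool, only the top-k run, and their outputs are combined using
# the renormalized router weights. In a real implementation, gradients
# would flow only through the selected experts.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                 # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                                        # one score per expert
    chosen = np.argsort(scores)[-top_k:]                         # indices of top-k experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.normal(size=d_model)
print(forward(x).shape)  # (16,)
```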


DukkyDrake

A large corporation like MS or Google, or even human civilization as a whole, is a kind of weak superintelligence that is vastly more capable than any individual, even a John von Neumann. Assuming scaling alone continues to fail to produce a truly cognitive agent, the current trajectory of technological development continues to align with my preferred vision of the future. I anticipate that attempts to create standalone cognitive agents from GPT-5+ will fail to exhibit general intelligence. We've already started to build the beginnings of a distributed AI service mesh that integrates various specialized narrow AI tools, such as GPT-5, as API-accessed components. This aggregated, networked service fabric will be diverse enough in its supported AI services to one day be regarded as an artificial general intelligence (AGI). I think the future is more likely than not to resemble this vision:

>[Reframing Superintelligence: Comprehensive AI Services as General Intelligence](https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf)
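
A minimal sketch of the kind of "AI service mesh" described here: a registry of narrow, API-accessed services and a dispatcher that routes requests to them. The service names and the capability-matching scheme are hypothetical:

```python
# Hypothetical service registry: narrow AI services register the
# capability they expose, and a dispatcher routes each request to a
# matching service. The services are stubs standing in for API calls.

from typing import Callable, Dict

REGISTRY: Dict[str, Callable[[str], str]] = {}

def register(capability: str):
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[capability] = fn
        return fn
    return wrap

@register("translate")
def translate_service(payload: str) -> str:
    return f"[translate service] {payload}"

@register("summarize")
def summarize_service(payload: str) -> str:
    return f"[summarize service] {payload}"

def dispatch(capability: str, payload: str) -> str:
    if capability not in REGISTRY:
        raise KeyError(f"no service registered for {capability!r}")
    return REGISTRY[capability](payload)

print(dispatch("summarize", "a long report about energy usage"))
```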


Nukemouse

That sounds like it might be the single least power efficient AGI concept I've ever heard of.


Akimbo333

Implications?