Crafty-Struggle7810

These people have been around for a while. I don’t think they have a working product, but their renders look nice nonetheless. 


_dekappatated

brb starting an ai hardware startup with AI generated pictures


Itchy-mane

I'd like to invest in you


_dekappatated

I'll give you 0.05% shares for 50 million dollars.


Itchy-mane

💦🤝 deal


_dekappatated

nioce, time to give myself a 50 million dollar salary and declare bankruptcy


LeahBrahms

I'd like to short you


norsurfit

Only AI generated pictures but no product? Meh, I'll only give you $20 million...


latamxem

You are correct, this was posted in /singularity 6 months ago. Same pictures and one-pager website, but a different story. The last headline was "University students create company that produces amazing AI chips." I couldn't find the post here. So this is either someone trolling or someone building a scam.


latamxem

[https://web.archive.org/web/20231230154918/https://www.etched.com/](https://web.archive.org/web/20231230154918/https://www.etched.com/)


sdmat

They make a good case for small-medium dense models with short context lengths. It is far less convincing for large MoE models with long context lengths - a class that notably includes every SOTA model. This is because such models intrinsically require much more memory, for the model weights and more importantly for the KV cache for every item in the batch. As a result the maximum possible compute intensity per GB of memory decreases drastically. And running very large models at extreme batch sizes is unattractive for latency reasons.

I also wonder about the claim that the hardware supports every current model. Really? Does it support whatever attention black magic DeepMind is doing to get 2M context lengths with speed and good performance? How do they know that - did Google give them a peek at the architectural and algorithmic details?
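
A back-of-the-envelope sketch of that memory pressure (the config below is illustrative, roughly Llama-70B-shaped, and not from anyone's spec sheet):

```python
# Rough KV-cache sizing for a GQA transformer at decode time.
# Illustrative config only: 80 layers, 8 KV heads of dim 128, fp16 cache entries.
LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VAL = 80, 8, 128, 2

def kv_cache_gib(context_len: int, batch_size: int) -> float:
    # 2x for keys and values, cached for every layer and every token in the batch.
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VAL
    return per_token * context_len * batch_size / 2**30

print(f"{kv_cache_gib(4_096, 1):.2f} GiB")      # one short-context sequence: ~1.3 GiB
print(f"{kv_cache_gib(128_000, 64):,.0f} GiB")  # long context at a big batch: ~2,500 GiB
```

At long contexts and large batch sizes the cache alone dwarfs any plausible on-package memory, which is exactly what caps the achievable compute intensity per GB.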


PuzzleheadedBread620

What about training? They only mention inference.


drsimonz

If inference is 20x cheaper, edge computing applications become 20x more realistic (plus or minus, lol). Anyway, isn't half of the training process just executing the model forwards, so you have an error you can then back-propagate? Although if back-propagation isn't possible on this hardware, then maybe not.


[deleted]

Backprop = take derivatives and update the weights. You need to do that every epoch - it's every epoch, not half the time. At least usually.


drsimonz

I think the problem is that the ASIC hardware probably doesn't have the ability to save the intermediate activations during the forward pass (which is what libraries like PyTorch do so gradients can be computed later). Sure, maybe you don't update the weights after every sample, but you still need the gradients from each sample, right?
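
A minimal NumPy sketch of the point (a toy two-layer net, not anyone's actual training stack): every term in the backward pass reuses a tensor the forward pass had to keep around.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # batch of 4 samples, 8 features
y = rng.standard_normal((4, 1))
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 1))

# Forward pass: the intermediates (h_pre, h) must be cached for training.
# Inference-only hardware could discard them the moment they are consumed.
h_pre = x @ W1
h = np.maximum(h_pre, 0.0)          # ReLU
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Backward pass: every gradient reuses a cached forward tensor.
d_pred = 2 * (pred - y) / y.size
dW2 = h.T @ d_pred                  # needs h
dh = d_pred @ W2.T
dh_pre = dh * (h_pre > 0)           # needs h_pre
dW1 = x.T @ dh_pre                  # needs x
print(loss, dW1.shape, dW2.shape)
```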


[deleted]

Sure, no derivatives, no backprop. You could compute the loss with multiple inference passes, but for the derivatives... what you state could be right (I have absolutely no idea, I don't even want to google this).


Overflame

If one of these is actually equal to 20x H100s (inference, training, cost, etc.) then Nvidia's stock will nosedive tomorrow. If that doesn't happen, which it won't, then I call this BS. They didn't show us ANYTHING, only: "Trust me bro, we're 20x better than the most valuable company, which invests billions of $ in R&D." It's bad enough that OpenAI is on life support after teasing our balls for a year now; these new players can just fuck off if they're already doing the same without providing any REAL value.


BigButtholeBonanza

b-but their renders are pretty


[deleted]

That’s right. Their renders are pretty so I believe everything they said


Tkins

Where the hell are you getting info like OpenAI is on life support? This is the most bro take.


[deleted]

People have a hard time separating Reddit from reality.


CheekyBastard55

If OpenAI doesn't release something THIS VERY SECOND I WILL CONSIDER THEM FINISHED! DONE-ZO! ZILCH! COMPLETELY OVER!


Ilovekittens345

The thing with ASICs is that they are application-specific. If tomorrow a new architecture comes out that makes the transformer architecture obsolete, an H100 would keep its value because running the new tech is a software thing, while an ASIC that can only do transformer inference loses most of its value overnight.


drsimonz

On the other hand, I could see this locking the AI industry into the transformer architecture for much longer than it would have been otherwise. Granted, it seems to be a pretty versatile architecture, but I doubt it'll be the architecture that produces ASI. If ASICs come to pervade the market, then new architectures (even if they're objectively better) will struggle to compete on a cost basis. Developing an ASIC is extremely expensive, which is why we're only seeing this now after several years of transformers dominating the field, so ASICs probably won't be developed for new experimental architectures (at least, not until we already have AGI).


Ilovekittens345

Maybe in a couple of years it becomes clear that scaling up the transformer architecture has hit diminishing returns, and most companies that want to launch a consumer-facing AI product start using LLMs as modules with other stuff built around them - kind of like the kernel of an OS. In that case, I can see a well-built and timely-released ASIC becoming extremely successful. But I think this is still going to take 5 years or longer.

Right now almost all the big AI companies are primarily trying to get more data, and they offer their services for free mainly because they want to train on the interactions. Until we know whether we can keep scaling up... I mean, one of these days Google is going to train on their YouTube videos. Finding this out, finding new sources of data - that process could easily continue for a decade before we have exhausted it and can conclude what happens when you scale the transformer architecture up to the absolute max.


drsimonz

True, there is certainly more data out there waiting to be tapped. And probably a lot more work to do in curating that data. Maybe that will already be enough to get to AGI.


Neon9987

I'm curious how far labs are with synthetic data. Numerous labs have hinted that the data shortage "can be solved with more compute." I remember reading a blog by an OpenAI guy who said he'd rather have more H100s than more coworkers because it reliably gives more synthetic data, faster iteration, bla bla.


playpoxpax

Yeah, this is 100% BS they're trying to feed us here. Without even showing any demos. Basically just "trust us bro."

Even if that company somehow isn't bullshitting us, no one's gonna spend millions building infrastructure that can only run inference for one particular model. What are they gonna do when they inevitably need to finetune it (or even upgrade it)? Throw away all the old cards and start building their entire hardware stack from scratch? That's not cost-effective at all.

But I gotta disagree on the idea that Nvidia is an unshakeable monopoly here. We've already seen them get kicked out of the crypto mining market by ASICs. It's not such a stretch to imagine them being overtaken in AI by some specialized architecture. Like neuromorphic chips or something. But those are a long way off.


OwOlogy_Expert

> What're they gonna do when they inevitably need to finetune it (or even upgrade it)? Throw away all the old cards and start building their entire hardware stack from scratch?

Presumably, the old hardware will still be useful for running the old algorithms, which you may still want to do even after developing your new hotness.

A) The old hardware could be relegated to simpler tasks, but tasks that still need to be done. Simply scaling up existing practical uses for existing AI models.

B) The old hardware could be integrated into the new system, allowing the new AI system to shunt workload off to the old hardware when it's something the old hardware can do. Analogous to a human brain region that's specialized for one particular purpose. Or, for an electronic analogy, like a CPU sending simple math calculations to a mathematical co-processor instead of running the calculation on the CPU itself. The specialized, outdated hardware could be a kind of co-processor, taking load off the main system when you ask the system to do something the old hardware is capable of.


Singularity-42

Also - why wouldn't NVDA, sitting on a $3T valuation, be able to replicate this? Or just buy these guys out with some leftover pocket change from Jensen's leather jacket?


Bernard_schwartz

There's a big difference between showing something in a lab and scaling up to production, especially as most of the really high-tech stuff is booked out for years. Interesting nonetheless.


wi_2

This is the Bitcoin miner ASIC race all over again. We will see new ASICs pop up everywhere for people dumb enough to fall for it.


Dayder111

What they promise is very possible and can easily be true, if a professional team (like theirs) actually spent some time designing a chip purely for transformer inference. It won't be able to do anything else - no scientific calculations, no support for different neural network architectures, no programmability beyond a very tiny extent - but it will be fast and energy-efficient. Chips pay a high price in energy INefficiency and slowness to support all the things that people need or may need, especially CPUs.

The downside is that if such specialized chips get widely adopted, they won't be able to switch to new, better architectures if those are discovered, and will be stuck with the default, mostly unmodified transformer - potentially hitting such companies hard, and/or slowing down progress.


GraceToSentience

I bet they'll be successful. I also bet people are going to imitate them, mark my words.


MeMyself_And_Whateva

It needs to be cheap if I need two or three versions for the different architectures available. Having one for transformers is a necessity.


crash1556

Custom chips will always be better than a generalized solution, sometimes 100,000x better. Chips like this could eventually enable a robot to run its GPT-whatever AI model locally on its person.


johnjmcmillion

Man, this gives me Bitcoin-PTSD. Got burned by one of those ASIC scams back in the day.


ClearlyCylindrical

I really don't like this since if too much hardware becomes specialised it will be difficult to incorporate large architectural changes later down the line and could get us caught in a local minimum of sorts.


baes_thm

The business model of so many of these hardware startups seems to be: "transformers are an important workload, so we optimize for this. Nvidia on the other hand doesn't design chips that are optimized for transformers for [some reason]" ... meanwhile Nvidia is probably the single biggest reason we have today's AI boom, they literally created this market over the course of 15 years by building AI features. To think that you can roll up and beat Nvidia at its own game by simply prioritizing the algorithms that _Nvidia itself_ fought like hell to legitimize, is ludicrous. I'm not saying that Nvidia can't be caught, but you need a better plan than going head-to-head right now, unless you can afford to wait for them to make a mistake. They have a massive lead in this market because, again, they literally created it.


longiner

Do you think Nvidia got lucky because OpenAI launched the AI boom by chance, or was Nvidia really future-focused and knew AI would become big and it was just a matter of when?


baes_thm

Nvidia is probably the biggest reason we have this AI boom. OpenAI was important for sure, but if you look back, they trained GPT-1 on V100s, which were 10x faster than P100s because they had better matmul accelerators. In fact, Nvidia delivered the first-ever DGX box to OpenAI as well. Before that, Nvidia created cuDNN in 2014. If you look at Jensen Huang's interviews, he likes to talk about this.


longiner

But I think OpenAI was caught with their pants down by how much the world enjoyed ChatGPT; they didn't even have a plan to monetize it yet. Without ChatGPT, the world would probably still treat AI as the narrow image-processing niche it was before, and we wouldn't have the massive money dump into AI that we have today. It might also mean Nvidia would be making massively expensive chips for a few small players like OpenAI (before they became big), and without a roadmap for OpenAI to become profitable, there might not have been a customer for their chips.


Fun-Succotash-8125

How much does it cost?


[deleted]

15 bucks + tip + a lemon + insurance + a cup of tea


00davey00

So we could see a future where nvidia compute is used almost exclusively for training and compute like this for inference?


osmiumo

Etched recently raised $120m, so they’ve got some deep pockets. Nvidia also recently confirmed they’re entering the ASIC space, so they’re aware this is the direction the market is headed in. All in all, this should lead to some real competition and development.


Apprehensive-Job-448

Sohu is >10x faster and cheaper than even NVIDIA's next-generation Blackwell (B200) GPUs. One Sohu server runs over 500,000 Llama 70B tokens per second, 20x more than an H100 server (23,000 tokens/sec), and 10x more than a B200 server (~45,000 tokens/sec).
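
Taking those claimed server-level numbers at face value, the quoted multiples are just the ratios:

```python
sohu, h100_server, b200_server = 500_000, 23_000, 45_000  # claimed Llama-70B tokens/sec per server
print(f"vs H100 server: {sohu / h100_server:.1f}x")  # ~21.7x, rounded to the "20x" above
print(f"vs B200 server: {sohu / b200_server:.1f}x")  # ~11.1x, the ">10x" claim
```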


icehawk84

I don't think an H100 can run Llama 70B at 23k T/s, 'cause I tried deploying it to one and it wasn't anywhere close to that fast.


CallMePyro

H100 server bro. Hiring bar.


visarga

large batch mode


Peach-555

How many tokens did you get?


icehawk84

A few hundred per sec IIRC.


Peach-555

Big gap. Is it possible to run several instances of inference at the same time? Is the few hundred per second an individual instance? I don't know how much Groq claims to be able to do, but it outputs ~350 tokens per second per request.


icehawk84

Yeah, Groq was faster when I tested, so I ended up using it through their API instead of deploying it to my own servers. Multi-GPU can help with batch inference, but my use case didn't lend itself well to that.
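
Rough sketch of why single-stream numbers and server-level numbers differ so much. The helper and the 0.8 efficiency factor here are made-up illustrations, assuming batching scales roughly linearly until memory bandwidth and KV cache become the limit:

```python
def aggregate_tokens_per_sec(per_request_tps: float, concurrent_requests: int,
                             batching_efficiency: float = 0.8) -> float:
    # Serving stacks batch many concurrent requests; each stream slows somewhat,
    # but total throughput across the batch climbs until the hardware saturates.
    return per_request_tps * concurrent_requests * batching_efficiency

print(aggregate_tokens_per_sec(300, 1))     # one user: a few hundred tok/s
print(aggregate_tokens_per_sec(300, 100))   # ~24,000 tok/s summed across 100 streams
```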


AdorableBackground83

That's wassup.


iNstein

Consider that Bitcoin is exclusively mined using ASICs. Why would they do that rather than use the GPUs that used to be used? Fact is, for certain tasks that are highly repetitive, ASICs provide the best performance and cost. ASICs can generally be produced much more cheaply, and they can outperform non-dedicated architectures. I get a strong vibe here on Reddit that there are a lot of butthurt new Nvidia investors...


pxp121kr

So are you telling me that a small company will come up with something that NVIDIA, a 3-trillion-dollar company, has not thought about? Being skeptical here.


Peach-555

Nvidia of course knows about inference-specialized hardware. They won't bother making it themselves if they have higher margins on their non-specialized AI chips.


Aymanfhad

There are companies a thousand times smaller in value than Apple that make phones with higher specifications than the iPhone, and at a lower price. A company's value is not the measure.


MainStreetRoad

I would be interested in knowing about 2 of these companies....


MisterGaGa2023

By making the phone, you mean "assembling it from readily available parts made by multibillion-dollar corporations"? Because you can do that at home. And by higher specifications you mean "some parts' specifications are higher," while some are cheap outdated junk, like the CPUs?


Aymanfhad

Many phone companies assemble components, including Apple. There are many phones that come with the SD8 Gen 3 processor and 16GB of memory and are cheaper than the iPhone. Is the SD8 Gen 3 processor old junk?


irbac5

I really doubt they are ahead by 2 generations.


Apprehensive-Job-448

*from their website:*

# How can we fit so much more FLOPS on our chip than GPUs?

The NVIDIA H200 has 989 TFLOPS of FP16/BF16 compute without sparsity[^(9)](https://www.etched.com/announcing-etched#footnotes). This is state-of-the-art (more than even Google's new Trillium chip), and the GB200 launching in 2025 has only 25% more compute (1,250 TFLOPS per die[^(10)](https://www.etched.com/announcing-etched#footnotes)). Since the vast majority of a GPU's area is devoted to programmability, specializing on transformers lets you fit far more compute.

You can prove this to yourself from first principles: It takes 10,000 transistors to build a single FP16/BF16/FP8 multiply-add circuit, the building block for all matrix math. The H100 SXM has 528 tensor cores, and each has 4 × 8 × 16 FMA circuits[^(11)](https://www.etched.com/announcing-etched#footnotes). Multiplying tells us the H100 has 2.7 billion transistors dedicated to tensor cores. **But an H100 has 80 billion transistors**[^(12)](https://www.etched.com/announcing-etched#footnotes)**! This means only 3.3% of the transistors on an H100 GPU are used for matrix multiplication!**

This is a deliberate design decision by NVIDIA and other flexible AI chips. If you want to support all kinds of models (CNNs, LSTMs, SSMs, and others), you can't do much better than this. By only running transformers, we can fit way more FLOPS on our chip without resorting to lower precisions or sparsity.
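
The arithmetic in that excerpt does check out as stated (whether "share of transistors spent on FMA circuits" is the right figure of merit is a separate question):

```python
tensor_cores = 528            # H100 SXM, per the excerpt
fma_per_core = 4 * 8 * 16     # FMA circuits per tensor core, per the excerpt
transistors_per_fma = 10_000  # rough figure used in the excerpt
h100_transistors = 80e9       # total H100 transistor count, per the excerpt

fma_transistors = tensor_cores * fma_per_core * transistors_per_fma
print(f"{fma_transistors / 1e9:.2f}B transistors in FMA circuits")  # ~2.70B
print(f"{fma_transistors / h100_transistors:.1%} of the die")       # ~3.4%
```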


Philix

FLOPS are flashy marketing, but how are they massively improving memory bandwidth and interconnect speeds to feed those processors? Are they using a deterministic scheduler and SRAM like Groq? If so, it's only inference hardware and not suitable for training. If not, they could still hit the same memory/interconnect bottleneck that Nvidia does. VRAM is only manufactured by a couple of companies, and HBM3e is HBM3e no matter what processor it's connected to.
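
The bandwidth point in rough numbers: at batch size 1, decode speed is capped by how fast you can stream the weights (plus KV cache) through memory, no matter how many FLOPS the chip advertises. The figures below are approximate and purely illustrative (single accelerator, dense fp16 70B model):

```python
hbm_bandwidth_gb_s = 3_350   # roughly H100 SXM HBM3 bandwidth
weight_read_gb = 140         # ~70B params in fp16, read once per decoded token
kv_read_gb = 3               # KV cache read per token (grows with context length)

# Every decoded token must stream at least the weights plus the KV cache once,
# so memory bandwidth sets an upper bound regardless of available FLOPS.
upper_bound_tps = hbm_bandwidth_gb_s / (weight_read_gb + kv_read_gb)
print(f"~{upper_bound_tps:.0f} tokens/s per sequence, best case at batch size 1")
```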


Educational-Net303

You're responding too seriously to what is otherwise an obvious vaporware company.


replikatumbleweed

This is exactly the kick in the ass that Nvidia needs. GPUs for AI are wasteful.


Ilovekittens345

What if you build your ASIC for a specific application and then a new application comes out and your ASICs don't work for it, while somebody with a GPU just runs new software? How is that not wasteful then? I think it's at least 10 years too early to build ASICs for AI. This recent breakthrough is not even a decade old... so much is going to change.


replikatumbleweed

Running something in perpetuity on an unoptimized architecture is inherently inefficient. AI might change, but it's a pretty safe bet that matrix multiplication is going to be a requirement for a good long while... which is why GPUs had it in the first place, and why we're building MM accelerators now.

If you build an ASIC for a whole process, yeah, that's probably going to be bound to the usefulness of that particular process. If you build an ASIC that crunches the hell out of an incredibly commonly needed mathematical function... that has more broad appeal.

That all said, this chip is probably so different, it might actually be analog, but at the end of the day, _someone_ or _something_ needs to get us to stop using GPUs for a problem that has discrete, defined elements that can be executed much faster and much cheaper. The power that's being chugged around the world for this is really the fault of everyone saying "This works, it's good enough, fuck optimization," and now power consumption is fucked on a global scale. I see no way in which that's a good thing.


Peach-555

I agree that a lot is probably going to change and it's too early to predict what architecture will become popular. I do think there is economic sense in transformer-inference ASICs right now, as they free up general-purpose hardware to do training instead of inference. It doesn't make sense if an inference ASIC only pays for itself versus general hardware over 10 years, but it definitely does if it pays off within 6 months.


CoralinesButtonEye

500k tokens per second is going to seem like NOTHING in a few years. People will be like "how did they even get AIs to work on such wimpy hardware?"


Dayder111

You are being downvoted, but I agree. https://arxiv.org/abs/2402.17764 https://arxiv.org/abs/2406.02528 Just these two papers alone show that it's possible. And dozens of other optimization and improvement methods came out in the last year or so, more than ever; the research is accelerating.
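
For context, the first link (BitNet b1.58) constrains weights to {-1, 0, +1} with a per-matrix scale. A minimal sketch of that absmean quantizer (my paraphrase of the scheme, not the authors' code):

```python
import numpy as np

def absmean_ternary(W: np.ndarray, eps: float = 1e-5):
    # Scale by the mean absolute weight, then round-and-clip to {-1, 0, +1}.
    scale = np.mean(np.abs(W)) + eps
    W_q = np.clip(np.round(W / scale), -1, 1)
    return W_q, scale  # effective weight ~= W_q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.02
W_q, scale = absmean_ternary(W)
print(W_q)  # matmuls against this need only additions and subtractions
```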


Gratitude15

If there's a new architecture to be had for this use case - the use case responsible for half or more of Nvidia's net worth, so TRILLIONS of dollars - I would place a large bet on Nvidia bringing it to market in a way that they win. This isn't Microsoft being late to the web, or Google being late to AI. This is Nvidia being hit in the core business model they are elite at.

If anything, what this tells me is that the computation curve will keep growing at that 1 OOM-per-year rate, given the specialization that's possible. It's just staggering to realize that by the end of this decade we have every reason to believe we will have 100,000x more compute going into intelligence than today. Today's amazing models will be dwarfed at that level. This ain't Pentium 3 to Pentium 4... this is horse and buggy to interstellar travel... and GPT-4 is the buggy 😂


FatBirdsMakeEasyPrey

Text-to-video/image models run on diffusion models.


ceramicatan

You mean the nvidia killer?


RobXSIQ

so... a couple hundred bucks once it's released???


HyrcanusMaxwell

A. How expensive is this chip compared to a GPU?

B. I hope this thing works, because it will slow foundational AI research and refocus attention on fine-tuning, leaving some of that research accessible to everyone.


Trucktrailercarguy

Who makes these cpus?


wi_2

"Replace" is a very big word. You can't train on these things.


cydude1234

Time to short NVDA


Akimbo333

Implications?


[deleted]

Realistically, how much use is this gonna see? The world of AI is much bigger than just transformers, and I feel like transformers are hitting their peak and we'll have to move on to a fundamentally different architecture to see more improvement towards AGI.


Apprehensive-Job-448

So far every major LLM, image-generation, and video-generation model is based on the transformer, and nothing indicates it is hitting any kind of peak. Things have scaled for the last 15 jumps; it should keep going as long as we scale up.


Murder_Teddy_Bear

That gets me kinda hard, sadly just a render, tho.


PiggyMcCool

It is useless if it doesn't have a "good" software stack. Nvidia has an excellent software stack.


Ilovekittens345

ASICs don't have a software stack like the CUDA ecosystem Nvidia built out; they only work for one specific application.


ClearlyCylindrical

They most certainly do have and require software stacks. They will need software which knows how to communicate with the device and integrate it into DL frameworks.


PiggyMcCool

that’s why it is useless


Ilovekittens345

For now, yeah. In the future, when this tech is completely worked out, applications of the tech will be run on ASICs, not GPUs.