Crafty-Struggle7810

These people have been around for a while. I don’t think they have a working product, but their renders look nice nonetheless. 


_dekappatated

brb starting an ai hardware startup with AI generated pictures


Itchy-mane

I'd like to invest in you


_dekappatated

I'll give you 0.05% shares for 50 million dollars.


Itchy-mane

💦🤝 deal


_dekappatated

nioce, time to give myself a 50 million dollar salary and declare bankruptcy


LeahBrahms

I'd like to short you


norsurfit

Only AI generated pictures but no product? Meh, I'll only give you $20 million...


latamxem

You are correct, this was posted in /singularity 6 months ago. Same pictures and one-pager website, but a different story. The last headline was "University students create company that produces amazing AI chips." I couldn't find the post here. So this is either someone trolling or someone building a scam.


latamxem

[https://web.archive.org/web/20231230154918/https://www.etched.com/](https://web.archive.org/web/20231230154918/https://www.etched.com/)


sdmat

They make a good case for small-medium dense models with short context lengths. It is far less convincing for large MoE models with long context lengths - a class that notably includes every SOTA model. This is because such models intrinsically require much more memory, for the model weights and more importantly for the KV cache for every item in the batch. As a result the maximum possible compute intensity per GB of memory decreases drastically. And running very large models at extreme batch sizes is unattractive for latency reasons.

I also wonder about the claim that the hardware supports every current model. Really? Does it support whatever attention black magic DeepMind is doing to get 2M context lengths with speed and good performance? How do they know that - did Google give them a peek at the architectural and algorithmic details?
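
A back-of-the-envelope sketch of that memory pressure (the config below is illustrative, roughly Llama-70B-shaped, and not from anyone's spec sheet):

```python
# Rough KV-cache sizing for a GQA transformer at decode time.
# Illustrative config only: 80 layers, 8 KV heads of dim 128, fp16 cache entries.
LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VAL = 80, 8, 128, 2

def kv_cache_gib(context_len: int, batch_size: int) -> float:
    # 2x for keys and values, cached for every layer and every token in the batch.
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VAL
    return per_token * context_len * batch_size / 2**30

print(f"{kv_cache_gib(4_096, 1):.2f} GiB")      # one short-context sequence: ~1.3 GiB
print(f"{kv_cache_gib(128_000, 64):,.0f} GiB")  # long context at a big batch: ~2,500 GiB
```

At long contexts and large batch sizes the cache alone dwarfs any plausible on-package memory, which is exactly what caps the achievable compute intensity per GB.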


PuzzleheadedBread620

What about training? They only mention inference.


drsimonz

If inference is 20x cheaper, edge computing applications become 20x more realistic (plus or minus, lol). Anyway, isn't half of the training process just executing the model forwards, so you have an error you can then back-propagate? Although if back-propagation isn't possible on this hardware, then maybe not.


[deleted]

Backprop = take derivatives and update the weights. You need to do that every epoch - it's every epoch, not half the time. At least usually.


drsimonz

I think the problem is that the ASIC hardware probably doesn't have the ability to save the intermediate activations during the forward pass (which is what libraries like PyTorch do so gradients can be computed later). Sure, maybe you don't update the weights after every sample, but you still need the gradients from each sample, right?
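
A minimal NumPy sketch of the point (a toy two-layer net, not anyone's actual training stack): every term in the backward pass reuses a tensor the forward pass had to keep around.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # batch of 4 samples, 8 features
y = rng.standard_normal((4, 1))
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 1))

# Forward pass: the intermediates (h_pre, h) must be cached for training.
# Inference-only hardware could discard them the moment they are consumed.
h_pre = x @ W1
h = np.maximum(h_pre, 0.0)          # ReLU
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Backward pass: every gradient reuses a cached forward tensor.
d_pred = 2 * (pred - y) / y.size
dW2 = h.T @ d_pred                  # needs h
dh = d_pred @ W2.T
dh_pre = dh * (h_pre > 0)           # needs h_pre
dW1 = x.T @ dh_pre                  # needs x
print(loss, dW1.shape, dW2.shape)
```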


[deleted]

Sure, no derivatives, no backprop. You could compute the loss with multiple inference passes, but for the derivatives... what you state could be right (I have absolutely no idea, I don't even want to google this).


Overflame

If one of these is actually equal to 20x H100s (inference, training, cost, etc.) then Nvidia's stock will nosedive tomorrow. If that doesn't happen, which it won't, then I call this BS. They didn't show us ANYTHING, only: "Trust me bro, we're 20x better than the most valuable company, which invests billions of $ in R&D." It's bad enough that OpenAI is on life support after teasing our balls for a year now; these new players can just fuck off if they're already doing the same without providing any REAL value.


BigButtholeBonanza

b-but their renders are pretty


[deleted]

That’s right. Their renders are pretty so I believe everything they said


Tkins

Where the hell are you getting info like OpenAI is on life support? This is the most bro take.


[deleted]

People have a hard time separating Reddit from reality.


CheekyBastard55

If OpenAI doesn't release something THIS VERY SECOND I WILL CONSIDER THEM FINISHED! DONE-ZO! ZILCH! COMPLETELY OVER!


Ilovekittens345

The thing with ASICs is that they are application-specific. If tomorrow a new architecture comes out that makes the transformer architecture obsolete, an H100 would keep its value because running the new tech is a software thing, while an ASIC that can only do transformer inference loses most of its value overnight.


drsimonz

On the other hand, I could see this locking the AI industry into the transformer architecture for much longer than it would have been otherwise. Granted, it seems to be a pretty versatile architecture, but I doubt it'll be the architecture that produces ASI. If ASICs come to pervade the market, then new architectures (even if they're objectively better) will struggle to compete on a cost basis. Developing an ASIC is extremely expensive, which is why we're only seeing this now after several years of transformers dominating the field, so ASICs probably won't be developed for new experimental architectures (at least, not until we already have AGI).


Ilovekittens345

Maybe in a couple of years it becomes clear that scaling up the transformer architecture has hit diminishing returns, and most companies that want to launch a consumer-facing AI product start using LLMs as modules with other stuff built around them - kind of like the kernel of an OS. In that case, I can see a well-built and timely-released ASIC becoming extremely successful. But I think this is still going to take 5 years or longer.

Right now almost all the big AI companies are primarily trying to get more data, and they offer their services for free mainly because they want to train on the interactions. Until we know whether we can keep scaling up... I mean, one of these days Google is going to train on their YouTube videos. Finding this out, finding new sources of data - that process could easily continue for a decade before we have exhausted it and can conclude what happens when you scale the transformer architecture up to the absolute max.


drsimonz

True, there is certainly more data out there waiting to be tapped. And probably a lot more work to do in curating that data. Maybe that will already be enough to get to AGI.


Neon9987

I'm curious how far labs are with synthetic data. Numerous labs have hinted that the data shortage "can be solved with more compute." I remember reading a blog by an OpenAI guy who said he'd rather have more H100s than more coworkers because it reliably gives more synthetic data, faster iteration, bla bla.


playpoxpax

Yeah, this is 100% BS they're trying to feed us here. Without even showing any demos. Basically just "trust us bro."

Even if that company somehow isn't bullshitting us, no one's gonna spend millions building infrastructure that can only run inference for one particular model. What are they gonna do when they inevitably need to finetune it (or even upgrade it)? Throw away all the old cards and start building their entire hardware stack from scratch? That's not cost-effective at all.

But I gotta disagree on the idea that Nvidia is an unshakeable monopoly here. We've already seen them get kicked out of the crypto mining market by ASICs. It's not such a stretch to imagine them being overtaken in AI by some specialized architecture. Like neuromorphic chips or something. But those are a long way off.


OwOlogy_Expert

> What're they gonna do when they inevitably need to finetune it (or even upgrade it)? Throw away all the old cards and start building their entire hardware stack from scratch?

Presumably, the old hardware will still be useful for running the old algorithms, which you may still want to do even after developing your new hotness.

A) The old hardware could be relegated to simpler tasks, but tasks that still need to be done. Simply scaling up existing practical uses for existing AI models.

B) The old hardware could be integrated into the new system, allowing the new AI system to shunt workload off to the old hardware when it's something the old hardware can do. Analogous to a human brain region that's specialized for one particular purpose. Or, for an electronic analogy, like a CPU sending simple math calculations to a mathematical co-processor instead of running the calculation on the CPU itself. The specialized, outdated hardware could be a kind of co-processor, taking load off the main system when you ask the system to do something the old hardware is capable of.


Singularity-42

Also - why wouldn't NVDA, sitting on a $3T valuation, be able to replicate this? Or just buy these guys out with some leftover pocket change from Jensen's leather jacket?


Bernard_schwartz

There's a big difference between showing something in a lab and scaling up to production, especially as most of the really high-tech stuff is booked out for years. Interesting nonetheless.


wi_2

This is the Bitcoin miner ASIC race all over again. We will see new ASICs pop up everywhere for people dumb enough to fall for it.


Dayder111

What they promise is very possible and can easily be true, if a professional team (like theirs) actually spent some time designing a chip purely for transformer inference. It won't be able to do anything else - no scientific calculations, no support for different neural network architectures, no programmability beyond a very tiny extent - but it will be fast and energy-efficient. Chips pay a high price in energy INefficiency and slowness to support all the things that people need or may need, especially CPUs.

The downside is that if such specialized chips get widely adopted, they won't be able to switch to new, better architectures if those are discovered, and will be stuck with the default, mostly unmodified transformer - potentially hitting such companies hard, and/or slowing down progress.


GraceToSentience

I bet they'll be successful. I also bet people are going to imitate them, mark my words.


MeMyself_And_Whateva

It needs to be cheap if I need two or three versions for the different architectures available. Having one for transformers is a necessity.


crash1556

Custom chips will always be better than a generalized solution, sometimes 100,000x better. Chips like this could eventually enable a robot to run its GPT-whatever AI model locally on its person.


johnjmcmillion

Man, this gives me Bitcoin-PTSD. Got burned by one of those ASIC scams back in the day.


ClearlyCylindrical

I really don't like this since if too much hardware becomes specialised it will be difficult to incorporate large architectural changes later down the line and could get us caught in a local minimum of sorts.


baes_thm

The business model of so many of these hardware startups seems to be: "transformers are an important workload, so we optimize for this. Nvidia on the other hand doesn't design chips that are optimized for transformers for [some reason]" ... meanwhile Nvidia is probably the single biggest reason we have today's AI boom, they literally created this market over the course of 15 years by building AI features. To think that you can roll up and beat Nvidia at its own game by simply prioritizing the algorithms that _Nvidia itself_ fought like hell to legitimize, is ludicrous. I'm not saying that Nvidia can't be caught, but you need a better plan than going head-to-head right now, unless you can afford to wait for them to make a mistake. They have a massive lead in this market because, again, they literally created it.


longiner

Do you think Nvidia got lucky because OpenAI launched the AI boom by chance, or was Nvidia really future-focused and knew AI would become big and it was just a matter of when?


baes_thm

Nvidia is probably the biggest reason we have this AI boom. OpenAI was important for sure, but if you look back, they trained GPT-1 on V100s, which were 10x faster than P100s because they had better matmul accelerators. In fact, Nvidia delivered the first-ever DGX box to OpenAI as well. Before that, Nvidia created cuDNN in 2014. If you look at Jensen Huang's interviews, he likes to talk about this.


longiner

But I think OpenAI was caught with their pants down by how much the world enjoyed ChatGPT; they didn't even have a plan to monetize it yet. Without ChatGPT, the world would probably still treat AI as the narrow image-processing niche it was before, and we wouldn't have the massive money dump into AI that we have today. It might also mean Nvidia would be making massively expensive chips for a few small players like OpenAI (before they became big), and without a roadmap for OpenAI to become profitable, there might not have been a customer for their chips.


Fun-Succotash-8125

How much does it cost?


[deleted]

15 bucks + tip + a lemon + insurance + a cup of tea


00davey00

So we could see a future where nvidia compute is used almost exclusively for training and compute like this for inference?


osmiumo

Etched recently raised $120m, so they’ve got some deep pockets. Nvidia also recently confirmed they’re entering the ASIC space, so they’re aware this is the direction the market is headed in. All in all, this should lead to some real competition and development.


Apprehensive-Job-448

Sohu is >10x faster and cheaper than even NVIDIA's next-generation Blackwell (B200) GPUs. One Sohu server runs over 500,000 Llama 70B tokens per second, 20x more than an H100 server (23,000 tokens/sec), and 10x more than a B200 server (~45,000 tokens/sec).
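
Taking those claimed server-level numbers at face value, the quoted multiples are just the ratios:

```python
sohu, h100_server, b200_server = 500_000, 23_000, 45_000  # claimed Llama-70B tokens/sec per server
print(f"vs H100 server: {sohu / h100_server:.1f}x")  # ~21.7x, rounded to the "20x" above
print(f"vs B200 server: {sohu / b200_server:.1f}x")  # ~11.1x, the ">10x" claim
```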


icehawk84

I don't think an H100 can run Llama 70B at 23k T/s, 'cause I tried deploying it to one and it wasn't anywhere close to that fast.


CallMePyro

H100 server bro. Hiring bar.


visarga

large batch mode


Peach-555

How many tokens did you get?


icehawk84

A few hundred per sec IIRC.


Peach-555

Big gap. Is it possible to run several instances of inference at the same time? Is the few hundred per second an individual instance? I don't know how much Groq claims to be able to do, but it outputs ~350 tokens per second per request.


icehawk84

Yeah, Groq was faster when I tested, so I ended up using it through their API instead of deploying it to my own servers. Multi-GPU can help with batch inference, but my use case didn't lend itself well to that.
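
Rough sketch of why single-stream numbers and server-level numbers differ so much. The helper and the 0.8 efficiency factor here are made-up illustrations, assuming batching scales roughly linearly until memory bandwidth and KV cache become the limit:

```python
def aggregate_tokens_per_sec(per_request_tps: float, concurrent_requests: int,
                             batching_efficiency: float = 0.8) -> float:
    # Serving stacks batch many concurrent requests; each stream slows somewhat,
    # but total throughput across the batch climbs until the hardware saturates.
    return per_request_tps * concurrent_requests * batching_efficiency

print(aggregate_tokens_per_sec(300, 1))     # one user: a few hundred tok/s
print(aggregate_tokens_per_sec(300, 100))   # ~24,000 tok/s summed across 100 streams
```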


AdorableBackground83

That's wassup.


iNstein

Consider that Bitcoin is exclusively mined using ASICs. Why would they do that rather than use the GPUs that used to be used? Fact is, for certain tasks that are highly repetitive, ASICs provide the best performance and cost. ASICs can generally be produced much more cheaply, and they can outperform non-dedicated architectures. I get a strong vibe here on Reddit that there are a lot of butthurt new Nvidia investors...


pxp121kr

So are you telling me that a small company will come up with something that NVIDIA, a 3-trillion-dollar company, has not thought about? Being skeptical here.


Peach-555

Nvidia of course knows about inference-specialized hardware. They won't bother making it themselves if they have higher margins on their non-specialized AI chips.


Aymanfhad

There are companies a thousand times smaller in value than Apple that make phones with higher specifications than the iPhone, and at a lower price. A company's value is not the measure.


MainStreetRoad

I would be interested in knowing about 2 of these companies....


MisterGaGa2023

By making the phone, you mean "assembling it from readily available parts made by multibillion-dollar corporations"? Because you can do that at home. And by higher specifications you mean "some parts' specifications are higher," while some are cheap outdated junk, like the CPUs?


Aymanfhad

Many phone companies assemble components, including Apple. There are many phones that come with the SD8 Gen 3 processor and 16GB of memory and are cheaper than the iPhone. Is the SD8 Gen 3 processor old junk?


irbac5

I really doubt they are ahead by 2 generations.


Apprehensive-Job-448

*from their website:*

# How can we fit so much more FLOPS on our chip than GPUs?

The NVIDIA H200 has 989 TFLOPS of FP16/BF16 compute without sparsity[^(9)](https://www.etched.com/announcing-etched#footnotes). This is state-of-the-art (more than even Google's new Trillium chip), and the GB200 launching in 2025 has only 25% more compute (1,250 TFLOPS per die[^(10)](https://www.etched.com/announcing-etched#footnotes)). Since the vast majority of a GPU's area is devoted to programmability, specializing on transformers lets you fit far more compute.

You can prove this to yourself from first principles: It takes 10,000 transistors to build a single FP16/BF16/FP8 multiply-add circuit, the building block for all matrix math. The H100 SXM has 528 tensor cores, and each has 4 × 8 × 16 FMA circuits[^(11)](https://www.etched.com/announcing-etched#footnotes). Multiplying tells us the H100 has 2.7 billion transistors dedicated to tensor cores. **But an H100 has 80 billion transistors**[^(12)](https://www.etched.com/announcing-etched#footnotes)**! This means only 3.3% of the transistors on an H100 GPU are used for matrix multiplication!**

This is a deliberate design decision by NVIDIA and other flexible AI chips. If you want to support all kinds of models (CNNs, LSTMs, SSMs, and others), you can't do much better than this. By only running transformers, we can fit way more FLOPS on our chip without resorting to lower precisions or sparsity.
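
The arithmetic in that excerpt does check out as stated (whether "share of transistors spent on FMA circuits" is the right figure of merit is a separate question):

```python
tensor_cores = 528            # H100 SXM, per the excerpt
fma_per_core = 4 * 8 * 16     # FMA circuits per tensor core, per the excerpt
transistors_per_fma = 10_000  # rough figure used in the excerpt
h100_transistors = 80e9       # total H100 transistor count, per the excerpt

fma_transistors = tensor_cores * fma_per_core * transistors_per_fma
print(f"{fma_transistors / 1e9:.2f}B transistors in FMA circuits")  # ~2.70B
print(f"{fma_transistors / h100_transistors:.1%} of the die")       # ~3.4%
```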


Philix

FLOPS are flashy marketing, but how are they massively improving memory bandwidth and interconnect speeds to feed those processors? Are they using a deterministic scheduler and SRAM like Groq? If so, it's only inference hardware and not suitable for training. If not, they could still hit the same memory/interconnect bottleneck that Nvidia does. VRAM is only manufactured by a couple of companies, and HBM3e is HBM3e no matter what processor it's connected to.
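
The bandwidth point in rough numbers: at batch size 1, decode speed is capped by how fast you can stream the weights (plus KV cache) through memory, no matter how many FLOPS the chip advertises. The figures below are approximate and purely illustrative (single accelerator, dense fp16 70B model):

```python
hbm_bandwidth_gb_s = 3_350   # roughly H100 SXM HBM3 bandwidth
weight_read_gb = 140         # ~70B params in fp16, read once per decoded token
kv_read_gb = 3               # KV cache read per token (grows with context length)

# Every decoded token must stream at least the weights plus the KV cache once,
# so memory bandwidth sets an upper bound regardless of available FLOPS.
upper_bound_tps = hbm_bandwidth_gb_s / (weight_read_gb + kv_read_gb)
print(f"~{upper_bound_tps:.0f} tokens/s per sequence, best case at batch size 1")
```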


Educational-Net303

You're responding too seriously to what is otherwise an obvious vaporware company.


replikatumbleweed

This is exactly the kick in the ass that Nvidia needs. GPUs for AI are wasteful.


Ilovekittens345

What if you build your ASIC for a specific application and then a new application comes out and your ASICs don't work for it, while somebody with a GPU just runs new software? How is that not wasteful then? I think it's at least 10 years too early to build ASICs for AI. This recent breakthrough is not even a decade old... so much is going to change.


replikatumbleweed

Running something in perpetuity on an unoptimized architecture is inherently inefficient. AI might change, but it's a pretty safe bet that matrix multiplication is going to be a requirement for a good long while... which is why GPUs had it in the first place, and why we're building MM accelerators now.

If you build an ASIC for a whole process, yeah, that's probably going to be bound to the usefulness of that particular process. If you build an ASIC that crunches the hell out of an incredibly commonly needed mathematical function... that has more broad appeal.

That all said, this chip is probably so different, it might actually be analog, but at the end of the day, _someone_ or _something_ needs to get us to stop using GPUs for a problem that has discrete, defined elements that can be executed much faster and much cheaper. The power that's being chugged around the world for this is really the fault of everyone saying "This works, it's good enough, fuck optimization," and now power consumption is fucked on a global scale. I see no way in which that's a good thing.


Peach-555

I agree that a lot is probably going to change and it's too early to predict what architecture will become popular. I do think there is economic sense in transformer-inference ASICs right now, as they free up general-purpose hardware to do training instead of inference. It doesn't make sense if an inference ASIC only pays for itself versus general hardware over 10 years, but it definitely does if it pays off within 6 months.


CoralinesButtonEye

500k tokens per second is going to seem like NOTHING in a few years. People will be like "how did they even get AIs to work on such wimpy hardware?"


Dayder111

You are being downvoted, but I agree. https://arxiv.org/abs/2402.17764 https://arxiv.org/abs/2406.02528 Just these two papers alone show that it's possible. And dozens of other optimization and improvement methods came out in the last year or so, more than ever; the research is accelerating.
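
For context, the first link (BitNet b1.58) constrains weights to {-1, 0, +1} with a per-matrix scale. A minimal sketch of that absmean quantizer (my paraphrase of the scheme, not the authors' code):

```python
import numpy as np

def absmean_ternary(W: np.ndarray, eps: float = 1e-5):
    # Scale by the mean absolute weight, then round-and-clip to {-1, 0, +1}.
    scale = np.mean(np.abs(W)) + eps
    W_q = np.clip(np.round(W / scale), -1, 1)
    return W_q, scale  # effective weight ~= W_q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.02
W_q, scale = absmean_ternary(W)
print(W_q)  # matmuls against this need only additions and subtractions
```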


Gratitude15

If there's a new architecture to be had for this use case - the use case responsible for half or more of Nvidia's net worth, so TRILLIONS of dollars - I would place a large bet on Nvidia bringing it to market in a way that they win. This isn't Microsoft being late to the web, or Google being late to AI. This is Nvidia being hit in the core business model they are elite at.

If anything, what this tells me is that the computation curve will keep growing at that 1 OOM-per-year rate, given the specialization that's possible. It's just staggering to realize that by the end of this decade we have every reason to believe we will have 100,000x more compute going into intelligence than today. Today's amazing models will be dwarfed at that level. This ain't Pentium 3 to Pentium 4... this is horse and buggy to interstellar travel... and GPT-4 is the buggy 😂


FatBirdsMakeEasyPrey

Text-to-video/image models run on diffusion models.


ceramicatan

You mean the nvidia killer?


RobXSIQ

so... a couple hundred bucks once it's released???


HyrcanusMaxwell

A. How expensive is this chip compared to a GPU?

B. I hope this thing works, because it will slow foundational AI research and refocus attention on fine-tuning, leaving some of that research accessible to everyone.


Trucktrailercarguy

Who makes these cpus?


wi_2

"Replace" is a very big word. You can't train on these things.


cydude1234

Time to short NVDA


Akimbo333

Implications?


[deleted]

Realistically, how much use is this gonna see? The world of AI is much bigger than just transformers, and I feel like transformers are hitting their peak and we'll have to move on to a fundamentally different architecture to see more improvement towards AGI.


Apprehensive-Job-448

So far every major LLM, image-generation, and video-generation model is based on the transformer, and nothing indicates it is hitting any kind of peak. Things have scaled for the last 15 jumps; it should keep going as long as we scale up.


Murder_Teddy_Bear

That gets me kinda hard, sadly just a render, tho.


PiggyMcCool

It is useless if it doesn't have a "good" software stack. Nvidia has an excellent software stack.


Ilovekittens345

ASICs don't have a software stack like the CUDA ecosystem Nvidia built out; they only work for one specific application.


ClearlyCylindrical

They most certainly do have and require software stacks. They will need software which knows how to communicate with the device and integrate it into DL frameworks.


PiggyMcCool

that’s why it is useless


Ilovekittens345

For now, yeah. In the future, when this tech is completely worked out, applications of the tech will be run on ASICs, not GPUs.