Nvidia Unveils Blackwell, Its Next GPU

Q6T46nT668w6i3m · on March 20, 2024

I believe “world models” are the future of the field so I really need better performance in areas like IVP, FFT, special functions (e.g., harmonics), and dynamic programming. The H100 advances (e.g., DPX instructions) are terrific but they feel like a starting point. Hell, improved geometric operations (e.g., triangulation and intersection) would be killer too and surely that expertise exists at NVIDIA! The H100, especially for the price, feels terrible when you’re training a neural network bottlenecked on an operation that flies on a consumer CPU when you know there’s GPU optimizations that have been left on the floor.

tgtweak · on March 20, 2024

I suspect these can be patched in as well - most of these functions have implementations in CUDA implying they should be able to run on the hardware even without dedicated instructions.

rbanffy · on March 20, 2024

This was one of the selling points for RISC: to focus on the most frequently used instructions and implement the rest in software would yield smaller and faster designs.

I have the feeling that GPUs are on the sweet spot where smaller footprint directly translates into more executions and, therefore, higher throughput for the same chip area.

pjmlp · on March 20, 2024

With an "OS" to go along with it,

https://www.nvidia.com/en-us/data-center/products/ai-enterpr...

It is this kind of delivery that the competition misses out.

amluto · on March 20, 2024

I imagine that Nvidia is trying to build a more sustainable moat. If the only things they have that their competitors don’t are a nice development framework, nice libraries and nice drivers, it’s not that hard for a customer to get their software working on a competing hardware platform and cut out a bunch of Nvidia’s enormous markup. But if Nvidia also has strong buy-in with datacenter operators and an entire platform to magically run people’s applications without having them need to think about how they’re deployed, then they can try for an AWS-like moat in which customers want to avoid the ongoing cost of DIYing their stack.

shiftpgdn · on March 20, 2024

Tiny corp/tinygrad has been working for a year+ and has raised something like $5 million to try to get AMD chips up to speed with Nvidia, without much success. Check out Twitter, where George Hotz has been very vocal about AMD needing to open source their chips to allow someone to help them get up to speed.

fotcorn · on March 20, 2024

Consumer AMD chips to be precise. AMD is mostly focusing on their datacenter chips (MI300X), and I assume the support for AI workloads there is much better. They might even see their consumer chip undermining sales of datacenter chips.

NVIDIA got big because CUDA works on the most crappy notebook GPUs up to their most powerful chips, and AMD should do the same, but focusing their limited number of driver devs on the expensive enterprise hardware makes sense IMHO.

robot · on March 20, 2024

Yes, he was first like, we're totally doing this on AMD! A few weeks later he's like wtf@#! this is the buggiest thing ever

amluto · on March 20, 2024

I’m a bit surprised AMD hasn’t purchased tinygrad. Or funded or supported them. Or really made any serious move here.

paulmd · on March 20, 2024

I think moving geohotz closer to the levers of power and into places where he can see even more sensitive information to squawk on twitter/github probably is the exact opposite of how AMD wants to handle the situation.

AI/ML is a rapidly moving field etc and you know geohot is gonna leak it all on twitter as soon as there's anything to announce, which makes it far more difficult for them to pivot later, etc.

gitfan86 · on March 20, 2024

Assuming that the hardware isn't a moat because other people make similar hardware is a mistake.

The networking alone is a huge bottleneck at scale. A competitor has to be better at networking AND chips to be competitive.

robot · on March 20, 2024

"it’s not that hard for a customer to get their software working on a competing hardware platform and cut out a bunch of Nvidia’s enormous markup"

Agree, and yet none of the contenders (Intel, AMD, chip startups) have been able to work out their software play for more than a year, which shows how slowly corporations move.

Google is not selling their TPUs AFAIK and their tooling is completely focused on internal use.

So really interesting to see no one else is properly addressing the need even though they have chips (and the chip itself is much simpler than a cpu, a systolic matrix multiplier array).

amluto · on March 20, 2024

I’m not talking about the chip suppliers moving fast. I’m talking about the users. For example, here’s Stability pulling it off:

https://stability.ai/news/putting-the-ai-supercomputer-to-wo...

I’m sure this was a lot of work, and Intel surely helped a lot, and there are probably plenty of kludges involved. But it worked, and there’s a lot of money on the table to do things like this.

kkielhofner · on March 20, 2024

I'm very vocal about this to the point where the naive/cursory view is that I'm an "Nvidia fanboy". It's amazing how many times I've had to try to relate this point and how much hate I get for it - Nvidia is lightyears ahead of AMD and the overall ROCm ecosystem in terms of software support. AMD makes fantastic hardware but at the end of the day it doesn't do anything without software. This is very obvious and very basic.

CUDA will do whatever you want and it more-or-less just works. ROCm (after > six years) is still:

- Won't work on your hardware

- Used to work on your hardware but we removed support within a few years

- Burn 10x more time trying to get something to work

- Be perpetually behind CUDA in terms of what you want/need to do

- Sorry, that just won't work

- Performance is lower than it should be for what is often actually better hardware, to the point where a superior newer generation AMD GPU gets bested by a previous generation Nvidia GPU with inferior (on paper) hardware specs

I've been trying ROCm since it was initially released > six years ago. I want AMD to succeed - I've purchased every new generation of AMD GPU in these six years to evaluate the suitability of AMD/ROCm for my workloads. Once a quarter or so I check back in to evaluate ROCm.

Every. Single. Time. I come away laughing/shaking my head at how abysmal it is. Then I go back to CUDA and sit in wonder at how well it actually works and throw even more money at Nvidia because I just need to get things done and my concerns about their monopoly, artificial market segmentation, ridiculously high margins, etc are a distant second to my livelihood.

AMD (and others) need to understand what Jensen Huang has been saying for years - 30% of their development spend is on software. As the announcements this week show, Nvidia is using their greater and greater financial resources and market share to continue to lap AMD in the only thing people actually care about: here's our product and here's what you can actually do with it.

Many people with a fundamental hate/disgust for Nvidia will come back and say "ok bootlicker, it's supported in torch you're spreading FUD". Ok, take a look at the Nvidia platform you linked and show me where the ROCm equivalent is. Take a look at inference serving platforms which are one of the things I care most about. Look at flash attention, alibi, and the countless other software components that you actually need beyond torch in many cases. Watch even basic torch crash all over the place with ROCm.

Sure, you /might/ be able to train or run local one-off inference with AMD. How do I actually run this thing for my users? Crickets -or- maybe vLLM support for ROCm for LLMs (nothing for other models). Then dig just a little bit deeper and realize even vLLM isn't feature complete, requires patches, specific versions all around, and from personal experience a lot of github/blog spelunking and pain. With CUDA it's `docker run` and flies.

With CUDA I can run torchserve, HF TGI, vLLM, Triton, and a number of others to actually serve models up for users so I can make money from my work. ROCm, meanwhile, can barely run local experiments.

AMD needs to get it together.

treme · on March 20, 2024

Crazy that video game graphics business happened to be the secret level portal to AI land

dragontamer · on March 20, 2024

Watch this 90s commercial for 3dfx: https://www.youtube.com/watch?v=ooLO2xeyJZA

GPUs always had more compute / Gigaflops than traditional computers. GPUs in fact have more to do with 80s-era supercomputer architecture than normal CPUs.

https://www.youtube.com/watch?v=ODIqbTGNee4

dehrmann · on March 20, 2024

I'm not sure if it's just nostalgia, but I still like the final 3dfx logo, at least in a video game context.

jdawg777 · on March 20, 2024

That was an unexpected twist.

pjmlp · on March 20, 2024

Ever since GPUs became programmable, which goes back earlier than many think (see the TMS34010 from 1986), there have been many attempts to use the cards for general purpose compute.

It turns out that anything related to neural networks, and similar AI approaches, is all about compute.

dragontamer · on March 20, 2024

I'd say almost the opposite in practice.

GPUs, as they became programmable (and maybe even a little bit before that...), started to take cues from SIMD Supercomputers. So the compute methodologies were researched first, and then GPUs (ie: applications to graphics) were applied afterwards.

I've heard rumors that the first programmable GPUs were considered because GPUs already were in SIMD-style compute and running instructions in a programmable way at the hardware/firmware level. It just needed to be "revealed" to OpenGL or DirectX programmers.

pjmlp · on March 20, 2024

The TMS34010 used a C-like SDK. In 1986 there were hardly any SIMD supercomputers, unless you consider Cray and Connection Machines as models.

dragontamer · on March 20, 2024

I'm thinking closer to the 90s period, where "graphics cards" developed by Voodoo / 3dfx and its Glide API were clearly along the DirectX / OpenGL inspiration path. (3d Graphics pipeline / transform & lighting / etc. etc.). I don't know the exact point when these 3d accelerators became programmable with vertex/pixel shaders, but Glide is clearly on the path and part of the history here.

--------

You're right in that 80s / Arcade chips look to be different. It's a programmable graphics chip alright, but I wouldn't call that chip SIMD, not from what I can see from Wikipedia at least.

pjmlp · on March 20, 2024

I wasn't the one bringing SIMD into the picture. :)

falcor84 · on March 20, 2024

In hindsight I think it makes good sense - game graphics were always aiming to represent worlds with as high fidelity as possible.

barumrho · on March 20, 2024

Good point, but it is interesting that the computations used to render visual worlds in high fidelity are the same computations used to "ingest" the data to create a model.

Reminds me of how a mic is a speaker and speaker is a mic.

AYBABTME · on March 20, 2024

I think it speaks to the generalizability of linear algebra.

p_l · on March 20, 2024

nVidia claims that they went into video games because it provided a way to fund their goal of compute accelerators

packetlost · on March 20, 2024

I don't believe that claim one bit. That reads like revisionist history if I've ever seen it. If there's sources that back it up, fine, but until I see pretty reasonable proof I'm going to take that as CEO grandstanding while the market is hot.

xyzzy_plugh · on March 20, 2024

Nvidia has been peddling compute for decades. They've long been selling enterprise tier cards for industrial purposes that are ill-suited to games. In many ways CUDA has been their long running attempt to break out of gaming.

I'm not surprised here. They invested and invested and invested and now everyone is playing catch up while they nearly corner the market. In industry circles we've been lamenting their dominance in this area for well over a decade.

Google and Apple are the only companies I see that have a shot at disrupting their market but they tend to end up navel gazing instead of burning bridges. And Nvidia has bridges with everyone.

It's absolutely an industrial geopolitics mastercraft scenario. We'll be studying them for a century.

Everyone has long known that the secret to unlocking the next tier of performance is solving concurrency. Look at all the advances that have made it into production in the last twenty years. But threading models only get you so far on a CPU, and programming GPUs is comparatively very difficult. Nvidia has always aimed to displace the CPU's prominence, since day one.

It's horizontal scaling 101.

erupt7893 · on March 20, 2024

They've been peddling compute since the mid-2000s; however, Nvidia has been around for much longer. To the revisionist history point, I don't recall seeing any of this prior to that. Unless I'm missing something, they were very much a graphics-only company competing with 3dfx, Matrox, S3, ATi, etc.

dotnet00 · on March 20, 2024

They were founded in 1993, but until 1998 had only put out 2 chips, the first was a quadratic surface renderer that didn't sell well, and the second was a normal triangle rasterizer that saved them from bankruptcy. Then in 2001 they released the first GPU that could run code, each subsequent generation expanding on that capability until they were able to do general purpose compute with CUDA.

hibikir · on March 20, 2024

Nvidia was selling cards with hardware acceleration since 1997. The first one I own is from 1998, and its advantage wasn't that it had superior compute, but that it also included the non-3d circuitry of the rest of the video card. So instead of having a 2d card and a Voodoo, which were daisy chained, you could use a lone Riva TNT that would do both. So they started out being less about compute than the competition.

Between the first nvidia cards and the first release of CUDA, you have a decade.

dehrmann · on March 20, 2024

> So instead of having a 2d card and a Voodoo, which were daisy chained, you could use a lone Riva TNT that would do both

Or get the slightly underpowered Banshee.

bcrosby95 · on March 20, 2024

Decades is still long after the first GeForce. Before CUDA we had third parties using graphics primitives to run general-purpose computing.

So I'm not buying it.

dotnet00 · on March 20, 2024

You can find slides from IIRC one of their first GPUs talking about how they can be made into very capable compute accelerators. They've supported CUDA on most of their GPUs since 2007 while the competition is still struggling to put out a truly equivalent solution. Even before that they had Cg.

I think it's pretty obvious that they saw the writing on the wall about the value of GPUs for massively parallel computing and actively pushed to be in the right spot for when everything took off.

Symmetry · on March 20, 2024

And probably they'd been at least thinking about that since the early 2000s to have it ready by then. But the claim is that they were thinking about it since their beginning in 1993, and that's the part that just seems hard to believe.

dotnet00 · on March 20, 2024

That's the claim Jensen has made, and while there isn't any particular evidence besides his word to go off for the founding goal, it's worth considering that between 1993 and 1998 they only put out 2 chips. The first one was quadratic primitive based (which, sounds more focused towards a future of compute applications, due to the ease of representing equations in such a form), and the second one was a last ditch effort triangle primitive chip to avoid bankruptcy after DirectX settled on triangle primitives. So I don't really think those years count for much in terms of determining what their goal was.

Their release cadence only really picked up in 1998, and just 2-3 years later they had the first 'programmable' GPUs with early shaders.

As such, saying that NVIDIA wasn't always aiming for GPU compute feels kinda like saying that SpaceX wasn't always aiming for feasibly reusable rocketry because it took them ~15 years to build up to their current version of it.

agumonkey · on March 20, 2024

Also a funny 'existential' point to remember. You have a clean idea (doing math / quad-splines) but the market sinks it.. doesn't mean it won't be good later. It's a game of long and patient grind.

nabla9 · on March 20, 2024

It's not hard to believe.

The idea that the future of computing is related to parallel numerical computing and linear algebra (HPC, graphics, scientific computing, data science, machine learning) predates the current deep learning boom.

Already in 2003, GPGPU (General-purpose computing on graphics processing units) was a thing. OpenVidia came out in 2003.

Scene_Cast2 · on March 20, 2024

They tried to do physics acceleration at one point. I think they were trying to find a niche for their DSP-style compute.

dragonwriter · on March 20, 2024

> They tried to do physics acceleration at one point

Well, there is PhysX which they bought in 2008 and still have; when they bought PhysX they shifted the acceleration implementation from relying on dedicated physics accelerator hardware to GPGPU.

There’s APEX which was built on top of PhysX but later discontinued.

There’s FleX…

“At one point” might be a slight misstatement.

agumonkey · on March 20, 2024

I'm 50/50 on this. I don't think they ever envisioned (npi) that gaming would grow that much and would allow them to pivot back into GPU based HPC. That said, they might always have had a dream hinting at this.

Murfalo · on March 20, 2024

What is so hard to believe? Technology evolves rapidly. I can't imagine that anyone investing big money in compute technology wouldn't have expected that.

Zenul_Abidin · on March 20, 2024

Because that happened in a time when there were many players in the 90s each making their own GPUs for gaming purposes specifically. Compute was not even in the picture until things like CUDA and OpenCL came out.

froonly · on March 24, 2024

Back in the late 90s, there was a project at SGI, called Bali, to make all their pipelines work in IEEE 32-bit floating point (they were using Intel i860 chips) so that they could do HW rendering of scenes written in Pixar's Renderman language.

Sony copied that idea for the 1st Playstation, and then folks like NVidia & 3DLabs quickly followed suit, the idea being they would enable that functionality for games like Final Fantasy.

In the early 2000s, the HPC folks realized that you could use a GPU for physics & engineering codes, and here we are 20 yrs later.

izacus · on March 20, 2024

You do understand nVidia dates back to fixed pipeline accelerators for essentially some vertices and textures, right?

pxtail · on March 20, 2024

Of course, obviously ever since inception of the company they had the noble goal of uplifting human race. It was just the set of unfortunate circumstances that forced them to make trivial utility devices just to survive long enough

cma · on March 20, 2024

They did a lot on medical scan visualization pretty early on too.

ripe · on March 20, 2024

"When we were selling shovels to coal miners, we always secretly knew there was gold in those hills."

Yeah, sure.

tgtweak · on March 20, 2024

Highly doubt this was the initial idea for nvidia given they were graphics-only for a very long time. CUDA definitely felt like more of a value-add for the first 5-6 years than a concerted effort to build accelerators and to fund that with graphics demand. First "tesla" line of GPUs - which had very little compute-only focus - were in 2007.

ksec · on March 20, 2024

>First "tesla" line of GPUs - which had very little compute-only focus - were in 2007.

CUDA was first released in ~2006 but started at least in 2004, largely building on top of the momentum of Cg, which was released in 2002 and had been worked on since 2000. There was discussion at the time in the late 90s about general purpose parallel programming and how it could possibly be done on GPUs or on something like PS2 Cell.

I don't know if Nvidia was really started with CUDA in mind in 1993. But Nvidia was into CUDA-like GPU usage WAY ahead of anyone else in the field.

gosub100 · on March 20, 2024

Wasn't there a time when people were hacking shaders to do compute tasks and read the results back instead of displaying to the frame buffer?

p_l · on March 22, 2024

Yes, multiple such attempts, including complete languages. Very thin hardware support as different GPUs varied very much on what could be done.

GeForce 8 introduced CUDA because the graphics pipeline had evolved to a similar point: instead of forcing the use of dedicated cores for specific tasks (vertex and fragment shaders), simple economies of chip making, together with Shader Model 4 introducing yet another stage, justified making unified shader cores... which could just as easily be used for other tasks.

cma · on March 20, 2024

Cell was PS3 (2006)

m3kw9 · on March 20, 2024

And tell me what is Nvidia trying to fund with the AI business?

amelius · on March 20, 2024

Then why does my graphics card have 5 video outputs?

MangoCoffee · on March 20, 2024

Nah, Intel said Nvidia just got "lucky"

xyzzy_plugh · on March 20, 2024

So says every loser when they are bested by their own ignorance.

baq · on March 20, 2024

Does a series of matrix multiplications have a soul?

AYBABTME · on March 20, 2024

Matrix multiplication is just an effective representation for the dense graph structures. The same graphs could be implemented in other ways (adjacency list, edge list), perform the same overall logic, without doing vector/matrix/tensor math. It seems like the magic is in the overall idea of neurons and networks of them following increasingly interesting architectures and training mechanisms.

But because these graphs are mostly dense and involve numerical operations, matrices/tensors are a great implementation.
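To make the equivalence concrete, here is a minimal sketch (plain Python, no libraries) of one dense layer computed two ways: as a matrix-vector product and as a walk over an edge list. The weights, inputs, and function names are made up for illustration and are not taken from any real model or library.

```python
# Hypothetical 3-input, 2-output dense layer; the weights are illustrative only.
W = [
    [0.5, -1.0, 2.0],   # weights into output neuron 0
    [1.5, 0.25, -0.5],  # weights into output neuron 1
]
x = [1.0, 2.0, 3.0]


def forward_matrix(W, x):
    """Matrix-vector product: one dot product per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def to_edge_list(W):
    """Flatten the same weights into (source, target, weight) edges."""
    return [(j, i, w) for i, row in enumerate(W) for j, w in enumerate(row)]


def forward_edges(edges, x, n_out):
    """Walk the graph edge by edge, accumulating into each target neuron."""
    out = [0.0] * n_out
    for src, dst, w in edges:
        out[dst] += w * x[src]
    return out


y_matrix = forward_matrix(W, x)
y_edges = forward_edges(to_edge_list(W), x, n_out=len(W))
```

Both paths perform the same multiply-accumulate work. The matrix form simply lays it out in the regular, dense pattern that GPU hardware is built to run in parallel, whereas the edge-list form would only start to pay off if most weights were zero.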

sirsinsalot · on March 20, 2024

Does a series of electrical impulses in a brain?

amelius · on March 20, 2024

Obviously no, because signals take time to move from A to B, while humans live in the now.

bheadmaster · on March 20, 2024

Define "soul".

paulmd · on March 20, 2024

does a SQL table provide a stochastic representation of conceptual symbolics?

oh, sorry, I thought we were just asking questions

VladimirGolovin · on March 20, 2024

Does a bunch of atoms arranged into proteins have a soul?

transcriptase · on March 20, 2024

It would appear the G in GPU now stands for AI.

zacksiri · on March 20, 2024

They might re-purpose the G to Generative

Generative Processing Unit works for all cases.

m3kw9 · on March 20, 2024

When AI is in everything, G would become General

black_puppydog · on March 20, 2024

Together with the newly-peripheral PPU (formerly known as CPU)

m3kw9 · on March 20, 2024

CPU is gonna be the coprocessor when most things are inferenced instead of strictly calculated.

adverbly · on March 20, 2024

I think the fact that people sometimes use the letter g to denote generalized intelligence might be useful here.

You could call them gPUs.

https://en.m.wikipedia.org/wiki/G_factor_(psychometrics)

sesuximo · on March 20, 2024

Apu already taken by aviation tho

Uvix · on March 20, 2024

And by AMD (Accelerated Processing Unit, their term for CPUs with good integrated graphics like what Xbox and PlayStation consoles use).

whamlastxmas · on March 20, 2024

And The Simpsons

k8sToGo · on March 20, 2024

So is GPU (Ground power unit)

Dalewyn · on March 20, 2024

Auxiliary Power Unit.

Both aviation and computers use them, though the latter more often call them UPS (Uninterruptible Power Supply).

bogwog · on March 20, 2024

Let's be real, nobody cares about their acronyms being unique or not confusing.

Dalewyn · on March 20, 2024

But that's not what parent comment was talking about.

bionhoward · on March 20, 2024

If everyone and their cousin is bullish on compute, then what’s the bear thesis here? Why might compute NOT be the best answer to our challenges as software engineers? Why might a focus on compute scaling ultimately be inferior to something else?

I seek all kinds of answers, including ones about fundamental logic, mathematical physics, etc

whiterknight · on March 20, 2024

When we don’t have a model of the problem it takes enormous amounts of power to synthesize one out of neurons or another general function approximating primitive.

But once we understand a little bit about the problem we can model 80-90% of its behavior with a handful of parameters. Add in some bias and noise parameters and you have an accurate trainable machine learning model that’s orders of magnitude more efficient.

Take for example a spring, which can be modeled by 1 or 2 parameters. But its impulse response looks like a sine curve multiplied by an exponential decay.

If you just train neurons to match input/outputs from a spring you need a ridiculous number of model parameters to describe that shape.
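As a rough illustration of the two-parameter model, the sketch below uses the standard textbook impulse response of an underdamped, unit-mass spring: a sine at the damped frequency multiplied by an exponential decay. The unit mass, the underdamped assumption, the function name, and the parameter values are all assumptions chosen for the example.

```python
import math


def spring_impulse_response(t, omega_n, zeta):
    """Impulse response of an underdamped unit-mass spring (0 <= zeta < 1).

    omega_n is the natural frequency (rad/s) and zeta is the damping ratio.
    """
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)  # damped frequency
    return math.exp(-zeta * omega_n * t) * math.sin(omega_d * t) / omega_d


# Hypothetical values, chosen only for illustration.
omega_n, zeta = 2.0 * math.pi, 0.1
samples = [spring_impulse_response(0.01 * k, omega_n, zeta) for k in range(500)]
```

Two numbers generate the whole curve here, whereas a generic function approximator would have to learn both the oscillation and the decay envelope from input/output samples alone.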

CNNs have seen an enormous amount of success due to this fact: a lot of processes can be modeled by convolution.

Kon-Peki · on March 20, 2024

> what’s the bear thesis here?

Nvidia is ceding the low-end GPU market to anyone who wants it. Not only does it allow a competitor to establish a reliable source of revenue for their R&D department, but it could cut off the sale of the binned chips that are inevitably produced on the expensive, tiny processes that Nvidia uses - which would hurt their margins to some degree.

jauntywundrkind · on March 20, 2024

Innovator's Dilemma seems strong on this one. Except typically the flight upmarket is driven by competition below. In this case Nvidia isn't being driven upmarket; the profits from focusing upmarket are just too tempting & there's not much competition downmarket.

paulmd · on March 20, 2024

"obviously chevy is just ceding the low-end market to anyone willing to make a camaro for the price of a camry, just think of all the profit waiting for anyone willing to establish themselves in this market"

bro there is like $10 of margin in your idea for a $200 gpu lol, nobody is "ceding" anything (actually 4060 is a more advanced card than 7600 on literally every front, for ~10% more money) but the cost floor has climbed to the point where $200-300 gpus just don't progress that much anymore.

There's very good reasons for this - shrinks are the least effective on low-tier cards (because memory controllers don't shrink), and you simply don't gain much actual savings from shrinking a 200mm2 die - congrats it's 150mm2 now, on a more expensive node, meaning your $10 chip is now $9. And meanwhile gamers want more VRAM every year, manufacturing and testing and shipping costs have gone up (and cost the same for a 4090 as a 4060), etc. The economics of low-end cards is literally terrible and they are simply falling off the edge of profitability.

Intel is willing to lose money hand-over-fist just to get into the market, but AMD and NVIDIA are pretty much charging fair-ish prices, and gamers just are too emotionally immature to accept that moore's law really really actually is dead for realsies and things aren't going to progress 40% perf/$ per gen anymore.

It's so weird, nobody cries about the CPU market like this. A 1600AF went for $85, a 3600 went for $160, nobody said "boo" when the 5600X increased that to $330 or whatever. Nowadays you are spending at least 50% more on your CPU than you did 5 years ago, sometimes closer to 2x. The enthusiast market is buying $250-400 cpus now, not $85-160. And obviously everyone understands that upgrading your CPU every gen is terrible value too, especially when prices have drifted upwards. But they don't have a half-decade of negging from reviewers telling them that this is a market in crisis, and that they should feel bad about buying a CPU, etc.

The literal half-decade of warfare from reviewers against the GPU market is so tired at this point. Bro, things are going to slow down, it just is how it is. GPUs are the processor that's most dependent on moore's law providing growth in transistors at the same cost, and wafer price increases hit them the hardest. Go complain to TSMC instead, or ASML, or the brick wall - it's ultimately a physics problem. But there's a hell of a lot of clicks and youtube ad money to be made whining about it in the meantime.

At least reviewers are finally coming to jesus on DLSS - mostly because they know AMD will finally have a decent upscaler within a year tops, and that RDNA4/5 will be pushing forward on tensor etc. The writing was on the wall as soon the specs leaked for PS5 Pro, which is basically adopting RTX features wholesale. https://www.youtube.com/watch?v=CbJYtixMUgI https://www.youtube.com/watch?v=BG-7vyw2YRg&t=1625s

loudmax · on March 20, 2024

I'm absolutely not going to short Nvidia stock, but it's plausible that they're overvalued.

Nvidia GPUs are pretty flexible in terms of computation and extremely power hungry. It may be that a next generation of more specialized hardware, such as TPUs or something, outperforms Nvidia GPUs on machine learning tasks to such an extent that those GPUs are obsolete for those tasks. This next generation could come to market sooner than Nvidia anticipates.

Another possibility is that ML researchers figure out some ways to radically reduce the amount of compute required for good training and inference on _less_ specialized hardware. It's really impressive what you can do with llama.cpp. If open source models running on consumer grade hardware ever get to 90% as good as ChatGPT (which, to be clear, is absolutely not the case currently), then those top end GPUs are overkill for most use cases.

I don't think either of those scenarios is particularly likely, but they're at least plausible.

paulmd · on March 20, 2024

> Another possibility is that ML researchers figure out some ways to radically reduce the amount of compute required for good training and inference on _less_ specialized hardware

just like the creation of radically simpler internal combustion engines led to us spending a lot less on internal combustion engines, right? /s

vinyl7 · on March 20, 2024

If we focus on writing better software with performance in mind instead of this insane stack of abstraction disaster, we could easily get massive increases in compute capability with current hardware.

The most impressive thing about modern computing is that we've had exponential increase in compute speed, yet everything runs as slow as it did 30 years ago

kilpikaarna · on March 20, 2024

Not sure about your definition of bearish, but concerned about the geopolitical risk of relying on this one company and their one supplier. In addition to everything else.

Also bearish on programmers keeping up on their fundamental algorithms, rather than trying to throw NNs at every problem.

ApolloFortyNine · on March 20, 2024

>Also bearish on programmers keeping up on their fundamental algorithms, rather than trying to throw NNs at every problem.

Besides interviews, at least 90% of developers have no need for 'fundamental algorithms'. The library being used uses them in some way sure, but the vast majority of devs simply need to know how to use the tool, not how the tool itself is developed.

poorlyknit · on March 21, 2024

This attitude is how we get stuff like the GTA V O(n^2) JSON deserialization bug. I get it, nobody is implementing quicksort on a day-to-day basis, but being taught the ideas behind these algorithms is important for building mental models.

ApolloFortyNine · on March 21, 2024

If the GTA bug was anything but just bad programming, it was a dev trying to implement their own JSON parser. It's said as much in the analysis that the JSON parser was simply poor, they could have just used any of the already existing JSON parsers.

And the other smaller problem was using a hash array instead of a hash map. You don't really need a mental model of multiple algorithms to know O(1) is faster than O(n).

The whole thing is more of a sign of how poor coding practices must have been at Rockstar for such a bug to not only make it in, but persist for years.
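To make the O(1) versus O(n) point above concrete, here is a minimal sketch of de-duplicating a list two ways. It is not Rockstar's code; the function names `dedupe_linear_scan` and `dedupe_hashed` are hypothetical and chosen only for illustration. The first version checks membership by scanning a list (O(n) per check, O(n^2) overall); the second uses a hash set (O(1) average per check).

```python
import time


def dedupe_linear_scan(items):
    """Keep first occurrences; membership is a list scan, so O(n^2) overall."""
    seen = []
    for item in items:
        if item not in seen:  # linear scan of the list: O(n) per check
            seen.append(item)
    return seen


def dedupe_hashed(items):
    """Same result; membership is a hash lookup, so O(n) overall on average."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:  # hash lookup: O(1) on average
            seen.add(item)
            out.append(item)
    return out


if __name__ == "__main__":
    # Illustrative timing only; absolute numbers depend on the machine.
    data = list(range(5000))
    for fn in (dedupe_linear_scan, dedupe_hashed):
        start = time.perf_counter()
        fn(data)
        print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```

Both functions return the same list; only the cost of the membership check differs, which is the whole gap once the input grows to tens of thousands of entries.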

kilpikaarna · on March 20, 2024

Eh, I wasn't even referring to forgetting how to write bubble sort or hashmaps from scratch. But just being able to reason about different algorithmic solutions to a problem, and intuit something about their complexity and requirements.

Thinking back to a recent-ish discussion here about the (very elegant and efficient) algorithm behind Shazam, and some of the comments here being along the lines of "haha, that's so quaint, nowadays you could just use a neural net". Never mind how that would even work as well.

Or programmers excited about neural networks forgoing the much simpler, and in many cases completely sufficient, computer vision algorithms built into OpenCV, in favor of trying to train their own model from scratch.

cactusplant7374 · on March 20, 2024

The bear case:

1) AGI won't happen because we are on the wrong path

2) AI being a big part of our lives is still a theory. Aswath Damodaran has some brief thoughts on this.

But the biggest bear case has to be that the technology won't get better. Essentially, everyone assumes that it will without reservations.

ajross · on March 20, 2024

Going to skip the "fundamental logic, mathematical physics, etc" angle and go with:

"That AI isn't all that and won't make much money" seems to be by far the biggest one. So far the applications are impressive and a little scary, but not actually something that anyone is going to pay for. Apple makes a zillion dollars because people want its phones. Google makes a zillion dollars because people want to sell junk to folks on the internet.

You need to posit a product built out of compute that does more. Maybe replaces a bunch of existing workers in an existing industry, something like that. So far the market is still looking.

aurareturn · on March 20, 2024

At my work, I'm finding so many incredible things that GPT4 API can do to make my company run much more efficiently.

For example, being able to feed a potential customer's invoice into GPT and ask it to see what kind of services we can offer to beat its price. Our sales people had to spend hours doing this before. Now it's done in 2 minutes through an engineered prompt. And it's incredibly accurate.

The problem with GPT4 API is context size and price. That's it. Both are bottlenecked by faster and cheaper compute.

That's my bull case for more compute, not bear case like OP asked.

Macha · on March 20, 2024

> You need to posit a product built out of compute that does more. Maybe replaces a bunch of existing workers in an existing industry, something like that. So far the market is still looking.

Although, even if that is the ultimate result, there's still going to be plenty of money changing hands on the way to that conclusion. See also all the blockchain/web3 companies, when there was (to me at least) clearly a lot less substance/potential there.

whiterknight · on March 20, 2024

These giant GPUs aren’t getting more efficient, they draw even more power to do more work. It’s impressive and will have important use cases.

But fundamentally technology gets better when we can do more with less.

dotnet00 · on March 20, 2024

They are getting more efficient. They draw say, 2x the power, but do 4x the work. The H100 is apparently ~5x more efficient than the A100.

whiterknight · on March 20, 2024

Agreed, I need to describe this more precisely.

When it has to draw that much current it limits the contexts in which it can be beneficial.

stephbu · on March 20, 2024

Ironically the power per cycle is decreasing - power and thermal dissipation are really the limits NVIDIA is exploring. It’s what the software does with those cycles that is leaping exponentially.

paulmd · on March 20, 2024

The other bottleneck they are studiously exploring and minimizing is beachfront area and networking/interconnect bandwidth.

Nvidia went all-in on InfiniBand SerDes while AMD chose PCIe/CXL. But since PCIe signaling requirements are tighter, you need bigger stronger PHYs, which means you get less actual area per beachfront. The penalty is latency/power, but who cares when gpus are latency-hiding machines anyway?

https://www.semianalysis.com/p/cxl-is-dead-in-the-ai-era

https://www.semianalysis.com/nvidia-b100-b200-gb200-cogs-pri...

this in turn means that nvidia can implement more links or bigger links in their nvswitch networks, which means they can construct bigger systems and push the TCO down.

Two 7900X is still functionally a 7900X, but two 3090s is functionally a 48GB card. Nvidia has got the interconnect bandwidth to a point where it’s a significant enough fraction of the local bandwidth to be functionally one single gpu - this is the same argument as MI300X etc. Doesn’t matter whether the link is on-package or off-package, what matters is that it’s a significant fraction of the speed of your local memory or cache ports. Nvidia did that, with large numbers of gpus, not just a pair of chiplets.

Nvidia has been thinking about this one for a long time - nvswitch is on its third generation, and can switch literal terabytes of data per switch, times several switches. The Mellanox purchase too, but it goes back way longer.

And unlike AMD they actually have a driver that works and just trivially exposes these capabilities and gets out of the way. If you want to tinker and build the open alternative that’s fine, other people want to work.

This is shocking to many AMD fanboys but actually Jensen is a good engineer too, nvidia is mostly on top because they sell products that people want (to such a relentless degree they get furious if they don’t get faster every year etc) and cannot be trivially displaced by “just as good” Radeon drivers etc - just see the latest installment of the geohot saga. Nobody is trapped by nvidia, it is a golden cage - getting actual work done or just going and playing a game instead of spending hours playing with regedit hacks to disable dxnavi to fix DX11 shader compilation stutter is what you’re buying.

https://twitter.com/__tinygrad__/status/1770160392389771305

https://old.reddit.com/search/?q=Dxnavi+stutter+&include_ove...

Nvidia is on top because of relentlessly competent engineering and savant-level business direction, and as much as people scoff at the idea… that’s literally the reason you hate him lol. He is a Jobs-like visionary figure that can see what the tech can be and drive the engineering and business factors to align along the long-term to get him where he wants to go, while also providing the funding and profit in the short term.

https://m.youtube.com/watch?v=Xn1EsFe7snQ&t=1034

The only company with comparable parasocial negative attachment is Apple and it’s for the exact same underlying reason. People are also systematically unable to understand that Apple users are not “trapped” or in need of rescuing either. People buy Apple because it does what they want it to really well, and they don’t care about installing Linux on their phones. And nerds resent that deeply. It’s not a coincidence there’s this axis of warfare around both Nvidia and the App Store with the EU etc. Nerds cannot abide someone choosing the “wrong” hardware. They are right and you will buy the same thing as them or they will get the EU to outlaw your product, or change the symbol licensing to prevent you running on Linux, etc. If you don't like the same filesystem as me, obviously that means I get to relicense some symbols that have been there for 20+ years and break your filesystem. Btrfs is better, the council has spoken.

It keeps happening for a reason, folks, lol. Nerds can’t tolerate others making different choices. And those users disproportionately self-select to “nerd” platforms like android and AMD.

hackerlight · on March 20, 2024

I think we can be certain that AGI will be compute intensive. Something Ilya Sutskever said made that clear. If you only have a small model, there's logically not much you can do with it. You can represent a single edge of an object, maybe. But it's not enough capacity to represent multiple edges and how they mix together to form an object. And if it can't do that, then it has no representations it can use for reasoning.

There's still the secondary question of how compute heavy it will be, and I don't think anyone knows. But Sam Altman, in a recent speech he gave in Korea, expressed confidence that there isn't a limit in sight for returns from GPT scaling.

jstummbillig · on March 20, 2024

> Why might a focus on compute scaling ultimately be inferior to something else?

a) At some point between here and eternity, making humans more efficient becomes easier than scaling compute. Seems unlikely.

b) Compute is overrated, now or will be in the near future. I will be happy to donate to the church of "compute is overrated" if that makes people get off of gpt-4+ and let me cook. Read that as: I doubt it.

I don't see a c)

hnthrowaway0328 · on March 20, 2024

I think eventually AMD and other players are going to move in with force and we will see a surplus of supply in a few years, especially when China starts to spit out a lower end version of...everything.

xyzzy_plugh · on March 20, 2024

Nvidia is too big. China could not easily get away with spinning complete rip offs that even support CUDA without poking a hole in their hull. Nvidia is working very hard to placate China. If China does go down this route, it'll be for China-only for a considerable time. Nvidia can afford to placate China and avoid that scenario, should their economics continue to trend upwards, practically forever.

AYBABTME · on March 20, 2024

I would normally be tempted to think the S&P500 should look linear or similar-ish to last year, and so on. But I think there's a valid thesis where rapid technological advancement does indeed just grow the pie exponentially: where the amount of value that becomes unlocked in a non-zero-sum manner grows tremendously.

Just with current AI models, the amount of value that is waiting to be created (take technology X, add AI to it) is incredible. Casual things that used to take years for a team to build can now be solved by throwing a GPU at it with a generic model that is fine-tuned a bit. Basically, things that were impractical 2y ago are now on the table.

The bear thesis is that compute will stop being scarce. Which is plausible, since in capitalism, the best cure for high prices tends to be high prices.

Something crazy to think about is that Accelerando by Charles Stross is starting to look like a prophecy being slowly fulfilled.

gitfan86 · on March 20, 2024

The demand for compute increases dramatically as the functionality and reliability go up.

AYBABTME · on March 20, 2024

Yeah but the margins may go down, and a lot more players may enter the market.

swingingFlyFish · on March 20, 2024

Examples?

solumunus · on March 20, 2024

Cacti · on March 20, 2024

I mean the entire tech industry is predicated on a continued exponential growth of computer power. It could be that these 70 years and the next couple dozen are a blip in what will be millennia of linear returns.

whamlastxmas · on March 20, 2024

I mean I could baselessly argue that the second we have AGI, we will very shortly after have ASI, after which I think most computing could very likely be hundreds or thousands of times more efficient. It’s possible there exists far more computing power in the world than we will ever need.

flohofwoe · on March 20, 2024

But Can It Run Crysis?

(seriously though, don't call it a "GPU" when rendering takes the back seat)

ksec · on March 20, 2024

I guess Blackwell is too late in the design cycle to use N3. It would be interesting to see, at this sort of margin and volume, whether it would make sense to have the GPU on the latest node: a next-gen 3nm GPU in 2025 and, if they could move aggressively, a 2nm GPU in 2026.

dsir · on March 20, 2024

Does anyone know the numbers in layman's terms regarding the demand for compute and what our systems/chips are able to reasonably process with this new tech?

I'm curious if the technology is now vastly outperforming the demand here or if the demand for compute is outpacing the tech.

trueismywork · on March 20, 2024

Demand is infinite

captainbland · on March 20, 2024

I'm sure it will be a very impressive part but the performance claims sound too good to be true.

"up to 30 times the inference performance, and up to 25 times better energy efficiency"

edward28 · on March 20, 2024

Mostly marketing shenanigans.

ChrisArchitect · on March 20, 2024

Lots more discussion: https://news.ycombinator.com/item?id=39749646

flerchin · on March 20, 2024

Is Nvidia using AI to help design new GPUs? When that happens we're actually off to the singularity. Until then, I can't tell if we're in a bit of hype mode.

aurareturn · on March 20, 2024

Explained here: https://www.youtube.com/watch?v=HxyM2Chu9Vc

Quite interesting.

paulmd · on March 21, 2024

iirc also discussed a bit here as well https://www.youtube.com/watch?v=JXb1n0OrdeI

lakomen · on March 20, 2024

Honestly, idk what that means. Is it a 4xxx successor? What's the point? Why, as a consumer who likes playing video games, do I have to buy a "GPU" that isn't primarily a GPU? Maybe I'm getting old

jug · on March 20, 2024

I'm not sure what you mean. You don't have to buy this GPU and it's clearly not geared towards gaming any more than the A100 was.

samstave · on March 20, 2024

I'm just super curious where these chips will be in 10 years - not the state of the chip design - these physical chips.

It will be interesting when chips such as this percolate down to single folks using one of these to just run their home AI node.

When every building has just one of these in their core building AI system that allows for all the regular talk to your smart home and have it intelligently accommodate your inferred needs/intentions.

It's possible today - but I mean on a wide scale.

syrgian · on March 20, 2024

I am not hopeful at all that it will go that way.

Instead, I expect those buildings to invest in having the fastest, most stable internet connection with added redundancy and everything will be fully centralized in datacenters.

samstave · on March 20, 2024

Fair - I guess these will be relegated to 'Evil Lair Under Ground Bunker'(TM) types - like Zuck's Hawaii compound's internal datacenter on-prem.

EDIT: the 10-year is just a random number.

And while these chips will obviously eventually be obsoleted as far as their cutting-edge - that doesn't remove the usefulness of such a chip for as long as it can logic electrons.

So even by any future standard - a few of these thrown into more of a consumer space will be adding value for a long time, one would think. I'd be really interested in knowing about fab re-tooling.

I recall one night at Intel in 1997 or so - I stepped onto the balcony to have a smoke at about 1am.

There was another guy there who was in finance and we had a similar smoke schedule - and we would chat. He was lamenting that the difficulty in his work was having to re-work a lot of DB schemas because some of the financial numbers he had to enter were too large for the fields.

I never forgot that - or a common trope that was thrown around Intel at the time - I was in my early 20s and was focused on video games when at Intel - so I didn't get to follow along with this: "It's cheaper to just build a new fab than to retool one for the next iteration of processor"

-- so I wonder just how much ancillary waste happens these days with each new chip iteration/evolution. Meaning - all the amounts of resources that went into the fab to make such a chip - and where these wind up in their lifecycle.

The newest machines to make the ultra EUV chips are ~$380 million a piece (which is nothing these days on a scale - but on a unit basis, that's a Fton).

https://www.cnbc.com/2022/03/23/inside-asml-the-company-adva...

https://i.imgur.com/say0bnv.png

Cthulhu_ · on March 20, 2024

They wouldn't, because in 10 years time the consumer-grade equivalent will out-power this model, assuming the increase in performance and decrease in cost persists.

I'm sure there's a few people that have e.g. an Intel Itanium from 10 years ago in their home lab, but those don't hold a candle to current-day consumer grade CPUs.

Solvency · on March 20, 2024

If this thing is all about AI why are we calling it a GRAPHICS processing unit still?

Don't tell me it's because of familiarity with the word GPU. Nvidia could coin a new acronym and write a PR release and the entire world would circulate it and even discuss it in here and every other vendor would scramble to play catch-up.

whywhywhywhy · on March 20, 2024

> Don't tell me it's because of familiarity with the word GPU

It's because of familiarity with the word GPU...

dotnet00 · on March 20, 2024

The improvements focus on AI, but it's still a GPGPU oriented chip. Due to how language works, it doesn't really matter that graphics isn't the main focus anymore, the chip still follows the same basic architectural principles as what is expected of a modern GPU, thus it is a GPU.

dragonwriter · on March 20, 2024

> If this thing is all about AI why are we calling it a GRAPHICS processing unit still?

Because names are sticky, and no one wants to start evangelizing a new term (“Matrix Math Processing Unit”) for it, preferring to put energy into things with value.

Nullabillity · on March 20, 2024

Presumably they still want something to sell once the bubble pops.

__s · on March 20, 2024

Already there's the addition of TPUs, tensor processing units