Best GPUs for self-hosted AI?

Trustworthy_Fartzzz@alien.top · 10 months ago

maggio@discuss.tchncs.de · 10 months ago

My friend did this with an RTX 3060 12GB and documented the process in this Octopusx blog post.

If you have any questions, we’d be happy to help.

tehnomad@alien.top · 10 months ago

The best consumer NVIDIA card for this is the RTX 3090 Ti because of its 24GB of memory, which lets you run bigger LLMs. I have an RTX 3060 12GB, which works pretty well with 7B and 13B models.
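A rough way to see why memory is the deciding factor: the weights alone take about (parameters × bits per weight ÷ 8) bytes. Here is a minimal Python sketch of that arithmetic. The 1.5 GiB overhead allowance (KV cache, activations, CUDA context) and the chosen bit widths are assumptions for illustration, not measured numbers, and card sizes are treated as GiB.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory taken by the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float, overhead_gib: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in the card's VRAM.

    overhead_gib is a hypothetical allowance for the KV cache, activations
    and CUDA context; real usage grows with context length.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    for params in (7, 13, 30, 70):
        for bits in (16, 4):
            need = weight_gib(params, bits)
            print(f"{params}B @ {bits}-bit: ~{need:.1f} GiB weights, "
                  f"12GB card: {fits_in_vram(params, bits, 12)}, "
                  f"24GB card: {fits_in_vram(params, bits, 24)}")
```

By this estimate a 13B model at 4 bits (about 6 GiB of weights) fits comfortably on a 12GB card, while the same model at 16 bits needs more than 24GB. Real quantization formats carry a few extra bits per weight, so treat the numbers as a lower bound.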

Aw3som3Guy@alien.top · 10 months ago

Don’t have direct experience with either, but:

It’s my understanding that a Coral TPU is exclusively an inference accelerator: no training, and no heavier generative applications. Also, Coral TPUs are a bit of unobtainium; the only listings I’ve seen are scalped about as badly as Raspberry Pis, with basically the same result.

I think you’re overthinking the Nano a bit. I’m not sure you’d need explicit support for the Nano, because it’s just a CUDA GPU, so it should™ run anything CUDA, as long as the ARM CPU doesn’t trip the software up. For example, I’ve seen people running Blender renders across a cluster of Jetsons just because they could, and I doubt that Blender has any explicit support for Jetsons.

If you’re coming at it from the angle of having rack space to spare, a used Tesla/Quadro GPU would probably be better value than an original Jetson Nano, which came with 2GB or 4GB of memory and only 128 Maxwell-era CUDA cores. You’d almost have to go out of your way to find a worse PCIe card, plus a normal PCIe card in a normal x86 server wouldn’t have ARM software restrictions. That said, as the other commenter mentioned, cooling and power draw are more serious considerations for a PCIe card, and there are the risks of buying used.

seanpmassey@alien.top · 10 months ago

It depends.

What is your budget? And what hardware/hypervisor do you have?

And what specifically are you looking to do with “generative AI”? Ugh… I hate that term.

There are two key things to keep in mind about rack-mount GPUs. First, you need servers that were specifically built at the factory to host GPUs. Almost all of NVIDIA’s server-grade GPUs are passively cooled, so the server needs a fan configuration that can cool them. Second, power: except for the lowest-end server GPUs (P4/T4/A2/L4, all inference cards and over $1000 per card), which draw 75 watts or less and can run off the PCIe slot alone, the rest require at least 150 watts, auxiliary power connectors, and higher-wattage power supplies.
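The power point boils down to simple arithmetic: a full-size PCIe x16 slot supplies up to 75 W, so anything above that has to come through auxiliary connectors, and the whole box has to stay within the PSU rating. A minimal Python sketch; the 400 W base system, the 750 W PSU, the 250 W card, and the 80% load ceiling are all made-up illustration values, not specs for any particular server.

```python
PCIE_SLOT_WATTS = 75  # a full-size PCIe x16 slot can supply up to 75 W


def aux_power_needed(card_watts: float) -> float:
    """Watts a card has to pull through auxiliary power connectors."""
    return max(0.0, card_watts - PCIE_SLOT_WATTS)


def psu_has_headroom(base_system_watts: float, card_watts: list[float],
                     psu_watts: float, max_load: float = 0.8) -> bool:
    """Check total draw against a PSU, keeping an assumed 20% safety margin."""
    total = base_system_watts + sum(card_watts)
    return total <= psu_watts * max_load


if __name__ == "__main__":
    # A 75 W inference card (e.g. a Tesla P4) vs a hypothetical 250 W card.
    for watts in (75, 250):
        print(f"{watts} W card needs {aux_power_needed(watts):.0f} W from aux connectors")
    # Hypothetical 400 W base system on a hypothetical 750 W PSU.
    print(psu_has_headroom(400, [75, 75], 750))    # two slot-powered cards
    print(psu_has_headroom(400, [250, 250], 750))  # two 250 W cards
```

Two slot-powered cards stay within the assumed budget, while two 250 W cards blow past it, which is why the bigger cards need servers specced for them from the start.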

And most of the drivers and Docker/Kubernetes plugins for these GPUs are locked behind NVIDIA licensing.

You’d want something that is at least Pascal-generation, but the Turing or newer cards are better.
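One way to apply that rule of thumb is by CUDA compute capability, which frameworks report per device (PyTorch’s `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple, for instance). Pascal is 6.x and Turing is 7.5. A small Python sketch; the thresholds just encode the advice above, and Volta (7.0) lands in the middle bucket with this cutoff.

```python
def architecture(cc: tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to its NVIDIA architecture name."""
    major, minor = cc
    if major == 7:
        return "Turing" if minor >= 5 else "Volta"
    if major == 8:
        return "Ada Lovelace" if minor >= 9 else "Ampere"
    return {3: "Kepler", 5: "Maxwell", 6: "Pascal", 9: "Hopper"}.get(major, "unknown")


def verdict(cc: tuple[int, int]) -> str:
    """Apply the 'at least Pascal, Turing or newer preferred' rule of thumb."""
    if cc >= (7, 5):
        return "Turing or newer"
    if cc >= (6, 0):
        return "meets the Pascal minimum"
    return "older than Pascal"


if __name__ == "__main__":
    # Compute capabilities of cards mentioned in this thread.
    cards = {
        "Jetson Nano": (5, 3),
        "Tesla P4": (6, 1),
        "Tesla T4": (7, 5),
        "RTX 3060": (8, 6),
    }
    for name, cc in cards.items():
        print(f"{name}: sm_{cc[0]}{cc[1]} ({architecture(cc)}) -> {verdict(cc)}")
```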

Your better bet is to get a rack-mount workstation (which is basically a server anyway) and put a higher-end Quadro or GeForce RTX 30x0 card in it.

Edit: I never said what I have. It’s a Dell R730, factory-built for GPUs, with a pair of Tesla P4 cards. I originally built it to play with GPUs for VDI.