GPU Server for LLMs, SD Pipelines, etc.
So, in experimenting with these trained neural network models, it has become apparent that one must either buy a lot of expensive current-generation GPU cards to achieve the core density and VRAM necessary for interesting experiments like training or local model serving, or set up a server with several slightly older GPUs and GPU accelerators.
https://www.linkedin.com/pulse/7-best-gpus-deep-learning-ai-2023-ashwani-patel – Overview of some GPU cards; recommends the Nvidia K80 and P100. Apparently the P100 supports NVLink (the SXM2 variant does; the PCIe variant does not).
https://h20195.www2.hp.com/v2/GetDocument.aspx?docname=4AA7-3070ENW – General takeaway is that a Z840, properly configured, can NVLink two Nvidia Quadro P6000s. This would give 48GB of VRAM in total, enough to run a 70B model at 4-bit quantization (fp16 weights alone would be roughly 140GB).
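As a sanity check on that claim, here is a rough back-of-the-envelope estimate (my own assumptions, not from the HP document) of the VRAM needed just to hold 70B parameters at different precisions, with a modest overhead factor for KV cache and activations:

```python
# Rough VRAM estimate for serving a 70B-parameter model.
# Assumption: weights dominate; add ~15% overhead for KV cache and
# activations at modest context lengths. Numbers are approximate.

def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.15) -> float:
    """Approximate VRAM (GB) to hold the weights plus overhead."""
    return params_billion * bytes_per_param * overhead

for label, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"70B @ {label}: ~{vram_gb(70, bpp):.0f} GB")

# Expected output:
# 70B @ fp16:  ~161 GB  -> far beyond 48 GB
# 70B @ 8-bit: ~80 GB   -> still too large
# 70B @ 4-bit: ~40 GB   -> fits across 2x 24 GB cards
```

So the 48GB figure only works out for a 4-bit quantized 70B model, and only with the weights sharded across both cards.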
Or possibly 2x Nvidia Quadro RTX 8000 cards, as these have Tensor Cores (and RT cores); the Tensor Cores in particular accelerate the mixed-precision matrix operations these models depend on.
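For what it's worth, Tensor Core support can be checked programmatically: any card with CUDA compute capability 7.0 or higher (Volta/Turing onward) has them, while Pascal parts like the P6000 (6.1) do not. A minimal PyTorch sketch, assuming a working CUDA install:

```python
# Check each visible NVIDIA GPU for Tensor Core support via its CUDA
# compute capability. Uses only standard torch.cuda APIs.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        has_tc = major >= 7  # Pascal (6.x, e.g. Quadro P6000) lacks Tensor Cores
        print(f"{name}: sm_{major}{minor}, Tensor Cores: {has_tc}")
else:
    print("No CUDA device visible")
```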
NOTES (5/4/24):
The following hardware has been obtained for building out a general-purpose server:
- HP Z840 Workstation
- 2x Intel Xeon E5-4669 v4 CPUs (22 cores each)
- 4TB SSD
- 1x SK Hynix HMABAGR7A4R4N-VN 128GB DDR4-2666 ECC LRDIMM
- 1x SK Hynix HMABAGL7C4R4N-VN 128GB DDR4-2666 ECC LRDIMM
Thus at this point, the next task is filling the remaining 14 RAM slots. Due to human error, the part numbers of the two DIMM models do not match completely, so HMABAGR7A4R4N-VN will need to fill the odd-numbered slots and HMABAGL7C4R4N-VN the even-numbered ones (or vice versa); either arrangement should satisfy compatibility requirements. A quick post-install check like the sketch below can confirm which part ended up in which slot.
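Once the DIMMs are seated, a small script can verify the slot layout by parsing `dmidecode -t memory` (Linux, needs root; the `Locator` and `Part Number` fields are standard type-17 output):

```python
# Report which DIMM part number sits in which physical slot by parsing
# `dmidecode -t memory`. Run on the server itself; requires sudo.
import subprocess

out = subprocess.run(
    ["sudo", "dmidecode", "-t", "memory"],
    capture_output=True, text=True, check=True,
).stdout

slot, report = None, []
for line in out.splitlines():
    line = line.strip()
    if line.startswith("Locator:"):
        slot = line.split(":", 1)[1].strip()
    elif line.startswith("Part Number:") and slot:
        report.append((slot, line.split(":", 1)[1].strip()))
        slot = None  # reset until the next DIMM record

for slot, part in report:
    print(f"{slot:12s} {part}")  # empty slots show e.g. "NO DIMM"
```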