Neural Network Research (Random Notes on General Theory and Practice)
Regarding the underlying mechanisms of machine learning: matrix operations are the backbone of the mathematical framework in which all emergent phenomena arise. Something like a 'mathematical aether', this conceptual framework provides the richness of context that useful patterns require in order to be defined. In contemporary systems, the dimensions of the matrices involved are very high. Networks are arranged in various types and schemas, which have certain obvious properties and, of course, certain mysterious emergent ones. Complexity gives rise to these information tools in different ways, the simplest being those evident from the mathematics used to construct them.
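To anchor that musing in something concrete, here is a minimal sketch (plain NumPy, with arbitrary shapes) of how a single network layer reduces to a matrix operation:

```python
# A single neural-network layer is a matrix multiplication plus a bias
# and a nonlinearity; the shapes here are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 784))    # one input vector (e.g. a flattened image)
W = rng.standard_normal((784, 128))  # weight matrix: the 'backbone' operation
b = np.zeros(128)                    # bias vector

h = np.maximum(0, x @ W + b)         # ReLU(xW + b): one layer's forward pass
print(h.shape)                       # (1, 128)
```

Stack enough of these high-dimensional operations and the mysterious emergent properties have somewhere to live.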
Therefore, it is useful to research these mechanisms; to that end, this post references a blog post on Hugging Face by TheBloke that summarizes the white papers involved (a brief loading sketch follows the link).
- https://huggingface.co/blog/gptq-integration -- A description of quantization of large language models.
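For concreteness, a minimal sketch of what that integration looks like in practice. This assumes `optimum`, `auto-gptq`, and `accelerate` are installed, and the repo id is just an example:

```python
# Hypothetical sketch: loading a GPTQ-quantized checkpoint through the
# transformers integration described in the linked post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example repo id (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Quantization matters because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```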
The following reference compares GPTQ, GGUF, and EXL2 for running LLMs on an Nvidia Titan RTX.
- https://www.reddit.com/r/LocalLLaMA/comments/197o1f7/noob_question_if_i_have_the_vram_should_i_switch/ -- Discussion regarding quantization of LLMs within 24 GB of VRAM
Memory management in CUDA is an issue when writing Python scripts that use various audio functions alongside running an LLM query. To that end, the following references are included (a small sketch tying them together follows the list).
- https://medium.com/@soumensardarintmain/manage-cuda-cores-ultimate-memory-management-strategy-with-pytorch-2bed30cab1 -- CUDA memory management.
- https://pytorch.org/docs/stable/notes/cuda.html#environment-variables -- CUDA environment variables
- https://stackoverflow.com/questions/5971312/how-to-set-environment-variables-in-python -- Setting environment variables from a Python script.
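Putting the three references together, a minimal sketch of the pattern: set the allocator environment variable from Python before `torch` first touches CUDA, then explicitly free cached memory between the audio step and the LLM query. The knob value and the placeholder audio function are assumptions:

```python
import gc
import os

# Allocator settings must be in place before torch initializes CUDA
# (see the PyTorch environment-variables link above).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

def audio_step() -> None:
    # Placeholder for audio processing that allocates GPU tensors.
    waveform = torch.randn(16, 1, 480_000, device="cuda")
    _ = waveform.abs().mean()

audio_step()

# Drop Python references, then hand cached blocks back to the allocator
# so the LLM query that follows has as much free VRAM as possible.
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```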
Specific information regarding the Dolphin 2.5 Mixtral 8x7B model from Eric Hartford, and the Ollama Modelfile parameter list (a sketch of a Modelfile follows the links):
- https://erichartford.com/dolphin-25-mixtral-8x7b -- Includes info on how to create a Modelfile
- https://github.com/ollama/ollama/blob/main/docs/modelfile.md -- Modelfile parameter list
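As a quick illustration, a minimal hypothetical Modelfile built from parameters in the list above; the base model tag and values are placeholders, not a recommended configuration:

```
# Hypothetical Modelfile sketch; base model tag and values are placeholders.
FROM dolphin-mixtral:8x7b
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are Dolphin, a helpful AI assistant."""
```

Build and run it with `ollama create mydolphin -f ./Modelfile`, then `ollama run mydolphin`.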
General Considerations on Training, VRAM and GPUs
This section is simply a place to hold notes on the matter of training LLMs: how much VRAM per parameter count, model type, GPU architecture, and so on. Since this subject is so deep and the sources are varied, disorganization is probable until an actual workflow is achieved and edited. Therefore, the following links are included, mostly so I can close my browser tabs and not get frustrated when I go looking for that info in earnest (a back-of-envelope sketch of the VRAM arithmetic follows the list):
- https://huggingface.co/docs/transformers/perf_train_gpu_one#anatomy-of-models-memory -- Hugging Face article on memory requirements per model parameters/type/method of training used.
- https://datascience.stackexchange.com/questions/117444/what-size-language-model-can-you-train-on-a-gpu-with-x-gb-of-memory -- Another article on the same as above.
- https://www.quora.com/Is-it-possible-to-connect-multiple-GPUs-Titan-Pascal-and-Gigabyte-1050-Ti-on-an-HP-z840-workstation-If-so-what-are-the-connectors-and-cables-required -- Info on the HP Z840 architecture for multiple GPUs
- https://en.wikipedia.org/wiki/Quadro -- One of the spec comparison charts used to determine which GPUs to acquire
- https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GRID -- Another list of Nvidia GPUs
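The back-of-envelope arithmetic, following the rough per-parameter accounting in the Hugging Face "anatomy" article above (mixed precision, AdamW). The byte counts are rules of thumb, and activations, buffers, and fragmentation are deliberately ignored:

```python
# Rule-of-thumb VRAM for full fine-tuning with AdamW in mixed precision,
# per the Hugging Face accounting: ~18 bytes per parameter before activations.

def training_vram_gib(params_billions: float) -> float:
    n = params_billions * 1e9
    weights = 6 * n  # fp16 working copy (2 bytes) + fp32 master copy (4 bytes)
    grads = 4 * n    # gradients
    optim = 8 * n    # AdamW first and second moments (fp32)
    return (weights + grads + optim) / 1024**3

# A 7B model needs on the order of ~117 GiB -- far beyond a single 24 GB card.
print(f"{training_vram_gib(7):.0f} GiB")
```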
Why did the Nvidia GeForce card bring a ladder to the gaming party? Because it wanted to reach new heights in performance!