Uncensored AI: Dolphin-Mixtral FOSS

NOTE: Creating an ollama model uses space on sda.

Considering the usefulness of AI, and the commercial nature of our particular culture here in America, there is little doubt that the future will bring proprietary models shaped into whatever forms are most profitable for the markets they serve. Basically, a free AI is a tool with a wider scope: a freely trained model can be applied to more niche endeavors than commercial incentives alone would cover.

I first came across this video from Fireship:

Dolphin-Mixtral LLM

From the video, a single command is run to install ollama.

Note that running this creates and enables an ollama systemd service, so a manual shutdown may be necessary at some point. If the model is not going to be used regularly, disabling the service after use and re-enabling it when needed is probably a good idea (commands below).

$ curl https://ollama.ai/install.sh | sh
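
To shut the service off after use:

$ sudo systemctl stop ollama
$ sudo systemctl disable ollama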

To run this model:

$ sudo systemctl enable ollama
$ sudo systemctl start ollama
$ ollama run dolphin-mixtral:latest

It needs at least 40GB of VRAM; however, this can be split over several more affordable GPUs.*

*EDIT: It does not in fact need VRAM, but RAM; the insufficient-memory error can be worked around by passing a parameter to a new model variant (based on the model one wants to run) via a model file, as explained here:

https://github.com/jmorganca/ollama/issues/1019 – post explaining how to work around memory error

Contents of Model File (dolphin-mixtral_latest.model)
FROM dolphin-mixtral:latest
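# num_gpu 0 offloads no layers to the GPU, so the model runs entirely from system RAM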
PARAMETER num_gpu 0

Console command sequence:

# In ollama file directory
$ gvim ./dolphin-mixtral_latest.model
$ ollama create dolphin-mixtral:no-gpu -f ./dolphin-mixtral_latest.model
$ ollama run dolphin-mixtral:no-gpu

OK, this worked; now for some specific models.

Used the LLM to write a few rudimentary TTS scripts. Found some online services as well. Found a small, low-quality vocoder TTS and wrote up a tts.py for it:

$ python ./tts.py example_N.txt

The above command and argument have the quick TTS read the contents of example_N.txt, where N is some digit, usually the index of the latest file generated by the LLM.

  1. https://pyttsx3.readthedocs.io/en/latest/engine.html – pyttsx3 library
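
For reference, here is a minimal sketch of what such a tts.py might look like using the pyttsx3 library above. This is a reconstruction, not the original script; the argument handling mirrors the command shown:

#!/usr/bin/env python3
# Minimal TTS reader: speaks the contents of the text file passed as the
# first argument, e.g. `python ./tts.py example_1.txt`.
import sys
import pyttsx3

def main():
    if len(sys.argv) != 2:
        sys.exit("usage: python ./tts.py <textfile>")
    with open(sys.argv[1], "r") as f:
        text = f.read()
    engine = pyttsx3.init()   # selects the platform's default speech driver
    engine.say(text)          # queue the text for playback
    engine.runAndWait()       # block until speech finishes

if __name__ == "__main__":
    main()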

Ollama webui

Going to go ahead and try this out, of course in the ollama directory, using the llmtest_env venv for Python.

https://github.com/ollama-webui/ollama-webui

Some research on LLMs

Application Development (Neuvo)

  1. RLHF – Reinforcement Learning from Human Feedback: https://huggingface.co/blog/rlhf
  2. ReAct Prompting – “Yao et al., 2022 introduced a framework named ReAct where LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner. Generating reasoning traces allow the model to induce, track, and update action plans, and even handle exceptions. The action step allows to interface with and gather information from external sources such as knowledge bases or environments. The ReAct framework can allow LLMs to interact with external tools to retrieve additional information that leads to more reliable and factual responses. Results show that ReAct can outperform several state-of-the-art baselines on language and decision-making tasks. ReAct also leads to improved human interpretability and trustworthiness of LLMs. Overall, the authors found that best approach uses ReAct combined with chain-of-thought (CoT) that allows use of both internal knowledge and external information obtained during reasoning.” – https://www.promptingguide.ai/techniques/react
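
As an illustration (a made-up trace, not one from the paper), a ReAct-style exchange interleaves reasoning, actions, and observations like so:

Question: In what year was the author of Dune born?
Thought: I need to find who wrote Dune, then find that author's birth year.
Action: search["Dune novel author"]
Observation: Dune was written by Frank Herbert.
Thought: Now I need Frank Herbert's birth year.
Action: search["Frank Herbert birth year"]
Observation: Frank Herbert was born on October 8, 1920.
Answer: 1920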

LangChain

A reference video covers using LangChain to embed a document and have an ollama model operate over it. The rough flow: load the document, split it into chunks, embed the chunks with a local model, index them in a vector store, then wire a retriever plus the LLM into a question-answering chain.
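
Here is a minimal sketch of that flow, assuming the classic LangChain Python API (langchain and chromadb installed in the venv) and the local ollama service running; notes.txt is a placeholder file name:

# Embed a local text file and query it with dolphin-mixtral via ollama.
from langchain.llms import Ollama
from langchain.embeddings import OllamaEmbeddings
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Load the document and split it into overlapping chunks for embedding.
docs = TextLoader("notes.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with the local model and index them in Chroma.
store = Chroma.from_documents(chunks, OllamaEmbeddings(model="dolphin-mixtral"))

# Combine the retriever and the LLM into a question-answering chain.
qa = RetrievalQA.from_chain_type(llm=Ollama(model="dolphin-mixtral"),
                                 retriever=store.as_retriever())
print(qa.run("Summarize the document in three sentences."))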

Installing and using models in various formats

  1. https://github.com/jmorganca/ollama/blob/main/docs/import.md – describes GGUF, .safetensors, etc.
  2. https://huggingface.co/TheBloke/WizardMath-70B-V1.0-GGUF/blob/main/README.md – instructions for joining split GGUF files from TheBloke.
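
For example (the file names here are hypothetical), TheBloke's split GGUF parts are joined with cat, and the result can then be imported through a model file in the same way as the no-gpu variant above:

$ cat wizardmath-70b.Q4_0.gguf-split-a wizardmath-70b.Q4_0.gguf-split-b > wizardmath-70b.Q4_0.gguf

Contents of Model File (wizardmath_70b.model)
FROM ./wizardmath-70b.Q4_0.gguf

$ ollama create wizardmath:70b -f ./wizardmath_70b.model
$ ollama run wizardmath:70b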

Ollama

Here is a video describing the new features of ollama.

Ollama stores its models in /usr/share/ollama/.ollama. Apparently, a symbolic link can be used to allow the data to be stored on another volume or external drive:

  1. https://github.com/ollama/ollama/issues/155 – describing a workaround to specify a different storage location
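
For instance (the target path /mnt/storage/ollama is hypothetical), the move-and-symlink approach from that thread looks roughly like:

$ sudo systemctl stop ollama
$ sudo mv /usr/share/ollama/.ollama /mnt/storage/ollama
$ sudo ln -s /mnt/storage/ollama /usr/share/ollama/.ollama
$ sudo chown -R ollama:ollama /mnt/storage/ollama
$ sudo systemctl start ollama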

