Rudimentary Chat Bot – OpenVoice, Whisper & LLM
Using Ollama with the dolphin-2.5-mixtral-8x7b model, a simple script is being constructed to test TTS and requests against the Ollama server. These are some reference notes.
- https://stackoverflow.com/questions/17657103/play-wav-file-in-python -- Playing Audio in Python
- https://moez-62905.medium.com/the-ultimate-guide-to-command-line-arguments-in-python-scripts-61c49c90e0b3#:~:text=In%20Python%2C%20command%2Dline%20arguments,arguments%20passed%20to%20the%20script. -- Using command line arguments within python script.
- https://ioflood.com/blog/python-run-shell-command/ -- Running shell command in python using subprocess.
- https://www.youtube.com/watch?v=1NonzlRr6JA -- Robot vocals
Okay, this was successful and resulted in a script called askbot.py, which is run from a shell script called askbot.sh.
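For reference, a minimal sketch of the askbot.py flow is below. This is an illustration rather than the actual script: the Ollama model tag, the reply.wav file name, and the aplay playback call are assumptions, and the TTS step is stubbed out.
import subprocess
import sys
import requests

def ask_ollama(prompt: str, model: str = "dolphin-mixtral") -> str:
    # Non-streaming request to the local Ollama server; exact model tag may differ
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    answer = ask_ollama(" ".join(sys.argv[1:]))  # prompt taken from command-line arguments
    print(answer)
    # ... TTS step would synthesize 'answer' to reply.wav here ...
    subprocess.run(["aplay", "reply.wav"], check=True)  # play the synthesized reply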
Recording Audio for an undetermined time
This is the next obvious step: recording audio for an undetermined amount of time, specifically until the user signals that the input is finished, or preferably until silence is detected on the microphone. To that end, some research was conducted, resulting in the following reference:
- https://stackoverflow.com/questions/18406570/python-record-audio-on-detected-sound -- stackoverflow result for user defined recording interval
Okay, with a few minor changes (noise threshold set to 30 dB and the recording path changed to /home/sparkone/sdb/projects/python/OpenVoice/recordings), this script appears to be functioning quite nicely.
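For reference, a minimal sketch of the silence-detection loop, adapted from the Stack Overflow approach above. The threshold value, chunk counts, and output file name are assumptions and need tuning:
import audioop
import wave
import pyaudio

THRESHOLD = 30        # RMS silence threshold (tuned by experiment)
SILENCE_CHUNKS = 30   # consecutive quiet chunks before stopping
RATE, CHUNK = 16000, 1024
OUT_PATH = "/home/sparkone/sdb/projects/python/OpenVoice/recordings/input.wav"

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

frames, quiet = [], 0
while quiet < SILENCE_CHUNKS:
    data = stream.read(CHUNK)
    frames.append(data)
    # Reset the quiet counter whenever the chunk is above the threshold
    quiet = quiet + 1 if audioop.rms(data, 2) < THRESHOLD else 0

stream.stop_stream()
stream.close()
pa.terminate()

with wave.open(OUT_PATH, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)   # 16-bit samples
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))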
Passing audio files to whisper for speech-to-text transcription
With the audio recording functioning as desired, the next step is transcribing the recorded audio into text. Checking for a stop, quit, or goodbye type keyword should suffice for ending the script.
For this task, we can use whisper or faster-whisper to transcribe the audio offline.
- https://github.com/openai/whisper -- whisper project on github
- https://github.com/SYSTRAN/faster-whisper -- faster-whisper project on github
- https://github.com/huggingface/distil-whisper -- distil-whisper
To facilitate the functionality of this project, a simple script that takes audio files as input will be used initially, then incorporated into the main project. So far we have askbot.py and record_3.py (the 3rd attempt at a recording script).
Update: this worked as a script. The final iteration of the script is askbot_3.py.
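A minimal sketch of the transcription step with faster-whisper, including the stop-keyword check; the model size, device settings, and audio path are assumptions:
from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cuda", compute_type="float16")

def transcribe(path: str) -> str:
    segments, _info = model.transcribe(path)
    return " ".join(seg.text.strip() for seg in segments)

text = transcribe("recordings/input.wav")
print(text)

# End the main loop when a stop keyword is detected
if any(word in text.lower() for word in ("stop", "quit", "goodbye")):
    print("Stop keyword detected, exiting.")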
NOTES: CUDA memory errors and segmentation faults are resulting from VRAM being allocated outside the reserved scope. A potential solution is to delegate tasks to separate GPUs. Upgrading the GPU has also become important due to this and to latency issues. Ideally, a Quadro P8000 would be desired, as it offers parallel memory sharing options in the Z series architecture I am currently using; however, its cost is prohibitive, so a more economical option which does not offer parallel memory sharing in the Z series architecture will likely have to do.
NOTE: See Neural Network Research Post for specific details regarding PyTorch, Ollama and CUDA
Segmentation faults seem to be occurring less frequently after adjusting model parameters and introducing garbage collection to the script. Additional measures include deleting local variables after use and clearing the cache.
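For reference, the cleanup measures amount to something like the following sketch, assuming PyTorch is what holds the VRAM:
import gc
import torch

def cleanup():
    # Force Python garbage collection, then release cached CUDA memory
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# Usage: delete large local objects as soon as they are no longer needed,
# then call cleanup(), e.g.:
#     del result
#     cleanup()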
It has become apparent that while the recorded audio is being processed, the microphone is still picking up voice audio; the recording stream needs to be closed before calling the Ollama query and playback function.
To work around repeated answers and to speed up response time, utilizing the JSON streaming approach may be a practical solution.
- https://github.com/ollama/ollama/blob/main/docs/api.md -- Ollama docs
- https://www.w3schools.com/python/python_json.asp#:~:text=Parse%20JSON%20%2D%20Convert%20from%20JSON,will%20be%20a%20Python%20dictionary. -- Convert JSON to Python Dictionary
Request:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?"
}'
Response (Stream of JSON objects):
{
"model": "llama2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
See the above docs page for more options, such as runtime parameters and showing a model list.
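A minimal sketch of consuming the stream from Python, converting each JSON line to a dictionary as it arrives; the model name and prompt are placeholders:
import json
import requests

payload = {"model": "llama2", "prompt": "Why is the sky blue?"}
with requests.post("http://localhost:11434/api/generate",
                   json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)                      # one JSON object per line
        print(chunk["response"], end="", flush=True)  # token fragment
        if chunk.get("done"):
            break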
Using GPTQ models
Using GPTQ models requires a bit more work than simply downloading and creating a model file as with .gguf. See the following reference:
- https://github.com/ollama/ollama/blob/main/docs/import.md -- Installing and quantizing gptq models
Apparently this just makes another CPU model out of a GPTQ model. The intention here is to fully utilize an NVidia RTX Titan and its 24 GB of VRAM. Trying something else...
This looks like it must be run in Python; see the references below for a starting point (a minimal loading sketch follows the list):
- https://www.reddit.com/r/LocalLLaMA/comments/13b3s4f/how_to_run_starcodergptq4bit128g/ -- Run .gptq locally?
- https://huggingface.co/TheBloke/medalpaca-13B-GPTQ/discussions/3#:~:text=To%20load%20GPTQ%20models%20from,have%20the%20CUDA%20toolkit%20installed. -- Run GPTQ models locally (theBloke).
- https://github.com/oobabooga/text-generation-webui -- Oobabooga runs GPTQ
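Based on the references above, a minimal loading sketch might look like the following. It assumes a recent transformers with optimum and auto-gptq installed; the model id is just an example from TheBloke:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/medalpaca-13B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantized weights on the GPU
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))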
Okay, so the oobabooga web UI works for running models fast on the GPU. Extension support is still a work in progress.
Oobabooga/text-generation-webui Extensions List
- https://github.com/oobabooga/text-generation-webui-extensions -- List of extensions available
Superbooga (SuperboogaV2)
Getting the following errors when attempting to load the superbooga extension:
01:45:50-406664 ERROR Failed to load the extension "superbooga".
Traceback (most recent call last):
File "/home/sparkone/sdb/projects/python/text-generation-webui/modules/extensions.py", line 36, in load_extensions
exec(f"import extensions.{name}.script")
File "<string>", line 1, in <module>
File "/home/sparkone/sdb/projects/python/text-generation-webui/extensions/superbooga/script.py", line 10, in <module>
from .chromadb import add_chunks_to_collector, make_collector
File "/home/sparkone/sdb/projects/python/text-generation-webui/extensions/superbooga/chromadb.py", line 1, in <module>
import chromadb
File "/home/sparkone/sdb/projects/python/text-generation-webui/installer_files/env/lib/python3.11/site-packages/chromadb/__init__.py", line 1, in <module>
import chromadb.config
File "/home/sparkone/sdb/projects/python/text-generation-webui/installer_files/env/lib/python3.11/site-packages/chromadb/config.py", line 1, in <module>
from pydantic import BaseSettings
File "/home/sparkone/sdb/projects/python/text-generation-webui/installer_files/env/lib/python3.11/site-packages/pydantic/__init__.py", line 374, in __getattr__
return _getattr_migration(attr_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sparkone/sdb/projects/python/text-generation-webui/installer_files/env/lib/python3.11/site-packages/pydantic/_migration.py", line 296, in wrapper
raise PydanticImportError(
pydantic.errors.PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.6/migration/#basesettings-has-moved-to-pydantic-settings for more details.
For further information visit https://errors.pydantic.dev/2.6/u/import-error
This error is apparently common with user installs regardless of the operating system. The following version corrections worked:
./cmd_linux.sh ## Load the conda environment to select the correct pip
pip install chromadb==0.3.18 ## Select the compatible version of chromadb for superbooga
pip install pydantic==1.10.14 ## Select the compatible version of pydantic for superbooga
This worked.
NOTE: To use ingested data as context, the notebook mode must be utilized and the following tags employed:
"There is technically documentation for it, but you are right that it's not super obvious. Right under the superbooga header there is a thin pane with the text "Click for more information..." if you click on that it will expand and give you a somewhat detailed explanation of how superbooga works.
But I can summarize the key information. For one, superbooga operates differently depending on whether you are using the chat interface or the notebook/default interface. In the chat interface it does not actually use the information you submit to the database, instead it automatically inserts old messages into the database and automatically retrieves them based on your current chat message.
To query information you have submitted you need to change to the notebook/default interface. In those modes you can insert the following three special tokens:"
<|begin-user-input|>
<|end-user-input|>
<|injection-point|>
- https://github.com/oobabooga/text-generation-webui/issues/2824 -- Reference for superbooga data injection.
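As a rough example of how these tokens might be arranged in a notebook-mode prompt (the wording of the prompt itself is just an illustration):
Use the retrieved context below to answer the question.
Context:
<|injection-point|>
Question: <|begin-user-input|>What does the ingested document say about installation?<|end-user-input|>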
EmotiVoice (Decent Latency)
See:
- https://github.com/yhyu13/Emotivoice_TTS -- Installation instructions, after downloading the model using the UI
It looks like this one uses some sort of remote access; it would be preferable to use OpenVoice locally...
TODO: Find or write extension for OpenVoice, as it has decent latency as well
LucidWebSearch
This is an interesting extension which can use OCR models (e.g., nougat) to retrieve and utilize web resources and optionally parse/generate markup containing mathematical expressions.
See:
- https://github.com/RandomInternetPreson/LucidWebSearch?tab=readme-ov-file -- Web Search and Math Search Extension
OpenVoice Extension
After using OpenVoice in a script testing listening and STT/TTS around an Ollama GGUF model, I wanted to implement a custom extension utilizing it for oobabooga/text-generation-webui as well. However, I encountered the following errors after pip install -r requirements in extensions/OpenVoice:
INFO: pip is looking at multiple versions of wavmark to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following yanked versions: 2.0.0
ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement torchaudio<2.0 (from wavmark) (from versions: 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1)
ERROR: No matching distribution found for torchaudio<2.0
Not sure how to proceed from here... Is it possible to use 2 different virtual environments in the same running instance of the webui?
TODO: Check and determine whether this is possible/practical.
Context Window Research and Methods
The context window size is an important factor in the practical application of models; to that end, various articles are included below as references for these research notes.
- https://medium.com/@ddxzzx/why-and-how-to-achieve-longer-context-windows-for-llms-5f76f8656ea9#:~:text=Once%20we%20have%20efficiently%20incorporated,fit%20the%20new%20context%20length. -- Article on RoPE and other context enlargement methods
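As one concrete example of the methods discussed in the article, recent transformers versions expose RoPE scaling on Llama-family models. A hedged sketch follows; the model id and scaling factor are placeholders:
from transformers import AutoModelForCausalLM

# Linear RoPE scaling: a factor of 2.0 roughly doubles the usable context,
# at some quality cost unless the model is fine-tuned for the longer window.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
    device_map="auto",
)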
Research References
- https://en.wikipedia.org/wiki/DeepSpeed -- DeepSpeed Wiki