Home Assistant Voice & Ollama Setup Guide – The Ultimate Local LLM Solution!

A couple of weeks ago I showed off the Home Assistant Voice Preview Edition, an ESP32-based smart speaker that, when set up with Whisper and Piper, handles voice commands and home automation fully locally through Home Assistant. It's amazing, but the one thing that's missing is a proper conversation agent. Sure, asking it to turn your lights on or set timers is useful, but being able to ask it questions and get answers would be really nice. That is where Ollama comes in. Ollama is an open-source way to run large language models locally, and it turns out it's ridiculously easy to set up and connect to Home Assistant Voice, so let me walk you through it, then demo just how useful it is.

First things first, you'll need Ollama set up and running. You'll need a reasonably powerful PC for this, ideally with a graphics card with lots of VRAM, although it can run on the CPU alone at a reasonable speed, so long as you have enough system RAM. In my case I already have two home servers running, one for work and one for personal use, so I'll set it up on my personal NAS, which is running the latest stable build of TrueNAS Scale – and that 'latest stable build' part is actually really important. Only in the last few months has TrueNAS finally migrated to Docker containers for its apps system. That means installing Ollama is as easy as clicking install in the app store, setting any parameters you might want – like letting it use your existing storage pool if you'd prefer, how many CPU cores and how much RAM it's allowed to use, and whether to pass a GPU through to it – then saving it. That's pretty much it: it's ready to go. You can open the container shell to set up and try out models right in the command line – and this does actually give you a bit better control, although we'll come back to that – but you can also just leave it alone now and head to Home Assistant.
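If you want to double-check that the container is actually reachable before going any further, Ollama exposes a small REST API, on port 11434 by default. Here's a minimal sketch in Python – the IP address is just a placeholder for whatever your server uses – that asks the /api/tags endpoint which models are currently installed:

```python
# Quick sanity check that the Ollama container is reachable before
# wiring it into Home Assistant. Assumes the default Ollama port
# (11434); the IP below is a placeholder for your own server.
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical TrueNAS/Ollama host


def list_models():
    """Return the model tags the Ollama server currently has pulled."""
    resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]


if __name__ == "__main__":
    try:
        models = list_models()
        print("Ollama is up. Installed models:", models or "none yet")
    except requests.RequestException as err:
        print("Could not reach Ollama:", err)
```

If this prints a list (even an empty one), Home Assistant will be able to talk to the same address and port in the next step.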

Head to the devices and integrations page and add the Ollama integration. Put in the IP address of the Ollama server and the correct port, then it'll ask you what model you want to use. Here you can take your pick, although this is where the command line interface might be beneficial. If you don't have much RAM – like my server when I first set this up (I've since doubled it to 32GB) – you might need to run the smaller versions of models wherever possible. Models generally come in several parameter counts: the more parameters, the better the responses tend to be, but the harder the model is to run and the more memory it needs. From Home Assistant there doesn't seem to be a way to install a particular parameter-size version of a listed model; it'll just download the default. If you need a smaller one, you'll want the command line interface anyway, where you can run ollama run modelName:parameterSize – so ollama run deepseek-r1:1.5b – or, if you just want to download the model without running it, swap run for pull.
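If you'd rather not open the container shell at all, the same pull can be done over Ollama's REST API from any machine on your network. This is just a sketch of that idea, assuming the default port and a placeholder server IP:

```python
# Rough equivalent of running `ollama pull deepseek-r1:1.5b` in the
# container shell, but done over the REST API instead.
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical server address


def pull_model(tag: str) -> None:
    """Ask the Ollama server to download a specific model tag."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/pull",
        # Older Ollama builds may expect "name" here instead of "model".
        json={"model": tag, "stream": False},
        timeout=None,  # model downloads can take a while
    )
    resp.raise_for_status()
    print(f"Pulled {tag}: {resp.json().get('status', 'done')}")


pull_model("deepseek-r1:1.5b")  # the 1.5-billion-parameter variant
```

The tag after the colon is what picks the parameter size, exactly like the CLI command above.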

Once you've pulled the models you want, or just picked the default in Home Assistant, saving that will create an entity for that model in Home Assistant. To use that model with Voice, head to voice assistants, click on Local Assistant, then change the conversation agent to the Ollama model you want to use. In the settings for that you can change the instructions given to the model, along with the context window size, the max message history and the keep-alive time. Personally I'd recommend changing that from -1, which keeps the model loaded permanently, to something reasonable like a few minutes, so it doesn't sit in your system's RAM forever. There is also a setting to let the LLM control Home Assistant, although you'll find it's best to leave that set to "No control", partly because a bunch of models don't support tools – a required feature for the control to work – and partly because of the setting on the main menu here: "Prefer handling commands locally". With that enabled, commands like turning on the lights are handled by the Home Assistant agent, while questions it doesn't know the answer to are passed off to the LLM. Those commands run faster and more efficiently, and it keeps the compatibility problem at bay too.
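For a sense of what that keep-alive setting actually does, here's a rough sketch of the kind of request that ends up at Ollama's /api/chat endpoint – the keep_alive value tells the server how long to keep the model loaded in memory after it answers. The IP and model name are placeholders for whatever you're running:

```python
# Illustration of the keep-alive idea: each /api/chat request carries a
# keep_alive value telling Ollama how long to keep the model in RAM
# after the last request (adjust the URL and model to your own setup).
import requests

OLLAMA_URL = "http://192.168.1.50:11434"


def ask(question: str, model: str = "llama3.2", keep_alive: str = "5m") -> str:
    """Send one chat turn and return the assistant's reply."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "stream": False,          # return one complete JSON reply
            "keep_alive": keep_alive,  # unload the model a few minutes after use
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]


print(ask("How long is an inch in centimetres?"))
```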

Now, with that set up, we have successfully connected Home Assistant Voice to a locally run large language model! It’s remarkably simple, and yet works pretty well. “Okay Nabu, how long is an inch in centimetres?”

Responses do take longer than with the built-in conversation agent, because it's now doing speech to text, generating a response from the LLM, then using text to speech to voice the answer, but considering this is all running locally I'm pretty happy with that! The one thing you'll want to try out and consider is which model to use. While DeepSeek-R1 is the new hotness, the R in the name may as well stand for 'reasoning', because the responses it gives contain an awful lot of text for a relatively simple question. Asking R1 what an inch is in centimetres spits out two large paragraphs of "thinking", followed by a single-sentence answer. Filtering out everything inside the "<think>" tags might work – there's a rough sketch of that idea below – but for me I just opted to use Llama 3.2 instead. That seems to work well.
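For anyone curious what that filtering would look like, here's a minimal sketch. To be clear, this isn't something the Home Assistant Ollama integration does for you; it's just the idea of stripping the reasoning block so only the final answer would get spoken:

```python
# Minimal sketch of stripping DeepSeek-R1's "<think>...</think>"
# reasoning block from a response, leaving only the final answer.
import re


def strip_thinking(text: str) -> str:
    """Remove any <think>...</think> sections and tidy the whitespace."""
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return cleaned.strip()


raw = "<think>An inch is defined as exactly 2.54 cm...</think>\nAn inch is 2.54 centimetres."
print(strip_thinking(raw))  # prints: An inch is 2.54 centimetres.
```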

The only other thing I'd like to add is the ability to search the web, although that doesn't seem like a simple addition, so I'll have to put it on hold for the time being. In short, setting up Ollama with Home Assistant Voice is remarkably easy, and the results, while a little on the slow side, are great. Let me know in the comments if you set this up, and how you get on with it!