Home Assistant Voice Review – Local AI Assistant (Alexa / Amazon Echo Replacement)

This sleek and stylish little box is the Home Assistant Voice Preview Edition, a replacement for your Amazon Echo that runs fully locally (if you want it to), and as long as you’ve already got Home Assistant controlling your smart home gear, it can control everything in your house with just your voice. This little thing costs just £48 – or £55 if you buy from the Pi Hut – and is fully open source too. You can build it for as little as $13 if you want to DIY it, although most will want to pick up this assembled, ready-to-use version. Let’s take a look at the hardware, then I’ll walk you through the setup and tweaks, and finally we’ll see what this thing can do.

Hardware-wise this is both pretty simple and wonderfully complex. Physically you get a thin square box with two mic holes on top, alongside a button and a rotary dial that feels beautifully reminiscent of the iPod Classic’s scroll wheel, except this one is a tactile, actually-spinning wheel. On one side you’ll find the mic mute switch with a big, obvious red indicator to show it is in fact muted, and on the other you’ll find the USB-C port needed to power the thing – a 5V 2A power supply specifically, although I’ll get back to that – and a 3.5mm jack for an external speaker, should you want one. On the bottom you’ll find a punch-out for a Grove port – Grove is Seeed Studio’s open platform for external modules; their website has over 200 modules you can pick up, everything from GPS modules to formaldehyde detectors – yes, genuinely, there’s a formaldehyde sensor, and amazingly it’s out of stock! Anyway, this is the first hint that this thing is an open source bit of kit. Pulling out the four push-in rubber feet and removing the four screws underneath lets you take the top plate off. From there you can see the two microphones in their silicone covers and the USB selector switch, which I have to assume connects the USB data lines to either the ESP32 or the XU316, the AI coprocessor onboard.

A look at the back of the board reveals the tightly packed components that make this thing work. You’ve got the Grove port on the right – the I2C version specifically – the ESP32-S3 that runs the show just to the left of it, and the XMOS AI chip above that. To the left of that is the TLV320AIC3204, a low-power stereo codec from TI that can either drive the built-in speaker – which fires sound out the side opposite the USB port – or drive an external speaker via the headphone jack. The most amazing thing here has to be the two fully labelled headers at the bottom of the board. The one in the middle offers 3.3V, 5V, four IO pins, and TX and RX for the ESP32, and the one on the bottom left has ground, two RGB pins and five more IO pins from the ESP, meaning if you really wanna hack this thing they’ve left you a lot of room to do just that – not just the Grove port. This amount of not just user-focused but maker-focused design is incredible, and a sight for sore eyes when it comes to this sort of hardware. Imagine actually owning your own hardware to the point you can fix it and tweak it to your heart’s content… What a world that’d be, huh? Well, this is one small step towards that, and I’m here for it.

One last thing I wanted to mention on the hardware front is the power situation. The board is specifically listed as needing a 5V 2A power supply – both on the website and on the device’s own sticker – and initially that’s what I was powering it with. But when I wanted to do a quick power measurement, for some reason the USB PD power supply that was running the board just fine wouldn’t power my little USB power meter, so on the off chance it would work I plugged in a regular USB cable from my PC, and to my surprise the board powered up just fine. Even when listening for the wake word and responding to commands I never saw it break 250mA, with the majority of the time spent between 100 and 150mA at 5V. I suspect the 2A supply is needed for driving an external speaker, or if you have other devices hooked up, but for regular operation, at least in my little tests, you don’t need much to drive this thing.
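To put those meter readings in perspective, here’s a quick back-of-the-envelope calculation – the 150mA figure is the top of the typical range I measured, so treat the result as indicative rather than gospel:

```python
# Rough power/energy estimate for the Voice PE board, based on my
# measured typical draw of roughly 100-150 mA at 5 V.
VOLTAGE_V = 5.0
CURRENT_A = 0.15  # worst of the typical range I saw

power_w = VOLTAGE_V * CURRENT_A          # instantaneous power in watts
yearly_kwh = power_w * 24 * 365 / 1000   # energy if left on all year

print(f"{power_w:.2f} W, ~{yearly_kwh:.1f} kWh/year")
# -> 0.75 W, ~6.6 kWh/year
```

In other words, even left listening around the clock, it’s a sub-watt device – a rounding error on most electricity bills.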

So you’ve got your board powered up – what now? Well, the first thing to do is head to the Add-on store and install Whisper and Piper, the former being a local copy of OpenAI’s speech-to-text model, and the latter being Nabu Casa’s own text-to-speech engine. With those running, and with the Home Assistant app on your phone plus Bluetooth and location services enabled (the latter may be specific to Android – I don’t own any Apple products so I’m not sure), it’ll show up as a discovered device in the Devices & Services menu. You can then run through the setup process, which involves saying the wake word twice – that’s “Okay Nabu” by default, although “Hey Jarvis” and “Hey Mycroft” are options as well – and then setting what voice you want Piper to use. Naturally, I picked Alba, the Scottish woman.

Now, the downside to having this wealth of customisation is that something like this doesn’t work all that well straight out of the box. There’s a list of accepted sentences you can use – things like turning a light off, activating a scene, timers, and even delayed actions like turning on a light in 5 minutes – but to be clear, this isn’t a full AI assistant, at least not by default. The biggest catch I found was when trying to interact with the various entities like lights, because things like my Hue lightbulbs aren’t just called “bedroom hue” – the actual entity ID is phillips lta001 5dc53d09 level light color on off. Catchy, right? Even the more friendly name bedroom.HueAmb.Bulb light isn’t really usable, and for some reason it couldn’t work with even the easier ones like Elgato Key Light and Elgato Ring Light. That is, until I went through and added aliases to the entities I knew I’d want to control by voice. It’s actually pretty quick and easy: head to Voice assistants in the settings, click on the exposed entities list, then pick a device you want to access easily – in my case, let’s say my bedroom temperature sensor – click on it and add an alias. For me that’s “bedroom temperature”, then “bedroom humidity”. That’s it. Now when you ask, “Okay Nabu, what’s the bedroom temperature?”, it should respond.
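A handy way to check whether your aliases actually resolve – without repeating yourself at the microphone – is to send text straight to Home Assistant’s conversation API. This is a rough sketch, assuming a stock install; the URL and token are placeholders for your own instance (you can generate a long-lived access token from your user profile):

```python
# Sketch: test assistant sentences by POSTing them to Home Assistant's
# /api/conversation/process endpoint - no microphone needed.
import json
import urllib.request

HASS_URL = "http://homeassistant.local:8123"  # placeholder: your HA instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"        # placeholder: from your profile

def build_request(text: str) -> urllib.request.Request:
    """Build a POST to /api/conversation/process for the given sentence."""
    body = json.dumps({"text": text, "language": "en"}).encode()
    return urllib.request.Request(
        f"{HASS_URL}/api/conversation/process",
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# e.g. urllib.request.urlopen(build_request("what's the bedroom temperature?"))
# The JSON response tells you whether the intent matched an entity.
```

If the reply comes back with an error about not finding a device, you know the problem is the alias, not your accent.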

Of course, the speed at which you get a response – and the accuracy of the speech-to-text – will depend on what hardware you’re running this on and which Whisper model you’re using. I’m using Base EN on my Threadripper system, with the HASS VM having, I think, 4 cores available to it. I tried running Medium but it just wouldn’t run, and Small EN maxed out all four cores with no remorse and ended up taking something like 30 seconds to respond, so Base EN it is. That actually works pretty well, although the more Scottish I get, the worse it performs. In my Anglicised accent – “Okay Nabu, turn studio light 1 off” – it works well. In my Glaswegian accent… well, let’s see: “Okay Nabu, turn studio light 1 on”. Now, as of writing this script – sorry, let me turn that off – as of writing this script I have no idea what this will do when I actually film it, so if it worked, amazing – it did a few times during my testing – and if it didn’t, well, there you go: much like voice-activated elevators, this is racist against the Scottish people (remind me to link the comedy sketch I’m referring to in the description…). Either way, the slower and more clearly you speak to it, the better the results you’ll generally get – plus naming your entities or aliases well, which as it turns out means writing out numbers in word form, helps a hell of a lot.
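That numbers-in-word-form trick is easy to automate if you’ve got a lot of numbered devices. Here’s a minimal sketch – the function name is my own, and it only handles standalone single digits, which covered all my entities:

```python
# Sketch: spell out standalone digits when generating voice aliases,
# since "studio light one" matches far more reliably than "studio light 1".
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def spoken_alias(name: str) -> str:
    """Replace any standalone digit in an entity name with its word form."""
    return " ".join(DIGIT_WORDS.get(word, word) for word in name.split())

print(spoken_alias("Studio Light 1"))  # -> Studio Light one
```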

Something like a timer is actually really cool too – “Okay Nabu, set a timer for one minute”. Once it’s set, the LED ring lights up and counts down. You can actually set more than one, and add names to them – really helpful for cooking; just add “for pizza” or similar to the command – but better yet, you can then modify them, or even just check on them: “Okay Nabu, how long is left on the timer?” You can then add or remove time, or just cancel it if you don’t need it anymore. How cool is that? Should you want to know the weather, as long as you’ve exposed a weather entity it will happily tell you – “Okay Nabu, what’s the weather?” The only problem here is that even though the entity it’s pulling that information from has a forecast built in, only the current weather actually gets exposed to the assistant, meaning if you ask, “Okay Nabu, what is the weather tomorrow?”, it’ll fail to find a device called “weather tomorrow”. It can be very literal, which as a rather literal autist is quite funny to me, but still. There is a script on the Home Assistant forum that should help, but I haven’t had much luck getting it to work personally.
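If you want to see the forecast data that isn’t being exposed, you can pull it yourself from the same weather entity. This is a sketch, assuming a reasonably recent Home Assistant build where the weather.get_forecasts service exists; “weather.home” and the URL are placeholders for your own setup:

```python
# Sketch: build the REST service call that returns forecast data from a
# weather entity - the data the voice assistant doesn't get by default.
import json

HASS_URL = "http://homeassistant.local:8123"  # placeholder: your HA instance

def forecast_call(entity_id: str = "weather.home") -> tuple[str, bytes]:
    """Return the (url, json_body) pair for a daily forecast service call."""
    url = f"{HASS_URL}/api/services/weather/get_forecasts?return_response"
    body = json.dumps({"entity_id": entity_id, "type": "daily"}).encode()
    return url, body

url, body = forecast_call()
# POST this with an "Authorization: Bearer <token>" header; the response
# contains a "forecast" list, where index 1 is roughly "tomorrow".
```

That’s essentially what the forum script wraps up into a sentence the assistant can answer.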

As I said earlier, this isn’t a full LLM assistant. It’s limited to the built-in sentences and commands it knows what to do with. If you ask it a question like, “Okay Nabu, how long is one metre in inches?”, well, it can’t help. You can hook it up to an LLM – either in the cloud with something like ChatGPT, or locally with Ollama. The latter is something I’m looking to set up soon, once I stick a higher-VRAM GPU in one of my two NASes, so if you’d like to see a guide on setting that up, let me know in the comments. Out of the box, though, it’s a little more limited – but at least if you opt for Whisper and Piper rather than Home Assistant Cloud, it’s fully local. The XMOS chip does the wake word detection, and your HASS server does the Whisper, conversation agent and Piper work locally too, so no one needs to know how many times you turn your lights on and off but you.

One final thing I want to point out is that it isn’t just the hardware that’s open. The firmware is just ESPHome, with the full source code available in their GitHub repo. It’s long – nearly two thousand lines – but it’s all there. If you want to mess with it, add Grove devices, or just see how it works, it’s all out in the open. Being able to access literally the whole device – the circuit diagrams, the case files, and yes, the entire source code – is amazing. It’s what I’ve done with my own open source tools, but to see a bigger company like Nabu Casa do this too is fantastic – although with the FOSS Home Assistant being their primary product, I can’t say I’m surprised, just pleased. This is a device you truly own, for a smart home solution you equally truly own, so yes, this obviously gets a glowing recommendation from me. It isn’t perfect, for sure – it’s a little rough around the edges, and running it all locally especially comes with some hardware challenges – but even having the chance to run a fully local (and therefore private) voice assistant, even if it’s just to control your smart devices, is amazing. I’m very pleased with this thing, and if you have Home Assistant set up, I highly recommend you pick one up.

  • TechteamGB Score
4