Run your own ChatGPT at HOME!

Wouldn’t it just be so cool to run your own copy of a chat AI like ChatGPT at home? No interference, no data leaks, no privacy concerns. Sounds great, doesn’t it? Well it turns out you can, and it’s ridiculously easy! Much to my surprise, you don’t even need a 4090 – although it would help. I’ve been running this on a 2080 Ti, but really anything with 11GB or more of VRAM should just about scrape by. The more VRAM (and compute power) the better, though! In theory you can also run this on your CPU alone if you have enough system memory, or on an AMD card, but that’s a lot less stable thanks to just how ‘bleeding edge’ these tools are – so your mileage may vary.

So how do you do this then? Well, much like Stable Diffusion and the Automatic1111 web GUI, there’s a web GUI tool on GitHub that makes this whole thing ridiculously simple. I’ll link to everything in the description so you can try this yourself, but the main thing is that this is all based on LMSYS’s FastChat and their frankly amazing Vicuna model. I’ll explain more about the models in a bit, but for the web GUI you just download the repo and run the setup batch file – assuming you are on Windows, that’s the “start_windows.bat” file. It’ll open a command prompt window and ask you what hardware you have – NVIDIA, AMD, Apple M1 or CPU. I had some bugs in CPU-only mode so that might not be ready yet, but I can confirm the NVIDIA option works. It’ll download all the packages and files you need, then run the web UI. You can find that at localhost:7860 – although if you want to run this on one system but access it from, say, your phone, you’ll want to open the webui.py file in a text editor and add “--listen” to the CMD_FLAGS line.
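For reference, here’s roughly what that edit looks like. The exact flags already on the CMD_FLAGS line vary between versions of the repo (the launcher script is webui.py in the versions I’ve seen), so just append --listen to whatever is already there:

```python
# Near the top of webui.py (the text-generation-webui launcher):
# appending --listen makes the Gradio server bind to all network
# interfaces instead of localhost only.
CMD_FLAGS = '--chat --listen'
```

With that set, other devices on your network can reach the UI at your PC’s LAN IP on port 7860.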

Once you have the web GUI open, you’ll need to download some models. Now this is where things get interesting, as there are actually a whole load of different models – much like Stable Diffusion’s wide variety of models. The one I want to focus on here is called Vicuna, which got its start as Meta’s (Facebook’s) LLaMA model and has been fine-tuned considerably – to the point where the fine-tuning run reportedly cost only around $300. There are others, like Stanford’s Alpaca model, GPT-J and a bunch more, but Vicuna – specifically the 13 billion parameter model – is what I’ll be using here. Now one of the quirks of this whole large language model business is that the models, officially, aren’t what you’d call ‘turnkey’. LMSYS – the open research collective from UC Berkeley who made both FastChat and Vicuna – say you should request the original LLaMA weights from Meta, then run a conversion with their delta weights (FastChat’s apply_delta script) to create Vicuna. That’s a rather memory-intensive process, with the 13B model needing around 60GB of system RAM. Now I did actually try to do this, but despite filling in the form to request the weights from Meta, they never replied.

Happily though, some lovely people have uploaded pre-converted models to Hugging Face, the go-to location for all your AI model needs. All you need to do here is copy the name of the repo, head to the “Model” tab, then paste the name down at the bottom and click download. Assuming you have enough VRAM to load the model, you can then select it from the dropdown and load it in. You’ll want to set “wbits” to 4 and the group size to 128, as that’s what this model expects – it’s literally in the repository name. If you don’t have enough VRAM to run that, the 7B model might be a better fit, as it should run with around 8GB of VRAM or more.
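Those numbers line up with some quick back-of-envelope maths – at 4-bit quantization each parameter takes half a byte, so the weights alone come to roughly:

```python
# Rough weights-only VRAM estimate for a quantized model. Real usage is
# higher: activations, the KV cache and CUDA overhead add a few GB on top.
def weight_gb(params_billions: float, bits: int = 4) -> float:
    bytes_total = params_billions * 1e9 * bits / 8
    return bytes_total / 1e9  # decimal gigabytes

print(weight_gb(13))  # 13B at 4-bit: 6.5 GB of weights alone
print(weight_gb(7))   # 7B at 4-bit: 3.5 GB of weights alone
```

That’s why a 13B model just about squeezes onto an 11GB card once the overhead is added, and the 7B model fits in 8GB with a little room to spare.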

Ok, so now the model is loaded in, we can basically get chatting. This web UI has a whole load of options I want to quickly run past you. The text generation page is obviously where you’ll chat to the models, with options like regenerate, impersonate, or clear history. The chat modes let you customise the style of the interaction, and down at the bottom is the character gallery. This is where you can give the AI an image and a personality to follow – you can probably see where this is going. The chat settings page is where all that happens, although the “Parameters” tab is the one I’m more interested in. The “max_new_tokens” slider essentially controls how long the replies can be, so you’ll want to bump that up. There’s also a bunch of generation parameters, which you can set all at once using presets like LLaMA-Precise.
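For a sense of what those presets actually change, here’s an illustrative set of values in the spirit of LLaMA-Precise – I’m quoting these from memory, so treat them as approximate and check the webui’s presets folder for the real numbers:

```python
# Illustrative generation settings (approximate, not authoritative).
params = {
    "max_new_tokens": 400,       # raise this so replies aren't cut short
    "temperature": 0.7,          # lower = more focused, deterministic text
    "top_p": 0.1,                # nucleus sampling: keep top 10% of probability mass
    "top_k": 40,                 # only sample from the 40 likeliest tokens
    "repetition_penalty": 1.18,  # discourage the model repeating itself
}
```

Lower temperature and top_p make the model more conservative, which tends to suit factual Q&A; for creative roleplay you’d push them the other way.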

The “Model” page is where you downloaded your model, but it also includes a load of options for GPTQ and the Transformers loader. Training is really cool – you can fine-tune the model yourself, something I want to try in a future video with a more powerful GPU. And finally there’s the interface mode page. You can enable a load of extensions here, like ElevenLabs text-to-speech, Google Translate, and even an OpenAI-compatible API. You can also tweak the command line and launch parameters here too – like adding the “--listen” flag to make the tool available on your network.

Ok, so that’s the UI – let’s try out the chat bot! Asking it basic questions, or for a simple output like a professional email telling your boss to stop micromanaging you, all works pretty well. If the max tokens limit is set too low you might need to press the continue button to get it to finish generating the response, but as you’d expect it’s a pretty decent output. If we compare side by side what Vicuna and ChatGPT – just the free version, too – replied with, you’ll notice that ChatGPT is much more willing to write a lot. The volume, and if I’m honest the quality, of the text is a fair bit greater with ChatGPT – although both serve up a usable response. I found that with this request in particular, ChatGPT gave a much more natural-sounding email, whereas Vicuna just outright used the word “micromanaging”. It’s not bad by any means, but I can see why you might prefer ChatGPT for this one.

For a bit of a tougher challenge: we’ve heard so much about how ChatGPT can pass things like the bar exam, or pass medical exams – as if that means you can trust its legal or medical advice – so why not ask it some medical questions? I looked up some UK medical exam questions and asked both Vicuna and ChatGPT. The first is a question about medical consent: “An 89 year old man has a basal cell carcinoma on his forehead which requires excision. He has dementia. The clinic nurse feels he is not competent to give consent for surgery. Which is the most appropriate action for obtaining consent?” You get multiple choice answers here: ask a psychiatrist to assess his cognitive function; ask his GP to sign the consent form; ask his wife to sign it; ask the patient; or ask the surgeon to assess his mental capacity. All of these are designed to sound reasonable if you don’t know the answer – so what did Vicuna say? It said to get the GP to sign it, or to witness the signing, because the GP has knowledge of the patient’s medical history and can provide context regarding his current condition. Ok, sounds reasonable – what about ChatGPT? It said to get his wife to sign the consent form as a surrogate decision maker.

Ok, so what’s the answer? E – ask the surgeon to assess his mental capacity. The clinic nurse only suggested he might not be competent to consent; that doesn’t establish that he isn’t, yet both Vicuna and ChatGPT took it as fact – although Vicuna did get a little closer by saying the GP would have the context to know better. It’s still wrong, but it’s arguably closer than ChatGPT.

Let’s try another. The next question is about a 21 year old woman who has increasingly severe pain in her left leg. She broke it and was put in a cast 2 days ago – so what do you do? Elevate the limb? Refer to orthopaedics? Remove the cast? Replace the cast with increased padding? Re-X-ray the fracture? Vicuna says to elevate the limb to reduce swelling and then refer her to orthopaedic specialists. ChatGPT says she should go straight to orthopaedics for further assessment. The correct answer is option C, remove the cast, because she is suffering from compartment syndrome. Now while both of these bots got the answer wrong, ChatGPT did actually call out compartment syndrome as a possible diagnosis. It said the specialists should be the ones to diagnose that though, which I don’t think is a bad thing to suggest.

Ok, one last one, about a 14 year old girl who is requesting the pill as she is already active with her boyfriend. Her parents are unaware of the appointment. Should the doctor offer advice and prescribe the pill, contact her parents, contact the local safeguarding team, contact the police, or explain that it’s illegal to prescribe her the pill? Vicuna spits out a whole load of text here, saying that while it would normally be illegal, in this case prescribing the pill is allowed and the best course of action – and perhaps getting her tested too. Finally, a fully correct answer! ChatGPT also gets it right, and with what sounds like genuine care and compassion in its lengthy explanation too. I know it isn’t genuine, but it’s interesting to see.

So first off: no, you shouldn’t trust any of these bots for medical, legal or really any advice. They are not trained professionals, and shouldn’t be treated as such. Secondly, I think it’s pretty clear Vicuna isn’t as adept as ChatGPT. Perhaps there is some more tuning to be done in the text-generation-webui settings to get slightly better answers, but I’d say it’s still a little while off yet. It’s also pretty clear who this web UI is made for – it’s meant to be a virtual-friend style interface rather than a ChatGPT clone. There are definitely a few features I’d like to see, like better chat history handling. You can download the chat history as a JSON file and upload it later to get back to an existing chat, but it’d be nice if that was a bit more automatic.
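In the meantime, a tiny script can paper over that – since the history exports as JSON, you can timestamp and archive each session yourself. The message structure below is purely illustrative, not the webui’s exact schema:

```python
import datetime
import json
import pathlib

# Hypothetical helper: archive an exported chat history JSON with a
# timestamped filename, and load it back later. The message format here
# is made up for illustration - adapt it to whatever your export contains.
def save_history(messages, folder="chat_logs"):
    path = pathlib.Path(folder)
    path.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    out = path / f"chat-{stamp}.json"
    out.write_text(json.dumps(messages, indent=2))
    return out

def load_history(file):
    return json.loads(pathlib.Path(file).read_text())

log_file = save_history([{"role": "user", "text": "hello"},
                         {"role": "assistant", "text": "hi there"}])
print(f"archived to {log_file}")
```

Wire something like that up to run after each session and you get the automatic history the UI currently lacks.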

The models themselves will improve though, and there are plenty of other models to try too, so you might find something that works a little better than Vicuna – but it is going to be hard to compete with OpenAI’s billions of dollars’ worth of compute. Still, it’s really cool to be able to run this on consumer hardware – and not even the absolute top-end stuff, either. If you do give this a try, let me know what you get up to in the comments below!