Hongkiat: Running Large Language Models (LLMs) Locally with LM Studio. “Running large language models (LLMs) locally with tools like LM Studio or Ollama has many advantages, including privacy, lower costs, and offline availability. However, these models can be resource-intensive and require proper optimization to run efficiently. In this article, we will walk you through optimizing your […]”
Not sure if you have noticed: Google has released Gemma 3, a powerful model that is small enough to run on local computers.
https://blog.google/technology/developers/gemma-3/
I've done some experiments on my laptop (with a GeForce 3080 Ti) and am very impressed. I tried to be happy with Llama 3, with the DeepSeek R1 distills of Llama, and with Mistral, but the models that would run on my computer were not in the same league as what you get remotely from ChatGPT, Claude, or DeepSeek.
Gemma changes this for me. So far I have had it write three smaller pieces of JavaScript and analyze a few texts, and it performed slowly but flawlessly. So I can finally move to a "use the local LLM for the 90% default case, and go for the big ones only if the local LLM fails" approach.
This way:
- I use far less CO2 for my LLM tasks.
- I stay in control of my data; nobody can collect my prompts and later sell my profile to advertisers.
- I am sure the IP in my prompts stays with me.
- I have the privacy to ask it whatever I want, and no server in the US or China holds that data.
Interested? If you have a powerful graphics card in your PC, it is totally simple:
1. Install LM Studio from LMStudio.ai
2. In LM Studio, click Discover and download the Gemma 3 27B Q4 model
3. Chat
If your graphics card is too small, you might go for the smaller 12B model, but I can't tell you how well it performs.
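If you prefer scripting over the chat window, LM Studio can also expose an OpenAI-compatible server on your own machine (by default at http://localhost:1234/v1). Here is a minimal sketch in Python using the openai client; the model identifier below is an assumption, so copy the exact name LM Studio shows for your download.

```python
# Minimal sketch: talk to LM Studio's local OpenAI-compatible server.
# Assumes the server has been started in LM Studio and listens on the
# default http://localhost:1234/v1; the api_key is ignored locally but
# the client requires one. The model name is a hypothetical identifier.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="gemma-3-27b-it",  # hypothetical; use the name LM Studio displays
    messages=[
        {"role": "user",
         "content": "Write a small JavaScript function that debounces a callback."},
    ],
)
print(response.choices[0].message.content)
```

Everything stays on your machine; the request never leaves localhost.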
Did a few coding experiments with Gemma 3 running locally in LM Studio. So far it performs flawlessly in terms of capability (on my lowly GeForce 3080 Ti it is fairly slow, something like 5 tokens per second). But I've got time, and it is mine, running locally; no billionaire's corporation sees my prompts.
For me (a privacy nut) this is a big thing: not having to use ChatGPT for everything.
Whoohoo! #LMStudio lets me run the big Gemma 3 27B, 4-bit quantized, locally on my slightly old gaming laptop with a GeForce 3080 Ti. It is slow, but my first tests show it is indeed fairly powerful. Not up to the reasoning models for reasoning tasks (duh!), but way above everything else I could run locally before (Llama 3).
A big step towards keeping your data private while still enjoying the services of a great LLM.
This video compares Ollama and LM Studio (GGUF), showing that their performance is quite similar, with LM Studio’s tok/sec output used for consistent benchmarking.
What’s even more impressive? The Mac Studio M3 Ultra pulls under 200W during inference with the Q4 671B R1 model. That’s quite amazing for such performance!
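If you want to sanity-check a tok/sec figure like that yourself, a rough sketch against any local OpenAI-compatible endpoint (both LM Studio and Ollama offer one) could look like the following; the endpoint, placeholder API key, and model name are assumptions to adapt to your setup.

```python
# Rough throughput measurement against a local OpenAI-compatible endpoint.
# Counts completion tokens as reported by the server's usage field. Note
# the timing includes prompt processing, so it slightly understates pure
# generation speed compared to LM Studio's own tok/sec readout.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
response = client.chat.completions.create(
    model="gemma-3-27b-it",  # hypothetical identifier; adapt to your model
    messages=[{"role": "user",
               "content": "Explain GGUF quantization in three sentences."}],
)
elapsed = time.perf_counter() - start

tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s ≈ {tokens / elapsed:.1f} tok/sec")
```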
Your data. Your computer. Your choice #devonthink #chatgpt #claude #gemini #mistral #ollama #lmstudio #gpt4all #comingsoon
AI just got faster!
Speculative decoding speeds up LLM inference by 2x–3x using a draft-model-plus-verification-model approach:
- Faster AI responses
- Retains accuracy
- Integrated in LM Studio 0.3.10
Read more:
https://medium.com/@omkamal/accelerating-llm-inference-with-speculative-decoding-using-lmstudio-393d6befbe25?sk=d60eae40262434cbd002762f11ca8f55
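For the curious, here is a toy sketch of the idea (greedy variant, not LM Studio's actual implementation): a small draft model guesses a few tokens ahead, and the large target model verifies the whole guess, keeping the longest prefix it agrees with. The two model callbacks are hypothetical stand-ins for real model calls.

```python
# Toy sketch of the speculative decoding idea (greedy variant), NOT LM
# Studio's internals. draft_next() and target_greedy() are hypothetical
# stand-ins for model calls that return the next token id.
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],    # small, fast draft model
    target_greedy: Callable[[List[int]], int], # large, slow target model
    k: int = 4,
    max_new_tokens: int = 64,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model speculates k tokens ahead (cheap, sequential).
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. Target model verifies the draft. In a real system all k
        #    positions are checked in ONE batched forward pass; that
        #    batching is where the 2x-3x speedup comes from.
        accepted = 0
        for i in range(k):
            if target_greedy(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        # 3. On the first mismatch, take the target model's own token,
        #    so the output is identical to running the target alone.
        if accepted < k:
            tokens.append(target_greedy(tokens))
    return tokens
```

Because every rejected guess is replaced by the target model's own choice, the result matches plain greedy decoding with the target model; the draft model only changes how fast you get there.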
Locally installed AI tools: data protection and more
https://video.medienzentrum-harburg.de/videos/watch/725ac37f-ed75-4922-810d-dc12fbd9ebd8
Bonus: LM Studio: Why run DeepSeek-R1 locally?
https://cryptrz.org/wordpress/2025/02/05/bonus-lm-studio-pourquoi-faire-tourner-r1-en-local/
Evaluating the Performance of LLMs: A Deep Dive into qwen2.5-7b-instruct-1m: https://lttr.ai/AbFz6
Reviewing DeepSeek-R1-Distill-Llama-8B on an M1 Mac
▸ https://lttr.ai/AbDLG
Surprised how relatively quickly some DeepSeek models have run on my older M1 Mac mini. This would tax my NAS far too much, but the Mini is on all the time anyway. Securely running this becomes the problem now.
I’ve been testing DeepSeek-R1-Distill-Llama-8B on my M1 Mac using LMStudio, and the results have been surprisingly strong for a distilled model.
Read more https://lttr.ai/Aa4FO
Of course this AI is not really saying anything new, but sometimes saying obvious things clearly can also be of value.