Decent News for August

Why Choose Local LLMs over Cloud?
Every engineer with at least five years of experience stops expressing their technology decisions as the "best". We don't say "Linux is the best operating system" or "PostgreSQL is the best database" or "Rust is the best programming language". That's kid stuff.
We say "it depends" or "choose the right tool for the job".
With LLMs, we should, of course, have the same nuanced, professional thinking. Rather than insisting on always using the model that scores the best on this week's benchmarks, we should be thinking of what will work best for our software's use cases. And we can enlarge our awareness beyond that to consider things like hosting costs, ethical constraints, and our business model.
So I won't scream "local LLMs are the best" at you. But I will whisper a little secret in your ear...
Local LLMs are the right choice for many apps. And people don't seem to realize it!
And you'll swat me away from your head, because it's really annoying to have me whispering directly in your ear. I'm sorry. That would only be appropriate if we were romantically involved or on a grade school playground.
When would you want to use a local LLM?
- You don't want to pay for LLM hosting costs, but you want the ability to expose your app to large numbers of users. Local LLM hosting is free and scalable, because there really isn't any hosting to it (unless you count the user's laptop as a host).
- Your app is privacy-focused. A local LLM app doesn't need to send user data in prompts to a service, because the LLM inference runs entirely on-device.
The downsides of using local LLMs?
- The most capable foundation models aren't available. A seasoned ChatGPT, Claude, or Gemini user will sense pretty quickly that a chat conversation with device-grade Llama or DeepSeek doesn't match the standards they're used to.
- Varying hardware capabilities mean that which local LLMs will run varies from one user to the next. As an app developer, you face a dilemma: aim for models that more users can run, or for more capable models that fewer people can use.
This second problem has been on my mind for a while, and I've put a lot of work into the create-decent-app tool to alleviate it.
Create Decent App 2.3 - Picking the Best Model
This has been a twisty problem to solve, but I think we've got the solution.
Let's say you're a developer using Create Decent App to generate project source for a new web app. Run `npx create-decent-app`, and your brand new project source is ready for you to hack away at. More info on that here.
The CDA-generated source will default to using Llama 3.1 8B. Is that because Llama 3.1 8B is the best? No, I already told you we don't think that way! It's just a default. Any of the other 130+ models available via WebLLM can be used.
As an app developer, you might want to just stick with one model. The prompts your app executes will likely need to be tested and tweaked to work well with each supported model. I know that even a seemingly minor model change, e.g. Llama 3.1 to Llama 3.2, can cause an app to start behaving poorly, because prompt-model coupling is real.
By the way, this is why CDA doesn't let the user choose any model they want when running your app. You need to specify which models you think your app will work with. And that might mean customizing your prompts per model.
But some of your users will have more or less video memory than others. If your app supports a 1 GB model, that's going to bring in more potential users. And some users might prefer a smaller model that trades accuracy for speed.
Once you've specified which models are supported in your app with an edit to a configuration file, there's a lot of neat automatic behavior in CDA 2.3-based apps. When the user arrives at your app, a pretty good "auto-select" guess is made on which model will work best for them. The algorithm is a little complex to explain (you can see the source code here), but it takes into account device capabilities, past loading success/failure, performance, and user-specified preferences.
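To give a feel for how a heuristic like that could combine those signals, here's a minimal TypeScript sketch. This is not CDA's actual code, and every name in it (the types, fields, and scoring weights) is my own illustration; see the real source for how CDA actually does it.

```typescript
// Illustrative sketch of an auto-select heuristic (not CDA's real algorithm).
// It scores each supported model against rough device signals and past outcomes.

type ModelCandidate = {
  id: string;                 // illustrative model name, e.g. "Llama-3.1-8B"
  vramRequiredGb: number;     // rough memory needed to load the model
  failedBefore: boolean;      // did a past load attempt fail on this device?
  avgTokensPerSec?: number;   // measured performance from a previous session
};

function autoSelectModel(
  candidates: ModelCandidate[],
  deviceVramGb: number,
  preferredId?: string        // a user-specified preference wins outright
): ModelCandidate | undefined {
  const preferred = candidates.find(c => c.id === preferredId);
  if (preferred) return preferred;

  const scored = candidates
    .filter(c => c.vramRequiredGb <= deviceVramGb)   // device capability gate
    .map(c => ({
      model: c,
      score:
        c.vramRequiredGb                             // bigger model: assume more capable
        - (c.failedBefore ? 100 : 0)                 // heavily penalize past load failures
        + (c.avgTokensPerSec ?? 0) * 0.1,            // small bonus for measured speed
    }))
    .sort((a, b) => b.score - a.score);

  return scored[0]?.model;    // undefined if nothing fits on this device
}
```

The core idea is just a ranked filter: drop anything the device can't load, then prefer the most capable model that hasn't burned the user before.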
CDA will also warn the user in advance of any predicted problems loading a model and offer them alternative choices from your supported model list if you've specified more than one model. You can even mark models as "beta" if you don't yet have full confidence in them, but don't mind making them available to your users.
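To make the configuration idea concrete, a supported-model list with a beta flag could look something like the sketch below. The file name, field names, and model identifiers here are all hypothetical; check your generated project for the actual schema CDA uses.

```json
{
  "supportedModels": [
    { "id": "Llama-3.1-8B-Instruct", "default": true },
    { "id": "Llama-3.2-1B-Instruct", "beta": true }
  ]
}
```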
The video below is eleven minutes of me demoing CDA 2.3.
If you'd like to try out the latest Create Decent App, don't contact us for more information. Don't schedule a sales pitch. Don't download anything. Just uh... go to your terminal and type "npx create-decent-app". If you're a web app developer, those instructions are probably going to be all you need.
And you can contact me if you want to. I wasn't being unfriendly. I've just always hated it when I needed to contact somebody before I could try their software.
-Erik