I tested several local AI model platforms and mapped my thoughts in the chart below. It may not be definitive, but it reflects my experience. Msty is my #2 choice. Ollama and Page Assist are my new pick as a combined solution: it just works, it's simple and reliable, and it supports knowledge stacking.
Security wasn't my focus, but then, these models run locally on my machine with web search off.
Did You Know?
Sign up for TCEA's AI Amplified Educator Accelerator, a totally online learning opportunity for K-16 educators. You get six months of access via BoodleBox Unlimited to learn the skills you need to build bots (similar to Perplexity Spaces, OpenAI Custom GPTs, Google Gems, etc.). It's only $149, and you get 22 hours of CPE credit, a digital badge, and a certificate.
The Chart
I wouldn’t use this chart as a REAL determination of which is the best, but it DOES reflect MY experience, including options I hope to take advantage of or test.

Things I Wanted
What I’m looking for in a local AI model platform is simple:
- Easy to add and remove large language models (LLMs) that work on my machine (see the sketch just after this list)
- Installation is simple and easy, and doesn't require a lot of extra add-ons and setup (e.g., Sanctum required additional DLLs)
- Models that are loaded run fast and don't freeze mid-stream, leaving me hanging (e.g., LM Studio did this again and again…after a while, you realize it doesn't matter how easy a tool was to install if it freezes)
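For what it's worth, here is a minimal sketch of what "easy to add and remove" looks like once Ollama is installed. Ollama exposes a small HTTP API on localhost, so a few lines of Python can list, pull, or delete models. The model name "gemma" and the default port are assumptions; adjust for your own setup.

```python
import requests

OLLAMA = "http://localhost:11434"  # Ollama's default local endpoint

# List the models currently installed on this machine
installed = requests.get(f"{OLLAMA}/api/tags").json()
for m in installed.get("models", []):
    print(m["name"])

# Pull a new model (downloads it once; afterward it runs entirely offline)
requests.post(f"{OLLAMA}/api/pull", json={"model": "gemma", "stream": False})

# Remove a model you no longer want
requests.delete(f"{OLLAMA}/api/delete", json={"model": "gemma"})
```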
No doubt, some of the issues are due to my own inexperience with these tools. I apologize in advance to the developers of these awesome free, open source platforms. This is meant to capture my burgeoning understanding, and my first impressions. This chart may change in the future as my understanding grows.
Things I Liked
Some of the things I liked: Msty, for example, combines the best of the other tools into one, and adds a little more, such as Knowledge Stacks, Workspaces, and easy access to models no matter where the GGUF file (the single-file model format you download from HuggingFace) happens to be on your computer.
I also liked the fact that Ollama ran super-fast on my machine with the models I tried, which included:
- Llama 3.2b
- DeepSeek
- Gemma
What’s more, Ollama offered a web interface, but I didn’t spend a lot of time on it. The lack of a GUI, especially after exploring Sanctum and LM Studio, left me wanting more. But I could see how someone would like Ollama.
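That local server is also scriptable, which softens the lack of a GUI a bit. Here's a minimal sketch, assuming Ollama's default port and a model tag such as llama3.2 that you've already pulled:

```python
import requests

OLLAMA = "http://localhost:11434"

# Send a one-off prompt to a locally installed model and print the reply.
# "llama3.2" is just an example tag; use whatever you have pulled.
reply = requests.post(
    f"{OLLAMA}/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "In two sentences, what is a GGUF file?",
        "stream": False,
    },
).json()

print(reply["response"])
```

Nothing in that exchange leaves the machine, which is the whole point of running these models locally.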
Off the Well-Trod Path
Ok, now for a bit of speculation and some possibly fictional reading.
Here's an analysis from DeepSeek via Msty (there was a lot more analysis and consideration using the PRISM Framework, but I'll spare you…suffice it to say, DeepSeek did a great job and didn't freeze once. I will share the custom instructions, er, megaprompt, in a separate blog entry):
AI Generated Analysis
- The PRISM Framework evaluates the four AI model management tools based on Privacy, Robustness, Security, Integrity, and Simplicity:
- MSTY and LM Studio cater to both technical (developers) and non-technical users by offering a range of features from advanced management options to user-friendly GUIs.
- Ollama is specifically designed for developers with a focus on privacy and security, avoiding cloud reliance.
- Sanctum distinguishes itself through its emphasis on physical access controls, making it ideal for environments where data control is paramount.
- MSTY:
  - Privacy: High due to privacy-first design and offline execution.
  - Robustness: Robust with a user-friendly interface supporting multi-AI interactions; potential robustness may vary based on complexity.
  - Security: Secure with prompt management tools, but lacks encryption for local devices compared to Ollama or Sanctum.
  - Integrity: May be affected by diverse model outputs from different AI sources; need to address consistency and bias issues.
  - Simplicity: High with organized features like chats and file attachments.
- Ollama:
  - Privacy: Strong through local execution and no cloud reliance.
  - Robustness: Robust via Docker support for deployment, though may face issues with model sensitivity or dynamic loading.
  - Security: Security is enhanced by avoiding cloud solutions but lacks explicit encryption compared to Sanctum.
  - Integrity: Integrity relies on well-maintained models; potential inconsistencies between environments (local vs. online).
  - Simplicity: Moderate due to REST API requiring some technical know-how for deployment.
- Sanctum:
  - Privacy: High with extra keycodes ensuring secure execution.
  - Robustness: Lightweight design suggests good performance, though limited complexity handling compared to other tools.
  - Security: Strong via encryption and access control; minimal cloud reliance enhances security.
  - Integrity: Privacy focus may hinder data sharing for accurate outputs but ensures authorized executions.
  - Simplicity: Low due to command-line interface suited for technically inclined users.
- LM Studio:
  - Privacy: Moderate with a user-friendly GUI and no advanced features; lacks encryption as other tools.
  - Robustness: Fragile under heavy usage or interface crashes, though intuitive for setup.
  - Security: Low due to limited technical controls without explicit security measures beyond the interface.
  - Integrity: Consistent if models perform reliably but may vary with intensive tasks.
  - Simplicity: High as designed for accessibility without deep configuration.
- Each tool balances different aspects based on user needs, from simplicity and ease of use (LM Studio) to advanced privacy and security features (Sanctum). The choice depends on specific requirements regarding technical controls, performance robustness, and intended audience.
Experimenting with Long Prompts
I tried experimenting with megaprompts (a.k.a. long prompts) that told the AI how to behave. These worked, but they slowed down models like Llama and Gemma to a crawl. Gemma gave me quality results using Ollama, but surprisingly, the results from several models were nearly identical. That was a bit of a surprise given my work with tools like ChatGPT and Claude. Obviously, comparing my desktop-based local AI models to those is an exercise in disappointment.
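To make that slowdown concrete: a megaprompt is really just a very long system message, and the whole thing has to be processed as context before the model writes its first token. Here is a rough sketch of how one gets fed to a local model through Ollama's chat endpoint; megaprompt.txt and the model tag gemma are hypothetical stand-ins for whatever you're actually using.

```python
import requests

OLLAMA = "http://localhost:11434"

# Load the long instructions from a (hypothetical) text file.
megaprompt = open("megaprompt.txt", encoding="utf-8").read()

# The entire system message is processed as context on every request,
# which is a big part of why long prompts crawl on desktop hardware.
reply = requests.post(
    f"{OLLAMA}/api/chat",
    json={
        "model": "gemma",  # example tag; use any model you have pulled
        "messages": [
            {"role": "system", "content": megaprompt},
            {"role": "user", "content": "Evaluate MSTY using the PRISM framework."},
        ],
        "stream": False,
    },
).json()

print(reply["message"]["content"])
```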
Future Growth
As I reflect on these, I see that more complex models trade efficiency for accuracy, and require more resources. Is it even worth running these models on a desktop computer, as opposed to a server, due to the requisite resources needed? I suppose that the real issue is that AI won’t be all that valuable or worthwhile unless I’m using a cloud-based solution with the corresponding impact on the environment.
I am pleased with the use of megaprompts with Msty and DeepSeek. I’ll have to experiment further, and you can see the results above…what do you think, are they accurate?
Wait, Wait, Here’s a Rubric
I thought it might be fun to have it generate a rubric…”This rubric provides a structured approach to evaluating privacy aspects, ensuring that all critical factors are considered in the assessment process.” I’ve stripped it down a bit. DeepSeek reminded me of Claude, going off on tangents and overthinking things.
| Criteria & Weight | Description of Criteria | What "Excellent" Looks Like | Score |
|---|---|---|---|
| Privacy (40%) | The extent to which the AI tool protects user data from unauthorized access using encryption, access controls, and anonymization techniques. | Comprehensive protection with strong encryption and user control. | |
| Robustness (30%) | The reliability and stability of the AI tool in handling various types of data inputs, ensuring consistent performance even with noisy or incomplete data. | Handles all data inputs reliably without errors or instability. | |
| Security (20%) | The extent to which the AI tool and its provider protect against potential vulnerabilities, including susceptibility to cyber-attacks by unauthorized parties. | Shows advanced security features with minimal risk of attacks or vulnerabilities. | |
| Integrity (15%) | The accuracy and honesty of the AI tool's output in terms of data processing, detection of errors, and prevention of biases or manipulations. | Outputs highly accurate results consistently without detectable errors or biases. | |
| Simplicity (15%) | The ease-of-use and user-friendliness of the AI tool, including intuitive interfaces for both technical and non-technical users with clear instructions and accessible documentation. | Features an exceptionally intuitive interface that is easy to learn and use without confusion or effort. | |
Key Considerations:
- Weighting: Each criterion was assigned based on its relevance to privacy-focused AI tool evaluation, with Privacy being the most critical due to data protection concerns.
- Score Levels: The scores reflect a clear performance hierarchy from Excellent (highest) down through Good and Needs Improvement to Fair or Poor. This allows for easy interpretation of how well each criterion is met by an AI tool.
A short note: I asked DeepSeek for assistance in editing this table, but it “overthought” everything I told it.
Ok, the scores for these tools (which I suspect are TOTAL bunk) using the rubric are:
| Tool | Privacy (40%) | Robustness (30%) | Security (20%) | Integrity (15%) | Simplicity (15%) | Weighted Score |
|---|---|---|---|---|---|---|
| MSTY | 4 | 3 | 2 | 2 | 4 | 3.80 |
| Ollama | 4 | 3 | 3 | 3 | 2 | 3.85 |
| Sanctum | 4 | 2 | 4 | 2 | 1 | 3.45 |
| LM Studio | 2 | 1 | 1 | 3 | 4 | 2.35 |
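If you want to sanity-check the arithmetic, a few lines of Python reproduce the weighted totals from the table, using the weights exactly as the rubric states them (note that DeepSeek's weights add up to 120%, not 100%; I've left them as given):

```python
# Weights exactly as stated in the rubric (they sum to 1.20, not 1.00).
weights = {"Privacy": 0.40, "Robustness": 0.30, "Security": 0.20,
           "Integrity": 0.15, "Simplicity": 0.15}

scores = {
    "MSTY":      {"Privacy": 4, "Robustness": 3, "Security": 2, "Integrity": 2, "Simplicity": 4},
    "Ollama":    {"Privacy": 4, "Robustness": 3, "Security": 3, "Integrity": 3, "Simplicity": 2},
    "Sanctum":   {"Privacy": 4, "Robustness": 2, "Security": 4, "Integrity": 2, "Simplicity": 1},
    "LM Studio": {"Privacy": 2, "Robustness": 1, "Security": 1, "Integrity": 3, "Simplicity": 4},
}

for tool, s in scores.items():
    total = sum(s[criterion] * weight for criterion, weight in weights.items())
    print(f"{tool}: {total:.2f}")
# Prints: MSTY: 3.80, Ollama: 3.85, Sanctum: 3.45, LM Studio: 2.35
```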
I’m inclined to give MSTY the win because of simplicity, but I can see how security is a priority. Ok, this was a time sink.
😉