Is It Just Semantics about Gen AI and Privacy?

Over on Mastodon, PLB (initials only, although you can read the conversations via Mastodon) makes this point:

My point is that it is misleading to promote an AI agent or assistant by saying “This respects your privacy BECAUSE it is self-hosted” if the project is actually using a cloud service as its most important technological component.

I am arguing that semantics does matter here. If a “self-hosted” file storage solution was really just a self-hosted frontend for S3, I’d be pretty upset too. If something claims to be self-hosted, I actually expect it to work without an Internet connection. And if an AI agent claims to be FOSS, I expect it to use a FOSS model. The LLM being a proprietary cloud solution is NOT a minor detail.

I find it difficult to argue with PLB’s point in paragraph #1. If the project is using a cloud service via API, though, your data is safeguarded by contract and by law. Still, I can imagine a scenario where everyone lies and breaks the law, and your data IS used for training in violation of those contractual and legal protections (think of the copyright fights around AI).

My response appears below:

Miguel’s Response to PLB

API providers use Enterprise Terms that legally bar training on your inputs. Data is “pass-through” only…it is processed for a reply, then auto-deleted. Your local UI keeps the history on your disk, not in their cloud. Breaching API terms triggers massive legal and financial hits. Under 2026 laws, penalties range from $2,500 to $1M per violation. Providers also face GDPR fines (up to 4% of revenue), lawsuits, and a total loss of SOC 2 trust.

A self-hosted LLM is private since it’s on your machine. But there are cloud AI providers that offer frontier models via API and safeguard your privacy. And there are cloud AI providers that make versions of open-source models (e.g. Llama, DeepSeek, Gemma, Qwen, Zai, Mistral) available and safeguard your privacy…not a lot, but at least one comes to mind (BoodleBox).

Cloud Gen AI via API: Safer or Not?

The answer to that question really boils down to whether you trust the API process and consequences. Today, those protections guaranteed by the law ARE less safe. If a government (e.g. USA or UK) decides to override those protections in its own interests (or those of wealthy billionaires), then we can probably count on a Wild, Wild West kind of situation where anything goes.

With the limitations of API in mind, you would think the data is as safe as modern society can make it. But, of course, it is possible to imagine that someone would violate that agreement. I mean, think about shadow libraries and data breaches. Obviously, there was some profit for folks to violate those expectations. If a gold-plated politician decides that’s the way to go, then, “Oh, well.” Revolt, revolution, and mayhem are time-proven ways to safeguard what was once held sacred, but the Constitution shows us a better way.

Local AI: The Private Alternative

The ONLY way to be truly sure that no one can access your data is to run a local Gen AI model on your machine. While I think of these local AI solutions, others may be already migrating to something else…
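One rough way to check the "runs on your machine" claim for a given setup is to look at where the assistant actually sends its prompts. The sketch below is only an illustration: the function name `is_local_endpoint` and the example URLs are hypothetical, and it assumes the tool exposes its model endpoint as a URL. It treats only loopback addresses as local, so a model served from another box on your home network would count as remote here.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if the URL's host is this machine (loopback)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Literal IPs: 127.0.0.0/8 and ::1 are loopback.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote; no lookup is performed.
        return False


# Hypothetical endpoints, for illustration only.
for endpoint in (
    "http://localhost:8080/v1/chat",
    "http://127.0.0.1:8080/v1/chat",
    "https://api.example.com/v1/chat",
):
    print(endpoint, "->", "local" if is_local_endpoint(endpoint) else "remote")
```

A check like this says nothing about what a remote provider does with your data; it only tells you whether the prompts leave your machine at all.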

Interest in Clawdbot, (or is it Moltbot now?) an open-source AI personal assistant, has been building from a simmer to a roar. Over the weekend, online chatter about the tool reached viral status — at least, as viral as an open-source AI tool can be.

Clawdbot has developed a cult following in the early adopter community, and AI nerds in Silicon Valley are obsessively sharing best practices and showing off their DIY Clawdbot setups. The free, open-source AI assistant is commonly run on a dedicated Mac Mini (though other setups are possible), with users giving it access to their ChatGPT or Claude accounts, as well as email, calendars, and messaging apps. (source)

That said, PLB also points out:

This is not just about privacy either. People who are into self-hosting can be into it for a variety of reasons. It can be privacy, it can be FOSS, it can be resilience from Internet outages, it can be environmental concerns, it can be anti-capitalist sentiments, it can be to save money, it can even be just because self-hosting is fun. And everyone is allowed to care about what they care about.

Exactly. Who can argue with that last sentence? Who would want to?

