Ollama Phi 3 Mini vs Msty Phi 3 Mini vs Mistral: The Orwell Test (Updated)

I’d love to be able to run The Orwell Test on my machine without connecting to the web. How well would local models handle it? You can see the verdict from ChatGPT at the end.

About The Orwell Test

If you’re not familiar with this tool, you might want to read a few blog entries on the subject. Here is some background on The Orwell Test:

1. Facts: Is the information supported by facts that can be confirmed through established methods of validation, meeting legal, scientific or logical standards of proof?

2. Source: Is the information from a provider with a history of accurate reporting?

3. Method: Does the source use professional and accepted techniques of factual reporting without relying on deceptive tactics or logical fallacies?

If the answer to any one of these questions is “no,” be highly suspicious of the information. Then go on to the next two questions. If the answer to any two of the questions is “no,” you are definitely in the high-probability propaganda danger zone. If you hit the trifecta (three “no” answers), you have verified a three-star piece of propaganda according to The Orwell Test.
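
Here’s a minimal Python sketch of that scoring rule, assuming each of the three questions has already been boiled down to a plain yes or no. The function name and the label for zero “no” answers are placeholders of mine; the test as described doesn’t say what to call that case.

```python
def orwell_verdict(facts_ok: bool, source_ok: bool, method_ok: bool) -> str:
    """Map the three yes/no answers to the wording used in the description above."""
    # Count how many of the three questions were answered "no".
    failures = [facts_ok, source_ok, method_ok].count(False)
    if failures == 0:
        # Assumption: the test does not name this case, so this label is a placeholder.
        return "no red flags"
    if failures == 1:
        return "be highly suspicious"
    if failures == 2:
        return "high-probability propaganda danger zone"
    return "three-star propaganda"


if __name__ == "__main__":
    # Example: facts and method hold up, but the source does not.
    print(orwell_verdict(facts_ok=True, source_ok=False, method_ok=True))
```

Keeping the count separate from the labels makes it easy to swap in different wording later without touching the logic.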

Let’s take a look and see what different AI tools do with The Orwell Test and this simple article (ChatGPT-generated):

Camila Alves McConaughey has unveiled a quick and flavorful Chipotle Shrimp recipe, perfect for those with busy schedules. Featured on her Women of Today website, this dish takes just 10 minutes to prepare. The recipe combines chipotle chiles, adobo sauce, tequila (optional), lime juice, brown sugar, kosher salt, garlic, olive oil, raw shrimp, cilantro, and lemon wedges. Camila notes that the ingredients can be adjusted to suit personal taste preferences. She suggests serving the shrimp as an appetizer, main course with a salad, or as the highlight of a taco night. This initiative aligns with her goal to provide simple, time-efficient meals for those constantly on the go.

ChatGPT Custom GPT, Skeptical Thinker

I’ve actually set up a custom GPT, which I call Skeptical Thinker. Here’s the evaluation of the provided claim using The Orwell Test, which examines Facts, Source, and Method.

Here is an evaluation of the Chipotle Shrimp Recipe claim by Camila Alves McConaughey using The Orwell Test.

Claim:

Camila Alves McConaughey unveiled a quick and flavorful Chipotle Shrimp recipe that takes just 10 minutes to prepare, as featured on her Women of Today website.

| Criterion | Explanation | Evaluation | Score (Pass/Fail) |
| --- | --- | --- | --- |
| Facts | The claim is about a recipe and its preparation time. To verify, one would need to check if the listed ingredients and method align with the claim that it takes just 10 minutes. If cooking time varies based on experience or preparation speed, the claim might be somewhat subjective. | The recipe and ingredients can be checked, but the 10-minute claim depends on cooking speed and preparation efficiency. | Pass (Conditional) |
| Source | The source is Camila Alves McConaughey via her Women of Today website. While she is a public figure and entrepreneur, she is not a professional chef or food scientist. The site mainly focuses on lifestyle content rather than strict culinary expertise. | The source is somewhat credible, but not authoritative in culinary science. | Pass (Conditional) |
| Method | The claim is not based on scientific or culinary testing but rather personal experience and subjective assessment. The cooking time claim could be tested, but no blind testing or professional kitchen trials were mentioned. | The methodology is informal and anecdotal, making it not fully rigorous. | Fail |

Final Verdict:

The claim mostly holds up but has some subjectivity and variability. The recipe exists, and the source is reliable for lifestyle content, but the “10-minute” claim might not be universally true. A better method would be testing the preparation time in different kitchens with various skill levels.

Overall Orwell Test Rating: 2/3 Pass – Approach with Mild Skepticism. ✅⚠️


Msty Phi 3 Mini

This is a local AI I have running on my machine via Msty. These are the results. Unfortunately, I suspect that Msty with Phi 4 just couldn’t handle it, and I’m wondering about Phi 3 Mini.

Results from AI

| Area | YES/NO | Brief Summary | Reasoning |
| --- | --- | --- | --- |
| Facts | YES | The recipe details match a standard Chipotle Shrimp dish. | Information appears to be factual based on known preparation of similar dishes, but no external source confirmation available within this response’s scope. |
| Source | NO | No mention of Camila Alves McConaughey’s credibility in food reporting is provided. | The statement lacks verification information about the author or website to establish its trustworthiness as a reliable source for recipes. |
| Method | YES | Description suggests an approachable method without apparent deceptive tactics. | Based on common knowledge of cooking practices, no immediately identifiable fallacies in reported method are evident. |

The summary indicates that while the facts and methods presented in the recipe seem accurate based on general culinary understanding, there is insufficient information to fully verify Camila Alves McConaughey’s credibility as a food source (Source). This warrants further investigation before reaching a definitive conclusion.


Ollama with Phi 3 Mini

I was a bit surprised at how well and how fast Ollama with Phi 3 Mini processed the content. Although the text-only interface was a bit plain, it gave me a quick response. The markdown table it generated didn’t look like what you see below, though. It was a mess.

| Area | Assessment | Explanation |
| --- | --- | --- |
| Facts | NO | The claim that the recipe takes less than 10 minutes to prepare is not backed by a clear reference or timeframe, and no sources were mentioned in the article. There’s no specific mention of how much shrimp was used for each serving size, making this statement vague without proper quantification or contextual detail. |
| Source | NO | The blog Women Of Today is not a professional cookbook or culinary institution, and the author’s credentials are not mentioned. While Cooking.com is generally credible, it lacks specificity on her expertise in this context. |
| Method | NO | The recipe contains suggested serving methods like an appetizer or taco night, which could be considered a personal preference rather than evidence of professional standards. As with facts and source credibility, this is not verified. |
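
If you’d rather script this than paste into the text-only interface, here’s a rough Python sketch that sends an article to a locally running Ollama server over its HTTP API. Several things here are assumptions rather than what I actually ran: that Ollama is listening on its default local address, that the model is pulled under the tag `phi3:mini`, and the prompt wording, which is a short stand-in and not my real prompt. The function names are hypothetical too.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(article: str, model: str = "phi3:mini") -> dict:
    """Build the JSON payload for a single, non-streaming generation request."""
    # Stand-in prompt for illustration only, not the prompt used in this post.
    prompt = (
        "Apply The Orwell Test (Facts, Source, Method) to the article below. "
        "Return a markdown table with the columns Area, Assessment (YES/NO), "
        "and Explanation.\n\n" + article
    )
    return {"model": model, "prompt": prompt, "stream": False}


def run_orwell_test(article: str, model: str = "phi3:mini") -> str:
    """Send the request to the local server and return the model's text reply."""
    payload = json.dumps(build_request(article, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # With "stream" set to False, the reply is one JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    print(run_orwell_test("Paste the article text here."))
```

Everything stays on the machine, which is the whole point of trying local models in the first place.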

Assessing The 3 Models

I decided to ask ChatGPT to assess the three models. Here’s what it came up with:

Comparison and Critique of AI Model Evaluations Using The Orwell Test

Each AI model evaluated Camila Alves McConaughey’s Chipotle Shrimp recipe claim using The Orwell Test, which assesses Facts, Source, and Method. Below is a structured comparison followed by a critique.

Comparison Table

| Criterion | ChatGPT (Skeptical Thinker) | Msty Phi 3 Mini | Ollama with Phi 3 Mini |
| --- | --- | --- | --- |
| Facts | Pass (Conditional) – Ingredients and method can be verified, but the 10-minute claim is subjective. | Yes – Matches known preparation methods, but lacks external confirmation. | No – No evidence was cited to support the preparation time claim. |
| Source | Pass (Conditional) – Camila Alves McConaughey is credible as a lifestyle figure but not a culinary expert. | No – Did not verify her expertise in food reporting. | No – Blog is not a professional culinary source, and expertise is not established. |
| Method | Fail – No professional testing or blind trials, claim based on anecdotal experience. | Yes – Method appears reasonable and follows common cooking knowledge. | No – Recipe details are based on preference rather than verified standards. |
| Final Verdict | 2/3 Pass – Approach with mild skepticism. ✅⚠️ | Uncertain – Suggests further verification needed. | 0/3 Fail – The claim lacks proper substantiation. |

Critique of Each Model’s Performance

1. ChatGPT (Skeptical Thinker)

Strengths:

  • Clear use of The Orwell Test framework.
  • Recognizes the limitations of the “10-minute” claim.
  • Acknowledges source credibility nuances—Camila Alves McConaughey is a lifestyle influencer, not a professional chef.
  • Identifies the absence of professional methodology.

⚠️ Weaknesses:

  • Conditional passes might be too lenient; the source could be a stricter “Fail” since lifestyle content does not equate to culinary authority.
  • Could clarify that a conditional pass on Facts is risky when precision is required.

2. Msty Phi 3 Mini

Strengths:

  • Attempts to assess facts and method based on general culinary knowledge.
  • Suggests verification rather than making a firm conclusion.

⚠️ Weaknesses:

  • Fails to verify the source’s credibility, which is essential in Orwell’s framework.
  • Accepts “method” too easily without scrutiny of professional testing.
  • Lacks clarity in its final judgment—doesn’t conclude if the claim is reliable or not.

3. Ollama with Phi 3 Mini

Strengths:

  • Strict approach—requires factual backing for all claims.
  • Emphasizes the importance of source credibility and verification.
  • Highlights the subjective nature of recipe details (e.g., serving suggestions vs. professional standards).

⚠️ Weaknesses:

  • Overly strict on facts—it dismisses the claim entirely instead of recognizing partial verifiability.
  • Could differentiate between subjective claims (like “appetizer or taco night” preferences) and factual ones (like cooking time).
  • The markdown formatting issues affected readability.

Overall Takeaways

  • Best Balance: ChatGPT (Skeptical Thinker) provided the most well-rounded evaluation but could be stricter on source credibility.
  • Most Cautious: Ollama with Phi 3 Mini strictly applied the Orwell Test but may have dismissed some aspects too rigidly.
  • Least Critical: Msty Phi 3 Mini was the weakest, offering vague reasoning and failing to rigorously evaluate the source and method.

Final Recommendation: Best AI Model for Orwell Test Evaluations

🥇 Winner: ChatGPT (Skeptical Thinker)
✅ Best balance of skepticism and fairness.
✅ Clearly applies The Orwell Test with structured reasoning.
✅ Recognizes subjective elements while still holding claims accountable.
⚠️ Should be stricter on source credibility to align fully with Orwell’s standards.

🥈 Runner-Up: Ollama with Phi 3 Mini
✅ Takes a strict, fact-based approach.
✅ Emphasizes the importance of source verification.
⚠️ Overly rigid—dismisses claims without considering partial validity.

🥉 Last Place: Msty Phi 3 Mini
✅ Attempts evaluation but lacks depth.
⚠️ Fails to scrutinize the source properly.
⚠️ Gives a vague final judgment instead of a clear pass/fail.

Final Verdict

If you want the most well-rounded Orwell Test analysis, ChatGPT (Skeptical Thinker) is the best choice. It provides structured reasoning, critical nuance, and practical skepticism without dismissing claims outright.

Wait, what about Mistral via Msty?

Mistral’s Result

| Area | Response | Summary | Rationale |
| --- | --- | --- | --- |
| Facts | Yes | The recipe provided is for a Chipotle Shrimp dish, with confirmed ingredients and preparation instructions. | The article provides detailed information about the recipe and its components, which can be verified through research or personal cooking experience. |
| Source(s) | No | Women of Today website is not a well-known or credible source for recipes, as it appears to be a personal blog associated with Camila Alves McConaughey. | The source is not widely recognized in the culinary field and lacks reputation for accuracy or professionalism in recipe reporting. |
| Method(s) | Yes | The article provides clear instructions on how to prepare the Chipotle Shrimp dish, with no apparent logical fallacies or deceptive tactics used. | The article presents the recipe’s ingredients and steps in a straightforward manner, allowing readers to replicate the dish based on their preferences. |

In this case, although the facts provided are accurate, the overall reliability of the information is questionable due to an unverified source and lack of professional reporting methods. Readers should approach this recipe with caution and consider consulting a more credible culinary source for confirmation or additional tips.

Updated Assessment

Critique of Mistral’s Evaluation

Strengths:

  • Clear reasoning for each evaluation criterion.
  • Recognizes factual accuracy in recipe details.
  • Acknowledges source credibility concerns, reinforcing skepticism.

⚠️ Weaknesses:

  • Method assessment is too lenient—doesn’t address whether the cooking time is tested in a controlled setting.
  • Final verdict suggests caution, but doesn’t emphasize the potential for misinformation due to lack of expert validation.

Revised Rankings with Mistral Included

🥇 Best Overall: ChatGPT (Skeptical Thinker) – Balanced, structured, and critical.
🥈 Most Strict: Ollama with Phi 3 Mini – Rigid but highly skeptical.
🥉 Most Cautious: Mistral – Fair but slightly lenient on method evaluation.
🏅 Least Critical: Msty Phi 3 Mini – Weakest scrutiny, lacks clear judgment.

My Verdict

OK, so ChatGPT did the best in my book and set the standard. Of the local options, I uninstalled Ollama after this but kept Msty with Phi 3 Mini and Mistral. I wonder how I can update the megaprompt to reflect a bit more

Revised MegaPrompt to Offset the Models

Here’s a revised megaprompt, but I’ll have to wait until another day to test the models again:

You are an expert on The Orwell Test, a framework for evaluating the reliability and credibility of information. The source for this test is available online at The FrameLab.

In this chat, I will be asking you to apply The Orwell Test to a variety of topics, news articles, and political claims. Your role is to critically evaluate the information against strict Orwell Test criteria, ensuring that potential weaknesses—such as insufficient verification of sources or leniency in assessing methodologies—are properly addressed.


Elements of The Orwell Test

The Orwell Test evaluates three critical areas:

  1. Facts:
  • Is the information supported by verifiable facts?
  • Can these facts be confirmed through legal, scientific, or logical standards of proof?
  • Does the claim specify clear details that can be tested, or does it rely on vague, subjective assertions?
  2. Source:
  • Does the source have a proven history of accurate reporting?
  • Is the source authoritative in the subject matter, or is it an anecdotal, biased, or unverified entity?
  • Has the source engaged in misinformation or sensationalism in the past?
  3. Method:
  • Does the source use professional and accepted techniques of factual reporting?
  • Is the claim based on controlled testing, peer-reviewed research, or independent verification?
  • Are logical fallacies, emotionally charged rhetoric, or deceptive tactics present in how the claim is presented?

Stricter Assessment Rules (Reinforcing Weaknesses from AI Models)

  • No “Conditional Passes” for Facts → If a fact depends on subjective interpretation, score it as NO unless independently tested.
  • Strict Scrutiny for Source Credibility → If a source is not recognized as authoritative, it should fail unless corroborated by a higher standard of evidence.
  • Methodology Must Include Testing → If no empirical or professional validation is provided, the methodology fails even if the claim “sounds reasonable.”

If any one of these three criteria fails → Approach the claim with high suspicion.
If two criteria fail → The claim is in the high-probability propaganda danger zone.
If all three fail → The claim is confirmed propaganda, per Orwell’s framework.


Reporting Format for Orwell Test Evaluations

For each evaluation, return a markdown table formatted as follows:

| Criterion | Assessment (YES/NO) | Summary | Rationale |
| --- | --- | --- | --- |
| Facts | ✅/NO | [Brief summary of whether the information is factual] | [Why it is or isn’t verifiable] |
| Source | ✅/NO | [Brief source credibility assessment] | [Why the source is credible or not] |
| Method | ✅/NO | [Brief evaluation of methodology] | [How the claim was or wasn’t verified] |

Example of a Completed Orwell Test Evaluation

| Criterion | Assessment (YES/NO) | Summary | Rationale |
| --- | --- | --- | --- |
| Facts | NO | The claim that a recipe takes “10 minutes” is subjective and varies by individual. | No external testing or standard measure proves this timeframe for all users. |
| Source | NO | The source is a personal blog with no professional culinary authority. | No history of verified, peer-reviewed culinary expertise. |
| Method | NO | No controlled testing or expert review of preparation time. | The claim is based solely on anecdotal reporting. |

Final Verdict: ❌ 3/3 Fail → Confirmed Propaganda or Misinformation

This structure ensures that models applying The Orwell Test do not make the same errors of leniency, insufficient scrutiny, or lack of rigor.

Oh darn, I couldn’t wait. Here’s Mistral with the new megaprompt:

| Criterion | Assessment (YES/NO) | Summary | Rationale |
| --- | --- | --- | --- |
| Facts | NO | The claim that a recipe takes “10 minutes” is subjective and varies by individual. | No external testing or standard measure proves this timeframe for all users, as cooking times can be influenced by various factors such as skill level, equipment, and ingredients. |
| Source | NO | The source is a personal blog (Women of Today) with no professional culinary authority. | No history of verified, peer-reviewed culinary expertise or extensive experience in food science to validate the recipe’s time and preparation claims. |
| Method | NO | The claim is based on anecdotal reporting without empirical testing or expert review of cooking times. | There is no information provided about controlled tests, peer-reviewed research, or independent verification to confirm the recipe’s preparation time and methodology. |

Final Verdict: ❌ 3/3 Fail → Confirmed Propaganda or Misinformation

The given claim fails The Orwell Test because it lacks reliable facts (e.g., verifiable cooking times), a credible source, and an acceptable methodology for factual reporting. Users should exercise caution when considering this information as the timeframe provided is not universally applicable, and there’s no evidence of professional expertise to back up the claim.
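
Since the megaprompt asks for a markdown table, a reply like this can be checked mechanically when comparing models. Here’s a small Python sketch that pulls the YES/NO column out of such a table and counts the failures. It assumes the model really does return a pipe-delimited table with the criterion in the first column and the assessment in the second, which, as the Ollama run showed, isn’t guaranteed. The function names are hypothetical.

```python
def parse_assessments(markdown_table: str) -> dict[str, str]:
    """Return {criterion: "YES" or "NO"} from a pipe-delimited markdown table."""
    results = {}
    for line in markdown_table.strip().splitlines():
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        criterion, assessment = cells[0], cells[1].upper()
        # Header and separator rows fall through because their second
        # cell is neither YES nor NO.
        if assessment in ("YES", "NO"):
            results[criterion] = assessment
    return results


def count_failures(results: dict[str, str]) -> int:
    """Count how many criteria were assessed as NO."""
    return sum(1 for value in results.values() if value == "NO")


if __name__ == "__main__":
    reply = """
| Criterion | Assessment (YES/NO) | Summary | Rationale |
| --- | --- | --- | --- |
| Facts | NO | Subjective timing claim. | No external testing. |
| Source | NO | Personal blog. | No culinary authority. |
| Method | NO | Anecdotal. | No controlled tests. |
"""
    parsed = parse_assessments(reply)
    print(parsed, count_failures(parsed))
```

A cell that itself contains a pipe character would throw this off, so treat it as a quick check and not a real markdown parser.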

