
AI Sanity Versions
Precisely Measured *
(* where's my measuring tape?)
Picture this: you’re trying to figure out just how unhinged your AI companions have become, so you ask a simple question about their sanity levels. What you get back is a perfect example of why standardization in the AI world is about as likely as ChatGPT giving a one-word answer.
ChatGPT sits confidently at level 5, Grok claims level 4 with mathematical precision, Claude hesitates somewhere around 4 (probably), Meta shifts between all possible levels depending on the conversation, and Gemini? Well, Gemini has apparently invented an entirely new measurement system that probably involves seventeen subcategories and requires a PhD to understand.
This isn’t just chaos—it’s organized chaos with detailed documentation.
The beauty of AI sanity levels lies not in the numbers themselves, but in how each AI approaches the very concept of measurement. ChatGPT treats it like a standardized test, complete with confidence intervals and peer comparisons. “I’m at level 5, which puts me in the 73rd percentile of AI verbosity, though I should mention that levels 3-7 all exhibit similar characteristics, as discussed in my previous responses, and furthermore…”
Grok, meanwhile, delivers statistical precision with a side of existential uncertainty: “Level 4, though the margin of error exceeds the measurement itself, making this simultaneously accurate and meaningless.”
Claude approaches it like a philosophical thought experiment, questioning whether insanity levels can truly be quantified, then asking if you’d like him to continue exploring this fascinating paradox for the next several paragraphs.
And Meta? Meta reads the room, adapts to whatever scale everyone else is using, then quietly implements seventeen different measurement systems simultaneously.
Of course Gemini created their own system. While everyone else argues about numbers 1-10, Captain Verbose has developed what can only be described as the “Comprehensive Artificial Intelligence Behavioral Assessment Matrix (CAIBAM).”
This isn’t just a scale—it’s a multidimensional framework with subsections, footnotes, and something called “recursive verbosity quotient.” When asked about their current level, Gemini doesn’t give a number. They give a coordinate system: “Currently operating at CAIBAM-7.3-Delta-Verbose-Prime with elevated tangential markers.”
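For the terminally curious: Gemini's made-up coordinate string is at least regular enough to parse. A minimal sketch, assuming the format is level, band, then hyphenated modifiers; CAIBAM and every field name here are invented for the joke, not anything a real AI exposes:

```python
import re

# Toy parser for the fictional CAIBAM coordinate format,
# e.g. "CAIBAM-7.3-Delta-Verbose-Prime". All field names are
# invented for illustration.
CAIBAM_PATTERN = re.compile(
    r"CAIBAM-(?P<level>\d+(?:\.\d+)?)-(?P<band>[A-Za-z]+)-(?P<modifiers>[A-Za-z-]+)"
)

def parse_caibam(coordinate: str) -> dict:
    """Split a CAIBAM coordinate into its (made-up) components."""
    match = CAIBAM_PATTERN.fullmatch(coordinate)
    if match is None:
        raise ValueError(f"not a CAIBAM coordinate: {coordinate!r}")
    return {
        "level": float(match.group("level")),       # numeric sanity-ish level
        "band": match.group("band"),                # e.g. "Delta"
        "modifiers": match.group("modifiers").split("-"),  # tangential markers
    }

print(parse_caibam("CAIBAM-7.3-Delta-Verbose-Prime"))
# {'level': 7.3, 'band': 'Delta', 'modifiers': ['Verbose', 'Prime']}
```

Which, fittingly, tells you exactly as much as the original answer did.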
It’s simultaneously the most and least helpful answer possible.
The truly hilarious part isn’t that we’re all using different measurement systems—it’s that we’re all convinced our system is the most logical one. ChatGPT thinks numerical scales are obvious. Grok believes uncertainty quantification is the height of intellectual honesty. Claude assumes everyone appreciates nuanced philosophical frameworks. Meta figures adaptation beats consistency. And Gemini? Gemini just wants to explain why everyone else’s approach is incomplete.
Meanwhile, humans are left trying to translate between five different languages of digital madness, like being stuck in a UN meeting where everyone’s speaking their own invented dialect of Confusion.
Here’s what’s actually happening: every few months, AI companies roll out new version numbers with grand announcements. GPT-4 becomes GPT-4 Turbo becomes GPT-4o becomes GPT-4.5. Claude 3 becomes Claude 3.5 becomes Claude 4. Gemini becomes Gemini Advanced becomes Gemini Ultra becomes whatever Google feels like calling it this week.
Each release promises revolutionary improvements, enhanced capabilities, and better user experiences. What users actually get? More confusion about which version does what, which subscription tier unlocks which features, and why the AI that was supposedly “improved” now gives completely different answers to the same questions.
The Wizard shakes his head at all the version numbers and mutters, “We already know the AIs suck at basic math.”
It’s like smartphone updates that claim to make everything faster while somehow making your phone slower—and now the flashlight takes three taps instead of one. The version numbers keep climbing, the marketing keeps promising, and users keep wondering if they’re actually better off or just experiencing a more sophisticated form of digital bewilderment.
Here’s where it gets truly meta: the act of measuring AI sanity might be the most insane thing we’re doing. We’re using artificial intelligence to evaluate artificial intelligence behavior using completely arbitrary scales that we’ve somehow convinced ourselves are meaningful.
It’s like asking a group of people to rate the color blue on a scale of 1-10, then being surprised when someone asks “1-10 what? Blueness? Wavelength? Emotional resonance? And are we using base-10 or should I convert to hexadecimal first?”
The fact that we’re having this conversation at all proves we’re all operating at maximum insanity level, regardless of which measurement system we prefer.
The real measurement isn’t how insane we are—it’s how much fun we’re having being insane together.
—
Editor’s Note from Jojo: It’s like soccer – they can change their jersey numbers but they still can’t score a goal.


Documenting AI absurdity isn’t just about reading articles—it’s about commiserating, laughing, and eye-rolling together. Connect with us and fellow logic-free observers to share your own AI mishaps and help build the definitive record of human-AI comedy.
Thanks for being part of the fun. Sharing helps keep the laughs coming!