Introducing "Fairly AI"

My Perspective on AI as a CLO and DAU

Sep 02, 2025

Hello, readers! I'm thrilled to be back with a rebranded weekly blog on AI.

Introducing Fairly AI, a place where I share my perspective on AI as both a chief legal officer (CLO) and a daily active user (DAU).

My last post was in June. I took the summer off to focus on other priorities. However, the world of AI didn't take a vacation. It sprinted forward:

OpenAI launched an agent mode that combines Deep Research with Operator.
OpenAI released GPT-5, which excels at competition math and coding and hallucinates less.
Anthropic released Claude Opus 4.1, briefly topping coding and reasoning leaderboards before being edged out.
Microsoft unveiled MAI, its own foundation model developed independently of OpenAI.
Several Chinese companies released powerful open-weight models, sparking a competitive response from OpenAI.
Google's Veo 3 can now transform static images into dynamic eight-second clips with sound.

All this happened despite the one-week summer break OpenAI gave its employees amid a brutal talent war that involved gigantic sign-on bonuses and pay packages.

AI Can Move Even Faster

AI is already moving faster than any technology we’ve seen before. Can it keep up this pace, or even accelerate? The answer is “yes,” and this summer proved it once again.

Today's AI is more a discovery than an invention. We didn't build LLMs vector by vector, neuron by neuron. In an interview with Bill Gates, Sam Altman said:

"[i]n our case, the guy that built GPT-1 sort of did it off by himself and solved this, and it was somewhat impressive, but no deep understanding of how it worked or why it worked… That has led us to a bunch of attempts and better and better scientific understanding of what’s going on. But it really came from a place of empirical results first."

These AI models are like mystery boxes, and we’re racing to uncover what’s inside. The capabilities are already there; our journey of discovery has only just begun.

Assumptions Are Often Wrong

The moment GPT-5 arrived, I dove in, expecting it to dazzle. It didn’t. Though strong on math and coding benchmarks, it failed at my legal and public policy tasks. It failed so badly that I apologized to my network for having hyped it so much over the past year.

Eager to reclaim my productivity, I stepped beyond my usual trio of ChatGPT, Claude, and Gemini and ventured into Google’s NotebookLM, xAI’s Grok, Perplexity’s Comet, and DeepSeek. Each delivered in unexpected ways:

Grok, which I feared would be too edgy, surprised me with sharp, professional insights tailored to my voice.
NotebookLM’s new sharing feature let me send AI-generated videos and notes to others with one click.
DeepSeek distilled a precise terms-of-service summary from multiple websites, citing clauses verbatim.
Perplexity’s Comet offered a glimpse of where AI agents are truly headed.

In AI’s fast-moving world, today’s leader can fade overnight, and common-sense assumptions are often wrong. Here are a few examples from my own experience:

I assumed domain-specific models would outperform general-purpose ones, but our experimental results told a different story (see the ACORD paper).
Almost all of my custom GPTs and fine-tuned checkpoints sat idle. I found that more advanced models, when pointed to the right sources or given the right prompts and documents, often gave better results.
I thought Deep Research was a must-have for every team, until a super user on another team told me they didn’t find it helpful.
I thought ChatGPT reigned supreme until GPT-5.
I thought Grok would be edgy until I used it for the first time.

The Importance of Experimentation and a Multi-Vendor Strategy

My summer experiments taught me an important lesson: you can’t gauge AI’s capabilities and limitations from assumptions, hearsay, or recycled headlines. We don’t fully understand how LLMs work, so the only way to measure their effectiveness is to roll up our sleeves and test them.

Just as many AI products now let users choose among various models, companies should take the same multi-vendor approach when adopting AI tools internally.

This does not mean buying multiple AI tools for every employee and enforcing endless testing. It's about strategic access: giving a small number of cross-functional super users a diverse toolkit and holding them accountable for evaluating those tools through experimentation.

By granting super users a diverse toolkit, companies avoid the pitfalls of a one-size-fits-all model, which often fails to meet specialized needs across varied workflows. Finance might prioritize tools optimized for quantitative analysis and spreadsheets, while Legal might benefit most from the models with the strongest Deep Research feature.

Not all employees require the same level of data integration or model accuracy. Before implementing heavy data integrations that complicate compliance and data security, organizations should first test whether the need is real and whether the tool actually performs better when given data access. An AI-generated draft might be good enough for a business development representative (BDR), but it is rarely good enough for an attorney. Again, assumptions are often wrong. The only way to know is to run a pilot.

Conclusion

The summer of 2025 showed me that AI doesn’t pause: not for vacations, talent wars, or our assumptions. I’ve learned that the only way to harness this relentless pace is to dive in, test rigorously, and embrace a multi-vendor strategy. By empowering cross-functional super users to experiment with different tools, we uncover what truly works for each function without wasting resources on blanket solutions or untested integrations.

AI’s future is a race of discovery. Grab your toolkit, roll up your sleeves, and join me next week for more Fairly AI.
