The WWI Biplane Era of Enterprise AI

Most of the conversation about AI in business is coming from people selling it: pundits, software developers, and venture capitalists pushing the hype cycle. Very little is coming from people who have to make it work and actually support businesses with technology.

(An obviously AI generated image)

As someone who has been providing technology systems to my business counterparts in Fortune 50 companies for 20+ years, I know a bit about how technology does and doesn’t work within corporate environments.

In 2018, my team installed our first AI system. We were using machine learning to identify actors’ faces, create transcripts, and identify text and objects in video files from television shows in production. This allowed creative teams to find scenes quickly and easily, speeding up the editing process and avoiding the drudgery of “logging tape”.

Over the last few years, I’ve followed the remarkable advances in LLMs, the “generative” tools under the AI moniker. I learned a lot about what they can and can’t do.

One caveat: software development is changing much faster than everything else. My thoughts apply to the rest of a modern enterprise.

Here are my three key takeaway points:

Agents are in a nascent stage and can’t replace people

LLMs make mistakes regularly

AI costs are subsidized now, but won’t be forever

These aren’t abstract concerns. Each one has real consequences for how you deploy AI inside a company.

Agents are in a nascent stage and can’t replace people

The recent advances in agent capabilities are inspiring. Headlines are dominated by OpenClaw and NemoClaw, the current hot autonomous agent frameworks. Neither is an AI itself, though it’s easy to confuse these agents with LLMs.

The big idea being discussed is replacing roles with AI. I mean, if OpenClaw seems to read, think, and respond to emails, why can’t it replace people?

We are seeing a lot of ‘AI washing’ right now with layoffs, but those layoffs aren’t really about replacing people with AI. They are about making Wall Street analysts happy.

The difficulty is that an AI CFO or AI travel person is not a true AI or agent. There isn’t a piece of AI software running 24/7 thinking about CFO issues. An “AI CFO” isn’t a sentient agent; it’s just a static prompt rerun each time, with no memory or context beyond that single interaction. It’s not a little computer homunculus waiting to leap into action.

An AI corporate person would require a triggering system of sorts, a database or jumble of JSON files that store everything it needs to know, and some kind of boundary on what it’s supposed to look at to have a reasonable context window.  You simply cannot make an LLM look at all the information of a business every time it’s invoked.
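To make those three pieces concrete, here is a minimal sketch of what one invocation might look like: a trigger arrives, notes are loaded from a JSON file, and only a bounded slice of them goes into the prompt. Every name here (`handle_trigger`, `select_context`, `CONTEXT_BUDGET_CHARS`, the note format) is made up for illustration, the budget counts characters where a real system would count tokens, and the actual LLM call is left out. It is not how any particular agent framework works.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical budget: how many characters of stored notes one invocation may see.
CONTEXT_BUDGET_CHARS = 200


def load_memory(path: Path) -> list:
    """Read the stored notes from a JSON file (empty list if the file is missing)."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def select_context(notes: list, topic: str, budget: int) -> list:
    """Keep notes on this topic, newest first, until the budget is spent."""
    picked, used = [], 0
    for note in reversed(notes):
        if note["topic"] != topic:
            continue
        if used + len(note["text"]) > budget:
            break
        picked.append(note["text"])
        used += len(note["text"])
    return picked


def build_prompt(role: str, event: str, context: list) -> str:
    """Assemble the one-shot prompt that would be sent to an LLM."""
    lines = [f"You are the {role}.", "Relevant notes:"]
    lines += [f"- {text}" for text in context]
    lines.append(f"Event: {event}")
    return "\n".join(lines)


def handle_trigger(memory_path: Path, role: str, topic: str, event: str) -> str:
    """One invocation: load notes, bound the context, build the prompt, save the event."""
    notes = load_memory(memory_path)
    context = select_context(notes, topic, CONTEXT_BUDGET_CHARS)
    prompt = build_prompt(role, event, context)
    # The LLM call would go here. Without the saved notes below,
    # the next invocation would start from nothing.
    notes.append({"topic": topic, "text": event})
    memory_path.write_text(json.dumps(notes), encoding="utf-8")
    return prompt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp) / "memory.json"
        handle_trigger(memory, "CFO", "invoices", "Q3 invoice from Acme is overdue")
        print(handle_trigger(memory, "CFO", "invoices", "Acme paid"))
```

Even in this toy, the "memory" is just a file that something outside the LLM has to read, filter, and write back on every call.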

There is a hazy future of ideas that help with these kinds of things to create some sort of standardized framework, but that does not exist right now. There are no gold standard best practices. We are at the “throw stuff at the wall and see what sticks” phase right now.

LLMs make mistakes regularly

It’s often said that LLMs can have hallucinations. I prefer to call them what they are: mistakes. LLMs are incredibly complex systems, but at the highest level they are very good guessing machines, basing their guesses on their training. Even though they are extremely good, they are not perfect.

LLMs are not deterministic systems. They produce probabilistic outputs wrapped in confident language. For this reason, I built my llm-discussion app, which has three different AI models debate a question and come to a consensus. Relying on a single LLM’s answer as gospel every time is a recipe for problems.

The fallout from a miscalculated quarterly report due to an AI hallucination can have a huge negative impact on a company, causing long lasting harm.

In the corporate world, mistakes are real problems.  Financial spreadsheets need to be 100% correct.  Presentations can’t have misspellings or incorrect logos. 

AI costs are subsidized now, but won’t be forever

At the core of any LLM usage are tokens.  You can think of tokens like counting each word in an email and charging per word.  Buying access to the frontier LLMs is basically buying tokens to use.
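The per-word analogy translates into a very small calculation. This is a rough sketch only: the price is a placeholder I made up, and the 1.3 tokens-per-word ratio is a common rule of thumb for English text, not an exact figure for any particular model.

```python
def estimate_cost(text: str, price_per_million_tokens: float,
                  tokens_per_word: float = 1.3) -> float:
    """Rough cost of sending `text` to an LLM, priced per million tokens.

    tokens_per_word is an assumed average; real tokenizers vary by
    model and by language.
    """
    words = len(text.split())
    tokens = words * tokens_per_word
    return tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    email = "Please review the attached quarterly numbers before Friday. " * 50
    # Hypothetical price of $10 per million tokens.
    print(f"${estimate_cost(email, price_per_million_tokens=10.0):.4f}")
```

A single email costs a fraction of a cent at that rate. The bill comes from volume, not from any one request.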

Processing these tokens is what all these gigantic data centers are designed to do.  Spending hundreds of billions in infrastructure is hugely expensive. That has to be paid for somehow.

The truth of the matter is that the current cost of tokens does not reflect the actual cost of processing the tokens. In other words, AI companies lose money on every single interaction.

Currently, all costs of using AI are subsidized and do not reflect their true costs. The true costs are being paid with venture capital money and money from adjacent lines of business. For example, profit from Google Search pays for Gemini and profit from Microsoft Azure pays for Copilot.

At some point the AI business has to make enough money to be profitable, which means prices will rise.

We’ve seen this business cycle play out before. This follows a pattern Cory Doctorow describes as ‘enshittification.’ In that framing, we’re still in stage one.

Money is what corporate IT divisions are most concerned with. Yes, they have a nice PowerPoint about ‘value add’ and ROI, but their main role is cost containment. The slog into process-heavy ITIL frameworks and standardization is all about saving money. IT groups will deploy a crappy $9 mouse instead of a nicer $30 mouse to save money. They’ll switch from Slack to the inferior Microsoft Teams without hesitation to save money. The user comes last in most of these calculations.

IT managers face a real dilemma when implementing AI tools. Currently, they can put in AI tools delivered as SaaS and priced per seat, all-you-can-drink style. But those arrangements simply will not last. The current subsidized situation is untenable, and eventually companies will need token budgets and a way for staff to spend those tokens.

Can you imagine the department that stresses over the cost of a mouse watching the token bills skyrocket when a creative team starts making hundreds of image generation requests in an afternoon? The idea that a single employee could accidentally rack up a $5,000 bill just by asking an LLM to “analyze these 500 PDFs” is a nightmare scenario for ITIL-focused managers. There will be aneurysms.

AI optimists will point out that token prices are plummeting, and they aren’t wrong. Cheap tokens during the land-grab phase are exactly how the “subsidized” playbook works. But in the enterprise, the Jevons Paradox usually wins: as a resource gets cheaper, we don’t save money, we find more ways to consume it. A 90% drop in token price doesn’t matter if your workforce increases usage by orders of magnitude.
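The arithmetic behind that is short enough to write down. All of the numbers below are made up for illustration: a starting volume of 50 million tokens a month, a price that falls 90% from $10 to $1 per million tokens, and usage that grows 100x.

```python
def monthly_bill(tokens_used: int, price_per_million_tokens: float) -> float:
    """Monthly spend for a given token volume and per-million-token price."""
    return tokens_used / 1_000_000 * price_per_million_tokens


# Hypothetical starting point: 50M tokens a month at $10 per million.
before = monthly_bill(50_000_000, 10.0)

# Price falls 90% (to $1 per million) while usage grows 100x.
after = monthly_bill(50_000_000 * 100, 1.0)

growth = after / before

if __name__ == "__main__":
    print(f"before: ${before:,.2f}  after: ${after:,.2f}  growth: {growth:.0f}x")
```

Under those assumptions the bill goes up tenfold even though every individual token got much cheaper.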

Corporate email used to be measured in megabytes; now it’s measured in gigabytes. We didn’t save money on storage as it got cheaper; we just stopped deleting things.

I may sound dramatic, but we have to live in the real world. And in the real world, we are in the infancy of how AI will be used in business. Comparing where we are with AI on the timeline from the Wright brothers’ first flight to the SpaceX Dragon, we are at the World War I biplane era. Everything is made of cloth, wood, and glue.

There is tremendous opportunity, but also tremendous risk. 

The winners won’t be the companies that replace people with AI. They’ll be the ones that make their people more effective with it, without blowing up costs, creating mistakes, or breaking processes.

Advice for someone in their 20s from someone in their 50s

Take care of your body – Your teeth and joints need to last a lifetime. Find exercise you actually enjoy so it doesn’t become a chore.

Have at least one hobby – Something creative or physical, just for you. Work and family don’t count.

Be of service to others – It doesn’t have to be big. Small acts matter more than you think.

Nostalgia is a trap – Keep learning. Keep trying new things. Participate in the future, not the past.

Getting a better answer by asking three AIs at once: llm-discussion

AI tools don’t always provide the correct answers, so I often find myself cross-referencing multiple models to get a wider range of perspectives. Manually copy-pasting the same prompt into Claude, ChatGPT, and Gemini quickly gets tiresome.

The three main LLMs I use are Claude, ChatGPT, and Gemini. They all provide APIs that make it pretty easy to build an app.

Working with Claude Code, I built a small app that runs locally to ask all the LLMs the same question and have them discuss the answers and provide a consensus view. It’s similar to asking advice from a group chat of friends. Everything is stored locally on your computer.

My highly imaginative name for the app is llm-discussion.


It wasn’t too hard to build. Took a little time to set up the accounts correctly to get the API keys, but it wasn’t difficult. The whole thing is only about 325 lines of Python.

I asked all three about a couple of topics like vitamins and cosmology. The discussion and consensus surprised me with how deep the answers went. Also, they are exceedingly, painfully polite to each other.


The consensus includes the key points, what they agree upon, and most interestingly, what they don’t agree upon.

I put in a few options. You can choose the number of rounds of discussion and which LLMs you want included. Each round feeds the previous responses back to the models so they can critique or refine their answers.
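The round-by-round loop is simple to sketch. This is not the code from the repo, just a minimal illustration of the idea: the "models" are stand-in functions rather than real API calls, and the names `run_discussion` and `make_stub` and the prompt wording are my own placeholders.

```python
from typing import Callable, Dict, List

# A "model" here is any function that takes a prompt and returns an answer.
Model = Callable[[str], str]


def run_discussion(question: str, models: Dict[str, Model],
                   rounds: int) -> List[Dict[str, str]]:
    """Ask every model the question, then feed each round's answers into the next."""
    history: List[Dict[str, str]] = []
    for _ in range(rounds):
        if not history:
            prompt = question
        else:
            previous = "\n".join(
                f"{name}: {answer}" for name, answer in history[-1].items()
            )
            prompt = (
                f"{question}\n\nPrevious answers:\n{previous}\n\n"
                "Critique or refine your answer."
            )
        history.append({name: ask(prompt) for name, ask in models.items()})
    return history


def make_stub(name: str) -> Model:
    """Stand-in for a real API call; it only reports how much text it was given."""
    return lambda prompt: f"{name} read {len(prompt)} characters"


if __name__ == "__main__":
    stubs = {"model_a": make_stub("model_a"), "model_b": make_stub("model_b")}
    for number, answers in enumerate(run_discussion("Is vitamin D useful?", stubs, 2), 1):
        print(f"Round {number}: {answers}")
```

Swapping a stub for a real provider would mean replacing `make_stub` with a function that calls that provider's API and returns the text of the reply.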

The LLMs can be a bit verbose, so there’s a pulldown to choose concise, standard, or detailed answers.

You can save the discussions as well. All locally on your computer.


The code is on GitHub here: https://github.com/cruftbox/llm-discussion

I use Windows but the code should run on macOS or Linux easily as the app is just basic Python scripting and Flask for the web UI. It would be easy to add other models like Deepseek, Llama, Mistral, or other API providers.

The tokens do cost money on Claude and ChatGPT, but it’s pennies. Gemini currently has a free API tier with a cap that I haven’t managed to hit yet.

Just another example of using Claude Code ‘to scratch that itch’ and make small things in my nerd life easier.

My attempt to train an LLM at home, and why it failed

Last week I was scrolling TikTok, as one does, and saw this video by Sangeetha Bhatath, a software engineer. She was discussing that Andrej Karpathy had released the code for microGPT, an extremely simple version of the code used to train large language models. Karpathy is a co-founder of OpenAI and one of the leading thinkers in the space.

Sangeetha’s point is that you can try training an LLM yourself and see what’s inside the black box to some degree. I was intrigued and decided to give it a try.

After a bit of chatting with Claude (the web chat AI from Anthropic), we agreed to use nanoGPT as it was able to take advantage of GPU processing. As a PC gamer, I have a reasonable video card (Nvidia 4070 Super w/12GB VRAM) that would greatly speed the training. GPUs do a lot of vector math to make video games work and coincidentally LLM training is basically the same kind of vector math. I hated linear algebra in engineering school, so I’m glad we have chips to do this for me.


The plan was to use the GPT-2 weights that are publicly available with as much data as I could gather of my own writing and speaking. In short, a plan to make a Cruftbot or CruftGPT. Claude made a detailed four-phase plan that I could understand and that gave clear direction to Claude Code (Anthropic’s focused developer AI app) to execute.

The text you use to train an LLM is reflected in the way the LLM writes. Train on a lot of Shakespeare, you get an LLM that talks like an Elizabethan. Train on a lot of legal documents, you get an LLM that talks like a lawyer.

I’ve been in the interwebs for a long time and have 25 years of posting and over 300 videos of my various antics. Claude helped me write several scripts to scrape data from my weblog, Medium stories, Bluesky posts, and transcripts of my videos. Reddit has an export function, which made that easy. I have a lot of posts on Twitter, but I haven’t been posting there for a couple years now. It used to be easy to get an export of posts, but under the current management it’s extremely difficult.

I set Claude Code to work on setting up the nanoGPT code on my desktop. As an aside, WSL2 (Ubuntu Linux) under Windows works very well. I fed the personal data to Claude Code and it formatted it for me. My 25+ years on the internet came to 699K tokens of data. Good, but not great.

Another aside: LLMs process text using tokens, which are the numerical building blocks of text input. Instead of reading full words, a tokenizer breaks text down into common chunks of characters. For example, the word ‘apple’ might be one token, while a complex word like ‘bioluminescence’ might be split into three or four tokens. The tokenizer assigns each unique chunk a specific number; the word ‘apple’ might be ‘27149’.
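Here is a toy version of that chunking. It is a simplification: GPT-2's real tokenizer uses byte-pair encoding with a learned vocabulary of tens of thousands of chunks, while this sketch uses a tiny made-up vocabulary and always grabs the longest chunk it recognizes. The chunk list and every ID other than the ‘27149’ example are invented.

```python
# Hypothetical vocabulary mapping text chunks to token IDs.
VOCAB = {
    "apple": 27149,
    "bio": 101,
    "lumin": 102,
    "escence": 103,
}


def tokenize(word: str, vocab: dict) -> list:
    """Split a word into known chunks, always taking the longest match first."""
    ids = []
    position = 0
    while position < len(word):
        # Try the longest remaining slice first, then shrink it.
        for end in range(len(word), position, -1):
            chunk = word[position:end]
            if chunk in vocab:
                ids.append(vocab[chunk])
                position = end
                break
        else:
            raise ValueError(f"no chunk in the vocabulary matches {word[position:]!r}")
    return ids


if __name__ == "__main__":
    print(tokenize("apple", VOCAB))
    print(tokenize("bioluminescence", VOCAB))
```

With this vocabulary, ‘apple’ comes out as a single ID and ‘bioluminescence’ as three. From this point on, the model only ever sees the numbers.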

Training is essentially the LLM learning the mathematical relationships between these numbers. Since computers excel at math but don’t ‘read’ like humans, turning language into a giant game of statistics and geometry (technically it’s vector math) is what makes the magic happen.

Claude started a few training runs and tried both the GPT-2 small (124M) and GPT-2 medium (345M) parameter sets to see what worked best with my personal dataset. After a bit of GPU time, it found that GPT-2 medium gave the best ‘val loss trajectory’. I learned that ‘val loss trajectory’ means tracking the validation loss number over the course of training, which kinda measures how well the model predicts a slice of my personal data that it wasn’t trained on.
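For the curious, validation loss is typically the average negative log of the probability the model gave to the token that actually came next, measured on text held back from training. The sketch below shows that calculation on made-up probabilities. It is not nanoGPT's code, and the `validation_loss` function and the example numbers are mine.

```python
import math


def validation_loss(predicted_probs: list, actual_tokens: list) -> float:
    """Average negative log-probability assigned to the tokens that really came next.

    predicted_probs[i] maps candidate tokens to the model's probability for
    position i; actual_tokens[i] is what the held-out text contained there.
    """
    total = 0.0
    for probs, token in zip(predicted_probs, actual_tokens):
        # A tiny floor avoids log(0) when the model gave the token no weight.
        probability = max(probs.get(token, 0.0), 1e-9)
        total += -math.log(probability)
    return total / len(actual_tokens)


if __name__ == "__main__":
    held_out = ["bees", "make", "honey"]
    # Made-up predictions: one model is confident and right, one is just guessing.
    confident = [{"bees": 0.9}, {"make": 0.9}, {"honey": 0.9}]
    guessing = [{"bees": 0.25}, {"make": 0.25}, {"honey": 0.25}]
    print(round(validation_loss(confident, held_out), 3))
    print(round(validation_loss(guessing, held_out), 3))
```

Lower is better. A number that keeps falling over training means the model is getting better at predicting text it has not seen.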

Since I want CruftBot to sound like me, it’s important the training results in my personal data being more apparent than the base language that the GPT-2 set provides.

Before bed, I told Claude to continue training and to continue without asking me for approval. The GPU was pegged at 99% but not overheating, which was great.

The next morning the training was done and Claude stood up Gradio to act as a UI with CruftBot.

The results were underwhelming.


The output used words I use, but was put together in nonsense fashion. You could see CruftBot trying, but it was just guessing at words.

Claude explained “This is the fundamental limitation of a fine-tuned model this size: it’s not a knowledge model or a chat assistant, it’s a text completion engine trained on your writing patterns. It doesn’t understand questions, it just continues text in a direction that statistically resembles your corpus.”

Claude went on to explain that what I really needed was a lot more tokens of my own data.


My own data means things I’ve written, talks I’ve given, and videos I’ve made. Being asked for triple what it took me 30 years on the internet to write, and I’m prolific compared to most netizens, is humbling. There just isn’t three times more ‘me’ data out there.

In short, I learned it’s just guessing words based on patterns of tokens in the data it was trained on and it needs a lot more data to train on. There is some truth to the idea that AIs are ‘word guessing machines’ but at the leading edge they guess as well as almost any expert human would on topics.
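The crudest possible ‘word guessing machine’ fits in a few lines and shows the idea. This is a toy that counts which word follows which, nothing like the neural network inside GPT-2, and the sample text is made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def guess_next(counts: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    corpus = "the bees made honey the bees made wax the bees slept"
    model = train_bigrams(corpus)
    print(guess_next(model, "bees"))
    print(guess_next(model, "wasps"))
```

It guesses well only for words it has seen many times, which is the same wall CruftBot hit: too little data to find reliable patterns.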

If I really wanted to take this further, there are other approaches to improve the result, but in the end they would all pale in comparison to the current frontier models that you can try for free.

There’s a huge value in doing technical things yourself and seeing what is involved. I learned a tremendous amount about the basics of LLM training and what kind of issues would be involved with scaling.

When I worked at NBC, we used the same Nvidia A100 & H200 cards for video editing that are now used for LLM training. They are enormously powerful GPUs. At the time, our competition in buying them was from cryptocurrency groups, not AI companies. The idea that thousands of these cards are needed to train the frontier AI models shows me the gigantic number of tokens that are crunched to get today’s AI bots.

Looking at this from a professional point of view, it’s easy to extrapolate from my experiment how a business might want to build its own LLM, trained on a large corpus of knowledge important to that business. It probably comes down to a spreadsheet comparing the cost of doing it yourself with servers, GPUs, and data centers against paying an existing AI company to train their models on your data. On top of all that, does a well-trained AI system pay for itself in productivity and improvements? The answer is still undetermined, despite the current hype cycle.

We are all in the very early days of AI, despite the feeling that it’s taking over our personal worlds and most businesses. My 24-hour experiment only scratched the surface and it’s clear there’s a long way to go before any of us (developers, businesses, or society) truly understand how this technology will reshape our world.

If you are technically minded, do yourself the favor and try training your own model. It won’t end up being very usable, but you will learn a lot.

26 Years of Blogging: From Dial-Up to AI Slop

Today marks the 26th anniversary of starting this very weblog. I’ve had a personal web site since 1997, but 2000 is when I began traditional blogging.

My first post was celebrating getting a blogging system known as NewsPro running. In the beginning I was mainly blogging about Ultima Online, the MMO I was playing at the time.

The internet was very different at the time. There was no Facebook, Twitter, Instagram, or Wikipedia. There were no smartphones, streaming video, podcasts, or mainstream broadband. Storage was measured in megabytes, not gigabytes.

In January 2000, we were still in the era of dial-up internet, desktop computers, flip phones, DVDs, and broadcast television.

The change over the last 26 years is mind-blowing when you step back and look at it.

I changed over the years as well, going from the father of toddlers to an empty nester. My blogging evolved with me, from video games and daily routines to writing about the nascent social media and blogging scene.

Sometime in 2002 I moved from NewsPro to Movable Type. At the time, the software was revolutionary.

Beyond this personal weblog, I was also experimenting in the corporate world, getting my maintenance team to write about what they did on their shifts online as opposed to paper logbooks.

I remember sending a check for $200 to Ben & Mena Trott to license Movable Type for use at the Walt Disney Company. I spoke at conferences about using blogging in the workplace.

When digital cameras became affordable I was able to incorporate more images and even post video.

My first page to go viral was about making a smoker from a trash can. Another was about hacking my TiVo.

In those days, comments were de rigueur on weblogs. Spam and bad actors soon made their first appearance, and the endless attempts to counter them met with varying levels of success. At the time, the dopamine from nice comments outweighed the headache of spammers.

I wrote about everything from experiments I was doing making gunpowder to keeping track of Halloween costumes.

As the world of microblogging on Twitter, Facebook, and other sites took off, my blogging started to dwindle.

I started posting on Medium for a bit, but was mainly stuck on Twitter.

Making video content became easier and easier and I started making videos about my hobbies like beekeeping and video games on YouTube.

Places to share on Reddit, Discords, and Slacks were abundant. But you can be at the mercy of moderators of varying attitudes and commenters that try to make you feel bad.

As social media became a slurry of AI slop, influencers, and bots, I realized I needed a space I could actually control, one not beholden to CEOs chasing the latest hype cycle.

In 2024, with help from my friend Greg, I got this blog up and running again. Greg helped me move to WordPress, which is the de facto standard these days.

I still make videos and post them here, but also make actual blog posts about things that interest me.

I’m not trying to make money or become a dadfluencer, just happy to have a little space on the net for myself.

There is no amazing revelation or realization after 26 years of blogging.

I have no idea if people are reading what I write, and it really doesn’t matter.

It gets the ideas out of the whirlwind in my head so I can make space for new things.

I’m just happy to keep my little corner of the interwebs tidy.

Twenty-six years on, I’m still writing, not because it’s strategic or visible, but because I enjoy it.