I built a programming language using Claude Code

Over the course of four weeks in January and February, I built a new programming language using Claude Code. I named it Cutlet after my cat. It’s completely legal to do that. You can find the source code on GitHub, along with build instructions and example programs.

I’ve been using LLM-assisted programming since the original GitHub Copilot release in 2021, but so far I’ve limited my use of LLMs to generating boilerplate and making specific, targeted changes to my projects. While working on Cutlet, though, I allowed Claude to generate every single line of code. I didn’t even read any of the code. Instead, I built guardrails to make sure it worked correctly (more on that later).

I’m surprised by the results of this experiment. Cutlet exists today. It builds and runs on both macOS and Linux. It can execute real programs. There might be bugs hiding deep in its internals, but they’re probably no worse than the ones you’d find in any other four-week-old programming language.

I have Feelings™ about all of this and what it means for my profession, but I want to give you a tour of the language before I get up on my soapbox.

A tour of Cutlet

If you want to follow along, build the Cutlet interpreter from source and drop into a REPL using /path/to/cutlet repl.

Arrays and strings work as you’d expect in any dynamic language. Variables are declared with the my keyword.

cutlet> my cities  = ["Tokyo", "Paris", "New York", "London", "Sydney"]
=> [Tokyo, Paris, New York, London, Sydney]

Variable names can include dashes. Same syntax rules as Raku. The only type of number (so far) is a double.

cutlet> my temps-c = [28, 22, 31, 18, 15]
=> [28, 22, 31, 18, 15]

Here’s something cool: the @ meta-operator turns any regular binary operator into a vectorized operation over an array. In the next line, we’re multiplying every element of temps-c by 1.8, then adding 32 to each element of the resulting array.

cutlet> my temps-f = (temps-c @* 1.8) @+ 32
=> [82.4, 71.6, 87.8, 64.4, 59]

The @: operator is a zip operation. It zips two arrays into a map.

cutlet> my cities-to-temps = cities @: temps-f
=> {Tokyo: 82.4, Paris: 71.6, New York: 87.8, London: 64.4, Sydney: 59}
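
If it helps to see the semantics in more familiar terms, here’s a rough Python model of what the vectorized operators and @: do. This is a sketch of the observable behavior from the REPL session above, not Cutlet’s actual implementation.

```python
# Behavioral sketch of Cutlet's @ meta-operator and @: zip, in Python.
# Models the REPL session above; not Cutlet's C implementation.

def vectorized(op, xs, scalar):
    # xs @op scalar: apply the operator between each element and the scalar
    return [op(x, scalar) for x in xs]

def zip_to_map(keys, values):
    # keys @: values: zip two arrays into a map
    return dict(zip(keys, values))

cities = ["Tokyo", "Paris", "New York", "London", "Sydney"]
temps_c = [28, 22, 31, 18, 15]
temps_f = vectorized(lambda a, b: a + b,
                     vectorized(lambda a, b: a * b, temps_c, 1.8), 32)
cities_to_temps = zip_to_map(cities, temps_f)
# temps_f ≈ [82.4, 71.6, 87.8, 64.4, 59.0]
```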

Output text using the built-in say function. This function returns nothing, which is Cutlet’s version of null.

cutlet> say(cities-to-temps)
{Tokyo: 82.4, Paris: 71.6, New York: 87.8, London: 64.4, Sydney: 59}
=> nothing

The @ meta operator also works with comparisons.

cutlet> my greater-than-seventy-five = temps-f @> 75
=> [true, false, true, false, false]

Here’s another cool bit: you can index into an array using an array of booleans. This is a filter operation: it keeps the elements whose corresponding boolean is true and discards the rest.

cutlet> cities[greater-than-seventy-five]
=> [Tokyo, New York]

Here’s a shorter way of writing that.

cutlet> cities[temps-f @> 75]
=> [Tokyo, New York]
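
The boolean-mask filter has a simple model too. Here’s the same idea sketched in Python, purely as an illustration of the behavior:

```python
# Boolean-mask indexing: keep the elements of xs whose corresponding
# entry in mask is true. A sketch of the behavior, not Cutlet code.

def mask_index(xs, mask):
    return [x for x, keep in zip(xs, mask) if keep]

cities = ["Tokyo", "Paris", "New York", "London", "Sydney"]
temps_f = [82.4, 71.6, 87.8, 64.4, 59]
hot_cities = mask_index(cities, [t > 75 for t in temps_f])
# hot_cities → ["Tokyo", "New York"]
```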

Let’s print this out with a user-friendly message. The ++ operator concatenates strings and arrays. The str built-in turns things into strings.

cutlet> say("Pack light for: " ++ str(cities[temps-f @> 75]))
Pack light for: [Tokyo, New York]
=> nothing

The @ meta-operator in the prefix position acts as a reduce operation.

cutlet> my total-temp = @+ temps-c
=> 114

Let’s find the average temperature. @+ adds all the temperatures, and the len() built-in finds the length of the array.

cutlet> (@+ temps-c) / len(temps-c)
=> 22.8

Let’s print this out nicely, too.

cutlet> say("Average: " ++ str((@+ temps-c) / len(temps-c)) ++ "°C")
Average: 22.8°C
=> nothing

Functions are declared with fn. Everything in Cutlet is an expression, including functions and conditionals. The last value produced by an expression in a function becomes its return value.

cutlet> fn max(a, b) is
    ...   if a > b then a else b
    ... end
=> <fn max>

Your own functions can work with @ too. Let’s reduce the temperatures with our max function to find the hottest temperature.

cutlet> my hottest = @max temps-c
=> 31
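
Prefix @ behaves like a classic left fold over the array. A Python sketch of the observable behavior, covering both a built-in operator (@+) and a user function (@max):

```python
# Prefix @ as a reduce/left fold over an array, sketched in Python.
# Works the same whether the reducer is an operator or a user function.
from functools import reduce

def prefix_reduce(fn, xs):
    return reduce(fn, xs)

temps_c = [28, 22, 31, 18, 15]
total = prefix_reduce(lambda a, b: a + b, temps_c)                # @+  → 114
hottest = prefix_reduce(lambda a, b: a if a > b else b, temps_c)  # @max → 31
```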

Cutlet can do a lot more. It has all the usual features you’d expect from a dynamic language: loops, objects, prototypal inheritance, mixins, a mark-and-sweep garbage collector, and a friendly REPL. It doesn’t have file I/O yet, and some fundamental constructs like error handling are still missing, but we’re getting there!

See TUTORIAL.md in the git repository for the full documentation.

Why build this?

I’m a frontend engineer and (occasional) designer. I’ve tried using LLMs for building web applications, but I’ve always run into limitations.

In my experience, Claude and friends are scary good at writing complex business logic, but fare poorly on any task that requires visual design skills.

Turns out describing responsive layouts and animations in English is not easy. No amount of screenshots and wireframes can communicate fluid layouts and animations to an LLM. I’ve wasted hours fighting with Claude about layout issues it swore it had fixed, but which I could still see plainly with my leaky human eyes.

I’ve also found these tools to excel at producing cookie-cutter interfaces they’ve seen before in publicly available repositories, but they fall short when I want to do anything novel. I often work with clients building complex data visualizations for niche domains, and LLMs have comprehensively failed to produce useful outputs on these projects.

On the other hand, I’d seen people accomplish incredible things using LLMs in the last few months, and I wanted to replicate those experiments myself. But my previous experience with LLMs suggested that I had to pick my project carefully.

  • I didn’t want to solve a particularly novel problem, but I wanted the ability to sometimes steer the LLM in interesting directions.
  • I didn’t want to manually verify LLM-generated code. I wanted to give the LLM specifications, test cases, documentation, and sample outputs, and make it do all the difficult work of figuring out if it was doing the right thing.
  • I wanted to give the agent a strong feedback loop so it could run autonomously.
  • I don’t like MCPs. I didn’t want to deal with them. So anything that required connecting to a browser, taking screenshots, or talking to an API over the network was automatically disqualified.
  • I wanted to use a boring language with as few external dependencies as possible.

A small, dynamic programming language met all my requirements.

  • LLMs know how to build language implementations because their training data contains thousands of existing implementations, papers, and CS books. I was intrigued by the idea of creating a “remix” language by picking and choosing features I enjoy from various existing languages.
  • I could write a bunch of small deterministic programs along with their expected outputs to test the implementation. I could even get Claude to write them for me, giving me a potentially infinite number of test cases to verify that the language was working correctly.
  • Language implementations can be tested from the command line, with purely textual inputs and outputs. No need to take screenshots or videos or set up fragile MCPs. There’s no better feedback loop for an agent than “run make test and make check until there are no more errors”.
  • C is as boring as it gets, and there are a large number of language implementations built in C.
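
As a concrete sketch of that feedback loop, here’s a minimal golden-file runner: it executes every example program through the interpreter and diffs the output against a stored expected file. The ./cutlet path, the examples/ layout, and the .expected suffix are illustrative assumptions, not the project’s actual structure.

```python
#!/usr/bin/env python3
"""Minimal golden-file test runner: run each example program through the
interpreter and compare stdout against a stored expected-output file.
Paths and file suffixes here are illustrative assumptions."""
import subprocess
from pathlib import Path

def run_golden_tests(interpreter="./cutlet", examples_dir="examples"):
    failures = []
    for program in sorted(Path(examples_dir).glob("*.cut")):
        expected_file = program.with_suffix(".expected")
        if not expected_file.exists():
            continue  # no golden output recorded for this program yet
        result = subprocess.run([interpreter, str(program)],
                                capture_output=True, text=True, timeout=30)
        if result.stdout != expected_file.read_text():
            failures.append(program.name)
    return failures

# Usage sketch: have CI exit nonzero if run_golden_tests() returns any names.
```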

Finally, this was also an experiment to figure out how far I could push agentic engineering. Could I compress six months of work into a few weeks? Could I build something that was beyond my own ability to build? What would my day-to-day work life look like if I went all-in on LLM-driven programming? I wanted to answer all these questions.

I went into this experiment with some skepticism. My previous attempts at building something entirely using Claude Code hadn’t worked out. But this attempt was not only successful, it produced results beyond what I’d imagined possible. I don’t hold the belief that all software in the future will be written by LLMs. But I do believe there is a large subset that can be partially or mostly outsourced to these new tools.

Building Cutlet taught me something important: using LLMs to produce code does not mean you forget everything you’ve learned about building software. Agentic engineering requires careful planning, skill, craftsmanship, and discipline, just as building any software worth building did before generative AI. The skills required to work with coding agents might look different from typing code line-by-line into an editor, but they’re still very much the same engineering skills we’ve been sharpening all our careers.

Four skills for agentic engineering

There is a lot of work involved in getting good output from LLMs. Agentic engineering does not mean dumping vague instructions into a chat box and harvesting the code that comes out.

I believe there are four main skills you have to learn today in order to work effectively with coding agents:

  • Understanding which problems can be solved effectively using LLMs, which ones need a human in the loop, and which ones should be handled entirely by humans.
  • Communicating your intent clearly and defining criteria for success.
  • Creating an environment in which the LLM can do its best work.
  • Monitoring and optimizing the agentic loop so the agent can work efficiently.

Understanding which problems can be solved effectively using LLMs

Models and harnesses are changing rapidly, so figuring out which problems LLMs are good at solving requires developing your intuition, talking to your peers, and keeping your ear to the ground.

However, if you don’t want to stay up-to-date with a rapidly-changing field—and I wouldn’t judge you for it, it’s crazy out there—here are two questions you can ask yourself to figure out if your problem is LLM-shaped:

  • For the problem you want to solve, is it possible to define and verify success criteria in an automated fashion?
  • Have other people solved this problem—or a similar one—before? In other words, is your problem likely to be in the training data for an LLM?

If the answer to either of those questions is “no”, throwing AI at the problem is unlikely to yield good results. If the answer to both of them is “yes”, then you might find success with agentic engineering.

The good news is that the cost of figuring this out is the price of a Claude Code subscription and one sacrificial lamb on your team willing to spend a month trying it out on your codebase.

Communicating intent

LLMs work with natural language, so learning to communicate your ideas using words has become crucial. If you can’t explain your ideas in writing to your co-workers, you can’t work effectively with coding agents.

You can get a lot out of Claude Code using simple, vague, overly general prompts. But when you do that, you’re outsourcing a lot of your thinking and decision-making to the robot. This is fine for throwaway projects, but you probably want to be more careful when you’re building something you will put into production and maintain for years.

You want to feed coding agents precisely written specifications that capture as much of your problem space as possible. While working on Cutlet, I spent most of my time writing, generating, reading, and correcting spec documents.

For me, this was new territory. I primarily work with early-stage startups, so for most of my career, I’ve treated my code as the spec. Writing formal specifications was an alien experience.

Thankfully, I could rely on Claude to help me write most of these specifications. I was only comfortable doing this because Cutlet was an experiment. On a project I wanted to stake my reputation on, I might take the agent out of the equation altogether and write the specs myself.

This was my general workflow while making any change to Cutlet:

  • First, I’d present the LLM with a new feature (e.g. loops) or refactor (e.g. moving from a tree-walking interpreter to a bytecode VM). Then I’d have a conversation with it about how the change would work in the context of Cutlet, how other languages implemented it, design considerations, ideas we could steal from interesting/niche languages, etc. Just a casual back-and-forth, the same way you might talk to a co-worker.
  • After I had a good handle on what the feature or change would look like, I’d ask the LLM to give me an implementation plan broken down into small steps.
  • I’d review the plan and go back and forth with the LLM to refine it. We’d explore various corner cases, footguns, gotchas, missing pieces, and improvements.
  • When I was happy with the plan, I’d ask the LLM to write it out to a file that would go into a plans/doing/ directory. Sometimes we’d end up with 3-4 plan files for a single feature. This was intentional. I needed the plans to be human-readable, and I needed each plan to be an atomic unit I could roll back if things didn’t work out. They also served as a history of the project’s evolution. You can find all the historical plan files in the Cutlet repository.
  • I’d read and review the generated plan file, go back and forth again with the LLM to make changes to it, and commit it when everything looked good.
  • Finally, I’d fire up a Docker container, run Claude with all permissions—including sudo access—and ask it to implement my plan.

This workflow front-loaded the cognitive effort of making any change to the language. All the thinking happened before a single line of code was written, which is something I almost never do. For me, programming involves organically discovering the shape of a problem as I’m working on it. However, I’ve found that working that way with LLMs is difficult. They’re great at making sweeping changes to your codebase, but terrible at quick, iterative, organic development workflows.

Maybe my workflow will evolve as inference gets faster and models become better, but until then, this waterfall-style model works best.

Creating an environment for the agent to do its best work

I find this to be the most interesting and fun part of working with coding agents. It’s a whole new class of problem to solve!

The core principle is this: coding agents are computer programs, and therefore have a limited view of the world they exist in. Their only window into the problem you’re trying to solve is the directory of code they can access. This doesn’t give them enough agency or information to be able to do a good job. So, to help them thrive, you must give them that agency and information in the form of tools they can use to reach out into the wider world.

What does this mean in practice? It looks different for different projects, but this is what I did for Cutlet:

  • Comprehensive test suite. My project instructions told Claude to write tests and make sure they failed before writing any new code. I also asked it to run tests after making significant code changes or merging any branches. Armed with a constantly growing test suite, Claude was able to quickly identify and fix any regressions it introduced into the codebase. The tests also served as documentation and specification.
  • Sample inputs and outputs. These were my integration tests. I added a number of example programs to the Cutlet repository—most of them written by Claude itself—that not only serve as documentation for humans, but also as an end-to-end test suite. The project instructions told Claude to run all of them and verify their output after every code change.
  • Linters, formatters, and static analysis tools. Cutlet uses clang-tidy and clang-format to ensure a baseline of code quality. Just like with tests, the project instructions asked the LLM to run these tools after every major code change. I noticed that clang-tidy would often produce diagnostics that would force Claude to rewrite parts of the code. If I had access to some of the more expensive static analysis tools (such as Coverity), I would have added them to my development process too.
  • Memory safety tools. I asked Claude to create a make test-sanitize target that rebuilt the entire project and test suite with ASan and UBSan enabled (with LSan riding along via ASan), then ran every test under the instrumented build. The project instructions included running this check at the end of implementing a plan. This caught memory errors—use-after-free, buffer overflows, undefined behavior—that neither the tests nor the linter could find. Running these tests took time and greatly slowed down the agent, but they caught even more issues than clang-tidy.
  • Symbol indexes. The agent had access to ctags and cscope for navigating the source code. I don’t know how useful this was, because I rarely saw it use them. Most of the time it would just grep the code for symbols. I might remove this in the future.
  • Runtime introspection tools. Early in the project, I asked Claude to give Cutlet the ability to dump the token stream, AST, and bytecode for any piece of code to the standard output before executing it. This allowed the agent to quickly figure out if it had introduced errors into any part of the execution pipeline without having to navigate the source code or drop into a debugger.
  • Pipeline tracing. I asked Claude to write a Python script that fed a Cutlet program through the interpreter with debug flags to capture the full compilation pipeline: the token stream, the AST, and the bytecode disassembly. It then mapped each token type, AST node, and opcode back to the exact source locations in the parser, compiler, and VM where they were handled. When an agent needed to add a new language feature, it could run the tracer on an example of a similar existing feature to see precisely which files and functions to touch. I was very proud of this machinery, but I never saw Claude make much use of it either.
  • Running with every possible permission. I wanted the agent to work autonomously and have access to every debugging tool it might want to use. To do this, I ran it inside a Docker container with --dangerously-skip-permissions enabled and full sudo access. I believe this is the only practical way to use coding agents on large projects. Answering permission prompts is cognitively taxing when you have five agents working in parallel, and restricting their ability to do whatever they want makes them less effective at their job. We will need to figure out all sorts of safety issues that arise when you give LLMs the ability to take full control of a system, but on this project, I was willing to accept the risks that come with YOLO mode.
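
The core of the pipeline tracer can be sketched as a pure mapping step: pull opcode names out of a bytecode dump, then find where each one is handled in the VM source. The OP_ naming convention and the dump format here are my assumptions for illustration, not Cutlet’s actual formats.

```python
import re

def map_opcodes_to_handlers(disassembly, vm_source):
    """For every OP_* name in a bytecode dump, list the line numbers in the
    VM source that mention it (e.g. its case label in the dispatch loop).
    The OP_ prefix and dump format are illustrative assumptions."""
    opcodes = set(re.findall(r"\bOP_[A-Z_]+\b", disassembly))
    locations = {op: [] for op in opcodes}
    for lineno, line in enumerate(vm_source.splitlines(), start=1):
        for op in opcodes:
            if op in line:
                locations[op].append(lineno)
    return locations
```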

All these tools and abilities helped ensure that any update to the code resulted in a project that at least compiled and ran. But more importantly, they increased the information and agency Claude had access to, making it more effective at discovering and debugging problems without my intervention. If I keep working on this project, my main focus will be to give my agents even more insight into the artifact they are building, even more debugging tools, even more freedom, and even more access to useful information.

You will want to come up with your own tooling that works for your specific project. If you’re building a Django app, you might want to give the agent access to a staging database. If you’re building a React app, you might want to give it access to a headless browser. There’s no single answer that works for every project, and I bet people are going to come up with some very interesting tools that allow LLMs to observe the results of their work in the real world.

Optimizing the agentic loop

Coding agents can sometimes be inefficient in how they use the tools you give them.

For example, while working on this project, sometimes Claude would run a command, decide its output was too long to fit into the context window, and run it again with the output piped to head -n 10. Other times it would run make check, forget to grep the output for errors, and run it a second time to capture the output. This would result in the same expensive checks running multiple times in the course of making a single edit. These mistakes slowed down the agentic loop significantly.

I could fix some of these performance bottlenecks by editing CLAUDE.md or changing the output of a custom script. But there were some issues that required more effort to discover and fix.

I quickly got into the habit of observing the agent at work, noticing sequences of commands that the agent repeated over and over again, and turning them into scripts for the agent to call instead. Many of the scripts in Cutlet’s scripts directory came about this way.
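
As an illustration, here’s the shape of such a wrapper script: run an expensive check once and emit only the interesting lines, so the agent never re-runs the command just to grep its output. The make check target and the error patterns are assumptions for the example, not Cutlet’s actual scripts.

```python
#!/usr/bin/env python3
"""Run one expensive check and print only the lines that matter, so an
agent gets a short, pre-filtered result in a single tool call.
The command and error patterns below are illustrative assumptions."""
import subprocess

def run_and_filter(cmd, patterns=("error", "warning", "FAILED")):
    result = subprocess.run(cmd, capture_output=True, text=True)
    combined = result.stdout + result.stderr
    hits = [line for line in combined.splitlines()
            if any(p in line for p in patterns)]
    return result.returncode, hits

# Usage sketch: code, hits = run_and_filter(["make", "check"])
```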

This was very manual, very not-fun work. I’m hoping this becomes more automated as time goes on. Maybe a future version of Claude Code could review its own tool calling outputs and suggest scripts you could write for it?

Of course, the most fruitful optimization was to run Claude inside Docker with --dangerously-skip-permissions and sudo access. By doing this, I took myself out of the agentic loop. After a plan file had been produced, I didn’t want to hang around babysitting agents and saying Yes every time they wanted to run ls.

As Cutlet evolved, the infrastructure I built for Claude also evolved. Eventually, I captured many of the workflows Claude naturally followed as scripts, slash commands, or instructions in CLAUDE.md. I also learned where the agent stumbled most, and preempted those mistakes by giving it better instructions or scripts to run.

The infrastructure I built for Claude was also valuable for me, the human working on the project. The same scripts that helped Claude automate its work also helped me accomplish common tasks quickly.

As the project grows, this infrastructure will keep evolving along with it. Models change all the time. So do project requirements and workflows. I look at all this project infrastructure as an organic thing that will keep changing as long as the project is active.

Is software engineering as we know it dead?

Now that it’s possible for individual developers to accomplish so much in so little time, is software engineering as a career dead?

My answer to this question is nope, not at all. Software engineering skills are just as valuable today as they were before language models got good. If I hadn’t taken a compilers course in college and worked through Crafting Interpreters, I wouldn’t have been able to build Cutlet. I still had to make technical decisions that I could only make because I had (some) domain knowledge and experience.

Besides, I had to learn a bunch of new skills in order to effectively work on Cutlet. These new skills also required technical knowledge. A strange and new and different kind of technical knowledge, but technical knowledge nonetheless.

Before working on this project, I was worried about whether I’d have a job five years from now. But today I’m convinced that the world will continue to have a need for software engineers in the future. Our jobs will transform—and some people might not enjoy the new jobs anymore—but there will still be plenty of work for us to do. Maybe we’ll have even more work to do than before, since LLMs allow us to build a lot more software a lot faster.

And for those of us who never want to touch LLMs, there will be domains where LLMs struggle to make inroads. My friends who work on low-level multimedia systems have found less success using LLMs compared to those who build webapps. This is likely to be the case for many years to come. Eventually, those jobs will transform, too, but it will be a far slower shift.

Is it fair to take credit for Claude’s work?

Is it fair to say that I built Cutlet? After all, Claude did most of the work. What was my contribution here besides writing the prompts?

Moreover, this experiment only worked because Claude had access to multiple language runtimes and computer science books in its training data. Without the work done by hundreds of programmers, academics, and writers who have freely donated their work to the public, this project wouldn’t have been possible. So who really built Cutlet?

I don’t have a good answer to that. I’m comfortable taking credit for the care and feeding of the coding agent as it went about generating tokens, but I don’t feel a sense of ownership over the code itself.

I don’t consider this “my” work. It doesn’t feel right. Maybe my feelings will change in the future, but I don’t quite see how.

Because of my reservations about who this code really belongs to, I haven’t added a license to Cutlet’s GitHub repository. Cutlet belongs to the collective consciousness of every programming language designer, implementer, and educator to have released their work on the internet.

(Also, it’s worth noting that Cutlet almost certainly includes code from the Lua and Python interpreters. It referred to those languages all the time when we talked about language features. I’ve also seen a ton of code from Crafting Interpreters making its way into the codebase with my own two fleshy eyes.)

This wasn’t good for my mental health

I’d be remiss if I didn’t include a note on mental health in this already mammoth blog post.

It’s easy to get addicted to agentic engineering tools. While working on this project, I often found myself at my computer at midnight going “just one more prompt”, as if I were playing the world’s most obscure game of Civilization. I’m embarrassed to admit that I often had Claude Code churning away in the background when guests were over at my place, when I stepped into the shower, or when I went off to lunch. There’s a heady feeling that comes from accomplishing so much in so little time.

More addictive than that is the unpredictability and randomness inherent to these tools. If you throw a problem at Claude, you can never tell what it will come up with. It could one-shot a difficult problem you’ve been stuck on for weeks, or it could make a huge mess. Just like a slot machine, you never know what you’ll get. That creates a strong urge to try using it for everything all the time. And just like with slot machines, the house always wins.

These days, I set limits for how long and how often I’m allowed to use Claude. As LLMs become widely available, we as a society will have to figure out the best way to use them without destroying our mental health.

This is the part I’m not very optimistic about. We have comprehensively failed to regulate or limit our use of social media, and I’m willing to bet we’ll have a repeat of that scenario with LLMs.

What do we do with these new superpowers?

Now that we can produce large volumes of code very quickly, what can we do that we couldn’t do before?

This is another question I’m not equipped to answer fully at the moment.

That said, one area where I can see LLMs being immediately of use to me personally is the ability to experiment very quickly. It’s very easy for me to try out ten different features in Cutlet because I just have to spec them out and walk away from the computer. Failed experiments cost almost nothing. Even if I can’t use the code Claude generates, having working prototypes helps me validate ideas quickly and discard bad ones early.

I’ve also been able to radically reduce my dependency on third-party libraries in my JavaScript and Python projects. I often use LLMs to generate small utility functions that previously required pulling in dependencies from NPM or PyPI.

But honestly, these changes are small beans. I can’t predict the larger societal changes that will come about because of AI agents. All I can say is programming will look radically different in 2030 than it does in 2026.

What’s next for Cutlet?

This project was a proof of concept to see how far I could push Claude Code. I’m currently looking for a new contract as a frontend engineer, so I probably won’t have the time to keep working on Cutlet. I also have a few more ideas for pushing agentic programming further, so I’m likely to prioritize those over continuing work on Cutlet.

When the mood strikes me, I might still add small features to the language now and then. Since I’ve removed myself from the development loop, it doesn’t take much time or effort. I might even do Advent of Code using Cutlet in December!

Of course, if you work at Anthropic and want to give me money so I can keep running this experiment, I’m available for contract work for the next 8 months :)

For now, I’m closing the book on Cutlet and moving on to other projects (and cat).


Thanks to Shruti Sunderraman for proofreading this post. Also thanks to Cutlet the cat for walking across the keyboard and deleting all my work three times today.

I used a local LLM to analyze my journal entries

In 2025, I wrote 162 journal entries totaling 193,761 words.

In December, as the year came to a close and I found myself in a reflective mood, I wondered if I could use an LLM to comb through these entries and extract useful insights. I’d had good luck extracting structured data from web pages using Claude, so I knew this was a task LLMs were good at.

But there was a problem: I write about sensitive topics in my journal entries, and I don’t want to share them with the big LLM providers. Most of them have at least a thirty-day data retention policy, even if you call their models using their APIs, and that makes me uncomfortable. Worse, all of them have safety and abuse detection systems that get triggered if you talk about certain mental health issues. This can lead to account bans or human review of your conversations.

I didn’t want my account to get banned, and the very idea of a stranger across the world reading my journal mortifies me. So I decided to use a local LLM running on my MacBook for this experiment.

Writing the code was surprisingly easy. It took me a few evenings of work—and a lot of yelling at Claude Code—to build a pipeline of Python scripts that would extract structured JSON from my journal entries. I then turned that data into boring-but-serviceable visualizations.
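
The fiddliest part of a pipeline like this is coercing model output into valid, complete JSON: local models love to wrap answers in code fences or add chatter around them. Here’s a sketch of the parse-and-validate step such a pipeline needs. The field names are hypothetical, not the ones from my actual prompts.

```python
import json
import re

def parse_model_json(raw_output, required_keys):
    """Extract a JSON object from raw LLM output, tolerating code fences
    and surrounding chatter, and check that expected fields are present.
    Returns the dict, or None if the output is unusable.
    Field names passed in required_keys are up to the caller."""
    # Strip markdown code fences if the model wrapped its answer in them.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw_output.strip())
    # Fall back to the first {...} span if there's chatter around the JSON.
    if not text.lstrip().startswith("{"):
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if not match:
            return None
        text = match.group(0)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not required_keys.issubset(data):
        return None
    return data
```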

This was a fun side-project, but the data I extracted didn’t quite lead me to any new insights. That’s why I consider this a failed experiment.

The output of my pipeline only confirmed what I already knew about my year. Besides, I didn’t have the hardware to run the larger models, so some of the more interesting analyses I wanted to run were plagued with hallucinations.

Despite how it turned out, I’m writing about this experiment because I want to try it again in December 2026. This time, I’m hoping not to repeat my mistakes. Selfishly, I’m also hoping that somebody who knows how to use LLMs for data extraction tasks will find this article and suggest improvements to my workflow.

I’ve pushed my data extraction and visualization scripts to GitHub. It’s mostly LLM-generated slop, but it works. The most interesting and useful parts are probably the prompts.

Now let’s look at some graphs.

Everybody loves graphs

I ran 12 different analyses on my journal, but I’m only including the output from 6 of them here. Most of the others produced nonsensical results or were difficult to visualize.

For privacy, I’m not using any real names in these graphs.

Here’s how I divided time between my hobbies through the year:

A graph of how I divided time between my hobbies through the year 2025

Here are my most mentioned hobbies:

A bar chart of my most mentioned hobbies in the year 2025

This one is media I engaged with. There isn’t a lot of data for this one:

A bar chart of the media I engaged with most in the year 2025

How many mental health issues I complained about each day across the year:

A GitHub style graph of the number of mental health issues I complained about each day across the year 2025

How many physical health issues I complained about each day across the year:

A GitHub style graph of the number of physical health issues I complained about each day across the year 2025

The big events of 2025:

A simple image listing all the notable events in my life in 2025

The communities I spent most of my time with:

A bar chart of the communities I mentioned most in my journal in the year 2025

Top mentioned people throughout the year:

A bar chart of the top people mentioned in my journal entry in the year 2025

Tech stack

I ran all these analyses on my MacBook Pro with an M4 Pro and 48GB RAM. This hardware can just barely manage to run some of the more useful open-weights models, as long as I don’t run anything else.

For running the models, I used Apple’s mlx-lm package.

Picking a model took me longer than putting together the data extraction scripts. People on /r/LocalLlama had a lot of strong opinions, but there was no clear “best” model when I ran this experiment. I just had to try out a bunch of them and evaluate their outputs myself.

If I had more time and faster hardware, I might have looked into building a small-scale LLM eval for this task. But for this scenario, I picked a few popular models, ran them on a subset of my journal entries, and picked one based on vibes.

This project finally gave me an excuse to learn all the technical terms around LLMs. What’s quantization? What does the number of parameters do? What does it mean when a model has instruct, coder, thinking, or A32 in its name? What is a reasoning model? What’s MoE? What are active parameters? This was fun, even if my knowledge will be obsolete in six months.

In the beginning, I ran all my scripts with Qwen 2.5 Instruct 32b at 8-bit quantization as the model. This fit in my RAM with just enough room left over for a browser, text editor, and terminal.

But Qwen 2.5 didn’t produce the best output and hallucinated quite a bit, so I ran my final analyses using Llama-3.3 70B Instruct at 3-bit quantization. This could just about fit in my RAM if I quit every other app and increased the amount of GPU RAM a process was allowed to use.

While quickly iterating on my Python code, I used a tiny model: Qwen 3 4b Instruct quantized to 4 bits.

Deciding what questions to ask

A major reason this experiment didn’t yield useful insights was that I didn’t know what questions to ask the LLM.

I couldn’t do a qualitative analysis of my writing—the kind of analysis a therapist might be able to do—because I’m not a trained psychologist. Even if I could figure out the right prompts, I wouldn’t want to do this kind of work with an LLM. The potential for harm is too great, and the cost of mistakes is too high.

With a few exceptions, I limited myself to extracting quantitative data only. From each journal entry, I extracted the following information:

  • List of things I was grateful for, if any
  • List of hobbies or side-projects mentioned
  • List of locations mentioned
  • List of media mentioned (including books, movies, games, or music)
  • A boolean answer to whether it was a good or bad day for my mental health
  • List of mental health issues mentioned, if any
  • A boolean answer to whether it was a good or bad day for my physical health
  • List of physical health issues mentioned, if any
  • List of things I was proud of, if any
  • List of social activities mentioned
  • Travel destinations mentioned, if any
  • List of friends, family members, or acquaintances mentioned
  • List of new people I met that day, if any

None of the models was as accurate as I had hoped at extracting this data. In many cases, I noticed hallucinations and examples from my system prompt leaking into the output, which I had to clean up afterwards. Qwen 2.5 was particularly susceptible to this.

Some of the analyses (e.g. list of new people I met) produced nonsensical results, but that wasn’t really the fault of the models. They were all operating on a single journal entry at a time, so they had no sense of the larger context of my life.

Running the analysis

I couldn’t run all my journal entries through the LLM at once. I didn’t have that kind of RAM and the models didn’t have that kind of context window. I had to run the analysis one journal entry at a time. Even then, my computer choked on some of the larger entries, and I had to write my scripts in a way that I could run partial analyses or continue failed analyses.
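A minimal sketch of that resumable, entry-at-a-time loop. The file layout and the `run_model` stub are my assumptions for illustration, not the author’s actual scripts:

```python
# Sketch of a resumable, entry-at-a-time pipeline. Each journal entry gets
# its own output file, so an interrupted run can be continued by simply
# re-running the script. run_model() is a stub standing in for the LLM call.
import json
from pathlib import Path

def run_model(prompt: str, entry_text: str) -> dict:
    """Stub: the real version sends the prompt + entry to the local model."""
    return {"physical_health_issues": []}

def analyze_entries(entries_dir: Path, out_dir: Path, prompt: str) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for entry in sorted(entries_dir.glob("*.md")):
        out_file = out_dir / (entry.stem + ".json")
        if out_file.exists():  # already analyzed on a previous run: skip
            continue
        result = run_model(prompt, entry.read_text())
        out_file.write_text(json.dumps(result))
```

Because each entry’s result is written out immediately, a crash partway through costs only the current entry, not the whole day’s run.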

Trying to extract all the information listed above in one pass produced low-quality output. I had to split my analysis into multiple prompts and run them one at a time.

Surprisingly, none of the models I tried had an issue with the instruction “produce only valid JSON, produce no other output.” Even the really tiny models had no problems following it. Some of them occasionally threw in a Markdown fenced code block, but it was easy enough to strip using a regex.
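The cleanup can be as small as this. A sketch; the exact fence pattern the models emitted is an assumption:

```python
# Strip an optional Markdown code fence (```json ... ``` or bare ``` ... ```)
# from model output before JSON parsing. The ^ anchor means only a fence at
# the very start/end of the output is removed.
import json
import re

FENCE_RE = re.compile(r"^\s*```(?:json)?\s*|\s*```\s*$")

def parse_model_json(raw: str) -> dict:
    cleaned = FENCE_RE.sub("", raw.strip())
    return json.loads(cleaned)
```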

My prompts were divided into two parts:

  • A “core” prompt that was common across analyses
  • Task-specific prompts for each analysis

The task-specific prompts included detailed instructions and examples that made the structure of the JSON output clear. Every model followed the JSON schema mentioned in the prompt, and I rarely ever ran into JSON parsing issues.

But the one issue I never managed to fix was the examples from the prompts leaking into the extracted output. Every model insisted that I had “dinner with Sarah” several times last year, even though I don’t know anybody by that name. This name came from an example that formed part of one of my prompts. I just had to make sure the examples I used stood out—e.g., using names of people I didn’t know at all or movies I hadn’t watched—so I could filter them out using plain old Python code afterwards.
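The filtering itself is plain Python. A sketch, with a hypothetical sentinel list built from the example names used in the prompts:

```python
# Drop sentinel values that only exist in the prompt examples (like "Sarah")
# from the extracted lists. The sentinel set here is illustrative.
SENTINELS = {"sarah"}  # names that appear only in prompt examples

def drop_sentinels(items: list[str]) -> list[str]:
    return [x for x in items if x.lower() not in SENTINELS]
```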

Here’s what my core prompt looked like:

The user wants to reflect on all the notable events that happened to them in the year 2025. They have maintained a detailed journal that chronicles their life. Your job is to summarize and curate the text of their journal in order to surface the most important events.

You will be given a single journal entry wrapped in `<journal_entry>` tags along with further instructions. The instructions will ask you to extract some data from this entry. Only extract the information that is requested from you. Do not include any other text in your response. Only return valid JSON.

Further instructions follow.

To this core prompt, I appended task-specific prompts. Here’s the prompt for extracting health issues mentioned in an entry:

Extract physical health information **experienced by the author** in this journal entry.

1. Focus on the author: do not extract health issues relating to other people (friends, family, partners) mentioned in the text.
2. Symptoms & conditions: extract specific physical symptoms (e.g., "Sore throat", "Migraine", "Back pain") or diagnosed conditions (e.g., "Flu", "Covid"). Normalize informal descriptions to standard terms where possible (e.g., "my tummy hurts" -> "Stomach ache").
3. Exclusions:
   - Do not include general fleeting states like "tired", "hungry" unless they are described as severe (e.g. "Exhaustion").
   - Do not include physical activities (e.g., "went for a run") unless they resulted in an injury or pain.
   - Do not include purely mental health issues (e.g. "Anxiety", "Depression"), though physical symptoms of them (e.g. "Panic attack symptoms" like racing heart) can be included if explicitly physical.

## JSON structure

Return a single JSON object with this exact key:

- `physical_health_issues`: Array of strings (e.g., ["Back pain", "Stomach ache"]).

If no relevant issues are found, return an empty array.

## Example output

### Example 1: Issues found

```json
{
  "physical_health_issues": ["Sore throat", "Headache"]
}
```

### Example 2: No issues found

```json
{
  "physical_health_issues": []
}
```

You can find all the prompts in the GitHub repository.

The collected output from all the entries looked something like this:

{
  "physical_sick_days": [
    "2025-01-01.md",
    "2025-01-03.md",
    "2025-01-06.md",
    "2025-01-07.md",
    "2025-01-08.md",
    "2025-01-10.md",
    "2025-01-11.md",
    "2025-01-12.md",
    "2025-01-13.md",
    ...
  ],
  "physical_sickness_map": {
    "Allergies": [
      "2025-03-19.md"
    ],
    "Allergy": [
      "2025-04-08.md"
    ],
    "Anxiety": [
      "2025-04-02.md",
      "2025-08-23.md"
    ],
    "Back pain": [
      "2025-01-15.md",
      "2025-07-25.md"
    ],
    "Body feeling tired": [
      "2025-08-22.md"
    ],
    "Brain feeling tired": [
      "2025-08-22.md"
    ],
    "Brain fog": [
      "2025-02-07.md"
    ],
    "Burning eyes": [
      "2025-11-20.md"
    ],
    "Cat bite on the face": [
      "2025-04-24.md"
    ],
    "Chronic sinus issue": [
      "2025-08-22.md"
    ],
    "Clogged nose": [
      "2025-02-04.md"
    ],
    ...
  }
}

Dealing with synonyms

Since my model could only look at one journal entry at a time, it would sometimes refer to the same health issue, gratitude item, location, or travel destination using different synonyms. For example, “exhaustion” and “fatigue” should refer to the same health issue, but they would appear in the output as two different issues.

My first attempt at de-duplicating these synonyms was to keep a running tally of unique terms discovered during each analysis and append them to the end of the prompt for each subsequent entry. Something like this:

Below is a list of health issues already identified from previous entries. If the journal entry mentions something that is synonymous with or closely equivalent to an existing term, use the EXISTING term exactly. Only create a new term if nothing in the list is a reasonable match.

- Exhaustion
- Headache
- Heartburn

But this quickly led to some really strange hallucinations. I still don’t understand why. This list of terms wasn’t even that long, maybe 15-20 unique terms for each analysis.

My second attempt at solving this was a separate normalization pass for each analysis. After an analysis finished running, I extracted a unique list of terms from its output file and collected them into a prompt. Then I asked the LLM to produce a mapping to de-duplicate the terms. This is what the prompt looked like:

You are an expert data analyst assisting with the summarization of a personal journal.

You will be provided with a list of physical health-related terms (symptoms, illnesses, injuries) that were extracted from various journal entries.

Because the extraction was done entry-by-entry, there are inconsistent naming conventions (e.g., "tired" vs "tiredness", "cold" vs "flu-like symptoms"). Your task is to normalize these terms into a cleaner, consolidated set of categories.

# Instructions

1. **Analyze the input list:** Look for terms that represent the same underlying issue or very closely related issues.
2. **Determine canonical names:** For each group of synonyms/variants, choose the most descriptive and concise canonical name. (e.g., map "tired", "exhausted", "fatigue" -> "Fatigue").
3. **Map every term:** Every term in the input list MUST appear as a key in your output map. If a term is already good, map it to itself (or a capitalized version of itself).
4. **Output format:** Return a JSON object where keys are the *original terms* from the input list, and values are the *canonical terms*.

# Example

Input:

```json
["headache", "migraine", "bad headache", "tired", "exhaustion", "sore throat"]
```

Output:

```json
{
  "headache": "Headache",
  "migraine": "Migraine",
  "bad headache": "Headache",
  "tired": "Fatigue",
  "exhaustion": "Fatigue",
  "sore throat": "Sore Throat"
}
```

Now, process the following list of terms provided by the user. Return ONLY the JSON object.

There were better ways to do this than using an LLM. But you know what happens when all you have is a hammer? Yep, exactly. The normalization step was inefficient, but it did its job.
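Applying the LLM-produced mapping back to the collected output is straightforward Python. A sketch, assuming the term-to-entries shape of the `physical_sickness_map` example above:

```python
# Merge every original term's entry list under its canonical name, using the
# LLM-produced synonym map. Unmapped terms fall back to themselves.
def normalize_map(term_to_files: dict[str, list[str]],
                  canonical: dict[str, str]) -> dict[str, list[str]]:
    merged: dict[str, list[str]] = {}
    for term, files in term_to_files.items():
        key = canonical.get(term, term)
        merged.setdefault(key, []).extend(files)
    # De-duplicate and sort the entry lists for stable output
    return {k: sorted(set(v)) for k, v in merged.items()}
```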

This was the last piece of the puzzle. With all the extraction scripts and their normalization passes working correctly, I left my MacBook running the pipeline of scripts all day. I’ve never seen an M-series MacBook get this hot. I was worried that I’d damage my hardware somehow, but it all worked out fine.

Data visualization

There was nothing special about this step. I just decided on a list of visualizations for the data I’d extracted, then asked Claude to write some matplotlib code to generate them for me. Tweak, rinse, repeat until done.
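For flavor, here’s a minimal sketch of the kind of matplotlib code involved, with made-up data; the real scripts read the aggregated JSON files:

```python
# Render a "most mentioned" bar chart to a PNG. Uses the Agg backend so it
# works without a display. The data passed in here is illustrative.
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def plot_mentions(counts: dict[str, int], out_path: str) -> None:
    names = sorted(counts, key=counts.get, reverse=True)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(names, [counts[n] for n in names])
    ax.set_ylabel("Mentions")
    ax.set_title("Most mentioned hobbies")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```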

This was underwhelming

I’m underwhelmed by the results of this experiment. I didn’t learn anything from the output that I didn’t already know.

This was only partly because of LLM limitations. I believe I didn’t quite know what questions to ask in the first place. What was I hoping to discover? What kinds of patterns was I looking for? What was the goal of the experiment besides producing pretty graphs? I went into the project with a cool new piece of tech to try out, but skipped the important up-front human-powered thinking work required to extract good insights from data.

I neglected to sit down and design a set of initial questions I wanted to answer and assumptions I wanted to test before writing the code. Just goes to show that no amount of generative AI magic will produce good results unless you can define what success looks like. Maybe this year I’ll learn more about data analysis and visualization and run this experiment again in December to see if I can go any further.

I did learn one thing from all of this: if you have access to state-of-the-art language models and know the right set of questions to ask, you can process your unstructured data to find needles in some truly massive haystacks. This allows you to analyze datasets that would take human reviewers months to comb through. A great example is how the NYT monitors hundreds of podcasts every day using LLMs.

For now, I’m putting a pin in this experiment. Let’s try again in December.

The only correct recipe for making chai

All my friends have their own personal recipes for making chai. I love my friends, so it hurts me to say that they’re wrong. My friends are, unfortunately, wrong about chai. I’m still coming to terms with this upsetting fact, but I’ll live.

What follows is the only correct recipe for making chai.

The ingredients

The only correct choice of tea leaves is Tata Tea Gold. Keep it in an airtight jar. Shake it up a bit so there’s an even mix of smaller grains and whole tea leaves. The smaller grains make for a stronger chai and they tend to settle at the bottom, so take that into account when measuring.

You need full-cream milk for this recipe. Amul Gold is a good choice. I buy the tetrapacks because they survive in the fridge for longer, but the plastic bags work as well. According to the pack, Amul Gold has 6% fat. If you can’t find Amul Gold, try to find an equivalent milk.

For a basic chai, you only need tea leaves, water, sugar, and milk. But we don’t want to make a basic chai, do we? No. So we’re going to add some elaichi (green cardamom) and saunf (fennel).

Try to find fresh spices, if you can. I don’t have recommendations for specific brands here because most of them are fine.

I learned the hard way that you get two kinds of saunf in the supermarket: green and brown. Green saunf tastes sweet and fresh, almost like a dessert. The brown saunf has a stronger flavor but is also bitter. We want the green saunf.

Sometimes you find old elaichi at the store that’s gone a bit brown. Don’t buy that. Your elaichi should be green in color, just like the saunf.

The recipe

This recipe makes three cups of chai. Why three? Because that’s how much chai I drink every day. You can adjust this recipe to make more or fewer cups, as long as you keep all the ratios the same.

Dig out your mortar and pestle from the drawer it has been languishing in. Add six pods of elaichi—two for each cup. Add half a tablespoon of saunf. You can use a bit more of both these spices if you want a more flavorful chai. Grind the spices into a semi-powdery mix. You don’t have to turn it into a fine powder, just grind them enough so that the flavors come through.

Put two cups of water in a saucepan and add the spice mix. Put it on a high flame until boiling. When the water is boiling, reduce the flame to medium.

Add three dessert spoons full of tea leaves to the boiling water. A dessert spoon is slightly smaller than a tablespoon. If all you have is a tablespoon, try about 3/4 of a tablespoon of tea leaves for each cup. Then add the same amount of sugar. You can adjust the amount of sugar based on how sweet you want your chai, but if you don’t add enough sugar the flavors won’t come through.

Allow the mixture to boil on the stove for about 3-4 minutes. Then add a cup of milk. At this stage you should add a tiny bit of extra milk to account for the water evaporating, otherwise you won’t have three full cups of chai. About 1/5 of a cup should be enough, but I’ve been known to add a bit more to make the chai richer.

Stir the mixture a bit to ensure everything is properly mixed together, then allow it to sit on the stove until the milk boils over.

This next step is crucial. It will make or break your chai. I swear it’s not superstition.

When the milk boils over, turn the stove to simmer. Allow it to settle back down into the pan. Then turn it up to medium heat again until it boils over once more. Repeat one more time. The milk should boil over and settle down three times total.

Your chai is ready! Use a strainer to strain it into cups and enjoy.

Should you eat a Parle-G with your chai? Maybe a Rusk? I have strong opinions on this matter but I’m running out of time, so I’ll leave that decision up to you.

Write quickly, edit lightly, prefer rewrites, publish with flaws

Over two years of consistent writing and publishing, I’ve internalized a few lessons for producing satisfying—if not necessarily “good”—work:

  1. Write quickly
  2. Edit lightly
  3. Prefer rewriting to editing
  4. Publish with flaws

I covered similar ground previously in Writing without a plan. This post builds on the same idea.

Write quickly

If I want to see the shape of the idea I’m trying to communicate in my writing, I must get it down on paper as quickly as possible. This is similar to how painters lay down underdrawings on canvas before applying paint.

I can’t judge the quality of my idea unless I finish this underdrawing. Without this basic sketch to guide me, I might end up writing the wrong thing altogether. More than once, I’ve slaved away at a long blog post for days, only to realize that my core thesis was bunk. Writing quickly allows me to see the idea in its entirety before I waste time and energy refining it.

How do I define quickly? For blog posts like this one, I try to produce a first draft in about 45 minutes. For longer pieces, I take about the same time but work in broad strokes and make heavy use of placeholders.

Edit lightly

It’s easy to edit the life and vitality out of a piece by over-editing it. I’ve done it many times. I’m prone to spending hours upon hours polishing the same few paragraphs in a work, complicating my sentences by attaching a hundred sub-clauses, burying important ideas under mountains of caveats, turning direct writing into purple prose, and inflating my word counts to planetary proportions.

Light edits to a first draft improve my writing. If I keep going, I reach a point of diminishing returns where every new edit feels like busywork. And then, if I keep going some more, I start making the writing worse rather than better.

Spending too much time editing puts me in a mental state that’s similar to semantic satiation, but at the scale of a full essay or story. The words in front of my eyes begin to lose their meaning, ideas become muddled, and I can no longer tell if anything I’ve written makes sense at all. At that point, I have no choice but to walk away from the work and come back to it another day. It’s no fun.

I try to spend a little more time editing than I do writing, but only a little. I’ve learned to recognize that if editing a draft takes me significantly longer than it took me to write it, there’s probably something wrong with the piece. If editing takes too long, it’s better to throw it away and redo from start.

Prefer rewriting to editing

If it’s taking too long to edit, rewrite.

By writing quickly, I’ve convinced my brain that rewriting something wholesale is cheap and easy. It’s profitable and practical for me to write out a single idea multiple times, exploring it from different angles, finding new insight and depth every time I take a fresh stab at it.

If writing a first draft takes 45 minutes, making multiple attempts at the same idea is no big deal. If it takes four hours, I’m more likely to go with my first attempt. Spending too much time on first drafts is a good way for me to get married to bad ideas.

I wrote this very blog post three times because I couldn’t quite capture what I wanted to say in the first two drafts. The content of the post changed entirely with every new attempt, but the core ideas remained the same.

Publish with flaws

No piece of writing is ever perfect. If I keep looking, I can find flaws in every single piece of writing I’ve ever published. I find it a waste of time to keep refining my work once it reaches the good enough stage. If I’ve communicated my ideas clearly and haven’t misrepresented any facts, I can allow a few clumsy sentences or a bad opening paragraph to slide.

Even as I publish imperfect work, I try to look back at my past writing, notice the mistakes I keep repeating, and try to do better next time. I find that publishing a lot of bad work and learning from each mistake is a better way to learn and grow compared to writing a small number of “perfect” pieces.

Write bad, have fun

By working quickly, I’ve been able to produce a lot of bad-to-mediocre writing, but I feel satisfied. As I keep saying, finding joy in the work I do is more important to me than producing something extraordinary.

I’d rather write a hundred bad essays with gleeful abandon than slave over a single perfect manuscript. There’s joy in finishing something, closing the book on it, calling it a day, and moving on. There’s joy in trying out different styles, voices, subjects, ideas, personalities. There’s joy in knowing that there will always be a next thing to write, and the next, and the next.

When I’m stuck writing something that’s not fun to work on, I find a certain consolation in knowing that I’ll be done soon. That my sloppy writing process means I’m allowed to finish my piece quickly, put it out into the world, and move on to something more enjoyable.

Now you’ve reached the end of this post, and I don’t quite know how to leave you with a solid kicker. Instead of doing a good job, I’ll end with this Ray Bradbury quote that I copied off somebody’s blog:

Don’t think. Thinking is the enemy of creativity. It’s self-conscious and anything self-conscious is lousy. You can’t “try” to do things. You simply “must” do things.

Perfect. I’ve never liked thinking anyway.

Generative AI and the era of increased gatekeeping

Generative AI models can create text, images, code, and music faster and in larger quantities than our ability to absorb them. Before ChatGPT was introduced to the world in November 2022, producing a piece of media took longer than consuming it. In 2026, the equation has been turned on its head.

If your job—thus far—involved curating or evaluating the work of other humans, this is a problem. Generative AI is bad news for teachers and professors, editors at magazines and publishing companies, maintainers of open-source projects, academics doing peer-review or replicating studies, and anyone else who must review the work of their peers in order to give them feedback or as quality control.

If you have a job that fits this description, then you’ve probably been inundated with a deluge of low-quality AI-generated content in the past few months. It only takes a few minutes for somebody to “write” a short story using an LLM; it takes a human hours to read and evaluate it.

In response to the increasing burden on curators, organizations are tightening the rules around how they handle submissions. Some are taking the moderate stance of asking that AI-generated submissions be identified and cleaned up before submission, but many are banning outside contributions altogether. For example:

Other organizations are placing strict restrictions on the number of submissions and making submission rules more stringent:

This is a net negative for society. Organizations lose out on potentially good contributions, people early in their careers lose out on a chance to get feedback from experienced professionals, and the rest of us lose because fewer good works make their way into publications and the commons.

I see three possible futures ahead of us.

First: the novelty of using ChatGPT to produce work and throw it over the wall without reading it wears off. It becomes a social faux pas to submit AI-generated work for publication without extensively vetting and editing it. Enough people are named and shamed that new social norms around the use of generative AI emerge. Our societies adapt so that putting your name on a work without verifying its quality is an act that destroys your reputation.

Second: we come up with methods to prove that you have in fact done the work you claim to have done. Like proof of work in cryptography1, but for humans. Submitting anything without proof of work becomes an automatic rejection. I can’t imagine what this would look like, though. More importantly, I can’t imagine that we will collectively agree to put ourselves through the indignity of being judged by an algorithm. But hey, *points to everything*, look at the world we’ve made. Society has a high tolerance for algorithmically inflicted indignities.

Third: we enter a new era of gatekeeping, in which most of us can no longer fix a bug in our favorite open-source projects, submit stories to literary magazines, apply for public job postings, or get peer-review on our papers. Unless you’re a well-known name, or you know somebody who knows somebody, or you can get somebody to vouch for the veracity of your work, you’re considered a nonentity. An era of eroding trust, where anything created by a stranger you don’t personally know is considered suspect. An era of increased gatekeeping that only allows some of us to publish, and the rest of us perish.

Personally, I think we’ll land on a combination of the three possible outcomes. Some organizations will name and shame, some will ask for proof of work, and yet others will step up their gatekeeping. And who knows, there’s probably a secret fourth option that I haven’t thought of. I’ve never been great at predicting the future.

That said, I remain optimistic2 about our ability to handle this situation. I believe people are generally nice and just want to help, even the ones sending 5,000 line vibecoded pull requests to open-source projects. Our societies are still adjusting to a strange new technology, and the social norms around its use have not been written yet. Until we collectively figure out how to behave reasonably, we might see slightly increased gatekeeping, but my hunch is that it’ll be temporary.

I believe we’ll eventually get to a point where we all learn to be editors and reviewers and slush-pile readers of our own AI-generated work. That’s an interesting future to consider: one in which generative AI has turned us all into more discerning readers.

Footnotes

  1. Cryptography, not cryptocurrency. The crypto-bros have given perfectly reasonable mathematical techniques a bad name, so I feel it’s important to mention that here.

  2. Sloptimistic? Ha.

Pushing the smallest possible change to production

I wrote this post as an exercise during a meeting of IndieWebClub Bangalore.

During my first week of work with a new client, I like to push a very small, almost-insignificant change into production. Something that makes zero difference to the org’s product, but allows me to learn how things are done in the new environment I’m going to be working in.

If my client already has a working webapp, this change could be as simple as fixing a typo. If they don’t, I might build a tiny “Hello world!” app using a framework of their choice and make it available at a URL that’s accessible to everyone within the company (or at least everyone who is involved in the project I’m working on).

This exercise helps me figure out everything I need to navigate the workplace and be productive within its constraints. It’s better than any amount of documentation, meetings, or one-on-ones.

  • Where is the source code hosted? How do I get access to it? Who will give me access?
  • How do I build and run the software on my dev machine? Is there documentation? Is there somebody who can guide me through the process?
  • What does the version control strategy look like? What workflows am I expected to follow? Are there special conventions for naming branches?
  • Does the codebase have automated tests? Is there a CI server?
  • What’s the process for getting a change merged? Should I open a PR and wait for a code review? How long do code reviews typically take? Who reviews my code?
  • Is there a staging server? When does staging get merged into production?
  • How can I provision new servers if I need them? Who will help me do that?
  • Are there any third-party services in play? What providers does the org use for auth, CDN, media transformation, LLMs? How do I get access to these services? Alternatively, how do I mock them in development?

Doing this work after I’ve already spent weeks building out features is frustrating. When I’m in the middle of solving a problem, I want to iterate fast and get my work in front of users and managers as quickly as possible. I like to go into meetings and stand-up calls with working prototypes that people can play with on their own computers, not with vague promises of code that kinda-sorta works on my own machine.

This work also brings me in contact with a variety of people from across the organization, which is always helpful. I like being able to reach out to my co-workers when I’m stuck. As an independent contractor, I can only do that if I put in the effort to build relationships with my team. I also like having a sense of camaraderie with the people I work with; I want to see them as more than just names on a Slack channel, which is only possible if I actually talk to them.

Pushing the smallest possible change into production helps me do all this and sets the tone for a fruitful working relationship. Plus, it’s always satisfying to end your first week of work at a new workplace with something tangible to show for it.

I'm not listening to full albums anymore

In the last few years, I’ve lost my appetite for discovering new music. The main reason: I’ve focused my listening on full albums for most of the last decade, and I increasingly find it irritating and anxiety-inducing to listen to an album from start to finish.

I’ve been reading/watching music reviews online since my early twenties. In this time, I’ve also been part of a number of music communities online. One defining characteristic of music discussions online—at least in the Anglophone world—is that they’re centered entirely around the album.

To most critics and “serious” music fans, the album seems to be a single, indivisible unit of music that must always be considered as a whole. Five minutes with a search engine will dredge up hundreds of blog posts and op-eds touting the benefits of listening to complete albums rather than individual tracks. Some people almost attach morality to the act of listening to an album. If you’re not listening to full albums from start to end, in a darkened room, with your eyes closed, with noise-cancelling headphones, then do you even respect the music? Are you even a true fan?

In my mid-twenties, I wanted to be a musician (I still do, but not as intensely). So when I started taking music more seriously myself, I focused my listening mainly on albums. This wouldn’t have been such a terrible thing had it not been for the fact that I didn’t enjoy listening to full albums. At all. Never had, never will.

I grew up in an era of cassette tapes. When I was a teenager, my parents would take me to the local Planet M once a month, where I would be allowed to buy exactly one album on tape. My family couldn’t afford CDs because CDs cost ₹400-500, whereas cassette tapes cost ₹100-150. So even when most of the world had moved on to CDs, I was still listening to all my music on tape.

When you’re listening to albums on tape, you’re not really skipping around the tracklist. Sure, it’s possible to skip tracks on a cassette tape. That’s what the rewind and fast-forward buttons are for. But it’s not easy. You have to know precisely how long to rewind and/or fast-forward so you land where you want to. If you don’t get it right the first time, it becomes a frustrating back-and-forth dance between the rewind and fast-forward buttons until you manage to find the exact spot you need to be. Kind of like parallel parking in a tight spot.

Listening to most of my music on tape, I should’ve grown up to be the sort of adult who enjoys listening to albums, right? But that’s not what happened. The moment I discovered I could record my favorite tracks from their original tapes onto blanks, I stopped listening to full albums altogether. I had discovered the joys of making mixtapes, and there was no going back.

When my family finally bought a computer, and I discovered how to download MP3s from the internet, I completely gave up on listening to full albums or even downloading them illegally. Freed from the constraints of linear analog media, I began collecting individual MP3s, making playlists, and curating music for myself.

This changed in my twenties. As I started participating in music communities, going to gigs and festivals, and running a music blog, I also started forcing myself to listen to albums. That’s how all the pros did it, after all, and didn’t I want to be a pro?

However, even when I was listening to nearly a hundred new albums each year, they didn’t quite make sense to me. They still don’t. To me, they just seem like a convenient packaging for a collection of music. Outside of some loose themes and sonic similarities that hold an album together, I don’t see why a certain set of tracks placed in a certain sequence makes for a better listening experience than a slightly altered set of tracks in a slightly altered sequence.

There’s a lot of talk about the artistic intent that goes into curating and sequencing an album. But when artists play their music live, they often curate setlists by mixing and matching tracks from several different albums. DJs go even further, curating their mixes from tracks by many different artists, genres, and eras. If musicians themselves don’t stick to the album format, why must I?

Sure, there are some artists who have turned the album into an art form. There are albums out there that are designed to be one unified, cohesive experience. But those albums are exceedingly rare. I’m willing to bet less than one in a hundred albums is designed to be listened to as a unit. Most albums are just collections of tracks that an artist made in a certain time period, or which share a common theme or sound.

Albums also seem to me to be a modern invention, one that came about because of the technical limitations of recording media rather than any human tendency to enjoy a certain amount of music at a time. An LP is about 40-50 minutes long, not because that’s a magic number but because that’s how much music a vinyl record can hold. That’s why so many rock albums are still around 45-50 minutes long to this day. Rock music rose to prominence in the heyday of the vinyl record. On the other hand, hip-hop rose to prominence around the time audio CDs became more common, which is why many rap albums are a bit longer at around 60-70 minutes. The length of an album has little to do with anything inherent in the genre or with listener preferences, and everything to do with the technical limitations of the media it’s distributed on.

And now that streaming music has become more common, we see many artists breaking the mold. Some artists release albums that last several hours, while others only ever release singles. Unless an artist is planning to release physical versions of their music, they’re no longer constrained to the album format.

So after more than a decade of forcing myself to listen to albums, I too am releasing myself from the constraint of album-centric listening. Starting this year, I’m going to listen to music in the way I enjoy: by seeking out individual tracks that move me, and arranging them into curated playlists for myself and my friends.

To discover new music, I’m listening to more singles, curated playlists, and radio stations. I’m reading Bandcamp editorials and diving deep into obscure tags. I’m allowing myself to open an artist’s Spotify page and click around on whatever tracks catch my fancy. I’m even allowing myself to listen to albums on shuffle, something I’ve already done with Audrey Hobert’s Who’s the Clown over this last week.

I’m hoping that freeing myself to listen to music the way I want will allow me to discover a lot more this year.

Tech predictions for 2026, presented without nuance, context, or evidence

I wrote this post as an exercise during a meeting of IndieWebClub Bangalore.

I’m writing these in order, from most likely to happen to least likely.

  • GTA6 becomes the biggest game release of the year, breaking all previous sales records on launch day.
  • LLMs become a load-bearing part of all developer workflows. A little less than half of all commits on GitHub in 2026 are generated using coding assistants.
  • A new open-world 3D Mario game is announced in the summer and released in the holiday season.
  • The PlayStation 6 is not announced.
  • Apple turns the Mac Studio into a dedicated computer for running LLMs and machine-learning models, including support for rack mounting and disgusting amounts of RAM. MLX becomes the most widely supported method of running open-weights LLMs.
  • Hardware to run local LLMs becomes more affordable, and multiple startups make it even easier with “it just works” boxes you can buy and stash in a corner of your office. This massively cuts into OpenAI and Anthropic’s revenues.
  • The launch of the new Steam Machine, along with multiple new SteamOS-based handhelds, finally pushes Linux market share in the Steam hardware survey to about 5%.
  • Folding phones make up at least 20% of all new smartphone sales.
  • Apple releases a folding iPhone.
  • The most egregious design and accessibility sins caused by Liquid Glass are slowly rolled back over the course of the year.
  • Spotify launches fully AI-generated podcasts. The product is discontinued after massive public backlash.
  • A new, better version of Siri is released. It actually works as advertised.
  • The new, improved version of Siri is immediately compromised via prompt injection attacks within 4-6 weeks of release.
  • Multiple world governments—led by the EU—prohibit the use of American software and cloud services for government work.
  • Apple and Microsoft ship competent local LLMs built into Windows and macOS.
  • Apple and Google are forced to allow side-loading on all their devices after a massive antitrust lawsuit brought on by a coalition of tech companies and advocacy organizations. The change is region-locked to the EU, US, Japan, South Korea, and China.
  • Microsoft buys OpenAI.

Getting a Gemini API key is an exercise in frustration

Last week, I started working on a new side-project. It’s a standard React app partly made up of run-of-the-mill CRUD views—a perfect fit for LLM-assisted programming. I reasoned that if I could get an LLM to quickly write the boring code for me, I’d have more time to focus on the interesting problems I wanted to solve.

I’ve pretty much settled on Claude Code as my coding assistant of choice, but I’d been hearing great things about Google’s Gemini 3 Pro. Despite my aversion to Google products, I decided to try it out on my new codebase.

I already had Gemini CLI installed, but that only gave me access to Gemini 2.5 with rate limits. I wanted to try out Gemini 3 Pro, and I wanted to avoid being rate limited. I had some spare cash to burn on this experiment, so I went looking for ways to pay for a Gemini Pro plan, if such a thing existed.

Thus began my grand adventure in trying to give Google my money.

What is a Gemini, really?

The name “Gemini” is so overloaded that it barely means anything. Based on the context, Gemini could refer to:

  • The chatbot available at gemini.google.com.
  • The mobile app that lets you use the same Gemini chatbot on your iPhone or Android.
  • The voice assistant on Android phones.
  • The AI features built into Google Workspace, Firebase, Colab, BigQuery, and other Google products.
  • Gemini CLI, an agentic coding tool for your terminal that works the same way as Claude Code or OpenAI Codex.
  • The Gemini Code Assist suite of products, which includes extensions for various IDEs, a GitHub app, and Gemini CLI.
  • The underlying LLM powering all these products.
  • Probably three more products by the time I finish writing this blog post.

To make things even more confusing, Google has at least three different products just for agentic coding: Gemini Code Assist (Gemini CLI is a part of this suite of products), Jules, and Antigravity.

And then there’s a bunch of other GenAI stuff that is powered by Gemini but doesn’t have the word Gemini in the name: Vertex AI Platform, Google AI Studio, NotebookLM, and who knows what else.

I just wanted to plug my credit card information into a form and get access to a coding assistant. Instead, I was dunked into an alphabet soup of products that all seemed to do similar things and, crucially, didn’t have any giant “Buy Now!” buttons for me to click.

In contrast, both Anthropic and OpenAI have two primary ways you can access their products: via their consumer offerings at claude.ai and chatgpt.com respectively, or via API credits that you can buy through their respective developer consoles. In each case, there is a form field where you can plug in your credit card details, and a big, friendly “Buy Now!” button to click.

After half an hour of searching the web, I did the obvious thing and asked the free version of Gemini (the chatbot, not one of those other Geminis) what to do:

How do I pay for the pro version of Gemini so i can use it in the terminal for writing code? I specifically want to use the Gemini 3 Pro model.

It thought for a suspiciously long time and told me that Gemini 3 Pro required a developer API key to use. Since the new model is still in preview, it’s not yet available on any of the consumer plans. When I asked follow-up questions about pricing, it told me that “Something went wrong”. Which translates to: we broke something, but we won’t tell you how to fix it.

So I asked Claude for help. Between the two LLMs, I was able to figure out how to create an API key for the Gemini I wanted.

Creating an API key is easy

Google AI Studio is supposed to be the all-in-one dashboard for Google’s generative AI models. This is where you can experiment with model parameters, manage API keys, view logs, and manage billing for your projects.

I logged into Google AI Studio and created a new API key. This part was pretty straightforward: I followed the on-screen instructions and had a fresh new key housed under a project in a few seconds. I then verified that my key was working with Gemini CLI.

It worked! Now all that was left to do was to purchase some API credits. Back in Google AI Studio, I saw a link titled “Set up billing” next to my key. It looked promising, so I clicked it.

That’s where the fun really began.

Google doesn’t want my money

The “Set up billing” link kicked me out of Google AI Studio and into Google Cloud Console, and my heart sank. Every time I’ve logged into Google Cloud Console or AWS, I’ve wasted hours upon hours reading outdated documentation, gazing in despair at graphs that make no sense, going around in circles from dashboard to dashboard, and feeling a strong desire to attain freedom from this mortal coil.

Turns out I can’t just put $100 into my Gemini account. Instead, I must first create a Billing Account. After I’ve done that, I must associate it with a project. Then I’m allowed to add a payment method to the Billing Account. And then, if I’m lucky, my API key will turn into a paid API key with Gemini Pro privileges.

So I did the thing. The whole song and dance. Including the mandatory two-factor OTP verification that every Indian credit card requires. At the end of the process, I was greeted with a popup telling me I had to verify my payment method before I’d be allowed to use it.

Wait. Didn’t I just verify my payment method? When I entered the OTP from my bank?

Nope, turns out Google hungers for more data. Who’d have thunk it?

To verify my payment method for reals, I had to send Google a picture of my government-issued ID and the credit card I’d just associated with my Billing Account. I had to ensure all the numbers on my credit card were redacted by manually placing black bars on top of them in an image editor, leaving only my name and the last four digits of the credit card number visible.

This felt unnecessarily intrusive. But by this point, I was too deep in the process to quit. I was invested. I needed my Gemini 3 Pro, and I was willing to pay any price.

The upload form for the government ID rejected my upload twice before it finally accepted it. It was the same exact ID every single time, just in different file formats. It wanted a PNG file. Not a JPG file, nor a PDF file, but a PNG file. Did the upload form mention that in the instructions? Of course not.

After jumping through all these hoops, I received an email from Google telling me that my verification would be completed in a few days.

A few days? Nothing to do but wait, I suppose.

403 Forbidden

At this point, I closed all my open Cloud Console tabs and went back to work. But when I was fifteen minutes into writing some code by hand like a Neanderthal, I received a second email from Google telling me that my verification was complete.

So for the tenth time that day, I navigated to AI Studio. For the tenth time I clicked “Set up billing” on the page listing my API keys. For the tenth time I was told that my project wasn’t associated with a billing account. For the tenth time I associated the project with my new billing account. And finally, after doing all of this, the “Quota tier” column on the page listing my API keys said “Tier 1” instead of “Set up billing”.

Wait, Tier 1? Did that mean there were other tiers? What were tiers, anyway? Was I already on the best tier? Or maybe I was on the worst one? Not important. The important part was that I had my API key and I’d managed to convince Google to charge me for it.

I went back to the Gemini CLI, ran the /settings command, and turned on the “Enable experimental features” option. I ran the /models command, which told me that Gemini 3 Pro was now available.

Success? Not yet.

When I tried sending a message to the LLM, it failed with this 403 error:

{
  "error": {
    "message": "{\n  \"error\": {\n    \"code\": 403,\n    \"message\": \"The caller does not have permission\",\n    \"status\":\"PERMISSION_DENIED\"\n  }\n}\n",
    "code": 403,
    "status": "Forbidden"
  }
}

Is that JSON inside a string inside JSON? Yes. Yes it is.
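Just for fun, here’s a short JavaScript sketch showing how you’d unwrap it: parse once, then parse the `message` field again. The payload below is the exact 403 body from above; the variable names are mine.

```javascript
// The 403 response body, reproduced verbatim. Note that the outer
// "message" field is itself a JSON document serialized into a string.
const body = {
  error: {
    message:
      '{\n  "error": {\n    "code": 403,\n    "message": "The caller does not have permission",\n    "status": "PERMISSION_DENIED"\n  }\n}\n',
    code: 403,
    status: "Forbidden",
  },
};

// A second round of JSON.parse recovers the actual error object.
const inner = JSON.parse(body.error.message);
console.log(inner.error.status); // "PERMISSION_DENIED"
```

So the error you actually need (`PERMISSION_DENIED`) is buried one serialization layer deeper than the HTTP status suggests.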

To figure out if my key was even working, I tried calling the Gemini API from JavaScript, reproducing the basic example from Google’s own documentation.

No dice. I ran into the exact same error.

I then tried talking to Gemini 3 Pro using the Playground inside Google AI Studio. It showed me a toast message saying Failed to generate content. Please try again. The chat transcript said An internal error has occurred.

At this point I gave up and walked away from my computer. It was already 8pm. I’d been trying to get things to work since 5pm. I needed to eat dinner, play Clair Obscur, and go to bed. I had no more time to waste and no more fucks to give.

Your account is in good standing at this time

Just as I was getting into bed, I received an email from Google with this subject line:

Your Google Cloud and APIs billing account XXXXXX-XXXXXX-XXXXXX is in good standing at this time.

With the message inside saying:

Based on the information you provided and further analysis by Google, we have reinstated your billing account XXXXXX-XXXXXX-XXXXXX. Your account is in good standing, and you should now have full access to your account and related Project(s) and Service(s).

I have no idea what any of this means, but Gemini 3 Pro started working correctly after I received this email. It worked in the Playground, directly by calling the API from JavaScript, and with Gemini CLI.

Problem solved, I guess. Until Google mysteriously decides that my account is no longer in good standing.

This was a waste of time

This was such a frustrating experience that I still haven’t tried using Gemini with my new codebase, nearly a week after I made all those sacrifices to the Gods of Billing Account.

I understand why the process for getting a Gemini API key is so convoluted. It’s designed for large organizations, not an individual developer trying to get work done; it serves the bureaucracy, not the people doing the work; it’s designed for maximum compliance with government regulations, not for efficiency or productivity.

Google doesn’t want my money unless I’m an organization that employs ten thousand people.

In contrast to Google, Anthropic and OpenAI are much smaller and much more nimble. They’re able to make the process of setting up a developer account quick and easy for those of us who just want to get things done. Unlike Google, they haven’t yet become complacent. They need to compete for developer mindshare if they are to survive a decade into the future. Maybe they’ll add the same level of bureaucracy to their processes as they become larger, but for now they’re fairly easy to deal with.

I’m still going to try using Gemini 3 Pro with Gemini CLI as my coding assistant, but I’ll probably cap the experiment to a month. Unless Gemini 3 Pro is a massive improvement over its competitors, I’ll stick to using tools built by organizations that want me as a customer.

My first online shopping experience, at the tender age of twelve

I wrote the first draft of this post as an exercise during a meeting of IndieWebClub Bangalore.

When I was eleven or twelve, a friend at school told me that video games like Age of Empires were built using a language called C. When pressed for details, he had nothing more to tell me. What did he mean by language? Was it like English? How could you use a language to make things move around on the computer screen? He didn’t know. His older brother had informed him about the existence of C and its relationship with video games, but he had inquired no further.

I was desperate to learn C, whatever it was.

Only, I had no idea who to ask or how to go about learning this “language”. I could’ve searched the web, but this was 2002. If I was lucky, I was allowed to use the dial-up for an hour every week. Search engines were bad and my search skills were rudimentary. The web was not yet filled with thousands upon thousands of freely available programming tutorials. And in any case, it was far more important to download the latest Linkin Park MP3.

Luckily, Dad knew somebody who worked at the National Institute for Science Communication and Policy Research. They had just published a series of computer education books targeted towards a growing population of new computer users. Dad had mentioned to him that I was interested in computers, so one day the man handed Dad a giant brown envelope containing two books: a book about the basics of Microsoft PowerPoint, and a slim volume called The C Adventure.

The C Adventure only covered the very basics of C: variables, conditionals, loops, and functions. It didn’t cover structs, pointers, macros, splitting programs into multiple files, build systems, or anything else that would allow me to build the kind of real-world programs I wanted to build. But that didn’t matter. The universe had heard my plea. I finally knew what C was. I could even write a bit of it! They were simple programs that ran in the terminal, but at the time I felt drunk with power. I was one step closer to building Age of Empires.

I could do anything with a computer, anything at all. The only limit was myself.

But The C Adventure wasn’t enough. If I wanted to build games, there was a lot more I’d need to learn: reading and writing files, connecting to the internet, opening windows, rendering 3D graphics, playing sound, writing game AI, and who knew what else. But once again, I didn’t know where to find more learning resources.

The government had come to my aid once, but I couldn’t rely on obscure government departments to come to my aid every time. I had to take matters into my own hands. And I did just that, this time finding my salvation in the free market.

I had no idea where to buy programming books in Delhi, but I reasoned that it might be possible to buy them on the internet. I’d visited American e-commerce websites. Might there be Indian equivalents? I’d seen TV and newspaper ads for a website called Baazee.com. The ads said something about buying and selling. Perhaps this was where I’d find my next C book?

One evening, during my weekly hour of parentally approved and monitored internet, I typed Baazee.com into the Internet Explorer 6 URL bar and began my search.

A few minutes of searching led me to a product called “101 Programming eBooks”. It was allegedly a CD containing, well, 101 programming ebooks. The seller had good reviews and the product description looked compelling, so, with an excess of hope, I clicked the buy button.

At the end of the checkout process, the website asked for my credit card details, and that’s where I realized my whole endeavor had been doomed from the very start.

The problem was that my parents didn’t have a credit card at the time, and no amount of convincing on my part would induce them to get one. They’d heard too many horror stories of people getting deep into credit card debt and losing their homes. The newspapers were filled with stories of scams and frauds on the nascent internet, most of which involved stealing people’s credit card details and using them to run up huge bills.

But teenage Ankur needed to learn C. It was a life and death situation, couldn’t they see? I would not allow my parents’ stubborn disapproval of predatory American financial instruments to stand between myself and Age of Empires. So I did what anyone in my place would have done: I begged my Dad to get a credit card, just this once, just for this one purchase. I threw a tantrum. I cried until I ran out of tears.

But nope, it was all in vain. Computers and money were not allowed to mix, not in our household. No credit cards, full stop.

Dejected, I went back to the computer to close all my Internet Explorer 6 windows, when the universe once again chose to smile upon me. The person selling “101 Programming Ebooks” had left their email address on the product description page! Dad urged me to write to them and figure out if we could pay for the CD with cash or cheque.

So I sent an email, and the seller responded with his phone number. He lived in Delhi, not very far from where my family lived. He said we could pick up the CD from his address and pay him in cash. Oh sweet joy! Oh divine providence!

I wanted to go meet the seller myself, and do so immediately. But Dad was more cautious. He first talked to the seller on the phone to make sure he was a real person. He asked him a bunch of questions. When he was satisfied that nothing shady was going on, he went to pick up the CD himself. He might have taken a friend or co-worker with him. He’d read enough scary stories about the internet in the newspapers, and he did not want to appear on page seven or whatever of the Times of India.

The seller was just a college kid, still in his early twenties. He was pirating ebooks, burning them to CDs, and selling them online out of his bedroom for some extra cash. My dad was impressed by the entrepreneurial spirit on display, but neither he nor I understood that selling pirated ebooks was illegal. It didn’t matter, though. Everything that had to do with computers and the internet in India was illegal in the 2000s, so nobody cared.

One joyous evening Dad returned home with the CD. It came in an unmarked white paper envelope, with the words “101 Programming Ebooks” scrawled on the disc in permanent marker.

I inserted the disc into the family computer and found it contained exactly what had been advertised: a collection of technical ebooks sorted into directories, mostly published by O’Reilly, in PDF and CHM formats. In fact, there were a lot more than just 101 ebooks in there! My first online purchase had turned out to be incredibly satisfying. The Baazee.com pirate had underpromised and overdelivered.

I spent a lot of time reading the books on that CD. I don’t remember if I ever read any single one of them cover to cover, but I remember dipping in and out of tens of them, picking up something at random whenever the fancy struck me. I remember learning a bit of Perl and writing some simple programs. I remember trying to learn Java but being turned off by public static void main. I remember spending hours reading a book about XML but having no clue why I would want to use it. Could I use it to build Age of Empires? No? Then I didn’t care.

I never ended up building my own version of Age of Empires, but I did go on to use some of the books in the collection to learn and use C (and some C++) profitably for many small projects. Later, when I was in college I even learned some Objective-C, and made a bit of money building an iPad game for a small marketing agency. So technically I’ve been paid to build a video game, and technically some part of it was built with C. Success? Let’s call it a success.

While no single book on the “101 Programming Ebooks” CD changed my life, the collection gave me a vast buffet of tools and technologies to sample from. It expanded my mind and allowed me to see the full spectrum of possibilities in a computer career. Looking back at that event 23 years later, the only book I can remember clearly is the Camel Book, but I’m sure there were many more in that collection that I used to occupy slow evenings.

I sometimes wonder where that college kid is now, the one who was selling pirated ebooks out of his bedroom. Did he go on to start his own tech company? Did he move to America, as so many people in tech do? Or does he still live somewhere in Delhi, ripping Hindi TV shows off Amazon Prime and helping people jailbreak their Nintendo Switches?

Wherever he is, I hope he’s done well for himself. I am forever grateful for “101 Programming Ebooks” and the wild-west internet of the 2000s.