Not too long ago, everyone was wondering whether $20 a month for an AI coding tool was worth it. Today, devs are easily blowing past $200 worth of capacity in a single billing cycle, and it's easy to see a near future where heavy users casually spend $1,000 or more each month. I'm not really here to talk about runtime compute or provider capacity, though; I'm thinking about the development process itself, and how we went from saying the cost of AI was dropping so fast it might as well be free cognition to feeling like our allotment is never going to be enough. What used to feel infinite has become rationed.
In a way, it's kinda funny how the platforms themselves are responding. OpenAI, for example, has rolled out a model router that quietly decides whether you really need the flagship model or if a cheaper one will do. On the surface, that's a cost-saver for the user. But step back and you see the inversion: instead of reaching for the most powerful model because you had the option, you're now steered into conserving the big guns. The agency shifts. Providers ration from above, and you ration from below.
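To make that inversion concrete, here's a rough sketch of what routing logic like this might look like from the outside. Everything in it is invented for illustration: the model names, the difficulty heuristic, and the threshold are placeholders, not how OpenAI's router actually works.

```python
# Hypothetical sketch of a model router: guess how "hard" a prompt looks,
# then send it to the cheapest tier that seems good enough.
# Heuristic, names, and threshold are all made up for illustration.

CHEAP_MODEL = "small-fast-model"       # placeholder, not a real model ID
FLAGSHIP_MODEL = "big-expensive-model"  # placeholder, not a real model ID

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy for difficulty: longer prompts and reasoning-heavy
    keywords push the score up."""
    score = min(len(prompt) / 2000, 1.0)  # length contributes up to 1.0
    hard_signals = ("prove", "refactor", "debug", "architecture", "edge case")
    score += sum(0.2 for word in hard_signals if word in prompt.lower())
    return score

def route(prompt: str) -> str:
    """Pick a model tier. Note who holds the dial: the provider sets the
    threshold, not the user."""
    return FLAGSHIP_MODEL if estimate_difficulty(prompt) > 0.6 else CHEAP_MODEL

if __name__ == "__main__":
    print(route("What does this regex do?"))                              # cheap tier
    print(route("Refactor this module and prove the edge cases hold."))   # flagship
```

The point isn't the heuristic; it's that whoever writes the threshold decides when you deserve the expensive model.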
In practice, this means developers will start budgeting their thinking across tiers. A dev might use their $20 ChatGPT subscription as a sketchpad: a place for outlines, pseudocode, or quick Q&A. Maybe they'll use Microsoft Copilot at their job for longer-running research tasks (since you only get a handful of those even at the Pro tier on ChatGPT). And the coding agents (GitHub Copilot, Cursor, Claude) are where the "real work" happens, so you have to make sure you're saving that allotment for the real work and not burning through your credits on throwaway questions.
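If you squint, that self-rationing is just a little ledger you keep in your head. A toy version might look like the sketch below; the tier names, quotas, and task labels are invented, and real plan limits vary and change often.

```python
# Toy sketch of rationing requests across subscription tiers.
# Quotas and tier names are made up; real limits differ by plan.

from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    monthly_quota: int   # how many requests you allow yourself this cycle
    used: int = 0

    def can_spend(self) -> bool:
        return self.used < self.monthly_quota

    def spend(self) -> None:
        self.used += 1

# A developer's mental budget: sketchpad questions go to the cheap tier,
# deep research to the work account, "real work" to the coding agent.
budget = {
    "sketchpad": Tier("ChatGPT ($20)", monthly_quota=500),
    "research":  Tier("Copilot at work", monthly_quota=30),
    "real_work": Tier("Coding agent", monthly_quota=300),
}

def ask(kind: str, prompt: str) -> str:
    tier = budget[kind]
    if not tier.can_spend():
        return f"Out of {tier.name} credits -- think harder before spending."
    tier.spend()
    return f"Sent to {tier.name}: {prompt[:40]}"

print(ask("sketchpad", "Rough out a schema for the billing tables"))
print(ask("real_work", "Implement the migration and write the tests"))
```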
If this sounds familiar, it’s because we’ve been here before. Mainframe engineers once planned their work carefully, then used expensive cycles sparingly. Cloud teams still batch jobs to minimize costly runs. We’re simply returning to a world where the economics of compute shape how we think, not just how we ship.
And maybe this is the part worth remembering. For all the tiers, limits, and routers, the human brain still holds an advantage. It’s the only large language model that computes tokens for the price of a cup of coffee.