Using Claude to make a house-buying decision

I use AI for small decisions all the time. This is the same loop pointed at the biggest one my partner and I are making right now: buying a house. I wanted to see if the method that helps me pick a rental car or a fueling strategy holds up at life-sized stakes. Mostly it does, with some honest caveats.

To be clear up front: we haven't bought anything. It's an active search, and what the process has produced so far is a shared, written-down decision and a real plan. That honesty is kind of the point of the whole site, so I'm leaning into it.

What I actually built

The deliverable is a small private website my partner and I both use, hosted on my own NAS, in two phases: first figure out what we actually want, then get concrete about how to find it. Pages for our criteria, a map of candidate areas, a rough renovation-cost tool, and a search profile we can hand to an agent.

Building the tool matters. It's a thing we both open, on our own phones, that holds the decision in one place, so it doesn't live only in my head or in a chat only I can see.

The method

Same loop as everything else I do with Claude. I described the whole situation in one go, then had it fold research straight into the tool:

Launch a research into common house-buying questions and pitfalls. Work those into the tool.

That fanned out into background research and pulled the findings (area data, cost ranges, the legal steps) into the product, each with sources and a date.

The best early moment was letting the tool challenge us. We thought we'd basically decided on one area. I asked Claude to actually check it, and the research turned up meaningful aircraft noise there, so we dropped it. A decision tool that talks you out of your first answer is doing its job.

Then I pointed critique agents at my own work, twice:

Run 3 specialist research agents over our questionnaire and have them find the gaps and the wrong questions.

Spawn 5 estate-agent sub-agents. Challenge our findings from their point of view.

Using AI adversarially against your own draft is underrated. It's much better at poking holes in a thing that exists than at producing the perfect thing first try.

The idea worth stealing

My first version of the "what do we want" tool was a simple yes/no swipe on each criterion. It was misleading, and I said so at the time:

The swipe texts are misleading. Most often the answer is: it depends.

The realization was that a single yes/no is being forced to carry three completely different things at once, and collapsing them into one bit is what felt wrong. So the tool now separates them:

1. Sort each criterion:  Dealbreaker / Important / Nice / Don't-care
2. Spend 100 points across the "Important" ones   (you can't max everything)
3. Attach conditions where it depends:
     "yes, only if under €X"   "yes, if it also has Y"
4. Head-to-head trade-offs:  "bigger garden or move-in-ready?"
5. Each of us fills it in alone, then we swap a link and compare
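For the curious, here is a minimal sketch of how steps 1 to 3 could be modeled. It is not the code behind our tool: the names (Tier, Answer, validate, POINT_BUDGET) are made up for illustration, and I'm assuming the budget has to be spent exactly rather than "up to 100". The head-to-head trade-offs from step 4 are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    DEALBREAKER = "dealbreaker"
    IMPORTANT = "important"
    NICE = "nice"
    DONT_CARE = "dont_care"


@dataclass
class Answer:
    criterion: str
    tier: Tier
    points: int = 0                  # only meaningful for IMPORTANT criteria
    condition: Optional[str] = None  # free text, e.g. "yes, if it also has Y"


POINT_BUDGET = 100  # assumption: the budget must be spent exactly


def validate(answers: list[Answer]) -> list[str]:
    """Return a list of problems; an empty list means the sheet is consistent."""
    problems = []
    for a in answers:
        if a.points < 0:
            problems.append(f"{a.criterion}: negative points")
        if a.tier is not Tier.IMPORTANT and a.points != 0:
            problems.append(f"{a.criterion}: points only apply to Important criteria")
    spent = sum(a.points for a in answers if a.tier is Tier.IMPORTANT)
    if spent != POINT_BUDGET:
        problems.append(f"spent {spent} of {POINT_BUDGET} points")
    return problems
```

The point of keeping tier, points, and condition as separate fields is that none of them has to pretend to be the other two, which is exactly what the yes/no swipe was doing.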

That last step is the useful part. It shows the overlap, the hard no-gos, and a "things to talk about" view of where our scores differ a lot. It surfaces the disagreement instead of quietly averaging it away. For two people, that did more than any ranking.
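A rough sketch of what that comparison could look like, under my own assumptions: each person's answers are a plain dict from criterion to (tier, points), "overlap" means we picked the same tier, a "hard no-go" is anything either of us marked as a dealbreaker, and "differ a lot" means a gap of 20 points or more. The function name, the dict shape, and the threshold are all hypothetical, not what the real tool does.

```python
def compare(mine: dict, theirs: dict, gap: int = 20) -> dict:
    """Compare two answer sheets shaped like {criterion: (tier, points)}."""
    shared = sorted(mine.keys() & theirs.keys())
    everything = sorted(mine.keys() | theirs.keys())
    missing = ("dont_care", 0)  # stand-in when one person skipped a criterion
    return {
        # same tier chosen by both people
        "overlap": [c for c in shared if mine[c][0] == theirs[c][0]],
        # a dealbreaker for either person is a no-go for the household
        "no_gos": [
            c for c in everything
            if "dealbreaker" in (mine.get(c, missing)[0], theirs.get(c, missing)[0])
        ],
        # large point gaps are surfaced instead of being averaged
        "talk_about": [c for c in shared if abs(mine[c][1] - theirs[c][1]) >= gap],
    }
```

Note what the sketch deliberately does not do: it never computes a combined score. A criterion where one of us spent 60 points and the other 20 shows up under "talk_about" as a disagreement, where an average of 40 would have hidden it.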

A few other moves that worked. One was turning "it felt loud there" into an actual map, with each area drawn as a walk-radius around its station and colored by noise level. Another was a "modernize step by step" plan for buying something dated but livable and fixing it over time. The third was a tiered agent strategy instead of "call one" or "call all."
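The walk-radius part of the map boils down to a distance check between a station and a listing. Here is a small sketch using the standard haversine formula; the function names, the 1,000 m default radius, and the coordinates in the usage example are placeholders I picked for illustration, and straight-line distance is only a stand-in for real walking distance. The noise coloring is not modeled here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # clamp to 1.0 to guard against floating-point overshoot
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_walk(station: tuple, house: tuple, radius_m: float = 1000) -> bool:
    """True if the house lies inside the station's walk-radius (as the crow flies)."""
    return haversine_m(*station, *house) <= radius_m


# Usage with made-up coordinates:
station = (50.0, 8.0)
print(within_walk(station, (50.005, 8.0)))  # roughly 550 m north of the station
print(within_walk(station, (50.02, 8.0)))   # roughly 2.2 km north of the station
```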

Where AI clearly couldn't help

This is the honest half, and the useful half.

It happily burned a pile of usage over-researching an area I'd already ruled out, until I told it flatly to stop:

You are burning tokens like crazy on [that area]. I dont give a damn about it, stop.

You have to steer this thing hard. Left alone it will go deep on the wrong thing with total confidence.

The first way it framed the problem was too simplistic (the yes/no version). The value came from iterating the structure of the decision itself, over several rounds, while the answers it suggested mattered a lot less. Building your own decision tool also means you own its bugs: at one point a storage glitch held onto old answers and showed one criterion wrong, which is exactly the kind of thing that quietly erodes trust in your own tool.

Most importantly, the genuinely high-stakes parts, the legal and tax and money questions, are precisely the ones the AI would not decide. It was good at structuring those questions and correctly refused to answer them, pointing at a real notary or tax advisor each time. And every number it produced is a starting point to verify. That honesty is baked into the tool, and it should be baked into how you use it.

If there's one thing to take from this: AI is very good at turning a big, vague, emotional, two-person argument into something written down and comparable. It won't make the call for you, and it shouldn't. That part is still yours. It just makes sure you're both looking at the same thing when you make it :)