Essay · 20 May 2026 · Engineering

Under the Hood: How PINtDEXTER Picks Your Pint

An earlier essay argued for predicting the pint rather than the rating. That piece was about the user side of the recommender — how PINtPOINT learns what you like. This one is the other side: how each beer gets a shape worth recommending. It comes down to four layers, a per-beer DNA, and the misclassified pale lager that exposed the most recent fix.

A recommender is only as good as what it knows about the thing it's recommending.

The default answer, and where it breaks on beer

The dominant approach for the last fifteen years is collaborative filtering — factor a user-by-item rating matrix and predict missing ratings. Netflix made it famous; the cleanest beer example is ŷhat's June 2013 item-item R prototype on 1.5M BeerAdvocate reviews.

The ŷhat post is honest about what makes the approach work. Their worked example pits Fat Tire against Michelob Ultra and Dale's Pale Ale: Fat Tire fans rate Dale's similarly and Michelob lower, and the recommender quantifies the difference. Intuitive — on those three beers. The catch is in the filter: they only considered beers with 500+ reviews, and the demo ran on the 20 most-reviewed beers in the dataset.

That threshold isn't a footnote, it's the design speaking. Collaborative filtering needs density. The UK alone releases more than a hundred new beers a week, and most never accumulate that volume. Cold start hits twice — new users have no history, new beers have no community signal — and the fallback in both cases is popularity, which is a ranked list of famous things rather than a recommendation.
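The item-item idea, and the reason it needs density, can be sketched in a few lines. This is a minimal illustration, not ŷhat's code: the ratings are invented, and `item_similarity`, `user_mean`, and the `min_common` threshold are hypothetical names chosen for the example. It uses cosine similarity over ratings centred on each user's own mean, which is one common variant of item-item filtering.

```python
from math import sqrt

# Toy data: user -> {beer: rating out of 5}. All numbers are invented.
ratings = {
    "ann": {"Fat Tire": 4.5, "Dale's Pale Ale": 4.0, "Michelob Ultra": 1.5},
    "bob": {"Fat Tire": 4.0, "Dale's Pale Ale": 4.5, "Michelob Ultra": 2.0},
    "cat": {"Fat Tire": 2.0, "Dale's Pale Ale": 2.5, "Michelob Ultra": 4.0},
    "dan": {"Fat Tire": 5.0, "Dale's Pale Ale": 4.5},
    "eve": {"Fat Tire": 4.0, "Brand New Pale": 4.5},  # the only review of a new beer
}


def user_mean(user_ratings):
    """Average rating a user gives, used to remove generous/harsh bias."""
    return sum(user_ratings.values()) / len(user_ratings)


def item_similarity(a, b, ratings, min_common=2):
    """Cosine similarity between beers a and b over users who rated both.

    Returns None when fewer than min_common users rated both beers:
    that is the cold-start case, where there is no signal to compare.
    """
    common = [u for u, r in ratings.items() if a in r and b in r]
    if len(common) < min_common:
        return None
    xs = [ratings[u][a] - user_mean(ratings[u]) for u in common]
    ys = [ratings[u][b] - user_mean(ratings[u]) for u in common]
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(xs, ys)) / norm


print(item_similarity("Fat Tire", "Dale's Pale Ale", ratings))
print(item_similarity("Fat Tire", "Michelob Ultra", ratings))
print(item_similarity("Fat Tire", "Brand New Pale", ratings))
```

On this toy data the first similarity is positive, the second negative, and the third is `None`: with a single shared reviewer there is nothing to compute, which is the density problem in miniature.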

The other half of the field, suddenly cheap

The alternative is content-based recommendation: score items by their own features. Pandora's the canonical version — paid musicologists labelling every song before the recommender saw it. Beautiful in theory, historically prohibitive for a catalogue the size of beer.

There's a quiet ten-year arc worth naming. The ŷhat post said, of the BeerAdvocate review text: "the text does provide some excellent opportunities for analysis, we're going to focus only on the ratings." They threw it away. A decade later, Nishant Kushwaha's 2023 prototype went back and did exactly the analysis they'd skipped — feeding the review text through an LLM to extract attributes per beer. Clever, but the features still depended on how much the community had written about the beer. Cold start in disguise.

What changed in 2025–2026 is that asking an LLM to label the beer itself — not its discussion — became cheap enough to be a practical catalogue step rather than a research luxury. Pandora's labelling problem became a script that runs overnight, with no community discourse required.

What goes into a beer's DNA

Each row in the catalogue gets the following inferred features, plus the ABV and declared style that came from the source data:

Five axes become the visual signature in the app: Sweet, Sour, Body, Bitter, and Strength (the last derived from ABV). The TastePentagon renders them on a beer's detail card and in compact list views as a five-pointed silhouette built to be recognisable at thumbnail size.

The same five anchor a second view. The FlavourEqualizer swaps the silhouette for eight vertical columns: the five fixed axes (Bitter, Sweet, Sour, Body, and ABV, which stands in for Strength) plus three adaptive personality slots that float across hop, malt, and fruit families by score. A NEIPA might surface Citrusy, Tropical, and Juicy; a stout, Roasty, Chocolatey, and Malty. The pentagon answers "is this beer my shape?" at a glance; the equaliser answers it side by side, with vertical bars that compare cleanly when two beers sit next to each other in Head-to-Head. The four hop labels are a compressed view of the richer six-category teaching taxonomy covered separately in The Drinker's Guide to Hops.
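One way to picture the eight columns is as five fixed entries followed by the three highest-scoring personality descriptors. The sketch below assumes scores on a 0 to 1 scale and a simple "top three overall" rule; the function name `equalizer_columns`, the score values, and the selection rule are all assumptions for illustration, not the app's actual logic.

```python
FIXED_AXES = ["Bitter", "Sweet", "Sour", "Body", "ABV"]


def equalizer_columns(fixed, personality, slots=3):
    """Return (label, score) pairs: the fixed axes, then the top adaptive slots.

    fixed       -- dict with a score for every label in FIXED_AXES
    personality -- dict of descriptor -> score across hop, malt, and fruit families
    """
    # Highest-scoring descriptors win the adaptive slots (assumed rule).
    top = sorted(personality.items(), key=lambda kv: kv[1], reverse=True)[:slots]
    return [(axis, fixed[axis]) for axis in FIXED_AXES] + top


# Invented scores for a hypothetical NEIPA.
neipa_fixed = {"Bitter": 0.45, "Sweet": 0.40, "Sour": 0.10, "Body": 0.60, "ABV": 0.55}
neipa_personality = {
    "Citrusy": 0.90, "Tropical": 0.85, "Juicy": 0.80,
    "Malty": 0.20, "Roasty": 0.05,
}

columns = equalizer_columns(neipa_fixed, neipa_personality)
print([label for label, _ in columns])
```

With these made-up numbers the adaptive slots resolve to Citrusy, Tropical, and Juicy, matching the NEIPA example in the text; a stout with high Roasty and Chocolatey scores would fill the same three columns differently.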

The four layers, end to end

PINtDEXTER stacks four signals. Each answers a different question. Take any one out and recommendations get worse in a different direction.

  1. Preference capture — Sip-or-Skip binary swipes and Head-to-Head pair rounds. Forced choices, not free-form ratings. Output: a soft style profile after about ten interactions.
  2. Sliders (TUNeDEXTER) — the user can take direct control of five flavour sliders plus body and strength when the inferred profile is wrong. The app never overrides explicit slider settings; preference learning is suggestion, not law.
  3. Per-beer DNA — the LLM-scored content features above. Independent of any user. Cached per beer, re-run only when we explicitly backfill or invalidate.
  4. Availability filter — recent tap-list activity at venues near you, drawn from Untappd's public feeds. The recommendation is grounded in a specific bar's recent pours, not a beer name plucked from a global catalogue.

Layers 1 and 2 are the user side. Layer 3 is the beer side. Layer 4 is the world side. What gets surfaced is the combination of all three.
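The way the four layers combine can be sketched roughly as follows. Everything here is a simplification under stated assumptions: the `Beer` class, the function names, the 0 to 1 axis scale, and the distance-based `match` score are hypothetical, and the beers are invented. The only behaviours taken from the description above are that explicit slider values win over the inferred profile and that only beers pouring nearby are considered.

```python
from dataclasses import dataclass


@dataclass
class Beer:
    name: str
    dna: dict  # Layer 3: axis -> score in 0..1, independent of any user


def effective_profile(inferred, sliders):
    """Layers 1 and 2: explicit slider values always override inferred ones."""
    return {**inferred, **sliders}


def match(profile, dna):
    """Assumed scoring rule: 1 minus the mean absolute gap over shared axes."""
    axes = profile.keys() & dna.keys()
    if not axes:
        return 0.0
    return 1 - sum(abs(profile[a] - dna[a]) for a in axes) / len(axes)


def recommend(inferred, sliders, catalogue, pouring_nearby):
    """Layer 4: rank only the beers seen recently at nearby venues."""
    profile = effective_profile(inferred, sliders)
    candidates = [b for b in catalogue if b.name in pouring_nearby]
    return sorted(candidates, key=lambda b: match(profile, b.dna), reverse=True)


catalogue = [
    Beer("Hypothetical IPA", {"Bitter": 0.8, "Sweet": 0.5}),
    Beer("Hypothetical Stout", {"Bitter": 0.3, "Sweet": 0.6}),
    Beer("Hypothetical Gueuze", {"Bitter": 0.2, "Sweet": 0.1}),  # not pouring nearby
]
inferred = {"Bitter": 0.2, "Sweet": 0.6}   # soft profile from swipes
sliders = {"Bitter": 0.8}                  # the user disagrees about bitterness
nearby = {"Hypothetical IPA", "Hypothetical Stout"}

print([b.name for b in recommend(inferred, sliders, catalogue, nearby)])
print([b.name for b in recommend(inferred, {}, catalogue, nearby)])
```

With the slider set, the IPA ranks first; with no slider, the inferred profile puts the stout first. The gueuze never appears in either list because it is not on a nearby tap list.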

The Amstel case

In mid-May 2026, a quiet bug surfaced. A drinker whose profile pointed firmly at pale ales was being served Amstel as an 83% match. That should never have happened. Amstel is a pale lager, not a pale ale.

The trace took about an hour. The beer had been ingested with its style set to "Pale Ale" — the source data was simply wrong. PINtDEXTER's cross-family penalty existed and worked correctly, but it depended on a lookup from declared style to fermentation family. The style said pale ale. The lookup said ale. So the penalty for serving a lager to an ale drinker never fired.

The interesting part was that the system already had the right instinct. The LLM's own rationale described the beer as "a mainstream European lager-leaning pale, clean and restrained despite 'Pale Ale' label." The system knew. The recommender wasn't asking.

The fix was straightforward: give the LLM an explicit family axis and prefer that value when it disagrees with the style-derived one. Once that override existed, the score corrected immediately — not just for this beer, but for the broader class of category-drift errors scattered through the catalogue, from cider misfiled as fruit beer to lambic misfiled as Belgian ale.
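In outline, the override amounts to a few lines. The lookup table, the function name `effective_family`, and the family labels below are assumptions made for the sketch; the real mapping from style to family is not shown in this essay.

```python
# Hypothetical excerpt of a declared-style -> fermentation-family lookup.
STYLE_TO_FAMILY = {
    "Pale Ale": "ale",
    "Pale Lager": "lager",
    "Stout": "ale",
}


def effective_family(declared_style, llm_family=None):
    """Resolve a beer's fermentation family.

    The style-derived value is the default. When the LLM has scored a
    family and it disagrees with the style-derived one, the LLM value wins.
    """
    style_family = STYLE_TO_FAMILY.get(declared_style)
    if llm_family is not None and llm_family != style_family:
        return llm_family
    return style_family


# Before the fix: only the (wrong) declared style is consulted.
print(effective_family("Pale Ale"))            # ale
# After the fix: the LLM's family axis overrides the mislabelled style.
print(effective_family("Pale Ale", "lager"))   # lager
```

The same function covers the other category-drift cases without any special handling, as long as the family axis has a label for them.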

That is a small change that fixes a whole class of bug, which is about the nicest thing you can say about an engineering fix.

What we're not claiming

None of this means content-based recommendation is new, or that PINtPOINT invented LLM feature extraction. The contribution is narrower: feature extraction from the beer's own metadata rather than from community discussion, combined with a four-layer pipeline tied to what's been pouring recently at venues nearby — not a global catalogue, not a leaderboard.

Most beer recommenders predict ratings.
PINtPOINT picks the pint that's on the bar.

Frequently asked questions

What if the LLM gets a beer's profile wrong?

Three corrective layers sit on top. The cross-family penalty catches lager-pretending-to-be-ale errors. The TUNeDEXTER sliders let the user override the inferred preferences directly. And confidence influences scoring — low-signal beers (no specific cue from name or brewery) get treated more cautiously in ranking. Mis-extractions still happen, but they get caught by the architecture rather than escaping into the recommendation silently.

How quickly does a brand-new beer become recommendable?

Usually within a day or two of the beer being publicly listed somewhere — sometimes longer. The point is that the recommendation shape doesn't depend on community ratings accumulating first: once a beer is in our catalogue with style and ABV, the LLM can score it without waiting for anyone to write about it.

What are PINtDEXTER's four layers?

Layer 1 captures preference via Sip-or-Skip binary swipes and Head-to-Head pair rounds. Layer 2 lets the user override directly with five flavour sliders plus body and strength (TUNeDEXTER). Layer 3 is the per-beer LLM-inferred DNA, the content features. Layer 4 is the availability filter: it restricts recommendations to what's been pouring recently at venues near you.

What is the fermentation-family override?

Two beers with similar flavour numbers can taste fundamentally different if one is an ale and one is a lager — yeast character, finish dryness, and clean-versus-estery profile diverge sharply across fermentation families. PINtDEXTER applies a 0.5x penalty to cross-family matches. The family was originally derived from the beer's declared style; that broke when a beer's style was wrong (Amstel labelled "Pale Ale" but actually a pale lager). The LLM now scores fermentation family directly per beer, overriding the style-derived value when they disagree.
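As a rough worked example of the 0.5x penalty: the function and parameter names below are hypothetical, the way a drinker's preferred family is determined is not described here and is simply passed in, and skipping the penalty when either family is unknown is an assumption. The 0.83 input borrows the Amstel figure purely as an illustrative number.

```python
CROSS_FAMILY_PENALTY = 0.5  # multiplier stated in the text


def penalised_score(flavour_match, drinker_family, beer_family):
    """Halve the flavour match when the beer's family differs from the drinker's.

    If either family is unknown (None), no penalty is applied; that
    fallback is an assumption of this sketch.
    """
    if drinker_family is None or beer_family is None:
        return flavour_match
    if drinker_family != beer_family:
        return flavour_match * CROSS_FAMILY_PENALTY
    return flavour_match


# Mislabelled as an ale: the penalty never fires.
print(penalised_score(0.83, "ale", "ale"))
# Family corrected to lager: the same flavour match is halved.
print(penalised_score(0.83, "ale", "lager"))
```

A flavour match of 0.83 drops to 0.415 once the family is correct, which is enough to push a pale lager well down an ale drinker's list.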
