Essay · 20 May 2026 · Engineering · By Sophie Ro

Under the Hood: How PINtDEXTER Picks Your Pint

An earlier essay argued for predicting the pint rather than the rating. That piece was about the user side of the recommender — how PINtPOINT learns what you like. This one is the other side: how each beer gets a shape worth recommending. It comes down to four layers, a per-beer DNA, and the misclassified pale lager that exposed the most recent fix.

A recommender is only as good as what it knows about the thing it's recommending.

The default answer, and where it breaks on beer

The dominant approach for the last fifteen years has been collaborative filtering: factor a user-by-item rating matrix and predict the missing ratings. Netflix made it famous; the cleanest beer example is ŷhat's June 2013 item-item R prototype on 1.5M BeerAdvocate reviews.

The ŷhat post is honest about what makes the approach work. Their worked example pits Fat Tire against Michelob Ultra and Dale's Pale Ale: Fat Tire fans rate Dale's similarly and Michelob lower, and the recommender quantifies the difference. Intuitive — on those three beers. The catch is in the filter: they only considered beers with 500+ reviews, and the demo ran on the 20 most-reviewed beers in the dataset.

That threshold isn't a footnote, it's the design speaking. Collaborative filtering needs density. The UK alone releases more than a hundred new beers a week, and most never accumulate that volume. Cold start hits twice — new users have no history, new beers have no community signal — and the fallback in both cases is popularity, which is a ranked list of famous things rather than a recommendation.
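
To make the density requirement concrete, here is a minimal item-item sketch in Python. It is a generic illustration, not ŷhat's R code: the `ratings` layout, the function names, the beer names, and the toy numbers are all assumptions, and only the 500-review cutoff comes from the post.

```python
from math import sqrt

MIN_REVIEWS = 500  # the cutoff the ŷhat post applied


def review_count(ratings, beer):
    """Number of users who rated `beer`."""
    return sum(1 for user_ratings in ratings.values() if beer in user_ratings)


def item_similarity(ratings, a, b):
    """Cosine similarity between two beers over users who rated both.

    Returns None when no user rated both beers: the cold-start case.
    """
    shared = [u for u, r in ratings.items() if a in r and b in r]
    if not shared:
        return None
    dot = sum(ratings[u][a] * ratings[u][b] for u in shared)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in shared))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in shared))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


# Toy data: user -> {beer: rating}. All values are invented for illustration.
ratings = {
    "ann": {"established_a": 4.5, "established_b": 4.0},
    "bob": {"established_a": 4.0, "established_b": 4.5},
    "cat": {"established_a": 3.5},
}

print(item_similarity(ratings, "established_a", "established_b"))
print(item_similarity(ratings, "established_a", "new_release"))  # None
print(review_count(ratings, "new_release") >= MIN_REVIEWS)       # False
```

A beer nobody has rated yet has no co-raters, so the similarity is undefined however good the beer is. That is the gap a per-beer content profile is meant to fill.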

The other half of the field, suddenly cheap

The alternative is content-based recommendation: score items by their own features. Pandora's the canonical version — paid musicologists labelling every song before the recommender saw it. Beautiful in theory, historically prohibitive for a catalogue the size of beer.

There's a quiet ten-year arc worth naming. The ŷhat post said, of the BeerAdvocate review text: "the text does provide some excellent opportunities for analysis, we're going to focus only on the ratings." They threw it away. A decade later, Nishant Kushwaha's 2023 prototype went back and did exactly the analysis they'd skipped — feeding the review text through an LLM to extract attributes per beer. Clever, but the features still depended on how much the community had written about the beer. Cold start in disguise.

What changed in 2025–2026 is that asking an LLM to label the beer itself — not its discussion — became cheap enough to be a practical catalogue step rather than a research luxury. Pandora's labelling problem became a script that runs overnight, with no community discourse required.

What goes into a beer's DNA

Each row in the catalogue gets the following inferred features, plus the ABV and declared style that came from the source data:

Hoppy — aroma and flavour from hops (citrus, pine, tropical, floral). Not bitterness.
Bitter — palate bite from boil hops; the lingering, drying side of hop chemistry.
Sweet — residual sugar, lactose, caramel malt, fruit sweetness.
Fruity — juicy, estery, tropical, stone-fruit. Whether the source is hops, yeast, or adjuncts.
Roast — malt character of any colour: biscuit and toffee at the pale end, coffee and cocoa at the dark end.
Tart — sour or acidic, from wild fermentation or fruit puree.
Body — light, medium, or full.
Fermentation family — ale, lager, mixed, wild, cider, or mead. The one that matters most for cross-style mistakes.
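
One way to picture a catalogue row is as a small typed record. The sketch below is an assumption about shape, not PINtPOINT's actual schema: the class and field names are hypothetical, and so is the 0.0 to 1.0 scale for the flavour scores.

```python
from dataclasses import dataclass
from enum import Enum


class Family(Enum):
    ALE = "ale"
    LAGER = "lager"
    MIXED = "mixed"
    WILD = "wild"
    CIDER = "cider"
    MEAD = "mead"


class Body(Enum):
    LIGHT = "light"
    MEDIUM = "medium"
    FULL = "full"


FLAVOUR_FIELDS = ("hoppy", "bitter", "sweet", "fruity", "roast", "tart")


@dataclass(frozen=True)
class BeerDNA:
    # From the source data.
    style: str
    abv: float
    # LLM-inferred. The 0.0-1.0 scale is an assumption for this sketch.
    hoppy: float
    bitter: float
    sweet: float
    fruity: float
    roast: float
    tart: float
    body: Body
    family: Family

    def __post_init__(self):
        # Reject out-of-range scores early so bad extractions are visible.
        for name in FLAVOUR_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# A made-up porter-shaped row; the numbers are illustrative only.
example = BeerDNA(
    style="Porter", abv=5.0,
    hoppy=0.2, bitter=0.4, sweet=0.4, fruity=0.1, roast=0.9, tart=0.0,
    body=Body.FULL, family=Family.ALE,
)
```

Keeping `family` as its own field, separate from `style`, is what lets the two disagree and be reconciled later.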

Five axes become the visual signature in the app: Sweet, Sour, Body, Bitter, and Strength (the last derived from ABV). The TastePentagon renders them on a beer's detail card and in compact list views as a five-pointed silhouette built to be recognisable at thumbnail size.

The same five anchor a second view. The FlavourEqualizer swaps the silhouette for eight vertical columns: the five fixed axes (Bitter, Sweet, Sour, Body, ABV) plus three adaptive personality slots that float across hop, malt, and fruit families by score. A NEIPA might surface Citrusy, Tropical, and Juicy; a stout, Roasty, Chocolatey, and Malty. The pentagon answers "is this beer my shape?" at a glance; the equaliser answers it side by side, with vertical bars that compare cleanly when two beers sit next to each other in Head-to-Head. The four hop labels are a compressed view of the richer six-category teaching taxonomy covered separately in The Drinker's Guide to Hops.
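
The adaptive slots can be read as "top three descriptors by score". A minimal sketch, assuming each beer carries a dictionary of descriptor scores; the function name, the input layout, and the alphabetical tie-break are hypothetical.

```python
FIXED_AXES = ("Bitter", "Sweet", "Sour", "Body", "ABV")


def equalizer_columns(fixed, descriptors, slots=3):
    """Return the eight (label, score) columns for one beer.

    `fixed` maps each fixed axis to a score; `descriptors` maps
    hop, malt, and fruit descriptors to scores. The highest-scoring
    descriptors fill the adaptive slots; ties break alphabetically
    (an assumption made so the output is deterministic).
    """
    columns = [(axis, fixed[axis]) for axis in FIXED_AXES]
    ranked = sorted(descriptors.items(), key=lambda kv: (-kv[1], kv[0]))
    return columns + ranked[:slots]


# Illustrative scores for a NEIPA-shaped beer; the numbers are invented.
neipa_fixed = {"Bitter": 0.4, "Sweet": 0.5, "Sour": 0.1, "Body": 0.6, "ABV": 0.6}
neipa_descriptors = {
    "Citrusy": 0.9, "Tropical": 0.8, "Juicy": 0.7, "Roasty": 0.0, "Malty": 0.2,
}

columns = equalizer_columns(neipa_fixed, neipa_descriptors)
print([label for label, _ in columns])
```

Because the five fixed columns always come first and in the same order, two beers can be compared bar for bar even when their adaptive slots differ.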

Interactive · TastePentagon: Is this beer my shape? Example: a pale-led IPA with high hoppy, moderate bitter and fruity, low sweet, and no roast.

Interactive · Head-to-Head: side-by-side DNA comparison. Example pair: Landlord (Timothy Taylor's · Pale Ale - English · 4.3%) and The Porter (Anspach & Hobday · Porter · 6.7%). Both beers are drawn on the same five axes. Structural twins (e.g. two American IPAs) line up across the rows; cross-family pairs such as Landlord vs The Porter show the swap, with hop and fruit bars high on one side and roast bars high on the other.

The four layers, end to end

PINtDEXTER stacks four signals. Each answers a different question. Take any one out and recommendations get worse in a different direction.

1. Preference capture: Sip-or-Skip binary swipes and Head-to-Head pair rounds. Forced choices, not free-form ratings. Output: a soft style profile after about ten interactions.
2. Sliders (TUNeDEXTER): the user can take direct control of five flavour sliders plus body and strength when the inferred profile is wrong. Preference learning is suggestion, not law. The resolver applies that tuning per style family rather than globally, so a max-bitter / zero-fruit IPA tuning doesn't punish a Red Ale for being red-ale-shaped; cross-family beers score against per-family baselines instead. (Per-family slider UI is a future TUNeDEXTER slice; the resolver handles the scoping internally for now.)
3. Per-beer DNA: the LLM-scored content features above. Independent of any user. Cached per beer, re-run only when we explicitly backfill or invalidate.
4. Availability filter: recent tap-list activity at venues near you, drawn from Untappd's public feeds. The recommendation is grounded in a specific bar's recent pours, not a beer name plucked from a global catalogue.

Layers 1 and 2 are the user side. Layer 3 is the beer side. Layer 4 is the world side. The cross-product is what gets surfaced.
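
The world side can be sketched as a filter over an already-ranked list. This is an illustration under stated assumptions: the `Pour` record, the function name, and the 14-day window are hypothetical, and the sketch takes the list of nearby pours as given rather than showing how it is fetched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Pour:
    beer_id: str
    venue_id: str
    seen_at: datetime


def available_now(ranked_beer_ids, nearby_pours, now, window_days=14):
    """Keep only beers seen on a nearby tap list within the window.

    `ranked_beer_ids` is assumed to be sorted best match first, and the
    original order is preserved. `window_days` is an assumed default.
    """
    cutoff = now - timedelta(days=window_days)
    recent = {p.beer_id for p in nearby_pours if p.seen_at >= cutoff}
    return [beer_id for beer_id in ranked_beer_ids if beer_id in recent]


now = datetime(2026, 5, 20)
pours = [
    Pour("beer_a", "venue_1", now - timedelta(days=2)),
    Pour("beer_b", "venue_1", now - timedelta(days=30)),  # too old
]
print(available_now(["beer_c", "beer_a", "beer_b"], pours, now))  # ['beer_a']
```

Filtering after ranking keeps the three upstream layers independent of location: the same scores can be reused when the user moves to a different bar.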

When the source data is wrong

One category-drift bug worth flagging: a beer ingested with a wrong style tag (a pale lager labelled "Pale Ale") was bypassing PINtDEXTER's cross-family penalty because the family was derived from the style, not measured directly. The fix was to give the LLM an explicit family axis and trust it over the style-derived value when they disagree — a small change that corrects the specific case and a whole class of similar mislabels, from cider misfiled as fruit beer to lambic misfiled as Belgian ale.
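
In code, the fix amounts to a precedence rule followed by the penalty. A minimal sketch: the function names and the style-to-family table are hypothetical, while the 0.5x factor is the one quoted in the FAQ.

```python
from typing import Optional

CROSS_FAMILY_PENALTY = 0.5  # the 0.5x factor quoted in the FAQ

# Hypothetical lookup; a real catalogue would cover many more styles.
STYLE_TO_FAMILY = {
    "Pale Ale": "ale",
    "Pale Lager": "lager",
    "Fruit Beer": "ale",
    "Cider": "cider",
}


def effective_family(style: str, llm_family: Optional[str]) -> Optional[str]:
    """Prefer the LLM-scored family; fall back to the style-derived one."""
    if llm_family is not None:
        return llm_family
    return STYLE_TO_FAMILY.get(style)


def apply_cross_family_penalty(score: float, beer_family: Optional[str],
                               user_family: str) -> float:
    """Scale the score down when the beer sits outside the user's family."""
    if beer_family is not None and beer_family != user_family:
        return score * CROSS_FAMILY_PENALTY
    return score


# A pale lager ingested with the wrong style tag.
family = effective_family("Pale Ale", llm_family="lager")
print(family)                                          # lager
print(apply_cross_family_penalty(0.8, family, "ale"))  # 0.4
```

Before the fix, the same row would have resolved to "ale" from its style tag and kept the full 0.8.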

When the model overreaches

The other class of failure is the model being internally consistent but user-wrong. The recent example: a user tuned TUNeDEXTER to max bitter, zero fruit — expressing their West Coast IPA preference — and the recommender started rendering a 100% match on an American Pale Ale that happened to land at the same axis coordinates as the user's saved vector. The math was right; the framing was wrong. The user wasn't claiming to want max bitter / zero fruit from every beer. They were claiming it from IPAs.

The fix runs at the resolver. The user's flavour vector now applies only when the beer is in the user's dominant style family. Cross-family beers score against per-family baselines — a Red Ale gets resolved against a sensible Red Ale shape, a Porter against a Porter shape, and so on. The resolver supports per-family overrides internally (a later TUNeDEXTER slice will expose them as sliders); for now the baselines stand in. The end result: tuning hard for one style doesn't silently punish every other style the user might still enjoy.
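
The scoping rule can be sketched as a single lookup that decides which target vector a beer is scored against. The function name, argument layout, and baseline numbers below are assumptions, not the shipped resolver.

```python
from typing import Dict, Optional

Vector = Dict[str, float]


def resolve_target(beer_family: str,
                   dominant_family: str,
                   user_vector: Vector,
                   baselines: Dict[str, Vector],
                   overrides: Optional[Dict[str, Vector]] = None) -> Vector:
    """Pick the flavour vector a beer should be scored against.

    Order of precedence in this sketch:
    1. an explicit per-family override, if one exists;
    2. the user's tuned vector, but only inside their dominant family;
    3. the baseline shape for the beer's own family.
    A family with no baseline raises KeyError rather than silently
    falling back to the user's vector, which would reintroduce the bug.
    """
    if overrides and beer_family in overrides:
        return overrides[beer_family]
    if beer_family == dominant_family:
        return user_vector
    return baselines[beer_family]


# Illustrative numbers only.
west_coast_tuning = {"bitter": 1.0, "fruity": 0.0}
baselines = {
    "Red Ale": {"bitter": 0.4, "fruity": 0.3},
    "Porter": {"bitter": 0.4, "fruity": 0.1},
}

print(resolve_target("IPA", "IPA", west_coast_tuning, baselines))
print(resolve_target("Red Ale", "IPA", west_coast_tuning, baselines))
```

The IPA is judged against the user's max-bitter, zero-fruit tuning; the Red Ale is judged against a Red Ale shape.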

Three smaller honesty changes ship alongside. Match percentages are capped at 99%: the model can't know any human's taste to one-percent precision, so claiming a perfect match reads as an overclaim even when the math agrees. A worst-axis cap prevents a perfect body match from papering over a large flavour disagreement on any single axis. And tapping the match chip in the app now expands a per-axis breakdown, so a user looking at "40%" can see which axis the model thinks is misaligned without leaving the beer detail screen.
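
The three changes fit in a few lines. The scoring formula here is an assumption chosen for illustration (one minus the mean per-axis gap), as are the function name and the `WORST_AXIS_WEIGHT` constant; only the 99% ceiling comes from the text above.

```python
MAX_PERCENT = 99          # ceiling stated in the essay
WORST_AXIS_WEIGHT = 0.5   # hypothetical: how hard the worst axis caps the score


def match_percent(target, beer):
    """Return (percent, per-axis gaps) for two vectors scored on 0.0-1.0.

    The base score is 1 minus the mean absolute gap. The worst-axis cap
    then limits the score to 1 - WORST_AXIS_WEIGHT * largest gap, so one
    big disagreement cannot be averaged away by several perfect axes.
    """
    if not target:
        raise ValueError("target vector is empty")
    gaps = {axis: abs(target[axis] - beer[axis]) for axis in target}
    mean_score = 1.0 - sum(gaps.values()) / len(gaps)
    worst_axis_cap = 1.0 - WORST_AXIS_WEIGHT * max(gaps.values())
    score = min(mean_score, worst_axis_cap)
    percent = min(round(score * 100), MAX_PERCENT)
    return percent, gaps


target = {"bitter": 1.0, "fruity": 0.0, "body": 0.5}

# Identical vectors: the math says 100, the chip says 99.
print(match_percent(target, dict(target))[0])  # 99

# Perfect on bitter and body, opposite on fruity.
percent, gaps = match_percent(target, {"bitter": 1.0, "fruity": 1.0, "body": 0.5})
print(percent)  # 50; the uncapped mean would have given 67
print(gaps)     # the per-axis breakdown behind the chip
```

Returning the gaps alongside the percentage is what makes the per-axis breakdown cheap: the expanded chip shows numbers the score already computed.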

What we're not claiming

None of this means content-based recommendation is new, or that PINtPOINT invented LLM feature extraction. The contribution is narrower: feature extraction from the beer's own metadata rather than from community discussion, combined with a four-layer pipeline tied to what's been pouring recently at venues nearby — not a global catalogue, not a leaderboard.

Most beer recommenders predict ratings.
PINtPOINT picks the pint that's on the bar.

Frequently asked questions

What if the LLM gets a beer's profile wrong?

Four corrective layers sit on top. The cross-family penalty catches lager-pretending-to-be-ale errors. The TUNeDEXTER sliders let the user override the inferred preferences directly. The style-contextual resolver scopes the override to the user's dominant style family so cross-family beers aren't punished for being a different shape. And confidence influences scoring — low-signal beers (no specific cue from name or brewery) get treated more cautiously in ranking. Mis-extractions still happen, but they get caught by the architecture rather than escaping into the recommendation silently.

How quickly does a brand-new beer become recommendable?

Usually within a day or two of the beer being publicly listed somewhere — sometimes longer. The point is that the recommendation shape doesn't depend on community ratings accumulating first: once a beer is in our catalogue with style and ABV, the LLM can score it without waiting for anyone to write about it.

What are PINtDEXTER's four layers?

Layer 1 captures preference via Sip-or-Skip binary swipes and Head-to-Head pair rounds. Layer 2 lets the user override directly with five flavour sliders plus body and strength (TUNeDEXTER); the resolver applies that tuning per style family rather than globally, so the override sticks to the user's dominant style without spilling onto unrelated styles (per-family slider UI is a future slice). Layer 3 is the per-beer LLM-inferred DNA — the content features. Layer 4 scores those features against what's been pouring recently at venues near you.

What is the fermentation-family override?

The LLM scores fermentation family directly per beer, overriding the style-derived family when the two disagree — so a 0.5x cross-family penalty still fires on rows where the source style was wrong.

Source material:

Beer Recommendation Systems: What Most Get Wrong — the earlier PINtPOINT essay this follows up.
A Beer Recommendation System Made With R — David Smith / ŷhat, June 2013. Item-item collaborative filtering on 1.5M BeerAdvocate reviews; ratings only.
Beer Recommendation Engine: Harnessing the Power of LLMs and Embeddings — Nishant Kushwaha, 2023. LLM-based attribute extraction from community review text.
