Essay · 29 March 2026 · Product · By Sophie Ro

What's Wrong With Beer Recommendation Systems (And What PINtPOINT Does Differently)

Q: Why is the cold start problem so hard for beer recommenders?

Collaborative-filtering recommenders like the SVD++ model in Ninkasi or typical Untappd-rating models need dozens of ratings from a user before their predictions converge. For a new drinker opening the app, there's no signal to work with, so the system falls back to popular beers — which is exactly the least personalised thing you could recommend. PINtPOINT sidesteps this with Sip-or-Skip: a structured preference-elicitation flow where ~10 quick taps produce a usable style profile from day one.

Q: What's wrong with using Untappd ratings as the main input to a recommender?

Untappd ratings are a great record of what someone drank, but they're a biased sample. Heavily rated beers (popular releases, hype cans) dominate the dataset, which means a model trained on them learns popularity more than preference. You end up recommending Pliny the Elder to anyone with a vaguely IPA-shaped history. A recommender built on purpose-collected preference signals — structured comparisons, not reviews of random things — produces sharper personalisation.

Q: How does PINtPOINT's PINtDEXTER recommender work?

PINtDEXTER layers three signals. First, Sip-or-Skip and Head-to-Head pair-choice rounds build a fast style profile by asking the user to make forced choices (A or B) rather than rate individual beers on a 1-5 scale. Second, that profile sits behind a 13-style preference layer plus direct flavour, body and strength tuning the user can adjust — the resolver applies that tuning per style family rather than globally. Third, recent tap activity at nearby venues is scored against that profile, so recommendations stay tied to beer recently seen on tap near you rather than a global catalogue.

Q: Is a good beer recommender even possible given how noisy ratings are?

It depends what you're trying to predict. A model that tries to predict star ratings from other people's star ratings will hit the noise floor quickly — RateBeer and Untappd both have heavy reviewer bias, style preferences, and context effects. A model that tries to answer 'of the four-or-so pints recently seen on tap at this bar, which one are you most likely to enjoy' is a narrower, more tractable problem. Narrow the question and the recommender gets sharper.

Q: How is this different from Next Glass, BeerMenus, or Untappd's own recommendations?

Next Glass (which merged with Untappd in 2016) and other rating-driven systems optimise for cataloguing and historical preference. PINtPOINT optimises for the real-time decision at a specific venue. The two are complementary: Untappd is the diary, PINtPOINT is the decision engine.

Every couple of years, a smart data-science student builds a beer recommender on top of Untappd or RateBeer data and posts the write-up. The models are often solid. The question they rarely answer is simpler: is this useful to someone standing at a bar tonight?

The best beer recommendation system is useless
if it ignores what's actually pouring tonight.

Two pieces are worth reading as a starting point:

Ethan Haley — Untappd as a recommender (RPubs, 2021)
NYC Data Science — NINKASI: Beer Recommender System (2020)

Haley's piece treats Untappd's rating data as fuel for a collaborative-filtering recommender. Ninkasi scrapes RateBeer, runs SVD++ and Restricted Boltzmann Machines, and wraps the output in a Flask app. Both are careful, honest work — the kind you wish more consumer-app companies did publicly.

They also both bump into the same three walls, and those walls are what PINtPOINT was designed around rather than over.

Wall 1: The cold-start problem

Collaborative filtering needs history. These models need dozens of ratings per user before predictions stabilise. What does a rating-based recommender serve to a brand-new user with zero check-ins?

In practice: popular beers. Pliny the Elder for anyone with "IPA" in their signature. Guinness for anyone who once rated a stout. That's not a recommendation. It's a bestseller list.

PINtPOINT uses a structured preference-elicitation step instead: a feature called Sip-or-Skip. Ten quick card-swipes ("would you order this? yes/no") produce a usable style profile in under a minute. It's the same principle Tinder uses for matchmaking: forced binary choices beat free-form ratings for fast signal.

Asking "rate this beer 1-5" is a harder cognitive task than "would you order this now?" — and it produces noisier data.
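As a rough illustration of how a handful of yes/no swipes can become a style profile, here is a minimal sketch. It is not PINtPOINT's actual implementation: the function name, the style labels, and the smoothing toward a neutral 0.5 are all assumptions made for the example.

```python
from collections import defaultdict

def build_style_profile(swipes, prior_weight=1.0):
    """Turn (style, would_order) swipes into a per-style score between 0 and 1.

    The score is a smoothed yes-rate, so a single swipe never pins a style
    to exactly 0 or 1. The neutral 0.5 prior is an assumption of this sketch.
    """
    yes = defaultdict(float)
    total = defaultdict(float)
    for style, would_order in swipes:
        total[style] += 1
        if would_order:
            yes[style] += 1
    return {
        style: (yes[style] + 0.5 * prior_weight) / (total[style] + prior_weight)
        for style in total
    }

# Ten hypothetical swipes from a brand-new user.
swipes = [
    ("ipa", True), ("ipa", True), ("stout", False), ("sour", True),
    ("lager", False), ("stout", False), ("ipa", False), ("sour", True),
    ("lager", True), ("porter", False),
]
profile = build_style_profile(swipes)
```

Even with ten swipes the profile already separates styles the user keeps accepting from ones they keep skipping, which is all a first-session ranking needs.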

Wall 2: Popularity bias

Any model trained on open rating data learns what's popular long before it learns what's personal. RateBeer and Untappd both have a heavy head: the top 1% of beers collect 50%+ of ratings. A recommender trained on that data is, mathematically, a popularity predictor disguised as personalisation.
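To check how heavy the head is in any ratings dataset, the share of ratings collected by the top slice of beers can be computed directly. The sketch below uses made-up counts, not RateBeer or Untappd data, and the function name is hypothetical.

```python
def head_share(rating_counts, top_fraction=0.01):
    """Fraction of all ratings that go to the top `top_fraction` of beers."""
    counts = sorted(rating_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always include at least one beer in the head.
    head_size = max(1, round(len(counts) * top_fraction))
    return sum(counts[:head_size]) / total

# Toy catalogue: 2 hype beers and 198 long-tail beers (illustrative numbers only).
counts = [5000, 3000] + [40] * 198
share = head_share(counts, top_fraction=0.01)
```

In this toy catalogue the top 1% of beers (two of 200) collect just over half of all ratings, which is the shape of distribution the paragraph above describes.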

The bias isn't just statistical, it's stylistic. Zach Mack's 2016 Thrillist piece looked at it from the other side of the bar. Mack — a Certified Cicerone — found that 41 of RateBeer's 50 top-ranked beers worldwide were imperial stouts, porters, or imperial IPAs. Three categories out of the 100+ the BJCP recognises.

As Mack puts it: "It's why most novice beer drinkers assume there's something wrong with them for not being obsessed with hops or bold, boozy stouts." Rating-driven recommenders inherit that bias by default.

This isn't a new problem. The same year Mack was tallying RateBeer's top 50, data scientist and beer judge Will Chernetsky built a recommendation engine using the descriptive language in 700,000 reviews across 20,000 beers — TF-IDF plus latent semantic analysis — rather than ratings or style labels. It could distinguish fruity IPAs from resinous ones and find meaningful similarities across traditional style boundaries. But Chernetsky landed on the same residual problem: even an accurate similarity match isn't useful if the beer isn't available nearby.

This shows up in practice as a kind of ceiling. Haley's model and Ninkasi can both predict ratings well — but "predict rating" and "predict the pint you'll actually enjoy ordering next" are different tasks. Predicting ratings is already a mature problem. Predicting the pint someone will actually want next is much less solved.

PINtPOINT's approach is to use pair-choice rounds (Head-to-Head) where both options are plausible for the user — forcing the model to learn genuine preference gradients, not just which beer is on the hype cycle this month.
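One common way to learn from forced A-or-B choices is a Bradley-Terry-style update, where each pick nudges the winner's score up and the loser's down, with smaller nudges once the preference is already clear. The sketch below uses that swapped-in technique purely as an illustration; it is not a description of how Head-to-Head is implemented, and the style names and learning rate are assumptions.

```python
import math

def head_to_head_update(scores, winner, loser, learning_rate=0.5):
    """Apply one Bradley-Terry-style step after the user picks `winner` over `loser`."""
    w = scores.get(winner, 0.0)
    l = scores.get(loser, 0.0)
    # Probability the current scores already assign to this outcome.
    p_win = 1.0 / (1.0 + math.exp(l - w))
    # Surprising outcomes (low p_win) move the scores more.
    step = learning_rate * (1.0 - p_win)
    updated = dict(scores)
    updated[winner] = w + step
    updated[loser] = l - step
    return updated

# A hypothetical user picks the hazy IPA three rounds in a row.
scores = {"west_coast_ipa": 0.0, "hazy_ipa": 0.0}
for _ in range(3):
    scores = head_to_head_update(scores, winner="hazy_ipa", loser="west_coast_ipa")
```

Because both options are plausible for the user, each round carries information about the gradient between two liked styles rather than about overall popularity.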

Wall 3: Geography-blindness

The biggest practical problem with nearly every public beer recommender is that it doesn't know, and doesn't try to know, what you can actually order right now.

Ninkasi's output is: "you'll probably like Westmalle Tripel." Great. The bar in front of you is a 30-tap craft garden in Shoreditch. Which of those 30 taps is actually the Westmalle-flavoured experience? The model can't say, because it only knows beers, not venues.

PINtPOINT scores your style profile against recent tap activity at pubs near you. The recommendation engine runs on beer recently seen on tap, not a global catalogue:

You browse nearby venues, or pull a specific one to refresh
The app fetches that venue's recent tap activity (cask + keg, freshness-indicated)
Your PINtDEXTER profile scores each line
The top match gets flagged with a taste-match percentage

The unit of recommendation is a pint recently seen at a bar you can reach — not an abstract beer name you'll screenshot and forget. We don't claim guaranteed live stock; we claim the right shape of recommendation for the venue-bound question.
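The scoring step can be sketched as a distance between a drinker's profile and each beer on a venue's tap list. The version below is a simplified stand-in, not the app's real formula: it assumes 0-1 values on the six flavour axes, uses plain Euclidean distance, and the tap names and profiles are invented for the example.

```python
import math

# Axis names follow the six BEER DNA axes named later in this piece; the 0-1 scale is assumed.
AXES = ("hoppy", "bitter", "sweet", "fruity", "roast", "tart")

def match_percent(profile, beer_axes):
    """Map flavour distance to a 0-100 score, where 100 means identical axes."""
    distance = math.sqrt(sum((profile[a] - beer_axes[a]) ** 2 for a in AXES))
    max_distance = math.sqrt(len(AXES))  # farthest apart two points can be on 0-1 axes
    return round(100 * (1 - distance / max_distance))

def rank_tap_list(profile, tap_list):
    """Return (match %, tap name) pairs for one venue, best match first."""
    scored = [(match_percent(profile, axes), name) for name, axes in tap_list.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical profiles and a three-line tap list.
hop_head = dict(zip(AXES, (0.9, 0.8, 0.2, 0.5, 0.1, 0.1)))
stout_fan = dict(zip(AXES, (0.2, 0.4, 0.6, 0.1, 0.9, 0.0)))
tap_list = {
    "Tap 1 (pale ale)": dict(zip(AXES, (0.8, 0.7, 0.2, 0.6, 0.1, 0.1))),
    "Tap 2 (dry stout)": dict(zip(AXES, (0.2, 0.5, 0.3, 0.1, 0.9, 0.0))),
    "Tap 3 (fruit sour)": dict(zip(AXES, (0.1, 0.1, 0.4, 0.9, 0.0, 0.9))),
}
hop_ranking = rank_tap_list(hop_head, tap_list)
stout_ranking = rank_tap_list(stout_fan, tap_list)
```

The same tap list produces a different top pick for each profile, and the ranking only ever contains beers recently seen at that venue.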

Interactive · The thesis figure

Same bar. Different drinkers. Different top pints.

Below: a real tap list from Chelmsford Brewing Company (10 beers pouring as of this morning) scored against four reader profiles. Tap a profile, watch the ranking re-sort. The same beer doesn't win every time. That's the geography-blindness fix.

The beers are real, and so are the LLM-inferred axes (the same data the live app uses). Match % uses the recommender's flavour-distance formula, simplified for inline use on the blog. The percentages shift and the top pick changes. That's the point.

What a good beer recommender is actually solving for

Haley and Ninkasi are both solving "predict this user's rating of beer X." That's a well-defined ML task. It trains, it evaluates, it publishes nicely.

The consumer-facing question is different:

"Of the pints I can order right now, which should I pick?"

That's a smaller problem — and a more tractable one. You don't need to predict every beer in the world. You need to rank the 8-20 beers on the bar in front of the user, with enough preference signal to be confidently better than "just pick the IPA". Once the problem is scoped that way, cold start, popularity bias, and geography-blindness all shrink.

How PINtDEXTER layers the signals

The engine inside PINtPOINT is called PINtDEXTER. It combines:

Sip-or-Skip — fast binary swipes for cold-start signal
Head-to-Head — pair rounds that sharpen an existing taste profile
Preferred styles — an interpretable layer across 13 style families
TUNeDEXTER sliders — manual overrides when the user disagrees
Safe / Adventurous toggle — a single dial that tilts venue scoring toward focused taprooms or wide-range ones
Venue-aware ranking — recommendations scored against recent tap activity at nearby venues

The layered structure is deliberate: the user can see why the app recommended what it did, and correct it if they want.

BEERS LIKE THIS — twin-finding from the full BEER DNA

Every beer in the catalogue gets an LLM-inferred BEER DNA: six main axes (hoppy, bitter, sweet, fruity, roast, tart), body, an ABV bucket, a fermentation family, and a personality across five sub-families (hop · malt · fruit · yeast · finish). Two beers count as structural twins when their full DNA aligns — not just the broad-axis recipe, but the personality vocabulary the equaliser is showing the user too. A Belgian pale and an English pale that score the same on the six broad axes are no longer treated as identical, because the personality picks (estery vs caramel, soft vs crisp) are read into the match.

The detail screen surfaces this directly. Underneath the equaliser, a BEERS LIKE THIS section lists up to five neighbours: MATCH chips for strict twins (same family + style + recipe + ABV bucket + tiny personality drift); CLOSE chips for full-vector neighbours that don't quite clear the MATCH bar but sit in the same drinker-affinity lane. Weak candidates are dropped entirely — the section never pads to five if the catalogue doesn't have honest neighbours. Each row carries the brewery, style and ABV, plus a "📍 0.4mi · on tap at Venue Name" line when the beer is pouring within range.
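The MATCH / CLOSE distinction can be sketched as two thresholds over the DNA described above. Everything specific here is an assumption for illustration: the field names, the tolerances, the ABV bucket labels, and the use of Jaccard distance for "personality drift" are not taken from the app.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BeerDNA:
    family: str             # fermentation family
    style: str
    abv_bucket: str         # hypothetical bucket label, e.g. "4-5%"
    axes: tuple             # hoppy, bitter, sweet, fruity, roast, tart on an assumed 0-1 scale
    body: float
    personality: frozenset  # picks across hop / malt / fruit / yeast / finish

def personality_drift(a, b):
    """Jaccard distance between personality picks: 0 is identical, 1 is disjoint."""
    union = a.personality | b.personality
    if not union:
        return 0.0
    return 1 - len(a.personality & b.personality) / len(union)

def recipe_distance(a, b):
    """Largest gap on any broad axis or on body."""
    return max(abs(x - y) for x, y in zip(a.axes + (a.body,), b.axes + (b.body,)))

def chip_for(a, b, recipe_tol=0.1, match_drift=0.34, close_tol=0.25, close_drift=0.67):
    """Return "MATCH", "CLOSE", or None for a weak candidate that gets dropped."""
    same_frame = (a.family, a.style, a.abv_bucket) == (b.family, b.style, b.abv_bucket)
    dist = recipe_distance(a, b)
    drift = personality_drift(a, b)
    if same_frame and dist <= recipe_tol and drift <= match_drift:
        return "MATCH"
    if dist <= close_tol and drift <= close_drift:
        return "CLOSE"
    return None

pale = (0.5, 0.5, 0.4, 0.3, 0.1, 0.0)
english_a = BeerDNA("ale", "english pale", "4-5%", pale, 0.5, frozenset({"caramel", "crisp", "earthy"}))
english_b = BeerDNA("ale", "english pale", "4-5%", (0.5, 0.55, 0.4, 0.3, 0.1, 0.0), 0.5,
                    frozenset({"caramel", "crisp", "earthy"}))
belgian = BeerDNA("ale", "belgian pale", "4-5%", pale, 0.5, frozenset({"estery", "crisp", "earthy"}))
stout = BeerDNA("ale", "stout", "4-5%", (0.2, 0.5, 0.3, 0.1, 0.9, 0.0), 0.8, frozenset({"coffee", "dry"}))
```

In this toy data the two English pales come out as a MATCH, the Belgian pale with identical broad axes is only CLOSE because its personality picks differ, and the stout is dropped rather than padded in.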

The section title carries a rarity badge that matches the chip definition: "Unique in the catalogue" when no twin or cousin exists, "No exact matches" when there are only CLOSE cousins, "Rare DNA shape" for one or two matches, "N share this DNA" for a small cluster, "Frequent DNA shape" at fifty-plus. A drinker landing on a beer learns three things at once: how rare its DNA is, who else shares it, and where the closest cousin is pouring tonight. It's the lookup key Chernetsky's 2016 cosine-similarity engine couldn't quite reach — interpretable, grounded in tap activity instead of a global catalogue, and honest about its uncertainty when the catalogue runs thin.
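The badge rules read naturally as a small decision function. The sketch below follows the wording above; the function name is hypothetical, and treating "a small cluster" as three to forty-nine matches is an assumption inferred from the neighbouring thresholds.

```python
def rarity_badge(match_count, close_count):
    """Pick the section badge from the number of MATCH twins and CLOSE cousins."""
    if match_count == 0 and close_count == 0:
        return "Unique in the catalogue"
    if match_count == 0:
        return "No exact matches"      # only CLOSE cousins exist
    if match_count <= 2:
        return "Rare DNA shape"
    if match_count >= 50:
        return "Frequent DNA shape"
    return f"{match_count} share this DNA"  # assumed range: 3 to 49 matches
```

For example, `rarity_badge(0, 3)` gives "No exact matches" and `rarity_badge(12, 4)` gives "12 share this DNA".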

What we're not claiming

PINtDEXTER isn't a breakthrough in recommender research. It's not going to out-predict SVD++ on a RateBeer leaderboard. Those models and the people who build them are doing harder work than what's in a consumer app.

What we think it offers is a better answer to the practical question a drinker is actually asking: what should I get? Framed as a venue-bound, tonight-shaped problem, the recommender gets to use much smaller, cleaner inputs and deliver a sharper output.

Where the public recommender projects are still valuable

Honest respect to both linked pieces. A few things they get exactly right:

Ratings data does have real structure. Style co-occurrences and per-user bias are picked up cleanly by matrix-factorisation approaches like SVD++.
RateBeer and Untappd are generous corpora by industry standards — the beer world is unusually open about its ratings.
Transparent methodology + working Flask/R web apps + public notebooks is a great cultural norm. The models linked above could be dropped into a real product with modest effort.

If you're interested in the ML side of beer recommendation specifically, both links in the intro are worth reading end-to-end. If you're interested in what to drink tonight, PINtPOINT will answer that question faster.

Most beer recommenders predict ratings.
PINtPOINT picks pints.

Frequently asked questions

Why is the cold start problem so hard for beer recommenders?

Collaborative-filtering models like SVD++ or RBMs need dozens of ratings per user before predictions stabilise. A brand-new user gets "popular beers" as a fallback, which isn't personalisation — it's a leaderboard. PINtPOINT solves this with Sip-or-Skip: ~10 binary card swipes yield a usable style profile on first use.

What's wrong with using Untappd ratings as the main input to a recommender?

Untappd ratings are a biased sample — heavy-hitting releases collect disproportionate ratings, so a model trained on them learns popularity more than preference. Recommenders built on purpose-collected pair-choice signals, rather than free-form reviews, produce sharper personalisation.

How does PINtPOINT's PINtDEXTER recommender work?

Three layers: (1) Sip-or-Skip + Head-to-Head pair rounds build a style profile from binary choices; (2) that profile sits behind a 13-style preference layer plus direct flavour, body and strength tuning the user can adjust — the resolver applies that tuning per style family rather than globally; (3) recent tap activity at nearby venues is scored against the profile, so recommendations stay tied to beer recently seen on tap near you.

Why do most beer recommenders ignore location?

Because the training data doesn't include it, or the system is built as a catalogue recommender (what beers exist) rather than an availability recommender (beer recently seen on tap near you). A recommendation for a beer you'll never order isn't really a recommendation.

Is a good beer recommender even possible given how noisy ratings are?

It depends what you're predicting. Predicting star ratings from other star ratings hits a noise floor fast. Predicting "of the four-or-so pints recently seen on tap at this bar, which one will you enjoy most" is a narrower, more tractable problem.

How is this different from Next Glass, BeerMenus, or Untappd's own recommendations?

Next Glass (which merged with Untappd in 2016) and similar rating-driven systems optimise for catalogue-style historical preference. PINtPOINT optimises for the real-time decision at a specific venue. The two are complementary: Untappd is the diary, PINtPOINT is the decision engine.

Source material referenced:

Ethan Haley, "Untappd as a recommender" (RPubs, 2021)
NYC Data Science, "NINKASI: Beer Recommender System" (NYC Data Science blog, 2020)
Ninkasi beer recommender app (Internet Archive)

If you're a data scientist, go read the linked work — it's good. If you're a beer drinker, try PINtPOINT and let PINtDEXTER pick your next pint based on recent tap activity at the bar in front of you, not a global leaderboard.