Essay · 29 March 2026 · Product

What's Wrong With Beer Recommendation Systems (And What PINtPOINT Does Differently)

Every couple of years, a smart data-science student builds a beer recommender on top of Untappd or RateBeer data and posts the write-up. The models are often solid. The question they rarely answer is simpler: is this useful to someone standing at a bar tonight?

The best beer recommendation system is useless
if it ignores what's actually pouring tonight.

Two pieces are worth reading as a starting point.

Haley's piece treats Untappd's rating data as fuel for a collaborative-filtering recommender. Ninkasi scrapes RateBeer, runs SVD++ and Restricted Boltzmann Machines, and wraps the output in a Flask app. Both are careful, honest work — the kind you wish more consumer-app companies did publicly.

They also both bump into the same three walls, and those walls are what PINtPOINT was designed around rather than over.

Wall 1: The cold-start problem

Collaborative filtering needs history. These models need dozens of ratings per user before predictions stabilise. What does a rating-based recommender serve to a brand-new user with zero check-ins?

In practice: popular beers. Pliny the Elder for anyone with "IPA" in their signature. Guinness for anyone who once rated a stout. That's not a recommendation. It's a bestseller list.

PINtPOINT uses a structured preference-elicitation step instead: a feature called Sip-or-Skip. Ten quick card-swipes ("would you order this? yes/no") produce a usable style profile in under a minute. It's the same principle Tinder uses for matchmaking: forced binary choices beat free-form ratings for fast signal.

Asking "rate this beer 1-5" is a harder cognitive task than "would you order this now?" — and it produces noisier data.
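As a rough illustration of how a handful of yes/no swipes could become a style profile, here is a minimal sketch. It is not PINtPOINT's actual implementation: the function name `build_style_profile`, the style labels, and the choice of Laplace smoothing are all assumptions made for this example.

```python
from collections import defaultdict


def build_style_profile(swipes):
    """Turn (style, would_order) swipes into a per-style score in [0, 1].

    Assumption: Laplace smoothing (one imaginary "yes" and one imaginary
    "no" per style) so a single swipe never pins a style to exactly 0 or 1.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for style, would_order in swipes:
        total[style] += 1
        if would_order:
            yes[style] += 1
    return {style: (yes[style] + 1) / (total[style] + 2) for style in total}


# Ten hypothetical swipes, roughly one onboarding session.
swipes = [
    ("ipa", True), ("ipa", True), ("stout", False), ("sour", True),
    ("lager", False), ("stout", False), ("ipa", False), ("porter", True),
    ("sour", True), ("lager", True),
]
profile = build_style_profile(swipes)
```

With only ten swipes each style gets one to three observations, which is why some form of smoothing (or a prior) is likely needed to keep early scores from swinging to the extremes.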

Wall 2: Popularity bias

Any model trained on open rating data learns what's popular long before it learns what's personal. RateBeer and Untappd both have a heavy head: the top 1% of beers collect 50%+ of ratings. A recommender trained on that data is, mathematically, a popularity predictor disguised as personalisation.
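The "heavy head" can be measured directly on any ratings dump. The sketch below computes the share of ratings collected by the top fraction of beers; the function name `head_share` and the toy counts are made up for illustration and are not real Untappd or RateBeer figures.

```python
import math


def head_share(rating_counts, top_fraction=0.01):
    """Share of all ratings collected by the top `top_fraction` of beers."""
    counts = sorted(rating_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always include at least one beer in the head.
    head_size = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:head_size]) / total


# Made-up counts: 2 heavily rated beers and 198 long-tail beers.
toy_counts = [5000, 3000] + [40] * 198
share = head_share(toy_counts)
```

In this toy set the top 1% (two beers out of 200) holds just over half of all ratings, which is the shape of distribution described above.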

The bias isn't just statistical; it's stylistic. Zach Mack's 2016 Thrillist piece looked at it from the other side of the bar. Mack, a Certified Cicerone, found that 41 of RateBeer's 50 top-ranked beers worldwide were imperial stouts, porters, or imperial IPAs. Three categories out of the 100+ the BJCP recognises.

As Mack puts it: "It's why most novice beer drinkers assume there's something wrong with them for not being obsessed with hops or bold, boozy stouts." Rating-driven recommenders inherit that bias by default.

This isn't a new problem. The same year Mack was tallying RateBeer's top 50, data scientist and beer judge Will Chernetsky built a recommendation engine using the descriptive language in 700,000 reviews across 20,000 beers — TF-IDF plus latent semantic analysis — rather than ratings or style labels. It could distinguish fruity IPAs from resinous ones and find meaningful similarities across traditional style boundaries. But Chernetsky landed on the same residual problem: even an accurate similarity match isn't useful if the beer isn't available nearby.

This shows up in practice as a kind of ceiling. Haley's model and Ninkasi can both predict ratings well — but "predict rating" and "predict the pint you'll actually enjoy ordering next" are different tasks. Predicting ratings is already a mature problem. Predicting the pint someone will actually want next is much less solved.

PINtPOINT's approach is to use pair-choice rounds (Head-to-Head) where both options are plausible for the user — forcing the model to learn genuine preference gradients, not just which beer is on the hype cycle this month.
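One common way to learn from pair choices is a logistic (Bradley-Terry style) update on feature weights. The sketch below assumes that approach purely for illustration; the source does not say which model Head-to-Head uses, and the function name, feature names, and learning rate are hypothetical.

```python
import math


def update_from_choice(weights, chosen, rejected, lr=0.5):
    """One logistic (Bradley-Terry style) update from a single pair choice.

    `weights` maps feature -> preference weight. `chosen` and `rejected`
    map feature -> value for the two beers shown in the round.
    """
    features = set(chosen) | set(rejected)
    diff = {f: chosen.get(f, 0.0) - rejected.get(f, 0.0) for f in features}
    margin = sum(weights.get(f, 0.0) * d for f, d in diff.items())
    p_chosen = 1.0 / (1.0 + math.exp(-margin))
    # Gradient step on the log-likelihood: the more surprised the model
    # was by the choice, the larger the update.
    new_weights = dict(weights)
    for f, d in diff.items():
        new_weights[f] = new_weights.get(f, 0.0) + lr * (1.0 - p_chosen) * d
    return new_weights


# Two plausible options for a hop lover; only the differing traits matter.
hazy_ipa = {"hoppy": 1.0, "fruity": 1.0}
west_coast_ipa = {"hoppy": 1.0, "bitter": 1.0}
weights = update_from_choice({}, chosen=hazy_ipa, rejected=west_coast_ipa)
```

Because both beers share the "hoppy" trait, it cancels out of the update and only "fruity" versus "bitter" moves. That is the sense in which pairing two plausible options teaches the model a preference gradient rather than a popularity signal.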

Wall 3: Geography-blindness

The biggest practical problem with nearly every public beer recommender is that it doesn't know, and doesn't try to know, what you can actually order right now.

Ninkasi's output is: "you'll probably like Westmalle Tripel." Great. The bar in front of you is a 30-tap craft garden in Shoreditch. Which of those 30 taps is actually the Westmalle-flavoured experience? The model can't say, because it only knows beers, not venues.

PINtPOINT scores your style profile against recent tap activity at pubs near you. The recommendation engine runs on beer recently seen on tap, not a global catalogue:

  1. You browse nearby venues, or pull a specific one to refresh
  2. The app fetches that venue's recent tap activity (cask + keg, freshness-indicated)
  3. Your PINtDEXTER profile scores each line
  4. The top match gets flagged with a taste-match percentage

The unit of recommendation is a pint recently seen at a bar you can reach — not an abstract beer name you'll screenshot and forget. We don't claim guaranteed live stock; we claim the right shape of recommendation for the venue-bound question.
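The four steps above can be sketched as a small ranking function over a venue's recent tap lines. This is an illustrative sketch under stated assumptions, not PINtDEXTER itself: the `TapLine` fields, the seven-day freshness cut-off, the neutral default score, and the beer names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TapLine:
    beer: str
    style: str
    days_since_seen: int  # how long ago this beer was last seen on tap


def rank_taps(profile, taps, max_age_days=7, default_score=0.5):
    """Rank recently seen tap lines by taste match, best first.

    Returns (beer, match_percent) pairs. Lines older than `max_age_days`
    are dropped; styles missing from the profile get `default_score`.
    """
    fresh = [t for t in taps if t.days_since_seen <= max_age_days]
    scored = [
        (t.beer, round(100 * profile.get(t.style, default_score)))
        for t in fresh
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical style profile and tap list for one venue.
profile = {"ipa": 0.6, "stout": 0.25, "sour": 0.75}
taps = [
    TapLine("Example Pale", "ipa", 1),
    TapLine("Example Stout", "stout", 0),
    TapLine("Example Gose", "sour", 2),
    TapLine("Example Bitter", "bitter", 3),
    TapLine("Stale Keg", "sour", 30),
]
ranking = rank_taps(profile, taps)
top_beer, top_match = ranking[0]
```

The candidate set is only what was recently seen at the venue, so the ranking never surfaces a beer the user cannot plausibly order there, and stale sightings drop out instead of being presented as current.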

What a good beer recommender is actually solving for

Haley and Ninkasi are both solving "predict this user's rating of beer X." That's a well-defined ML task. It trains, it evaluates, it publishes nicely.

The consumer-facing question is different:

"Of the pints I can order right now, which should I pick?"

That's a smaller problem — and a more tractable one. You don't need to predict every beer in the world. You need to rank the 8-20 beers on the bar in front of the user, with enough preference signal to be confidently better than "just pick the IPA". Once the problem is scoped that way, cold start, popularity bias, and geography-blindness all shrink.

How PINtDEXTER layers the signals

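The engine inside PINtPOINT is called PINtDEXTER. It combines three layers:

  1. Sip-or-Skip and Head-to-Head pair rounds, which build a style profile from binary choices
  2. A 13-style preference layer, plus direct flavour, body and strength tuning the user can adjust
  3. Recent tap activity at nearby venues, scored against that profile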

The layered structure is deliberate: the user can see why the app recommended what it did, and correct it if they want.
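One way to make a layered score inspectable is to return its source alongside the number. The sketch below assumes a simple precedence rule (a manual adjustment beats the learned score) that the source does not confirm; `explain_score` and its labels are hypothetical.

```python
def explain_score(learned, overrides, style):
    """Return the final score for a style plus where it came from.

    Assumption: a manual user adjustment, if present, takes precedence
    over the score learned from swipes and pair rounds.
    """
    if style in overrides:
        return {"style": style, "score": overrides[style], "source": "user override"}
    if style in learned:
        return {"style": style, "score": learned[style], "source": "learned from swipes"}
    return {"style": style, "score": 0.5, "source": "neutral default"}


learned = {"ipa": 0.6, "stout": 0.25}
overrides = {"stout": 0.9}  # the user decides they do want stouts after all
result = explain_score(learned, overrides, "stout")
```

Keeping the source label next to the score is what lets an interface show why a pint was suggested and give the user a clear place to correct it.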

What we're not claiming

PINtDEXTER isn't a breakthrough in recommender research. It's not going to out-predict SVD++ on a RateBeer leaderboard. Those models and the people who build them are doing harder work than what's in a consumer app.

We think it is a better answer to the practical question a drinker is actually asking: what should I get? Framed as a venue-bound, tonight-shaped problem, the recommender gets to use much smaller, cleaner inputs and deliver a sharper output.

Where the public recommender projects are still valuable

Honest respect to both linked pieces. A few things they get exactly right:

If you're interested in the ML side of beer recommendation specifically, both links in the intro are worth reading end-to-end. If you're interested in what to drink tonight, PINtPOINT will answer that question faster.

Most beer recommenders predict ratings.
PINtPOINT picks pints.

Frequently asked questions

Why is the cold start problem so hard for beer recommenders?

Collaborative-filtering models like SVD++ or RBMs need dozens of ratings per user before predictions stabilise. A brand-new user gets "popular beers" as a fallback, which isn't personalisation — it's a leaderboard. PINtPOINT solves this with Sip-or-Skip: ~10 binary card swipes yield a usable style profile on first use.

What's wrong with using Untappd ratings as the main input to a recommender?

Untappd ratings are a biased sample — heavy-hitting releases collect disproportionate ratings, so a model trained on them learns popularity more than preference. Recommenders built on purpose-collected pair-choice signals, rather than free-form reviews, produce sharper personalisation.

How does PINtPOINT's PINtDEXTER recommender work?

Three layers: (1) Sip-or-Skip + Head-to-Head pair rounds build a style profile from binary choices; (2) that profile sits behind a 13-style preference layer plus direct flavour, body and strength tuning the user can adjust; (3) recent tap activity at nearby venues is scored against the profile, so recommendations stay tied to beer recently seen on tap near you.

Why do most beer recommenders ignore location?

Because the training data doesn't include it, or the system is built as a catalogue recommender (what beers exist) rather than an availability recommender (beer recently seen on tap near you). A recommendation for a beer you'll never order isn't really a recommendation.

Is a good beer recommender even possible given how noisy ratings are?

It depends what you're predicting. Predicting star ratings from other star ratings hits a noise floor fast. Predicting "of the four-or-so pints recently seen on tap at this bar, which one will you enjoy most" is a narrower, more tractable problem.

How is this different from Next Glass, BeerMenus, or Untappd's own recommendations?

Next Glass (acquired by Untappd) and similar rating-driven systems optimise for catalogue-style historical preference. PINtPOINT optimises for the real-time decision at a specific venue. The two are complementary — Untappd is the diary, PINtPOINT is the decision engine.

If you're a data scientist, go read the linked work — it's good. If you're a beer drinker, try PINtPOINT and let PINtDEXTER pick your next pint based on recent tap activity at the bar in front of you, not a global leaderboard.
