Minnesota Star Tribune · Data Science

New Subscriber Paths

At the Minnesota Star Tribune, growing our subscriber base means understanding what actually drives readers to pay. The question we kept returning to: what kind of storytelling compels someone to pull out their wallet?

Role Data Scientist
Methods Multi-touch Attribution · Negative Binomial Regression
Stack Python · SQL

The Problem with Traffic

Traditional audience metrics — pageviews, sessions, engagement — only tell part of the story. We found that the articles bringing in the most raw traffic often looked nothing like the articles people read right before subscribing. That gap pointed to a measurement problem. We needed a metric that evaluated stories not by how many people they reached, but by how effectively they converted readers into subscribers.

Building the Metric

We developed a multi-touch attribution model that counts how many new subscribers engaged with a story in the 30 days before their purchase. We called the output new subscriber paths — a name chosen deliberately to be legible to everyone in the newsroom, not just the data team.

The results were immediately striking. Editors who looked at the stories with the most paths said, unprompted, that they felt like good journalism: the kind of work that earns a subscription.

Diagram showing two readers each consuming three stories before subscribing, with one story shared between them
Each story a new subscriber read in the 30 days before purchase counts as one path for that story. Here, Person A and Person B each read three stories before subscribing — including one story they both visited. That shared story earns two paths: one for each reader it helped convert.
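The counting logic above can be sketched in a few lines of pandas. This is an illustrative toy, not the production pipeline; the table and column names (`pageviews`, `subscriptions`, `reader_id`, and so on) are assumptions. It reproduces the Person A / Person B example: the story both readers visited earns two paths.

```python
# Hypothetical sketch of the path-counting step: every story a new
# subscriber read in the 30 days before purchase earns one "path".
# Table and column names are illustrative, not the Star Tribune's.
import pandas as pd

pageviews = pd.DataFrame({
    "reader_id": ["A", "A", "A", "B", "B", "B"],
    "story_id":  ["s1", "s2", "s3", "s3", "s4", "s5"],
    "viewed_at": pd.to_datetime([
        "2024-01-05", "2024-01-10", "2024-01-20",
        "2024-01-12", "2024-01-18", "2024-01-25",
    ]),
})
subscriptions = pd.DataFrame({
    "reader_id": ["A", "B"],
    "purchased_at": pd.to_datetime(["2024-02-01", "2024-02-01"]),
})

# Join each view to the reader's purchase, keep the 30-day lookback window.
merged = pageviews.merge(subscriptions, on="reader_id")
lag = merged["purchased_at"] - merged["viewed_at"]
in_window = merged[(lag >= pd.Timedelta(0)) & (lag <= pd.Timedelta(days=30))]

# One path per (subscriber, story) pair; the shared story s3 earns two.
paths = (in_window.drop_duplicates(["reader_id", "story_id"])
                  .groupby("story_id").size()
                  .rename("new_subscriber_paths"))
print(paths.to_dict())  # → {'s1': 1, 's2': 1, 's3': 2, 's4': 1, 's5': 1}
```

In production this aggregation would live in SQL over the full pageview history; the pandas version just makes the join-and-window logic concrete.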

Putting It in Context

Raw path counts are tricky to interpret. Three factors complicate any direct comparison between stories: how long a story has been published (older stories have had more time to accrue paths), which section it ran in (sections reach very different audiences), and the pricing and term of the subscriptions it helped drive.

Editors needed a way to account for all three when evaluating a story. We turned to Negative Binomial regression, a model well suited to count data with overdispersion. The model learns the historical distribution of path counts and attributes variance to two key factors: how long the story has been published, and which section it ran in, a reliable proxy for audience.

System diagram showing four story features feeding into a regression model, which produces an expected range compared against actual paths to yield a performance category
The regression model takes four inputs for each story — article section, days since publication, subscription pricing, and term — to generate an expected range of path counts. Comparing that modeled range against the story's actual paths yields its performance category.

The Framework

With the model fitted, we can generate an expected distribution of paths for any story given its section and age. That gives us a principled basis for comparison.

A story outperforming 90% of its expected distribution earns a "top-performing" label; one in the bottom 20% is flagged as "low-performing."
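The labeling step can be sketched with scipy's Negative Binomial distribution. The parameters below (`mu`, `alpha`) are made-up placeholders standing in for a story's fitted expected distribution, and the three-label scheme is a simplification of the five color-coded bands shown in the figure; the 90% and 20% cutoffs follow the text above.

```python
# Illustrative sketch of assigning a performance label: compute where a
# story's actual path count falls in its expected Negative Binomial
# distribution. mu and alpha are placeholder values, not fitted ones.
from scipy.stats import nbinom

def performance_label(actual_paths, mu, alpha=1.0):
    # Convert the NB2 mean/dispersion parameterization into scipy's (n, p).
    n = 1.0 / alpha
    p = n / (n + mu)
    percentile = nbinom.cdf(actual_paths, n, p)
    if percentile >= 0.90:
        return "top-performing"
    if percentile <= 0.20:
        return "low-performing"
    return "as expected"

print(performance_label(actual_paths=7, mu=1.2))  # → top-performing
print(performance_label(actual_paths=0, mu=1.2))  # → as expected
```

Because the expected distribution is right-skewed and concentrated near zero, even a modest count like 7 can land deep in the top band when the expected mean is low.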

Right-skewed distribution of path counts divided into five color-coded performance bands, with an example story at 7 paths falling in the top-performing tier
The model produces a right-skewed expected distribution of paths for stories in a given section and age bracket. Most stories cluster near zero — few earn many paths. An example story with 7 paths lands well into the top-performing band (75th percentile and above), placing it among the newsroom's strongest subscriber drivers.

Impact

The metric has become a fixture in how editors set goals and evaluate their sections. It gave the newsroom a new dimension to think about story performance — one that aligns more closely with the Star Tribune's core business objective — and contributed to a broader cultural shift toward evidence-driven editorial decisions.