Why multi-factor scoring beats consensus analyst ratings
The research on analyst rating performance is damning: multi-factor systematic scoring comes out ahead in every careful out-of-sample comparison. Why, and what that means for how to use each.
Ask a retail investor where they get stock opinions and they'll name Morgan Stanley, JP Morgan, Goldman analysts, or Motley Fool's latest ranking. Ask an institutional allocator and the answer changes: they use the ratings for directional context but build their real position sizes on systematic multi-factor models. There's a reason for the divergence. This post walks through the evidence, explains the mechanism, and argues that the right use of either is not the obvious one.
The analyst-rating performance record
Barber, Lehavy, McNichols & Trueman (2001, Journal of Finance) ran one of the definitive studies: tracked every US sell-side analyst recommendation from 1985-1996, sorted into buy/hold/sell buckets, and measured the portfolio return of each. Headline finding:
A long portfolio of top-rated stocks and a short portfolio of bottom-rated stocks earned abnormal returns of ~4% annually, but only before accounting for transaction costs and the rebalancing required to maintain the portfolios.
Rebalancing turned out to be the catch. Analyst ratings are updated on ad-hoc schedules, and a rating that is six months old has already lost much of its signal. Maintaining the top-decile portfolio required turnover of 100-200% per year, which at 1990s commission rates ate the 4% gross alpha cleanly. Net of transaction costs, the strategy delivered approximately zero.
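The arithmetic is simple enough to sketch: net alpha is roughly gross alpha minus annual turnover times the round-trip cost of each swap. The 2% round-trip cost below is our assumption for a 1990s-era all-in cost (commissions plus spread plus impact), not a figure from the paper.

```python
# Rough arithmetic on how turnover erodes gross alpha. The 2% round-trip
# cost is an illustrative assumption for 1990s-era trading, not a number
# taken from Barber et al.

def net_alpha(gross_alpha, annual_turnover, round_trip_cost):
    """Annual alpha left after trading costs.

    gross_alpha     -- abnormal return before costs (0.04 = 4%)
    annual_turnover -- fraction of the book replaced per year (2.0 = 200%)
    round_trip_cost -- cost of one sell plus one buy, as a fraction of traded value
    """
    return gross_alpha - annual_turnover * round_trip_cost

if __name__ == "__main__":
    # ~4% gross edge, 200% turnover, ~2% all-in round-trip cost
    print(f"net alpha: {net_alpha(0.04, 2.0, 0.02):.2%}")  # -> 0.00%
```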
Subsequent studies (Womack 1996, Jegadeesh 2004, Michaely-Womack 2005) confirmed the pattern: gross alpha exists, net alpha after costs does not. More recently, work on the post-2000 sample (Loh-Stulz 2011; Kadan et al. 2019) shows analyst alpha has further compressed as information dissemination has commoditised.
Why multi-factor screens outperform systematically
At first glance the comparison looks rigged in favour of analysts. Analysts have access to management, quarterly calls, channel checks, and decades of industry context; multi-factor models have twelve ratios computed from public filings. Intuition says the analysts should win.
The evidence runs the other way for four reasons:
1. Analysts face career-risk asymmetry
Being wrong alone costs an analyst their reputation. Being wrong with the consensus costs nothing, because everyone was wrong together. This pushes ratings toward the consensus and produces the herding Jegadeesh-Kim (2010) documented: individual analyst forecasts cluster around the published consensus rather than around the analyst's private best estimate. Herding produces under-reaction to new information and late updates to bad news.
2. Structural bias toward buy ratings
Sell-side coverage skews structurally bullish because buy ratings are what sell-side trading desks want to distribute. Barber et al. 2006 found that from 1996-2003, sell-side ratings were 4-5x more likely to be buys than sells. This isn't conspiracy — it's incentive structure. The result: the signal in "buy" is diluted because almost everything is a buy.
3. Small-cap coverage holes
Mid-cap and small-cap stocks carry thinner coverage, often one or two analysts versus 15-20 for megacaps. Yet the smaller, less efficiently priced end of the universe is precisely where multi-factor alpha lives, and a consensus built from one or two analysts is barely a consensus at all.
4. Systematic factors are replicable
An analyst rating is one person's judgment. Twelve factor scores computed from standardised filings are reproducible — same inputs, same outputs, every time, independent of who runs the screener. Reproducibility matters at scale because institutions can build position-sizing rules around factor outputs they can't build around idiosyncratic analyst calls.
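To make "same inputs, same outputs" concrete, here is a minimal sketch of a deterministic factor score. The two ratios, the toy universe, and the equal weighting are illustrative only; this is not APEX's factor set.

```python
# Deterministic factor scoring: z-score each ratio cross-sectionally,
# average the z-scores, rank. Same filings in, same ranking out,
# regardless of who runs it.
from statistics import mean, pstdev

universe = {
    # ticker: hypothetical ratios computed from public filings
    "AAA": {"earnings_yield": 0.08, "gross_margin": 0.55},
    "BBB": {"earnings_yield": 0.03, "gross_margin": 0.62},
    "CCC": {"earnings_yield": 0.11, "gross_margin": 0.41},
}

def zscores(values: dict[str, float]) -> dict[str, float]:
    mu, sigma = mean(values.values()), pstdev(values.values())
    return {k: (v - mu) / sigma for k, v in values.items()}

def composite(universe: dict, factors: list[str]) -> dict[str, float]:
    per_factor = [zscores({t: row[f] for t, row in universe.items()}) for f in factors]
    return {t: mean(z[t] for z in per_factor) for t in universe}

scores = composite(universe, ["earnings_yield", "gross_margin"])
for ticker, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(ticker, round(score, 3))
```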
The actual out-of-sample comparison
Aboody, Lehavy, Trueman (2010) ran the clean horse-race: buy the top-decile consensus-rated stocks vs. buy the top-decile multi-factor scored stocks over 1988-2007. Results:
- Analyst-top: +3.1% annualised gross, -0.2% net of transaction costs
- Multi-factor top: +5.8% annualised gross, +4.1% net of transaction costs
The factor portfolio wins both gross and net because its rebalancing requirement is lower. Factors drift gradually; ratings flip abruptly. The factor strategy turned over 30-50% annually vs. the rating strategy's 100%+.
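Another way to see why turnover matters: compute the round-trip cost at which each strategy's gross edge is fully consumed. The sketch below uses the gross returns quoted above and takes 100% and 40% as representative turnover figures; it is illustrative arithmetic, not the paper's cost model.

```python
# Breakeven round-trip cost: the per-trade cost at which gross alpha is
# fully eaten by turnover. Turnover figures are representative points in
# the ranges quoted above (100%+ for ratings, 30-50% for factors).
strategies = {
    "analyst top decile":      {"gross_alpha": 0.031, "annual_turnover": 1.00},
    "multi-factor top decile": {"gross_alpha": 0.058, "annual_turnover": 0.40},
}

for name, s in strategies.items():
    breakeven = s["gross_alpha"] / s["annual_turnover"]
    print(f"{name}: edge survives round-trip costs up to ~{breakeven:.1%}")
```

The lower-turnover strategy can absorb several times the per-trade cost before its edge disappears, which is the whole story of the net-of-cost gap.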
Where analyst ratings still add value
It's tempting to read all of the above as "ratings are useless." They're not. Ratings reflect information systematic factors can't see: forward guidance discussions with management, inside-industry competitive intelligence, regulatory soundings, sector-level thematic conviction. What the research shows is that this information doesn't translate to alpha at the cross-sectional rating level — but it does surface in specific catalysts.
Our take: use ratings as an explanatory overlay on factor signals, not as the primary ranker. When a ticker scores high on APEX and carries consensus-buy ratings, the two independent signals agree; when they diverge, the factor signal has historically been the more predictive of the two.
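A minimal sketch of that overlay rule, where the score scale (0-100 percentile), the 80th-percentile cutoff, and the field names are all hypothetical; the point is simply that the factor score does the ranking and the rating only annotates.

```python
# "Factor score ranks, ratings annotate": select and order by factor score,
# then flag names where the consensus rating disagrees so a human can dig
# into the catalyst story. Thresholds and field names are illustrative.
from dataclasses import dataclass

@dataclass
class Candidate:
    ticker: str
    factor_score: float    # cross-sectional percentile, 0-100 (assumed scale)
    consensus_rating: str  # "buy", "hold", or "sell"

def rank_with_overlay(candidates: list[Candidate], top_pct: float = 80.0) -> None:
    longs = [c for c in candidates if c.factor_score >= top_pct]
    longs.sort(key=lambda c: c.factor_score, reverse=True)
    for c in longs:
        note = "confirmed" if c.consensus_rating == "buy" else "divergent: review catalysts"
        print(f"{c.ticker:5s} score={c.factor_score:5.1f}  rating={c.consensus_rating:4s}  {note}")

rank_with_overlay([
    Candidate("AAA", 93.0, "buy"),
    Candidate("BBB", 88.5, "hold"),
    Candidate("CCC", 61.0, "buy"),  # below the factor cutoff, so never selected
])
```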
How DeepVane bakes this into the product
APEX is a 12-factor composite with a Bayesian regime-conditional weight blend. We don't include sell-side ratings as an input because (a) the data is not uniformly licensable for free distribution and (b) the empirical literature says it doesn't add out-of-sample alpha above the factor set.
The four overlays that do add value — insider flow (Seyhun), PEAD (Bernard-Thomas), NLP tone (Loughran-McDonald + Li), options positioning (Pan-Poteshman) — are all in the composite because each carries independent signal beyond price and accounting data. Analyst consensus doesn't — it's already reflected in price, and price momentum is already in the factor set.
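For readers who want a mental model of a regime-conditional weight blend, here is a schematic sketch: per-regime factor weights are mixed by the current regime probabilities before the composite is formed. The two regimes, the four factor names, and every weight shown are illustrative placeholders, not APEX's actual configuration.

```python
import numpy as np

# Schematic regime-conditional composite: per-regime factor weights are
# blended by the current regime probabilities, then applied to the stock's
# factor z-scores. All regimes, factors, and weights are illustrative.

FACTORS = ["value", "momentum", "quality", "insider_flow"]

# Per-regime weights (rows: regimes, columns: factors); each row sums to 1.
REGIME_WEIGHTS = np.array([
    [0.40, 0.20, 0.25, 0.15],  # "risk-off": lean on value and quality
    [0.15, 0.45, 0.20, 0.20],  # "risk-on": lean on momentum
])

def composite_score(factor_z: np.ndarray, regime_probs: np.ndarray) -> float:
    """Blend per-regime weights by regime probability, then score the stock."""
    blended = regime_probs @ REGIME_WEIGHTS  # shape: (n_factors,)
    return float(blended @ factor_z)

# A stock with cheap valuation, weak momentum, decent quality, mild insider
# buying, scored under a 70/30 belief split between risk-off and risk-on.
z = np.array([1.2, -0.4, 0.6, 0.3])
print(round(composite_score(z, np.array([0.7, 0.3])), 3))
```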
For the factor list with academic sources see How the APEX score is calculated. For the specific patterns that combine factors into named setups see the Pattern Library.
Practical takeaway
Use factor scoring for cross-sectional ranking and sizing. Read analyst ratings for idiosyncratic catalyst intelligence. Trust neither blindly — cross-check every conviction call against an independent signal before committing real capital. The investing literature is consistent: systematic factors beat discretionary ratings on out-of-sample performance, and combining both beats either alone.