Alternative data — hype vs. signal
Every few months a new “alternative data” source gets pitched as the next edge — satellite imagery, credit-card panels, app-install data, geolocation. Most of it is correlated noise dressed up as alpha. A small fraction is durable signal.
The question worth asking, before any vendor demo: what does this data tell me that I can’t get from the public version, and how long does the gap last?
A simple test
Before paying for a new dataset, I run it through three filters:
- Lead time. Does this dataset move before the data the rest of the market is looking at? If yes, by how many days?
- Decay. Once enough people are using it, is the edge gone? Most alternative data is a trade with a decaying half-life: the edge shrinks with each new subscriber.
- Cost of being wrong. What does the false-positive rate look like in regimes outside the back-test window?
If the answer to (1) is “a few hours” and (2) is “as soon as the second-largest hedge fund subscribes,” you’re looking at a feature, not a moat.
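The lead-time filter is the easiest to check empirically. A minimal sketch, on synthetic data: slide the alternative series against the public one and find the lag that maximizes correlation. Everything here (series names, the 30-day search window, the noise level) is illustrative, not a real dataset.

```python
import numpy as np

def lead_days(alt, public, max_lag=30):
    """Return the lag (in days) at which `alt` best predicts `public`,
    and the correlation at that lag. A positive lag means the
    alternative series moves first."""
    n = len(alt)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # Compare alt[t] against public[t + lag].
        corr = np.corrcoef(alt[: n - lag], public[lag:])[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Synthetic check: a public series that echoes the alt series 5 days later.
rng = np.random.default_rng(0)
alt = rng.normal(size=500)
public = np.concatenate([rng.normal(size=5),
                         alt[:-5] + 0.1 * rng.normal(size=495)])
lag, corr = lead_days(alt, public)
```

On real data you would want to control for common seasonality and run this out of sample, but even this crude version separates "leads by days" from "leads by hours" quickly.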
Where alternative data actually pays
The pattern that consistently works isn’t a single source — it’s the combination of two or three weak signals into something with a higher information ratio than any of them alone. That’s harder for a competitor to replicate, because the value isn’t in the dataset; it’s in the integration.
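The information-ratio claim is easy to demonstrate on toy data. A hedged sketch, assuming two independent weak signals and a naive signal-weighted strategy (all names and coefficients here are synthetic, chosen only to make the effect visible):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Two independent weak signals, each explaining a little of next-day returns.
s1 = rng.normal(size=n)
s2 = rng.normal(size=n)
returns = 0.2 * s1 + 0.2 * s2 + rng.normal(size=n)

def information_ratio(signal, returns):
    """Annualized IR of a toy strategy holding `signal` units each day."""
    pnl = signal * returns
    return np.sqrt(252) * pnl.mean() / pnl.std()

combo = (s1 + s2) / np.sqrt(2)  # equal-weight blend, scaled to unit variance
ir1 = information_ratio(s1, returns)
ir2 = information_ratio(s2, returns)
ir_combo = information_ratio(combo, returns)
# Because the signals are independent, the blend's IR beats either input alone.
```

The same arithmetic is why the moat lives in the integration: a competitor who buys one of the two feeds gets ir1, not ir_combo.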
The boring conclusion: most alternative data is overpriced, the integration work is undervalued, and the people quietly building proprietary feature sets out of public data are eating the lunch of teams who think a vendor subscription is a strategy.