At Nomi, we've built an inference engine that accepts real-time data from several different sources to provide a single best estimate of the traffic in a store. This estimate is produced as a floating point value in the world of real numbers, not as an integer value in the world of natural numbers, like counts really ought to be. Our inference engine operates on a continuous number line using probability distributions, so it's perfectly normal for the expected value of a visit count to be something like 43.2476784927658.
But, of course, in no world is it possible to have, say, 43.2476784927658 visitors enter a store in a single day.
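As a toy sketch of why this happens (the probabilities below are invented, not anything our model actually produces), the expected value of a discrete distribution over counts lands between integers almost by construction:

```python
def expected_visits(distribution):
    """Expected value of a discrete distribution over visit counts."""
    return sum(count * p for count, p in distribution.items())

# Hypothetical probabilities over possible daily visit counts.
print(expected_visits({42: 0.25, 43: 0.40, 44: 0.35}))  # roughly 43.1
```

A perfectly valid expectation, but an impossible number of visitors.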
It's been an ever-present challenge to effectively communicate our level of confidence in the estimates. The goal, of course, is to be completely transparent and earn trust. However, different audiences are likely to have completely different ideas of what appears trustworthy and what appears suspicious.
Really, there are two goals to work toward:
- Providing maximal benefit for customers.
- Instilling maximal confidence in the data.
These two goals are complicated to meet simultaneously because the consumers of the numbers bring different levels of sophistication. I'll describe four perspectives.
Statistician Perspective
Statisticians treat measured or calculated numbers as point estimates, and often describe precision in terms of statistical variance (or equivalently, a standard deviation). Both the point estimate and its calculated variance can be carried around with arbitrarily high precision, but inference is only made when combining the two in a formal statistical test. A good statistical way to present estimated visit counts might be as a 95% confidence interval, wherein we expect 95% of intervals so constructed in experimental replications to contain the true value. That is, perhaps we would want to say the visits fall within (249, 334) with 95% confidence.
Statisticians distrust people who present obviously estimated numbers without standard deviations or confidence bounds.
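A minimal sketch of such an interval, assuming the estimate is approximately normal (the point estimate and standard deviation below are illustrative values chosen to reproduce the interval above, not real store data):

```python
def confidence_interval(estimate, std_dev, z=1.96):
    """95% confidence interval around a point estimate,
    assuming approximate normality (z = 1.96)."""
    half_width = z * std_dev
    return (estimate - half_width, estimate + half_width)

# Illustrative values only -- not real store data.
low, high = confidence_interval(291.5, 21.7)
print(f"({low:.0f}, {high:.0f})")  # (249, 334)
```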
Scientist Perspective
Scientists developed the notion of significant figures to deal with measurement precision. They know that when weighing a chemical, small air currents cause a scale's last few digits to fluctuate. A scientist will record the mass by writing down all the stable "trusted" numbers, as well as the first unstable "untrustworthy" number. This is a heuristic, base-ten way of thinking about precision in measurements.
A chemist would distrust people who present estimated visits as 24,543 since it doesn't communicate measurement precision. Instead, a chemist might trust a number like 24,500 -- with three significant figures.
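The chemist's rounding can be sketched with a hypothetical helper (not something from our pipeline):

```python
import math

def round_sig_figs(x, sig_figs=3):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0
    magnitude = math.floor(math.log10(abs(x)))
    factor = 10 ** (magnitude - sig_figs + 1)
    return round(x / factor) * factor

print(round_sig_figs(24543))  # 24500
```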
Savvy Intuitive Perspective
An analytics-savvy person without statistical or scientific training is still aware of the idea of measurement error, and probably knows about sampling error for polls. They might trust somebody who provides numbers with confidence intervals disguised as "+/- 5%", and may or may not care whether the point estimates are rounded, or what the confidence level of the interval actually is.
They may or may not get suspicious if numbers are presented without these sampling errors.
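A hypothetical formatter for this audience, turning an interval's half-width into the familiar "+/- 5%" style (the numbers are made up):

```python
def format_with_margin(estimate, half_width):
    """Format an estimate as 'value +/- pct%'."""
    pct = 100 * half_width / estimate
    return f"{round(estimate):,} +/- {pct:.0f}%"

print(format_with_margin(24543, 1227))  # 24,543 +/- 5%
```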
Naive Intuitive Perspective
An intuitive person with little experience interpreting measurements might be unfamiliar with concepts like precision, estimation, or measurement error. Instead, they might expect all numbers to be exact if they are counts and rounded if they're percentages. In this case, trust comes from producing a plausible value on first principles -- a natural number for a count, and nothing with decimal places at all.
A naive intuitive consumer of data might get suspicious if they see visit counts like 24,500 because having an exact multiple of 100 is unlikely. They might also distrust a set of percentages that do not add to 100% merely because of rounding error.
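To see the percentage problem concretely, here is a small sketch of independent rounding drifting away from 100%, along with one standard remedy, largest-remainder rounding (a general technique, not necessarily what we use):

```python
import math

def naive_percentages(counts):
    """Round each percentage independently; the total can drift from 100."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

def largest_remainder_percentages(counts):
    """Largest-remainder rounding: the percentages always sum to 100."""
    total = sum(counts)
    exact = [100 * c / total for c in counts]
    floored = [math.floor(x) for x in exact]
    shortfall = 100 - sum(floored)
    # Hand the leftover points to the entries with the largest remainders.
    by_remainder = sorted(range(len(counts)),
                          key=lambda i: exact[i] - floored[i], reverse=True)
    for i in by_remainder[:shortfall]:
        floored[i] += 1
    return floored

counts = [245, 245, 246]
print(naive_percentages(counts))              # [33, 33, 33] -- sums to 99
print(largest_remainder_percentages(counts))  # [33, 33, 34] -- sums to 100
```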