During the 2000s, the United States experienced an unprecedented decline in manufacturing employment.
Roughly six million jobs were lost.
Most economists looked at the data and concluded that all these jobs disappeared due to productivity improvements. But when Susan Houseman, an economist with the Upjohn Institute for Employment Research, looked into that same data, she came to a very different conclusion.
Houseman found that there had been little productivity improvement across most of the manufacturing sector during that time. Rather, nearly all of the measured productivity gains came from the electronics sector, a relatively small part of the US economy. Further, the gains in the electronics sector were driven by product design or R&D, not by automation (i.e., making things faster and cheaper). And by then, most electronics manufacturing had moved to Asia anyway.
Businesses use data every day to draw conclusions and make decisions. But as this example highlights,
using the wrong data or looking at it in the wrong way can lead to faulty “insights.”
With that in mind, and while there are all sorts of data, today we focus on numerical data about things that have already happened — not nonnumerical data (e.g., documents) or projections about what may or may not happen in the future.
Disaggregate the Data
The problem with the initial analysis in the example above was that the total indicated one thing, but once the data was broken down into its components, a different picture emerged.
This type of disaggregation is also important when analyzing financial data.
For example, a company that is profitable on an annual basis but only makes money a few months of the year is very different from a business that makes money every month.
Similarly, while a company may look profitable in the aggregate, it may be that it is only profitable with some customers or some products.
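To make this concrete, here is a minimal Python sketch with made-up numbers. The aggregate profit looks healthy, but disaggregating it by customer shows that two of the three customers actually lose money:

```python
# Made-up numbers: annual profit by customer.
profit_by_customer = {"Customer A": 120_000, "Customer B": -35_000, "Customer C": -15_000}

# The aggregate looks fine...
print(f"Total profit: {sum(profit_by_customer.values()):,}")      # 70,000

# ...but the disaggregated view shows where the money is actually made and lost.
for customer, profit in sorted(profit_by_customer.items(), key=lambda kv: kv[1]):
    print(f"  {customer}: {profit:,}")
```

The same breakdown works by product, by month, or by plant; the point is simply to look beneath the total.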
Understand How the Data Is Put Together
When looking at customer or product profitability,
it is important to understand how revenue and costs are divvied up between products or customers.
In the case of one of my manufacturing clients, its measures of customer-specific product profitability showed that nearly all customer pricing for nearly all products was under water — unprofitable. Yet the company was profitable overall. The data problem? One commodity raw material representing 50% of revenue in aggregate was measured in a horribly inaccurate way. Other key costs were allocated based on “run time” and measured erroneously as well.
It is also
important to understand how government statistics and other data external to the company are put together.
For example, many statistics are
seasonally
adjusted, a correction that can mask what is really happening.
Surveys in particular can be misleading. An academic study made headlines by claiming that 3.6 million American households live on two dollars a day or less. However, an analysis by Bruce Meyer at the University of Chicago Harris School of Public Policy found that the claim came from a Census Bureau survey that collects data by interviewing households. When the survey was compared with other anonymized government data, 90% of the households the study said lived on less than two dollars per day turned out to be far better off.
Finally, it is important to understand measurement accuracy. Even small errors can quickly compound when combined with other inaccurate data: when one mismeasured figure is multiplied by another, their errors multiply too, and each additional inaccurate input makes the combined error grow further.
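As a rough illustration (hypothetical figures, not from any client), here is a short Python sketch showing how three inputs that are each measured only 5% too high combine into an error roughly three times that size once they are multiplied together:

```python
# Hypothetical figures: total material cost = units * weight per unit * price per pound.
true_units, true_weight, true_price = 1_000, 2.0, 5.0
true_total = true_units * true_weight * true_price                 # 10,000

# Each input is measured just 5% too high...
measured_total = (true_units * 1.05) * (true_weight * 1.05) * (true_price * 1.05)

# ...yet the combined figure is off by almost 16%, not 5%.
combined_error = measured_total / true_total - 1
print(f"Combined error: {combined_error:.1%}")                     # ~15.8%
```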
Sample Size Matters
There’s a reason lenders don’t just look at one year of profitability
— what if the prior four years were not as profitable? They know that a sample size of one is insufficient to make a lending decision.
The manufacturing client mentioned earlier upgraded to a metric based on running a bag of raw materials through the manufacturing process a couple of times a year and then counting how many bags of finished product resulted. This sample size was way too small to be helpful.
There are also mathematical problems with small sample sizes. For a result from a small sample to cross the 95% threshold and be statistically significant, the measured difference itself has to be large. A difference of 60% or more may be required, and numbers like that grab our attention. But the finding may not be real; a difference that big is simply what it takes to reach significance given the small size of the sample used.
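To see why, here is a small Python sketch using an exact test of a fair 50/50 process (hypothetical numbers). With only ten observations, even a lopsided 70/30 split comes nowhere near the 95% threshold:

```python
# Hypothetical numbers: is a 7-out-of-10 result different from a fair 50/50 process?
from math import comb

def p_at_least(k, n):
    # Probability of k or more "successes" out of n from a 50/50 process.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

n = 10                               # tiny sample
p_two_sided = 2 * p_at_least(7, n)   # observed 7 of 10, a 70/30 split
print(f"p-value: {p_two_sided:.2f}") # 0.34 -- far above the 0.05 cutoff
```

Under the same assumptions, the split has to reach 9 out of 10 before the result clears the 95% threshold, which is why small samples can only ever “detect” very large differences.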
Use the Proper Data Set
Speaking of statistical significance, you may (vaguely) remember this from statistics class: a result is statistically significant if there is a greater than 95% chance that the finding is not random or the result of just noise. Well, not quite. What the test really says is that, given the data sample examined, there is a 95% chance that the finding is not random or the result of just noise.
Data is often chosen because it is easy to get.
Or, if we are trying to test a particular theory, we look for data that we think will apply. But the data set may not be sufficiently representative of the world at large.
Other times, we look at — and take action based upon — a given study or analysis that was statistically significant… but we ignore similar studies that found nothing.
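A quick simulation illustrates the danger. The Python sketch below (assumed parameters, not from any real study) runs the same “study” two thousand times on pure noise and tests each run at the 95% level; a meaningful share of runs comes out “significant” by chance alone:

```python
# Assumed parameters: 2,000 "studies," each with 100 observations of pure noise.
import random
from math import erf, sqrt

random.seed(0)

def two_sided_p(successes, n):
    # Normal-approximation test of whether a result differs from a fair 50/50 process.
    z = abs(successes / n - 0.5) / sqrt(0.25 / n)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

n_per_study, n_studies = 100, 2_000
false_positives = sum(
    two_sided_p(sum(random.random() < 0.5 for _ in range(n_per_study)), n_per_study) < 0.05
    for _ in range(n_studies)
)
# Typically prints roughly 5 to 6 percent: findings that look significant
# despite there being nothing to find.
print(f"Significant-looking findings from pure noise: {false_positives / n_studies:.1%}")
```

If only the runs that cleared the threshold were written up and the rest ignored, the “findings” would look far more convincing than they deserve to.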
A Few Tips
Does the data look reasonable?
Apply common sense. Pay particular attention and dig in if the data is accompanied by an attention-grabbing headline.
Put controls on data collection in place.
For financial data, there is usually a system of controls in place to make sure it is accurate. Do the same for other important data that you routinely collect.
Invest in better data when that data is important.
The manufacturing client mentioned earlier is investing in computerized scales and other automated data collection tools for accuracy and continuous measurement, and is coupling this investment in equipment with an ERP system to process and house the data.
Beware of confirmation bias.
That is,
focusing on data that supports an existing point of view
while subconsciously ignoring data that doesn’t.
Conclusion
Every well-run company relies on data to operate the business. Keep in mind, however, that information is only as good as the assumptions, techniques, and circumstances through which it is generated.
When using data, always understand how it was obtained and calculated.