7 common biases that influence how we understand, use, and interpret the world around us
In 2005, UCLA economics graduate Michael Burry saw the writing on the wall in the ticking numbers of the American mortgage market. His analysis of US lending practices in 2003 and 2004 led him to believe that housing prices would fall drastically as early as 2007.
And he turned his ideas to good use, pocketing net profits close to a whopping 489% between 2001 and 2008! Those who overlooked his insights earned a little over 2% in the same period.
In the modern world, we can’t overstate the impact of accurate data analysis. Small mistakes can be costly: losses running into millions of dollars, or election forecasts that miss by a laughably wide margin.
So, why do we make these errors? Why do even the best of us, with years of experience in making data-led decisions and equipped with the latest tools, often struggle to read between the numbers?
Confirmation bias is the tendency of decision makers to use data to prove or debunk a specific theory they already hold.
Unlike Burry, most stakeholders looked at the data with preconceived notions of how the investment market is supposed to behave.
Instead of a generic stance, C-level executives might leverage data with a predetermined goal. That’s where the data scientist comes in – it’s their job to perform an accurate and objective analysis, gaining insights that may or may not validate the business users’ choice, or even turn out to be completely irrelevant.
The broad umbrella of selection bias covers unwitting biases (like survivorship bias) or unavoidable ones, such as availability bias.
Take, for example, the 7 million Americans living outside the country who weren’t included in US pre-election polling in 2016. Incomplete data sets contributed to the NYT Presidential Forecast ticker swinging from 80% to under 5% in around 12 hours.
In fact, most surveys are prey to selection bias. “Many businesses only capture a small piece of the pie when it comes to data available to their segment or industry, and this means their data and subsequent analysis are skewed,” said Powerlytics CEO, Kevin Sheets, in an interview with InformationWeek.
Outliers are extreme data points that lie far from the mean. They tend to generate ‘false’ averages that don’t reflect the real picture.
In 2014, research showed, the bottom 50% of the American population earned USD 25,000 on average, while the top 1% took in around 81 times that amount every year: a sizable difference.
However, removing the outliers isn’t always the way forward. For the insurance industry, a set of exceptional claims can impact revenues – but must be analyzed and addressed separately.
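To see how a single extreme value distorts an average, here is a minimal sketch in Python. The income figures are invented for illustration and are not drawn from the 2014 research; the cut-off of 1,000,000 is likewise an arbitrary assumption.

```python
from statistics import mean, median

# Hypothetical annual incomes in USD; the last value is a deliberate outlier.
incomes = [22_000, 24_000, 25_000, 26_000, 28_000, 2_000_000]

print(f"mean with outlier:    {mean(incomes):,.0f}")
print(f"median with outlier:  {median(incomes):,.0f}")

# Set the outlier aside for separate analysis instead of silently dropping it.
# The 1,000,000 threshold is an arbitrary choice for this toy data.
typical = [x for x in incomes if x < 1_000_000]
outliers = [x for x in incomes if x >= 1_000_000]

print(f"mean without outlier: {mean(typical):,.0f}")
print(f"values set aside:     {len(outliers)}")
```

The median barely notices the extreme value, while the mean is pulled far above what most people in the list earn. Keeping the outliers in their own list mirrors the insurance case: they are analyzed separately, not thrown away.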
Those of us not familiar with the nitty-gritty of statistical analysis often fall prey to what experts call ‘Simpson’s Paradox’: combining two data sets might negate, or even reverse, the insights gathered from them individually.
Let’s break it down: in 1973, graduate admissions at Berkeley showed a marked slant towards men, who enjoyed a 44% admission rate compared with 35% for women. But among the 6 largest departments, 4 were biased against men while only 2 favored them!
Interestingly, Simpson’s Paradox disappears when you factor in causes and other underlying forces.
In our example, it was observed that women mostly applied to highly competitive departments: among the 341 women who applied to Department F, only 7% were admitted. On the other hand, of the mere 25 women who chose the less competitive Department B, 68% were successful.
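The reversal is easier to believe once you run the arithmetic yourself. The sketch below uses two made-up departments with invented applicant counts (these are not the real Berkeley figures) and compares per-department admission rates with the pooled rate.

```python
# Hypothetical (applicants, admitted) counts, invented for illustration only.
admissions = {
    "less competitive dept": {"men": (80, 48), "women": (20, 14)},
    "more competitive dept": {"men": (20, 2), "women": (80, 12)},
}

def rate(applicants, admitted):
    """Share of applicants who were admitted."""
    return admitted / applicants

# Within each department, compare the two groups directly.
for dept, groups in admissions.items():
    for group, (applicants, admitted) in groups.items():
        print(f"{dept:22s} {group:6s} {rate(applicants, admitted):.0%}")

def pooled_rate(group):
    """Admission rate for a group after merging all departments."""
    applicants = sum(d[group][0] for d in admissions.values())
    admitted = sum(d[group][1] for d in admissions.values())
    return rate(applicants, admitted)

print(f"overall men   {pooled_rate('men'):.0%}")
print(f"overall women {pooled_rate('women'):.0%}")
```

In this toy data women have the higher admission rate in both departments, yet the pooled rate favors men, simply because most women applied to the department that admits very few people.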
This brings us to the cause – hidden data, called confounding variables, that can hugely impact your analyses.
At first glance, an insight may appear to make perfect sense – and if accepted, can lead to incorrect decisions. Let’s say a study of men and women uncovers that men gain weight faster and more easily than women, leading to the conclusion that gender is a direct cause.
On closer examination, however, it’s revealed that the average man eats more than the average woman, and is more likely to have a desk job.
This is a curious case of the confounding variable, where an earlier overlooked piece of data invalidates the conclusion. In the Berkeley scenario, the fact that women preferred highly competitive courses negated the apparent favoritism towards men.
Clearly, the obvious conclusion isn’t always the right one.
Right at the starting line, if the analytical model employed is out of sync with the data set, the insights generated might be subject to either overfitting or underfitting.
Overfitting arises from statistical models that are overly complex, fitting the random noise in a data set along with the underlying pattern. Underfitting, on the other hand, results from models that are too simple to take enough aspects of the data into account. In both cases the conclusions are likely to be skewed.
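A rough way to see both failure modes is to fit three models to the same noisy data and compare their errors on the data they were trained on against fresh data. This is only a sketch under stated assumptions: the data comes from a hypothetical linear process, and the three model names (`fit_mean`, `fit_line`, `fit_memorizer`) are illustrative helpers, not part of any library.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample(n):
    """Noisy points from a hypothetical linear process y = 2x + 1 + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 2)) for x in xs]

train, test = sample(30), sample(30)

def fit_mean(data):
    """Too simple: ignores x and always predicts the average y."""
    avg = sum(y for _, y in data) / len(data)
    return lambda x: avg

def fit_line(data):
    """Matches the process: ordinary least-squares straight line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_memorizer(data):
    """Too flexible: returns the y of the nearest training point, noise and all."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model's predictions on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name:10s} train MSE {mse(model, train):7.2f}   test MSE {mse(model, test):7.2f}")
```

The memorizer scores a perfect zero on its own training data because it has learned the noise, which is the signature of overfitting; the mean-only model does poorly everywhere, which is underfitting. How each fares on the fresh sample is the number worth watching.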
Mathematician Spencer Greenberg sums it up perfectly: “Overfitting is one of the most common (and worrisome) biases. It comes about from checking lots of different hypotheses in data. If each hypothesis you check has, say, a 1 in 20 chance of being a false positive, then if you check 20 different hypotheses, you’re very likely to have a false positive occur at least once.”
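The arithmetic behind that warning is short enough to check. Assuming the 20 hypotheses are independent (an assumption added here, not stated in the quote), the chance of at least one false positive is one minus the chance that every test comes back clean. The function name is made up for this sketch.

```python
def chance_of_false_positive(per_test_rate: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - per_test_rate) ** n_tests

# 1 in 20 per test, checked across a growing number of hypotheses.
for n in (1, 5, 20):
    print(n, round(chance_of_false_positive(1 / 20, n), 3))
```

With 20 checks at a 1 in 20 rate each, the chance of at least one false positive works out to roughly 64%.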
Normalcy bias occurs when we fail to factor in non-normality, i.e. atypical possibilities.
Some statistical tests, like the t-test, are predicated on the assumption that the data follows a bell curve, i.e. a normal distribution. However, if that’s not actually the case and data is force-fit into compliance, the conclusions can be vastly misleading.
For instance, a hospital’s target processing time for patients in the emergency room is 4 hours. However, on-floor data mapped as a bell curve suggests it ranges anywhere from 30 minutes to 12 hours! Does that mean the systems in place are critically flawed?
Not necessarily.
Greenberg recalls how a t-test returned a p-value of 0.03, meaning that if there were no real effect, a result at least this extreme would turn up only about 3% of the time. When passed through a non-parametric analysis that doesn’t assume the data is normal, the same experiment gave a result of 0.06: a small but visible change, and one that crosses the conventional 0.05 significance threshold.
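The source doesn’t say which non-parametric method Greenberg used. As one possible illustration, a permutation test compares two groups without assuming a bell curve: it shuffles the group labels many times and counts how often chance alone produces a gap as large as the observed one. The function name and the waiting-time figures below are invented for this sketch.

```python
import random

def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical, right-skewed waiting times in hours (invented numbers).
group_a = [0.5, 0.7, 0.9, 1.1, 1.4, 2.0, 3.5, 12.0]
group_b = [0.6, 0.8, 1.0, 1.3, 1.9, 2.8, 4.1, 9.5]

print(permutation_p_value(group_a, group_b))
```

Because the test only reshuffles the values that were actually observed, long tails like the 12-hour wait don’t need to be squeezed into a normal distribution first.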
And the list doesn’t end there. From prediction bias to loss aversion, it’s almost as if the human mind is built for flawed data analyses! Yet, biases are hardwired into our thought processes and a vital part of our organic survival mechanisms.
Think about it. In case of a zombie apocalypse, is it better to a) contemplate the forces that would reanimate a corpse and instill it with the desire to eat human flesh, then work out the most effective solution to block this cycle, or b) start shooting until it stops moving?
The difference is, while time and energy continue to be precious resources, modern computational tools and analytical methods have far surpassed such cognitive limitations. Errors can now be easily avoided by applying the right tools to the right information. All you have to do is deep-dive, and explore the vast, ever-growing world of analysis ideas.