## What Are Variance And Covariance In Business

Statistics and visual representations of data are necessary for truly comprehending what the data mean as a whole. Data sets are too large and complex to be analyzed one datum at a time, and visualizations help in understanding the complexity of, and the use cases for, any one data set or several data sets compared with each other.

The metrics used to build those charts are equally important, and they come from statistical analysis.

Depending on your business, its size, and the products and services it provides, basic statistical information will have more or less significance. Averages, modes, and regressions are all worth using, along with some attention to magnitudes.

All of these metrics are important when looking for the messages hidden within large swaths of data, but I want to share with you two more basic statistical calculations that are useful.

These two statistics are variance and covariance, which will be useful to most firms. Statistical software sometimes labels them differently depending on the knowledge it expects of the user; however, their utility remains the same. Variance measures how spread out a data set is: it is the average of the squared differences between each value in the set and the set's mean.

This number can be used to determine how tightly the data points in a set cluster together. In practical terms, if all items in a store are categorized and the variance is computed on their prices, it tells us how widely prices are spread around the store's average price.

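This can be useful when looking to eliminate products that are priced beyond what your customers are willing to pay. Because variance is expressed in squared units, its square root, the standard deviation, is the more practical yardstick: a price more than a chosen number of standard deviations above the average would be a good place to draw this line.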
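The price example above can be sketched in Python. This is a minimal sketch rather than a recommended pricing rule: the product names and prices are hypothetical, and the cutoff of two standard deviations above the mean (the standard deviation being the square root of the variance) is an assumption chosen only for illustration.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical prices for one store category (illustrative values only).
prices = {
    "soap": 3.0,
    "shampoo": 5.0,
    "razor": 4.0,
    "lotion": 6.0,
    "toothpaste": 2.0,
    "gift set": 40.0,
}


def price_outliers(prices, k=2.0):
    """Return items priced more than k standard deviations above the mean.

    k is an assumed cutoff; a real store would tune it to its own data.
    """
    values = list(prices.values())
    avg = mean(values)
    sd = pstdev(values)  # population standard deviation = sqrt(variance)
    return [name for name, price in prices.items() if price > avg + k * sd]


values = list(prices.values())
average_price = mean(values)        # the store average
price_variance = pvariance(values)  # mean of squared deviations from the average
outliers = price_outliers(prices)   # items far above the average price

print(f"average: {average_price:.2f}, variance: {price_variance:.2f}")
print("priced well above the rest:", outliers)
```

With these made-up numbers, only the gift set falls above the cutoff. Population variance is used here on the assumption that the list covers every item in the category; for a sample of items, `statistics.variance` would be the usual choice.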

Covariance measures how two variables change together: it is positive when they tend to rise and fall together, and negative when one tends to rise as the other falls. This is useful for data showing the demand for a number of products.

Based on the covariance, business owners can determine which items occupy the same place in the minds of consumers, identify where products or services overlap, and possibly reduce the prominence of one product to bolster the other.
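As a rough illustration of covariance in demand data, the sketch below computes the sample covariance between weekly unit sales of pairs of products. The product names and sales figures are hypothetical, and the helper `covariance` is written out by hand for clarity rather than taken from any particular statistics package.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance: sum of paired deviation products over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical weekly unit sales (illustrative values only).
brand_a = [10, 12, 8, 14, 6]
brand_b = [9, 7, 11, 5, 13]
snacks = [20, 24, 16, 28, 12]

# Negative: in these made-up figures, brand B sells less in weeks when brand A sells more.
cov_a_b = covariance(brand_a, brand_b)
# Positive: snack sales rise and fall together with brand A.
cov_a_snacks = covariance(brand_a, snacks)

print(cov_a_b, cov_a_snacks)
```

The sign is the part to read first. The magnitude depends on the units of both series, so comparing covariances across product pairs is usually easier after converting them to correlations.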

Both of these statistics can be displayed in a variety of ways, ranging from bar graphs of variance for each product or service to quarterly plots of whole sections of products and how they move with or against other sections of a store.

## Hubris And The Limitations Of Big Data

### Unintended Consequences Of Automation

CVS Pharmacy ran an experiment a few years ago to reduce theft across its stores. It took items that were frequently stolen and automatically encased them in more secure packaging that could only be removed by an employee. Items that received this treatment included razor blades and batteries, which are commonly shoplifted. An unintended consequence of this experiment was accidental discrimination against a large portion of its customer base.

The ‘Just For Men’ products sold at CVS were the next set of items to be encased in the more secure packaging. There was a serious oversight, however: not every product in the line was given the new security measure. The algorithm that determined which products needed the beefed-up security did not account for accidental racism, and when the ‘Just For Men’ products more commonly used by African Americans became more secure, there was a reasonable outcry accusing CVS of discrimination.

This kind of automation through analytics is a key reason why human intervention is needed at critical points in the decision-making process. While Amazon’s recommended products page is harmless, CVS’s attempt to secure its products was a public relations disaster. Had management been involved in the decision process, it’s quite possible the ‘Just For Men’ dye would still have been secured, but the encasing would have been applied to the entire line of products, not just those that target a particular demographic.

### The Human Element – Proper Analytics With Improper Analysis

Public school districts across the United States have a strong focus on state and federal testing. They are incentivized to care a great deal about these tests, as the results determine the funding that each school receives. The growing need to improve students' test scores has led to the widespread use of analytics throughout public schools.

This analysis is done well before the date of the test, and the basic statistical information is given to teachers so that they can get a better sense of how their students are doing. There is a lesson being taught throughout American classrooms, but students are not on the receiving end. The lesson is ‘statistics that are analyzed poorly have no effect at best, and can have adverse effects at worst’.

Teachers are given the average test scores of their classrooms, as well as a handful of other statistics. Using this assortment of data points, they are supposed to improve the test scores of their students. How and when the data are measured is not shared with the teacher. Teachers know that the numbers come from previous test data, but with so little context, the data provide little actionable information.

Instead, many teachers adopted the goal of raising the class average by seeking out the students who could improve the most. This raised the class average, and a small portion of students increased their test scores.

What this fails to do is take into account the large portion of students for whom the test scores stayed the same, the students who were already at the bottom.
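A small worked example shows how the class average can rise while most students see no change. The scores below are hypothetical, and the assumption that only four mid-range students are coached and gain ten points each is invented purely for illustration.

```python
from statistics import mean

# Hypothetical test scores for a class of ten students (illustrative only).
before = [35, 38, 40, 42, 58, 59, 60, 61, 85, 90]
# Assumption: only the four mid-range students are coached, gaining 10 points each.
after = [35, 38, 40, 42, 68, 69, 70, 71, 85, 90]

avg_before = mean(before)
avg_after = mean(after)

# Count the students whose score did not move at all.
unchanged = sum(1 for b, a in zip(before, after) if a == b)
# The four lowest scorers are exactly where they started.
bottom_unchanged = before[:4] == after[:4]

print(f"class average: {avg_before:.1f} -> {avg_after:.1f}")
print(f"students with no change: {unchanged} of {len(before)}")
```

The average alone reports an improvement, which is why a single summary metric can hide the students who were left behind.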

The poor statistical data presented to teachers could only be used in a way that would negatively affect the class as a whole. Because they are focusing on the metrics that the district has said are important, teachers are unable to teach to the entire classroom.

There is a lesson in how some American teachers are treating data on their students. While they are using it to raise the metrics provided, the metrics themselves do not give an accurate picture of the entire classroom. Worse, the statistics measured by the school are not the same as the statistics used to determine funding.

Individual students are the key factor in determining funding, and teachers' focus on a limited set of those students to raise the artificial metrics set by school districts has resulted in a further setback in an already struggling area of the country’s education system.