Week 8 : DS4100

Statistics

Posted on March 12th, 2017

One of my favorite data websites is fivethirtyeight.com, Nate Silver's brainchild. It focuses on drawing key insights from public data repositories. The key is that Silver and his analysts use statistical models they designed themselves to analyze the data. While everyone technically has access to the same information, those distinct models let the team at FiveThirtyEight get the most out of it.

A good example is their sports analysis. For the NBA, Silver and his team pulled team records going back to the 1950s, when data was first recorded. They then built a model to calculate an "Elo" rating for each team that changes based on the result of every game. While all the game scores were publicly available, this model used them to create an entirely new set of data. Now, thanks to the statistical models and algorithms created by the team, any visitor to the site can directly compare two teams from different eras. That is all the result of the proprietary statistics work done by the 538 team.
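To give a rough sense of how a rating that "changes based on the result of every game" can work, here is a minimal sketch of the textbook Elo update. This is not FiveThirtyEight's actual model, which I haven't seen and which surely has more adjustments. The function names, the K-factor of 20, and the starting rating of 1500 are all assumptions I picked for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, a_won, k=20.0):
    """Return the new (rating_a, rating_b) after one game.

    k is the K-factor: how far a single result can move a rating.
    The value 20 is an assumption, not a number from the 538 model.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    # The winner gains exactly what the loser gives up.
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two evenly rated (hypothetical) teams; team A wins.
print(update_ratings(1500, 1500, a_won=True))  # (1510.0, 1490.0)
```

The nice property is that an upset moves the ratings more than an expected result does, so every game in the public record nudges the numbers a little, and the ratings end up summarizing a team's whole history.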

Another good example of the value of statistics is "The Upshot," the New York Times' data blog. They recently posted an article on improving NCAA brackets. While that topic isn't necessarily the most important, their team's model surfaced insights that wouldn't have been found otherwise. Bracket advice is only useful to a subset of people, but in other realms similar analysis can be extraordinarily useful. Even within the Upshot, articles cover topics as diverse as blizzards and policies instituted by the Federal Reserve. In every case, it's clear that statistical analysis can reveal insights that would otherwise stay hidden.