Wednesday 20 June 2012

Some Thoughts on Heat Mapping

From the all-knowing Wikipedia, a heat map can be defined broadly as:
"a graphical representation of data where the individual values contained in a matrix are represented as colors."
A few things stand out. First, it is a graphical representation of data. Second, in spatial applications, location is defined by the matrix in the above definition. Finally, the values are represented using colors. Heat maps have been used to visualize data in a wide variety of sports applications, for example, pitch location charts in baseball, shot location charts in basketball and hockey, and player movements in soccer.

Typically, the value of the heat map at a given location is defined by a count of events (e.g., shot attempts) or by the proportion of those events that are successful. This takes an ecological perspective, whereby spatial units are analogous to quadrats (although they need not be square!). In some datasets, the size of the quadrat will be limited by the spatial resolution of the data. For those of you familiar with GIS or cartography, this can be interpreted as the minimum mapping unit.
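To make the quadrat idea concrete, here is a minimal sketch in Python that bins shot locations into square cells and reports, for each cell, the number of attempts and the proportion made. The function name `quadrat_summary`, the `(x, y, made)` tuple layout, and the toy coordinates are all my own assumptions for illustration, not taken from any particular dataset or package.

```python
import math
from collections import defaultdict


def quadrat_summary(shots, cell_size):
    """Bin (x, y, made) shots into square quadrats of side cell_size.

    Returns a dict mapping (col, row) -> (attempts, proportion_made).
    The input layout is an assumption made for this sketch.
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for x, y, made in shots:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        attempts[cell] += 1
        if made:
            makes[cell] += 1
    return {cell: (n, makes[cell] / n) for cell, n in attempts.items()}


# Toy data: (x, y, made) with made-up court coordinates
shots = [(1.0, 2.0, True), (1.5, 2.5, False), (7.2, 0.4, True), (-3.0, 4.1, False)]
summary = quadrat_summary(shots, cell_size=5.0)
```

The `cell_size` argument plays the role of the minimum mapping unit: it cannot sensibly be smaller than the spatial resolution at which the shot locations were recorded.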

Recently, some of Kirk Goldsberry's nice work on mapping basketball has been featured by major US news outlets (for example, the NY Times). In that analysis, heat maps are used to visualize and compare the shot frequency and success rates of the 2012 NBA Championship finalists, the Miami Heat and Oklahoma City Thunder, along with specific analysis of the key players in the series, including superstars LeBron James and Kevin Durant. Goldsberry uses heat maps to effectively visualize differences in the spatial patterning of field goal attempts between the two teams overall, and between individual players (see image from that article below, comparing the Heat vs. Thunder).

What I love about this particular piece of work is that it is able to represent two variables within a single heat map. Shot frequency is displayed using the size of the hexagons (not a square spatial unit!), and shot efficacy (defined as points per field goal attempt) is displayed using color. It makes for a really effective way to visualize these two aspects of field goal shooting. It also makes the results more informative to interpret, because we can have more confidence in the values represented by larger hexagons, which are based on more shots.
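As a rough sketch of how the two variables behind such a map could be computed, the Python below assigns each shot to a hexagonal cell and then tallies attempts (which would drive hexagon size) and points per attempt (which would drive color). This is not Goldsberry's actual method; the function names, the pointy-top axial hexagon layout, the `(x, y, points)` tuple format, and the toy data are all assumptions of mine.

```python
import math
from collections import defaultdict


def hex_cell(x, y, size):
    """Return axial (q, r) coordinates of the pointy-top hexagon containing (x, y).

    size is the hexagon's centre-to-corner distance (an assumed convention).
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Round in cube coordinates, then repair whichever axis moved the most
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def hexbin_shots(shots, size):
    """Summarise (x, y, points) shots per hexagon: attempts and points per attempt."""
    attempts = defaultdict(int)
    points = defaultdict(int)
    for x, y, pts in shots:
        cell = hex_cell(x, y, size)
        attempts[cell] += 1
        points[cell] += pts
    return {
        cell: {"attempts": n, "points_per_attempt": points[cell] / n}
        for cell, n in attempts.items()
    }


# Toy data: (x, y, points scored), where 0 means a miss
shots = [(0.0, 0.0, 2), (0.1, 0.1, 0), (0.0, -0.1, 3), (1.7, 0.1, 3)]
bins = hexbin_shots(shots, size=1.0)
```

When drawing, each hexagon would be scaled by its `attempts` value and shaded by its `points_per_attempt` value, so sparsely sampled cells appear small and visually carry less weight.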

In spatial statistics, this type of analysis falls under the category of spatial point pattern analysis. The underlying shot locations represent a spatial point pattern, with the number of points scored on each shot stored as an attribute (this special case is termed a marked point pattern). In a future post, I will discuss some common pitfalls encountered in spatial point pattern analysis, along with some statistical techniques for generating alternative forms of "heat maps".