Category: Pennant Races

Updated Pennant Race Charts

The 2020, 2021, and 2022 MLB pennant race charts using Retrosheet data have been updated on the Exploratory Server: https://exploratory.io/dashboard/kc2519/Pennant-Races-1901-current-aUu5vDT1EW. All seasons from 1901-2022 are now available using the simple parameter selection (just make sure it’s set to the interactive mode).

Here’s a screenshot:

Views of the 2022 National League pennant races by division

Meanwhile, I’m struggling with some JSON output for my traditional version, so no updates there yet.

Interactive Pennant Races in Exploratory

I’ve been creating MLB pennant race charts for years now, covering every season from 1901 through 2019, with 2020, 2021, and 2022 to come soon. These charts have been available on the site in single charts for each season at a league (American or National) and division level (since 1969). This has always worked reasonably well, but I have always yearned for something a bit more interactive, where users could go to one place and enter the season and league they want to view. Finally, courtesy of the Exploratory Server, such a solution is now available.

Here’s a glimpse of what I’m talking about – first, the old way of doing things, which I’ll continue to maintain. The process starts with a visit to the pennant races page on this site:

Pennant races chart selection

Selecting a specific menu option will display a single pennant race, such as the 1901 American League race shown here:

1901 American League pennant race

These charts work well, and provide some interactivity, but it is strictly one chart per link, so not very efficient.

Now, here’s the alternative option using the Exploratory server. Here I can create very similar charts but with a parameter-driven menu enabling users to select a season and a league:

Exploratory pennant race seasons filter
Exploratory pennant race league filter

Here’s a case where we select the 1901 season and the American League filters, with the following result:

1901 AL pennant race in Exploratory

The real power in this approach comes with the seasons from 1969-2019, where each league had two and then three divisions. Selecting the 2019 season and the American League filter options will now deliver all three divisional charts on a single page!

You can try this out yourself; just make sure to set the Parameters interactive mode to “On” which will activate the filters; you can control the display as well to show one or more columns. I find that a single column works best for the pennant race charts.

https://exploratory.io/viz/kc2519/Pennant-Races-Games-Over-500-Qvx9ZEF0In

I’ll be working more on this as part of the visualization options going forward; there are other cases where I can use similar functionality. Thanks for reading, and see you soon!

Pennant Race Charts Updated!

The last of my big three annual updates is now complete, as all 2016 & 2017 pennant race charts have been created, and now reside in the Visual-Baseball Project portfolio. These charts are created using NVD3, which is built on top of the powerful d3.js framework developed by Mike Bostock. These tools help make the charts highly interactive, allowing you to see where each team stands at any given point in the season, and also providing the ability to zoom in using a smaller sub-chart beneath the primary display.

The structure of the charts is based on every team’s relationship to a .500 winning percentage – a situation where a team wins exactly as many games as it loses. This structure allows for easy interpretation of the results, as we can see which teams hover near the .500 mark (i.e.- consistent mediocrity), others that rise well above this level, and also those teams that descend far below the breakeven point. Allow me to illustrate these thoughts using the 2017 American League Central division, and my hometown Detroit Tigers, who suffered through their worst season since 2003.

2017_AL_Central

As you can see, the darker orange line representing the Tigers takes a steep dive starting in early August, culminating in a final record 34 games below the .500 percentage. Meanwhile, the rival Cleveland Indians (light blue line) present a near mirror image of the Tigers failure, with a sensational month of September that ultimately lands then 42 games over the .500 break-even level.

Similar charts have been created for the other divisions for both the 2016 & 2017 seasons. In fact, you can now view any season, league, and divisional splits dating back to the 1901 campaigns, a total of 380 pennant races to explore! Find all the pennant race charts here. Have fun exploring, and as always, thanks again for reading!

Baseball Grafika Book: Excel Dashboards

Baseball stats are ideally suited for display using a wide variety of charts, network graphs, and other visualization approaches. This is true whether we are using spreadsheet tools such as Microsoft Excel or OpenOffice Calc, data mining tools like Orange, RapidMiner or R, network analysis software such as Gephi and Cytoscape, or web-based visualization tools like D3 or Tableau Public. The sheer scope and variety of available baseball statistics can be brought to life using any one of these or countless other tools.

This is why I felt the need to create a book merging the rich statistical and historical data in the baseball archive with the advanced analytic and visual capabilities of the aforementioned tools. With a bit of good luck and perseverance, the book will be published in late April, coinciding with the early stages of the 2016 baseball season, under the title Baseball Grafika. This series of articles will share a few pieces from the book, which is still undergoing additions and revisions at this stage. I hope these will help provide some insight into how I view the possibilities for visualization, and perhaps generate your interest for how other datasets could benefit from a similar approach.

One of the chapters of the book deals with the creation of dashboards in Excel that allow us to distill large datasets into a single page summarizing information. Here’s an example of a single pennant race, and how it’s unique story can be told using an array of charts, tables, and graphics.

AL_1967_5

Now that we’ve seen an entire dashboard, we’ll look at the component pieces and how they were built. As a reminder, this is all created in Excel, which is often maligned as a visualization tool. Used well, Excel can produce highly effective visualizations, although deploying them to the web is not practical. In the book, I walk through how to create this dashboard using Excel, taking readers through all the steps needed to create formulas, charts, text summaries, and more.

Creating flexible, powerful data displays in Excel frequently involves the use of pivot tables and slicers (filters) that allow for data manipulation. Building charts on top of these tools permits maximum flexibility. Done effectively, this means we can create a template that can be used over and over, with only the source data changing according to our slicer selections. Here’s an example pivot table with slicer options:

team_pivot

The slicer selections allow us to choose the data elements from our base dataset that are to be displayed in a pivot table. From there, name ranges and formulas can be used to select the data programatically, and feed it into charts that are not dependent on any additional manual intervention. One chart, used over and over, makes it simple to display new data with a single click of a slicer button.

Name ranges can be used extensively to automate the dashboard to a high degree, using native Excel functionality. Here’s a screenshot showing a name being defined in Excel:

Excel_name_range

A virtually unlimited number of name ranges can be created, and then used as references in Excel cell formulas, making it easy to populate cells, tables, or charts with updated information.

Each of the following sections of the final dashboard are populated using one or more name ranges based on pivot table data in most cases. All that is required in the dashboard is a simple formula to grab the right data based on the slicer selection.

First, we create a basic text summary recapping each season, which is then pulled into the top section of the dashboard:

AL_1967_1

This is then followed by the pennant race section of the dashboard, including both the pennant race charts as well as a table of season-ending standings information. One pivot table and its references populate the chart, while a second pivot is used to provide the table data, with cell-level formulas performing calculations.

AL_1967_2

Our third section makes use of the wonderful Sparklines for Excel add-in. Our dashboard benefits from the use of horizon and variance charts, as well as box plots. In between, we’re able to add some additional Excel cell calculations to display metric values.

AL_1967_3

The final section of the dashboard takes advantage of some cell formulas to create dotplots displaying relative values within a category. This allows readers to see who was higher or lower in a specific measure, maximizing space along the way, which is often critical when building dashboards.

AL_1967_4

The book will provide much more, including tutorials on creating this type of dashboard, in addition to other visual displays of baseball information. Ultimately, the goal is to share some of my approaches and hope that they drive others to create their own unique approaches, all in the interest of advancing the discipline of baseball data visualization.

Future posts will examine other ways we can explore our baseball data. Text mining, statistical distributions, interactive charts, historical maps, and network graphs will be among our future topics. See you soon.

A New Book Resolution for 2016

As we enter a new year, I find myself eager to create a new book that explores the world of baseball data using a wide array of data visualization approaches. This idea has been in my head for several years at least, and has found partial fulfillment in my previously published pennant races book. However, I wish to tackle something broader that will touch a number of baseball categories as well as multiple data visualization approaches.

The working title for the book is ‘Baseball Grafika’, grafika being the Czech and Polish word for graphics, a word which still conveys the intent of the book regardless of language. If all goes well, the book will be available early in the 2016 baseball season, and will cover the following topics:

  • Franchise player networks
  • Trade pattern networks
  • Hall of Fame connection network
  • Franchise location maps
  • Player birthplace maps
  • Pennant race charts
  • Standings charts
  • Career trajectory graphs
  • Baseball dashboards

Fortunately, much work has been done over the last several years on at least a few of these topics, so we’re not starting from scratch, but this will still be a considerable, yet rewarding, challenge. Updates to come.