Tag: Retrosheet

Updated Pennant Race Charts

The 2020, 2021, and 2022 MLB pennant race charts using Retrosheet data have been updated on the Exploratory Server: https://exploratory.io/dashboard/kc2519/Pennant-Races-1901-current-aUu5vDT1EW. All seasons from 1901-2022 are now available using the simple parameter selection (just make sure it’s set to the interactive mode).

Here’s a screenshot:

Views of the 2022 National League pennant races by division

Meanwhile, I’m struggling with some JSON output for my traditional version, so no updates there yet.

Interactive Pennant Races in Exploratory

I’ve been creating MLB pennant race charts for years now, covering every season from 1901 through 2019, with 2020, 2021, and 2022 to come soon. These charts have been available on the site in single charts for each season at a league (American or National) and division level (since 1969). This has always worked reasonably well, but I have always yearned for something a bit more interactive, where users could go to one place and enter the season and league they want to view. Finally, courtesy of the Exploratory Server, such a solution is now available.

Here’s a glimpse of what I’m talking about – first, the old way of doing things, which I’ll continue to maintain. The process starts with a visit to the pennant races page on this site:

Pennant races chart selection

Selecting a specific menu option will display a single pennant race, such as the 1901 American League race shown here:

1901 American League pennant race

These charts work well, and provide some interactivity, but it is strictly one chart per link, so not very efficient.

Now, here’s the alternative option using the Exploratory server. Here I can create very similar charts but with a parameter-driven menu enabling users to select a season and a league:

Exploratory pennant race seasons filter
Exploratory pennant race league filter

Here’s a case where we select the 1901 season and the American League filters, with the following result:

1901 AL pennant race in Exploratory

The real power in this approach comes with the seasons from 1969-2019, where each league had two and then three divisions. Selecting the 2019 season and the American League filter options will now deliver all three divisional charts on a single page!

You can try this out yourself; just make sure to set the Parameters interactive mode to “On” which will activate the filters; you can control the display as well to show one or more columns. I find that a single column works best for the pennant race charts.

https://exploratory.io/viz/kc2519/Pennant-Races-Games-Over-500-Qvx9ZEF0In

I’ll be working more on this as part of the visualization options going forward; there are other cases where I can use similar functionality. Thanks for reading, and see you soon!

Retrosheet 2022 Data is Here!

One of the primary source data sets I use to create baseball visualizations is the amazingly detailed information captured by the Retrosheet project, a dedicated group of volunteers providing play-by-play and game level information for each MLB season. They have recently passed the 100-year milestone, with data from the 1921 & 1922 seasons now available. I have some catching up to do on the older seasons, but just downloaded the 2022 season for adding to my databases.

The data comes in two distinct sets – game logs being the much easier of the two to work with, due to the smaller data size. Each game played in a season is captured at a summary level (~ 2,400 records), with information pertaining to the score, players, umpires, attendance, and much more. This information is used to feed my game summary visualizations:

2007 Game Summary results

As you can see, these are bite-sized summaries of every game, showing some of the important summary data for a game. They can be filtered to find specific teams, pitchers, scores, and much more. These visualizations are currently available covering the 1955-2019 seasons; one of my immediate goals is to add the 2020, 2021, and 2022 seasons, before starting to work in reverse with pre-1955 campaigns.

Fortunately, I have lots of SQL code built up over the years to make the data update process fairly simple; the 2022 game logs have already been added, and now I’ll get to work on the play-by-play data. Stay tuned for updates, and thanks for reading!

Bad Trades, Red Sox Edition

This is the first in a series of posts where I take a look at notoriously one-sided baseball trades, using the baseball trade networks published on this site earlier in 2022. I won’t necessarily rank these deals in any sort of order; rather I will pick out a few from the network trade graphs and provide some analysis and context for some of the most notorious transactions.

If you haven’t seen the trade networks previously, here’s a link.

The networks were built using data from Retrosheet and Neil Paine, loaded into Gephi, a network analysis and visualization tool, and ultimately pushed to the web where I could finish styling the graphs. Graph nodes (the circles in the networks) are sized based on the total future WAR (Wins Above Replacement) accrued by the teams involved in the trade. All values must occur at the major league level (MLB), so players involved in the deal who don’t reach the MLB level with their new team will have a zero value. Only the cumulative WAR value while playing for the new team is included; we are not calculating WAR once a player leaves one of the teams involved in the transaction.
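The valuation rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the season records are invented, `future_war` is a hypothetical helper, and the cutoff (seasons from the trade year onward, with the new team only) is my reading of the rule rather than the code behind the published networks.

```python
def future_war(seasons, new_team, trade_season):
    """Sum a player's MLB WAR for new_team from the trade season onward.

    seasons: list of (year, team, war) tuples, one per MLB stint.
    Returns 0.0 when the player never appeared in MLB for new_team.
    """
    total = sum(
        (war for year, team, war in seasons
         if team == new_team and year >= trade_season),
        0.0,
    )
    return round(total, 1)


# Invented record for a player traded from "BOS" to "HOU" during 1990.
example_seasons = [
    (1989, "HOU", 1.0),  # earlier stint with the new team: ignored
    (1990, "BOS", 0.4),  # old team: ignored
    (1990, "HOU", 0.6),  # counts
    (1991, "HOU", 3.2),  # counts
    (1992, "NYA", 2.0),  # after leaving the new team: ignored
]

print(future_war(example_seasons, "HOU", 1990))
print(future_war([], "HOU", 1990))
```

A player who never reaches the majors with the new team simply has no matching rows, which is why such players end up with a zero value.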

Finding a bad trade by scanning the networks is more an art than a science; the key is to look for large nodes (indicating a lot of future WAR value), and then dissect the trade to see how much value each team received. The other alternative applies when we already know the player(s) we are looking for; in these cases we can perform a simple search to find the trade. Here’s a classic example that Red Sox fans would love to forget – trading future Hall of Famer Jeff Bagwell for journeyman reliever Larry Andersen. Let’s go to the Red Sox trade network and search for Jeff Bagwell.

Red Sox trade network

Typing in Jeff Bagwell locates him quickly within the trade network. Note that even if a player is involved in multiple trades to or from the same team (rare but possible) the search will locate each transaction. Here’s the Bagwell transaction, showing his player node and future WAR value connected to the transaction node; every player involved in that transaction will be connected to the trade node, as long as there is some future WAR value. If a player in the trade did not play in the majors for the receiving team, they will not be reflected in the graph. Here’s a view of Jeff Bagwell relative to the trade:

Jeff Bagwell transaction

We can also click on the transaction node to see the value provided to each team by all of the players involved in the trade, again assuming they spent time with the team and were not limited to the minor leagues. Clicking on that node will display the respective WAR values in the sidebar on the left of the screen:

WAR values of the trade

Here’s where we get to the details of the trade, and specifically the direct benefits accrued to each team. The Red Sox received 1.1 future WAR from Larry Andersen; to put this in perspective, we might expect this sort of value for an average player for a single season. The Astros, on the other hand, received an incredible 93.8 WAR from Jeff Bagwell, or close to 6 WAR per season for 16 years! That is a Hall of Fame level performance, and it eventually led to his selection to Cooperstown in 2017. Here’s a profile that mentions the one-sided trade.
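For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above; the variable names are mine.

```python
# Future WAR figures quoted above for the Bagwell trade.
astros_war = 93.8    # Jeff Bagwell with Houston
red_sox_war = 1.1    # Larry Andersen with Boston
seasons = 16         # span cited above

per_season = astros_war / seasons
differential = astros_war - red_sox_war

print(f"{per_season:.1f} WAR per season")      # 5.9
print(f"{differential:.1f} WAR differential")  # 92.7
```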

While we have the Red Sox network open, let’s see if there are any other disastrous transactions (other than the cash sale of Babe Ruth to the Yankees, technically not a trade). After scanning the network, we find this one from 1928:

Transaction 59324 – Buddy Myer

This one is clearly not a Bagwell-level disaster, but was still quite negative for the Red Sox, with a WAR differential of 30 points. The primary villain here is Buddy Myer, a solid infielder who hit .300 or better seven times for the Senators. Not a major star, but the owner of a very nice career, including leading the American League in batting average in 1935.

Let’s try to find one more before closing this piece, this time favoring the Red Sox. We zero in on this deal:

Transaction 59403 – Jimmie Foxx

The Red Sox netted nearly 45 future WAR while surrendering just 0.1; most of the benefit was generated by slugging future Hall of Famer Jimmie Foxx, but they also received a nice three-season contribution from pitcher Johnny Marcum. Note how we also removed nodes not involved in the trade by clicking on the edges icon on the bottom left of the display area; this makes it easier to focus on the details.

Feel free to try your hand at finding more of these one-sided deals in the Red Sox or any other trade networks. I’ll be back with some other teams before long. Thanks for reading!

Trade Network Updates, Part 1

A few years back (2016 to be specific) I created network graphs displaying the history of trades made for each MLB franchise, using transactions data from the wonderful Retrosheet project. These graphs presented more than a few challenges in how to present the data, but I wound up with what I consider to be a very interesting set of results, which you can find here. I also created some posts on the process at that time, found here and here.

Here’s a snapshot within a graph:

Six seasons have elapsed since I created those graphs, so I thought it was beyond time to update them, but this time with a twist. Last fall I came across a great dataset that captures an array of advanced sabermetric statistics which I hope to use on a regular basis. These statistics can be used to assess a player’s true value relative to his peers each season. What if I could incorporate those into the trade network updates to show the post-trade value of each player to their new team? Ideally, this will help to show the value of each trade and which team wound up getting the better part of the deal.

Of course this would involve adding a degree of complexity to the MySQL code for pulling the data and shaping it for use in creating network graphs. However, the end result could be very revealing and worthwhile. Today I’m at the start of the process, tinkering with SQL code to extract the data in a proper format. Here’s an example:

SELECT h.player_name, p.playerID, tr.season, tr.TransactionID, tr.TeamFrom, tr.TeamTo, ROUND(SUM(h.WAR162),1) as WAR

FROM historical_WAR_and_more h
INNER JOIN People p
ON h.key_bbref = p.bbrefID
INNER JOIN trades2021 tr
ON p.retroID = tr.Player

WHERE tr.season >= 1901 AND h.year_ID > tr.season AND h.team_ID = tr.TeamTo AND tr.Type = 'T'

GROUP BY h.player_name, p.playerID, tr.season, tr.TransactionID, tr.TeamFrom, tr.TeamTo

In this case, I’m looking at the cumulative WAR (Wins Above Replacement) for each traded player with their new team. This could be a single season total or the sum of many years in some cases. Here are some results:

We now have post-trade results (starting in the season following the trade) as measured by WAR for each traded player. We see one fairly substantial figure – the second Aaron Harang trade, which netted 16.9 WAR points for his new team, the Cincinnati Reds (CIN in the results). Given that a single season WAR above 3 or 4 is considered substantial, it’s clear that his new team probably benefited from a few of those high-value seasons. What we can’t see yet is what they gave away in their half of the trade.

Fortunately, we can access this using the TransactionID field, which provides all the information for each party within the trade. But we’ll save that for another day as I figure out the next progression of the code. As always, thanks for reading!
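As a rough illustration of that next step, here is one way the per-player rows could be rolled up by TransactionID and receiving team. This is a sketch rather than the eventual SQL: the rows are made-up placeholders, and `summarize_trades` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_trades(rows):
    """Total post-trade WAR per receiving team within each transaction.

    rows: iterable of (transaction_id, team_to, war) tuples, one per player.
    Returns {transaction_id: {team_to: total_war}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for transaction_id, team_to, war in rows:
        totals[transaction_id][team_to] += war
    return {
        tid: {team: round(war, 1) for team, war in teams.items()}
        for tid, teams in totals.items()
    }


# Placeholder rows (not real query output): two players go to team "AAA",
# one player goes to team "BBB", all within transaction 1001.
sample_rows = [
    (1001, "AAA", 5.0),
    (1001, "AAA", 1.5),
    (1001, "BBB", 0.5),
]

summary = summarize_trades(sample_rows)
for tid, teams in summary.items():
    best = max(teams, key=teams.get)
    print(tid, teams, "-> most WAR received by", best)
```

Grouping on the transaction first and the receiving team second keeps both halves of a deal side by side, which is the comparison the network graphs are meant to surface.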

Welcome to 2022!

I for one am looking forward to 2022 after a couple of interesting, often challenging years affected my desire to generate interesting analytics and data visualizations. The less said the better – simply excited to get back to updating some existing visuals and adding a host of new ones.

I’ll be doing a lot of work using the Exploratory toolkit, which keeps improving by the day. It is simply a great tool for handling large (or small) data sets from start to finish; I especially love its data wrangling capabilities.

On the data source side, Retrosheet and the Lahman database will continue to feed my analysis and visuals; none of what I create would be possible without these great resources. Retrosheet data (used for game level and play level detail) is already updated through the 2021 season; part of this year’s plan is to add older years (pre-1955) to my local database. The Lahman data (season level) is typically available around February and I’ll be downloading it to my databases at that time.

Stay tuned for updates throughout 2022 – they should be a lot more frequent than the last two years. Happy New Year!

Major League Baseball Trade Networks, Part 1

One topic that has long fascinated me as network graph material is trade data between major league baseball (MLB) teams. I have previously created a static visualization showing activity at a macro level, i.e., the number of trades between teams over a 100+ year period. Yet there was a desire to do something more, and to make it interactive so users would be able to sift through the data for their favorite teams to understand trade patterns through a visual representation. Today, after weeks spent tinkering with this topic, I finally have something to share, and will walk through how it is created and how to engage with it online. If you want to play with it before reading further, visit the Tigers trade network.

Here’s what we’ll wind up with:

Tigers trade network

My tools of choice in this endeavor are familiar ones to anyone working with baseball data, network graphs, or perhaps both, although I haven’t seen many instances of the latter. The trade data can be found at Retrosheet, as part of a seemingly boundless array of baseball data, both statistical and historical in nature. Gephi, the open source network analysis tool, is again my choice for creating the network structure from the raw data, and Sigma.js is once more the tool for web implementation. Mix in a bit of Excel and PowerPoint for good measure, and we have all the tools necessary to create a pretty cool (IMHO) finished graph.

So let’s get started. Our first step is to go to the Retrosheet site and download trade data, found at Retrosheet transactions. Be aware that there is much more than trade data in this dataset; free agent transactions, releases, and many other transaction types are available. My approach is to grab the entire dataset, which I can then load into a MySQL database for filtering and matching to other baseball data from both Retrosheet and the Lahman archives. For our example, only trades will be used; this leaves open the future possibility to examine free agent signings or other transaction types.

Once I have the data in MySQL (I’m purposely skipping over this process), the coding steps begin. This was a very iterative process as I gradually figured out how Gephi would play with the output data, but I won’t bore you with my multiple missteps. Instead, let’s have a look at the code snippets, and I’ll explain their usage and the thought process behind them. We’ll start with a view of the code, created within the (free) Toad for MySQL tool. In creating this code, we need to understand how Gephi (or any other network analysis tool) works. At the risk of over-simplifying, Gephi only needs nodes and edges. Nodes will represent the players or teams in our visual, while edges will show the linkages within a single trade – who was traded for whom, and which players moved together from one organization to another.

Node creation is simple – we just grab all players involved in a trade, and do likewise for all teams. Here’s the code for players:

SELECT t.Player as Player, CONCAT(m.nameFirst, ' ', m.nameLast) as Name, count(*) as transactions

FROM trades2015 t
INNER JOIN Master m
ON t.Player = m.retroID

WHERE t.Type = 'T' and (t.TeamFrom > 'A' OR t.TeamTo > 'A')
GROUP BY t.Player, CONCAT(m.nameFirst, ' ', m.nameLast)

All we’re doing here is creating a node size for each player, based on the number of trades they are involved in.
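For readers who prefer to see the counting logic outside SQL, the same idea can be sketched in Python. The rows are invented placeholders rather than Retrosheet records, and the variable names are mine, not anything from the database.

```python
from collections import Counter

# Placeholder (player_id, type) rows standing in for the trades table;
# "T" marks a trade, matching the filter in the query above.
trade_rows = [
    ("playa001", "T"),
    ("playa001", "T"),
    ("playb001", "T"),
    ("playb001", "F"),  # not a trade, so it is skipped
]

# Each appearance in a trade adds one to the player's node size.
node_sizes = Counter(player for player, kind in trade_rows if kind == "T")

for player, size in sorted(node_sizes.items()):
    print(player, size)
```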

For Teams, the logic is a bit more complex; since team names have changed from season to season, we need to join on both team and season to get the correct name assignments. We also want to account for the direction of each transaction, which we do using a UNION query.

SELECT b.Team AS Id, b.Name As Label, SUM(b.transactions) as Size
FROM
(SELECT t.TeamFrom as Team, te.name as Name, count(*) as transactions

FROM trades2015 t
INNER JOIN Teams te
ON te.teamID = t.TeamFrom and t.Season = te.yearID

WHERE t.Type = 'T' and t.TeamFrom > 'A'
GROUP BY t.TeamFrom, te.name

UNION ALL

SELECT t.TeamTo as Team, te.name as Name, count(*) as transactions

FROM trades2015 t
INNER JOIN Teams te
ON te.teamID = t.TeamTo and t.Season = te.yearID

WHERE t.Type = 'T' and t.TeamTo > 'A'
GROUP BY t.TeamTo, te.name) b
GROUP BY b.Team, b.Name

After running the queries, we have results that can be posted into Excel or other spreadsheet software, where a tab-delimited file can be saved for use in Gephi. Our file data looks like this:

Id Label Size
aardd001 David Aardsma 4
aaroh101 Hank Aaron 1
aased001 Don Aase 1
abadf001 Fernando Abad 1
abbae101 Ed Abbaticchio 1
abbeb101 Bert Abbey 1
abbof101 Fred Abbott 1
abboj001 Jim Abbott 2

and for the team entries:


OAK Oakland Athletics 936
PH4 Philadelphia Athletics 6
PHA Philadelphia Athletics 355
PHI Philadelphia Blue Jays 28
PHI Philadelphia Phillies 1445
PHI Philadelphia Quakers 3
PIT Pittsburg Alleghenys 9
PIT Pittsburgh Pirates 1416

This is all Gephi requires for displaying nodes – an ID, a Label, and size. Even the label and size are not required fields, but they do make things easier if done in advance. So far, so good. Next we’ll move on to the somewhat more involved process of creating edge files.
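If you would rather script the spreadsheet step, a tab-delimited node file can be written directly with Python’s standard csv module. This is a minimal sketch, assuming the Id, Label, and Size column headers shown above are what you want in the file; `write_nodes` is a hypothetical helper, and the three rows are copied from the player sample.

```python
import csv
import io

# Rows copied from the player sample shown above.
nodes = [
    ("aardd001", "David Aardsma", 4),
    ("aaroh101", "Hank Aaron", 1),
    ("aased001", "Don Aase", 1),
]


def write_nodes(handle, rows):
    """Write Id/Label/Size rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Id", "Label", "Size"])
    writer.writerows(rows)


# Written to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_nodes(buffer, nodes)
print(buffer.getvalue())
```

To save a real file, open it with `newline=""` and pass the handle to `write_nodes` in place of the buffer.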

As I progressed deeper into this project, it became evident to me that there were four different types of edges to display. The first two were obvious and easy – players being traded to a team, or from a team. Yet I also wanted to see the other players involved in each transaction, which necessitated the addition of two more edge types – traded with other players, and traded for other players. Note that in many cases just two or three of these might come into play, and for many prominent players, we’ll have none at all. Thus, the likes of an Al Kaline or Ted Williams will not be found in any of these graphs, as they remained with a single team for their entire careers.

Here’s the final edge code I wound up with to create the four categories of trades to be displayed in a graph. Gephi requires three edge attributes – a source value, a target value, and an edge type. The edge type must be either undirected or directed; for our graph, all edges will be directed, since we intend to show the bi-directional movements within each transaction. The first bit of code is for instances where a player was traded from a team:

SELECT tr.Season, tr.TransactionID, tr.PrimaryDate, tr.TeamFrom AS Source, tr.Player as Target,
CASE WHEN tr.Type = 'T' THEN 'Trade' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Release' END as Category,
CONCAT(m.nameFirst, ' ', m.nameLast, ' ', CASE WHEN tr.Type = 'T' THEN 'Traded' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Released' END, ' on ', tr.PrimaryDate, ' from ', t.name) AS Label, 'Directed' as Type, 'Traded From' as CategoryDetail

FROM trades2015 tr
INNER JOIN Master m ON tr.player = m.retroID
INNER JOIN Teams t ON tr.TeamFrom = t.teamIDretro and t.yearID = tr.season

WHERE tr.type = 'T'

Note the legacy code covering free agency and releases, rendered moot by the WHERE clause. These will have to wait for another set of graphs. In a similar fashion we have code for trades where a player comes to a team.

SELECT tr.Season, tr.TransactionID, tr.PrimaryDate, tr.Player AS Source, tr.TeamTo as Target,
CASE WHEN tr.Type = 'T' THEN 'Trade' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Release' END as Category,
CONCAT(m.nameFirst, ' ', m.nameLast, ' ', CASE WHEN tr.Type = 'T' THEN 'Traded' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Released' END, ' on ', tr.PrimaryDate, ' to ', t.name) AS Label, 'Directed' as Type, 'Traded To' as CategoryDetail

FROM trades2015 tr
INNER JOIN Master m ON tr.player = m.retroID
INNER JOIN Teams t ON tr.TeamTo = t.teamIDretro and t.yearID = tr.season

WHERE tr.type = 'T'

Next, it’s time to create linkages with players from the same transaction, first those moving in the same direction (traded with) in the trade.

SELECT tr.Season, tr.TransactionID, tr.PrimaryDate, tr.Player AS Source, tr2.Player AS Target,
CASE WHEN tr.Type = 'T' THEN 'Trade' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Release' END as Category,
CONCAT(m.nameFirst, ' ', m.nameLast, ' ', CASE WHEN tr.Type = 'T' THEN 'Traded' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Released' END, ' on ', tr.PrimaryDate, ' with ', m2.nameFirst, ' ', m2.nameLast) AS Label, 'Directed' as Type,
'Traded With' as CategoryDetail

FROM trades2015 tr
INNER JOIN trades2015 tr2
ON tr.TransactionID = tr2.TransactionID
INNER JOIN Master m ON tr.player = m.retroID
INNER JOIN Master m2 ON tr2.player = m2.retroID

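WHERE tr.type = 'T'
-- Assumed filter, not part of the original snippet: keep pairs headed to the same team and skip self-pairs
AND tr.TeamTo = tr2.TeamTo AND tr.Player <> tr2.Player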

Note the need to duplicate the Master table in the code, since we now require multiple player names to populate the Source and Target fields in Gephi. The same holds true for our last snippet, where players are traded for one another.

SELECT tr.Season, tr.TransactionID, tr.PrimaryDate, tr.Player AS Source, tr2.Player AS Target,
CASE WHEN tr.Type = 'T' THEN 'Trade' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Release' END as Category,
CONCAT(m.nameFirst, ' ', m.nameLast, ' ', CASE WHEN tr.Type = 'T' THEN 'Traded' WHEN tr.Type = 'F' THEN 'Free Agent Signing' WHEN tr.Type = 'Fg'
THEN 'Free Agent Granted' WHEN tr.Type = 'R' THEN 'Released' END, ' on ', tr.PrimaryDate, ' for ', m2.nameFirst, ' ', m2.nameLast) AS Label, 'Directed' as Type,
'Traded For' as CategoryDetail

FROM trades2015 tr
INNER JOIN trades2015 tr2
ON tr.TransactionID = tr2.TransactionID
INNER JOIN Master m ON tr.player = m.retroID
INNER JOIN Master m2 ON tr2.player = m2.retroID

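WHERE tr.type = 'T'
-- Assumed filter, not part of the original snippet: keep pairs moving in opposite directions
AND tr.TeamTo = tr2.TeamFrom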

Each of these bits of code outputs results, which are then copied and pasted into our edges spreadsheet. Here are five sample rows for each of our four trade categories:

Season TransactionID PrimaryDate Source Target Category Label Type CategoryDetail
2010 62908 20100731 KCA ankir001 Trade Rick Ankiel Traded on 20100731 from Kansas City Royals Directed Traded From
2010 60709 20100831 TEX ariaj001 Trade Joaquin Arias Traded on 20100831 from Texas Rangers Directed Traded From
2010 62264 20101118 COL barmc001 Trade Clint Barmes Traded on 20101118 from Colorado Rockies Directed Traded From
2010 72627 20101217 TBA bartj001 Trade Jason Bartlett Traded on 20101217 from Tampa Bay Rays Directed Traded From
2010 72622 20100709 TEX beavb001 Trade Blake Beavan Traded on 20100709 from Texas Rangers Directed Traded From

2010 62908 20100731 ankir001 ATL Trade Rick Ankiel Traded on 20100731 to Atlanta Braves Directed Traded To
2010 60709 20100831 ariaj001 NYN Trade Joaquin Arias Traded on 20100831 to New York Mets Directed Traded To
2010 62264 20101118 barmc001 HOU Trade Clint Barmes Traded on 20101118 to Houston Astros Directed Traded To
2010 72627 20101217 bartj001 SDN Trade Jason Bartlett Traded on 20101217 to San Diego Padres Directed Traded To
2010 72622 20100709 beavb001 SEA Trade Blake Beavan Traded on 20100709 to Seattle Mariners Directed Traded To

2010 62908 20100731 ankir001 blang001 Trade Rick Ankiel Traded on 20100731 for Gregor Blanco Directed Traded For
2010 62908 20100731 ankir001 chavj001 Trade Rick Ankiel Traded on 20100731 for Jesse Chavez Directed Traded For
2010 62908 20100731 ankir001 collt001 Trade Rick Ankiel Traded on 20100731 for Tim Collins Directed Traded For
2010 60709 20100831 ariaj001 franj004 Trade Joaquin Arias Traded on 20100831 for Jeff Francoeur Directed Traded For
2010 72627 20101217 bartj001 figuc001 Trade Jason Bartlett Traded on 20101217 for Cole Figueroa Directed Traded For

2010 62908 20100731 ankir001 farnk001 Trade Rick Ankiel Traded on 20100731 with Kyle Farnsworth Directed Traded With
2010 66840 20101219 betay001 greiz001 Trade Yuniesky Betancourt Traded on 20101219 with Zack Greinke Directed Traded With
2010 72622 20100709 beavb001 luekj001 Trade Blake Beavan Traded on 20100709 with Josh Lueke Directed Traded With
2010 72622 20100709 beavb001 smoaj001 Trade Blake Beavan Traded on 20100709 with Justin Smoak Directed Traded With
2010 62908 20100731 blang001 chavj001 Trade Gregor Blanco Traded on 20100731 with Jesse Chavez Directed Traded With
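To make the four categories concrete, here is a small Python sketch that derives them for three of the players in transaction 62908 from the rows above. It is an illustration, not a port of the SQL: `build_edges` is a hypothetical helper, and it emits player-to-player pairs in both directions, which may differ from the output shown.

```python
from itertools import permutations

# (player, team_from, team_to) for part of transaction 62908, taken from
# the sample rows above (Ankiel and Farnsworth to ATL, Blanco to KCA).
trade = [
    ("ankir001", "KCA", "ATL"),
    ("farnk001", "KCA", "ATL"),
    ("blang001", "ATL", "KCA"),
]


def build_edges(rows):
    """Return (source, target, category) tuples for one transaction."""
    edges = []
    for player, team_from, team_to in rows:
        edges.append((team_from, player, "Traded From"))
        edges.append((player, team_to, "Traded To"))
    # Ordered pairs of distinct players within the same transaction.
    for (p1, _, to1), (p2, _, to2) in permutations(rows, 2):
        category = "Traded With" if to1 == to2 else "Traded For"
        edges.append((p1, p2, category))
    return edges


for edge in build_edges(trade):
    print(edge)
```

Players sharing a destination are linked as "Traded With", while players heading in opposite directions are linked as "Traded For".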

We have now successfully prepared the data for Gephi. In our next post, I’ll examine the process starting with the Gephi data import phase. Thanks for reading!