2018 NBA Playoffs Shot Location Data
There is a scene in Interstellar where Matthew McConaughey’s character, Cooper, is stuck in a five-dimensional (or was it four?) world. In that world he has access to every single day, year, second, and millisecond of his life, literally any unit of time. It was all there in front of him, as if someone had laid out each moment on a graph and he could see all of it at the same time. Imagine taking that same concept and applying it to sports, specifically basketball.
When you watch a basketball game, you watch it play by play, quarter by quarter, and minute by minute, and when the game is over, you pretty much care about one thing: did your team score more points than the other team? This whole process is pretty linear. You can’t go back in time or fast forward to the end of the game. But what if you could get access to every shot from a game or playoff series at the same time? Now you can see the big picture. With the advent of shot tracking technology in the NBA, we can do just that.
Step 1: Ideation
The LA Times had done a great piece on every shot that Kobe Bryant took during his NBA career. It inspired me to create something similar for my exploratory data visualization project. I originally wanted to collect shot location data for every game played so far during the 2019–2020 NBA season. It turned out that collecting that type of data was a lot harder than expected. The reason? The NBA owns all that data. If you don’t want to pay the NBA for it, you have to scrape it off of their main stats website, and while there were some great tutorials on how to do that, I didn’t have the time. Luckily, there was a great dataset available on Kaggle that contained the location of every shot taken during the 2018 NBA playoffs.
Now, if I had been able to scrape shot location data for each game of the 2019–2020 regular season, I would have had a lot of data from a lot of games and players. The nice thing about the Kaggle dataset was that it only included data from the playoffs. Because the dataset was that narrow, it would be manageable enough for someone to explore.
Step 2: Sketch/Design Phase
I had my data. Now I needed to start designing how my visualization was going to be laid out. First up was figuring out how an individual using this visualization would walk through it and use it.
My first rough sketch only included one basketball court diagram with multiple filters. One filter would allow a user to choose the series they wanted to see shot location data for. Another would be for specific games in that series and another would be for players, teams, and so on. A lot of filters! Here is what that first sketch looked like.
Not so great, right? I realized soon enough that I did not want to implement that many filters and work out the logic needed to show the correct data points on the chart. I scrapped that idea and went back to the drawing board. I figured it would be much easier if, instead of choosing a playoff series from a filter, a user could choose a series by clicking on a node in a playoff bracket. Based on that choice, the data could be filtered accordingly and displayed on the chart. I didn’t want only a shot chart, so I also thought it would be insightful for a user to have a bar chart showing the number of made/missed shots from each area on the court.
Then there was the process of figuring out how this would functionally work. My original plan was to have the bracket on a home page. Clicking on a specific playoff series would bring the user to another page with the shot chart and the bar graphs. Each series (15 in total) would have its own page. The user could then click on the home button to go back to the bracket, and the process would start all over again. I started coding this, and it soon became a nightmare keeping track of all the different files, images, datasets, etc. NEVER AGAIN. So I decided to keep all the necessary components on the same page and have the data update in a state object whenever certain filters change.
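To make the state object idea concrete, here is a minimal sketch in plain JavaScript. It is not the project’s actual code: the `state` fields, `setState`, and the `renderers` list are hypothetical names chosen for illustration, and the "component" just records what it would draw.

```javascript
// Hypothetical shared state; these names are illustrative, not taken from the project.
const state = {
  series: null, // e.g. "Cavaliers vs Raptors series"
  game: "All",  // e.g. "Game 1"
};

// Every component (bracket, shot charts, bar chart) would register a draw function here.
const renderers = [];

// Merge the changes into the state, then ask every component to redraw.
function setState(changes) {
  Object.assign(state, changes);
  renderers.forEach((draw) => draw(state));
}

// Stand-in for a chart: it only logs what it would draw.
const drawLog = [];
renderers.push((s) => drawLog.push(`${s.series} / ${s.game}`));

// Clicking a bracket node or changing the game filter would call setState.
setState({ series: "Cavaliers vs Raptors series" });
setState({ game: "Game 1" });
```

Because every filter change goes through one function, each component only has to know how to draw itself from the current state, not which filter changed.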
Step 3: Data Cleaning
The dataset contained a lot of information. I loaded it into a Jupyter notebook to first do some basic exploratory analysis. What was the datatype of each of my fields? How many null values did I have?
Each row in the dataset represents a single shot taken by a player. It shows us what game that shot was taken in, whether it was made or missed, its x and y location, and more. Because I wanted the user to see only the data for the playoff series they chose, I needed to add a field to the dataset that shows which series each shot was taken in. I created that field using the HTM (Home Team) and VTM (Visiting Team) fields. So, for example, if a shot had HTM = “Toronto Raptors” and VTM = “Cleveland Cavaliers” or vice versa, then the shot would be mapped to the “Cavaliers vs Raptors series”.
What I then needed to do was convert the “GAME_ID” field from a number that looked like this (41700101) to this (Game 1). Luckily, the game IDs for each playoff series followed a common pattern. If an ID ended in a 1, that was Game 1; if it ended in a 7, it was Game 7, and so on. I mapped these GAME_ID values to a game number in another column because I needed to let the user filter shots by game number within each series as well.
The last task was to clean up some of the team names so that they would fit within my playoff bracket visualization. There were two teams in particular that I had to do this for: the Timberwolves and the Trail Blazers. I changed them to T-Wolves and T-Blazers.
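As a minimal sketch, the series mapping could look something like this. It is written in JavaScript purely for illustration and is not the code that was actually run. It assumes HTM and VTM hold full team names as in the example above; `SERIES`, `nickname`, `seriesLabel`, and the override table are hypothetical names.

```javascript
// Assumption: HTM and VTM hold full team names, as in the example above.
// Multi-word nicknames need an explicit entry; this table is illustrative, not complete.
const NICKNAME_OVERRIDES = {
  "Portland Trail Blazers": "Trail Blazers",
};

// Use the override if there is one, otherwise take the last word of the name.
function nickname(teamName) {
  return NICKNAME_OVERRIDES[teamName] || teamName.trim().split(/\s+/).pop();
}

// Sort the two nicknames so that home/away order does not matter.
function seriesLabel(htm, vtm) {
  const [a, b] = [nickname(htm), nickname(vtm)].sort();
  return `${a} vs ${b} series`;
}

// Add the new field to every shot row (hypothetical field name: SERIES).
function addSeries(shots) {
  return shots.map((shot) => ({ ...shot, SERIES: seriesLabel(shot.HTM, shot.VTM) }));
}

const sample = addSeries([
  { HTM: "Toronto Raptors", VTM: "Cleveland Cavaliers" },
  { HTM: "Cleveland Cavaliers", VTM: "Toronto Raptors" },
]);
```

Sorting the nicknames before building the label is what makes the "or vice versa" case work: both rows in `sample` end up with the same series label regardless of who was at home.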
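The GAME_ID conversion is small enough to sketch in a few lines. This is an illustration in JavaScript rather than the original notebook code; `gameLabel` is a hypothetical helper, and the second ID below is made up to show the Game 7 case.

```javascript
// Assumption: the last digit of GAME_ID is the game number within the series,
// which is enough for a best-of-seven series (games 1 through 7).
function gameLabel(gameId) {
  const gameNumber = Number(gameId) % 10;
  return `Game ${gameNumber}`;
}

// 41700101 comes from the text above; 41700107 is a made-up ID ending in 7.
const labels = [41700101, 41700107].map(gameLabel);
```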
Step 4: Code
Now, I’m not going to provide a step-by-step walkthrough of the code, but I did want to take some time to go through some of the methods and examples I used to make specific components of my visualization. For a full rundown of the code, check out my GitHub repo.
Let’s start with the playoff bracket. I knew that in order to make the bracket I would have to use some form of d3.hierarchy, but I first had to create a JSON file that had the correct parent/child nodes. I decided not to use d3.stratify to build that structure because my data was small enough that I could just brute force it and write the JSON file myself. I made the bracket using d3.hierarchy/tree map. The next challenge was spacing each node and then adding in the links between the nodes. Here is what it ended up looking like.
The next, and I think most difficult, part of this task was creating the basketball court diagram and then mapping the x and y location of each shot onto it. I originally had no idea how to make this. I knew that I had to create some circles, arcs, rectangles, etc., but I had no idea how to position each of those shapes or what range my scales needed so that each shot mapped correctly. Luckily, I found this great example done by Youth Bread on blocks. I used their scales and layout to map each shot correctly. Here is how it turned out.
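For a sense of what the parent/child structure looks like, here is a minimal sketch that generates a bracket-shaped object instead of typing it by hand. The node names are placeholders rather than the real 2018 matchups, and `buildBracket` and `countNodes` are hypothetical helpers, not part of the project or of D3.

```javascript
// Build a nested parent/child object for a single-elimination bracket.
// Names are placeholders ("Round 4, Series 1" stands in for the Finals), not real matchups.
function buildBracket(round, index) {
  const node = { name: `Round ${round}, Series ${index}` };
  if (round > 1) {
    // Each series is fed by two series from the previous round.
    node.children = [
      buildBracket(round - 1, index * 2 - 1),
      buildBracket(round - 1, index * 2),
    ];
  }
  return node;
}

// Count every node in the tree; each node is one playoff series.
function countNodes(node) {
  return 1 + (node.children || []).reduce((sum, child) => sum + countNodes(child), 0);
}

const bracket = buildBracket(4, 1);            // 4 rounds: first round through the Finals
const json = JSON.stringify(bracket, null, 2); // the text that would be saved as the JSON file
const totalSeries = countNodes(bracket);       // 8 + 4 + 2 + 1 series
```

An object shaped like `bracket` can be passed straight to `d3.hierarchy`, which by default looks for a `children` array on each node.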
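The core of the mapping is a pair of linear scales from shot coordinates to pixels. The sketch below re-implements that idea in plain JavaScript so it runs without D3. The coordinate ranges, the SVG size, and the `LOC_X`/`LOC_Y` field names are assumptions for illustration, not the values from the referenced example, so check them against your own data.

```javascript
// Plain-JavaScript stand-in for a linear scale (the idea behind d3.scaleLinear).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Assumptions: shot x runs from -250 to 250 and y from -50 to 420, with the hoop near (0, 0),
// and the court is drawn in a 500 x 470 pixel SVG.
const COURT = { width: 500, height: 470 };
const xScale = linearScale([-250, 250], [0, COURT.width]);
const yScale = linearScale([-50, 420], [0, COURT.height]);

// Convert a shot row into circle attributes (field names LOC_X / LOC_Y are assumed).
function shotToCircle(shot) {
  return { cx: xScale(shot.LOC_X), cy: yScale(shot.LOC_Y), r: 3 };
}

// A shot taken right at the hoop lands in the horizontal middle of the court, near the baseline.
const atHoop = shotToCircle({ LOC_X: 0, LOC_Y: 0 });
```

Keeping the domain and range in one place means the court shapes and the shot circles can share the same two scales, so they stay aligned if the drawing is resized.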
The last part of the vis was a simple bar graph. This bar graph visualized the number of made/missed shots from each area on the court. As a user updates each filter, the bar graph and its associated scales update accordingly. I used D3's simple enter/update/exit methodology to update and animate values based on the associated filter selections. Here is how it turned out.
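Before any bars can be drawn, the filtered shots have to be rolled up into made/missed counts per area. Here is a minimal sketch of that step in plain JavaScript. The field names `SHOT_ZONE_BASIC` and `SHOT_MADE_FLAG` are assumptions about the dataset, `countByZone` is a hypothetical helper, and the rows are made up.

```javascript
// Count made and missed shots per court area.
// Assumed field names: SHOT_ZONE_BASIC (area) and SHOT_MADE_FLAG (1 = made, 0 = missed).
function countByZone(shots) {
  const counts = new Map();
  for (const shot of shots) {
    const zone = shot.SHOT_ZONE_BASIC;
    if (!counts.has(zone)) counts.set(zone, { zone, made: 0, missed: 0 });
    const entry = counts.get(zone);
    if (shot.SHOT_MADE_FLAG === 1) entry.made += 1;
    else entry.missed += 1;
  }
  // One object per area, in the order the areas first appear.
  return [...counts.values()];
}

// Made-up rows for illustration only.
const zoneCounts = countByZone([
  { SHOT_ZONE_BASIC: "Restricted Area", SHOT_MADE_FLAG: 1 },
  { SHOT_ZONE_BASIC: "Restricted Area", SHOT_MADE_FLAG: 0 },
  { SHOT_ZONE_BASIC: "Mid-Range", SHOT_MADE_FLAG: 0 },
]);
```

An array like `zoneCounts` is the kind of data that would be bound to the bars. Recomputing it after every filter change gives the enter/update/exit step a fresh set of values to animate toward.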
Step 5: Get some Feedback
When it came time to present my project, I was nowhere near done. In fact, my vis looked absolutely nothing like what you see above. I was still struggling to get everything on the same page. But I showed my class what I had, and I have to tip my hat to my classmates and professor. With barely anything to look at, they were still able to provide great feedback on my project. I want to highlight two pieces of feedback that I was able to incorporate into the final version of the vis.
The first piece of feedback came from my professor. The version that I had shown was the one with 15 different pages, one for each series. She really encouraged me to try to put everything onto one page, and man, was I happy I listened. It saved me the huge headache of having to deal with different files, multiple paths, images, datasets, etc. Because of that advice I was able to keep the number of files I was working with to around five, including HTML and CSS files. My code also became so much easier to read and debug.
The second piece of feedback that was really valuable for me was on how to compare shot location data across two different teams. If you remember, my original plan was to have all shots show up on a single court diagram. If I had continued with that, I would have had to figure out how to differentiate made from missed shots as well as shots from different teams. So possibly four different colors? Or two colors and two shapes? Nooooo wayyy. Way too confusing. One of my classmates recommended that I add two court diagrams, one for each team. That way everything stayed separate and easy to read.
Step 6: Final Thoughts
When I first started on my D3 journey, I thought that this kind of visualization would be impossible. I remember being so stressed at the start of this project about creating that shot chart. It’s not like your normal bar or line graph. But after carefully taking the time to draw it out on paper, I realized that it was doable. These are just circle, rectangle, and path elements. Sketching, along with carefully reading example code, made it much simpler than I thought it was going to be. Another task that I thought was going to be really difficult was fitting everything onto one page, but that also turned out to be simpler with the help of JavaScript modules.
There is still so much more that I could do with this vis. I would really like to go back and add a shot clock component, showing the user how much time was left on the shot clock when the shot was taken. If there were only 1–2 seconds left on the shot clock, chances are that shot was well guarded. I would also love to go back and update some of the formatting. You’ll see that the titles and playoff brackets don’t necessarily align with each other. The last thing that I would come back and add is a function that adjusts the size of each component when the window changes size.
Overall, I learned a lot of incredibly useful skills with this project.
- Using multiple JavaScript files and importing each into a main file.
- CSS Grids
- Using existing SVG elements (circles, rectangles, paths) to create a new type of visualization
- UX - How does a user walk through this visualization?
- Prioritizing and making design decisions based on time constraints.
- State Object Management
Until the next playoff series…