A few months ago, I created an account on Twitter to post code snippets reproducing some stats I saw in NBA-related posts and articles using the programming language R. All of the data came from the nbastatR package, and the stats I posted were often fairly simple to get to.

However, I recently decided to try working with something a little more complicated: team lineup stats. It's complicated because the R package doesn't provide any lineup data, but it does provide play-by-play data. There, we can find all the substitutions and therefore arrive at the lineup that was on the court for each team at any given moment in the game. Then, we can analyze that data to answer questions about those lineups' effectiveness on defense and offense, shooting and other metrics. This post will be the first in a series of posts showing how to get to that, then how to analyze it.

After installing the nbastatR package and loading the required packages, the first step is to get all the games from the 2019-2020 season. We then filter out the preseason games and the All-Star weekend games (the current_schedule() function does not appear to be working at the moment; however, you can find the output here).

    library(tidyverse)
    library(nbastatR) # devtools::install_github("abresler/nbastatR")
    ...
      mutate(slugTeamHome = ifelse(locationGame == "H", slugTeam, slugOpponent),
             slugTeamAway = ifelse(locationGame == "A", slugTeam, slugOpponent)) %>%
      distinct(idGame, slugTeamHome, slugTeamAway)

We then get the play-by-play data for every game of the season:

    plan(multiprocess)
    ...
    # 2 Marc Gasol Raptors TOR Derrick Favors
    # ... with 461,383 more rows, and 35 more variables: teamNamePlayer2,
    #   slugTeamPlayer2, namePlayer3, teamNamePlayer3,
    #   slugTeamPlayer3, slugTeamLeading, idGame,
    #   numberEventActionType, numberPeriod, idPersonType1,
    #   idPlayerNBA1, idTeamPlayer1, idPersonType2,
    #   idPlayerNBA2, idTeamPlayer2, idPersonType3.

We can see, however, that the data is not perfect. For example, there are some duplicate plays:

    play_logs_all %>%
      select(idGame, numberEvent, numberPeriod, timeQuarter, descriptionPlayHome, descriptionPlayVisitor) %>%
      add_count(idGame, numberEvent, numberPeriod, timeQuarter, descriptionPlayHome, descriptionPlayVisitor) %>%
      ...
    #    idGame numberEvent numberPeriod timeQuarter descriptionPlay~ descriptionPlay~
    ...
    # 10 2.19e7         341            3 10:41       MISS Brooks 11'~ Gobert BLOCK (3~

And there are some event numbers that are out of order:

    play_logs_all %>%
      ...
    #  3 2.19e7          77            1 4:57        Williams BLOCK ~ MISS Sabonis 2'~

Consequently, we need to do some data cleaning.

Besides the problems mentioned above, we are going to create our own scoring columns: the number of points scored in the play and the score margin before the play happened. That will help us in the future when analyzing clutch stats (score within 5 or fewer points with 5 or fewer minutes remaining in the 4th quarter or OT). We'll also create a column to find the number of seconds passed in the game based on the quarter and the time remaining in it, to facilitate our work with time played by each player/lineup.

    ... .keep_all = TRUE) %>% # remove duplicate events
      mutate(secsLeftQuarter = (minuteRemainingQuarter * 60) + secondsRemainingQuarter) %>%
      mutate(secsPassedQuarter = ifelse(numberPeriod %in% c(1:4), 720 - secsLeftQuarter, 300 - secsLeftQuarter),
             ...
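To make the time arithmetic in this post concrete, here is a minimal base R sketch of the seconds-passed calculation and the clutch definition (margin within 5 points, 5 minutes or less remaining in the 4th quarter or OT). The helper names `secs_passed_game` and `is_clutch` and the `marginBeforePlay` column are hypothetical and used only for illustration; they are not part of nbastatR. The sketch assumes 12-minute quarters and 5-minute overtime periods, as in the 720/300 constants used in the post.

```r
# Hypothetical helpers for illustration; they are not part of nbastatR.
# Assumes 12-minute (720 s) regulation quarters and 5-minute (300 s) overtimes.

# Seconds elapsed since tip-off, given the period and the clock in that period.
secs_passed_game <- function(numberPeriod, minuteRemainingQuarter, secondsRemainingQuarter) {
  secsLeftQuarter <- minuteRemainingQuarter * 60 + secondsRemainingQuarter
  periodLength <- ifelse(numberPeriod %in% 1:4, 720, 300)
  secsPassedQuarter <- periodLength - secsLeftQuarter
  # Seconds played in all periods completed before the current one
  secsBeforePeriod <- ifelse(numberPeriod %in% 1:4,
                             (numberPeriod - 1) * 720,
                             4 * 720 + (numberPeriod - 5) * 300)
  secsBeforePeriod + secsPassedQuarter
}

# Clutch: 4th quarter or OT, 5 minutes or less left, margin within 5 points.
is_clutch <- function(numberPeriod, minuteRemainingQuarter, secondsRemainingQuarter,
                      marginBeforePlay) {
  secsLeftQuarter <- minuteRemainingQuarter * 60 + secondsRemainingQuarter
  numberPeriod >= 4 & secsLeftQuarter <= 300 & abs(marginBeforePlay) <= 5
}

# Toy plays (made-up values, not taken from the real play-by-play data)
plays <- data.frame(
  numberPeriod            = c(1, 4, 4, 5),
  minuteRemainingQuarter  = c(4, 5, 2, 2),
  secondsRemainingQuarter = c(57, 0, 0, 30),
  marginBeforePlay        = c(-2, 3, 9, -5)
)

plays$secsPassedGame <- with(plays, secs_passed_game(
  numberPeriod, minuteRemainingQuarter, secondsRemainingQuarter))
plays$isClutch <- with(plays, is_clutch(
  numberPeriod, minuteRemainingQuarter, secondsRemainingQuarter, marginBeforePlay))

print(plays)
```

Both helpers are vectorized, so they could also be called inside a `mutate()` step on the play-by-play data frame instead of being assigned with `$`.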