The post The Brandan Wright Trade? appeared first on Nylon Calculus.

Thursday the Boston Celtics traded Rajon Rondo to the Dallas Mavericks for Brandan Wright, Jae Crowder, Jameer Nelson, a second-round pick, and a complicated protected first-round pick that will most likely be conveyed in the 2016 draft. The way things work in the NBA is that a trade gets remembered as the trade for the biggest star involved, with stardom measured in All-Star appearances, salary, or simply the ratio of players and assets given up to get him. So, in any sane, conventional sense this will be known as the ‘Rondo Trade.’

**Advanced Math: .76 > .42**

If, however, we take a less sane perspective and look at who has been the most productive player this year, Wright has been the most productive player in the trade by a number of metrics, in which case this would be known as the ‘Brandan Wright Trade.’ To my mind the best practice is neither to ignore single-number player metrics nor to rely on them completely, but to use them as a starting point, a sort of Bayesian prior, when looking at players or trades.

Pretty clearly every metric rates Wright as the most productive player so far this year, or last year for that matter^{1}:

The Player Tracking Plus-Minus (PT-PM) beta model I created agrees too. Below is a chart with the elements that make up the metric and each player’s overall score.^{2}

Here is where I think things get interesting, along with the more granular breakdowns from the table via Basketball Reference. In the chart one can actually see why Rondo (second bar from the right) is such a polarizing player. He has performed as the best passer in the league by far, but is well below average in retaining offensive possessions, scoring, rim protection, and team defense effect.

Wright, for his part, is credited with incredibly efficient scoring (albeit in low volume) and the ability to extend offensive possessions, end defensive possessions, and protect the basket. On the other hand, his passing numbers and his team defense effect on a below-average Dallas defense are rated below average.

That, however, isn’t the whole story.

For example, a number of studies, including a couple I have done, as well as common sense^{3}, indicate that the minutes NBA players play convey some information about their abilities not captured by box score data. In that case, we need to take note of the fact that Wright has never been a starter or averaged over 19 minutes a game, which is his approximate average with Dallas this year. So it is worth noting that numbers like total Win Shares and Value Over Replacement Player (VORP) indicate the gap between Rondo’s production this year and Wright’s is smaller than the per-48-minutes figures suggest. However, my own study indicated that the informational value of time on the court may be log-linear rather than linear; in other words, there is more information in increasing a player’s time from a total of 100 minutes over a season to 600 than from 1,100 to 1,600.

There are also interesting questions about whether basically linear player models can capture the net impact of someone as singularly talented at one aspect of the game as Rondo is at passing. That said, the on/off numbers in terms of offensive efficiency for the Celtics do not indicate any significant boost with Rondo on the court beyond what the player metrics suggest.

**There Will Be Regression**

Simple statistical knowledge tells us to expect regression toward the mean for such significant outliers as Rondo and Wright in terms of scoring efficiency, one of the less stable (though very important!) statistics in basketball. This is by far Rondo’s least efficient scoring season of his career, though, admittedly, the trend has been down for a few years. It is also Wright’s most efficient scoring season so far, another ‘regression’ flag.

Further, my own work on win projection models, as well as other studies, has shown that the correlation between a player’s box score statistics and efficiency is lower when they change teams, an indication that fit matters. My own projection system applies a small increase in mean regression for players who changed teams in the offseason. It should be noted that Dallas and the San Antonio Spurs are two of the teams that seem to best utilize fit, and that players who are less efficient and play less are more likely to change teams; stars get longer contracts and are traded less often.

**Discontinuity, Salary, Time Horizon, and Fit**

As usual, Zach Lowe does a very good job of laying out the pros and cons in terms of roster fit with Rondo on the Mavericks, which is suggested reading, so I won’t re-cover that ground.

With Wright there are reasons to adjust our expectations for value added as well. The first is that Wright has a time horizon mismatch with the Celtics similar to Rondo’s: he will be a free agent at the end of a year in which the Celtics do not figure to be anywhere near contention. For that reason, even if Wright plays as productively as he did in Dallas, he may have more value to the Celtics in a trade than as a player likely to walk away at the end of the season. It is really not sane to judge a trade by a player who stayed with the team for sixty days.

In terms of fit, Wright instantly becomes the best rim protector on the team^{4}. His ability to finish the pick-and-roll would have worked nicely with Rondo, but will still be useful, especially alongside the Celtics’ stretchy young bigs Kelly Olynyk and Jared Sullinger. On the other hand, Wright’s lack of scoring versatility and range, as shown in the shot chart below, will limit his cumulative effect on the Celtics’ offense and the ability to pair him with similar non-shooting bigs, as it did in Dallas.

On the Mavs’ side of the ledger the time horizon is roughly now, or at least while Dirk Nowitzki remains a productive player. There are reports that the possibility of an ‘extension’ with Rondo was broached, which is important given that the Mavs are unlikely to have significant cap space this offseason, but each side has an out if either the fit or Rondo’s play doesn’t work out.

Ultimately, I think, the degree of discontinuity between Rondo’s peak performance and his post-knee-surgery play will determine the success of the trade for the Mavericks. And only an unforeseen alignment of time horizon and fit with Wright in Boston could genuinely make this the ‘Brandan Wright Trade,’ even if he has been the more productive player over the last two years.

- Via Basketball Reference ↩
- Each element is weighted by the coefficients in the model. Scoring weights points minus field goal and free throw attempts, plus catch-and-shoot ratio; Passing weights points created by assists and passing efficiency; Offensive Possessions weights contested and uncontested offensive rebounds and turnovers; Rim Protection weights opponents’ FG% at the rim, number of attempts defended, and fouls; Team Defense weights the defensive efficiency rating when the player is on the floor ↩
- Using the assumption that coaches are neither irrational nor without basketball knowledge ↩
- Not a high bar ↩


The post Freelance Friday: An Obituary for Small Sample Size Theater appeared first on Nylon Calculus.

*Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Michael Murray. Michael is a college kid with dreams of the association while he should be doing his homework. You can follow him on Twitter, @michaelmurrays.*

At the time of writing, most teams have played 18 games or more, and the closing credits of the early NBA season’s Small Sample Size Theater seem to be starting to roll. Knicks fans are losing patience with the Triangle, Cavaliers fans are joining in an ever louder chorus of “SEE!?”, and Lakers fans are staring at the horizon thinking about the fragility of human life. For some, the fate of their team seems to have solidified for the season, while for others their team seems to be just getting started.

We know teams “regress” to the mean, but how soon? To answer this I calculate cumulative moving averages, the average after each game is played, for three efficiency measures (TS%, 3pt%, and FT%) for all teams last season. Teams that have large changes in performance (think Westbrook and Durant coming back from injury) will take longer to converge to the true mean and very consistent teams will take less time. Since these cumulative moving averages will eventually converge to the final year average they can give us a feel for how long it takes before we can be reasonably confident the current average is close to what will be the year average.
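The cumulative moving average described here takes only a few lines to compute; below is a minimal sketch (the per-game TS% values are invented for illustration):

```python
def cumulative_moving_average(values):
    """Return the running average after each game is played."""
    averages = []
    total = 0.0
    for games_played, value in enumerate(values, start=1):
        total += value
        averages.append(total / games_played)
    return averages

# Hypothetical per-game TS% figures for one team
ts_pct = [0.52, 0.48, 0.55, 0.50, 0.53]
print(cumulative_moving_average(ts_pct))
```

Each entry is the team's average through that many games, so the last entry is simply the season average to date.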

The table below shows how far the averages at five points in the 13/14 season were from the final season average. Three-point shooting is a high-variance event, which explains why it is generally farther from the year-end mean than the other metrics. By the 75th game, and earlier for some teams, there is less than a 1% difference between the cumulative average and the year-end average. It’s not groundbreaking news that the 75-game average is robust; at this point in the season there is more eulogy than prophecy for teams.

| Games Played | 15 | 30 | 45 | 60 | 75 |
| --- | --- | --- | --- | --- | --- |
| TS% | 0.026854 | 0.019434 | 0.011867 | 0.007658 | 0.003153 |
| 3pt% | 0.064709 | 0.036409 | 0.022018 | 0.015084 | 0.007981 |
| FT% | 0.029585 | 0.015261 | 0.010019 | 0.007744 | 0.003305 |

Breaking these out by metric and visualizing them gives better insight into when the variance in the averages levels out. Looking first at TS%, among eight teams for clarity’s sake:

The general trends reflect obvious stuff. Some teams start higher, some start lower, and they all end up in about the middle. In general, once some of the small sample size variance is ironed out, teams get better: systems get refined, rookies and trades get integrated. Even though it still looks like a high-variance mess, early-season (games played = 15-20) TS% correlates with end-of-season TS% at between 0.79 and 0.82. Increase that to 30 games played and the correlation jumps to 0.88. The correlation between cumulative average and year-end average looks like this:

Big jumps in correlation come early. Three times the correlation jumps by more than 0.05: after the 2nd, 3rd, and 11th games. The correlation crosses the 0.90 threshold after the 31st game and the 0.95 mark after the 41st. Beyond that it’s a tale of diminishing marginal returns.
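A correlation curve of this kind can be reproduced with a short sketch, given game-by-game values for each team (the function names and the toy input structure here are my own, not from the original analysis):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_curve(per_game):
    """per_game: one list of game-by-game values per team.
    For each game count g, return the correlation across teams between
    the cumulative average through game g and the final season average."""
    finals = [sum(team) / len(team) for team in per_game]
    n_games = len(per_game[0])
    curve = []
    for g in range(1, n_games + 1):
        cumulative = [sum(team[:g]) / g for team in per_game]
        curve.append(pearson(cumulative, finals))
    return curve
```

By construction the curve ends at exactly 1.0, since the full-season cumulative average is the season average.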

To see if these patterns are consistent I wanted to look at other measures of efficiency. Here’s FT%, which is a component of TS%:

This doesn’t look too different from the TS% graph: Small Sample Size Theatre on the left with lots of variance, a pretty snappy convergence to the mean, and minor fluctuations otherwise. Because the y-axes of the two graphs are on different scales, it appears that team FT% falls closer to the league average. However, there is a 14.4% difference between the league’s lowest TS% (76ers) and the highest (Heat), while there is an 18% difference in FT% between the Pistons and Trail Blazers.

Everyone knows the three-pointer is a high-variability shot—live by the three, die by the three. But 3pt% regresses to the mean just like any other metric. You can see that high variability visualized in the first 20 or so games, as 3pt% fluctuates and resists the mean slightly longer than TS% and FT%. Other than that there does not seem to be much new information here, so I want to look at a correlation curve for the season average.

Now this is different: there is an obvious difference between this correlation curve and the one for TS%. The grain of salt to take with anything said about this curve is the increase in the number of threes being attempted. It is difficult to say whether the shape of this curve is caused by the increase in threes from players who are not comfortable with the shot or by the natural variance of the three. There is a big slump in the 6th-11th games, which is probably just noise; if I expanded the data beyond one season it would probably disappear. The correlation of cumulative 3pt% to season average passes the 0.90 mark after the 44th game of last season and passes 0.95 after the 58th.

What cumulative averages for 3pt% show more than anything is that we are still a ways from having a robust sample size from which to make sweeping generalizations. These efficiency metrics are baked into a lot of the big, one-stat-like metrics we use, such as ORtg and DRtg. At the same time, you don’t really need to wait until current stats are nearly perfectly correlated with what their season average will be. The samples we have now for teams definitely are not “small,” but we’re not out of the woods yet.


The post Freelance Friday: Getting to the X’s and O’s — A Blueprint for the Use of Tracking Data appeared first on Nylon Calculus.

*Freelance Friday is a project that lets us share our platform with the multitude of talented writers and basketball analysts who aren’t part of our regular staff of contributors. As part of that series we’re proud to present this guest post from Johannes Becker. Johannes is interested in basketball and statistics and is a PhD student in bioinformatics. You can follow him on Twitter @SportsTribution and on his blog, SportsTribution.blogspot.ch.*

I’ve been writing this article for the last six months. It’s one of those things where I go to bed and the writing only happens in my head. I have been troubled about the ways in which player tracking data is being used in analytics. There are specific reasons why this has troubled me, and specific ways I think it can be remedied.

I will use basketball in the following, but most of the why as well as the how is freely translatable to all other team sports.

**Data analysis and Some Thoughts on the State of the Art**

There is a lot of stuff going on right now that is great — completely new analysis of point guard play, Austin Clement’s shot charts are mind-blowing, and the APBR community is very lively. But there is also a lot of basketball analysis that leaves me shrugging my shoulders. This is usually not the fault of the person doing the analysis. I often find my own stuff not that telling. The problem for a lot of people who do this for fun is available data and time.

If you are sitting safely, take three minutes and watch this ‘Open Court’ clip. Even though it is cringe-worthy (nobody has ever used PIE!), there are still two important messages. The first is Charles Barkley saying (paraphrasing — it hurts too much to watch it again), “I guess the person that invented PIE could not draw up a play that gets me open for the final shot.” The second is Reggie Miller, who says that he wants a scout telling him information (like player tendencies) that none of these statistics can give him.

I would imagine that they are right. For a player, qualitative information (LaMarcus Aldridge likes to take one dribble to his right before shooting a turnaround) is more important than quantitative information (LMA shot x% this year on pull-up shots). But I think both points of critique can actually be tackled by statistics, just not by the approaches currently discussed in public.

One of the best sources to learn something about the NBA right now is @j_069’s YouTube channel. The combination of set play sequences (I love the Warriors sliding doors play for the name alone) and great break-beats can tell you more about basketball than any regression model. It would be great if those videos were automatically quantifiable.

To put it differently: statistics right now are great at explaining that corner threes are one of the best shots—high percentage combined with three points. But nobody ever mentions that the geometry of the court might make it complicated to get the ball there. People make fun of Rudy Gay and his inefficiency, but ignore that he does not necessarily put himself in that position (okay, maybe he does); rather, the coach/team allows/forces him to do what he does.

If you think of basketball plays, you have to grade them in terms of efficiency and complexity. An isolation play is generally simple. A pick-and-roll is maybe a bit more complicated, but often more efficient. This Spurs play is maybe as complicated as it gets, but it seems to net you a good corner-three opportunity. Even isolations and basic pick-and-rolls are not simply comparable, in both expected efficiency and complexity. Good luck running an isolation play with three teammates on the floor who can’t shoot from outside. And often a Spurs pick-and-roll is simply disguising some other play.

In my opinion, there is too much focus on player evaluation and not enough focus on basketball evaluation. Even though stats like Adjusted Plus-Minus spend a lot of effort trying to eliminate this aspect, the worth of a player will always be highly context-specific. However, a lot of players are more interchangeable than we think. A recent article here at Nylon Calculus estimated that it takes on average around 700 shots to reach 50% certainty that one player is a better three-point shooter than another. I may be describing that result imprecisely, but in my opinion it means that most NBA wings will become pretty decent three-point shooters if you just give them enough free corner threes (see: Green, Danny; Ariza, Trevor).

Data analysis always means information reduction. But our data space is of high dimension: (number of players in the NBA) x (number of 5-man permutations on a team) x (the 2 spatial dimensions of the court) x (time). For reasons I mentioned earlier, a lot of studies focus on the first part (players in the league); adjusted stats try to account for the 5-man rotations, and x-player rotation data focuses precisely on this part. Those studies can either survive without tracking data or use preprocessed tracking data like the public SportVU data at NBA.com. But all of these studies ignore time. In general, time seems to be the ugly duckling of sports analytics. Some studies highlight the influence of the shot clock, or look at the relationship between playing time and efficiency. But no publicly available study addresses the fact that it takes time for a play to develop, to bend the defense according to your wishes.

What I would like analytics to be used for is a series of questions that range from “What makes an isolation play efficient (besides the quality of the one-on-one guys)?” to “How often does Golden State use the sliding doors play? What is its expected outcome in points? Which way is the most effective to defend it?”

This, in my opinion, should be the beginning of the analysis. At the end we can start to ask questions like “Which players are good personnel for a specific play?”, the question most analysis tries to answer at the moment while ignoring the X’s and O’s. The approach I will lay out below will not yield a lot of direct results during the first months, and I am aware that a lot of people (myself included) simply lack the resources. But in the end it will provide answers to more useful questions than “Is Kobe correctly ranked at number 25?”. It will answer questions about the way the game is played.

**A Different Way to Approach Tracking Data**

In my opinion, the most common approaches to tracking data right now have a limited horizon. The reason is that they must broadly ignore spatial and temporal interactions, which are what make team sports so fascinating. These approaches can discover things, but come with a bunch of caveats. Seth Partnow described it in a recent article, which I would summarize as ‘We know what a good outcome for a team is, but we have no idea how to get there.’

In the following, I want to describe an approach that uses tracking data directly to get at the how’s, what’s, and which’s, like:

- How are plays designed that give you a high percentage shot?
- What plays are used effectively and how often by which team?
- Which plays are effective for shortened shot clocks?

It is going to be far from a finished product. Instead, I will highlight techniques and possible pitfalls to give a broad blueprint, the “Becker Blueprint” [note: as I am not expecting to ever earn money with this, I will at least try to become famous]. Once again, I am fully aware that most bloggers and journalists do not have the data and time available to use my proposals. However, I think approaches like this one will be part of the next big step in sports data analysis. I am also aware that the following can sometimes be confusing (my head is not the most sorted space in the world). If you have a question, feel free to ask and I will try to explain my brain.

**Cluster analysis of raw tracking data**

The general idea of the “Becker Blueprint” is the following: instead of labeling each play individually—”pick-and-roll” or “isolation” or “something fancy the Spurs are doing”—all plays are compared for similarity. Each pair of plays gets a distance value, which is small for plays whose tracking data is similar and large for plays that look completely different. This leads to agglomerations of similar plays, which can then be detected by cluster analysis. A user can look at each of these clusters and give it a fitting, detailed name. This is advantageous because it doesn’t require manual annotation of each play. Even more importantly, it can be much more precise in its grouping of plays. For example, a Spurs pick-and-roll is not simply a pick-and-roll; it is often preparation for something that happens three seconds later.

This approach requires consideration of several aspects.

**Data storage**

To start the blueprint, it is indispensable to have access to both the tracking data and the raw video footage. Raw video footage seems to me sometimes undervalued in analysis, but I implore every analytics person who wants to make meaningful assumptions to at least watch some portion of their data on video.

Raw video footage is, of course, a storage obstacle in some regards, but I would guess that those of us who have access to a large amount of tracking data can figure those obstacles out as well. In general, I would say every play needs a unique ID that links raw tracking data to raw video footage and available box score data^{1}. The box score part is important because it is human-readable. You cannot watch every play, but you need to know how every play ended (turnover, rebound, corner three, etc.) for further analysis. To avoid bottlenecks, it is important that the raw tracking data is stored in a way that is quickly accessible for further processing.

**Distance metric**

Finding a good distance metric is probably the most novel and tricky step of the whole blueprint. You want a metric under which related plays are close in distance and unrelated plays are far apart. I know this sounds very obvious, but the problem is that in reality “related” can be a little bit subjective. An example:

- Top of the key pick-and-roll, while the other three players are positioned around the three-point line. The ball handler drives. Help defense forces the ball handler to kick out the ball for a corner three.
- Top of the key pick-and-roll, while the other three players are positioned around the three-point line. The ball handler drives. No help defense and the ball handler gets a layup.
- Top of the key pick-and-roll, two players are positioned around the three-point line and one is on the weak side low post. The ball handler drives. Help defense forces the ball handler to kick out the ball for a corner three.

In my opinion, you want #1 and #2 to be more closely related than #1 and #3, even though the outcomes of #1 and #3 are more similar. As I said, there is not one obvious answer, and the whole thing will need several iterations until the user is convinced that the metric represents a good distance between plays.

My approach would be to start with a low number of manually selected plays that are well separated: different set plays, typical pick-and-roll and isolation situations, fast breaks, short-shot-clock situations, etc. Take maybe five to ten plays for each set, leading to around 100 plays. Add the same number of random plays to see what happens with data that has no user bias.

I would define the distance between play A and play B, vaguely, as the minimal distance between the ball and the offensive players of play A and play B, integrated over time. It might make sense to include the defense, but I would estimate that it adds a lot of noise and slows the computation down, so it might be advantageous to look at defense separately. The task is basically to find the pairing of players A1-5 and B1-5 under which the plays look the most similar. As an example, if LeBron is the screen setter in play A and the ball handler in play B, a minimal distance would not link the two LeBrons. You basically have to minimize over 120 different player combinations (luckily there is only one ball). I would weight the distance between two players by their distance from the ball: players far away from the ball should have a lower influence on the similarity score. Just like the defense, those players will be evaluated later on.
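The brute-force search over the 5! = 120 player pairings can be sketched as follows. This is a minimal illustration under assumed inputs (trajectories sampled at the same time steps); the ball-distance weighting described above is omitted for brevity, and all names are hypothetical:

```python
import itertools
import math

def pairing_distance(play_a, play_b):
    """Search all 120 ways of matching the five offensive players of
    play A to those of play B, returning the pairing with the smallest
    total trajectory distance. Each play is a list of five trajectories;
    a trajectory is a list of (x, y) positions at shared time steps."""
    def traj_dist(ta, tb):
        # Summed Euclidean distance between two trajectories over time
        return sum(math.hypot(ax - bx, ay - by)
                   for (ax, ay), (bx, by) in zip(ta, tb))

    best = float("inf")
    for perm in itertools.permutations(range(5)):
        d = sum(traj_dist(play_a[i], play_b[perm[i]]) for i in range(5))
        best = min(best, d)
    return best
```

Because the search minimizes over all pairings, the same five trajectories in a different player order still yield a distance of zero, which is exactly the role-agnostic matching described above.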

More complicated is the previously mentioned aspect of minimizing the distance between plays that end differently. Another example is plays where, somewhere along the way, a ball handler stops the ball, and this stop can take different amounts of time. My idea to solve this problem is based upon something used in biology called sequence alignment. My vague idea would be to find the minimal distance between the last eight seconds of two plays while allowing for gaps, which are then penalized by a gap penalty. So, if the last eight seconds of two plays are almost identical, the pair would get a direct score. But if one play results in a layup and one in a kickout for a three-pointer—therefore taking a second longer—the plays would still be well aligned, with an additional gap penalty.
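A gap-penalized alignment in this spirit can be sketched with Needleman-Wunsch-style dynamic programming. This is only an illustration of the idea: the choice of Euclidean distance as the frame-matching cost and the flat gap penalty are my assumptions, not details from the post.

```python
import math

def align_trajectories(a, b, gap_penalty=2.0):
    """Global alignment of two position sequences a and b (lists of
    (x, y) ball positions). Matching two frames costs their Euclidean
    distance; skipping a frame (a 'gap') costs gap_penalty.
    Returns the minimal total alignment cost."""
    n, m = len(a), len(b)
    # dp[i][j]: minimal cost of aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = math.hypot(a[i - 1][0] - b[j - 1][0],
                               a[i - 1][1] - b[j - 1][1])
            dp[i][j] = min(dp[i - 1][j - 1] + match,     # match frames
                           dp[i - 1][j] + gap_penalty,   # gap in b
                           dp[i][j - 1] + gap_penalty)   # gap in a
    return dp[n][m]
```

Two near-identical plays where one takes a second longer align well, as described above: the shared portion matches cheaply and only the extra frames incur the gap penalty.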

A big part of finding the minimal distance will come down to computational running time. As mentioned previously, there are 120 combinations of players. The gaps lead to a very high number of theoretically possible combinations. And the underlying distance function will have several local minima, making it necessary to look at several possible combinations. It could be sufficient to first minimize the distance between the ball positions in play A and play B and then figure out the minimal combination of players.

**Cluster Analysis**

Once the distance metric is somewhat satisfying for the selected subset of plays, it is time to start looking at bigger chunks of data. A team has around 100 offensive plays per game and 82 games per season, so we are at close to 10,000 plays per team and 300,000 plays for the whole NBA. If we measured the distance between N = 10,000 plays, we would need to calculate around 50 million combinations^{2}. If the distance metric calculates one play comparison per second, we are talking about a range of years.
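The back-of-the-envelope numbers here are easy to verify:

```python
n = 10_000                         # plays in one team-season
pairs = n * (n - 1) // 2           # unordered pairs of plays to compare
print(pairs)                       # 49995000, i.e. ~50 million

seconds_per_year = 365 * 24 * 60 * 60
print(round(pairs / seconds_per_year, 2))  # ~1.59 years at 1 comparison/sec
```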

One way to avoid this would be to recursively build a ‘comparison set’ of fewer than 500 plays. This set would try to span the whole space of common plays with as few plays as possible. In addition, the user can implement ‘break functions’: if play A is compared with the comparison set and found to be similar to one of the first plays in the set, we can automatically exclude other plays, for which we then simply insert a maximum distance. For example, if I find a play to be similar to a side pick-and-roll, I can be sure it is not similar to a post-up play.

As with the data storage, I guess the people that have access to the complete player tracking data set, also have access to some pretty powerful computer clusters, so I’m sure they can figure these things out.

Whatever you do, you should end up with a distance vector for each play, with a length of either N (if you compare all plays with each other) or the size of your comparison set. You can now use this vector for cluster analysis. There are two main ways to do clustering: supervised and unsupervised. In our case, we would have an unsupervised or semi-supervised approach, as we want to remain open-minded about the possible outcomes. I use the term “semi-supervised” because, if we use a comparison set, we can at least partially control the outcome. But we always have to keep in mind that we do not know everything about basketball and should keep our results open for surprises.

These surprises are basically the advantage and disadvantage of unsupervised clustering. You can imagine supervised clustering (or classification) as a fixed set of magnets that forces your plays into specific groups. Unsupervised clustering instead tries to find already existing density clouds. This sounds easy, but the problem is that your resulting clouds are heavily dependent on the techniques you use.

In general, I have two opinions on what could work quite well. First, you should try to avoid anything like k-means clustering, as you have no idea how many clusters to expect; you will just get into a quagmire of statistical model criteria like AIC and BIC, which you strongly want to avoid^{3}. My weapon of choice at the moment is hierarchical clustering, with which I have a very personal love-hate relationship. The cool thing is that you can understandably represent a high-dimensional problem in one figure. In the case of a comparison set, where you cluster an (N plays) x (M comparisons) matrix, you will find subsets in the M dimensions to which you can directly attach names (fast breaks or simple pick-and-rolls, for example). You can relatively easily detect possible cutoffs for the number of clusters. The biggest problem with hierarchical clustering is that the end result can be highly dependent on the distance metric and linkage rules. It can easily happen that you slightly change some of your clustering rules and the whole thing looks different [note: on the other hand, this probably holds true for all unsupervised clustering algorithms].
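For illustration, here is a naive pure-Python single-linkage agglomerative clustering over a precomputed play-distance matrix. This is only a sketch of the idea; real hierarchical clustering libraries are far more efficient and offer other linkage rules (complete, average, Ward):

```python
def single_linkage(dist, n_clusters):
    """Naive single-linkage agglomerative clustering.
    dist: symmetric matrix of pairwise play distances (list of lists).
    Repeatedly merges the two closest clusters (closest = smallest
    distance between any pair of their members) until n_clusters remain.
    Returns a list of clusters, each a list of play indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = (float("inf"), 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]   # merge the two closest clusters
        del clusters[j]
    return clusters
```

Stopping at different values of `n_clusters` corresponds to cutting the dendrogram at different heights, which is where the cluster-count cutoffs mentioned above come from.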

This brings us to my second piece of advice: you now have to find a second distance metric for your unsupervised clustering. In theory this could be a standard approach like Euclidean distance. But in my opinion, the distance metric of your clustering approach should not treat all of the M dimensions the same. Instead, you should focus on those dimensions for which at least one of the two plays is similar. For example, if you want to figure out the distance between play A and play B (two similar set plays), it does not matter whether play A has a high distance to a fast-break comparison and play B a very high one. So every dimension m_{i} should be weighted by something like 1/(1+min(distance(A,m_{i}),distance(B,m_{i}))).

**Combination of cluster analysis and feature extraction**

Okay, in theory we are now at the position where we can say things like “The Warriors use the Elevator Doors set on 9% of plays.” This is the moment when we can come full circle to things that are already done nowadays, using feature extraction. Feature extraction is basically a way to give the raw data context. For example, whether the play resulted in points or a turnover are very obvious features. Others can be more complex, ranging from a more precise description of the resulting shot (an uncontested catch-and-shoot corner three, for example) to concepts like gravity (the area over which the defense spreads) or even information about the style of defense (hedging after a pick-and-roll, for example). The defensive stats especially will be interesting, as play outcomes are definitely a result of defensive styles (if you ICE the pick-and-roll, it will result in different outcomes than if the defender hedges).

As you can see, at this point the world is your oyster and you can go back to analysis styles that are more common right now.

**Conclusion**

I hope that this was more enlightening than confusing, and that someone picks this stuff up and, hopefully, turns it into a publicly accessible thing one day. It’s possible that teams are already doing similar analysis behind closed doors; we just don’t know about it. As I said, this is easily translatable to other sports (compare distances of player positions after a snap for American football, or the last eight seconds before a change of possession in hockey or soccer).

The post Freelance Friday: Getting to the X’s and O’s — A Blueprint for the Use of Tracking Data appeared first on Nylon Calculus.

]]>The post Fascinating Aspect of the Rajon Rondo Trade appeared first on Nylon Calculus.

]]>How will the Rajon Rondo trade affect Dallas’ offense? The short answer is that we have very little real idea. On one hand, his own scoring efficiency is laughably bad so far this season, with a true shooting percentage of 0.422. On the other side of the ledger, Rondo’s offensive value has never been in his scoring. With Boston’s contending teams, his usage rate was always below average. His role was far more to set up the Celtics’ more efficient scorers, like the big three, or to allow players completely dependent on being set up for good shots (such as Brandon Bass or even James Posey) to get those looks.

Now, in Dallas without the burden of being “the man” any longer, perhaps that is a role to which he’ll be returned. Yet it’s a role that, from an analytics standpoint, we know very little about.

Valuing playmaking is controversial, in part because there isn’t really an agreed upon description of the word. Certainly, the “assist” is an imperfect and at times even corrupt unit of measurement. That said, we do have some more insight on both the value of playmaking at the team level and a better way to describe individual playmaking through a few years of SportVU data.

On the team level, assisted shots are significantly more valuable than unassisted shots. Last season, the difference was about .3 points per shot, league-wide. How much of the credit for this goes to the passer versus the shooter versus the overall offensive system? That is very much an open question.

Similarly, with the SportVU data we can look at individual players’ share of this playmaking. Part of my True Usage statistic is tracking what I’ve termed “Assist Usage,” or the proportion of plays where a guy makes a pass leading to a scoring attempt.
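As a back-of-the-envelope illustration of the idea (the numbers and names here are hypothetical, not the actual True Usage inputs):

```python
def assist_usage(passes_leading_to_shot, total_plays_used):
    # Share of a player's plays that end in a pass setting up a shot attempt
    return passes_leading_to_shot / total_plays_used

# A hypothetical guard whose passes set up 180 shot attempts over 600 plays:
print(f"{assist_usage(180, 600):.1%}")  # 30.0%
```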

So what does this mean for Rondo to the Mavs? It will be a fascinating test — Dallas was roughly league average both in terms of proportion of assisted shot attempts and in their efficiency on those shots at the time of the trade. However, their overall efficiency was so high in large part because of their ability to make unassisted shots. They have nearly a 47% Effective Field Goal Percentage on those attempts^{1}. So despite their whirring, fizzing machine of an offense, playmaking has been less important to their success than might be expected. Still, like every other team in the league, they shoot much better on assisted shots.

Which brings us to Rondo. So far this season, he’s the only player with an Assist Usage north of 30%^{2}.

Aside from J.J. Barea’s spot duty, no Mavericks have a particularly high rate. The departed Jameer Nelson was next at 17.3%, while Monta Ellis, often said to be Dallas’ point guard in all but name, is only setting up 14.4% of teammates’ shots^{3}.

So, will Rondo increase Dallas’s overall “passiness,” and will that be enough to offset the downgrade in floor spacing and shooting from Nelson to Rondo? This will be a fascinating experiment on that front. While there is some understanding of how players’ roles may change and shift in terms of traditional (shooting) usage when new talent is added or subtracted, very little at all is known about how dropping a player of Rondo’s playmaking skill into the mix will alter that aspect of the offense. Part of the difficulty in predicting the effect is that while there is a hard and inviolable limit to a team’s total shooting usage (there is only one ball, and only one player can shoot each trip), the same does not hold for creating assist chances: there is a wide spread between the most (Atlanta at 61.3% of shots potentially assisted) and least (Toronto at 45.7%) prolific teams in terms of creating for others. How far, if at all, Rondo moves the Mavericks towards the upper end of that range will be something to keep an eye on for the rest of the season.

The post Fascinating Aspect of the Rajon Rondo Trade appeared first on Nylon Calculus.

]]>The post The r-squared Podcast: Episode 13 with Neil Paine appeared first on Nylon Calculus.

]]>This episode of The r-squared Podcast features a conversation with Neil Paine, a staff writer at FiveThirtyEight. Neil takes us through his personal journey in sports analytics, some of his analytic and journalistic processes and his thoughts on various plus-minus metrics. We also touch on the Rajon Rondo trade, which was announced about an hour before we recorded, and some of Neil’s preseason projections for the Warriors and the Cavaliers. You can find Neil’s sportswriting and analysis at FiveThirtyEight and follow him on Twitter, @Neil_Paine.

The post The r-squared Podcast: Episode 13 with Neil Paine appeared first on Nylon Calculus.

]]>The post Glossary: Basic Shooting Adjustments appeared first on Nylon Calculus.

]]>One of our missions here at Nylon Calculus is to help make basketball analytics accessible to anyone with interest. A big part of that mission is building a comprehensive and reliable Glossary. The plan is to make this glossary different from some of the others around the internet in two ways. The first is that it will be top-to-bottom comprehensive and sequential. The thing that is often lost in discussions about basketball analytics is that each new statistic or technique is usually an adjustment to a previous model, trying to account for some hole or something that isn’t measured well in existing statistics. All basketball statistics have a lineage of mathematical models and we want to make sure that anyone with the time and interest can peruse the entire family tree. The second thing that I believe will set our glossary apart is that we want to do more than define the statistics. For each we would also like to explain a little about what it says and what it doesn’t say—what purpose it serves in our analytic discussion.

We’re just getting started with our Glossary and there’s a lot more to build. We’ve already looked at The Basic Box Score, Basic Shooting Statistics and Basic Box Score Adjustments. Here we will begin looking at how basic shooting statistics are adjusted to provide additional detail.

Simple field goal percentage is nothing more than a ratio of shots made to shots attempted. It would be a perfect representation of shooting efficiency except that not all shots are worth the same number of points. Three-pointers are made at a lower rate than shots around the basket because of the added distance but, in the aggregate, they can provide a similar value. For example, making six out of ten layups provides the same 12 points as shooting four out of ten on three-pointers. Players who shoot a lot of three-pointers often have a lower field goal percentage than players who shoot mostly around the basket, but the result is a comparable level of offensive efficiency.

Effective field goal percentage is meant to remedy this problem by accounting for the extra point earned by a three-point basket. It does this by counting each made three-pointer as one-and-a-half made baskets in the same ratio of makes to attempts. The one-and-a-half piece comes because three points is literally one-and-a-half times two points.
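In code, the adjustment is a one-liner; the numbers below reprise the layup/three-pointer example from above:

```python
def effective_fg_pct(fgm, fg3m, fga):
    # eFG% = (FGM + 0.5 * 3PM) / FGA: a made three counts as 1.5 made shots
    return (fgm + 0.5 * fg3m) / fga

print(effective_fg_pct(6, 0, 10))  # 6-of-10 on twos   -> 0.6
print(effective_fg_pct(4, 4, 10))  # 4-of-10 on threes -> 0.6
```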

While effective field goal percentage gives a more complete picture of how much offense a player is generating when they shoot, where that offense is coming from is not immediately apparent. You can have two players with identical effective field goal percentages that are achieved in different ways—around the basket or from the perimeter. Pairing a player’s effective field goal percentage with other statistics that describe their offensive role can really help reveal what they are doing on offense and how effective they are at it.

True shooting percentage takes things one step further by incorporating free throws made and attempted as well. The formula boils down to points scored divided by true shot attempts (field goal attempts plus estimated trips to the free throw line), then divided in half to put it on the scale of a shooting percentage.

True shooting percentage gives you an idea of how productive a player or team is on all possessions where they actually attempt a shot or draw a shooting foul, essentially all possessions that don’t end in turnovers. Like effective field goal percentage, one caveat is that it is not immediately apparent where the offensive efficiency is coming from. A great jump shooter can have the same true shooting percentage as a big man who never shoots anything but layups and dunks. Pairing true shooting percentage with other descriptive statistics increases its explanatory power.
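For reference, a sketch of the standard formula (the 0.44 weight is the conventional estimate for turning free throw attempts into trips to the line; the sample totals are illustrative, not any particular player’s):

```python
def true_shooting_pct(pts, fga, fta):
    # TS% = PTS / (2 * (FGA + 0.44 * FTA))
    return pts / (2 * (fga + 0.44 * fta))

# e.g. 211 points on 240 field goal attempts and 22 free throw attempts
print(round(true_shooting_pct(211, 240, 22), 3))
```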

The last common adjustment to shooting statistics is to look at some of the categories we’ve already discussed, but by location. Several statistical websites now feature shot charts and tables that let you see a player’s field goals made, attempted, field goal percentage and effective percentage separated by location. This can come in the form of an exact distance from the basket, bins of distance (0-5 feet, 5-10 feet, etc.) or in areas (Restricted Area, Mid-Range, Corner 3, etc.). Being able to parse shooting statistics by location helps inform exactly what a player’s offensive role is or what a team’s offensive structure looks like and adds a lot of context to standalone shooting statistics.

The post Glossary: Basic Shooting Adjustments appeared first on Nylon Calculus.

]]>The post Why Shots Are Made (Or Missed) In The NBA appeared first on Nylon Calculus.

]]>At the start of the NBA season, the always fantastic stats side of NBA.com began publishing individual logs for every shot and rebound in the NBA, based on the data gleaned from SportVU, the camera system that tracks the ball and each player’s movement on the court. These logs exist not only for the current 2014-2015 season, but for the 2013-2014 season as well. Each log records who took the shot, where the shot took place, how far into the shot clock the shot was taken, how many dribbles the shooter took prior to shooting, how long they had been in possession of the ball prior to shooting, and even how far away the closest defender was. In their raw form, 202,690 shot logs exist for the 2013-2014 NBA season, and 55,542 for the 2014-2015 season through December 11th. These logs pertain to field goals only.

The quantity of these shot logs begs for analysis, and many have already answered the call. A few articles have already been written here at Nylon Calculus, and Justin Willard’s awesome stuff at Analyticsgame.com also comes to mind. The work of understanding, however, is never over. Why are shots made and missed in the NBA? What are the most important indicators of shot success?

In an attempt to shed further light on these questions, I took to modeling a given shot’s result and its expected points in the 2013-2014 season. By observing what statistical models deem valuable as input, we can determine and quantify exactly which aspects of a shot’s context are important, and which aspects are the most important.

First, a disclaimer: The shot logs the NBA has provided are far from perfect. In unfiltered form, there are some 43-foot two-pointers, some 203,569-point shots, and some -23 second touch times. For the purposes of analysis, I simply omitted records that didn’t adhere to common sense. My filtering criteria: Touch Time >= 0; 0 <= Shot Clock <= 24; Points < 4; for two-pointers (Points Type = 2), Shot Dist < 23.75; for three-pointers (Points Type = 3), Shot Dist > 22; Closest Defender Dist <= 12.261; and Shot Dist < 39.37. These filtering rules make the data look a little more appropriate and remove some crazy outliers like full-court shots (who cares), but there could still be some hidden errors and outliers that I missed.
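A sketch of those filters with pandas, assuming the logs sit in a DataFrame with columns renamed as below (the column names are mine; the thresholds are the ones listed above):

```python
import pandas as pd

# Three toy records: one clean, one with a negative touch time, one with an
# impossible shot clock and distance.
df = pd.DataFrame({
    "touch_time": [2.1, -23.0, 4.0],
    "shot_clock": [14.0, 10.0, 30.0],
    "points": [2, 3, 2],
    "points_type": [2, 3, 2],
    "shot_dist": [15.0, 24.5, 43.0],
    "closest_def_dist": [3.0, 6.0, 4.0],
})

clean = df[
    (df["touch_time"] >= 0)
    & df["shot_clock"].between(0, 24)
    & (df["points"] < 4)
    # a two must be inside the arc, a three outside it
    & (
        ((df["points_type"] == 2) & (df["shot_dist"] < 23.75))
        | ((df["points_type"] == 3) & (df["shot_dist"] > 22))
    )
    & (df["closest_def_dist"] <= 12.261)
    & (df["shot_dist"] < 39.37)
]
print(len(clean))  # only the first toy record survives
```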

Second, if you’re going to stat, always try to stat responsibly. Several of the variables on the shot logs suffer from multicollinearity issues. In layman’s terms, multicollinearity is when the variables used to predict a target vary strongly with each other. This is an issue because strongly correlated predictors lead both to unstable models (models that may change significantly every time they are calculated/run) and to double-counting the information contained in the correlated predictors. In the shot logs, for example, the dribbles and touch time variables are extremely correlated (0.927) with each other. This correlation is somewhat obvious: the more you dribble, the longer you have the ball. Including both of these variables in a model predicting shot success or expected points would double-count this information, so in lieu of principal component analysis, which is less interpretable, I simply didn’t use the touch time variable. I chose dribbles over touch time because I think we should be more interested in shots that occur after no dribbles than in shots that occur at the extremes of touch time.

The other multicollinearity issue in the shot logs is the relationship between shot distance and closest defender distance. As the shot gets further from the hoop, the closest defender is also likely to be further away. This collinear relationship isn’t nearly as extreme as dribbles and touch time, but in order to keep both shot distance and closest defender distance, it needed to be addressed. To remedy the multicollinearity, I simply divided closest defender distance by shot distance to create a “proportion of openness” statistic. This metric captures how far away the closest defender was from the shot, not in raw feet, but in relation to how far away the shot was from the hoop. The proportion of openness has a correlation coefficient with shot distance of -0.345, compared to the correlation between closest defender distance and shot distance, which is 0.535. The absolute value of 0.345 is still concerning (as opposed to the others, which were all 0.2 or lower), but the interpretability of the variables remains, which is desirable.
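The openness construction itself is just a ratio (illustrative arrays here, not the real logs):

```python
import numpy as np

shot_dist = np.array([2.0, 19.0, 24.0, 5.0])   # feet from the hoop
def_dist = np.array([2.0, 2.0, 7.0, 6.0])      # closest defender, feet

# Values above 1.0 mean the defender was farther from the shooter than
# the shooter was from the hoop.
openness = def_dist / shot_dist
print(openness)

# The same decoupling check described above: correlation with shot distance
print(np.corrcoef(openness, shot_dist)[0, 1])
```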

Third, I normalized all the predictor variables using a z-score transformation to stabilize their variance. I won’t get into the gory details here, but this essentially means that variables with large values, ranges, and variances won’t dominate the smaller variables in modeling, variables which could have just as much predictive power as the larger-valued ones.

Now to the fun stuff. Using only four variables (shot clock, dribbles, shot distance, and proportion of openness), I ran several models of differing techniques on the entirety of the 2013-2014 season’s shot logs to predict whether a field goal will be made and the expected points of a given field goal attempt.
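A sketch of that modeling recipe on synthetic data (the simulated effect sizes are mine and only echo the signs reported below; this is not the actual fitted model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic shots: four predictors on roughly realistic scales.
rng = np.random.default_rng(42)
n = 5000
shot_clock = rng.uniform(0, 24, n)
dribbles = rng.poisson(3, n).astype(float)
shot_dist = rng.uniform(0, 30, n)
openness = rng.uniform(0, 2, n)
X = np.column_stack([shot_clock, dribbles, shot_dist, openness])

# Made-up truth: openness helps, distance hurts (echoing the signs below).
logit = 0.3 + 0.3 * openness - 0.05 * shot_dist
made = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

Xz = StandardScaler().fit_transform(X)  # the z-score step from earlier
model = LogisticRegression().fit(Xz, made)

# With standardized inputs, exp(coef) plays the role of Exp(B) in the table.
for name, b in zip(["shot clock", "dribbles", "shot dist", "openness"],
                   model.coef_[0]):
    print(f"{name:10s} Exp(B) = {np.exp(b):.3f}")
```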

The results of one such model are below. This model is a logistic regression model that predicts field goal success. Logistic regression is a modeling method that can predict a binary target (like whether or not a field goal is made), using both categorical and numerical predictors.

The most relevant piece of information from this table is the Exp(B) statistic, which represents the multiplicative change in the odds of a field goal being made per one-unit increase in each predictor, all else being equal. Since these predictors have all been transformed using the z-score transformation, the Exp(B) numbers can be compared to each other evenly.

Translated out of nerd-speak, this model states that:

- Being open (Exp(B) = 1.39) is the most important aspect of shot success (but I’ll comment further on this momentarily), followed closely by shot distance (Exp(B) = 0.778). Shot clock time and dribbling are a tier below in importance.
- A shot that is open enough that the closest defender is farther away than the shot is from the hoop is about 34.82% more likely to go in than one that is tightly guarded, all else being equal.
- For every foot of increased distance from the hoop, a shot is 3.09% less likely to go in, all else being equal. For every 8.74 feet (shot distance’s standard deviation), a shot is 28.53% less likely to go in, all else being equal.
- For every second off the shot clock, a shot is 2.10% less likely to go in, all else being equal. For every 5.79 seconds, (shot clock’s standard deviation), a shot is 12.79% less likely to go in, all else being equal.
- For every dribble a shooter takes prior to shooting, a shot is 2.56% less likely to go in, all else being equal. For every 3.38 dribbles, (dribbles’ standard deviation), a shot is 8.92% less likely to go in, all else being equal.
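For readers who want to reproduce that kind of translation, here is a small helper that converts a standardized Exp(B) into per-unit and per-standard-deviation percent changes in the odds (strictly odds, not raw probability; the bullets above were presumably computed from unrounded model output, so these rounded inputs won’t match them to the decimal):

```python
import math

def pct_change_in_odds(exp_b_per_sd, sd, units=1.0):
    """Percent change in the odds of a make per `units` of the original
    variable, given a standardized Exp(B) and the variable's SD."""
    per_unit = math.exp(math.log(exp_b_per_sd) / sd)
    return (per_unit ** units - 1.0) * 100.0

# Shot distance from the table above: Exp(B) = 0.778 per SD of 8.74 feet.
print(round(pct_change_in_odds(0.778, 8.74), 2))        # per one foot
print(round(pct_change_in_odds(0.778, 8.74, 8.74), 2))  # per one SD
```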

When applied to the shot logs of the 2014-2015 season through December 11th, this logistic regression model had an accuracy of 61.33%. While this number is surely not earth-shattering, especially when considering that a model that picked only misses would yield an accuracy of just about 54%, it’s important to remember the model is built off of only four independent variables, and in no way takes into account players, their play styles, injuries, or the teams playing.

Another model I trained on the 2013-2014 shot-log data was a standard classification and regression tree. Classification and regression trees are very interpretable, because their results are visuals that can easily be understood without any background in math, statistics, science or basketball. To gloss over the nuts and bolts, decision trees work by finding splits among the records in a dataset that maximize the difference of a target variable between the records on each “side” of the division. For the model visualized below, points was the target variable. In each decision tree node (illustrated by the boxes), you can see the expected points of a shot (average points in those records) that meet the criteria of that node. For the reader’s benefit, I went through and changed the results to reflect the values of the original variables, rather than the transformed values.
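A sketch of such a tree on synthetic shots (the data-generating rule is mine, so the split points are purely illustrative, not the article’s fitted tree):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic shots again; the target is points scored on the attempt.
rng = np.random.default_rng(7)
n = 5000
openness = rng.uniform(0, 2, n)
shot_dist = rng.uniform(0, 30, n)
shot_clock = rng.uniform(0, 24, n)
dribbles = rng.poisson(3, n).astype(float)
X = np.column_stack([openness, shot_dist, shot_clock, dribbles])

# Made-up generating rule: threes are worth more but go in less often.
value = np.where(shot_dist > 23.75, 3, 2)
p_make = np.clip(0.62 - 0.01 * shot_dist + 0.05 * openness, 0.05, 0.95)
points = value * (rng.random(n) < p_make)

# A shallow tree keeps the printout readable; each leaf's value is the
# average (expected) points of shots meeting that branch's criteria.
tree = DecisionTreeRegressor(max_depth=3).fit(X, points)
print(export_text(tree, feature_names=["openness", "shot_dist",
                                       "shot_clock", "dribbles"]))
```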

Like the previous logistic regression model that predicted field goals made, a decision tree predicting expected points also values the predictors as important in the same order: proportion of openness, shot distance, shot clock time, and dribbles. The astute may have noticed that there were several splits in the decision tree marking proportion of openness records above 100%. These records represent the thousands and thousands of shots last year where the defender was further away from the shooter than the actual shot was from the hoop, so this value is valid. This model also pretty much figures out the length of the three-point line by itself, with two early shot distance splits at 22.05 feet (the corner three is 22 feet even) and 23.50 feet (the normal three is 23.75 feet). This result is reassuring to see, because it is evidence that the model reflects the real world, and is not just fitting to noise in the data, as the decision tree technique has been known to do. Following the decision tree from node to node, it is interesting to see how simple criteria can dramatically change the expected points of a shot. Among shots that are less than 77.67% open (this accounts for 84.67% of last season’s shots), dribbling once before shooting results in an expected points value loss of 0.18 points, from 1.03 points to 0.85 points.

While the decision tree above predicted expected points, I also generated a tree model that predicted field goals made. Like the logistic regression model, it beats a predict-all-misses model, and can predict field goals in the current NBA season with an accuracy of 61.72%.

At this point, the biggest problem with learning from the shot logs is the complicated relationship between shot distance and closest defender distance, and what this means for how “open” a player is. The simple metric I used in this short study is flawed, but I found different statistics that captured the concept of “openness” to have very similar effects and importance. The problem is more than just the multicollinearity between shot distance and the closest defender, and how a larger value of one will lead to a larger value of the other. A myriad of questions exists here: How do we define openness in a statistic that is relatively uncorrelated with shot distance? Is a shot taken one foot away from the hoop with a defender two feet away as open as a shot taken 19 feet away from the hoop with a defender 2 feet away? Isn’t having the closest defender 6 feet away from the shot pretty much the same as having him 12 feet away? How do we deal with outliers within this statistic, once defined? The proportion of openness metric I used was significantly influenced by its outliers. Principal component analysis is an obvious answer to the complicated relationship between shot distance and “openness”, but once an analysis like that is performed, interpretability is lost, and discerning the impact of each independent variable becomes difficult.

There is plenty of work to be done with regards to learning more about shooting, and the public availability of the NBA’s shot logs will only expedite the process. It will be interesting to see if, in the coming weeks and months, someone can shed some insight on the complicated relationship between shot distance and being “open”. As it stands now, how open a shooter is seems to play the most important role in predicting a field goal’s result, followed closely by shot distance. The amount of time left on the shot clock and the number of dribbles a shooter took prior to shooting also play a statistically significant role.

The post Why Shots Are Made (Or Missed) In The NBA appeared first on Nylon Calculus.

]]>The post The Truth Behind the Ball-Stopper: A Look at the NBA’s Most Productive Players appeared first on Nylon Calculus.

]]>“Ball-stopping” has become a bit of a swear word among certain evaluators of NBA talent. “Yeah, he’s got game, but he’s a bit of a ball-stopper” is a common way for people to subtly diss the skills of players who happen to employ a style of play that requires the ball in their hands.

It’s a product of our era – ball movement is currently king. The last three coaches to win championships – Gregg Popovich, Erik Spoelstra and Rick Carlisle – are all praised for their offensive systems that emphasize unselfishness and frequent passing. If any player dares to go against that paradigm, they’re headed for a sure conviction in the court of public opinion.

Even some of our biggest stars are not immune to this phenomenon. James Harden, Kobe Bryant, Carmelo Anthony – all are enormously productive scorers and playmakers, but they’re also all frequently criticized for their styles of play. Once the ball moves to them, according to conventional wisdom, it inevitably sticks. Sticking is bad.

The great thing about our modern era of fandom, though, is we now have the data to go back and check these little bits of convention for factual accuracy. Is it true that the guys we’ve long branded as “ball-stoppers” deserve that label? And if certain players are using up huge chunks of their teams’ possessions, is that necessarily a bad thing?

Thanks to NBA.com’s SportVU player tracking stats, we now have information on precisely how long each player controls the ball. The league’s “time of possession” stat boils it down to a single number – minutes per game, simple as that.

“Time of possession” is a funny thing. In other sports, like football, it’s undoubtedly a positive stat for your team – if you control the ball ad infinitum and don’t let the other team play offense, you’re guaranteed to come out ahead. Basketball’s different, though. Because of the existence of a shot clock, your time is a finite resource on the hardwood – waste it, and your team will suffer.

So that leads us to a fascinating question – who wastes their time in the NBA, and who makes the best of it? I decided to spend my morning breaking down the numbers. Below is an analysis of 60 players – I took the league’s top 50 in touches per game, then made the executive decision to add Al Horford, Anthony Davis, DeMarcus Cousins, Dirk Nowitzki, Dwight Howard, Goran Dragic, Kevin Durant, Kobe Bryant, Rudy Gay and Tim Duncan because they’re awesome, and analyzed their productivity on a “per minute of possession” basis. Let’s explore the findings.
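The underlying arithmetic is simple; a sketch with made-up numbers (time of possession in decimal minutes, as in the tables below):

```python
import pandas as pd

df = pd.DataFrame({
    "player": ["Big Man A", "Point Guard B"],   # hypothetical players
    "pts_per_game": [24.0, 18.0],
    "time_of_poss": [1.6, 8.6],                 # 8.6 = 8 min 36 sec
})
# Productivity per minute the ball is actually in the player's hands
df["pts_per_min_of_poss"] = df["pts_per_game"] / df["time_of_poss"]
print(df.sort_values("pts_per_min_of_poss", ascending=False))
```

The same division works for assists, threes, or free throw attempts, which is all the lists below are doing.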

First of all, here are the leaders in the basic stat. Without further ado, here are your top 10 players in the NBA this season in time of possession. The stat is average minutes per game, in decimal form (i.e., 8.6 refers to 8 minutes and 36 seconds):

Unsurprisingly, all 10 players atop this list are point guards. Since the pointmen are generally the ones bringing the ball up the floor and initiating many pick-and-rolls as the ball-handler, this is pretty obvious stuff. But once you scroll down to find the non-point guards with “ball-stopping” tendencies, you do indeed find Harden first, at No. 20 overall. LeBron James ranks 22nd; Kobe is No. 31 and surprisingly, despite his reputation, Carmelo is all the way down at No. 41 out of the 60 players studied. Perhaps his reputation is an unjust one after all.

Now that we’ve established the statistic we’re working with, we can begin to do fun things with it. Let’s ask some questions about players’ efficiency on a per-minute-holding-the-ball basis, shall we?

**Who scores the most per minute of possession?**

Every team wants to have an “easy baskets guy” – a player you can simply drop the ball off to in the post and have him get you a quick bucket. A scorer you can trust, whether there’s two seconds on the shot clock or 20. Here’s a look at the 10 guys who put up the most points in the least time:

Moral: Anthony Davis is friggin’ amazing, people. The kid manages to put up ridiculous scoring numbers despite playing in an offense that’s often dominated by Jrue Holiday and Tyreke Evans – yes, that we knew from the eye test. But now we know just *how* ridiculous AD can be! One point for every four seconds he touches the ball – that’s pretty astounding. Dirk is obviously great too. Note the alarming dropoff between the top two guys on this list and everyone else.

The “fast scorers” list is dominated by big men, obviously. You have to scroll pretty far down to find the first guard – Monta Ellis is 20th out of the 60 guys in points per minute, averaging 5.76.

Carmelo, the much-maligned ball-stopper, ranks 15th on this list, by the way. Not so terribly shabby.

**Who scores the least per minute of possession?**

Sorry, I had to do it. Here’s the bottom 10:

Kind of hilarious. Rondo and Rubio are both known for being great passing point guards and absolutely terrible scorers – and we can see here from the numbers that that reputation is not incorrect. They are indeed the two least productive scoring players out of the 60.

Calderon’s inclusion at No. 6 on this list is somewhat surprising, as he’s a capable catch-and-shoot guy in the right offense, but it’s not working for him too well this season, apparently. John Wall at No. 10 – also an intriguing one. Wall is a fantastic floor leader, but he’s only averaging one basket for every 60 seconds he controls the ball. Makes you wonder about his place in that Washington offense.

**Who assists the most baskets per minute of possession?**

Hint: Not point guards. Because the men at the point “waste” so many minutes bringing the ball up the floor and initiating plays, they’re not going to have as many assists on a per-minute basis. The guys who get quick assists are big men who know how to pass.

Noah is a fitting No. 1. Along with Gasol and Duncan, also top-6 guys here, he’s widely considered one of the best passing bigs in the game – he’s carried the Bulls’ offense for years ever since Derrick Rose’s injuries began necessitating his role as “point center.” He can make plays for teammates out of the high or low post, and he’s a high-energy player who gets the job done quickly.

A couple of interesting names on this list – Josh Smith in third place is a little weird, as he’s not usually known as an amazing passer. But what’s even weirder is that he’s fourth in most turnovers per minute at the same time. Smith’s a high-impact passer, apparently. When he moves the ball, something’s going to happen.

The other notable name here is Rondo – the only point guard to crack the top 10. Maybe he’s not as ball-dominant as we think.

**Who makes the most 3-pointers per minute of possession?**

Again – not guards, at least not for the most part. To top this list, you need to be a skilled catch-and-shoot player who can get the ball in the flow of the offense and get your shot quickly.

Yup, Dirk and Love. Exactly what I meant.

You’ll notice that just like the points list above, there are two top dogs followed by a huge dropoff. Nowitzki, impressively, is one of the two prohibitive leaders in both points and 3-pointers per minute of possession. The fact that the Mavs’ living legend manages to be a dominant scorer despite only touching the ball for 1.3 minutes a night … it’s quite something, I must say.

Patrick Beverley is a fun inclusion here, and deserved – the man’s having a career year shooting the trey, and it shows. Along with Steph Curry, he’s one of only two guards on the list. (Though, not pictured: Harden, Calderon and Damian Lillard are 11-12-13.)

By the way, there are eight guys tied for dead last on this list with zero 3-pointers. Rubio is ninth to last with 0.029. He makes a trey once every 2,000 seconds, give or take.

**Who gets to the free throw line the most per minute of possession?**

Field goals don’t tell the whole picture, right? The other way to be productive with the ball in your hands is to initiate contact and get to the line. Here’s who does it the most:

Good for you, Dwight – you’re No. 1. Except you’re also shooting a career low of 46.3 percent from the line this season, so those 5-plus free throw attempts are only translating to about 2.4 points for the Rockets. Dirk actually scores more points at the line than Dwight with half the attempts. If you redid this list using free throws made rather than attempted, Dwight would be 10th instead of first.

As for the bottom of this list, it’s pretty unsurprising given the other data here. The men ranked 60th through 58th are Calderon, Rondo and Rubio.

This is just the beginning of what we can learn from looking at “per minute of possession” data. It’s valuable stuff – while metrics like PER are a good overall measure of player productivity, they sometimes make it hard to parse whether a player is really skilled or if he’s just getting a lot of opportunities. This way of measuring players bypasses that issue, instead asking the question of how much a player does with each opportunity. It’s enlightening.

There are plenty more questions we can ask. Be creative – who misses the most shots? Who turns the ball over the most? Who has the ball in his hands for the highest percentage of his total time on the floor? Possession data holds the key to all of these answers and more. Try poking around the numbers yourself if you so desire, or hit me up on Twitter with any follow-up questions about the data revealed here. There are a lot of conversations to be had about these eye-opening numbers – let’s get started.

The post The Truth Behind the Ball-Stopper: A Look at the NBA’s Most Productive Players appeared first on Nylon Calculus.

]]>The post First Quarter Player Tracking Plus-Minus Scores appeared first on Nylon Calculus.

]]>The first quarter or so of the season is over, so I decided there was enough data to at least run my Player Tracking Plus-Minus metric, which is a statistical plus-minus using SportVU data^{1}. You can read more about its development on the offensive side here, and the original version here. Because there is only one full year of SportVU data that is public, it is best to consider this project a beta test to see how the new data fits with traditional box score data and what we know about basketball. Though it is worth noting, maybe, that my win predictions for this season using PT-PM were leading the APBR prediction contest through the first quarter of the season.

Of course, it’s also a new way to look at player performance, and since everyone who can’t see the Russian financial meltdown from their back porch seems to like player rankings, I thought I would start out with a top-twenty list at this point (as of Dec. 15, 2014).

I used a fairly arbitrary 350-minute playing time minimum. Most of the names on the list, topped by Steph Curry and Anthony Davis, are the ones anyone following the NBA this year would expect. Then there’s Darrell Arthur, Ersan Ilyasova and Rudy Gobert, who, it’s safe to say, are not among the top twenty players in the league. That mostly speaks to the noisy nature of the data at this point. Curry, for example, has “held” opponents to 36% on shots he has contested, which is not a trend I would expect to continue.
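A quick simulation shows how little a quarter-season of contested-shot data means. Every number here is assumed for illustration (a true 45 percent allowed, 100 contests), not taken from Curry’s actual tracking data:

```python
import numpy as np

# How often would a defender who truly allows 45% on contested shots post
# 36% or lower purely by luck over a small sample of 100 contests?
rng = np.random.default_rng(0)
true_rate, n_contests, n_sims = 0.45, 100, 100_000

makes = rng.binomial(n_contests, true_rate, size=n_sims)
share_at_or_below_36 = (makes / n_contests <= 0.36).mean()
print(f"{share_at_or_below_36:.1%} of simulated samples look that stingy")
```

A few percent of perfectly average defenders would look elite over a sample this small, which is exactly the regression-to-the-mean warning above.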

Of course, one name is very notably absent: LeBron James. James comes in 31st; his offense is elite but not transcendent, while his defense is merely average. (Kevin Durant and Russell Westbrook didn’t have enough minutes played coming off of their injuries.) Below are the scores for all of the Cleveland Cavaliers with over 350 minutes.

We can see that James has struggled with extending and retaining offensive possessions, with his turnovers up and offensive rebounds down, and while his scoring is well above average, it is down from his previous seasons. Meanwhile, Kyrie Irving has pretty clearly been the team’s second-best player.

In reality, the first-quarter check-in is really a first opportunity to assess how well the metric is performing out of sample. For example, I plan to run some comparisons with other metrics, as well as look at performance on both ends of the court at the team level.

- The dependent variable was the RAPM score via GotBuckets for 2013-2015 for all players with more than 500 minutes, in a weighted least squares regression using an AIC stepwise filter, cross-validated for stability. ↩
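The fitting procedure in the footnote can be sketched on synthetic data: weighted least squares plus a greedy forward AIC filter. In the real model the target is RAPM and the predictors are SportVU/box-score rates weighted by minutes; here `X`, `y` and the weights `w` are all made up:

```python
import numpy as np

# Synthetic data: only columns 0 and 2 actually predict y.
rng = np.random.default_rng(1)
n, p = 500, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
w = rng.uniform(500, 3000, size=n)  # stand-in for minutes-based weights

def wls_aic(cols):
    """AIC of a weighted least squares fit on the chosen columns plus intercept."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    rss = np.sum(w * (y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

# Greedy forward selection: keep adding the single column that most
# improves AIC, and stop when nothing helps.
selected, best_aic = [], wls_aic([])
improved = True
while improved:
    improved = False
    for c in set(range(p)) - set(selected):
        aic = wls_aic(selected + [c])
        if aic < best_aic:
            best_aic, best_col, improved = aic, c, True
    if improved:
        selected.append(best_col)

print(sorted(selected))  # the informative columns, 0 and 2, should be in here
```

This is a sketch of the filtering idea, not the production model; the actual regression also involves the cross-validation step the footnote mentions.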

The post First Quarter Player Tracking Plus-Minus Scores appeared first on Nylon Calculus.

]]>The post Nylon Calculus Power Rankings: Golden State Warriors Are Destroying the World appeared first on Nylon Calculus.

]]>Starting this week, Nylon Calculus will be doing weekly power rankings, measuring how your favorite teams are doing as of the last week of action. We know that everyone loves power rankings, and we’re determined to give the people what they want.

The idea of power rankings in general is that they represent a subjective “basketball-expert opinion” of where the teams are after each week of action, but as with everything here at Nylon Calculus, we’re seeking to bring just a *bit* more rigor to the idea of ranking teams based on recent performance and trends. That said, the rankings are still mildly silly and totally imperfect, which I think makes them all the more fun.

Here’s the idea: basketball statisticians typically rank teams by their Net Rating, that is, by how many points they outscore opponents per 100 possessions. They do this for a lot of reasons, but largely because point differential is actually more highly correlated with long-term success in a season than win%. That, however, doesn’t make for a fun power ranking, in part because it doesn’t account for what teams have done recently.
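For readers new to the stat, Net Rating is simple to compute: point differential scaled to 100 possessions. The totals below are invented for illustration:

```python
# Net Rating = 100 * (points scored - points allowed) / possessions.
def net_rating(points_for, points_against, possessions):
    return 100.0 * (points_for - points_against) / possessions

print(round(net_rating(2150, 1980, 1900), 1))  # +8.9 per 100 possessions
```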

So, we’re ranking teams by their Net Rating, and then adjusting that ranking based on the rate at which that rating has changed over the last week of basketball action. In short: we’re taking a team’s statistical record and adjusting it by how teams have performed recently. So, it’s like everyone’s power rankings, except that we use spreadsheets for ours^{1}.

We’ll work out a format for the rankings as time goes on, but for now: we’ll publish a chart with the rankings, each team’s Net Rating (found via the Nylon Calculus Team Efficiency by Possession stats) and the rate of change of that Net Rating. Afterward, I’ll discuss the top five teams, bottom five teams, and the notable teams in the middle.

The first week of power rankings is quite a joy:

- **Warriors**: They’re the best team in the league, it’s not really close, and they’ve gotten better recently. That they’re so dependent on Curry, Thompson and Bogut might be a concern if someone were to get hurt, but otherwise, they’re a freight train that should terrify anyone coming near them, and they’re after the Bulls’ win record.
- **Raptors**: They might not have the sexiest win/loss record of the group, but they’re winning by a lot and not losing by much. A minor slide of late without DeRozan shouldn’t detract from the fact that this is a scary group.
- **Grizzlies**: The Grizz are just eating people alive behind MVP candidate Marc Gasol, and questions of whether or not these guys are legit are long gone: the Grizzlies will absolutely grit and grind you into dirt if you get in their way.
- **Spurs**: Don’t let the record fool you, this is a top 3 defense and a top 10 offense, and that’s without Tiago Splitter for most of the season so far and Patty Mills for all of it. They’ve dropped a few to some bad teams lately, but not by a lot, and they’re still largely killing teams. It’s tempting to start worrying about this team, but don’t. They’ll be fine.
- **Trail Blazers**: This team dropped a couple after a hot-as-hell 9-game winning streak, but this is a killer team with an elite offense and defense right now, and they’re making a serious bid for contention as the season rolls on.

- **76ers**: Good lord, not only is this team maybe the worst of all time…but they’ve gotten worse lately, after winning two. The Hinkie tankathon continues.
- **Timberwolves**: The Wolves have looked good of late, and Wiggins in particular has been putting on quite a show…but this is just a bad team made atrocious by the lack of Rubio or Pekovic.
- **Hornets**: This team hasn’t gotten any worse lately…but it’s time to admit that this is just a really, really bad team, which is such a bummer. They could have been so much fun. It’s probably a good thing that they’re trying to move Lance Stephenson, who’s been an embarrassment.
- **Lakers**: They beat the Spurs! Kobe passed MJ for 3rd all time in scoring! They’re 7-7 with Nick Young back in the lineup! Yes! Well…sure. They’re still a really bad basketball team.
- **Jazz**: A fun team, with encouraging play on both sides of the ball that’s been getting better lately but still can’t really be competitive. At least their time is coming.

- **Mavericks**: It’s yet to be seen if this team is legit, or if they’re just really good at beating up on bad teams. For now, we’ll lean towards the stats, but the possibility always remains that they’re a really beefed up version of last year’s Timberwolves, whose Net Rating was inflated all season by beatdowns of worse teams.
- **Rockets**: This team has basically been weathering its worst case scenario with Dwight Howard out and Trevor Ariza shooting so poorly, and yet they appear to be a top-level team with Harden going MVP-level insane. Watch for this team when everyone gets healthy.
- **Thunder**: It’s hard for the stats to capture how much better the Thunder is than the rest of the league right now with Russ and KD back, but rest assured, they’re coming. They want that title, and they might just get it.
- **Clippers**: This team has been getting better, but recent losses to the Suns and Grizzlies are concerning. This team might still have a ways to go.
- **Wizards**: John Wall is a madman, and this team is awesome, but the degree to which they’ve struggled against sub-par opponents is worrying. They could be title contenders, and they could be a 2nd round out. We’ll have to wait it out with this group.
- **Nets**: Despite being generally really bad, the Nets have had a couple really good wins, and that’s enough to bump them big in the regression model. It might be a blip on the map, or it could be something bigger.
- **Cavaliers**: Like the Clippers, the improvement is palpable, but a couple straight losses putting a dent in the momentum has to have more than a few Cavs fans sweating.
- **Knicks**: Oof.

- More specifically, we run a linear regression using the team’s season average excluding the last week as the y-intercept, the whole-season average as the last point, and the last-week average as every point in between. The slope of that regression line represents how the team has improved or crashed over the last week or so, and we weight the rankings using that information. ↩
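The footnote’s trend fit can be sketched with made-up ratings (none of these are a real team’s numbers): anchor the line at the season-minus-last-week average, fill the interior with the last-week average, end at the full-season average, and take the fitted slope:

```python
import numpy as np

# Illustrative Net Ratings for a team that had a hot last week.
season_excl_last_week = 4.0  # assumed rating before the last week
last_week = 9.0              # assumed rating over the last week
full_season = 4.6            # assumed season-long rating

n_interior = 7               # say, one interior point per day of the week
x = np.arange(n_interior + 2)
y = np.concatenate(([season_excl_last_week],
                    np.full(n_interior, last_week),
                    [full_season]))

slope = np.polyfit(x, y, 1)[0]
print(round(slope, 2))  # 0.04: a positive slope, so this team gets bumped up
```

The hot week pulls the slope positive, so the team moves up the rankings relative to its raw Net Rating, which is the whole adjustment in a nutshell.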

The post Nylon Calculus Power Rankings: Golden State Warriors Are Destroying the World appeared first on Nylon Calculus.

]]>