capstone project


Author: Hoa Quach
Submitted: June 10, 2016
Institution: California State University, East Bay
Department: Business & Economics
Co-Program Director & Capstone Thesis Advisor: Dr. Chongqi Wu

The Success of the NBA Draft Order

Predicting Which 2015 NBA Draft Players Will Become All-Stars

Abstract

This paper examines first-year season performance data for National Basketball Association (NBA) players from 1952-2016 to generate a simple yet effective model that predicts the likelihood of first-year players becoming future NBA All-Stars. To complete this project, we draw on external data from various basketball statistics websites and focus on discovering patterns.

To do so, we apply one of the best-known and best-understood classification algorithms in statistical machine learning, Bootstrap Forests (Random Forests). One advantage of decision tree learners is that they are powerful classifiers that use a “tree-like structure to model the relationships among the features and the potential outcomes” (Lantz, 2015).

While researching the reasoning behind basketball statistics, we interpret the data set using various programming and statistical analysis tools, such as Microsoft Excel, MySQL, R, and SAS JMP.

History of NBA All-Star Game

The NBA All-Star Game is an annual exhibition basketball game played between the association’s Western Conference and Eastern Conference in the middle of the season. Only twelve players (five starters and seven reserves) are selected from a pool of 60 players from each conference. The ballots are compiled by a panel of sportswriters and broadcasters. The game’s starters are voted in by fans, and the reserves are voted in by the coaches from each conference.

History and Evolution of the NBA Draft Lottery

From 1966 through 1995, the NBA Board of Governors adopted and repeatedly modified a lottery system among the non-playoff teams to determine their order of selection in the first round of the NBA Draft. Non-playoff teams whose position is not decided by the lottery select in inverse order of their regular-season won-lost records (NBA, 2016).

Under the new lottery system, the teams with the worst won-lost records at the end of the regular season receive increased chances of winning one of the top three picks in the draft, while the lottery chances of the teams with the best records decrease. The team with the worst record in the league is assured of picking no worse than fourth, the team with the second-worst record no worse than fifth, and so on (NBA, 2016).

NBA Draft Lottery Odds

On the night of the draft lottery, fourteen ping-pong balls numbered 1 to 14 are placed in a drum. Drawing four of the fourteen balls yields 1,001 possible combinations, 1,000 of which are assigned to the lottery teams based on their won-lost records during the regular season. Under the revised system, the chance of the team with the worst record drawing the first pick rose from 16.7 percent to 25 percent, while the chance of the team with the best record among lottery teams fell from 1.5 percent to 0.5 percent. For example, in the 2015 NBA Draft lottery, the Minnesota Timberwolves had a 25 percent chance to win the number one pick after finishing the regular season with the worst record at 16-66. The New York Knicks had the second-best chance at 19.9 percent, and the Philadelphia 76ers were third at 15.6 percent. The Oklahoma City Thunder, the best team in the lottery, had the smallest chance of receiving the number one pick, with only five combinations out of 1,000 (0.5 percent).

Four balls are drawn out of the 14, without regard to their order of selection, to determine a four-number combination. The team assigned to that combination receives the number one pick. The four balls are placed back in the drum and the process is repeated to determine the number two pick. If the unassigned combination is drawn, or a combination belonging to a team that has already won a pick, the balls are replaced and drawn again until a new team has been determined (NBA, 2016).
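The lottery arithmetic can be checked with a short sketch. The per-team combination counts below are not listed in the source; they are back-calculated from the quoted 2015 percentages (only Oklahoma City's five combinations are stated directly), so treat them as assumptions.

```python
from math import comb

# Four balls drawn from fourteen, order ignored.
total_combinations = comb(14, 4)   # 1,001 possible combinations
assigned_combinations = 1000       # one combination is left unassigned

# Assumed counts, derived from the 2015 percentages quoted in the text.
combos_2015 = {
    "Minnesota Timberwolves": 250,
    "New York Knicks": 199,
    "Philadelphia 76ers": 156,
    "Oklahoma City Thunder": 5,
}


def first_pick_odds(combos: int, assigned: int = assigned_combinations) -> float:
    """Percent chance of winning the number one pick."""
    return 100 * combos / assigned


for team, combos in combos_2015.items():
    print(f"{team}: {first_pick_odds(combos):.1f}%")
```

Five combinations out of 1,000 works out to 0.5 percent, which matches the floor quoted for the best lottery team.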

Hypothesis and Analytical Questions

To verify the effectiveness and validity of our prediction model, we have constructed the following analytical questions to support our research hypothesis.

Research Hypothesis

Higher first-round draft selections (picks 1-14) have a higher chance of becoming All-Stars because of the amount of playing time they receive while helping non-playoff teams reach the playoffs. Playing time, or exposure, is a considerable predictor of a player’s value and of the opportunity to showcase basketball talent to the nation and to other NBA teams. In this empirical research, we examine the correlations, if any, between game minutes and statistical attributes in order to either reject or fail to reject the research hypothesis: can we predict future NBA All-Star players based on draft order and players’ statistical performances?

Analytical Questions

The following six analytical questions will be our primary focus throughout our data examination and analysis.

  1. Which players who did not become All-Stars from 1953-2013 have the highest predicted probabilities? And where in the draft were they selected?
  2. Which players who became All-Stars from 1953-2013 have the lowest predicted probabilities? And where in the draft were they selected?
  3. Which players from the 2014 NBA Draft received the highest predicted probability of becoming an All-Star? And where in the draft were they selected?
  4. Which players from the 2014 NBA Draft received the lowest predicted probability of becoming an All-Star? And where in the draft were they selected?
  5. Which players from the recent 2015 NBA Draft were classified with the lowest predicted probabilities of becoming future All-Stars? And where in the draft were they selected?
  6. Which players from the recent 2015 NBA Draft were classified with the highest predicted probabilities of becoming future All-Stars? And where in the draft were they selected?

Research Methods

Data Collection

To develop the prediction model, we gather data from various websites. Since Stat.NBA.com only offers season statistical data from 1998-2015, we gather the data for 1953-1997 from other websites, such as HoopsStats.com and Basketball-Reference.com. In addition, we finalize our data set using a variety of open-source tools available on the Internet. Below is a list of the data extraction tools:

Data Preparation

In this section, we detail how we finalize our data set prior to analysis.

Data Import

Once all extracted data has been saved to Excel files, the import process converts each worksheet into a relational database table on our local MySQL server using the “Excel to SQL” add-in feature.

Data Cleaning/Merge

We decided to use the player’s name, drafted team, playing team, and college affiliation as the key identifiers for the data merge. However, a handful of players share identical first and last names. To address this problem, Excel’s built-in “Text to Columns” feature was used to separate first and last names. As a result, the only differences between two players with identical first and last names are their drafted team, playing team, and college or high school affiliation.
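The idea of merging on several identifiers at once can be sketched as follows. This is an illustration only, not the Excel and MySQL workflow used in the project: the records, field names, and helper functions are hypothetical, and the playing team is omitted from the key for brevity.

```python
# Hypothetical records: two different players who share the same name.
draft_rows = [
    {"name": "John Smith", "drafted_team": "BOS", "college": "Example State", "pick": 5},
    {"name": "John Smith", "drafted_team": "LAL", "college": "Sample College", "pick": 40},
]
stat_rows = [
    {"name": "John Smith", "drafted_team": "LAL", "college": "Sample College", "ppg": 4.2},
    {"name": "John Smith", "drafted_team": "BOS", "college": "Example State", "ppg": 11.8},
]

# Assumed key fields; the name alone would not separate the two players.
KEY_FIELDS = ("name", "drafted_team", "college")


def composite_key(row):
    """Build a case-insensitive key from all identifying fields."""
    return tuple(row[field].strip().lower() for field in KEY_FIELDS)


def merge_on_key(left, right):
    """Inner-join two lists of records on the composite key."""
    lookup = {composite_key(row): row for row in right}
    merged = []
    for row in left:
        match = lookup.get(composite_key(row))
        if match is not None:
            merged.append({**row, **match})
    return merged


merged = merge_on_key(draft_rows, stat_rows)
for row in merged:
    print(row["name"], row["drafted_team"], row["pick"], row["ppg"])
```

Because the key includes the drafted team and college, each "John Smith" is matched to his own statistics rather than to the other player's.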

Data Transformation/Re-coding

Example

All-Star Binary Variable (isAllStar)
A binary All-Star variable was created and assigned a value of 1 if the player has been selected as an All-Star, or a value of 0 if the player has not been an All-Star as of June 2016. This variable is our target (y) response variable for later prediction and analysis.

(0=The player is not an All-Star)
(1=The player is an All-Star)
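A minimal sketch of this recoding, assuming a hypothetical set of All-Star names and placeholder player records (neither comes from the project data):

```python
# Hypothetical inputs: names of All-Stars and a few drafted players.
all_star_names = {"Player A", "Player C"}
players = [{"name": "Player A"}, {"name": "Player B"}, {"name": "Player C"}]


def encode_is_all_star(player, all_stars):
    """Return 1 if the player has an All-Star selection, otherwise 0."""
    return 1 if player["name"] in all_stars else 0


for player in players:
    player["isAllStar"] = encode_is_all_star(player, all_star_names)

labels = [player["isAllStar"] for player in players]
print(labels)
```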

Data Parameters

Although our full NBA draft data set consists of 7,311 drafted players and 36 variables, some players never played in the NBA or did not record statistics because of injuries or other reasons. We therefore removed all players with null values from the data set. Our final data set consists of 2,970 players and 36 features, including one target (y) response variable, which are a combination of factor and integer data types. The entire list of variables is given in the Appendix.
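The null-removal step amounts to keeping only complete records. The sketch below assumes that a missing value appears as either None or an empty string; the rows and field names are hypothetical.

```python
# Hypothetical rows; None or "" marks a missing value (an assumption).
rows = [
    {"name": "Player A", "ppg": 12.1, "mpg": 30.2},
    {"name": "Player B", "ppg": None, "mpg": None},  # never played
    {"name": "Player C", "ppg": 3.4, "mpg": ""},     # incomplete record
]


def is_complete(row):
    """True when no field in the record is missing."""
    return all(value is not None and value != "" for value in row.values())


complete_rows = [row for row in rows if is_complete(row)]
print(f"kept {len(complete_rows)} of {len(rows)} rows")
```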

Data Analysis

Exploratory Data Analysis

In this section, we examine the data to derive insights and discover any patterns that can help us understand our players.

We first examine which ten colleges had the most players drafted by NBA teams. As shown in Table 2, the University of Kentucky and the University of North Carolina were the top two with 108 (1.47%) and 105 (1.43%) players, respectively. Moreover, 324 (4.43%) players were drafted either straight from high school or without attending college. Table 3 shows the ten colleges with the most All-Star players. Fourteen players drafted from the University of North Carolina became All-Stars, while 21 players did not attend college but became All-Stars.

Table 4 shows the number of players drafted from each college within the first fourteen selections. At the top of the list is the University of North Carolina (13), followed by None (12), our indicator for a player with no college experience or an international player, and the University of California, Los Angeles in a close third with ten.

According to our analysis in Tables 1-4, there is a common pattern as to which college had the most drafted players and All-Stars: the University of North Carolina tops the list in almost every category. Its intercollegiate basketball program is regarded as one of the best-recruited in Division I, having won five NCAA Tournament Championships since 1957. It makes sense that NBA teams are aware of a program that has been ranked in the Top 25 in 18 of the last 20 years.

Rookie of the Year Award Winners

Among the 67 Rookie of the Year Award winners in NBA history, 56 (83.6 percent) later became All-Stars and eleven (16.4 percent) did not. These eleven players are shown in Table 5. Conversely, 350 drafted players became All-Stars without winning the Rookie of the Year award.

All-Star Player’s Statistical Performances

Points-Per-Game

Table 5 shows the frequency counts and percentages of All-Star players’ points-per-game (PPG) in their rookie seasons. Most of the distribution falls between 3.20 and 22.50 PPG, and the most common rookie-season average among All-Star players is 9.20 PPG.

Among all All-Star players, the mean rookie-season PPG is 12.05, the median is 11.1, and the standard deviation is 6.27. Surprisingly, one player averaged only 1.60 PPG in his rookie year. A plausible explanation is that he did not have the opportunity to showcase his abilities in his rookie season because of factors such as injuries or a veteran-heavy team. Such a player may later be given the opportunity to improve his performance, sometimes with another team through a trade, free agency, or an expansion draft. As it turns out, this player is Andrew Bynum. He was drafted tenth overall in the 2005 NBA Draft by the Los Angeles Lakers, who had posted a losing record of 34-48 the previous season. With Lakers centers Chris Mihm and Kwame Brown injured at the start of the 2006–07 season, Bynum served as the starting center. He played his first seven years (2005-2012) with the Lakers and became an All-Star in his final year with the team.

The mean for players who did not become All-Stars is 5.12 PPG. However, one player in this group averaged 22.90 PPG, the maximum. This is very high: a rookie capable of scoring over 20 PPG would be expected to sustain that production throughout his career. This player is Ron Harper of Miami University. He was the eighth overall selection in the 1986 NBA Draft by the Cleveland Cavaliers. As it turns out, he was injured in two of his first four years in Cleveland before being traded to the Los Angeles Clippers, where he performed considerably well. After a short stint with the Clippers, he signed with the Chicago Bulls as a free agent to play alongside Michael Jordan at the beginning of the 1994 season. Although Harper was an All-Rookie First Team selection and a five-time champion, he never became an All-Star in his 16-year professional NBA career.

Minutes-Per-Game

Next, we look at the average minutes-per-game (MPG) and its frequency distribution in the rookie season of players who later became All-Stars. The average is 25.5 MPG.

Draft Order Selection

Table 12 below shows the frequency counts and cumulative percentages of first-round draft selections for all NBA All-Stars. The first fourteen selections in the first round account for 14.6 percent of All-Stars.

Interpretations

To better understand how our model works, let's examine a tree that predicts whether a drafted player will become an All-Star. The evaluation of a player begins at the root node and then passes through decision nodes that require choices to be made based on the player's attributes. These choices split the data across branches that indicate potential outcomes of a decision, depicted as "yes" or "no" outcomes, though in some cases there may be more than two possibilities. Once a final decision can be made, the tree is terminated by leaf nodes that denote the action to be taken as the result of the series of decisions. In the case of a predictive model, the leaf nodes provide the expected result given the series of events in the tree (Lantz, 2015).
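The root, decision, and leaf nodes described above can be sketched as a small hand-built tree. The features and thresholds here are hypothetical placeholders chosen for illustration; they are not splits learned by the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None      # attribute tested at a decision node
    threshold: Optional[float] = None  # split point for that attribute
    yes: Optional["Node"] = None       # branch taken when value >= threshold
    no: Optional["Node"] = None        # branch taken otherwise
    prediction: Optional[int] = None   # set only on leaf nodes (1 = All-Star)


def predict(node: Node, player: dict) -> int:
    """Walk from the root through decision nodes until a leaf is reached."""
    while node.prediction is None:
        node = node.yes if player[node.feature] >= node.threshold else node.no
    return node.prediction


# Hypothetical tree: split on rookie minutes first, then on rookie scoring.
tree = Node(
    feature="mpg", threshold=25.0,
    yes=Node(feature="ppg", threshold=12.0,
             yes=Node(prediction=1), no=Node(prediction=0)),
    no=Node(prediction=0),
)

print(predict(tree, {"mpg": 30.0, "ppg": 15.0}))
```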

As expected, our Bootstrap Forest performed fairly well. It produced 100 trees with a minimum of ten splits per tree, and the model identified tree number 91 as the best tree. More results are located in the Appendix of the final PDF copy.
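The general idea behind a bootstrap forest, fitting many trees on resampled data and averaging their votes, can be sketched in a few lines. This is a simplified stand-in and not SAS JMP's Bootstrap Forest: each "tree" is a single-split stump, there is no random feature sampling, and the data are hypothetical.

```python
import random


def fit_stump(sample):
    """Pick the (feature, threshold) split with the fewest errors on the sample."""
    best = None
    for feature in sample[0][0]:
        for row, _ in sample:
            threshold = row[feature]
            errors = sum((r[feature] >= threshold) != bool(y) for r, y in sample)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def fit_forest(data, n_trees=100, seed=0):
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        forest.append(fit_stump(sample))
    return forest


def predict_proba(forest, row):
    """Share of stumps voting 'All-Star' for this player."""
    votes = sum(row[feature] >= threshold for feature, threshold in forest)
    return votes / len(forest)


# Hypothetical rookie lines: (features, isAllStar).
data = [
    ({"ppg": 18.0, "mpg": 34.0}, 1), ({"ppg": 15.5, "mpg": 31.0}, 1),
    ({"ppg": 12.0, "mpg": 29.0}, 1), ({"ppg": 6.0, "mpg": 18.0}, 0),
    ({"ppg": 4.5, "mpg": 12.0}, 0), ({"ppg": 3.0, "mpg": 9.0}, 0),
    ({"ppg": 7.5, "mpg": 20.0}, 0), ({"ppg": 2.0, "mpg": 6.0}, 0),
]

forest = fit_forest(data)
high = predict_proba(forest, {"ppg": 20.0, "mpg": 36.0})
low = predict_proba(forest, {"ppg": 1.0, "mpg": 5.0})
print(high, low)
```

The vote share plays the role of the predicted probability discussed in the results: a player whose rookie numbers clear most of the learned thresholds receives a value near 1.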

According to our confusion matrix, our model correctly classified all but 127 of the 1,757 training instances, for an error rate of 7.2 percent; 92.8 percent were classified correctly.

A total of 5 actual no values were incorrectly classified as yes (false positives), while 122 yes values were misclassified as no (false negatives). However, decision trees are known for a tendency to overfit the training data. For this reason, the error rate reported on training data may be overly optimistic, and it is especially important to evaluate decision trees on a test dataset.

Out of the 497 test instances, our model correctly predicted that 432 players did not become All-Stars and that 16 did, resulting in an accuracy of 90.1 percent and an error rate of 9.9 percent. This is only slightly worse than the 7.2 percent error rate on the training data, which is quite impressive.
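The reported rates follow directly from the confusion matrix counts given in the text. A short check, with variable names chosen here for illustration:

```python
def accuracy(correct: int, total: int) -> float:
    """Percent of instances classified correctly."""
    return 100 * correct / total


# Training set: 1,757 instances, 5 false positives, 122 false negatives.
train_total, train_fp, train_fn = 1757, 5, 122
train_errors = train_fp + train_fn
train_error_rate = 100 * train_errors / train_total

# Test set: 497 instances, 432 true negatives, 16 true positives.
test_total, test_tn, test_tp = 497, 432, 16
test_accuracy = accuracy(test_tn + test_tp, test_total)
test_error_rate = 100 - test_accuracy

print(f"training error rate: {train_error_rate:.1f}%")
print(f"test accuracy: {test_accuracy:.1f}%, test error rate: {test_error_rate:.1f}%")
```

The test error rate is a few points higher than the training error rate, which is the direction expected when a tree-based model fits its training data closely.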

Results

Answers to Our Analytical Questions

  • Which players who did not become All-Stars from 1953-2013 have the highest predicted probabilities? And where in the draft were they selected?
  • According to our predicted probabilities shown below (Table 13), Elmore Smith received the highest predicted probability among players from 1953-2014 who did not become All-Stars, at 77.2 percent. Of the fifteen players shown in Table 13, only two were drafted outside of the first round, and thirteen of the fifteen were drafted with picks #1-#11.

  • Which players who became All-Stars from 1953-2013 have the lowest predicted probabilities? And where in the draft were they selected?
  • According to Table 14, the All-Star with the lowest predicted probability is Draymond Green of Michigan State University. Green was drafted in the second round of the 2012 NBA Draft and received a predicted probability of only 1.26 percent. Of the fifteen players in the table, seven were drafted outside of the first round.

  • Which players from 2014 NBA Draft received the highest classified predicted probability rate to become an All-Star? And where in the draft were they drafted?

  • Which players from 2014 NBA Draft received the lowest classified predicted probability rate to become an All-Star? And where in the draft were they drafted?

  • Which players from the recent 2015 NBA Draft were classified with the lowest predicted probabilities in becoming future All-Stars? And where in the draft were they drafted?
  • Branden Dawson received the lowest predicted probability, a 0.08 percent chance of becoming a future All-Star. We also note that Kevon Looney, drafted 30th by Golden State in the first round of the 2015 Draft, received a rate of 1.23 percent. This makes sense given his off-season injuries and limited playing time as the team’s veteran reserves and All-Stars (Stephen Curry, Draymond Green, and Klay Thompson) push for their second consecutive championship.

  • Which players from the recent 2015 NBA Draft class were classified with the highest predicted probabilities in becoming future All-Stars? And where in the draft were they drafted?

Summary and Conclusion

In summary, this report went through a prediction and analysis process that started with data discovery from a variety of external sources on the web. We used several procedures to clean and munge the data. Along the way, we refined the useful predictors and eliminated those that did not work. Once we were satisfied with the final data, we performed general data analysis to discover patterns and seek valuable insights.

Our analysis suggests that there is a significant relationship between draft order and which players become All-Stars, which supports our hypothesis. Per our data discovery, experimental results, and analysis, draft order does make a difference because non-playoff teams are willing to grant playing time in order to compete in the league, whereas players selected later have difficulty earning playing time and showcasing their abilities behind star players and veterans on playoff contenders. With more playing time, players’ abilities and confidence can rise, leading to accolades such as Rookie of the Year and Rookie of the Month awards and to possible inclusion on the All-Star ballot. However, we do see several possible exogenous variables that were excluded from our experiment. Nonetheless, we answered our analytical questions using the players’ predicted likelihoods, with a prediction accuracy of 90.1 percent and an error rate of 9.9 percent on our test instances. This analysis may be of interest to college players entering the NBA draft who want to better understand the value of developing their skills and being selected earlier in the first round.
