Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”.
Across major surveys, R has consistently ranked among the top programming languages for data scientists, so it is no wonder that knowing the important R packages can be a vital advantage in Kaggle competitions. Xavier Conort (currently a Data Scientist at DataRobot) has compiled a list of 10 R packages that played a key role in his top-10 finishes in more than 15 Kaggle competitions, including a few wins.
Because R is also widely used outside the data science community (for example, by statisticians and actuaries), this list of 10 powerful R packages may be useful well beyond Kaggle.
Here are the 10 packages he found particularly powerful for building winning solutions:
Allowing the machine to capture complexity:
Taking advantage of high-cardinality categorical or text-data:
Making your code more efficient:
- Matrix [Sparse and Dense Matrix Classes and Methods]
- SOAR [Memory management in R by delayed assignments]
- foreach [Foreach looping construct for R]
- doMC [Foreach parallel adaptor for the multicore package]
- data.table [Extension of data.frame]
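To see why the Matrix package earns its place on this list, here is a minimal R sketch, assuming a standard R installation (Matrix ships with R as a recommended package). It one-hot encodes a hypothetical high-cardinality column (`user_id`, invented for illustration) with `sparse.model.matrix()` and compares the memory footprint against the dense equivalent.

```r
# Sketch: one-hot encode a high-cardinality factor into a sparse matrix
library(Matrix)

set.seed(42)
n <- 5000
# Hypothetical data: a categorical column with up to 1,000 levels plus one numeric column
df <- data.frame(
  user_id = factor(sample(sprintf("u%04d", 1:1000), n, replace = TRUE)),
  amount  = runif(n)
)

# sparse.model.matrix() builds the design matrix directly in sparse (dgCMatrix) form;
# "- 1" drops the intercept so every observed user_id level gets its own column
X_sparse <- sparse.model.matrix(~ user_id + amount - 1, data = df)

# The dense equivalent stores every zero explicitly
X_dense <- as.matrix(X_sparse)

dim(X_sparse)
print(object.size(X_sparse), units = "Kb")
print(object.size(X_dense), units = "Mb")
```

Each row has only two non-zero entries, so the sparse version stays small while the dense copy grows with the number of levels. Several R modeling packages accept `dgCMatrix` input directly, which means the dense copy is rarely needed in practice.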
Expert Advice for Kaggle Competitions: Use your intuition to help the machine by doing the following:
- Always compute differences/ratios of features
- Always consider discarding features that are “too good” (such features often indicate data leakage)
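Both tips can be sketched in a few lines of base R. The example below is an illustration under stated assumptions, not Conort's own code: the data set, the column names (`income`, `debt`, `leaky`), and the 0.99 cut-off are all hypothetical. It derives a difference and a ratio feature, then screens every feature by its univariate AUC against the target and flags any that look suspiciously perfect.

```r
# Sketch: derive difference/ratio features and flag suspiciously predictive ones
set.seed(1)
n <- 1000
# Hypothetical data set: two raw numeric features and a binary target
df <- data.frame(income = runif(n, 20, 200), debt = runif(n, 0, 100))
df$target <- rbinom(n, 1, plogis(2 * df$debt / df$income - 1))
# Hypothetical leaky column: as if it were recorded after the outcome was known
df$leaky <- df$target + rnorm(n, sd = 0.01)

# 1. Differences and ratios of raw features
df$income_minus_debt <- df$income - df$debt
df$debt_to_income    <- df$debt / df$income  # income >= 20 here, so no division by zero

# 2. Screen each candidate feature by its univariate AUC against the target
auc <- function(score, y) {
  # Rank-based (Mann-Whitney) estimate of the area under the ROC curve
  r <- rank(score)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
features <- setdiff(names(df), "target")
aucs <- sapply(df[features], auc, y = df$target)
# Use max(AUC, 1 - AUC) so strongly negative relationships also count
strength <- pmax(aucs, 1 - aucs)
print(round(sort(strength, decreasing = TRUE), 3))

# 3. Flag features that look "too good" (the 0.99 threshold is an arbitrary assumption)
too_good <- names(strength)[strength > 0.99]
print(too_good)
```

A near-perfect score on a single feature is a prompt to check how that column was collected, not proof of leakage on its own.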
The complete set of slides for this presentation by Xavier Conort: