Title:  Convert Decision Tree Objects into Tidy Data Frames 

Description:  Convert decision tree objects into tidy data frames using the framework laid out by the package 'broom', so that decision tree output can be easily reshaped, processed, and combined with tools like 'dplyr', 'tidyr' and 'ggplot2'. Like 'broom', 'broomstick' provides three S3 generics: tidy, which summarises decision-tree-specific features (for example, tidy returns the variable importance table); augment, which adds columns to the original data such as predictions and residuals; and glance, which provides a one-row summary of model-level statistics. 
Authors:  Nicholas Tierney [aut, cre], Matthew Lincoln [aut] 
Maintainer:  Nicholas Tierney <[email protected]> 
License:  MIT + file LICENSE 
Version:  0.1.2.9200 
Built:  2024-08-01 05:06:17 UTC 
Source:  https://github.com/njtierney/broomstick 
These methods tidy the variable importance of a random forest model summary, augment the original data with information on the fitted values/classifications and error, and construct a one-row glance of the model's statistics.
## S3 method for class 'randomForest'
augment(x, data = NULL, ...)

## S3 method for class 'randomForest'
glance(x, ...)

## S3 method for class 'randomForest'
tidy(x, ...)
x 
randomForest object 
data 
Model data for use by 
... 
Additional arguments (ignored) 
augment.randomForest
returns the original data with additional columns:
.oob_times 
The number of trees for which the given case was "out of bag". See 
.fitted 
The fitted value or class. 
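The `.oob_times` count comes from bootstrap sampling: a case is "out of bag" for every tree whose bootstrap sample did not draw it. The sketch below is a plain base-R simulation of that counting, not broomstick or randomForest code; it assumes each tree is grown on a sample of size n drawn with replacement (randomForest's default), and the names `n_cases` and `n_trees` are illustrative only.

```r
# Base-R simulation of out-of-bag counting (not broomstick or randomForest code).
# Assumption: each tree uses a bootstrap sample of size n drawn with replacement.
set.seed(1)
n_cases <- 10
n_trees <- 500

oob_times <- integer(n_cases)
for (b in seq_len(n_trees)) {
  in_bag <- sample.int(n_cases, size = n_cases, replace = TRUE)  # bootstrap draw for tree b
  is_oob <- !(seq_len(n_cases) %in% in_bag)                      # cases never drawn are out of bag
  oob_times <- oob_times + is_oob                                # count OOB appearances per case
}

oob_times                  # per-case analogue of the .oob_times column
mean(oob_times) / n_trees  # expected value is (1 - 1/n_cases)^n_cases, roughly 0.35
```

Each case ends up out of bag for roughly a third of the trees, which is why out-of-bag predictions can serve as a built-in validation estimate.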
augment
returns additional columns for classification and unsupervised trees:
.votes 
For each case, the voting results, with one column per class. 
.local_var_imp 
The case-wise variable importance, stored as data frames in a nested list-column, with one row per variable in the model. Only present if the model was created with 
glance.randomForest
returns a data.frame with the following columns for regression trees:
mse 
The average mean squared error across all trees. 
rsq 
The average pseudo-R-squared across all trees. See 
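For regression forests, the two glance columns can be written as below. This is a sketch under two assumptions: that MSE_b denotes the b-th element of the `mse` vector stored in a randomForest regression fit, and that glance simply averages that vector (and the matching pseudo-R-squared vector) over all B trees. The per-tree pseudo-R-squared uses the randomForest documentation's definition, 1 - mse / Var(y).

$$
\mathrm{mse} = \frac{1}{B}\sum_{b=1}^{B}\mathrm{MSE}_b,
\qquad
\mathrm{rsq} = \frac{1}{B}\sum_{b=1}^{B}\left(1 - \frac{\mathrm{MSE}_b}{\widehat{\operatorname{Var}}(y)}\right)
= 1 - \frac{\mathrm{mse}}{\widehat{\operatorname{Var}}(y)}
$$

Because the pseudo-R-squared is linear in MSE_b, averaging it over trees gives the same value as plugging the averaged mse into the formula.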
For classification trees: one row per class, with the following columns:
precision 

recall 

accuracy 

f_measure 
All tidying methods return a data.frame without row names. The structure depends on the method chosen.
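The per-class classification columns (precision, recall, accuracy, f_measure) can all be derived from a confusion matrix. The sketch below uses common one-vs-rest definitions in base R; the helper `class_metrics()` and the toy vectors are hypothetical, and glance.randomForest may compute these quantities differently.

```r
# Hypothetical helper: per-class metrics from a confusion matrix
# (rows = observed class, columns = predicted class), one-vs-rest definitions.
# Not necessarily the exact formulas used by glance.randomForest().
class_metrics <- function(cm) {
  stopifnot(nrow(cm) == ncol(cm))
  tp <- diag(cm)
  fp <- colSums(cm) - tp   # predicted as class k but observed as another class
  fn <- rowSums(cm) - tp   # observed as class k but predicted as another class
  n  <- sum(cm)
  tn <- n - tp - fp - fn
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  data.frame(
    class     = rownames(cm),
    precision = unname(precision),
    recall    = unname(recall),
    accuracy  = unname((tp + tn) / n),
    f_measure = unname(2 * precision * recall / (precision + recall)),
    row.names = NULL
  )
}

# Toy observed and predicted classes (made up for illustration)
observed  <- factor(c("a", "a", "a", "b", "b", "c"))
predicted <- factor(c("a", "a", "b", "b", "b", "c"), levels = levels(observed))
cm <- table(observed, predicted)
class_metrics(cm)
```

With these toy vectors, class "a" has precision 1 and recall 2/3, while class "b" has precision 2/3 and recall 1, so both have an F-measure of 0.8.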
tidy.randomForest
returns one row for each model term, with the following columns:
term 
The term in the randomForest model 
MeanDecreaseAccuracy 
A measure of variable importance. See 
MeanDecreaseGini 
A measure of variable importance. See 
MeanDecreaseAccuracy_sd 
Standard deviation of 
classwise_importance 
Class-wise variable importance for each term, stored as data frames in a nested list-column, with one row per class. Only present if the model was created with 
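The tidy output described above is essentially an importance matrix with its row names moved into a `term` column, plus an optional nested list-column. The base-R sketch below reproduces only that shape; every number is made up, and the objects `imp` and `imp_sd` are hypothetical stand-ins for a fitted model's importance values, not broomstick's implementation.

```r
# Toy importance matrix with made-up values (rows = terms), shaped like the
# table that tidy.randomForest() is documented to return.
imp <- matrix(
  c(0.30, 12.1,
    0.10,  4.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Petal.Width", "Sepal.Length"),
                  c("MeanDecreaseAccuracy", "MeanDecreaseGini"))
)
imp_sd <- c(Petal.Width = 0.02, Sepal.Length = 0.01)  # made-up standard deviations

# Move the row names into a 'term' column and use plain integer row names
tidy_imp <- data.frame(term = rownames(imp), imp, row.names = NULL)
tidy_imp$MeanDecreaseAccuracy_sd <- unname(imp_sd[tidy_imp$term])

# A nested list-column: one small data frame per term (toy class-wise values)
tidy_imp$classwise_importance <- lapply(tidy_imp$term, function(term) {
  data.frame(class = c("setosa", "versicolor"), importance = c(0.1, 0.2))
})

tidy_imp
```

Storing one data frame per row in a list-column keeps the main table at one row per term while still carrying the class-level detail.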
Augment your model object
## S3 method for class 'rpart'
augment(x, data = NULL, newdata = NULL, ...)
x 
rpart model 
data 
data.frame from the model 
newdata 
new data to use for predictions, residuals, etc. 
... 
extra arguments to pass 
augment.rpart
returns the original data with additional columns:
.fitted: The fitted value or class.
.resid: The residual; only given when the same data that was used to fit the model is provided.
library(broomstick)
library(rpart)
rpart_fit <- rpart(Sepal.Width ~ ., iris)
augment(rpart_fit)
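For a regression tree, the two documented columns correspond to the tree's predictions on the training data and the observed-minus-fitted differences. The base-R sketch below builds them by hand with rpart's `predict()`; it illustrates the idea only, and broomstick's own implementation may differ in details such as how `newdata` is handled.

```r
library(rpart)

# Same regression tree as in the augment.rpart example
rpart_fit <- rpart(Sepal.Width ~ ., data = iris)

# Hand-built versions of the documented columns (illustrative sketch)
augmented <- iris
augmented$.fitted <- unname(predict(rpart_fit))             # fitted values on the training data
augmented$.resid  <- iris$Sepal.Width - augmented$.fitted   # residual = observed - fitted

head(augmented)
```

Because an rpart tree predicts the mean response of each leaf, every case in the same leaf shares one `.fitted` value and differs only in `.resid`.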
Convert decision tree analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model's statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
tidy returns a tibble of variable importance for the gbm package
## S3 method for class 'gbm'
tidy(x, n_trees = x$n.trees, scale = FALSE, sort = TRUE, normalise = TRUE, ...)
x 
A 
n_trees 
integer. (optional) Number of trees to use for computing relative importance. Default is the number of trees in x$n.trees. If not provided, a guess is made using the heuristic: If a test set was used in fitting, the number of trees resulting in lowest test set error will be used; else, if cross-validation was performed, the number of trees resulting in lowest cross-validation error will be used; otherwise, all trees will be used. 
scale 
(optional) Should importance be scaled? Default is FALSE 
sort 
(optional) Should results be sorted? Default is TRUE 
normalise 
(optional) Should results be normalised to sum to 100? Default is TRUE 
... 
extra functions or arguments 
A tibble containing the importance score for each variable
# retrieve a tibble of the variable importance from a gbm model
library(broomstick)
library(gbm)
library(MASS)  # provides the UScereal data
fit_gbm <- gbm(calories ~ ., data = UScereal)
tidy(fit_gbm)
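The `normalise` and `sort` arguments describe two simple transformations of the raw importance scores: rescaling so they sum to 100, then ordering from most to least important. The base-R sketch below shows only that arithmetic on made-up scores; the vector `raw` and the column names `variable` and `importance` are hypothetical and are not taken from gbm or broomstick.

```r
# Hypothetical raw relative-influence scores (made-up numbers for illustration)
raw <- c(fat = 2, sugars = 5, sodium = 3)

# normalise = TRUE: rescale so the scores sum to 100
normalised <- 100 * raw / sum(raw)

# sort = TRUE: order from most to least important
importance_tbl <- data.frame(variable = names(normalised),
                             importance = unname(normalised))
importance_tbl <- importance_tbl[order(importance_tbl$importance, decreasing = TRUE), ]
rownames(importance_tbl) <- NULL

importance_tbl
```

Normalising to 100 makes each score readable as a percentage share of the total influence, which helps when comparing models with different numbers of trees.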
tidy returns a tibble of variable importance for the rpart package
## S3 method for class 'rpart'
tidy(x, ...)
x 
An 
... 
extra functions or arguments 
A tibble containing the importance score for each variable
# retrieve a tibble of the variable importance from an rpart model
library(broomstick)
library(rpart)  # also provides the kyphosis data
fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
tidy(fit_rpart)
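A fitted rpart object keeps its importance scores in the `variable.importance` component, a named numeric vector. Assuming tidy.rpart builds its table from that component, the base-R sketch below shows the underlying reshaping into one row per variable; the column names `variable` and `importance` are illustrative and may not match broomstick's output.

```r
library(rpart)

fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# rpart stores importance scores as a named numeric vector
vi <- fit_rpart$variable.importance

# Reshape into a two-column data frame, one row per variable
vi_df <- data.frame(variable = names(vi), importance = unname(vi))
vi_df
```

Turning the names of the vector into an explicit column is what makes the result easy to pass to tools like dplyr or ggplot2.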