Title: | Convert Decision Tree Objects into Tidy Data Frames |
---|---|
Description: | Convert decision tree objects into tidy data frames using the framework laid out by the package broom. This means that decision tree output can be easily reshaped, processed, and combined with tools like 'dplyr', 'tidyr' and 'ggplot2'. Like the package broom, broomstick provides three S3 generics: tidy, which summarises decision-tree-specific features by returning the variable importance table; augment, which adds columns to the original data such as predictions and residuals; and glance, which provides a one-row summary of model-level statistics. |
Authors: | Nicholas Tierney [aut, cre], Matthew Lincoln [aut] |
Maintainer: | Nicholas Tierney <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2.9200 |
Built: | 2024-11-04 06:05:00 UTC |
Source: | https://github.com/njtierney/broomstick |
These methods tidy the variable importance of a random forest model summary, augment the original data with information on the fitted values/classifications and error, and construct a one-row glance of the model's statistics.
## S3 method for class 'randomForest'
augment(x, data = NULL, ...)
## S3 method for class 'randomForest'
glance(x, ...)
## S3 method for class 'randomForest'
tidy(x, ...)
x | randomForest object |
data | Model data for use by |
... | Additional arguments (ignored) |
augment.randomForest returns the original data with additional columns:
.oob_times | The number of trees for which the given case was "out of bag". See |
.fitted | The fitted value or class. |
augment returns additional columns for classification and unsupervised trees:
.votes | For each case, the voting results, with one column per class. |
.local_var_imp | The casewise variable importance, stored as data frames in a nested list-column, with one row per variable in the model. Only present if the model was created with |
glance.randomForest returns a data.frame with the following columns for regression trees:
mse | The average mean squared error across all trees. |
rsq | The average pseudo-R-squared across all trees. See |
For classification trees: one row per class, with the following columns: precision, recall, accuracy and f_measure.
All tidying methods return a data.frame without rownames. The structure depends on the method chosen.
tidy.randomForest returns one row for each model term, with the following columns:
term | The term in the randomForest model |
MeanDecreaseAccuracy | A measure of variable importance. See |
MeanDecreaseGini | A measure of variable importance. See |
MeanDecreaseAccuracy_sd | Standard deviation of |
classwise_importance | Classwise variable importance for each term, stored as data frames in a nested list-column, with one row per class. Only present if the model was created with |
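The three randomForest methods can be exercised together on a small classification forest. The following is a minimal sketch, assuming the randomForest and broomstick packages are both installed (broomstick from the GitHub source listed above). The object names rf_fit, rf_importance, rf_summary and rf_augmented are illustrative only, and data is passed to augment explicitly rather than relying on the default.

```r
library(randomForest)
library(broomstick)  # assumed installed from the GitHub source

set.seed(2024)  # random forests are stochastic; fix the seed for repeatability

# importance = TRUE asks randomForest to compute permutation-based importance
# in addition to the Gini-based measure
rf_fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

rf_importance <- tidy(rf_fit)                 # one row per model term
rf_summary <- glance(rf_fit)                  # model-level statistics
rf_augmented <- augment(rf_fit, data = iris)  # original data plus extra columns
```

Printing each object is a quick way to check which of the columns described above are present for a given model type.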
Augment your model object
## S3 method for class 'rpart'
augment(x, data = NULL, newdata = NULL, ...)
x | rpart model |
data | data.frame from the model |
newdata | new data to use for predictions, residuals, etc. |
... | extra arguments to pass |
augment.rpart returns the original data with additional columns:
.fitted: The fitted value or class.
.resid: The residual. Only given when the same data as was used for the model is provided.
library(rpart)
library(broomstick)
rpart_fit <- rpart(Sepal.Width ~ ., iris)
augment(rpart_fit)
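To make the .fitted and .resid columns concrete, the sketch below computes comparable quantities with rpart alone. It assumes, without confirmation from the package source, that .fitted corresponds to predict() on the fitted tree and that .resid is the observed response minus that prediction. The names fitted_values, residual_values, new_rows and new_predictions are hypothetical.

```r
library(rpart)

rpart_fit <- rpart(Sepal.Width ~ ., data = iris)

# Fitted values for the training data (assumed to correspond to .fitted)
fitted_values <- unname(predict(rpart_fit))

# Observed minus fitted (assumed to correspond to .resid)
residual_values <- iris$Sepal.Width - fitted_values

# Rows without the response: predictions are possible, residuals are not
new_rows <- head(iris[, names(iris) != "Sepal.Width"], 3)
new_predictions <- predict(rpart_fit, newdata = new_rows)
```

This also illustrates why a residual can only be reported when the response is available: new_rows has no Sepal.Width column to subtract the prediction from.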
Convert decision tree analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model's statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
tidy returns a tibble of variable importance for the gbm package
## S3 method for class 'gbm'
tidy(x, n_trees = x$n.trees, scale = FALSE, sort = TRUE, normalise = TRUE, ...)
x | A |
n_trees | integer. (optional) Number of trees to use for computing relative importance. Default is the number of trees in x$n.trees. If not provided, a guess is made using the heuristic: If a test set was used in fitting, the number of trees resulting in lowest test set error will be used; else, if cross-validation was performed, the number of trees resulting in lowest cross-validation error will be used; otherwise, all trees will be used. |
scale | (optional) Should importance be scaled? Default is FALSE |
sort | (optional) Should results be sorted? Default is TRUE |
normalise | (optional) Should results be normalised to sum to 100? Default is TRUE |
... | extra functions or arguments |
A tibble containing the importance score for each variable
# retrieve a tibble of the variable importance from a gbm model
library(gbm)
library(MASS)
library(broomstick)
fit_gbm <- gbm(calories ~ ., data = UScereal)
tidy(fit_gbm)
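The optional arguments documented above can be combined in one call. The sketch below is a hedged illustration, assuming gbm, MASS and broomstick are installed. The distribution and n.trees values are set explicitly here for clarity and are choices made for this example, and the names importance_default and importance_first_50 are illustrative.

```r
library(gbm)
library(MASS)        # provides the UScereal data
library(broomstick)  # assumed installed from the GitHub source

set.seed(2024)  # gbm subsamples rows by default, so fix the seed

fit_gbm <- gbm(calories ~ ., data = UScereal,
               distribution = "gaussian", n.trees = 100)

# Defaults: all fitted trees, sorted, normalised to sum to 100
importance_default <- tidy(fit_gbm)

# Un-normalised importance computed from the first 50 trees only
importance_first_50 <- tidy(fit_gbm, n_trees = 50, normalise = FALSE)
```

Comparing the two results shows how much the ranking depends on the number of trees used, which is the motivation for exposing n_trees as an argument.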
tidy returns a tibble of variable importance for the rpart package
## S3 method for class 'rpart'
tidy(x, ...)
x | An |
... | extra functions or arguments |
A tibble containing the importance score for each variable
# retrieve a tibble of the variable importance from an rpart model
library(rpart)
library(broomstick)
fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
tidy(fit_rpart)
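For readers who want to see where such a table can come from, the sketch below builds a comparable data frame using rpart alone. It assumes, without confirmation from the package source, that the importance scores are those stored in the variable.importance component of the fitted object. The column names variable and importance are hypothetical and may differ from what tidy returns.

```r
library(rpart)

fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# rpart stores importance as a named numeric vector on the fitted object
raw_importance <- fit_rpart$variable.importance

# Hypothetical column names, chosen here for illustration only
importance_table <- data.frame(
  variable = names(raw_importance),
  importance = unname(raw_importance),
  row.names = NULL
)
```

Holding the scores in a data frame rather than a named vector is what allows them to be passed directly to tools such as dplyr and ggplot2.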