Package 'broomstick' reference manual

Title:	Convert Decision Tree Objects into Tidy Data Frames
Description:	Converts decision tree objects into tidy data frames using the framework laid out by the package broom, so that decision tree output can be easily reshaped, processed, and combined with tools like 'dplyr', 'tidyr' and 'ggplot2'. Like broom, broomstick provides three S3 generics: tidy, which summarises decision-tree-specific features by returning the variable importance table; augment, which adds columns to the original data such as predictions and residuals; and glance, which provides a one-row summary of model-level statistics.
Authors:	Nicholas Tierney [aut, cre], Matthew Lincoln [aut]
Maintainer:	Nicholas Tierney <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.2.9200
Built:	2025-02-02 05:26:12 UTC
Source:	https://github.com/njtierney/broomstick

Tidying methods for a randomForest model

Description

These methods tidy the variable importance of a random forest model summary, augment the original data with information on the fitted values/classifications and error, and construct a one-row glance of the model's statistics.

Usage

## S3 method for class 'randomForest'
augment(x, data = NULL, ...)

## S3 method for class 'randomForest'
glance(x, ...)

## S3 method for class 'randomForest'
tidy(x, ...)

Arguments

`x`	randomForest object
`data`	Model data for use by `augment.randomForest()`.
`...`	Additional arguments (ignored)

Value

augment.randomForest returns the original data with additional columns:

`.oob_times`	The number of trees for which the given case was "out of bag". See `randomForest::randomForest()` for more details.
`.fitted`	The fitted value or class.

augment returns additional columns for classification and unsupervised trees:

`.votes`	For each case, the voting results, with one column per class.
`.local_var_imp`	The casewise variable importance, stored as data frames in a nested list-column, with one row per variable in the model. Only present if the model was created with `importance = TRUE`.
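
The augment columns described above correspond to components stored on a fitted randomForest object. The following is a minimal sketch of how such columns could be assembled by hand, assuming the randomForest package is installed; it is an illustration only and is not taken from broomstick's implementation. The object names `rf_fit` and `augmented` are hypothetical.

```r
library(randomForest)

set.seed(1)
# Classification forest on iris; ntree kept small so the sketch runs quickly
rf_fit <- randomForest(Species ~ ., data = iris, ntree = 50)

# Attach the out-of-bag counts and out-of-bag predictions to the original data
augmented <- cbind(
  iris,
  .oob_times = rf_fit$oob.times,
  .fitted    = rf_fit$predicted
)

# For classification forests the vote matrix has one column per class
votes <- rf_fit$votes

head(augmented)
```
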

glance.randomForest returns a data.frame with the following columns for regression trees:

`mse`	The average mean squared error across all trees.
`rsq`	The average pseudo-R-squared across all trees. See `randomForest::randomForest()` for more information.

For classification trees: one row per class, with the following columns:

`precision`
`recall`
`accuracy`
`f_measure`
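
For the regression case, the `mse` and `rsq` columns can be approximated directly from the fitted object. This is a minimal sketch, assuming the randomForest package is installed and assuming that averaging the `mse` and `rsq` vectors stored on the object matches the description above; it is not taken from broomstick's source, and `rf_reg` and `glance_like` are hypothetical names.

```r
library(randomForest)

set.seed(1)
# Regression forest: the response is numeric
rf_reg <- randomForest(Sepal.Width ~ ., data = iris, ntree = 50)

# rf_reg$mse and rf_reg$rsq each hold one value per tree count;
# averaging them gives a one-row, glance-style summary
glance_like <- data.frame(
  mse = mean(rf_reg$mse),
  rsq = mean(rf_reg$rsq)
)

glance_like
```
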

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy.randomForest returns one row for each model term, with the following columns:

`term`	The term in the randomForest model
`MeanDecreaseAccuracy`	A measure of variable importance. See `randomForest::randomForest()` for more information. Only present if the model was created with `importance = TRUE`.
`MeanDecreaseGini`	A measure of variable importance. See `randomForest::randomForest()` for more information.
`MeanDecreaseAccuracy_sd`	Standard deviation of `MeanDecreaseAccuracy`. See `randomForest::randomForest()` for more information. Only present if the model was created with `importance = TRUE`.
`classwise_importance`	Classwise variable importance for each term, stored as data frames in a nested list-column, with one row per class. Only present if the model was created with `importance = TRUE`.
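
The importance columns listed above mirror what `randomForest::importance()` reports. As a rough sketch, assuming the randomForest package is installed, a one-row-per-term table can be built by hand as follows; this illustrates the shape of the output and is not broomstick's own code. The names `rf_fit`, `imp` and `imp_df` are hypothetical.

```r
library(randomForest)

set.seed(1)
# importance = TRUE is needed for MeanDecreaseAccuracy to be computed
rf_fit <- randomForest(Species ~ ., data = iris, ntree = 50, importance = TRUE)

# importance() returns a matrix with one row per predictor
imp <- importance(rf_fit)

# Move the row names into a `term` column so the result has no rownames
imp_df <- data.frame(
  term                 = rownames(imp),
  MeanDecreaseAccuracy = unname(imp[, "MeanDecreaseAccuracy"]),
  MeanDecreaseGini     = unname(imp[, "MeanDecreaseGini"])
)

imp_df
```
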

Augment your model object

Description

Adds fitted values, and residuals where available, to the data used with an rpart model.

Usage

## S3 method for class 'rpart'
augment(x, data = NULL, newdata = NULL, ...)

Arguments

`x`	rpart model
`data`	data.frame from the model
`newdata`	new data to use for predictions, residuals, etc.
`...`	extra arguments to pass

Value

augment.rpart returns the original data with additional columns:

`.fitted`	The fitted value or class.
`.resid`	The residual. Only present when the data supplied are the same data used to fit the model.
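
To make the two columns concrete, the sketch below builds them by hand for a regression tree using only the rpart package. It assumes the residual is the observed response minus the fitted value, which is the usual convention but is not confirmed by this manual; `fitted_vals` and `manual_aug` are hypothetical names.

```r
library(rpart)

# Regression tree: Sepal.Width is numeric, so predict() returns fitted means
rpart_fit <- rpart(Sepal.Width ~ ., data = iris)

# With no newdata, predict() returns fitted values for the training data
fitted_vals <- predict(rpart_fit)

# Assumed definition: residual = observed - fitted
manual_aug <- cbind(
  iris,
  .fitted = fitted_vals,
  .resid  = iris$Sepal.Width - fitted_vals
)

head(manual_aug)
```
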

Examples

library(rpart)
rpart_fit <- rpart(Sepal.Width ~ ., iris)
augment(rpart_fit)

Convert Decision Tree Analysis Objects into Tidy Data Frames

Description

Convert decision tree analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model's statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.

tidy up the model summary of gbm

Description

tidy returns a tibble of variable importance for a model fitted with the gbm package.

Usage

## S3 method for class 'gbm'
tidy(x, n_trees = x$n.trees, scale = FALSE, sort = TRUE, normalise = TRUE, ...)

Arguments

`x`	A `gbm` model
`n_trees`	integer (optional). Number of trees to use for computing relative importance. Defaults to `x$n.trees`, the number of trees in the fitted model. If not provided, a guess is made using the following heuristic: if a test set was used in fitting, the number of trees giving the lowest test set error is used; otherwise, if cross-validation was performed, the number of trees giving the lowest cross-validation error is used; otherwise, all trees are used.
`scale`	(optional) Should importance be scaled? Default is FALSE
`sort`	(optional) Should results be sorted? Default is TRUE
`normalise`	(optional) Should results be normalised to sum to 100? Default is TRUE
`...`	extra functions or arguments

Value

A tibble containing the importance score for each variable
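
The `scale`, `sort` and `normalise` arguments resemble options of `gbm::relative.influence()` and `summary.gbm()`. The sketch below shows one way an importance table of this kind could be produced by hand, assuming the gbm and MASS packages are installed. It is not broomstick's implementation, and the column names `variable` and `importance` are illustrative assumptions.

```r
library(gbm)
library(MASS)

set.seed(1)
# distribution is given explicitly so gbm does not have to guess it
fit_gbm <- gbm(calories ~ ., data = UScereal,
               distribution = "gaussian", n.trees = 100)

# Relative influence per predictor, unscaled and sorted in decreasing order
rel_inf <- relative.influence(fit_gbm, n.trees = fit_gbm$n.trees,
                              scale. = FALSE, sort. = TRUE)

# Normalise so the scores sum to 100
rel_inf <- 100 * rel_inf / sum(rel_inf)

importance_df <- data.frame(
  variable   = names(rel_inf),
  importance = unname(rel_inf)
)

importance_df
```
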

Examples

# retrieve a tibble of the variable importance from a gbm model

library(gbm)
library(MASS)
fit_gbm <- gbm(calories ~ ., data = UScereal)

tidy(fit_gbm)

tidy up the model summary of rpart

Description

tidy returns a tibble of variable importance for a model fitted with the rpart package.

Usage

## S3 method for class 'rpart'
tidy(x, ...)

Arguments

`x`	An `rpart` model
`...`	extra functions or arguments

Value

A tibble containing the importance score for each variable
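
An rpart fit stores its variable importance as a named numeric vector in `variable.importance`. As a minimal sketch using only the rpart package, that vector can be reshaped into a two-column table; the column names `variable` and `importance` are illustrative assumptions and this is not broomstick's own code.

```r
library(rpart)

fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Named numeric vector: names are predictors, values are importance scores
vi <- fit_rpart$variable.importance

# One row per variable, with the names moved into a column
importance_df <- data.frame(
  variable   = names(vi),
  importance = unname(vi)
)

importance_df
```
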

Examples


# retrieve a tibble of the variable importance from an rpart model

library(rpart)
fit_rpart <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

tidy(fit_rpart)
