Creates a data.table of feature importances.
Usage
xgb.importance(
  model = NULL,
  feature_names = getinfo(model, "feature_name"),
  trees = NULL
)
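Value
A data.table with the following columns.
For a tree model:
Features: Names of the features used in the model.
Gain: Fractional contribution of each feature to the model, based on the total gain of that feature's splits. A higher percentage means a more important feature.
Cover: A metric of the number of observations related to this feature.
Frequency: Percentage of times the feature has been used in the trees.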
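To make the three tree-model columns concrete, here is a minimal base-R sketch. It assumes a hypothetical table of per-split statistics: the splits data frame and its numbers are invented for illustration, and this is not the package's internal code. It sums gain and cover per feature, counts the splits, and rescales each column so that it sums to 1 across features.

```r
# Hypothetical per-split statistics; the values are invented for illustration.
splits <- data.frame(
  Features = c("len", "dose", "len", "len"),
  gain     = c(6, 2, 1, 1),
  cover    = c(30, 10, 5, 5)
)

# Aggregate per feature: total gain, total cover, number of splits.
gain_sum  <- tapply(splits$gain, splits$Features, sum)
cover_sum <- tapply(splits$cover, splits$Features, sum)
n_splits  <- tapply(splits$gain, splits$Features, length)

# Rescale each statistic so that it sums to 1 across features.
importance <- data.frame(
  Features  = names(gain_sum),
  Gain      = as.vector(gain_sum / sum(gain_sum)),
  Cover     = as.vector(cover_sum / sum(cover_sum)),
  Frequency = as.vector(n_splits / sum(n_splits))
)

# Most important feature (largest share of the gain) first.
importance <- importance[order(-importance$Gain), ]
print(importance)
```

In this made-up table, len accounts for 0.8 of the gain and 0.75 of the splits, so it is listed first.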
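For a linear model:
Features: Names of the features used in the model.
Weight: Linear coefficient of this feature.
Class: Class label (only for multiclass models). For objects of class xgboost (as produced by xgboost()), it is a factor; for objects of class xgb.Booster (as produced by xgb.train()), it is a zero-based integer vector.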
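If feature_names is not provided and model does not contain feature names, the feature indices are used instead. Because the indices are extracted from the model dump (which is produced by the C++ code), they start at 0 (as in C/C++ or Python) rather than at 1 (as is usual in R).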
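If the result contains zero-based indices, they can be mapped back to column names by adding 1 before indexing. The following base-R sketch assumes a hypothetical index vector feature_idx (stored as character strings here) and assumes that col_names lists the training columns in their original order. Neither name comes from the package.

```r
# Hypothetical zero-based feature indices, as they might appear when the
# model carries no feature names.
feature_idx <- c("1", "0")

# Column names of the training data, in the order used for training.
col_names <- c("len", "dose")

# Add 1 to convert from zero-based to R's one-based indexing.
feature_labels <- col_names[as.integer(feature_idx) + 1L]
print(feature_labels)
```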
Examples
# binary classification using "gbtree":
data("ToothGrowth")
x <- ToothGrowth[, c("len", "dose")]
y <- ToothGrowth$supp
model_tree_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 2L
)
xgb.importance(model_tree_binary)
# binary classification using "gblinear":
model_linear_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.3
)
xgb.importance(model_linear_binary)
# multi-class classification using "gbtree":
data("iris")
x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y <- iris$Species
model_tree_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 3L
)
# all classes clumped together:
xgb.importance(model_tree_multi)
# inspect importances separately for each class:
num_classes <- 3L
nrounds <- 5L
xgb.importance(
  model_tree_multi, trees = seq(from = 1, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 2, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 3, by = num_classes, length.out = nrounds)
)
# multi-class classification using "gblinear":
model_linear_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.2
)
xgb.importance(model_linear_multi)
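
The per-class calls in the multi-class "gbtree" example pick every num_classes-th tree. As the seq() calls there imply, this treats the trees as interleaved by class, one tree per class per boosting round. The base-R sketch below only prints the one-based index sets those calls select. The name class_trees is a hypothetical helper, not part of the package.

```r
num_classes <- 3L
nrounds <- 5L

# One-based tree indices per class, assuming one tree per class per round.
class_trees <- lapply(
  seq_len(num_classes),
  function(k) seq(from = k, by = num_classes, length.out = nrounds)
)
print(class_trees)

# Together the index sets cover every tree exactly once.
stopifnot(all(sort(unlist(class_trees)) == seq_len(num_classes * nrounds)))
```

With three classes and five rounds, the first class gets trees 1, 4, 7, 10, 13, the second 2, 5, 8, 11, 14, and the third 3, 6, 9, 12, 15.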