Callback for collecting coefficients history of a gblinear booster
Value
An xgb.Callback object, which can be passed to xgb.train() or xgb.cv().
Details
To keep things fast and simple, the gblinear booster does not internally store the history of linear model coefficients at each boosting iteration. This callback provides a workaround for storing the coefficients' path, by extracting them after each training iteration.
This callback will construct a matrix where rows are boosting iterations and columns are feature coefficients (same order as when calling coef.xgb.Booster, with the intercept corresponding to the first column).
When there is more than one coefficient per feature (e.g. multi-class classification), the result will be reshaped into a vector where coefficients are arranged first by features and then by class (e.g. coefficients 1 through N will belong to the first class, coefficients N+1 through 2N to the second class, and so on).
If the result has only one coefficient per feature, the resulting matrix will have column names matching the feature names; otherwise (when there is more than one coefficient per feature), the names will be composed as 'column name' + ':' + 'class index' (so e.g. column 'c1' for class '0' will be named 'c1:0').
With xgb.train(), the output is either a dense or a sparse matrix. With xgb.cv(), it is a list of such matrices (one element per fold).
Function xgb.gblinear.history provides an easy way to retrieve the outputs from this callback.
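As a quick illustration of the layout described above, here is a minimal sketch (it assumes only the built-in iris dataset; names such as coef_hist are illustrative) that trains a small multi-class gblinear model with the callback attached and then inspects the dimensions and 'feature:class' column names of the history matrix:
library(xgboost)
# Small multi-class gblinear model with the history callback attached
x <- scale(model.matrix(Species ~ . - 1, iris))
dtrain <- xgb.DMatrix(x, label = as.numeric(iris$Species) - 1, nthread = 1)
bst <- xgb.train(
  xgb.params(booster = "gblinear", objective = "multi:softprob", num_class = 3, nthread = 1),
  dtrain,
  nrounds = 10,
  callbacks = list(xgb.cb.gblinear.history())
)
coef_hist <- xgb.gblinear.history(bst)
dim(coef_hist)       # rows: boosting iterations; columns: one per coefficient (intercept + features, repeated per class)
head(colnames(coef_hist))  # names combine feature name and class index, e.g. "Sepal.Length:0"
# Restricting to a single class returns only that class's coefficient paths:
dim(xgb.gblinear.history(bst, class_index = 0))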
Examples
#### Binary classification:
## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)
# In the iris dataset, it is hard to linearly separate Versicolor class from the rest
# without considering the 2nd order interactions:
x <- model.matrix(Species ~ .^2, iris)[, -1]
colnames(x)
dtrain <- xgb.DMatrix(
scale(x),
label = 1 * (iris$Species == "versicolor"),
nthread = nthread
)
param <- xgb.params(
booster = "gblinear",
objective = "reg:logistic",
eval_metric = "auc",
reg_lambda = 0.0003,
reg_alpha = 0.0003,
nthread = nthread
)
# For 'shotgun', which is the default linear updater, using high learning_rate values may result in
# unstable behaviour in some datasets. With this simple dataset, however, the high learning
# rate does not break the convergence, but allows us to illustrate the typical pattern of
# "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
bst <- xgb.train(
c(param, list(learning_rate = 1.)),
dtrain,
evals = list(tr = dtrain),
nrounds = 200,
callbacks = list(xgb.cb.gblinear.history())
)
# Extract the coefficients' path and plot them vs boosting iteration number:
coef_path <- xgb.gblinear.history(bst)
matplot(coef_path, type = "l")
# With the deterministic coordinate descent updater, it is safer to use higher learning rates.
# Will try the classical componentwise boosting which selects a single best feature per round:
bst <- xgb.train(
c(
param,
xgb.params(
learning_rate = 0.8,
updater = "coord_descent",
feature_selector = "thrifty",
top_k = 1
)
),
dtrain,
evals = list(tr = dtrain),
nrounds = 200,
callbacks = list(xgb.cb.gblinear.history())
)
matplot(xgb.gblinear.history(bst), type = "l")
# Componentwise boosting is known to have similar effect to Lasso regularization.
# Try experimenting with various values of top_k, learning_rate, nrounds,
# as well as different feature_selectors.
# For xgb.cv:
bst <- xgb.cv(
c(
param,
xgb.params(
learning_rate = 0.8,
updater = "coord_descent",
feature_selector = "thrifty",
top_k = 1
)
),
dtrain,
nfold = 5,
nrounds = 100,
callbacks = list(xgb.cb.gblinear.history())
)
# coefficients in the CV fold #3
matplot(xgb.gblinear.history(bst)[[3]], type = "l")
#### Multiclass classification:
dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1, nthread = nthread)
param <- xgb.params(
booster = "gblinear",
objective = "multi:softprob",
num_class = 3,
reg_lambda = 0.0003,
reg_alpha = 0.0003,
nthread = nthread
)
# For the default linear updater 'shotgun', it is sometimes helpful
# to use a smaller learning_rate to reduce instability
bst <- xgb.train(
c(param, list(learning_rate = 0.5)),
dtrain,
evals = list(tr = dtrain),
nrounds = 50,
callbacks = list(xgb.cb.gblinear.history())
)
# Will plot the coefficient paths separately for each class:
matplot(xgb.gblinear.history(bst, class_index = 0), type = "l")
matplot(xgb.gblinear.history(bst, class_index = 1), type = "l")
matplot(xgb.gblinear.history(bst, class_index = 2), type = "l")
# CV:
bst <- xgb.cv(
c(param, list(learning_rate = 0.5)),
dtrain,
nfold = 5,
nrounds = 70,
callbacks = list(xgb.cb.gblinear.history(FALSE))
)
# 1st fold of 1st class
matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = "l")