Intercept

Added in version 2.0.0.

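Since version 2.0.0, XGBoost supports estimating the model intercept (named base_score) automatically from the targets at training time. This behavior can be controlled by setting base_score to a constant value. The following snippet disables the automatic estimation: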

import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=10)
clf.set_params(base_score=0.5)

library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")

# Set base_score parameter directly
model <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_score = 0.5,
  nrounds = 10
)

Note that the 0.5 here is the value after applying the inverse link function. See the end of this document for a description.

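Besides base_score, users can also provide a global bias through the data field base_margin, which is a vector or a matrix depending on the task. For multi-output and multi-class tasks, base_margin is a matrix of size (n_samples, n_targets) or (n_samples, n_classes).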

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification()

clf = xgb.XGBClassifier()
clf.fit(X, y)
# Request for raw prediction
m = clf.predict(X, output_margin=True)

clf_1 = xgb.XGBClassifier()
# Feed the prediction into the next model
# Using base margin overrides the base score, see the sections below.
clf_1.fit(X, y, base_margin=m)
clf_1.predict(X, base_margin=m)

library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")

# Train first model
model_1 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  nrounds = 10
)

# Request for raw prediction
m <- predict(model_1, agaricus.train$data, type = "raw")

# Feed the prediction into the next model using base_margin
# Using base margin overrides the base score, see the sections below.
model_2 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_margin = m,
  nrounds = 10
)

# Make predictions with base_margin
pred <- predict(model_2, agaricus.train$data, base_margin = m)

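The base_margin specifies a bias for each sample and can be used to stack an XGBoost model on top of other models; see the demo for boosting from prediction for a worked example. When base_margin is specified, it automatically overrides the base_score parameter. If you are stacking XGBoost models, usage should be relatively straightforward: the previous model provides the raw prediction and the new model uses that prediction as its bias. For more customized inputs, users need to take extra care with the link function. Let \(F\) be the model and \(g\) be the link function. Since base_score is overridden when a sample-specific base_margin is available, we omit it here: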

\[g(E[y_i]) = F(x_i)\]

When a base margin \(b\) is provided, it is added to the raw model output \(F\):

\[g(E[y_i]) = F(x_i) + b_i\]

and the output of the final model is:

\[g^{-1}(F(x_i) + b_i)\]

Taking the gamma deviance objective reg:gamma as an example, which has a log link function, we have:

\[\begin{split}\ln{(E[y_i])} = F(x_i) + b_i \\ E[y_i] = \exp{(F(x_i) + b_i)}\end{split}\]

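As a result, if you are feeding in outputs from a model such as a GLM with a corresponding objective function, make sure those outputs have not yet been transformed by the inverse link (activation).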
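To make the warning above concrete, here is a minimal numeric sketch in plain Python of the log-link case. It does not call XGBoost; the function name `predict_gamma` and the numbers for the upstream GLM prediction and the tree output are hypothetical and chosen only for illustration.

```python
import math


def predict_gamma(raw_score: float, base_margin: float) -> float:
    """Apply the inverse log link to F(x_i) + b_i."""
    return math.exp(raw_score + base_margin)


# Hypothetical prediction of an upstream GLM on the response scale.
glm_mean = 3.0
# The same prediction on the link (margin) scale, which is what base_margin expects.
glm_margin = math.log(glm_mean)
# Hypothetical raw output F(x_i) of the boosted trees.
raw_score = 0.25

# Correct: the margin is passed, so the trees rescale the GLM mean by exp(F).
correct = predict_gamma(raw_score, glm_margin)
# Incorrect: the already-transformed mean is passed and gets exponentiated again.
wrong = predict_gamma(raw_score, glm_mean)

print(correct, wrong)
```

With a log link, a correctly supplied margin means the trees act as a multiplicative correction on top of the upstream prediction, which is the behavior one usually wants when stacking.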

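As for base_score (the intercept), it can be accessed through save_config() after estimation. Unlike base_margin, the returned value is the value after applying the inverse link. Taking logistic regression with the logit link function as an example, given a base_score of 0.5, \(g(intercept) = logit(0.5) = 0\) is added to the raw model output: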

\[E[y_i] = g^{-1}{(F(x_i) + g(intercept))}\]

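and 0.5 is the same as \(base\_score = g^{-1}(0) = 0.5\). This is more intuitive if you remove the model and consider only the intercept, which is estimated before the model is fitted: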

\[\begin{split}E[y] = g^{-1}{(g(intercept))} \\ E[y] = intercept\end{split}\]
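The round trip between the probability scale and the margin scale can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names `logit` and `sigmoid` are defined locally here and are not XGBoost functions, and the second value 0.2 is an arbitrary example.

```python
import math


def logit(p: float) -> float:
    """Link function g for binary logistic regression."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse link g^{-1}."""
    return 1.0 / (1.0 + math.exp(-z))


base_score = 0.5
# Value that is actually added to the raw model output.
margin_intercept = logit(base_score)
# With no trees (F(x_i) = 0), the prediction is the intercept itself.
prob_no_model = sigmoid(0.0 + margin_intercept)

# The same round trip for another, arbitrary intercept.
other_score = 0.2
other_roundtrip = sigmoid(logit(other_score))

print(margin_intercept, prob_no_model, other_roundtrip)
```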

For some objectives, such as MAE, a closed-form solution exists, while for others the intercept is estimated with a one-step Newton method.

Offset

The base_margin is a form of offset in GLMs. Taking the Poisson objective as an example, we might want to model the rate instead of the count:

\[rate = \frac{count}{exposure}\]

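The offset is defined as the log link applied to the exposure variable: \(\ln{exposure}\). Let \(c\) be the count and \(\gamma\) be the exposure. Substituting the rate \(c_i / \gamma_i\) for the response \(y_i\) in our earlier formulation of the base margin gives: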

\[g(\frac{E[c_i]}{\gamma_i}) = F(x_i)\]

For Poisson regression, substitute \(\ln\) for \(g\):

\[\ln{\frac{E[c_i]}{\gamma_i}} = F(x_i)\]

We have:

\[\begin{split}E[c_i] &= \exp{(F(x_i) + \ln{\gamma_i})} \\ E[c_i] &= g^{-1}(F(x_i) + g(\gamma_i))\end{split}\]

As you can see, we can use base_margin for offset modeling in the same way as in GLMs.

Example

The following example shows the relationship between base_score and base_margin using binary logistic regression with a logit link function:

import numpy as np
from scipy.special import logit
from sklearn.datasets import make_classification

import xgboost as xgb

X, y = make_classification(random_state=2025)

library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")
X <- agaricus.train$data
y <- agaricus.train$label

The intercept is a valid probability (0.5). It is used as the initial estimate of the probability of obtaining a positive sample.

intercept = 0.5

intercept <- 0.5

First, we train a model using the intercept:

booster = xgb.train(
    {"base_score": intercept, "objective": "binary:logistic"},
    dtrain=xgb.DMatrix(X, y),
    num_boost_round=1,
)
predt_0 = booster.predict(xgb.DMatrix(X, y))

# First model with base_score
model_0 <- xgboost(
  x = X, y = factor(y),
  base_score = intercept,
  objective = "binary:logistic",
  nrounds = 1
)
predt_0 <- predict(model_0, X)

Apply logit() to obtain the "margin":

# Apply logit function to obtain the "margin"
margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
Xy = xgb.DMatrix(X, y, base_margin=margin)
# Second model with base_margin
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
booster = xgb.train(
    {"base_score": 0.2, "objective": "binary:logistic"},
    dtrain=Xy,
    num_boost_round=1,
)
predt_1 = booster.predict(Xy)

# Apply logit function to obtain the "margin"
logit_intercept <- log(intercept / (1 - intercept))
margin <- rep(logit_intercept, length(y))
# Second model with base_margin
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`
model_1 <- xgboost(
  x = X, y = factor(y),
  base_margin = margin,
  base_score = 0.2,
  objective = "binary:logistic",
  nrounds = 1
)
predt_1 <- predict(model_1, X, base_margin = margin)

Compare the results:

np.testing.assert_allclose(predt_0, predt_1)

all.equal(predt_0, predt_1, tolerance = 1e-6)