[P] ibu-boost: a GBDT library where splits are *absolutely* rejected, not just relatively ranked
I built a small gradient-boosted tree library based on the screening transform from "Screening Is Enough" (Nakanishi 2026, arXiv:2604.01178). The paper was originally written for Transformers, but the core idea — replacing relative comparison with absolute-threshold rejection — maps naturally onto GBDT split selection.
Disclaimer: I'm not affiliated with the paper's author. This is an independent implementation that applies the screening idea to GBDTs.
The idea in one paragraph
Every GBDT implementation picks the split with the highest gain among all candidates. This means the tree always splits, even if the best candidate is nearly useless. min_gain_to_split is the standard workaround, but it's an arbitrary hyperparameter that needs tuning per dataset.
ibu-boost replaces this with a screening transform:
```
raw_gain  = G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - G_total^2/(H_total+λ)
norm_gain = raw_gain / H_total       # N-invariant, O(1) regardless of dataset size
s = 1 - exp(-norm_gain / τ)          # bounded similarity in [0, 1)
ρ = max(1 - r*(1-s), 0)^2            # Trim-and-Square
```
If max(ρ) == 0 across all (feature, bin) candidates, the node becomes a leaf automatically — no split is issued. There is no min_gain_to_split to tune.
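The transform above can be sketched in a few lines of NumPy. This is an illustrative reference, not the library's internal code; the τ and r values are placeholders, not tuned defaults:

```python
import numpy as np

def screening_scores(raw_gain, H_total, tau=1.0, r=4.0):
    """Map raw split gains to acceptance scores rho in [0, 1].

    raw_gain : per-candidate gains, one per (feature, bin) pair
    H_total  : sum of Hessians at the node (makes the gain N-invariant)
    tau, r   : temperature / acceptance width -- illustrative values, not tuned
    """
    norm_gain = raw_gain / H_total                     # N-invariant gain
    s = 1.0 - np.exp(-norm_gain / tau)                 # bounded similarity in [0, 1)
    return np.maximum(1.0 - r * (1.0 - s), 0.0) ** 2   # Trim-and-Square

# Two weak candidates and one strong one: the weak gains are *absolutely*
# rejected (rho == 0); if every candidate were weak, the node would become a leaf.
rho = screening_scores(np.array([0.5, 5.0, 200.0]), H_total=100.0)
print(rho)  # first two entries are exactly 0.0, last is positive
```

Note that the rejection is a hard zero, not a small number: any candidate with s below the trim point 1 - 1/r contributes exactly ρ = 0, which is what makes the "all candidates rejected → leaf" rule well defined.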
The threshold behaviour is controlled by s_w (the temperature τ above) and s_r (the acceptance width r), both stored in log-space; they are fixed scalars for now and will become learnable in a future release.
What's implemented
- Two tree types: non-oblivious (standard per-node splits) and oblivious (CatBoost-style symmetric splits — all nodes at the same depth share one split)
- Gradient boosting with MSE regression and binary log-loss
- Missing value handling: XGBoost-style learned default direction per split
- Triton GPU kernels: fused histogram scatter + screening transform, batched multi-node dispatch, full on-device gradient normalisation
- ScreeningDiagnostics: accept_rate per round — a built-in health check for over/under-rejection
- ScreeningParamSearch: K-fold grid search over (s_w, s_r)
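The XGBoost-style missing-value handling in the list above can be sketched in NumPy. This is a hypothetical helper for illustration, not ibu-boost's internal API: evaluate the candidate split twice, once with all missing samples routed left and once routed right, and keep the direction with the higher gain.

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=1.0):
    """Standard second-order gain for a (left, right) partition."""
    G, H = G_L + G_R, H_L + H_R
    return G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - G**2 / (H + lam)

def learn_default_direction(g, h, go_left, missing):
    """Hypothetical sketch of XGBoost's learned default direction:
    route the missing-value samples left or right, whichever gains more.

    g, h     : per-sample gradients / Hessians
    go_left  : bool mask, True for non-missing samples that satisfy the split
    missing  : bool mask, True for samples whose feature value is missing
    """
    known = ~missing
    G_L = g[known & go_left].sum();  H_L = h[known & go_left].sum()
    G_R = g[known & ~go_left].sum(); H_R = h[known & ~go_left].sum()
    G_m, H_m = g[missing].sum(), h[missing].sum()
    gain_left  = split_gain(G_L + G_m, H_L + H_m, G_R, H_R)
    gain_right = split_gain(G_L, H_L, G_R + G_m, H_R + H_m)
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)

# Example: the missing sample carries positive gradient, so routing it left
# (with the other positive-gradient samples) yields the higher gain.
g = np.array([1.0, 1.0, -1.0, -1.0, 1.0]); h = np.ones(5)
go_left = np.array([True, True, False, False, False])
missing = np.array([False, False, False, False, True])
print(learn_default_direction(g, h, go_left, missing))  # direction is 'left'
```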
Benchmark (California Housing, 100 rounds, oblivious tree)
| Model | RMSE | Train time |
|---|---|---|
| LightGBM (default) | 0.4711 ± 0.0042 | — |
| ibu-boost (CPU) | 0.5286 ± 0.0039 | 5.34 s |
| ibu-boost (RTX 4060 Ti) | 0.5286 ± 0.0039 | 1.70 s (3.15x) |
Gap to LightGBM is ~12% RMSE. Honest take: this is an early alpha. Part of the gap comes from s_w/s_r being fixed scalars — once they become learnable (Phase 2), the threshold should adapt per dataset. But I also suspect the gap will persist on small, clean datasets like California Housing where over-splitting isn't a real problem. The hypothesis is that absolute rejection pays off more on high-dimensional or noisy data where standard GBDTs tend to overfit via spurious splits. I haven't tested this rigorously yet — if you have a go-to tabular benchmark suite, I'd love to hear about it.
Kernel-level speedup (N=65536, F=8, B=255): 51x over NumPy reference.
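For context on what that kernel computes, here is a (deliberately unfused) NumPy reference for the histogram scatter, where `np.add.at` plays the role of the sample-parallel `atomic_add` on the GPU. Shapes and names are illustrative, not the library's actual kernel signature:

```python
import numpy as np

def histogram_scatter(bins, g, h, n_bins=255):
    """NumPy reference for the gradient-histogram scatter.

    bins : (N, F) int array of per-sample, per-feature bin indices
    g, h : (N,) gradient / Hessian vectors
    Returns (F, n_bins) histograms of summed gradients and Hessians.
    """
    N, F = bins.shape
    G = np.zeros((F, n_bins))
    H = np.zeros((F, n_bins))
    for f in range(F):
        np.add.at(G[f], bins[:, f], g)  # unbuffered scatter-add, like atomic_add
        np.add.at(H[f], bins[:, f], h)
    return G, H
```

Unlike floating-point `atomic_add` on the GPU, `np.add.at` accumulates in a fixed order, which is why a NumPy reference like this is useful as a determinism baseline when testing the Triton kernel.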
Install
```
pip install ibu-boost              # NumPy reference only
pip install "ibu-boost[triton]"    # + Triton GPU kernels (Linux / Windows CUDA)
```
Quick start
```python
from ibu_boost import ScreeningBooster

model = ScreeningBooster(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    tree_type="oblivious",  # CatBoost-style symmetric splits
    device="cuda",          # requires [triton] extra
)
model.fit(X_train, y_train)
print(f"Accept rate: {model.mean_accept_rate():.1%}")  # screening health check
```
What I'd like feedback on
- Screening calibration: Does the absolute-rejection idea feel useful in practice, or does it just move the tuning problem from min_gain_to_split to (s_w, s_r)?
- Benchmark suggestions: Which tabular datasets or benchmark suites would best stress-test the "auto-stop on noise" property?
- Triton kernel design: The histogram scatter uses sample-parallel atomic_add, which is non-deterministic. Any tips on deterministic alternatives that don't kill throughput?
Happy to discuss the theory or implementation details.