[BUG] In LightGBM use of improvementTolerance has inverted semantics for lower-is-better metrics (RMSE, MAE, etc.), causing early stopping to never trigger #2565

@sjoshi-antuit

SynapseML version

Reproducible on all versions from 0.9.x through 1.1.3 (latest) — TrainUtils.scala has not changed since Dec 2022

System information

  • Language version (e.g. python 3.8, scala 2.12): scala 2.12, python 3.10
  • Spark Version (e.g. 3.2.3): 3.5.0
  • Spark Platform (e.g. Synapse, Databricks): Databricks (runtime version 14.3 LTS ML)

Describe the problem

In TrainUtils.scala, the getValidEvalResults method decides whether a metric has "improved" by calling a comparison function chosen per metric. For lower-is-better metrics the logic is:

// For lower-is-better metrics (RMSE, MAE, etc.):
val cmp = (x: Double, y: Double, tol: Double) => x - y < tol

if (bestScores == null || cmp(evalScore, bestScore, improvementTolerance)) {
  // Update bestScore / bestIteration  ← counter RESET
} else if (iteration - bestIteration >= earlyStoppingRound) {
  // Trigger early stopping
}

The condition cmp(evalScore, bestScore, tol) evaluates to evalScore - bestScore < tol.

With the default improvementTolerance = 0.0, this correctly means "count a round as an improvement only when evalScore < bestScore". However, any positive value of improvementTolerance resets the counter even when RMSE worsens, as long as it worsens by less than tol. That makes early stopping harder to trigger, which is the opposite of the documented intent.

For example, improvementTolerance = 0.1 means RMSE must worsen by ≥ 0.1 in a single round for the non-improvement counter to increment. Since typical per-iteration RMSE changes are on the order of 0.001, the counter never increments and early stopping never fires regardless of earlyStoppingRound.
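The effect can be checked without Spark. The following is a minimal pure-Python sketch that mirrors the control flow of the snippet quoted above; it is not the SynapseML implementation. The helper name run_early_stopping and the validation curve are hypothetical and exist only for illustration.

```python
def run_early_stopping(scores, tol, early_stopping_round, improved):
    """Return the iteration at which early stopping fires, or None if it never does."""
    best_score = None
    best_iter = 0
    for iteration, score in enumerate(scores):
        if best_score is None or improved(score, best_score, tol):
            best_score, best_iter = score, iteration  # counter reset
        elif iteration - best_iter >= early_stopping_round:
            return iteration  # early stopping fires
    return None

# Comparison quoted in this issue for lower-is-better metrics.
current_cmp = lambda x, y, tol: x - y < tol

# Hypothetical validation RMSE: improves by 0.01 per round for 20 rounds,
# then worsens by 0.001 per round for the remaining 480 rounds.
scores = [1.0 - 0.01 * i for i in range(20)]
scores += [0.81 + 0.001 * i for i in range(1, 481)]

stop_default = run_early_stopping(scores, 0.0, 10, current_cmp)
stop_tol = run_early_stopping(scores, 0.1, 10, current_cmp)
print(stop_default, stop_tol)
```

On this made-up curve, tol = 0.0 stops ten rounds after the best score, while tol = 0.1 runs through every iteration even though the score gets worse on each of the last 480 rounds.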

This is asymmetric with higher-is-better metrics (e.g. AUC), where cmp is correctly x - y > tol: the improvement must exceed tol to reset the counter.
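A small worked comparison makes the asymmetry concrete. The two lambdas below restate the comparisons as quoted in this issue; the score values are made up for illustration.

```python
# Comparisons as quoted in this issue.
higher_is_better = lambda x, y, tol: x - y > tol  # e.g. AUC
lower_is_better = lambda x, y, tol: x - y < tol   # e.g. RMSE

tol = 0.1

# AUC rises from 0.80 to 0.85: a real gain, but smaller than tol.
auc_small_gain = higher_is_better(0.85, 0.80, tol)

# RMSE rises from 0.80 to 0.85: the metric got worse.
rmse_small_loss = lower_is_better(0.85, 0.80, tol)

# RMSE stays at 0.80: no change at all.
rmse_no_change = lower_is_better(0.80, 0.80, tol)

print(auc_small_gain, rmse_small_loss, rmse_no_change)
```

A gain below tol does not count as an improvement for AUC, but a loss below tol (or no change) does count as an improvement for RMSE.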

Expected behavior

For lower-is-better metrics, improvementTolerance = 0.1 should mean: "require a drop of at least 0.1 to count as improvement; if no such drop occurs for earlyStoppingRound rounds, stop."

This would make the semantics symmetric with higher-is-better metrics and match user expectations.
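One possible shape for the symmetric comparison is sketched below in pure Python. This is an assumption about what a fix could look like, not the maintainers' patch: proposed_cmp and run_early_stopping are hypothetical names, and the validation curve is invented. The strict > mirrors the higher-is-better comparison quoted earlier, so tol = 0.0 still means strict improvement.

```python
def run_early_stopping(scores, tol, early_stopping_round, improved):
    """Return the iteration at which early stopping fires, or None if it never does."""
    best_score = None
    best_iter = 0
    for iteration, score in enumerate(scores):
        if best_score is None or improved(score, best_score, tol):
            best_score, best_iter = score, iteration  # counter reset
        elif iteration - best_iter >= early_stopping_round:
            return iteration
    return None

# Comparison quoted in this issue for lower-is-better metrics.
current_cmp = lambda x, y, tol: x - y < tol
# Hypothetical fix: the score must drop by more than tol to count as improvement.
proposed_cmp = lambda x, y, tol: y - x > tol

# Hypothetical validation RMSE: three large drops, then tiny drops of 0.001 per round.
scores = [2.0, 1.5, 1.2, 1.0] + [1.0 - 0.001 * i for i in range(1, 101)]

stop_current = run_early_stopping(scores, 0.1, 10, current_cmp)
stop_proposed = run_early_stopping(scores, 0.1, 10, proposed_cmp)
print(stop_current, stop_proposed)
```

With the proposed comparison, training on this curve stops ten rounds after the last drop larger than 0.1, whereas the current comparison never stops.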

Code to reproduce issue

from synapse.ml.lightgbm import LightGBMRegressor
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Create toy data with isValidation flag
data = [(float(i), float(i % 10), i > 800) for i in range(1000)]
df = spark.createDataFrame(data, ["label", "features_raw", "isValidation"])

from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler(inputCols=["features_raw"], outputCol="features")
df = assembler.transform(df)

regressor = LightGBMRegressor(
    labelCol="label",
    featuresCol="features",
    numIterations=500,
    earlyStoppingRound=10,
    improvementTolerance=0.1,   # ← positive value intended as minimum improvement threshold
    validationIndicatorCol="isValidation",
    isProvideTrainingMetric=True,
    metric="rmse",
    verbosity=1
)

# df already contains both training and validation rows, split by isValidation
model = regressor.fit(df)

# Expected: training stops well before 500 iterations
# Actual:   trains all 500 iterations; model.getBoosterBestIteration() returns -1
print(f"Best iteration: {model.getBoosterBestIteration()}")  # -1 means never triggered

Worker logs will show best: None at completion, confirming early stopping never triggered despite earlyStoppingRound=10.

Other info / logs

No response

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
