SynapseML version
Reproducible on all versions from 0.9.x through 1.1.3 (latest) — TrainUtils.scala has not changed since Dec 2022
System information
- Language version (e.g. python 3.8, scala 2.12): scala 2.12, python 3.10
- Spark Version (e.g. 3.2.3): 3.5.0
- Spark Platform (e.g. Synapse, Databricks): Databricks (runtime version 14.3 LTS ML)
Describe the problem
In TrainUtils.scala, the getValidEvalResults method decides whether a metric has "improved" with a comparison function chosen by metric type:
// For lower-is-better metrics (RMSE, MAE, etc.):
val cmp = (x: Double, y: Double, tol: Double) => x - y < tol
if (bestScores == null || cmp(evalScore, bestScore, improvementTolerance)) {
// Update bestScore / bestIteration ← counter RESET
} else if (iteration - bestIteration >= earlyStoppingRound) {
// Trigger early stopping
}
The condition cmp(evalScore, bestScore, tol) evaluates to evalScore - bestScore < tol.
With the default improvementTolerance = 0.0, this correctly resets the counter only on strict improvement (evalScore < bestScore). However, any positive improvementTolerance also resets the counter when RMSE worsens by less than tol, which makes early stopping harder to trigger. That is the opposite of the documented intent.
For example, with improvementTolerance = 0.1, RMSE must worsen by ≥ 0.1 in a single round before the non-improvement counter starts to increment. Because the branch above overwrites bestScore on every reset, each round is effectively compared with the previous one, not with the true best. Since typical per-iteration RMSE changes are on the order of 0.001, the counter never increments and early stopping never fires, regardless of earlyStoppingRound.
This is asymmetric with higher-is-better metrics (AUC), where the cmp is correctly x - y > tol: the improvement must exceed tol to reset the counter.
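The behaviour described above can be reproduced without Spark. The following is a minimal Python model of the quoted branch, not the SynapseML code itself; the function name run_early_stopping and the synthetic RMSE sequence are assumptions made for illustration.

```python
def run_early_stopping(scores, tol, early_stopping_round):
    """Model of the quoted loop for a lower-is-better metric.

    Returns the 0-based iteration at which stopping fires, or None
    if it never fires. This is an illustrative sketch only.
    """
    best_score = None
    best_iteration = 0
    for iteration, score in enumerate(scores):
        # Current comparison: treated as "improved" when score - best < tol
        if best_score is None or score - best_score < tol:
            best_score = score          # overwritten even if the score got worse
            best_iteration = iteration  # this is the counter reset
        elif iteration - best_iteration >= early_stopping_round:
            return iteration
    return None


# Hypothetical validation RMSE that gets worse by 0.001 every round
scores = [1.0 + 0.001 * i for i in range(500)]

stop_default = run_early_stopping(scores, tol=0.0, early_stopping_round=10)
stop_with_tol = run_early_stopping(scores, tol=0.1, early_stopping_round=10)
print(stop_default, stop_with_tol)
```

With tol=0.0 the model stops after 10 rounds without improvement. With tol=0.1 every round resets the counter even though the metric degrades on each iteration, so the loop runs through all 500 rounds.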
For lower-is-better metrics, improvementTolerance = 0.1 should mean: "require a drop of at least 0.1 to count as improvement; if no such drop occurs for earlyStoppingRound rounds, stop."
This would make the semantics symmetric with higher-is-better metrics and match user expectations.
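One way to express the expected semantics is sketched below in Python. It is a suggestion under stated assumptions, not a patch against TrainUtils.scala: the names improved and stop_iteration are hypothetical, and the strict > is chosen only to mirror the higher-is-better branch quoted above.

```python
def improved(score, best, tol, higher_is_better=False):
    """Symmetric rule: the gain over the best score must exceed tol."""
    gain = score - best if higher_is_better else best - score
    return gain > tol


def stop_iteration(scores, tol, early_stopping_round, higher_is_better=False):
    """Return the 0-based iteration at which stopping fires, or None."""
    best_score, best_iteration = None, 0
    for iteration, score in enumerate(scores):
        if best_score is None or improved(score, best_score, tol, higher_is_better):
            best_score, best_iteration = score, iteration
        elif iteration - best_iteration >= early_stopping_round:
            return iteration
    return None


# Hypothetical RMSE curves: one improving by 0.001 per round, one by 0.2 per round
slow = [1.0 - 0.001 * i for i in range(500)]
fast = [100.0 - 0.2 * i for i in range(50)]

print(stop_iteration(slow, tol=0.1, early_stopping_round=10))
print(stop_iteration(fast, tol=0.1, early_stopping_round=10))
```

Under this rule, a curve that improves by less than tol per round stops after earlyStoppingRound rounds, while a curve that keeps dropping by more than tol continues training. With tol=0.0 it reduces to the existing strict-improvement behaviour.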
Code to reproduce issue
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from synapse.ml.lightgbm import LightGBMRegressor

spark = SparkSession.builder.getOrCreate()

# Create toy data with isValidation flag
data = [(float(i), float(i % 10), i > 800) for i in range(1000)]
df = spark.createDataFrame(data, ["label", "features_raw", "isValidation"])

# Assemble the single raw feature into the vector column LightGBM expects
assembler = VectorAssembler(inputCols=["features_raw"], outputCol="features")
df = assembler.transform(df)

regressor = LightGBMRegressor(
    labelCol="label",
    featuresCol="features",
    numIterations=500,
    earlyStoppingRound=10,
    improvementTolerance=0.1,  # ← positive value intended as minimum improvement threshold
    validationIndicatorCol="isValidation",
    isProvideTrainingMetric=True,
    metric="rmse",
    verbosity=1,
)

# Training rows first, validation rows last
model = regressor.fit(df.filter(~F.col("isValidation")).union(df.filter(F.col("isValidation"))))

# Expected: training stops well before 500 iterations
# Actual: trains all 500 iterations; model.getBoosterBestIteration() returns -1
print(f"Best iteration: {model.getBoosterBestIteration()}")  # -1 means never triggered
Worker logs will show best: None at completion, confirming early stopping never triggered despite earlyStoppingRound=10.
Other info / logs
No response