SuccessChanges

Summary

  1. [SPARK-27626][K8S] Fix `docker-image-tool.sh` to be robust in non-bash (commit: 6071653404d62a9114203c517c3847541980e2a2) (details)
  2. [SPARK-27621][ML] Linear Regression - validate training related params (commit: 52daf4998b2f2ef29f6185ca39e97faad9e9402e) (details)
Commit 6071653404d62a9114203c517c3847541980e2a2 by dhyun
[SPARK-27626][K8S] Fix `docker-image-tool.sh` to be robust in non-bash shell env
Although we use the shebang `#!/usr/bin/env bash`, `minikube docker-env` only
recognizes the user's default shell, so in a non-bash environment it returns
commands that are invalid in bash and the `eval` fails. We had better add the
`--shell bash` option explicitly in our `bash` script.
```bash
$ bash -c 'eval $(minikube docker-env)'
bash: line 0: set: -g: invalid option
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
bash: line 0: set: -g: invalid option
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
bash: line 0: set: -g: invalid option
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
bash: line 0: set: -g: invalid option
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]

$ bash -c 'eval $(minikube docker-env --shell bash)'
```
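For illustration, a hedged sketch (not the actual diff to `bin/docker-image-tool.sh`) of the kind of call the fix describes: requesting bash-syntax output from minikube explicitly, so the surrounding `eval` works regardless of the user's login shell.
```bash
#!/usr/bin/env bash
# Hedged sketch only, not the actual docker-image-tool.sh code.
if ! which minikube 1>/dev/null; then
  echo "Cannot find minikube." 1>&2
  exit 1
fi
# Ask minikube explicitly for bash-compatible environment commands; without
# --shell bash it emits commands for the user's default shell, which may not
# be valid bash. Subsequent docker commands then target minikube's daemon.
eval "$(minikube docker-env --shell bash)"
```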
Manual. Run the script in a non-bash shell environment:
```
bin/docker-image-tool.sh -m -t testing build
```
Closes #24517 from dongjoon-hyun/SPARK-27626.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 6c2d351f5466d42c4d227f5627bd3709c266b5ce)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6071653404d62a9114203c517c3847541980e2a2)
The file was modified bin/docker-image-tool.sh (diff)
Commit 52daf4998b2f2ef29f6185ca39e97faad9e9402e by sean.owen
[SPARK-27621][ML] Linear Regression - validate training related params such as loss only during fitting phase
## What changes were proposed in this pull request?
When the transform(...) method is called on a LinearRegressionModel created
directly with coefficients and an intercept, the following exception is
encountered.
```
java.util.NoSuchElementException: Failed to find a default value for loss
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
  at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.$(params.scala:786)
  at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
  at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
  at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
```
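For illustration, a minimal repro sketch of that scenario (hypothetical code, not the PR's unit test; it is placed in the `org.apache.spark.ml` package only so the `private[ml]` constructor is reachable, and it assumes the three-argument `(uid, coefficients, intercept)` constructor):
```scala
// Hypothetical repro sketch for SPARK-27621, not part of the actual change.
package org.apache.spark.ml

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.sql.SparkSession

object Spark27621Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("SPARK-27621").getOrCreate()
    import spark.implicits._

    // Build a model directly from coefficients and an intercept, without fitting,
    // so training-only params such as `loss` are never set on it.
    val model = new LinearRegressionModel("lr", Vectors.dense(1.0, 2.0), 0.5)

    val df = Seq(Tuple1(Vectors.dense(3.0, 4.0))).toDF("features")

    // Before this change, the call failed in validateAndTransformSchema() with
    // "Failed to find a default value for loss"; after it, scoring succeeds.
    model.transform(df).show()

    spark.stop()
  }
}
```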
This is because validateAndTransformSchema() is called during both the training
and scoring phases, but the checks against training-related params such as loss
should really be performed only during the training phase, I think; please
correct me if I'm missing anything :)
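Roughly, the shape of the change (a simplified, hypothetical sketch, not the actual `LinearRegression.scala` diff) is to gate those checks on the `fitting` flag that the schema-validation hook already receives:
```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Simplified, hypothetical sketch of the idea; the real change lives in
// LinearRegressionParams.validateAndTransformSchema.
trait TrainingAwareValidation {
  protected def validateAndTransformSchema(
      schema: StructType,
      fitting: Boolean,
      featuresDataType: DataType): StructType = {
    if (fitting) {
      // Only at training time: validate training-only params such as `loss`.
      // A model built directly from coefficients may not have them set, so
      // they must not be read on the scoring path.
    }
    schema
  }
}
```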
This issue was first reported for mleap
(https://github.com/combust/mleap/issues/455) because, when we serialize Spark
transformers for mleap, we only serialize the params that are relevant for
scoring. We do have the option to deserialize the serialized transformers back
into Spark for scoring again, but in that case we no longer have all the
training params.
## How was this patch tested?
Added a unit test to check this scenario. Please let me know if anything
additional is required; this is the first PR that I've raised in this project.
Closes #24509 from ancasarb/linear_regression_params_fix.
Authored-by: asarb <asarb@expedia.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 4241a72c654f13b6b4ceafb27daceb7bb553add6)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(commit: 52daf4998b2f2ef29f6185ca39e97faad9e9402e)
The file was modified mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala (diff)