Success
Changes

Summary

  1. [SPARK-25461][PYSPARK][SQL] Add document for mismatch between return
  2. [SPARK-25658][SQL][TEST] Refactor HashByteArrayBenchmark to use main
  3. [SPARK-25657][SQL][TEST] Refactor HashBenchmark to use main method
  4. [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change
Commit 3eb842969906d6e81a137af6dc4339881df0a315 by hyukjinkwon
[SPARK-25461][PYSPARK][SQL] Add document for mismatch between return
type of Pandas.Series and return type of pandas udf
## What changes were proposed in this pull request?
For Pandas UDFs, we derive the Arrow type from the UDF's declared
Catalyst return data type and use that Arrow type to serialize the data.
If the declared return data type does not match the actual type of the
pandas.Series returned by the UDF, the Python side may return incorrect
data.
Currently there is no reliable way to check whether the data conversion
is safe, so for now we add documentation to warn users. Once a PyArrow
upgrade that lets us perform this check is available, we should add an
option to enable it.
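To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python (it is not Spark or PyArrow code). It assumes a UDF declared to return 32-bit integers and flags returned values that cannot be converted to that type without loss. `find_unsafe_int32` is a hypothetical helper, not an existing API.

```python
# Bounds of a signed 32-bit integer, standing in for a declared return type.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def find_unsafe_int32(values):
    """Return the values that would be altered by conversion to int32.

    Hypothetical helper for illustration only.
    """
    unsafe = []
    for v in values:
        try:
            as_int = int(v)  # truncates toward zero for floats
        except (ValueError, OverflowError):
            # NaN and infinities have no integer representation.
            unsafe.append(v)
            continue
        # Unsafe if truncation changed the value or it is out of range.
        if as_int != v or not INT32_MIN <= as_int <= INT32_MAX:
            unsafe.append(v)
    return unsafe
```

For example, `find_unsafe_int32([1.0, 2.5, 3.0])` returns `[2.5]`: the value 2.5 would silently become 2 if it were forced into an integer column.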
## How was this patch tested?
Documentation change only.
Closes #22610 from viirya/SPARK-25461.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified: python/pyspark/sql/functions.py
Commit b1328cc58ebb73bc191de5546735cffe0c68255e by dongjoon
[SPARK-25658][SQL][TEST] Refactor HashByteArrayBenchmark to use main method
## What changes were proposed in this pull request?
Refactor `HashByteArrayBenchmark` to use main method.
1. Use `spark-submit`:
```console
bin/spark-submit --class org.apache.spark.sql.HashByteArrayBenchmark \
  --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar \
  ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
```
2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashByteArrayBenchmark"
```
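The pattern behind these two invocations (a benchmark with a main entry point that writes its results to a file only when an environment variable is set) can be sketched as follows. This is an illustration in Python, not Spark's Scala implementation. The file name, the function names, and the exact `== "1"` check are assumptions.

```python
import os
import sys
import timeit
import zlib

# Hypothetical output path, loosely modeled on the *-results.txt files.
RESULTS_FILE = "HashSketchBenchmark-results.txt"


def run_benchmark(out):
    """Time a checksum over a byte string and write a one-line report."""
    payload = bytes(range(256)) * 4
    seconds = timeit.timeit(lambda: zlib.crc32(payload), number=10_000)
    out.write(f"crc32 of {len(payload)} bytes x 10000: {seconds:.6f} s\n")


def main():
    # Write to a results file only when the toggle is set (assumed check);
    # otherwise print the report to stdout.
    if os.environ.get("SPARK_GENERATE_BENCHMARK_FILES") == "1":
        with open(RESULTS_FILE, "w") as f:
            run_benchmark(f)
    else:
        run_benchmark(sys.stdout)


if __name__ == "__main__":
    main()
```

Keeping the measurement in `run_benchmark` and the output decision in `main` means the same code path serves both an ad hoc run and result-file regeneration.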
## How was this patch tested?
manual tests
Closes #22652 from wangyum/SPARK-25658.
Lead-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was added: sql/catalyst/benchmarks/HashByteArrayBenchmark-results.txt
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/HashByteArrayBenchmark.scala
Commit 669ade3a8eed0016b5ece57d776cea0616417088 by dongjoon
[SPARK-25657][SQL][TEST] Refactor HashBenchmark to use main method
## What changes were proposed in this pull request?
Refactor `HashBenchmark` to use main method.
1. Use `spark-submit`:
```console
bin/spark-submit --class org.apache.spark.sql.HashBenchmark \
  --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar \
  ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
```
2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashBenchmark"
```
## How was this patch tested?
manual tests
Closes #22651 from wangyum/SPARK-25657.
Lead-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala
The file was added: sql/catalyst/benchmarks/HashBenchmark-results.txt
Commit ebd899b8a865395e6f1137163cb508086696879b by dongjoon
[SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change
## What changes were proposed in this pull request?
This is the same as #22492, but for the master branch. It reverts
SPARK-14681 to avoid API-breaking changes.
cc: WeichenXu123
## How was this patch tested?
Existing unit tests.
Closes #22618 from mengxr/SPARK-25321.master.
Authored-by: WeichenXu <weichen.xu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala
The file was modified: project/MimaExcludes.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
The file was modified: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala