Success / Changes

Summary

  1. [SPARK-24412][SQL] Adding docs about automagical type casting in `isin` and `isInCollection` APIs (commit: 36a3409134687d6a2894cd6a77554b8439cacec1)
  2. [SPARK-24468][SQL] Handle negative scale when adjusting precision for decimal operations (commit: f07c5064a3967cdddf57c2469635ee50a26d864c)
  3. [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor (commit: 3e5b4ae63a468858ff8b9f7f3231cc877846a0af)
  4. [SPARK-19826][ML][PYTHON] Add spark.ml Python API for PIC (commit: a99d284c16cc4e00ce7c83ecdc3db6facd467552)
  5. [MINOR][CORE] Log committer class used by HadoopMapRedCommitProtocol (commit: 9b6f24202f6f8d9d76bbe53f379743318acb19f9)
  6. [SPARK-24520] Double braces in documentation (commit: 2dc047a3189290411def92f6d7e9a4e01bdb2c30)
  7. [SPARK-24134][DOCS] A missing full stop in the doc "Tuning Spark" (commit: f5af86ea753c446df59a0a8c16c685224690d633)
  8. [SPARK-22144][SQL] ExchangeCoordinator combines the partitions of a 0-sized pre-shuffle to 0 (commit: 048197749ef990e4def1fcbf488f3ded38d95cae)
  9. [SPARK-23732][DOCS] Fix source links in generated scaladoc (commit: dc22465f3e1ef5ad59306b1f591d6fd16d674eb7)
  10. [SPARK-24502][SQL] Flaky test: UnsafeRowSerializerSuite (commit: 01452ea9c75ff027ceeb8314368c6bbedefdb2bf)
  11. docs: fix typo (commit: 1d7db65e968de1c601e7f8b1ec9bc783ef2dbd01)
  12. [SPARK-15064][ML] Locale support in StopWordsRemover (commit: 5d6a53d9831cc1e2115560db5cebe0eea2565dcd)
  13. [SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in HiveExternalCatalogVersionsSuite (commit: 2824f1436bb0371b7216730455f02456ef8479ce)
  14. [SPARK-24416] Fix configuration specification for killBlacklisted executors (commit: 3af1d3e6d95719e15a997877d5ecd3bb40c08b9c)
  15. [SPARK-23931][SQL] Adds arrays_zip function to Spark SQL (commit: f0ef1b311dd5399290ad6abe4ca491bdb13478f0)
  16. Don't use RxJava (commit: 108181d8190b7bc15eb80caeceff2ce8103284a4)
  17. [SPARK-24216][SQL] Spark TypedAggregateExpression uses getSimpleName, which is not safe in Scala (commit: cc88d7fad16e8b5cbf7b6b9bfe412908782b4a45)
Commit 36a3409134687d6a2894cd6a77554b8439cacec1 by d_tsai
[SPARK-24412][SQL] Adding docs about automagical type casting in `isin`
and `isInCollection` APIs
## What changes were proposed in this pull request?
Update the documentation for the `isInCollection` API to clearly explain the "auto-casting" of elements when their types differ.
## How was this patch tested?
No-op; documentation-only change.
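As a hedged illustration of the documented behavior (a REPL-style sketch; the `spark` session and the data are assumptions, not part of this change):

```scala
import spark.implicits._

// The Int elements below are implicitly widened to match the column's Long
// type before comparison; this is the "auto-casting" the docs now call out.
val ds = Seq(1L, 2L, 3L).toDS()
ds.filter($"value".isInCollection(Seq(1, 2))).show() // keeps rows 1 and 2
```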
Author: Thiruvasakan Paramasivan <thiru@apple.com>
Closes #21519 from trvskn/sql-doc-update.
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
Commit f07c5064a3967cdddf57c2469635ee50a26d864c by wenchen
[SPARK-24468][SQL] Handle negative scale when adjusting precision for
decimal operations
## What changes were proposed in this pull request?
In SPARK-22036 we introduced the possibility of allowing precision loss in arithmetic operations (according to the SQL standard). The implementation was drawn from Hive's, where decimals with a negative scale are not allowed in these operations.
This PR handles the case when the scale is negative, removing the assertion that it is not.
## How was this patch tested?
Added unit tests.
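For context, a simplified sketch of the precision/scale adjustment that now tolerates a negative scale (this follows the documented rules from SPARK-22036, not Spark's exact `DecimalType` code; the constants are Spark's documented maximum precision and minimum adjusted scale):

```scala
// Given a result type (precision, scale) computed by the SQL-standard rules,
// cap it at the maximum precision, sacrificing scale (down to a floor) to
// preserve integral digits. A negative scale simply flows through instead of
// tripping an assertion.
def adjustPrecisionScale(precision: Int, scale: Int): (Int, Int) = {
  val MaxPrecision = 38
  val MinAdjustedScale = 6
  if (precision <= MaxPrecision) {
    (precision, scale) // the scale may legitimately be negative here
  } else {
    val intDigits = precision - scale
    val minScale = math.min(scale, MinAdjustedScale)
    (MaxPrecision, math.max(MaxPrecision - intDigits, minScale))
  }
}
```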
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #21499 from mgaido91/SPARK-24468.
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala
The file was modified: sql/core/src/test/resources/sql-tests/results/typeCoercion/native/decimalArithmeticOperations.sql.out
The file was modified: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql
Commit 3e5b4ae63a468858ff8b9f7f3231cc877846a0af by hyukjinkwon
[SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from
driver to executor
## What changes were proposed in this pull request?
SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the user function, but this required a hack to save its argspec. This PR reverts that change and fixes the `StopIteration` bug in the worker instead.
## How does this work?
The root of the problem is that when a user-supplied function raises a `StopIteration`, PySpark may silently stop processing data if that function is used in a for-loop. The solution is to catch `StopIteration` exceptions and re-raise them as `RuntimeError`s, so that the execution fails and the error is reported to the user. This is done with the `fail_on_stopiteration` wrapper, applied differently depending on where the function is used:
- In RDDs, the user function is wrapped in the driver, because this function is also called in the driver itself.
- In SQL UDFs, the function is wrapped in the worker, since all processing happens there. Moreover, the worker needs the signature of the user function, which is lost when wrapping it, and passing this signature to the worker would require an ugly hack (the wrapping idea is sketched below).
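The actual fix is in PySpark's Python worker; as a language-neutral sketch of the wrapping idea (Scala stands in for Python here, and `NoSuchElementException` plays the role of `StopIteration`, so the names and types are illustrative):

```scala
// Wrap a user function so that a control-flow exception leaking out of it
// fails the task loudly instead of silently ending the enclosing loop.
def failOnControlFlowException[A, B](f: A => B): A => B = (a: A) =>
  try f(a)
  catch {
    case e: NoSuchElementException =>
      throw new RuntimeException("Caught control-flow exception from the user function", e)
  }
```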
## How was this patch tested?
Same tests, plus tests for pandas UDFs
Author: edorigatti <emilio.dorigatti@gmail.com>
Closes #21467 from e-dorigatti/fix_udf_hack.
The file was modified: python/pyspark/sql/udf.py
The file was modified: python/pyspark/worker.py
The file was modified: python/pyspark/util.py
The file was modified: python/pyspark/tests.py
The file was modified: python/pyspark/sql/tests.py
Commit a99d284c16cc4e00ce7c83ecdc3db6facd467552 by meng
[SPARK-19826][ML][PYTHON] add spark.ml Python API for PIC
## What changes were proposed in this pull request?
Add a spark.ml Python API for PIC (power iteration clustering).
## How was this patch tested?
Added a doctest.
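A hedged usage sketch against the shape this API took in the Spark 2.4 release (the Python API mirrors the Scala one shown here; the `spark` session and the edge data are assumptions):

```scala
import org.apache.spark.ml.clustering.PowerIterationClustering
import spark.implicits._

// A tiny similarity graph as (src, dst, weight) edges.
val similarities = Seq(
  (0L, 1L, 1.0), (1L, 2L, 1.0), (3L, 4L, 1.0)
).toDF("src", "dst", "weight")

val pic = new PowerIterationClustering()
  .setK(2)
  .setMaxIter(10)
  .setInitMode("degree")
  .setWeightCol("weight")

// Returns a DataFrame of (id, cluster) assignments.
pic.assignClusters(similarities).show()
```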
Author: Huaxin Gao <huaxing@us.ibm.com>
Closes #21513 from huaxingao/spark--19826.
The file was modified: python/pyspark/ml/clustering.py
Commit 9b6f24202f6f8d9d76bbe53f379743318acb19f9 by srowen
[MINOR][CORE] Log committer class used by HadoopMapRedCommitProtocol
## What changes were proposed in this pull request?
When HadoopMapRedCommitProtocol is used (e.g., when using
saveAsTextFile() or saveAsHadoopFile() with RDDs), it's not easy to
determine which output committer class was used, so this PR simply logs
the class that was used, similarly to what is done in
SQLHadoopMapReduceCommitProtocol.
## How was this patch tested?
Built Spark then manually inspected logging when calling
saveAsTextFile():
```scala
scala> sc.setLogLevel("INFO")
scala> sc.textFile("README.md").saveAsTextFile("/tmp/out")
...
18/05/29 10:06:20 INFO HadoopMapRedCommitProtocol: Using output committer class org.apache.hadoop.mapred.FileOutputCommitter
```
Author: Jonathan Kelly <jonathak@amazon.com>
Closes #21452 from ejono/master.
The file was modified: core/src/main/scala/org/apache/spark/internal/io/HadoopMapRedCommitProtocol.scala
Commit 2dc047a3189290411def92f6d7e9a4e01bdb2c30 by srowen
[SPARK-24520] Double braces in documentation
There are double braces in the markdown, which break the link.
Author: Fokko Driesprong <fokkodriesprong@godatadriven.com>
Closes #21528 from Fokko/patch-1.
The file was modified: docs/structured-streaming-programming-guide.md
Commit f5af86ea753c446df59a0a8c16c685224690d633 by srowen
[SPARK-24134][DOCS] A missing full-stop in doc "Tuning Spark".
## What changes were proposed in this pull request?
In the document [Tuning Spark -> Determining Memory Consumption](https://spark.apache.org/docs/latest/tuning.html#determining-memory-consumption), a full stop was missing in the second paragraph: it reads `...use SizeEstimator’s estimate method This is useful for experimenting...`, where there should be a full stop before `This`.
Screenshot (before the change): https://user-images.githubusercontent.com/11539188/39468206-778e3d8a-4d64-11e8-8a92-38464952b54b.png
## How was this patch tested?
This is a simple doc change; only one full stop was added in plain text.
Author: Xiaodong <11539188+XD-DENG@users.noreply.github.com>
Closes #21205 from XD-DENG/patch-1.
The file was modified: docs/tuning.md
Commit 048197749ef990e4def1fcbf488f3ded38d95cae by wenchen
[SPARK-22144][SQL] ExchangeCoordinator combines the partitions of a 0-sized pre-shuffle to 0
## What changes were proposed in this pull request?
When the number of pre-shuffle partitions is 0, the number of post-shuffle partitions should be 0 instead of spark.sql.shuffle.partitions.
## How was this patch tested?
Verified that ExchangeCoordinator converts a pre-shuffle with 0 partitions into a post-shuffle with 0 partitions, instead of one with spark.sql.shuffle.partitions partitions.
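A minimal sketch of the corrected boundary case (a hypothetical helper, not the coordinator's actual code):

```scala
// Before the fix, an empty pre-shuffle still produced the default number of
// post-shuffle partitions; after it, zero input partitions yield zero output
// partitions. (In reality the coordinator merges small partitions here rather
// than returning the default directly.)
def postShufflePartitionCount(preShufflePartitions: Int, defaultNumPartitions: Int): Int =
  if (preShufflePartitions == 0) 0 else defaultNumPartitions
```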
Author: liutang123 <liutang123@yeah.net>
Closes #19364 from liutang123/SPARK-22144.
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala
Commit dc22465f3e1ef5ad59306b1f591d6fd16d674eb7 by hyukjinkwon
[SPARK-23732][DOCS] Fix source links in generated scaladoc.
Apply the suggestion from the bug report to fix source links. Tested with the 2.3.1 release docs.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #21521 from vanzin/SPARK-23732.
The file was modified: project/SparkBuild.scala
Commit 01452ea9c75ff027ceeb8314368c6bbedefdb2bf by wenchen
[SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite
## What changes were proposed in this pull request?
`UnsafeRowSerializerSuite` calls `UnsafeProjection.create`, which accesses `SQLConf.get` while the current active SparkSession may already be stopped, so we may hit an exception like this:
```
sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is stopped.
  at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
  at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
  at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
  at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
  at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
  at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95)
  at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94)
  at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126)
  at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54)
  at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157)
  at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60)
  ...
```
## How was this patch tested?
N/A
Author: Wenchen Fan <wenchen@databricks.com>
Closes #21518 from cloud-fan/test.
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/LocalSparkSession.scala
Commit 1d7db65e968de1c601e7f8b1ec9bc783ef2dbd01 by srowen
docs: fix typo
no => no[t]
## What changes were proposed in this pull request?
Fixing a typo.
## How was this patch tested?
Visual check of the docs.
Author: Tom Saleeba <tom.saleeba@gmail.com>
Closes #21496 from tomsaleeba/patch-1.
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
Commit 5d6a53d9831cc1e2115560db5cebe0eea2565dcd by meng
[SPARK-15064][ML] Locale support in StopWordsRemover
## What changes were proposed in this pull request?
Add locale support for `StopWordsRemover`.
## How was this patch tested?
Scala and Python unit tests.
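A brief usage sketch of the new parameter (column names are hypothetical; the locale tag follows `java.util.Locale.toString` format):

```scala
import org.apache.spark.ml.feature.StopWordsRemover

// The locale controls how words are lower-cased for matching when
// caseSensitive is false, e.g. Turkish dotless-i folding.
val remover = new StopWordsRemover()
  .setInputCol("raw")
  .setOutputCol("filtered")
  .setLocale("tr_TR")
```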
Author: Lee Dongjin <dongjin@apache.org>
Closes #21501 from dongjinleekr/feature/SPARK-15064.
The file was modified: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala
The file was modified: python/pyspark/ml/tests.py
The file was modified: python/pyspark/ml/feature.py
The file was modified: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala
Commit 2824f1436bb0371b7216730455f02456ef8479ce by gatorsmile
[SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in
HiveExternalCatalogVersionsSuite
## What changes were proposed in this pull request?
Removing version 2.2.0 from the testing versions in HiveExternalCatalogVersionsSuite, as it is no longer present in the mirrors, and this is blocking all the open PRs.
## How was this patch tested?
Running the existing unit tests.
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #21540 from mgaido91/SPARK-24531.
The file was modified: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
Commit 3af1d3e6d95719e15a997877d5ecd3bb40c08b9c by irashid
[SPARK-24416] Fix configuration specification for killBlacklisted
executors
## What changes were proposed in this pull request?
spark.blacklist.killBlacklistedExecutors is defined as:
(Experimental) If set to "true", allow Spark to automatically kill, and
attempt to re-create, executors when they are blacklisted. Note that,
when an entire node is added to the blacklist, all of the executors on
that node will be killed.
I presume the killing of blacklisted executors only happens after the stage completes successfully and all tasks have completed, or on fetch failures (updateBlacklistForFetchFailure/updateBlacklistForSuccessfulTaskSet). The definition is confusing because it states that the executor will be recreated as soon as it is blacklisted, which is not true: while a stage is in progress and an executor is blacklisted, no cleanup is attempted until the stage finishes.
Author: Sanket Chintapalli <schintap@yahoo-inc.com>
Closes #21475 from redsanket/SPARK-24416.
The file was modified: docs/configuration.md
Commit f0ef1b311dd5399290ad6abe4ca491bdb13478f0 by ueshin
[SPARK-23931][SQL] Adds arrays_zip function to Spark SQL
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
## What changes were proposed in this pull request?
Addition of the arrays_zip function to Spark SQL functions.
## How was this patch tested?
Unit tests that check that the results are correct.
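A short usage sketch of the new function (the data and column names are assumptions):

```scala
import org.apache.spark.sql.functions.arrays_zip
import spark.implicits._

val df = Seq((Seq(1, 2), Seq("a", "b"))).toDF("nums", "letters")

// Merges the arrays element-wise into an array of structs:
// [{1, "a"}, {2, "b"}]
df.select(arrays_zip($"nums", $"letters").as("zipped")).show(false)
```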
Author: DylanGuedes <djmgguedes@gmail.com>
Closes #21045 from DylanGuedes/SPARK-23931.
The file was modified: python/pyspark/sql/functions.py
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
The file was modified: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
The file was modified: pom.xml
The file was modified: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreSuite.scala
The file was modified: resource-managers/kubernetes/core/pom.xml
The file was modified: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala
Commit cc88d7fad16e8b5cbf7b6b9bfe412908782b4a45 by wenchen
[SPARK-24216][SQL] Spark TypedAggregateExpression uses getSimpleName, which is not safe in Scala
## What changes were proposed in this pull request?
When a user creates an aggregator object in Scala and passes it to Spark Dataset's agg() method, Spark initializes TypedAggregateExpression with the nodeName field set to aggregator.getClass.getSimpleName. However, getSimpleName is not safe in a Scala environment, depending on how the user creates the aggregator object. For example, if the aggregator class's fully qualified name is "com.my.company.MyUtils$myAgg$2$", getSimpleName will throw java.lang.InternalError with "Malformed class name". This has been reported in scalatest (https://github.com/scalatest/scalatest/pull/1044) and discussed in several Scala upstream issues such as SI-8110 and SI-5425.
To fix this issue, we follow the solution in https://github.com/scalatest/scalatest/pull/1044: add a safer version of getSimpleName as a util method, and have TypedAggregateExpression invoke this util method rather than getClass.getSimpleName.
## How was this patch tested?
Added a unit test.
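A hedged sketch of the workaround's shape (a simpler fallback than the util Spark actually ships; the method name is illustrative):

```scala
// getSimpleName can throw java.lang.InternalError("Malformed class name") for
// some nested Scala classes; fall back to stripping the package prefix from
// the full class name instead of crashing.
def safeSimpleName(cls: Class[_]): String =
  try cls.getSimpleName
  catch {
    case _: InternalError =>
      val fullName = cls.getName
      fullName.substring(fullName.lastIndexOf('.') + 1)
  }
```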
Author: Fangshi Li <fli@linkedin.com>
Closes #21276 from fangshil/SPARK-24216.
The file was modified: core/src/main/scala/org/apache/spark/util/Utils.scala
The file was modified: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala
The file was modified: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StringFormat.scala