Failed

Changes

Summary

  1. Spelling r common dev mlib external project streaming resource managers (commit: 13fd272) (details)
  2. [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin (commit: cf98a76) (details)
  3. [SPARK-33580][CORE] resolveDependencyPaths should use classifier (commit: 3650a6b) (details)
  4. [MINOR][SQL] Remove `getTables()` from `r.SQLUtils` (commit: bfe9380) (details)
  5. [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite (commit: ba178f8) (details)
  6. [SPARK-33590][DOCS][SQL] Add missing sub-bullets in Spark SQL Guide (commit: b94ff1e) (details)
  7. [SPARK-33587][CORE] Kill the executor on nested fatal errors (commit: c8286ec) (details)
  8. [SPARK-33588][SQL] Respect the `spark.sql.caseSensitive` config while (commit: 0054fc9) (details)
  9. [SPARK-33585][SQL][DOCS] Fix the comment for `SQLContext.tables()` and (commit: a088a80) (details)
Commit 13fd272cd353c8aa40a6030c4c847c2e2f632f68 by srowen
Spelling r common dev mlib external project streaming resource managers python

### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `R`
* `common`
* `dev`
* `mllib`
* `external`
* `project`
* `streaming`
* `resource-managers`
* `python`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at https://github.com/jsoref/spark/commit/706a726f87a0bbf5e31467fae9015218773db85b#commitcomment-44064356

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc.

### How was this patch tested?

No testing was performed

Closes #30402 from jsoref/spelling-R_common_dev_mlib_external_project_streaming_resource-managers_python.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 13fd272)
The file was modified common/network-common/src/test/java/org/apache/spark/network/crypto/AuthEngineSuite.java (diff)
The file was modified python/pyspark/mllib/evaluation.py (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtilsSuite.scala (diff)
The file was modified dev/run-tests.py (diff)
The file was modified R/pkg/R/mllib_fpm.R (diff)
The file was modified R/pkg/R/streaming.R (diff)
The file was modified python/pyspark/sql/pandas/_typing/protocols/series.pyi (diff)
The file was modified R/pkg/inst/worker/worker.R (diff)
The file was modified python/pyspark/mllib/regression.py (diff)
The file was modified R/pkg/R/mllib_tree.R (diff)
The file was modified python/pyspark/java_gateway.py (diff)
The file was modified python/docs/source/_static/css/pyspark.css (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified dev/create-release/known_translations (diff)
The file was modified dev/appveyor-guide.md (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala (diff)
The file was modified python/pyspark/shuffle.py (diff)
The file was modified python/pyspark/rdd.py (diff)
The file was modified resource-managers/yarn/src/test/java/org/apache/hadoop/net/ServerSocketUtil.java (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified R/pkg/R/context.R (diff)
The file was modified R/pkg/tests/fulltests/test_utils.R (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/protocol/MessageWithHeaderSuite.java (diff)
The file was modified python/pyspark/sql/pandas/_typing/protocols/frame.pyi (diff)
The file was modified R/pkg/R/utils.R (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala (diff)
The file was modified R/pkg/R/RDD.R (diff)
The file was modified python/pyspark/streaming/context.py (diff)
The file was modified R/pkg/R/install.R (diff)
The file was modified python/pyspark/tests/test_context.py (diff)
The file was modified dev/create-release/releaseutils.py (diff)
The file was modified python/pyspark/mllib/stat/_statistics.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was modified python/docs/source/index.rst (diff)
The file was modified dev/create-release/release-build.sh (diff)
The file was modified python/pyspark/sql/pandas/functions.py (diff)
The file was modified python/pyspark/mllib/clustering.py (diff)
The file was modified R/pkg/R/column.R (diff)
The file was modified python/docs/source/development/testing.rst (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala (diff)
The file was modified R/pkg/tests/fulltests/test_jvm_api.R (diff)
The file was modified R/pkg/R/deserialize.R (diff)
The file was modified python/docs/source/getting_started/quickstart.ipynb (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/server/OneForOneStreamManagerSuite.java (diff)
The file was modified python/pyspark/__init__.pyi (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java (diff)
The file was modified python/pyspark/cloudpickle/cloudpickle_fast.py (diff)
The file was modified project/MimaExcludes.scala (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified R/pkg/R/SQLContext.R (diff)
The file was modified python/pyspark/sql/tests/test_pandas_grouped_map.py (diff)
The file was modified R/pkg/vignettes/sparkr-vignettes.Rmd (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/util/TransportFrameDecoderSuite.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java (diff)
The file was modified dev/github_jira_sync.py (diff)
The file was modified R/pkg/inst/worker/daemon.R (diff)
The file was modified common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java (diff)
The file was modified python/docs/source/_templates/autosummary/class.rst (diff)
The file was modified dev/run-tests-jenkins.py (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala (diff)
The file was modified python/docs/source/getting_started/install.rst (diff)
The file was modified python/pyspark/ml/regression.pyi (diff)
The file was modified python/pyspark/ml/regression.py (diff)
The file was modified dev/tests/pr_merge_ability.sh (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala (diff)
The file was modified common/unsafe/src/test/scala/org/apache/spark/unsafe/types/UTF8StringPropertyCheckSuite.scala (diff)
The file was modified R/pkg/tests/fulltests/test_Serde.R (diff)
The file was modified python/pyspark/ml/tests/test_image.py (diff)
The file was modified R/CRAN_RELEASE.md (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified python/pyspark/context.py (diff)
The file was modified R/pkg/R/types.R (diff)
The file was modified python/pyspark/sql/tests/test_udf.py (diff)
The file was modified python/pyspark/ml/tests/test_algorithms.py (diff)
The file was modified python/pyspark/sql/utils.py (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala (diff)
The file was modified python/pyspark/sql/column.py (diff)
The file was modified R/install-dev.bat (diff)
The file was modified R/pkg/R/functions.R (diff)
The file was modified streaming/src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java (diff)
The file was modified R/pkg/R/pairRDD.R (diff)
The file was modified python/pyspark/worker.py (diff)
The file was modified R/pkg/R/DataFrame.R (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala (diff)
The file was modified python/pyspark/mllib/tests/test_streaming_algorithms.py (diff)
The file was modified python/pyspark/resource/requests.py (diff)
The file was modified python/pyspark/cloudpickle/cloudpickle.py (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/rdd/MapWithStateRDDSuite.scala (diff)
The file was modified R/pkg/R/WindowSpec.R (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/SimpleDownloadFile.java (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/MapWithStateSuite.scala (diff)
The file was modified dev/create-release/translate-contributors.py (diff)
The file was modified python/docs/source/development/debugging.rst (diff)
The file was modified R/pkg/R/mllib_utils.R (diff)
The file was modified python/test_support/userlibrary.py (diff)
The file was modified dev/tests/pr_public_classes.sh (diff)
The file was modified python/pyspark/ml/feature.py (diff)
Commit cf98a761de677c733f3c33230e1c63ddb785d5c5 by yamamuro
[SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationSuite

### What changes were proposed in this pull request?

This PR changes `mariadb_docker_entrypoint.sh` to set the proper version of `mariadb-plugin-gssapi-server` automatically, based on the version of `mariadb-server`.
Also, this PR enables using an arbitrary docker image by setting the environment variable `MARIADB_DOCKER_IMAGE_NAME`.
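The version-matching idea, pinning the plugin to whatever version apt resolves for `mariadb-server` instead of hardcoding it, can be sketched as follows. This is a hypothetical Python rendering for illustration only; the real logic lives in the `mariadb_docker_entrypoint.sh` shell script and its exact commands may differ:

```python
def candidate_version(apt_cache_policy_output: str) -> str:
    """Extract the 'Candidate:' version from `apt-cache policy` output."""
    for line in apt_cache_policy_output.splitlines():
        line = line.strip()
        if line.startswith("Candidate:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Candidate version found")

# Sample output of `apt-cache policy mariadb-server` inside the container.
policy = """mariadb-server:
  Installed: (none)
  Candidate: 1:10.5.8+maria~focal"""

version = candidate_version(policy)
# Install the plugin pinned to exactly the server's resolved version.
print(f"Installing mariadb-plugin-gssapi-server={version}")
```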

### Why are the changes needed?

For `MariaDBKrbIntegrationSuite`, the version of `mariadb-plugin-gssapi-server` is currently set to `10.5.5` in `mariadb_docker_entrypoint.sh`, but that version is no longer available in the official apt repository, so `MariaDBKrbIntegrationSuite` doesn't pass for now.
It seems that only the most recent three versions are available for each major version; they are `10.5.6`, `10.5.7` and `10.5.8` at the moment.
Further, the release cycle of MariaDB seems to be very rapid (1 to 2 months), so it's not a good idea to pin `mariadb-plugin-gssapi-server` to a specific version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Confirmed that `MariaDBKrbIntegrationSuite` passes with the following commands.
```
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"
```
In this case, we can see what version of `mariadb-plugin-gssapi-server` is going to be installed in the following container log message.
```
Installing mariadb-plugin-gssapi-server=1:10.5.8+maria~focal
```

Or, we can set `MARIADB_DOCKER_IMAGE_NAME` to test against a specific version of MariaDB.
```
$ MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.6 build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"
```
```
Installing mariadb-plugin-gssapi-server=1:10.5.6+maria~focal
```

Closes #30515 from sarutak/fix-MariaDBKrbIntegrationSuite.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: cf98a76)
The file was modified external/docker-integration-tests/src/test/resources/mariadb_docker_entrypoint.sh (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala (diff)
Commit 3650a6bd97b9cecf382f96a55a97ff56b75471cd by dongjoon
[SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

### What changes were proposed in this pull request?

This patch proposes to use the classifier attribute of the artifact to construct the artifact path, instead of the type.

### Why are the changes needed?

`resolveDependencyPaths` currently uses the artifact type to decide whether to add the "-tests" suffix. However, the ivy retrieval path pattern in `resolveMavenCoordinates` is `[organization]_[artifact]-[revision](-[classifier]).[ext]`. We should use the classifier instead of the type to construct the file path.
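To illustrate the pattern, here is a small sketch (a hypothetical helper, not the actual `SparkSubmit` code) of how ivy's retrieve pattern derives the cached file name; the `(-[classifier])` segment appears only when the artifact has a classifier, which is why keying off the classifier matters for test-jars:

```python
def dependency_jar_name(org: str, artifact: str, revision: str,
                        classifier: str = None, ext: str = "jar") -> str:
    """Mirror ivy's retrieve pattern:
    [organization]_[artifact]-[revision](-[classifier]).[ext]
    The (-[classifier]) part is emitted only when a classifier is present."""
    suffix = f"-{classifier}" if classifier else ""
    return f"{org}_{artifact}-{revision}{suffix}.{ext}"

# A regular jar has no classifier...
print(dependency_jar_name("org.apache.spark", "spark-core_2.12", "3.0.1"))
# ...while a test-jar is distinguished by the "tests" classifier.
print(dependency_jar_name("org.apache.spark", "spark-core_2.12", "3.0.1",
                          classifier="tests"))
```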

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test. Manual test.

Closes #30524 from viirya/SPARK-33580.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 3650a6b)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
Commit bfe9380ba2bc9762ccfaa36d3ed938867c143876 by dongjoon
[MINOR][SQL] Remove `getTables()` from `r.SQLUtils`

### What changes were proposed in this pull request?
Remove the unused method `getTables()` from `r.SQLUtils`. The method was used before the changes in https://github.com/apache/spark/pull/17483, where R's `tables.default` was rewritten using `listTables()`: https://github.com/apache/spark/pull/17483/files#diff-2c01472a7bcb1d318244afcd621d726e00d36cd15dffe7e44fa96c54fce4cd9aR220-R223

### Why are the changes needed?
To improve code maintenance, and remove the dead code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By R tests.

Closes #30527 from MaxGekk/remove-getTables-in-r-SQLUtils.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: bfe9380)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala (diff)
Commit ba178f852f8e4b11a243d907ac204b30a60369b5 by yumwang
[SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

### What changes were proposed in this pull request?

This PR refactors `HivePartitionFilteringSuite`.

### Why are the changes needed?

To make it easier to maintain.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #30525 from wangyum/SPARK-33581.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: ba178f8)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala (diff)
Commit b94ff1e870152ac692c6f1ebf3d110caa274ebb2 by dongjoon
[SPARK-33590][DOCS][SQL] Add missing sub-bullets in Spark SQL Guide

### What changes were proposed in this pull request?

Add the missing sub-bullets to the left-side menu of the `Spark SQL Guide`.

### Why are the changes needed?

The three sub-bullets on the left side are not consistent with the contents (five bullets) on the right side.

![image](https://user-images.githubusercontent.com/1315079/100546388-7a21e880-32a4-11eb-922d-62a52f4f9f9b.png)

### Does this PR introduce _any_ user-facing change?

Yes, you can see more lines in the left menu.

### How was this patch tested?

Manually built the doc as follows; the result can be verified in the attached screenshot:

```
cd docs
SKIP_API=1 jekyll build
firefox _site/sql-pyspark-pandas-with-arrow.html
```

![image](https://user-images.githubusercontent.com/1315079/100546399-8ad25e80-32a4-11eb-80ac-44af0aebc717.png)

Closes #30537 from kiszk/SPARK-33590.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b94ff1e)
The file was modified docs/_data/menu-sql.yaml (diff)
Commit c8286ec41616909f1f6e452ce63f0e7605d5bc63 by dongjoon
[SPARK-33587][CORE] Kill the executor on nested fatal errors

### What changes were proposed in this pull request?

Currently we will kill the executor when hitting a fatal error. However, if the fatal error is wrapped by another exception, such as
- `java.util.concurrent.ExecutionException`, `com.google.common.util.concurrent.UncheckedExecutionException`, or `com.google.common.util.concurrent.ExecutionError`, when using a Guava cache or a Java thread pool.
- SparkException thrown from https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L231 or https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L296

We will still keep the executor running. Fatal errors are usually unrecoverable (such as OutOfMemoryError); some components may be left in a broken state after a fatal error, and it's hard to predict the behavior of a broken component. Hence, it's better to detect a nested fatal error as well and kill the executor. Then we can rely on Spark's fault tolerance to recover.
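The detection amounts to a bounded walk of the exception cause chain. A hedged Python analogue of the idea (the actual implementation is `Executor.isFatalError` in Scala; `MemoryError` stands in here for JVM fatal errors like `OutOfMemoryError`):

```python
def is_fatal_error(exc: BaseException, depth: int) -> bool:
    """Return True if a fatal error appears within `depth` levels of the
    exception cause chain; depth=1 checks only the top-level exception."""
    if exc is None or depth <= 0:
        return False
    if isinstance(exc, MemoryError):  # stand-in for JVM fatal errors
        return True
    # Unwrap one level of the cause chain and keep looking.
    return is_fatal_error(exc.__cause__, depth - 1)

# A fatal error wrapped by a generic wrapper exception:
wrapper = RuntimeError("task failed")
wrapper.__cause__ = MemoryError("out of memory")

print(is_fatal_error(wrapper, 1))  # looks only at the RuntimeError
print(is_fatal_error(wrapper, 2))  # unwraps one level and finds the OOM
```

Bounding the walk by a configurable depth (the added `spark.executor.killOnFatalError.depth`) guards against pathological or cyclic cause chains.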

### Why are the changes needed?

Fatal errors are usually unrecoverable (such as OutOfMemoryError); some components may be left in a broken state after a fatal error, and it's hard to predict the behavior of a broken component. Hence, it's better to detect a nested fatal error as well and kill the executor. Then we can rely on Spark's fault tolerance to recover.

### Does this PR introduce _any_ user-facing change?

Yep. There is a slight internal behavior change on when to kill an executor. We will kill the executor when detecting a nested fatal error in the exception chain. `spark.executor.killOnFatalError.depth` is added to allow users to turn off this change if the slight behavior change impacts them.

### How was this patch tested?

The new method `Executor.isFatalError` is tested by new unit tests in `ExecutorSuite`.

Closes #30528 from zsxwing/SPARK-33587.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: c8286ec)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/executor/Executor.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala (diff)
Commit 0054fc937f804660c6501d9d3f6319f3047a68f8 by dongjoon
[SPARK-33588][SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`

### What changes were proposed in this pull request?
Perform partition spec normalization in `ShowTablesCommand` according to the table schema before getting partitions from the catalog. The normalization via `PartitioningUtils.normalizePartitionSpec()` adjusts the column names in partition specification, w.r.t. the real partition column names and case sensitivity.
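The normalization step can be sketched like this (a simplified Python analogue of `PartitioningUtils.normalizePartitionSpec()`, not the actual Scala code):

```python
def normalize_partition_spec(spec: dict, part_cols: list,
                             case_sensitive: bool = False) -> dict:
    """Map user-supplied partition keys onto the table's real partition
    column names, honoring the case-sensitivity setting."""
    normalized = {}
    for key, value in spec.items():
        matches = [c for c in part_cols
                   if c == key or (not case_sensitive and c.lower() == key.lower())]
        if not matches:
            raise ValueError(f"{key} is not a valid partition column in {part_cols}")
        # Use the table's own spelling of the column name.
        normalized[matches[0]] = value
    return normalized

# With spark.sql.caseSensitive=false (the default), YEAR/Month resolve
# to the real partition columns year/month:
print(normalize_partition_spec({"YEAR": 2015, "Month": 1}, ["year", "month"]))
```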

### Why are the changes needed?
Even when `spark.sql.caseSensitive` is `false` which is the default value, v1 `SHOW TABLE EXTENDED` is case sensitive:
```sql
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
         > USING parquet
         > partitioned by (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';
```

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the `SHOW TABLE EXTENDED` command respects the SQL config. For the example above, it returns the correct result:
```sql
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
default tbl1 false Partition Values: [year=2015, month=1]
Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
Created Time: Sat Nov 28 23:25:18 MSK 2020
Last Access: UNKNOWN
Partition Statistics: 623 bytes
```

### How was this patch tested?
By running the modified test suite `v1/ShowTablesSuite`

Closes #30529 from MaxGekk/show-table-case-sensitive-spec.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 0054fc9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-tables.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala (diff)
Commit a088a801ed8c17171545c196a3f26ce415de0cd1 by dongjoon
[SPARK-33585][SQL][DOCS] Fix the comment for `SQLContext.tables()` and mention the `database` column

### What changes were proposed in this pull request?
Change the comments for `SQLContext.tables()` to "The returned DataFrame has three columns, database, tableName and isTemporary".

### Why are the changes needed?
Currently, the comment mentions only two columns, but `tables()` actually returns three:
```scala
scala> spark.range(10).createOrReplaceTempView("view1")
scala> val tables = spark.sqlContext.tables()
tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string ... 1 more field]

scala> tables.printSchema
root
 |-- database: string (nullable = false)
 |-- tableName: string (nullable = false)
 |-- isTemporary: boolean (nullable = false)

scala> tables.show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|       t1|      false|
| default|       t2|      false|
| default|      ymd|      false|
|        |    view1|       true|
+--------+---------+-----------+
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running `./dev/scalastyle`

Closes #30526 from MaxGekk/sqlcontext-tables-doc.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: a088a80)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala (diff)