SuccessChanges

Summary

  1. [SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab (commit: d334fee502fa85cfc6ef1b7f193fa3a0ff0b244e) (details)
  2. [SPARK-29080][CORE][SPARKR] Support R file extension case-insensitively (commit: 956f6e988cf83cf68f6cea214b3d9045920bca55) (details)
  3. [SPARK-29081][CORE] Replace calls to SerializationUtils.clone on (commit: 8c0e961f6c5fe0da3b36a6fe642c12f88ac34d0f) (details)
  4. [SPARK-29087][CORE][STREAMING] Use DelegatingServletContextHandler to (commit: 729b3180bcbaa5289cb9a5848a3cce9010e75515) (details)
  5. [SPARK-28471][SQL][DOC][FOLLOWUP] Fix year patterns in the comments of (commit: 1b7afc0c986ed8e5431df351f51434424460f4b3) (details)
  6. [SPARK-29046][SQL] Fix NPE in SQLConf.get when active SparkContext is (commit: 61e5aebce3925e7c512899939688e0eee4ac8a06) (details)
  7. [SPARK-28856][FOLLOW-UP][SQL][TEST] Add the `namespaces` keyword to (commit: b91648cfd0d7ab7014a137cdb61d8dbb3611438c) (details)
  8. [SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration (commit: 7d4eb38bbcc887fb61ba7344df3f77a046ad77f8) (details)
  9. [SPARK-29069][SQL] ResolveInsertInto should not do table lookup (commit: 1b99d0cca4b4fb6d193091f92c46c916b70cd84e) (details)
  10. [SPARK-28932][BUILD][FOLLOWUP] Switch to scala-library compile (commit: 471a3eff514480cfcbda79bde9294408cc8eb125) (details)
  11. [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen (commit: 6297287dfa6e9d30141728c931ed58c8c4966851) (details)
  12. [SPARK-29072][CORE] Put back usage of TimeTrackingOutputStream for (commit: 67751e26940a16ab6f9950ae66a46b7cb901c102) (details)
  13. [SPARK-26929][SQL] fix table owner use user instead of principal when (commit: 5881871ca5156ef0e83c9503d5eac288320147c3) (details)
  14. [SPARK-23539][SS][FOLLOWUP][TESTS] Add UT to ensure existing query (commit: 88c8d5eed2bf26ec4cc6ef68d9bdabbcb7ba1b83) (details)
  15. [SPARK-29008][SQL] Define an individual method for each common (commit: 95073fb62b646c3e8394941c5835a396b9d48c0f) (details)
  16. [SPARK-29100][SQL] Fix compilation error in codegen with switch from (commit: dffd92e9779021fa7df2ec962c9cd07e0dbfc2f3) (details)
  17. [SPARK-22797][ML][PYTHON] Bucketizer support multi-column (commit: 4d27a259087258492d0a66ca1ace7ef584c72a6f) (details)
  18. [SPARK-28996][SQL][TESTS] Add tests regarding username of HiveClient (commit: c8628354b7d2e6116b2a6eb3bdb2fc957c91fd03) (details)
  19. [SPARK-29074][SQL] Optimize `date_format` for foldable `fmt` (commit: db996ccad91bbd7db412b1363641820784ce77bc) (details)
  20. [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG (commit: 79b10a1aab9be6abdf749ad94c88234ace8ba34a) (details)
  21. [SPARK-28483][FOLLOW-UP] Fix flaky test in BarrierTaskContextSuite (commit: 104b9b6f8c93f341bda043852aa61ea2a1d2e21b) (details)
  22. [SPARK-29104][CORE][TESTS] Fix PipedRDDSuite to use `eventually` to (commit: 34915b22ab174a45c563ccdcd5035299f3ccc56c) (details)
  23. [SPARK-28950][SQL] Refine the code of DELETE (commit: 3fc52b5557b4608d8f0ce26d11c1ca3e24c157a2) (details)
Commit d334fee502fa85cfc6ef1b7f193fa3a0ff0b244e by gatorsmile
[SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab
### What changes were proposed in this pull request? New documentation
to explain the JDBC/ODBC server tab in detail. New images are included
for better explanation.
![image](https://user-images.githubusercontent.com/12819544/64735402-c4287e00-d4e8-11e9-9366-c8ac0fbfc058.png)
![image](https://user-images.githubusercontent.com/12819544/64735429-cee31300-d4e8-11e9-83f1-0b662037e194.png)
### Does this PR introduce any user-facing change? Only documentation
### How was this patch tested? I have generated it using "jekyll build"
to ensure that it's ok
Closes #25718 from planga82/SPARK-28373_JDBCServerPage.
Lead-authored-by: Pablo Langa <soypab@gmail.com> Co-authored-by: Unknown
<soypab@gmail.com> Co-authored-by: Pablo <soypab@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
(commit: d334fee502fa85cfc6ef1b7f193fa3a0ff0b244e)
The file was added docs/img/JDBCServer1.png
The file was added docs/img/JDBCServer3.png
The file was modified docs/web-ui.md (diff)
The file was added docs/img/JDBCServer2.png
Commit 956f6e988cf83cf68f6cea214b3d9045920bca55 by dhyun
[SPARK-29080][CORE][SPARKR] Support R file extension case-insensitively
### What changes were proposed in this pull request?
Make the R file extension check case-insensitive for spark-submit.
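A minimal sketch of the kind of case-insensitive check involved (the helper name is hypothetical, not the exact SparkSubmit code):
```scala
import java.util.Locale

// Hypothetical helper: lower-case the resource name before comparing,
// so both ".R" and ".r" are treated as R scripts.
def isRScript(resource: String): Boolean =
  resource.toLowerCase(Locale.ROOT).endsWith(".r")

isRScript("examples/src/main/r/dataframe.r")  // true
isRScript("examples/src/main/r/dataframe.R")  // true
```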
### Why are the changes needed?
spark-submit does not accept `.r` files as R scripts. Some codebases
have R files that end with the lowercase file extension, and it is
inconvenient to use spark-submit with them. The error is not very clear
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L232).
```
$ ./bin/spark-submit examples/src/main/r/dataframe.r
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR
file:/Users/dongjoon/APACHE/spark-release/spark-2.4.4-bin-hadoop2.7/examples/src/main/r/dataframe.r
```
### Does this PR introduce any user-facing change?
Yes. spark-submit can now be used to run R scripts with the `.r` file
extension.
### How was this patch tested?
Manual.
```
$ mv examples/src/main/r/dataframe.R examples/src/main/r/dataframe.r
$ ./bin/spark-submit examples/src/main/r/dataframe.r
```
Closes #25778 from Loquats/r-case.
Authored-by: Andy Zhang <yue.zhang@databricks.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: 956f6e988cf83cf68f6cea214b3d9045920bca55)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (diff)
Commit 8c0e961f6c5fe0da3b36a6fe642c12f88ac34d0f by dhyun
[SPARK-29081][CORE] Replace calls to SerializationUtils.clone on
properties with a faster implementation
Replace use of `SerializationUtils.clone` with a new
`Utils.cloneProperties` method. Add a benchmark and results showing a
dramatic speed-up for effectively equivalent functionality.
### What changes were proposed in this pull request? While I am not sure
that SerializationUtils.clone is a performance issue in production, I am
sure that it is overkill for the task it is doing (providing a distinct
copy of a `Properties` object). This PR provides a benchmark showing the
dramatic improvement over the clone operation and replaces uses of
`SerializationUtils.clone` on `Properties` with the more specialized
`Utils.cloneProperties`.
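A rough sketch of the idea, assuming a plain entry-by-entry copy (the actual `Utils.cloneProperties` implementation may differ):
```scala
import java.util.Properties

// Sketch only: keys and values of Spark local properties are Strings,
// which are immutable, so a shallow per-entry copy is enough. This
// avoids the serialize/deserialize round trip of SerializationUtils.clone.
def cloneProperties(props: Properties): Properties = {
  val copy = new Properties()
  val it = props.entrySet().iterator()
  while (it.hasNext) {
    val e = it.next()
    copy.put(e.getKey, e.getValue)
  }
  copy
}
```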
### Does this PR introduce any user-facing change? Strings are immutable,
so there is no reason to serialize and deserialize them; doing so just
creates extra garbage. The only functionality that would change is the
unsupported insertion of non-String objects into the Spark local
properties.
### How was this patch tested?
1. Pass the Jenkins with the existing tests. 2. Since this is a
performance improvement PR, manually run the benchmark.
Closes #25787 from databricks-david-lewis/SPARK-29081.
Authored-by: David Lewis <david.lewis@databricks.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: 8c0e961f6c5fe0da3b36a6fe642c12f88ac34d0f)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/ResetSystemProperties.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala (diff)
The file was added core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala
The file was added core/benchmarks/PropertiesCloneBenchmark-results.txt
The file was modified streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala (diff)
Commit 729b3180bcbaa5289cb9a5848a3cce9010e75515 by dhyun
[SPARK-29087][CORE][STREAMING] Use DelegatingServletContextHandler to
avoid CCE
### What changes were proposed in this pull request?
[SPARK-27122](https://github.com/apache/spark/pull/24088) fixed a
`ClassCastException` in the `yarn` module by introducing
`DelegatingServletContextHandler`. Initially this was discovered with
JDK9+, but the classpath issue affected the JDK8 environment, too. After
[SPARK-28709](https://github.com/apache/spark/pull/25439), I hit a
similar issue in the `streaming` module.
This PR aims to fix the `streaming` module by adding `getContextPath` to
`DelegatingServletContextHandler` and using it.
### Why are the changes needed?
Currently, when we test the `streaming` module independently, it fails
like the following.
```
$ build/mvn test -pl streaming
... UISeleniumSuite:
- attaching and detaching a Streaming tab *** FAILED ***
java.lang.ClassCastException:
org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to
org.eclipse.jetty.servlet.ServletContextHandler
... Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass Jenkins with the modified tests, and do the following manually.
Since this is only observable when you run the `streaming` module tests
alone (instead of running all modules), you need to install the changed
`core` module and use it.
```
$ java -version openjdk version "1.8.0_222" OpenJDK Runtime Environment
(AdoptOpenJDK)(build 1.8.0_222-b10) OpenJDK 64-Bit Server VM
(AdoptOpenJDK)(build 25.222-b10, mixed mode)
$ build/mvn install -DskipTests
$ build/mvn test -pl streaming
```
Closes #25791 from dongjoon-hyun/SPARK-29087.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 729b3180bcbaa5289cb9a5848a3cce9010e75515)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ui/WebUI.scala (diff)
Commit 1b7afc0c986ed8e5431df351f51434424460f4b3 by dhyun
[SPARK-28471][SQL][DOC][FOLLOWUP] Fix year patterns in the comments of
date-time expressions
### What changes were proposed in this pull request?
In the PR, I propose to fix the comments of date-time expressions and
replace the `yyyy` pattern with `uuuu` where that is what the
implementation actually uses.
### Why are the changes needed?
To make the comments consistent with the implementations.
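For context, the distinction illustrated with plain `java.time` (not code from the patch): `uuuu` is the proleptic year while `yyyy` is the year-of-era, so the two diverge for dates before the common era.
```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Proleptic year 0 is the year 1 BC in year-of-era terms.
val d = LocalDate.of(0, 1, 1)
DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.US).format(d)    // 0000-01-01
DateTimeFormatter.ofPattern("yyyy-MM-dd G", Locale.US).format(d)  // 0001-01-01 BC
```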
### Does this PR introduce any user-facing change? No
### How was this patch tested?
By running Scala Style checker.
Closes #25796 from MaxGekk/year-pattern-uuuu-followup.
Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 1b7afc0c986ed8e5431df351f51434424460f4b3)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala (diff)
Commit 61e5aebce3925e7c512899939688e0eee4ac8a06 by dhyun
[SPARK-29046][SQL] Fix NPE in SQLConf.get when active SparkContext is
stopping
### What changes were proposed in this pull request?
This patch fixes an NPE in SQLConf.get, which is only possible when
SparkContext._dagScheduler is null because the SparkContext is stopping.
The logic doesn't seem to consider that the active SparkContext could be
in the process of stopping.
Note that this can't be encountered easily, as SparkContext.stop() blocks
the main thread, but there are many cases in which SQLConf.get is accessed
concurrently while SparkContext.stop() is executing: users running other
threads, or a listener accessing SQLConf.get after dagScheduler has been
set to null (the case I encountered).
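A minimal illustration of the defensive pattern involved (names and structure are illustrative, not the actual SQLConf code):
```scala
// Illustrative only: any field that SparkContext.stop() can null out is
// checked before use, falling back to the default conf instead of
// throwing a NullPointerException.
def safeGet[T](nullableField: AnyRef)(fromField: => T)(fallback: => T): T =
  if (nullableField != null) fromField else fallback
```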
### Why are the changes needed?
The bug causes an NPE.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
The previous patch (#25753) was tested with a new UT, but because that
test disrupted other tests in concurrent runs, it is excluded from this patch.
Closes #25790 from HeartSaVioR/SPARK-29046-v2.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 61e5aebce3925e7c512899939688e0eee4ac8a06)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit b91648cfd0d7ab7014a137cdb61d8dbb3611438c by dhyun
[SPARK-28856][FOLLOW-UP][SQL][TEST] Add the `namespaces` keyword to
TableIdentifierParserSuite
### What changes were proposed in this pull request?
This PR adds the `namespaces` keyword to `TableIdentifierParserSuite`.
### Why are the changes needed? Improve the test.
### Does this PR introduce any user-facing change? No
### How was this patch tested? N/A
Closes #25758 from highmoutain/3.0bugfix.
Authored-by: changchun.wang <251922566@qq.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: b91648cfd0d7ab7014a137cdb61d8dbb3611438c)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala (diff)
Commit 7d4eb38bbcc887fb61ba7344df3f77a046ad77f8 by dhyun
[SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration
Guide tap in Spark documentation
### What changes were proposed in this pull request?
Currently, there is no migration section for PySpark, Spark Core and
Structured Streaming. It is difficult for users to know what to do when
they upgrade.
This PR proposes to create a "Migration Guide" tab in the Spark
documentation.
![Screen Shot 2019-09-11 at 7 02 05
PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png)
![Screen Shot 2019-09-11 at 7 27 15
PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png)
This page will contain migration guides for Spark SQL, PySpark, SparkR,
MLlib, Structured Streaming and Core. Basically it is a refactoring.
There is some new information added, for which I will leave inline
comments for easier review.
1. **MLlib**
Merge
[ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide)
and
[ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html)
    ```
   'docs/ml-guide.md'
           ↓ Merge new/old migration guides
   'docs/ml-migration-guide.md'
   ```
2. **PySpark**
Extract PySpark specific items from
https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html
    ```
   'docs/sql-migration-guide-upgrade.md'
          ↓ Extract PySpark specific items
   'docs/pyspark-migration-guide.md'
   ```
3. **SparkR**
Move
[sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide)
into a separate file, and extract from
[sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)
    ```
   'docs/sparkr.md'              'docs/sql-migration-guide-upgrade.md'
    Move migration guide section ↘   ↙ Extract SparkR specific items
                    docs/sparkr-migration-guide.md
   ```
4. **Core**
Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved
JIRAs at 3.0.0 and found some items to note.
5. **Structured Streaming**
Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved
JIRAs at 3.0.0 and found some items to note.
6. **SQL**
Merged
[sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html)
and
[sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
   ```
   'docs/sql-migration-guide-hive-compatibility.md'   'docs/sql-migration-guide-upgrade.md'
    Move Hive compatibility section ↘   ↙ Left over after filtering PySpark and SparkR items
                                 'docs/sql-migration-guide.md'
   ```
### Why are the changes needed?
In order for users in production to effectively migrate to higher
versions, and detect behaviour or breaking changes before upgrading
and/or migrating.
### Does this PR introduce any user-facing change? Yes, this changes
Spark's documentation at
https://spark.apache.org/docs/latest/index.html.
### How was this patch tested?
Manually build the doc. This can be verified as below:
```bash
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```
Closes #25757 from HyukjinKwon/migration-doc.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 7d4eb38bbcc887fb61ba7344df3f77a046ad77f8)
The file was modified docs/sql-migration-guide.md (diff)
The file was added docs/sql-migration-old.md
The file was modified docs/_layouts/global.html (diff)
The file was added docs/_includes/nav-left-wrapper-migration.html
The file was added docs/ml-migration-guide.md
The file was removed docs/mllib-migration-guides.md
The file was modified docs/_data/menu-sql.yaml (diff)
The file was added docs/core-migration-guide.md
The file was added docs/migration-guide.md
The file was removed docs/sql-migration-guide-upgrade.md
The file was added docs/pyspark-migration-guide.md
The file was added docs/_data/menu-migration.yaml
The file was modified docs/index.md (diff)
The file was removed docs/ml-migration-guides.md
The file was removed docs/sql-migration-guide-hive-compatibility.md
The file was added docs/sparkr-migration-guide.md
The file was modified docs/sparkr.md (diff)
The file was added docs/ss-migration-guide.md
The file was modified docs/ml-guide.md (diff)
Commit 1b99d0cca4b4fb6d193091f92c46c916b70cd84e by gurwls223
[SPARK-29069][SQL] ResolveInsertInto should not do table lookup
### What changes were proposed in this pull request?
It is clearer to do table lookups only in the `ResolveTables` rule (for v2
tables) and the `ResolveRelations` rule (for v1 tables). `ResolveInsertInto`
should only resolve `InsertIntoStatement`s whose relations are already resolved.
### Why are the changes needed?
to make the code simpler
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
existing tests
Closes #25774 from cloud-fan/simplify.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 1b99d0cca4b4fb6d193091f92c46c916b70cd84e)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 471a3eff514480cfcbda79bde9294408cc8eb125 by dhyun
[SPARK-28932][BUILD][FOLLOWUP] Switch to scala-library compile
dependency for JDK11
### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/25638 to
switch `scala-library` from `test` dependency to `compile` dependency in
`network-common` module.
### Why are the changes needed?
Previously, we added `scala-library` as a test dependency to resolve the
following failure, but it was insufficient. This PR switches it to a
compile dependency.
```
$ java -version openjdk version "11.0.3" 2019-04-16 OpenJDK Runtime
Environment AdoptOpenJDK (build 11.0.3+7) OpenJDK 64-Bit Server VM
AdoptOpenJDK (build 11.0.3+7, mixed mode)
$ mvn clean install -pl common/network-common -DskipTests
...
[INFO] --- scala-maven-plugin:4.2.0:doc-jar (attach-scaladocs)
spark-network-common_2.12 --- error: fatal error: object scala in
compiler mirror not found. one error found
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually, run the following on JDK11.
```
$ mvn clean install -pl common/network-common -DskipTests
```
Closes #25800 from dongjoon-hyun/SPARK-28932-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 471a3eff514480cfcbda79bde9294408cc8eb125)
The file was modified common/network-common/pom.xml (diff)
Commit 6297287dfa6e9d30141728c931ed58c8c4966851 by wenchen
[SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
### What changes were proposed in this pull request?
This PR proposes to print bytecode statistics (max class bytecode size,
max method bytecode size, max constant pool size, and # of inner
classes) for generated classes in the debug output of `debugCodegen`.
Since these metrics are critical for codegen framework development, I
think it is worth printing them there. This PR enables `debugCodegen` to
print these metrics as follows:
``` scala> sql("SELECT sum(v) FROM VALUES(1) t(v)").debugCodegen Found 2
WholeStageCodegen subtrees.
== Subtree 1 / 2 (maxClassCodeSize:2693; maxMethodCodeSize:124;
maxConstantPoolSize:130(0.20% used); numInnerClasses:0) ==
               
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*(1) HashAggregate(keys=[], functions=[partial_sum(cast(v#0 as
bigint))], output=[sum#5L])
+- *(1) LocalTableScan [v#0]
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
...
```
### Why are the changes needed?
For efficient developments
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually tested
Closes #25766 from maropu/PrintBytecodeStats.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Wenchen Fan <wenchen@databricks.com>
(commit: 6297287dfa6e9d30141728c931ed58c8c4966851)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala (diff)
Commit 67751e26940a16ab6f9950ae66a46b7cb901c102 by irashid
[SPARK-29072][CORE] Put back usage of TimeTrackingOutputStream for
UnsafeShuffleWriter and SortShuffleWriter
### What changes were proposed in this pull request? The previous
refactors of the shuffle writers using the shuffle writer plugin
resulted in shuffle write metric updates - particularly write times -
being lost in particular situations. This patch restores the lost metric
updates.
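A sketch of the idea behind `TimeTrackingOutputStream` (an illustrative stand-in, not the Spark class itself):
```scala
import java.io.OutputStream

// Wrap the real stream and report the time spent in each write, so the
// shuffle write-time metric is not lost; recordTime stands in for the
// shuffle write metrics reporter.
class TimedOutputStream(out: OutputStream, recordTime: Long => Unit) extends OutputStream {
  override def write(b: Int): Unit = timed(out.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit = timed(out.write(b, off, len))
  override def flush(): Unit = out.flush()
  override def close(): Unit = out.close()

  private def timed(op: => Unit): Unit = {
    val start = System.nanoTime()
    op
    recordTime(System.nanoTime() - start)
  }
}
```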
### Why are the changes needed? This fixes a regression. I'm pretty sure
that without this, the Spark UI will lose shuffle write time
information.
### Does this PR introduce any user-facing change? No change from Spark
2.4. Without this, there would be a user-facing bug in Spark 3.0.
### How was this patch tested? Existing unit tests.
Closes #25780 from mccheah/fix-write-metrics.
Authored-by: mcheah <mcheah@palantir.com> Signed-off-by: Imran Rashid
<irashid@cloudera.com>
(commit: 67751e26940a16ab6f9950ae66a46b7cb901c102)
The file was modified core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/ShufflePartitionPairsWriter.scala (diff)
Commit 5881871ca5156ef0e83c9503d5eac288320147c3 by vanzin
[SPARK-26929][SQL] fix table owner use user instead of principal when
create table through spark-sql or beeline
## What changes were proposed in this pull request?
fix table owner use user instead of principal when create table through
spark-sql private val userName = conf.getUser will get ugi's userName
which is principal info, and i copy the source code into HiveClientImpl,
and use ugi.getShortUserName() instead of ugi.getUserName(). The owner
display correctly.
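The Hadoop API difference the fix relies on, as a small illustration:
```scala
import org.apache.hadoop.security.UserGroupInformation

// On a kerberized cluster, getUserName() returns the full principal
// (e.g. "alice@EXAMPLE.COM"); getShortUserName() returns just "alice",
// which is the value that should be recorded as the table owner.
val ugi = UserGroupInformation.getCurrentUser
val principal = ugi.getUserName
val owner = ugi.getShortUserName
```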
## How was this patch tested?
1. create a table in kerberos cluster 2. use "desc formatted tbName"
check owner
Closes #23952 from hddong/SPARK-26929-fix-table-owner.
Lead-authored-by: hongdd <jn_hdd@163.com> Co-authored-by: hongdongdong
<hongdongdong@cmss.chinamobile.com> Signed-off-by: Marcelo Vanzin
<vanzin@cloudera.com>
(commit: 5881871ca5156ef0e83c9503d5eac288320147c3)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
Commit 88c8d5eed2bf26ec4cc6ef68d9bdabbcb7ba1b83 by sean.owen
[SPARK-23539][SS][FOLLOWUP][TESTS] Add UT to ensure existing query
doesn't break with default conf of includeHeaders
### What changes were proposed in this pull request?
This patch adds a new UT to ensure that an existing query (from before
Spark 3.0.0) with a checkpoint doesn't break with the default
configuration of "includeHeaders" introduced via SPARK-23539.
This patch also modifies the existing test that checks column types to
check the headers column as well.
### Why are the changes needed?
The patch adds missing tests which guarantee backward compatibility of
the SPARK-23539 change.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
UT passed.
Closes #25792 from HeartSaVioR/SPARK-23539-FOLLOWUP.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(commit: 88c8d5eed2bf26ec4cc6ef68d9bdabbcb7ba1b83)
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/metadata
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/offsets/0
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/state/0/1/1.delta
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/commits/0
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/state/0/0/1.delta
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/state/0/4/1.delta
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/state/0/3/1.delta
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala (diff)
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/sources/0/0
The file was added external/kafka-0-10-sql/src/test/resources/structured-streaming/checkpoint-version-2.4.3-kafka-include-headers-default/state/0/2/1.delta
Commit 95073fb62b646c3e8394941c5835a396b9d48c0f by yamamuro
[SPARK-29008][SQL] Define an individual method for each common
subexpression in HashAggregateExec
### What changes were proposed in this pull request?
This PR proposes to define an individual method for each common
subexpression in HashAggregateExec. In the current master, the common
subexpression elimination code in HashAggregateExec is expanded into a
single method;
https://github.com/apache/spark/blob/4664a082c2c7ac989e818958c465c72833d3ccfe/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L397
The method size can be too big for JIT compilation, so I believe
splitting it is beneficial for performance. For example, for the query
`SELECT SUM(a + b), AVG(a + b + c) FROM VALUES (1, 1, 1) t(a, b, c)`,
the current master generates:
```
/* 098 */   private void agg_doConsume_0(InternalRow
localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int
agg_expr_2_0) throws java.io.IOException {
/* 099 */     // do aggregate
/* 100 */     // common sub-expressions
/* 101 */     int agg_value_6 = -1;
/* 102 */
/* 103 */     agg_value_6 = agg_expr_0_0 + agg_expr_1_0;
/* 104 */
/* 105 */     int agg_value_5 = -1;
/* 106 */
/* 107 */     agg_value_5 = agg_value_6 + agg_expr_2_0;
/* 108 */     boolean agg_isNull_4 = false;
/* 109 */     long agg_value_4 = -1L;
/* 110 */     if (!false) {
/* 111 */       agg_value_4 = (long) agg_value_5;
/* 112 */     }
/* 113 */     int agg_value_10 = -1;
/* 114 */
/* 115 */     agg_value_10 = agg_expr_0_0 + agg_expr_1_0;
/* 116 */     // evaluate aggregate functions and update aggregation
buffers
/* 117 */     agg_doAggregate_sum_0(agg_value_10);
/* 118 */     agg_doAggregate_avg_0(agg_value_4, agg_isNull_4);
/* 119 */
/* 120 */   }
```
On the other hand, this pr generates;
```
/* 121 */   private void agg_doConsume_0(InternalRow
localtablescan_row_0, int agg_expr_0_0, int agg_expr_1_0, int
agg_expr_2_0) throws java.io.IOException {
/* 122 */     // do aggregate
/* 123 */     // common sub-expressions
/* 124 */     long agg_subExprValue_0 = agg_subExpr_0(agg_expr_2_0,
agg_expr_0_0, agg_expr_1_0);
/* 125 */     int agg_subExprValue_1 = agg_subExpr_1(agg_expr_0_0,
agg_expr_1_0);
/* 126 */     // evaluate aggregate functions and update aggregation
buffers
/* 127 */     agg_doAggregate_sum_0(agg_subExprValue_1);
/* 128 */     agg_doAggregate_avg_0(agg_subExprValue_0);
/* 129 */
/* 130 */   }
```
I run some micro benchmarks for this pr;
```
(base) maropu~:$ system_profiler SPHardwareDataType
Hardware:
   Hardware Overview:
     Processor Name: Intel Core i5
     Processor Speed: 2 GHz
     Number of Processors: 1
     Total Number of Cores: 2
     L2 Cache (per Core): 256 KB
     L3 Cache: 4 MB
     Memory: 8 GB

(base) maropu~:$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

(base) maropu~:$ /bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

val numCols = 40
val colExprs = "id AS key" +: (0 until numCols).map { i => s"id AS _c$i" }
spark.range(3000000).selectExpr(colExprs: _*).createOrReplaceTempView("t")

val aggExprs = (2 until numCols).map { i =>
  (0 until i).map(d => s"_c$d")
    .mkString("AVG(", " + ", ")")
}

// Drops the time of a first run then picks that of a second run
timer { sql(s"SELECT ${aggExprs.mkString(", ")} FROM t").write.format("noop").save() }

// the master: maxCodeGen: 12957, Elapsed time: 36.309858661s
// this pr:    maxCodeGen: 4184,  Elapsed time: 2.399490285s
```
### Why are the changes needed?
To avoid the too-long-function issue in JVMs.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added tests in `WholeStageCodegenSuite`
Closes #25710 from maropu/SplitSubexpr.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Takeshi Yamamuro <yamamuro@apache.org>
(commit: 95073fb62b646c3e8394941c5835a396b9d48c0f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
Commit dffd92e9779021fa7df2ec962c9cd07e0dbfc2f3 by wenchen
[SPARK-29100][SQL] Fix compilation error in codegen with switch from
InSet expression
### What changes were proposed in this pull request?
When InSet generates Java switch-based code and the input set is empty,
we don't generate a switch condition, but a simple expression that is
the default case of the original switch.
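A conceptual sketch of the fix (the string templates are illustrative, not the actual InSet codegen):
```scala
// If there are no values, a switch would have no case labels, so emit
// only the code of what would have been the default branch.
def genSwitchCode(input: String, values: Seq[String], defaultCode: String): String =
  if (values.isEmpty) {
    defaultCode
  } else {
    val cases = values.map(v => s"case $v:").mkString(" ")
    s"switch ($input) { $cases result = true; break; default: $defaultCode }"
  }
```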
### Why are the changes needed?
SPARK-26205 added an optimization to InSet that generates a Java switch
condition for certain cases. When the given set is empty, it is possible
that codegen causes a compilation error:
```
[info] - SPARK-29100: InSet with empty input set *** FAILED *** (58
milliseconds)
[info]   Code generation of input[0, int, true] INSET () failed:
[info]   org.codehaus.janino.InternalCompilerException: failed to
compile: org.codehaus.janino.InternalCompilerException: Compiling
"GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object
_i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset
45: Previous size 0, now 1
[info]   org.codehaus.janino.InternalCompilerException: failed to
compile: org.codehaus.janino.InternalCompilerException: Compiling
"GeneratedClass" in "generated.java": Compiling "apply(java.lang.Object
_i)"; apply(java.lang.Object _i): Operand stack inconsistent at offset
45: Previous size 0, now 1
[info]         at
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1308)
[info]         at
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1386)
[info]         at
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1383)
```
### Does this PR introduce any user-facing change?
Yes. Previously, when users had an InSet against an empty set, the
generated code caused a compilation error. This patch fixes it.
### How was this patch tested?
Unit test added.
Closes #25806 from viirya/SPARK-29100.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: dffd92e9779021fa7df2ec962c9cd07e0dbfc2f3)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala (diff)
Commit 4d27a259087258492d0a66ca1ace7ef584c72a6f by ruifengz
[SPARK-22797][ML][PYTHON] Bucketizer support multi-column
### What changes were proposed in this pull request? Bucketizer supports
multi-column on the Python side.
### Why are the changes needed? Bucketizer should support multi-column
like the Scala side.
### Does this PR introduce any user-facing change? Yes, this PR adds a
new Python API.
### How was this patch tested? Added test suites.
Closes #25801 from zhengruifeng/20542_py.
Authored-by: zhengruifeng <ruifengz@foxmail.com> Signed-off-by:
zhengruifeng <ruifengz@foxmail.com>
(commit: 4d27a259087258492d0a66ca1ace7ef584c72a6f)
The file was modified python/pyspark/ml/tests/test_param.py (diff)
The file was modified python/pyspark/ml/feature.py (diff)
The file was modified python/pyspark/ml/param/__init__.py (diff)
Commit c8628354b7d2e6116b2a6eb3bdb2fc957c91fd03 by wenchen
[SPARK-28996][SQL][TESTS] Add tests regarding username of HiveClient
### What changes were proposed in this pull request?
This patch proposes to add new tests for the username of HiveClient to
prevent changing the semantics unintentionally. The owner of a Hive
table has been changed back and forth, principal -> username ->
principal, and it looks like the change was not intentional. (Please
refer to [SPARK-28996](https://issues.apache.org/jira/browse/SPARK-28996)
for more details.) This patch intends to prevent this.
This patch also renames the previous HiveClientSuite(s) to
HivePartitionFilteringSuite(s), as was noted in a TODO comment, and
because the previous tests were too narrow, testing only partition filtering.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Newly added UTs.
Closes #25696 from HeartSaVioR/SPARK-28996.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c8628354b7d2e6116b2a6eb3bdb2fc957c91fd03)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuites.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was removed sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
The file was removed sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuites.scala
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientUserNameSuites.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala (diff)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientUserNameSuite.scala
Commit db996ccad91bbd7db412b1363641820784ce77bc by gurwls223
[SPARK-29074][SQL] Optimize `date_format` for foldable `fmt`
### What changes were proposed in this pull request?
In the PR, I propose to create an instance of `TimestampFormatter` only
once at initialization, and reuse it inside `nullSafeEval()` and
`doGenCode()` when the `fmt` parameter is foldable.
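An illustrative sketch of the optimization (not the Catalyst expression code): when the format is a literal, the formatter is built once and reused per row.
```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Sketch: the pattern is parsed once here instead of on every eval call.
class CachedDateFormat(pattern: String) {
  private val formatter = DateTimeFormatter.ofPattern(pattern, Locale.US)
  def eval(date: LocalDate): String = formatter.format(date)
}
```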
### Why are the changes needed?
The changes improve performance of the `date_format()` function.
Before:
```
format date:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                     7180 / 7181          1.4         718.0       1.0X
format date wholestage on                      7051 / 7194          1.4         705.1       1.0X
```
After:
```
format date:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                     4787 / 4839          2.1         478.7       1.0X
format date wholestage on                      4736 / 4802          2.1         473.6       1.0X
```
### Does this PR introduce any user-facing change? No.
### How was this patch tested?
By existing test suites `DateExpressionsSuite` and `DateFunctionsSuite`.
Closes #25782 from MaxGekk/date_format-foldable.
Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: db996ccad91bbd7db412b1363641820784ce77bc)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modified sql/core/benchmarks/DateTimeBenchmark-results.txt (diff)
Commit 79b10a1aab9be6abdf749ad94c88234ace8ba34a by dhyun
[SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG
in Executor Plugin API
### What changes were proposed in this pull request?
Log levels in Executor.scala are changed from DEBUG to INFO.
### Why are the changes needed?
Logging level DEBUG is too low here. These messages are simple
acknowledgements of successful loading and initialization of plugins, so
it is better to keep them at INFO level.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually tested.
Closes #25634 from iRakson/ExecutorPlugin.
Authored-by: iRakson <raksonrakesh@gmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 79b10a1aab9be6abdf749ad94c88234ace8ba34a)
The file was modified core/src/main/scala/org/apache/spark/executor/Executor.scala (diff)
Commit 104b9b6f8c93f341bda043852aa61ea2a1d2e21b by weichen.xu
[SPARK-28483][FOLLOW-UP] Fix flaky test in BarrierTaskContextSuite
### What changes were proposed in this pull request?
I fix the test "barrier task killed" which is flaky:
* Split interrupt/no interrupt test into separate sparkContext. Prevent
them to influence each other.
* only check exception on partiton-0. partition-1 is hang on sleep which
may throw other exception.
### Why are the changes needed? Make the test robust.
### Does this PR introduce any user-facing change? No.
### How was this patch tested? N/A
Closes #25799 from WeichenXu123/oss_fix_barrier_test.
Authored-by: WeichenXu <weichen.xu@databricks.com> Signed-off-by:
WeichenXu <weichen.xu@databricks.com>
(commit: 104b9b6f8c93f341bda043852aa61ea2a1d2e21b)
The file was modified core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala (diff)
Commit 34915b22ab174a45c563ccdcd5035299f3ccc56c by gurwls223
[SPARK-29104][CORE][TESTS] Fix PipedRDDSuite to use `eventually` to
check thread termination
### What changes were proposed in this pull request?
`PipedRDD` invokes `stdinWriterThread.interrupt()` at task completion,
and `obj.wait` will get an `InterruptedException`. However, there is a
possibility that the thread termination gets delayed because the thread
resumes from `obj.wait()` with that exception. To prevent test
flakiness, we need to use `eventually`. This PR also fixes a typo in a
code comment and a variable name.
### Why are the changes needed?
```
- stdin writer thread should be exited when task is finished *** FAILED
***
Some(Thread[stdin writer for List(cat),5,]) was not empty
(PipedRDDSuite.scala:107)
```
-
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/6867/testReport/junit/org.apache.spark.rdd/PipedRDDSuite/stdin_writer_thread_should_be_exited_when_task_is_finished/
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual.
We can reproduce the same failure as Jenkins if we catch
`InterruptedException` and sleep longer than the `eventually` timeout
inside the test code. The following is an example that reproduces it.
```scala
val nums = sc.makeRDD(Array(1, 2, 3, 4), 1).map { x =>
  try {
    obj.synchronized {
      obj.wait() // make the thread wait here.
    }
  } catch {
    case ie: InterruptedException =>
      Thread.sleep(15000)
      throw ie
  }
  x
}
```
Closes #25808 from dongjoon-hyun/SPARK-29104.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 34915b22ab174a45c563ccdcd5035299f3ccc56c)
The file was modified core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala (diff)
Commit 3fc52b5557b4608d8f0ce26d11c1ca3e24c157a2 by wenchen
[SPARK-28950][SQL] Refine the code of DELETE
### What changes were proposed in this pull request? This PR refines the
code of DELETE: 1) make `whereClause` optional, in which case DELETE
deletes all of the data in a table; 2) add more test cases; 3) some
other refinements. This is a follow-up of SPARK-28351.
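A usage sketch in spark-shell against a v2 table (the table name is hypothetical):
```scala
// With the refined parser, the WHERE clause is optional.
spark.sql("DELETE FROM testcat.ns.tbl WHERE id < 10")  // delete matching rows
spark.sql("DELETE FROM testcat.ns.tbl")                // delete all rows of the table
```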
### Why are the changes needed? An optional where clause in DELETE
respects the SQL standard.
### Does this PR introduce any user-facing change? Yes. But since this
is a non-released feature, the change does not affect end users.
### How was this patch tested? A new case is added.
Closes #25652 from xianyinxin/SPARK-28950.
Authored-by: xy_xin <xianyin.xxy@alibaba-inc.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: 3fc52b5557b4608d8f0ce26d11c1ca3e24c157a2)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/DeleteFromStatement.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)