SuccessChanges

Summary

  1. [SPARK-23421][SPARK-22356][SQL] Document the behavior change in (commit: bd83f7ba097d9bca9a0e8c072f7566a645887a96) (details)
  2. [SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource (commit: 129fd45efb418c6afa95aa26e5b96f03a39dcdd0) (details)
  3. [SPARK-23419][SPARK-23416][SS] data source v2 write path should re-throw (commit: f2c0585652a8262801e92a02b56c56f16b8926e5) (details)
  4. [SPARK-23422][CORE] YarnShuffleIntegrationSuite fix when SPARK_PREPEND_CLASSES set to 1 (commit: d24d13179f0e9d125eaaebfcc225e1ec30c5cb83) (details)
  5. [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0 (commit: bae4449ad836a64db853e297d33bfd1a725faa0b) (details)
  6. [MINOR][SQL] Fix an error message about inserting into bucketed tables (commit: 03960faa621710388c0de91d16a71dda749d173f) (details)
  7. [SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug (commit: 0bd7765cd9832bee348af87663f3d424b61e92fc) (details)
  8. [SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stage page (commit: 75bb19a018f9260eab3ea0ba3ea46e84b87eabf2) (details)
Commit bd83f7ba097d9bca9a0e8c072f7566a645887a96 by gatorsmile
[SPARK-23421][SPARK-22356][SQL] Document the behavior change in
## What changes were proposed in this pull request?
https://github.com/apache/spark/pull/19579 introduces a behavior change.
We need to document it in the migration guide.
## How was this patch tested?
Also updated the HiveExternalCatalogVersionsSuite to verify it.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20606 from gatorsmile/addMigrationGuide.
(cherry picked from commit a77ebb0921e390cf4fc6279a8c0a92868ad7e69b)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: bd83f7ba097d9bca9a0e8c072f7566a645887a96)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala (diff)
The file was modified docs/sql-programming-guide.md (diff)
Commit 129fd45efb418c6afa95aa26e5b96f03a39dcdd0 by gatorsmile
[SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource
## What changes were proposed in this pull request?
This PR reverts https://github.com/apache/spark/pull/20302 because it causes a regression.
## How was this patch tested?
N/A
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20614 from gatorsmile/revertJsonFix.
(cherry picked from commit 95e4b4916065e66a4f8dba57e98e725796f75e04)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 129fd45efb418c6afa95aa26e5b96f03a39dcdd0)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/CreateJacksonParser.scala (diff)
Commit f2c0585652a8262801e92a02b56c56f16b8926e5 by wenchen
[SPARK-23419][SPARK-23416][SS] data source v2 write path should re-throw
interruption exceptions directly
## What changes were proposed in this pull request?
Streaming execution has a list of exceptions that signal interruption and handles them specially. `WriteToDataSourceV2Exec` should also respect this list and not wrap such exceptions in `SparkException`.
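To make the intent concrete, here is a minimal sketch of the pattern, not the actual `WriteToDataSourceV2Exec` code; the helper names `isInterruption` and `runWrite` and the exact exception list are illustrative assumptions:
```scala
import java.nio.channels.ClosedByInterruptException
import java.util.concurrent.ExecutionException

import org.apache.spark.SparkException

object WriteErrorHandling {
  // Hypothetical predicate: does this throwable signal an interruption
  // that streaming execution knows how to handle?
  def isInterruption(t: Throwable): Boolean = t match {
    case _: InterruptedException | _: ClosedByInterruptException => true
    case e: ExecutionException if e.getCause != null => isInterruption(e.getCause)
    case _ => false
  }

  // Hypothetical wrapper around the write job: re-throw interruptions
  // unchanged so the caller can recognize them; wrap everything else.
  def runWrite(body: => Unit): Unit = {
    try body
    catch {
      case t: Throwable if isInterruption(t) => throw t
      case t: Throwable => throw new SparkException("Writing job aborted.", t)
    }
  }
}
```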
## How was this patch tested?
Existing tests.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #20605 from cloud-fan/write.
(cherry picked from commit f38c760638063f1fb45e9ee2c772090fb203a4a0)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f2c0585652a8262801e92a02b56c56f16b8926e5)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala (diff)
Commit d24d13179f0e9d125eaaebfcc225e1ec30c5cb83 by vanzin
[SPARK-23422][CORE] YarnShuffleIntegrationSuite fix when SPARK_PREPEND_CLASSES set to 1
## What changes were proposed in this pull request?
YarnShuffleIntegrationSuite fails when SPARK_PREPEND_CLASSES is set to 1.
Normally mllib is built before the yarn module, so when SPARK_PREPEND_CLASSES
is used, mllib classes end up on the yarn test classpath.
Before 2.3 that did not cause issues, but 2.3 includes SPARK-22450, which
registered some mllib classes with the Kryo serializer. Now the suite dies
with the following error:
```
18/02/13 07:33:29 INFO SparkContext: Starting job: collect at YarnShuffleIntegrationSuite.scala:143
Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: breeze/linalg/DenseMatrix
```
In this PR, the `NoClassDefFoundError` is caught only when testing, and is
then ignored.
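A minimal sketch of that guard, assuming a `spark.testing` system-property check standing in for Spark's internal `Utils.isTesting`; this is not the actual `KryoSerializer` change:
```scala
import com.esotericsoftware.kryo.Kryo

object KryoRegistrationSketch {
  // Stand-in for Spark's internal testing flag (the `spark.testing` property).
  private val isTesting = sys.props.contains("spark.testing")

  // Register a class by name; when testing, optional classes such as
  // breeze/linalg/DenseMatrix may be missing from the classpath, so the
  // error is swallowed there and only there.
  def tryRegister(kryo: Kryo, className: String): Unit = {
    try {
      kryo.register(Class.forName(className))
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError if isTesting =>
        // Ignore: the class is not on the test classpath.
    }
  }
}
```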
## How was this patch tested?
Automated: passed Jenkins.
Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Closes #20608 from gaborgsomogyi/SPARK-23422.
(cherry picked from commit 44e20c42254bc6591b594f54cd94ced5fcfadae3)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: d24d13179f0e9d125eaaebfcc225e1ec30c5cb83)
The file was modified core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala (diff)
Commit bae4449ad836a64db853e297d33bfd1a725faa0b by gatorsmile
[SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0
## What changes were proposed in this pull request?
To prevent any regressions, this PR changes the ORC implementation back to
`hive` by default, as in Spark 2.2.X; users can still enable the `native`
ORC implementation explicitly. ORC PPD is likewise restored to `false`, as
in Spark 2.2.X.
![orc_section](https://user-images.githubusercontent.com/9700541/36221575-57a1d702-1173-11e8-89fe-dca5842f4ca7.png)
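For reference, a small sketch of how a 2.3.0 user would inspect and override these settings; the session setup is illustrative, and the commented values reflect the defaults restored by this change:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("orc-settings")
  .master("local[*]")
  .getOrCreate()

// Defaults after this change, matching Spark 2.2.X behavior:
spark.conf.get("spark.sql.orc.impl")            // "hive"
spark.conf.get("spark.sql.orc.filterPushdown")  // "false"

// Opting back in to the new reader and predicate pushdown:
spark.conf.set("spark.sql.orc.impl", "native")
spark.conf.set("spark.sql.orc.filterPushdown", "true")
```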
## How was this patch tested?
All test cases pass.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #20610 from dongjoon-hyun/SPARK-ORC-DISABLE.
(cherry picked from commit 2f0498d1e85a53b60da6a47d20bbdf56b42b7dcb)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: bae4449ad836a64db853e297d33bfd1a725faa0b)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
The file was modified docs/sql-programming-guide.md (diff)
Commit 03960faa621710388c0de91d16a71dda749d173f by gatorsmile
[MINOR][SQL] Fix an error message about inserting into bucketed tables
## What changes were proposed in this pull request?
This replaces `Sparkcurrently` with `Spark currently` in the following
error message.
```scala
scala> sql("insert into t2 select * from v1")
org.apache.spark.sql.AnalysisException: Output Hive table `default`.`t2` is bucketed but Sparkcurrently does NOT populate bucketed ...
```
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #20617 from dongjoon-hyun/SPARK-ERROR-MSG.
(cherry picked from commit 6968c3cfd70961c4e86daffd6a156d0a9c1d7a2a)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 03960faa621710388c0de91d16a71dda749d173f)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala (diff)
Commit 0bd7765cd9832bee348af87663f3d424b61e92fc by joseph
[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug
## What changes were proposed in this pull request?
#### Problem:
Since 2.3, `Bucketizer` supports multiple input/output columns, and we check
during transformation that exclusive params are not both set; e.g., if
`inputCols` and `outputCol` are both set, an error is thrown. However, when
a `Bucketizer` is written, the default params and user-supplied params are
merged. All saved params are then loaded back and set on the created
instance, so the default `outputCol` param from the `HasOutputCol` trait
ends up in `paramMap` and becomes a user-supplied param. That makes the
exclusive-params check fail.
#### Fix:
This changes the saving logic of `Bucketizer` to handle this case. It is a
quick fix to make the 2.3 release; we should consider revising the
persistence mechanism later. Please see the discussion in the JIRA.
Note: The multi-column `QuantileDiscretizer` also has the same issue.
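As a rough repro sketch (column names, splits, and the path are illustrative), the failure showed up on the save/load round trip of a multi-column `Bucketizer`:
```scala
import org.apache.spark.ml.feature.Bucketizer

// A multi-column Bucketizer, saved and then loaded back.
val bucketizer = new Bucketizer()
  .setInputCols(Array("a", "b"))
  .setOutputCols(Array("a_bucketed", "b_bucketed"))
  .setSplitsArray(Array(
    Array(Double.NegativeInfinity, 0.0, Double.PositiveInfinity),
    Array(Double.NegativeInfinity, 10.0, Double.PositiveInfinity)))

bucketizer.write.overwrite().save("/tmp/bucketizer-demo")

// Before this fix, the default `outputCol` came back as a user-set param,
// so using the loaded instance tripped the exclusive-params check.
val loaded = Bucketizer.load("/tmp/bucketizer-demo")
```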
## How was this patch tested?
Modified tests.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #20594 from viirya/SPARK-23377-2.
(cherry picked from commit db45daab90ede4c03c1abc9096f4eac584e9db17)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
(commit: 0bd7765cd9832bee348af87663f3d424b61e92fc)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala (diff)
Commit 75bb19a018f9260eab3ea0ba3ea46e84b87eabf2 by irashid
[SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stage page
## What changes were proposed in this pull request?
Fixes the exception thrown when sorting tasks by Host / Executor ID:
```
java.lang.IllegalArgumentException: Invalid sort column: Host
  at org.apache.spark.ui.jobs.ApiHelper$.indexName(StagePage.scala:1017)
  at org.apache.spark.ui.jobs.TaskDataSource.sliceData(StagePage.scala:694)
  at org.apache.spark.ui.PagedDataSource.pageData(PagedTable.scala:61)
  at org.apache.spark.ui.PagedTable$class.table(PagedTable.scala:96)
  at org.apache.spark.ui.jobs.TaskPagedTable.table(StagePage.scala:708)
  at org.apache.spark.ui.jobs.StagePage.liftedTree1$1(StagePage.scala:293)
  at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:282)
  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
  at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
  at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
  at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
```
The PR also includes some refactoring to avoid similar problems: a constant
is introduced for each header name and reused when identifying the
corresponding sorting index, as sketched below.
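A minimal sketch of that idea; the object, constants, and index names here are illustrative, not the actual `StagePage`/`ApiHelper` code:
```scala
// One constant per column header, shared by the table renderer and the
// sort-index lookup, so the two can no longer drift apart.
object TaskHeaderNames {
  val HOST = "Host"
  val EXECUTOR_ID = "Executor ID"
  val DURATION = "Duration"
}

// Resolve a requested sort column back to an index name in the task store.
def indexName(sortColumn: String): String = sortColumn match {
  case TaskHeaderNames.HOST        => "hst"
  case TaskHeaderNames.EXECUTOR_ID => "exe"
  case TaskHeaderNames.DURATION    => "dur"
  case other =>
    throw new IllegalArgumentException(s"Invalid sort column: $other")
}
```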
## How was this patch tested?
Manually:
![screen shot 2018-02-13 at 18 57 10](https://user-images.githubusercontent.com/2017933/36166532-1cfdf3b8-10f3-11e8-8d32-5fcaad2af214.png)
(cherry picked from commit 1dc2c1d5e85c5f404f470aeb44c1f3c22786bdea)
Author: “attilapiros” <piros.attila.zsolt@gmail.com>
Closes #20623 from squito/fix_backport.
(commit: 75bb19a018f9260eab3ea0ba3ea46e84b87eabf2)
The file was modified core/src/main/scala/org/apache/spark/status/storeTypes.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/StagePageSuite.scala (diff)