SuccessChanges

Summary

  1. [SPARK-23020] Ignore Flaky Test: SparkLauncherSuite.testInProcessLauncher (commit: 1a6dfaf25f507545debdf4cb1d427b9cc78c3cc8)
  2. [SPARK-23033][SS] Don't use task level retry for continuous processing (commit: dbd2a5566d8924ab340c3c840d31e83e5af92242)
  3. [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query (commit: 79ccd0cadf09c41c0f4b5853a54798be17a20584)
  4. [SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in ArrowColumnVector (commit: 6e509fde3f056316f46c71b672a7d69adb1b4f8e)
  5. [SPARK-23132][PYTHON][ML] Run doctests in ml.image when testing (commit: b84c2a30665ebbd65feb7418826501f6c959eb96)
  6. [SPARK-23119][SS] Minor fixes to V2 streaming APIs (commit: 9783aea2c75700e7ce9551ccfd33e43765de8981)
  7. [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins (commit: 050c1e24e506ff224bcf4e3e458e57fbd216765c)
  8. [SPARK-21996][SQL] read files with space in name for streaming (commit: f2688ef0fbd9d355d13ce4056d35e99970f4cd47)
  9. [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark (commit: 3a80cc59b54bb8e92df507777e836f167a4db14e)
  10. [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api. (commit: 2a87c3a77cbe40cbe5a8bdef41e3c37a660e2308)
  11. [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Session state's planner (commit: f801ac417ba13a975887ba83904ee771bc3a003e)
  12. [SPARK-22036][SQL] Decimal multiplication with high precision/scale often returns NULL (commit: 8a98274823a4671cee85081dd19f40146e736325)
  13. [SPARK-23141][SQL][PYSPARK] Support data type string as a returnType for registerJavaFunction (commit: e0421c65093f66b365539358dd9be38d2006fa47)
  14. [SPARK-23147][UI] Fix task page table IndexOutOfBound Exception (commit: bd0a1627b9396c69dbe3554e6ca6c700eeb08f74)
  15. [SPARK-23029][DOCS] Specifying default units of configuration entries (commit: bfdbdd37951a872676a22b0524cbde12a1df418d)
  16. [SPARK-23143][SS][PYTHON] Added python API for setting continuous trigger (commit: e6e8bbe84625861f3a4834a2d71cb2f0fe7f6b5a)
  17. [SPARK-23144][SS] Added console sink for continuous processing (commit: 1f88fcd41c6c5521d732b25e83d6c9d150d7f24a)
  18. [SPARK-23133][K8S] Fix passing java options to Executor (commit: b8c6d9303d029f6bf8ee43bae3f159112eb0fb79)
  19. [SPARK-23094] Fix invalid character handling in JsonDataSource (commit: a295034da6178f8654c3977903435384b3765b5e)
  20. [SPARK-22962][K8S] Fail fast if submission client local files are used (commit: 7057e310ab3756c83c13586137e8390fe9ef7e9a)
  21. [SPARK-23142][SS][DOCS] Added docs for continuous processing (commit: acf3b70d16cc4d2416b4ce3f42b3cf95836170ed)
  22. [DOCS] change to dataset for java code in structured-streaming-kafka-integration document (commit: 225b1afdd1582cd4087e7cb98834505eaf16743e)
  23. [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType casting when casting PythonUserDefinedType to String (commit: 541dbc00b24f17d83ea2531970f2e9fe57fe3718)
  24. [BUILD][MINOR] Fix java style check issues (commit: 54c1fae12df654c7713ac5e7eb4da7bb2f785401)
  25. [SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter (commit: e58223171ecae6450482aadf4e7994c3b8d8a58d)
  26. [SPARK-23048][ML] Add OneHotEncoderEstimator document and examples (commit: ef7989d55b65f386ed1ab87535a44e9367029a52)
  27. [SPARK-23089][STS] Recreate session log directory if it doesn't exist (commit: b7a81999df8f43223403c77db9c1aedddb58370d)
  28. [SPARK-23000][TEST] Keep Derby DB Location Unchanged After Session Cloning (commit: 8d6845cf926a14e21ca29a43f2cc9a3a9475afd5)
  29. [SPARK-23149][SQL] polish ColumnarBatch (commit: 55efeffd774a776806f379df5b2209af05270cc4)
  30. [SPARK-23104][K8S][DOCS] Changes to Kubernetes scheduler documentation (commit: ffe45913d0c666185f8c252be30b5e269a909c07)
  31. [SPARK-20664][CORE] Delete stale application data from SHS. (commit: 4b79514c90ca76674d17fd80d125e9dbfb0e845e)
  32. [SPARK-23103][CORE] Ensure correct sort order for negative values in LevelDB. (commit: d0cb19873bb325be7e31de62b0ba117dd6b92619)
  33. [SPARK-23135][UI] Fix rendering of accumulators in the stage page. (commit: f9ad00a5aeeecf4b8d261a0dae6c8cb6be8daa67)
  34. [SPARK-21771][SQL] remove useless hive client in SparkSQLEnv (commit: c647f918b1aee27d7a53852aca74629f03ad49f6)
  35. [SPARK-23091][ML] Incorrect unit test for approxQuantile (commit: 0cde5212a80b5572bfe53b06ed557e6c2ec8c903)
  36. [SPARK-23165][DOC] Spelling mistake fix in quick-start doc. (commit: e11d5eaf79ffccbe3a5444a5b9ecf3a203e1fc90)
  37. [SPARK-21786][SQL] The 'spark.sql.parquet.compression.codec' and 'spark.sql.orc.compression.codec' configuration doesn't take effect on hive table writing (commit: b9c1367b7d9240070c5d83572dc7b43c7480b456)
  38. [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null (commit: e0ef30f770329f058843a7a486bf357e9cd6e26a)
  39. [SPARK-21293][SS][SPARKR] Add doc example for streaming join, dedup (commit: 7520491bf80eb2e21f0630aa13d7cdaad881626b)
  40. [SPARK-22976][CORE] Cluster mode driver dir removed while running (commit: 5781fa79e28e2123e370fc1096488e318f2b4ee2)
  41. [MINOR][SQL] Fix wrong comments on org.apache.spark.sql.parquet.row.attributes (commit: 36af73b59b6fb3d5f8e8a8e1caf44bd565e97b3d)
  42. [SPARK-23020][CORE] Fix races in launcher code, test. (commit: 57c320a0dcc6ca784331af0191438e252d418075)
  43. [MINOR][DOC] Fix the path to the examples jar (commit: cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7)
  44. [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration (commit: 743b9173f8feaed8e594961aa85d61fb3f8e5e70)
  45. [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules (commit: d933fcea6f3b1d2a5bfb03d808ec83db0f97298a)
  46. [MINOR][SQL][TEST] Test case cleanups for recent PRs (commit: 1069fad41fb6896fef4245e6ae6b5ba36115ad68)
  47. [SPARK-23090][SQL] polish ColumnVector (commit: d963ba031748711ec7847ad0b702911eb7319c63)
Commit 1a6dfaf25f507545debdf4cb1d427b9cc78c3cc8 by sameerag
[SPARK-23020] Ignore Flaky Test:
SparkLauncherSuite.testInProcessLauncher
## What changes were proposed in this pull request?
Temporarily ignoring flaky test
`SparkLauncherSuite.testInProcessLauncher` to de-flake the builds. This
should be re-enabled when SPARK-23020 is merged.
## How was this patch tested?
N/A (Test Only Change)
Author: Sameer Agarwal <sameerag@apache.org>
Closes #20291 from sameeragarwal/disable-test-2.
(cherry picked from commit c132538a164cd8b55dbd7e8ffdc0c0782a0b588c)
Signed-off-by: Sameer Agarwal <sameerag@apache.org>
(commit: 1a6dfaf25f507545debdf4cb1d427b9cc78c3cc8)
The file was modified core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java (diff)
Commit dbd2a5566d8924ab340c3c840d31e83e5af92242 by tathagata.das1565
[SPARK-23033][SS] Don't use task level retry for continuous processing
## What changes were proposed in this pull request?
Continuous processing tasks will fail on any attempt number greater than
0. ContinuousExecution will catch these failures and restart globally
from the last recorded checkpoints.
## How was this patch tested?
Unit test.
Author: Jose Torres <jose@databricks.com>
Closes #20225 from jose-torres/no-retry.
(cherry picked from commit 86a845031824a5334db6a5299c6f5dcc982bc5b8)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: dbd2a5566d8924ab340c3c840d31e83e5af92242)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDDIter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTaskRetryException.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala (diff)
Commit 79ccd0cadf09c41c0f4b5853a54798be17a20584 by zsxwing
[SPARK-23093][SS] Don't change run id when reconfiguring a continuous
processing query.
## What changes were proposed in this pull request?
Keep the run ID static, using a different ID for the epoch coordinator
to avoid cross-execution message contamination.
## How was this patch tested?
new and existing unit tests
Author: Jose Torres <jose@databricks.com>
Closes #20282 from jose-torres/fix-runid.
(cherry picked from commit e946c63dd56d121cf898084ed7e9b5b0868b226e)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: 79ccd0cadf09c41c0f4b5853a54798be17a20584)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochCoordinator.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDDIter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala (diff)
Commit 6e509fde3f056316f46c71b672a7d69adb1b4f8e by hyukjinkwon
[SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in
ArrowColumnVector
## What changes were proposed in this pull request?
This PR changes usage of `MapVector` in the Spark codebase to use `NullableMapVector`. `MapVector` is an internal Arrow class that is not supposed to be used directly; we should use `NullableMapVector` instead.
## How was this patch tested?
Existing test.
Author: Li Jin <ice.xelloss@gmail.com>
Closes #20239 from icexelloss/arrow-map-vector.
(cherry picked from commit 4e6f8fb150ae09c7d1de6beecb2b98e5afa5da19)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
(commit: 6e509fde3f056316f46c71b672a7d69adb1b4f8e)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ArrowColumnVectorSuite.scala (diff)
Commit b84c2a30665ebbd65feb7418826501f6c959eb96 by hyukjinkwon
[SPARK-23132][PYTHON][ML] Run doctests in ml.image when testing
## What changes were proposed in this pull request?
This PR proposes to actually run the doctests in `ml/image.py`.
## How was this patch tested?
doctests in `python/pyspark/ml/image.py`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #20294 from HyukjinKwon/trigger-image.
(cherry picked from commit 45ad97df87c89cb94ce9564e5773897b6d9326f5)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
(commit: b84c2a30665ebbd65feb7418826501f6c959eb96)
The file was modified python/pyspark/ml/image.py (diff)
Commit 9783aea2c75700e7ce9551ccfd33e43765de8981 by zsxwing
[SPARK-23119][SS] Minor fixes to V2 streaming APIs
## What changes were proposed in this pull request?
- Added `InterfaceStability.Evolving` annotations
- Improved docs.
## How was this patch tested?
Existing tests.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #20286 from tdas/SPARK-23119.
(cherry picked from commit bac0d661af6092dd26638223156827aceb901229)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: 9783aea2c75700e7ce9551ccfd33e43765de8981)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/ContinuousReadSupport.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/reader/PartitionOffset.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/reader/Offset.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/reader/MicroBatchReader.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/reader/ContinuousReader.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/reader/ContinuousDataReader.java (diff)
Commit 050c1e24e506ff224bcf4e3e458e57fbd216765c by zsxwing
[SPARK-23064][DOCS][SS] Added documentation for stream-stream joins
## What changes were proposed in this pull request?
Added documentation for stream-stream joins.
![image](https://user-images.githubusercontent.com/663212/35018744-e999895a-fad7-11e7-9d6a-8c7a73e6eb9c.png)
![image](https://user-images.githubusercontent.com/663212/35018775-157eb464-fad8-11e7-879e-47a2fcbd8690.png)
![image](https://user-images.githubusercontent.com/663212/35018784-27791a24-fad8-11e7-98f4-7ff246f62a74.png)
![image](https://user-images.githubusercontent.com/663212/35018791-36a80334-fad8-11e7-9791-f85efa7c6ba2.png)
## How was this patch tested?
N/A
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #20255 from tdas/join-docs.
(cherry picked from commit 1002bd6b23ff78a010ca259ea76988ef4c478c6e)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: 050c1e24e506ff224bcf4e3e458e57fbd216765c)
The file was modified docs/structured-streaming-programming-guide.md (diff)
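To illustrate the feature these docs cover, here is a minimal PySpark sketch of a stream-stream join in the spirit of the new guide section; the rate sources and column names are illustrative, and an active `spark` session is assumed:
```python
from pyspark.sql.functions import expr

# Two rate sources stand in for real impression/click streams.
impressions = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
               .selectExpr("value AS adId", "timestamp AS impressionTime"))
clicks = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
          .selectExpr("value AS clickAdId", "timestamp AS clickTime"))

# Inner join with watermarks on both sides and a time-range condition.
joined = (impressions.withWatermark("impressionTime", "10 seconds")
          .join(clicks.withWatermark("clickTime", "10 seconds"),
                expr("""clickAdId = adId AND
                        clickTime >= impressionTime AND
                        clickTime <= impressionTime + interval 1 minute""")))
```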
Commit f2688ef0fbd9d355d13ce4056d35e99970f4cd47 by zsxwing
[SPARK-21996][SQL] read files with space in name for streaming
## What changes were proposed in this pull request?
Structured streaming is now able to read files with space in file name
(previously it would skip the file and output a warning)
## How was this patch tested?
Added new unit test.
Author: Xiayun Sun <xiayunsun@gmail.com>
Closes #19247 from xysun/SPARK-21996.
(cherry picked from commit 02194702068291b3af77486d01029fb848c36d7b)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: f2688ef0fbd9d355d13ce4056d35e99970f4cd47)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (diff)
Commit 3a80cc59b54bb8e92df507777e836f167a4db14e by ueshin
[SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext
and Catalog in PySpark
## What changes were proposed in this pull request?
This PR proposes to deprecate `register*` for UDFs in `SQLContext` and
`Catalog` in Spark 2.3.0.
These are inconsistent with Scala / Java APIs and also these basically
do the same things with `spark.udf.register*`.
Also, this PR moves the logic from
`[sqlContext|spark.catalog].register*` to `spark.udf.register*` and
reuses the docstring.
This PR also handles minor doc corrections. It also includes
https://github.com/apache/spark/pull/20158
## How was this patch tested?
Manually tested, manually checked the API documentation and tests added
to check if deprecated APIs call the aliases correctly.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #20288 from HyukjinKwon/deprecate-udf.
(cherry picked from commit 39d244d921d8d2d3ed741e8e8f1175515a74bdbd)
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 3a80cc59b54bb8e92df507777e836f167a4db14e)
The file was modified dev/sparktestsupport/modules.py (diff)
The file was modified python/pyspark/sql/group.py (diff)
The file was modified python/pyspark/sql/tests.py (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/udf.py (diff)
The file was modified python/pyspark/sql/context.py (diff)
The file was modified python/pyspark/sql/catalog.py (diff)
The file was modified python/pyspark/sql/session.py (diff)
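For context, a minimal sketch of the registration path preferred after this change, assuming an active `spark` session:
```python
from pyspark.sql.types import IntegerType

# Preferred going forward: register UDFs through spark.udf.*
spark.udf.register("plus_one", lambda x: x + 1, IntegerType())
spark.sql("SELECT plus_one(41)").show()

# The old aliases, e.g. spark.catalog.registerFunction(...) and
# sqlContext.registerFunction(...), still work but now emit deprecation warnings.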
Commit 2a87c3a77cbe40cbe5a8bdef41e3c37a660e2308 by tathagata.das1565
[SPARK-23052][SS] Migrate ConsoleSink to data source V2 api.
## What changes were proposed in this pull request?
Migrate ConsoleSink to data source V2 api.
Note that this includes a missing piece in DataStreamWriter required to
specify a data source V2 writer.
Note also that I've removed the "Rerun batch" part of the sink, because
as far as I can tell this would never have actually happened. A
MicroBatchExecution object will only commit each batch once for its
lifetime, and a new MicroBatchExecution object would have a new
ConsoleSink object which doesn't know it's retrying a batch. So I think
this represents an anti-feature rather than a weakness in the V2 API.
## How was this patch tested?
new unit test
Author: Jose Torres <jose@databricks.com>
Closes #20243 from jose-torres/console-sink.
(cherry picked from commit 1c76a91e5fae11dcb66c453889e587b48039fdc9)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: 2a87c3a77cbe40cbe5a8bdef41e3c37a660e2308)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriter.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/PackedRowWriterFactory.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamReaderWriterSuite.scala (diff)
The file was modified sql/core/src/test/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
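User-facing usage of the console sink is unchanged by the migration; a minimal sketch, assuming an active `spark` session:
```python
# The console sink now goes through the data source V2 writer path.
query = (spark.readStream.format("rate").load()
         .writeStream.format("console")
         .option("numRows", 5)   # console sink option
         .start())
query.awaitTermination(10)
query.stop()
```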
Commit f801ac417ba13a975887ba83904ee771bc3a003e by wenchen
[SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Session state's
planner
## What changes were proposed in this pull request?
`DataSourceV2Strategy` is missing in `HiveSessionStateBuilder`'s
planner, which will throw exception as described in
[SPARK-23140](https://issues.apache.org/jira/browse/SPARK-23140).
## How was this patch tested?
Manual test.
Author: jerryshao <sshao@hortonworks.com>
Closes #20305 from jerryshao/SPARK-23140.
(cherry picked from commit 7a2248341396840628eef398aa512cac3e3bd55f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f801ac417ba13a975887ba83904ee771bc3a003e)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala (diff)
Commit 8a98274823a4671cee85081dd19f40146e736325 by wenchen
[SPARK-22036][SQL] Decimal multiplication with high precision/scale
often returns NULL
## What changes were proposed in this pull request?
When there is an operation between Decimals and the result is a number
which is not representable exactly with the result's precision and
scale, Spark is returning `NULL`. This was done to reflect Hive's
behavior, but it is against SQL ANSI 2011, which states that "If the
result cannot be represented exactly in the result type, then whether it
is rounded or truncated is implementation-defined". Moreover, Hive now
changed its behavior in order to respect the standard, thanks to
HIVE-15331.
Therefore, this PR proposes to:
- update the rules that determine the result precision and scale according to the new Hive rules introduced in HIVE-15331;
- round the result of an operation when it is not representable exactly with the result's precision and scale, instead of returning `NULL`;
- introduce a new config `spark.sql.decimalOperations.allowPrecisionLoss`, which defaults to `true` (i.e. the new behavior), to allow users to switch back to the previous one.
Hive's behavior mirrors SQL Server's. The only difference is that Hive
adjusts the precision and scale for all arithmetic operations, while SQL
Server's documentation says it does so only for multiplications and
divisions. This PR follows Hive's behavior.
A more detailed explanation is available here:
https://mail-archives.apache.org/mod_mbox/spark-dev/201712.mbox/%3CCAEorWNAJ4TxJR9NBcgSFMD_VxTg8qVxusjP%2BAJP-x%2BJV9zH-yA%40mail.gmail.com%3E.
## How was this patch tested?
modified and added UTs. Comparisons with results of Hive and SQLServer.
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20023 from mgaido91/SPARK-22036.
(cherry picked from commit e28eb431146bcdcaf02a6f6c406ca30920592a6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8a98274823a4671cee85081dd19f40146e736325)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified docs/sql-programming-guide.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/decimalArithmeticOperations.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/decimalPrecision.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala (diff)
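A small sketch of the new behavior, assuming an active `spark` session (the literal values are illustrative):
```python
# Default (new) behavior: round rather than return NULL when the exact
# result does not fit the result type's precision and scale.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql("""SELECT CAST('12345678901234567890.123' AS DECIMAL(38, 18)) *
                    CAST('1.2345' AS DECIMAL(38, 18))""").show(truncate=False)

# Opting out restores the pre-2.3 behavior (NULL when not representable).
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
```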
Commit e0421c65093f66b365539358dd9be38d2006fa47 by hyukjinkwon
[SPARK-23141][SQL][PYSPARK] Support data type string as a returnType for
registerJavaFunction.
## What changes were proposed in this pull request?
Currently `UDFRegistration.registerJavaFunction` doesn't support a data
type string as its `returnType`, whereas `UDFRegistration.register`,
`udf`, and `pandas_udf` do. We can support it for
`UDFRegistration.registerJavaFunction` as well.
## How was this patch tested?
Added a doctest and existing tests.
Author: Takuya UESHIN <ueshin@databricks.com>
Closes #20307 from ueshin/issues/SPARK-23141.
(cherry picked from commit 5063b7481173ad72bd0dc941b5cf3c9b26a591e4)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
(commit: e0421c65093f66b365539358dd9be38d2006fa47)
The file was modified python/pyspark/sql/udf.py (diff)
The file was modified python/pyspark/sql/functions.py (diff)
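A sketch of the new call form; `test.org.apache.spark.sql.JavaStringLength` is the class used in Spark's own doctests and must be on the classpath for this to run:
```python
# returnType can now be a DDL-formatted type string instead of a DataType.
spark.udf.registerJavaFunction(
    "javaStringLength",
    "test.org.apache.spark.sql.JavaStringLength",
    "integer")
spark.sql("SELECT javaStringLength('test')").show()
```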
Commit bd0a1627b9396c69dbe3554e6ca6c700eeb08f74 by vanzin
[SPARK-23147][UI] Fix task page table IndexOutOfBound Exception
## What changes were proposed in this pull request?
The stage's task page table will throw an exception when there are no
completed tasks. Furthermore, because `dataSize` doesn't take running
tasks into account, the UI sometimes cannot show the running tasks.
Besides, the table is only displayed once the first task is finished,
according to the default sortColumn ("index").
![screen shot 2018-01-18 at 8 50 08 pm](https://user-images.githubusercontent.com/850797/35100052-470b4cae-fc95-11e7-96a2-ad9636e732b3.png)
To reproduce this issue, try `sc.parallelize(1 to 20, 20).map { i =>
Thread.sleep(10000); i }.collect()` or `sc.parallelize(1 to 20, 20).map
{ i => Thread.sleep((20 - i) * 1000); i }.collect`.
This PR proposes a solution to fix it. Not sure if it is the right fix;
please help to review.
## How was this patch tested?
Manual test.
Author: jerryshao <sshao@hortonworks.com>
Closes #20315 from jerryshao/SPARK-23147.
(cherry picked from commit cf7ee1767ddadce08dce050fc3b40c77cdd187da)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: bd0a1627b9396c69dbe3554e6ca6c700eeb08f74)
The file was modified core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala (diff)
Commit bfdbdd37951a872676a22b0524cbde12a1df418d by sowen
[SPARK-23029][DOCS] Specifying default units of configuration entries
## What changes were proposed in this pull request?
This PR completes the docs, specifying the default units assumed in
configuration entries of type size. This is crucial since unit-less
values are accepted and the user might assume the base unit is bytes,
which in most cases it is not, leading to hard-to-debug problems.
## How was this patch tested?
This patch updates documentation only.
Author: Fernando Pereira <fernando.pereira@epfl.ch>
Closes #20269 from ferdonline/docs_units.
(cherry picked from commit 9678941f54ebc5db935ed8d694e502086e2a31c0)
Signed-off-by: Sean Owen <sowen@cloudera.com>
(commit: bfdbdd37951a872676a22b0524cbde12a1df418d)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified docs/configuration.md (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkConf.scala (diff)
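A sketch of what the clarified docs encourage: always spell out units rather than relying on each entry's default unit (the property values here are illustrative):
```python
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.driver.memory", "2g")               # size with explicit unit
        .set("spark.kryoserializer.buffer.max", "64m")  # unit-less values use the entry's default unit, not bytes
        .set("spark.network.timeout", "120s"))          # duration with explicit unit
```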
Commit e6e8bbe84625861f3a4834a2d71cb2f0fe7f6b5a by tathagata.das1565
[SPARK-23143][SS][PYTHON] Added python API for setting continuous
trigger
## What changes were proposed in this pull request?
Self-explanatory.
## How was this patch tested?
New Python tests.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #20309 from tdas/SPARK-23143.
(cherry picked from commit 2d41f040a34d6483919fd5d491cf90eee5429290)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: e6e8bbe84625861f3a4834a2d71cb2f0fe7f6b5a)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified python/pyspark/sql/tests.py (diff)
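The new Python API mirrors the Scala one; a minimal sketch, assuming an active `spark` session (the console continuous writer from SPARK-23144, also in this build, makes it runnable end to end):
```python
query = (spark.readStream.format("rate").load()
         .writeStream.format("console")
         .trigger(continuous="1 second")   # new keyword added by this change
         .start())
```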
Commit 1f88fcd41c6c5521d732b25e83d6c9d150d7f24a by tathagata.das1565
[SPARK-23144][SS] Added console sink for continuous processing
## What changes were proposed in this pull request?
Refactored ConsoleWriter into ConsoleMicrobatchWriter and ConsoleContinuousWriter.
## How was this patch tested?
New unit test.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #20311 from tdas/SPARK-23144.
(cherry picked from commit bf34d665b9c865e00fac7001500bf6d521c2dff9)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: 1f88fcd41c6c5521d732b25e83d6c9d150d7f24a)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala (diff)
Commit b8c6d9303d029f6bf8ee43bae3f159112eb0fb79 by vanzin
[SPARK-23133][K8S] Fix passing java options to Executor
Pass through Spark Java options to the executor in the context of the
Docker image. Closes #20296
andrusha: Deployed two versions of containers to local k8s, checked that
Java options were present in the updated image on the running executor.
Manual test
Author: Andrew Korzhuev <korzhuev@andrusha.me>
Closes #20322 from foxish/patch-1.
(cherry picked from commit f568e9cf76f657d094f1d036ab5a95f2531f5761)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: b8c6d9303d029f6bf8ee43bae3f159112eb0fb79)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh (diff)
Commit a295034da6178f8654c3977903435384b3765b5e by hyukjinkwon
[SPARK-23094] Fix invalid character handling in JsonDataSource
## What changes were proposed in this pull request?
There were two related fixes regarding `from_json`, `get_json_object`
and `json_tuple` ([Fix
#1](https://github.com/apache/spark/commit/c8803c06854683c8761fdb3c0e4c55d5a9e22a95),
[Fix
#2](https://github.com/apache/spark/commit/86174ea89b39a300caaba6baffac70f3dc702788)),
but they weren't comprehensive it seems. I wanted to extend those fixes
to all the parsers, and add tests for each case.
## How was this patch tested?
Regression tests
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #20302 from brkyvz/json-invfix.
(cherry picked from commit e01919e834d301e13adc8919932796ebae900576)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
(commit: a295034da6178f8654c3977903435384b3765b5e)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/CreateJacksonParser.scala (diff)
Commit 7057e310ab3756c83c13586137e8390fe9ef7e9a by vanzin
[SPARK-22962][K8S] Fail fast if submission client local files are used
## What changes were proposed in this pull request?
In the Kubernetes mode, fails fast in the submission process if any
submission client local dependencies are used as the use case is not
supported yet.
## How was this patch tested?
Unit tests, integration tests, and manual tests.
vanzin foxish
Author: Yinan Li <liyinan926@gmail.com>
Closes #20320 from liyinan926/master.
(cherry picked from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: 7057e310ab3756c83c13586137e8390fe9ef7e9a)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestrator.scala (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala (diff)
Commit acf3b70d16cc4d2416b4ce3f42b3cf95836170ed by tathagata.das1565
[SPARK-23142][SS][DOCS] Added docs for continuous processing
## What changes were proposed in this pull request?
Added documentation for continuous processing. Modified two locations.
- Modified the overview to have a mention of Continuous Processing.
- Added a new section on Continuous Processing at the end.
![image](https://user-images.githubusercontent.com/663212/35083551-a3dd23f6-fbd4-11e7-9e7e-90866f131ca9.png)
![image](https://user-images.githubusercontent.com/663212/35083618-d844027c-fbd4-11e7-9fde-75992cc517bd.png)
## How was this patch tested?
N/A
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #20308 from tdas/SPARK-23142.
(cherry picked from commit 4cd2ecc0c7222fef1337e04f1948333296c3be86)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: acf3b70d16cc4d2416b4ce3f42b3cf95836170ed)
The file was modified docs/structured-streaming-programming-guide.md (diff)
Commit 225b1afdd1582cd4087e7cb98834505eaf16743e by sowen
[DOCS] change to dataset for java code in
structured-streaming-kafka-integration document
## What changes were proposed in this pull request?
In the latest structured-streaming-kafka-integration document, the Java
code example for Kafka integration is using `DataFrame<Row>`; shouldn't
it be changed to `Dataset<Row>`?
## How was this patch tested?
A manual test has been performed with the updated example Java code in
Spark 2.2.1 with Kafka 1.0.
Author: brandonJY <brandonJY@users.noreply.github.com>
Closes #20312 from brandonJY/patch-2.
(cherry picked from commit 6121e91b7f5c9513d68674e4d5edbc3a4a5fd5fd)
Signed-off-by: Sean Owen <sowen@cloudera.com>
(commit: 225b1afdd1582cd4087e7cb98834505eaf16743e)
The file was modified docs/structured-streaming-kafka-integration.md (diff)
Commit 541dbc00b24f17d83ea2531970f2e9fe57fe3718 by wenchen
[SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType casting when casting
PythonUserDefinedType to String.
## What changes were proposed in this pull request?
This is a follow-up of #20246.
If a UDT in Python doesn't have its corresponding Scala UDT, cast to
string will be the raw string of the internal value, e.g.
`"org.apache.spark.sql.catalyst.expressions.UnsafeArrayDataxxxxxxxx"` if
the internal type is `ArrayType`.
This pr fixes it by using its `sqlType` casting.
## How was this patch tested?
Added a test and existing tests.
Author: Takuya UESHIN <ueshin@databricks.com>
Closes #20306 from ueshin/issues/SPARK-23054/fup1.
(cherry picked from commit 568055da93049c207bb830f244ff9b60c638837c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 541dbc00b24f17d83ea2531970f2e9fe57fe3718)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala (diff)
The file was modified python/pyspark/sql/tests.py (diff)
Commit 54c1fae12df654c7713ac5e7eb4da7bb2f785401 by sameerag
[BUILD][MINOR] Fix java style check issues
## What changes were proposed in this pull request?
This patch fixes a few recently introduced java style check errors in
master and release branch.
As an aside, given that [java linting currently
fails](https://github.com/apache/spark/pull/10763
) on machines with a clean maven cache, it'd be great to find another
workaround to [re-enable the java style
checks](https://github.com/apache/spark/blob/3a07eff5af601511e97a05e6fea0e3d48f74c4f0/dev/run-tests.py#L577)
as part of Spark PRB.
/cc zsxwing JoshRosen srowen for any suggestions
## How was this patch tested?
Manual Check
Author: Sameer Agarwal <sameerag@apache.org>
Closes #20323 from sameeragarwal/java.
(cherry picked from commit 9c4b99861cda3f9ec44ca8c1adc81a293508190c)
Signed-off-by: Sameer Agarwal <sameerag@apache.org>
(commit: 54c1fae12df654c7713ac5e7eb4da7bb2f785401)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaBatchDataSourceV2.java (diff)
Commit e58223171ecae6450482aadf4e7994c3b8d8a58d by nickp
[SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols
parameter
Update user guide entry for `FeatureHasher` to match the Scala / Python
doc, to describe the `categoricalCols` parameter.
## How was this patch tested?
Doc only
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #20293 from MLnick/SPARK-23127-catCol-userguide.
(cherry picked from commit 60203fca6a605ad158184e1e0ce5187e144a3ea7)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
(commit: e58223171ecae6450482aadf4e7994c3b8d8a58d)
The file was modified docs/ml-features.md (diff)
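A short PySpark sketch of the parameter the guide now documents, assuming an active `spark` session (data values are illustrative):
```python
from pyspark.ml.feature import FeatureHasher

df = spark.createDataFrame(
    [(2.0, True, "1", "foo"), (3.0, False, "2", "bar")],
    ["real", "bool", "stringNum", "string"])

# categoricalCols forces the numeric "real" column to be hashed as categorical.
hasher = FeatureHasher(inputCols=["real", "bool", "stringNum", "string"],
                       outputCol="features",
                       categoricalCols=["real"])
hasher.transform(df).select("features").show(truncate=False)
```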
Commit ef7989d55b65f386ed1ab87535a44e9367029a52 by nickp
[SPARK-23048][ML] Add OneHotEncoderEstimator document and examples
## What changes were proposed in this pull request?
We have `OneHotEncoderEstimator` now, and `OneHotEncoder` will be
deprecated as of 2.3.0, so we should add `OneHotEncoderEstimator` to the
MLlib documentation.
We also need to provide corresponding examples for
`OneHotEncoderEstimator`, which are used in the document too.
## How was this patch tested?
Existing tests.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #20257 from viirya/SPARK-23048.
(cherry picked from commit b74366481cc87490adf4e69d26389ec737548c15)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
(commit: ef7989d55b65f386ed1ab87535a44e9367029a52)
The file was removed examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderExample.scala
The file was removed examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderExample.java
The file was removed examples/src/main/python/ml/onehot_encoder_example.py
The file was added examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderEstimatorExample.scala
The file was added examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java
The file was modified docs/ml-features.md (diff)
The file was added examples/src/main/python/ml/onehot_encoder_estimator_example.py
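A minimal PySpark sketch of the estimator the new document and examples cover, assuming an active `spark` session (data is illustrative):
```python
from pyspark.ml.feature import OneHotEncoderEstimator

df = spark.createDataFrame([(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)],
                           ["categoryIndex1", "categoryIndex2"])

# Unlike the deprecated OneHotEncoder, the estimator handles multiple
# columns at once and must be fitted to learn the category sizes.
encoder = OneHotEncoderEstimator(inputCols=["categoryIndex1", "categoryIndex2"],
                                 outputCols=["categoryVec1", "categoryVec2"])
encoder.fit(df).transform(df).show()
```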
Commit b7a81999df8f43223403c77db9c1aedddb58370d by wenchen
[SPARK-23089][STS] Recreate session log directory if it doesn't exist
## What changes were proposed in this pull request?
When creating a session directory, Thrift should create the parent
directory (i.e. /tmp/base_session_log_dir) if it is not present. It is
common that many tools delete empty directories, so the directory may be
deleted. This can cause the session log to be disabled.
This was fixed in HIVE-12262: this PR brings it in Spark too.
## How was this patch tested?
manual tests
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20281 from mgaido91/SPARK-23089.
(cherry picked from commit e41400c3c8aace9eb72e6134173f222627fb0faf)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: b7a81999df8f43223403c77db9c1aedddb58370d)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java (diff)
Commit 8d6845cf926a14e21ca29a43f2cc9a3a9475afd5 by wenchen
[SPARK-23000][TEST] Keep Derby DB Location Unchanged After Session
Cloning
## What changes were proposed in this pull request?
After session cloning in `TestHive`, the conf of the singleton
SparkContext for the Derby DB location is changed to a new directory.
The new directory is created in
`HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false)`.
This PR keeps the conf value of
`ConfVars.METASTORECONNECTURLKEY.varname` unchanged during the session
clone.
## How was this patch tested?
The issue can be reproduced by the command:
> build/sbt -Phive "hive/test-only
org.apache.spark.sql.hive.HiveSessionStateSuite
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite"
Also added a test case.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20328 from gatorsmile/fixTestFailure.
(cherry picked from commit 6c39654efcb2aa8cb4d082ab7277a6fa38fb48e4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8d6845cf926a14e21ca29a43f2cc9a3a9475afd5)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSessionStateSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala (diff)
Commit 55efeffd774a776806f379df5b2209af05270cc4 by gatorsmile
[SPARK-23149][SQL] polish ColumnarBatch
## What changes were proposed in this pull request?
Several cleanups in `ColumnarBatch`
* remove `schema`. The `ColumnVector`s inside `ColumnarBatch` already
have the data type information, we don't need this `schema`.
* remove `capacity`. `ColumnarBatch` is just a wrapper of
`ColumnVector`s, not builders, it doesn't need a capacity property.
* remove `DEFAULT_BATCH_SIZE`. As a wrapper, `ColumnarBatch` can't
decide the batch size, it should be decided by the reader, e.g. parquet
reader, orc reader, cached table reader. The default batch size should
also be defined by the reader.
## How was this patch tested?
existing tests.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #20316 from cloud-fan/columnar-batch.
(cherry picked from commit d8aaa771e249b3f54b57ce24763e53fd65a0dbf7)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 55efeffd774a776806f379df5b2209af05270cc4)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaBatchDataSourceV2.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala (diff)
Commit ffe45913d0c666185f8c252be30b5e269a909c07 by vanzin
[SPARK-23104][K8S][DOCS] Changes to Kubernetes scheduler documentation
## What changes were proposed in this pull request?
Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation
- Clarifying that node selectors can be used right away, and other minor
cosmetic changes
## How was this patch tested?
Docs only change
Author: foxish <ramanathana@google.com>
Closes #20314 from foxish/ambiguous-docs.
(cherry picked from commit 73d3b230f3816a854a181c0912d87b180e347271)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: ffe45913d0c666185f8c252be30b5e269a909c07)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified docs/cluster-overview.md (diff)
Commit 4b79514c90ca76674d17fd80d125e9dbfb0e845e by irashid
[SPARK-20664][CORE] Delete stale application data from SHS.
Detect the deletion of event log files from storage, and remove data
about the related application attempt in the SHS.
Also contains code to fix SPARK-21571 based on code by ericvandenbergfb.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20138 from vanzin/SPARK-20664.
(cherry picked from commit fed2139f053fac4a9a6952ff0ab1cc2a5f657bd0)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
(commit: 4b79514c90ca76674d17fd80d125e9dbfb0e845e)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala (diff)
Commit d0cb19873bb325be7e31de62b0ba117dd6b92619 by irashid
[SPARK-23103][CORE] Ensure correct sort order for negative values in
LevelDB.
The code was sorting "0" as "less than" negative values, which is a
little wrong. Fix is simple, most of the changes are the added test and
related cleanup.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20284 from vanzin/SPARK-23103.
(cherry picked from commit aa3a1276f9e23ffbb093d00743e63cd4369f9f57)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
(commit: d0cb19873bb325be7e31de62b0ba117dd6b92619)
The file was modified common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java (diff)
The file was modified core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala (diff)
The file was modified common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java (diff)
The file was modified common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java (diff)
Commit f9ad00a5aeeecf4b8d261a0dae6c8cb6be8daa67 by sameerag
[SPARK-23135][UI] Fix rendering of accumulators in the stage page.
This follows the behavior of 2.2: only named accumulators with a value
are rendered.
Screenshot:
![accs](https://user-images.githubusercontent.com/1694083/35065700-df409114-fb82-11e7-87c1-550c3f674371.png)
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20299 from vanzin/SPARK-23135.
(cherry picked from commit f6da41b0150725fe96ccb2ee3b48840b207f47eb)
Signed-off-by: Sameer Agarwal <sameerag@apache.org>
(commit: f9ad00a5aeeecf4b8d261a0dae6c8cb6be8daa67)
The file was modified core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala (diff)
Commit c647f918b1aee27d7a53852aca74629f03ad49f6 by gatorsmile
[SPARK-21771][SQL] remove useless hive client in SparkSQLEnv
## What changes were proposed in this pull request?
Once a Hive metastore client is created, it generates its SessionState,
which creates a lot of session-related directories, some marked
deleteOnExit, some not. If a Hive client is useless we should not create
it at the very start.
## How was this patch tested?
N/A
cc hvanhovell cloud-fan
Author: Kent Yao <11215016@zju.edu.cn>
Closes #18983 from yaooqinn/patch-1.
(cherry picked from commit 793841c6b8b98b918dcf241e29f60ef125914db9)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: c647f918b1aee27d7a53852aca74629f03ad49f6)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala (diff)
Commit 0cde5212a80b5572bfe53b06ed557e6c2ec8c903 by gatorsmile
[SPARK-23091][ML] Incorrect unit test for approxQuantile
## What changes were proposed in this pull request?
Narrow the bound in the approxQuantile test from 2*epsilon to epsilon,
to match the paper.
## How was this patch tested?
Existing tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #20324 from srowen/SPARK-23091.
(cherry picked from commit 396cdfbea45232bacbc03bfaf8be4ea85d47d3fd)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 0cde5212a80b5572bfe53b06ed557e6c2ec8c903)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala (diff)
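For reference, the API whose test bound was corrected; a sketch assuming an active `spark` session:
```python
df = spark.range(1000).toDF("value")

# With relativeError = 0.1, the rank of each returned quantile should be
# within floor(0.1 * 1000) = 100 positions of the exact rank (epsilon,
# not 2 * epsilon, per the underlying paper).
quantiles = df.approxQuantile("value", [0.25, 0.5, 0.75], 0.1)
```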
Commit e11d5eaf79ffccbe3a5444a5b9ecf3a203e1fc90 by gatorsmile
[SPARK-23165][DOC] Spelling mistake fix in quick-start doc.
## What changes were proposed in this pull request?
Fix spelling in quick-start doc.
## How was this patch tested?
Doc only.
Author: Shashwat Anand <me@shashwat.me>
Closes #20336 from ashashwat/SPARK-23165.
(cherry picked from commit 84a076e0e9a38a26edf7b702c24fdbbcf1e697b9)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: e11d5eaf79ffccbe3a5444a5b9ecf3a203e1fc90)
The file was modified docs/graphx-programming-guide.md (diff)
The file was modified docs/configuration.md (diff)
The file was modified docs/running-on-mesos.md (diff)
The file was modified docs/running-on-yarn.md (diff)
The file was modified docs/monitoring.md (diff)
The file was modified docs/cloud-integration.md (diff)
The file was modified docs/structured-streaming-kafka-integration.md (diff)
The file was modified docs/quick-start.md (diff)
The file was modified docs/streaming-programming-guide.md (diff)
The file was modified docs/submitting-applications.md (diff)
The file was modified docs/sql-programming-guide.md (diff)
The file was modified docs/structured-streaming-programming-guide.md (diff)
The file was modified docs/security.md (diff)
The file was modified docs/storage-openstack-swift.md (diff)
Commit b9c1367b7d9240070c5d83572dc7b43c7480b456 by gatorsmile
[SPARK-21786][SQL] The 'spark.sql.parquet.compression.codec' and
'spark.sql.orc.compression.codec' configuration doesn't take effect on
hive table writing
## What changes were proposed in this pull request?
Pass the `spark.sql.parquet.compression.codec` value to
`parquet.compression`. Pass the `spark.sql.orc.compression.codec` value
to `orc.compress`.
## How was this patch tested?
Added a test.
Note: This is the same issue mentioned in #19218. That branch was
deleted mistakenly, so this is a new PR instead.
gatorsmile maropu dongjoon-hyun discipleforteen
Author: fjh100456 <fu.jinhua6@zte.com.cn>
Author: Takeshi Yamamuro <yamamuro@apache.org>
Author: Wenchen Fan <wenchen@databricks.com>
Author: gatorsmile <gatorsmile@gmail.com>
Author: Yinan Li <liyinan926@gmail.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Juliusz Sompolski <julek@databricks.com>
Author: Felix Cheung <felixcheung_m@hotmail.com>
Author: jerryshao <sshao@hortonworks.com>
Author: Li Jin <ice.xelloss@gmail.com>
Author: Gera Shegalov <gera@apache.org>
Author: chetkhatri <ckhatrimanjal@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Bago Amirbekian <bago@databricks.com>
Author: Xianjin YE <advancedxy@gmail.com>
Author: Bruce Robbins <bersprockets@gmail.com>
Author: zuotingbing <zuo.tingbing9@zte.com.cn>
Author: Kent Yao <yaooqinn@hotmail.com>
Author: hyukjinkwon <gurwls223@gmail.com>
Author: Adrian Ionescu <adrian@databricks.com>
Closes #20087 from fjh100456/HiveTableWriting.
(cherry picked from commit 00d169156d4b1c91d2bcfd788b254b03c509dc41)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: b9c1367b7d9240070c5d83572dc7b43c7480b456)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveOptions.scala (diff)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/CompressionCodecSuite.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala (diff)
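A sketch of the configurations now honored for Hive table writes, assuming a Hive-enabled `spark` session; the table name `t` is illustrative:
```python
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
spark.conf.set("spark.sql.orc.compression.codec", "zlib")

# After this fix the session-level codec is forwarded to the Hive write path;
# an explicit table-level compression property would still take precedence.
spark.sql("CREATE TABLE t STORED AS PARQUET AS SELECT 1 AS id")
```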
Commit e0ef30f770329f058843a7a486bf357e9cd6e26a by gatorsmile
[SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition
is false/null
## What changes were proposed in this pull request?
CheckCartesianProduct raises an AnalysisException also when the join
condition is always false/null. In this case, we shouldn't raise it,
since the result will not be a cartesian product.
## How was this patch tested?
added UT
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20333 from mgaido91/SPARK-23087.
(cherry picked from commit 121dc96f088a7b157d5b2cffb626b0e22d1fc052)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: e0ef30f770329f058843a7a486bf357e9cd6e26a)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala (diff)
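A minimal sketch of the case that no longer raises, assuming an active `spark` session:
```python
from pyspark.sql.functions import lit

df1 = spark.range(5)
df2 = spark.range(5)

# An always-false join condition cannot produce a cartesian product, so
# CheckCartesianProduct no longer raises an AnalysisException here even
# with spark.sql.crossJoin.enabled left at its default (false).
df1.join(df2, lit(False)).show()   # empty result
```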
Commit 7520491bf80eb2e21f0630aa13d7cdaad881626b by felixcheung
[SPARK-21293][SS][SPARKR] Add doc example for streaming join, dedup
## What changes were proposed in this pull request?
streaming programming guide changes
## How was this patch tested?
manually
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #20340 from felixcheung/rstreamdoc.
(cherry picked from commit 2239d7a410e906ccd40aa8e84d637e9d06cd7b8a)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
(commit: 7520491bf80eb2e21f0630aa13d7cdaad881626b)
The file was modified docs/structured-streaming-programming-guide.md (diff)
Commit 5781fa79e28e2123e370fc1096488e318f2b4ee2 by sshao
[SPARK-22976][CORE] Cluster mode driver dir removed while running
## What changes were proposed in this pull request?
The cleanup logic on the worker previously determined the liveness of a
particular application based on whether or not it had running executors.
This would fail in the case that a directory was made for a driver
running in cluster mode, if that driver had no running executors on the
same machine. To preserve driver directories we consider both executors
and running drivers when checking directory liveness.
## How was this patch tested?
Manually started up two node cluster with a single core on each node.
Turned on worker directory cleanup and set the interval to 1 second and
liveness to one second. Without the patch the driver directory is
removed immediately after the app is launched. With the patch it is not
### Without Patch
```
INFO  2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver driver-20180105234824-0000
INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to: cassandra
INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to: cassandra
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
INFO  2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cassandra); groups with view permissions: Set(); users with modify permissions: Set(cassandra); groups with modify permissions: Set()
INFO  2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
INFO  2018-01-05 23:48:25,332 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
INFO  2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command: "/usr/lib/jvm/jdk1.8.0_40//bin/java" ....
****
INFO  2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180105234824-0000   ### << Cleaned up
****
-- One minute passes while app runs (app has 1 minute sleep built in) --
WARN  2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to unregister application app-20180105234831-0000 when it is not registered
INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
INFO  2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = true
INFO  2018-01-05 23:50:00,999 Logging.scala:54 - Driver driver-20180105234824-0000 exited successfully
```
### With Patch
```
INFO  2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver driver-20180108231954-0002
INFO  2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to: automaton
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to: automaton
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
INFO  2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(automaton); groups with view permissions: Set(); users with modify permissions: Set(automaton); groups with modify permissions: Set()
INFO  2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO  2018-01-08 23:19:55,031 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO  2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ......
INFO  2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered shuffle secret for application app-20180108232000-0000
INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
INFO  2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = true
INFO  2018-01-08 23:21:31,703 Logging.scala:54 - Driver driver-20180108231954-0002 exited successfully
*****
INFO  2018-01-08 23:21:32,346 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180108231954-0002   ### < Happening AFTER the run completes rather than during it
*****
```
Author: Russell Spitzer <Russell.Spitzer@gmail.com>
Closes #20298 from RussellSpitzer/SPARK-22976-master.
(cherry picked from commit 11daeb833222b1cd349fb1410307d64ab33981db)
Signed-off-by: jerryshao <sshao@hortonworks.com>
(commit: 5781fa79e28e2123e370fc1096488e318f2b4ee2)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
Commit 36af73b59b6fb3d5f8e8a8e1caf44bd565e97b3d by hyukjinkwon
[MINOR][SQL] Fix wrong comments on
org.apache.spark.sql.parquet.row.attributes
## What changes were proposed in this pull request?
This PR fixes the wrong comment on
`org.apache.spark.sql.parquet.row.attributes` which is useful for UDTs
like Vector/Matrix. Please see
[SPARK-22320](https://issues.apache.org/jira/browse/SPARK-22320) for the
usage.
Originally,
[SPARK-19411](https://github.com/apache/spark/commit/bf493686eb17006727b3ec81849b22f3df68fdef#diff-ee26d4c4be21e92e92a02e9f16dbc285L314)
left this behind when removing optional column metadata. In the same
PR, the same comment was removed at lines 310-311.
## How was this patch tested?
N/A (This is about comments).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #20346 from dongjoon-hyun/minor_comment_parquet.
(cherry picked from commit 8142a3b883a5fe6fc620a2c5b25b6bde4fda32e5)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
(commit: 36af73b59b6fb3d5f8e8a8e1caf44bd565e97b3d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (diff)
Commit 57c320a0dcc6ca784331af0191438e252d418075 by wenchen
[SPARK-23020][CORE] Fix races in launcher code, test.
The race in the code is that the handle could record the wrong state if
the connection-handling thread was still processing incoming data; the
handle needs to wait for the connection to finish before checking the
final state.
The race in the test is that, when waiting for a handle to reach a final
state, the waitFor() method needs to wait until all handle state has been
updated (which includes waiting for the connection thread above to
finish). Otherwise, waitFor() may return too early, causing a number of
different races (such as the listener not yet being notified of the state
change, being in the middle of being notified, or the handle not being
properly disposed, which makes postChecks() assert).
On top of that, code inspection turned up a couple of potential races
that could leave a handle in the wrong state when being killed.
The original version of this fix introduced the flipped version of the
first race described above: the connection's close path could override
the handle state before the handle had a chance to do its cleanup. The
fix is to dispose of the handle from the connection only when there is an
error, and let the handle dispose itself in the normal case.
The fix also surfaced a bug in YarnClusterSuite: the code was checking
for a file on the classpath that was not expected to be there in client
mode. Because of the issues above, the error was not propagating
correctly and the (buggy) test was incorrectly passing.
Tested by running the existing unit tests a lot (and not seeing the
errors I was seeing before).
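As a hedged sketch of the synchronization pattern described above (hypothetical class and method names, not the actual launcher code): trust the handle's final state only after the connection thread has drained, and dispose from the connection only on the error path.
```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Minimal sketch of the pattern, assuming invented names: the connection
// thread signals completion via a latch; readers wait on the latch before
// treating the state as final, so a mid-update state is never observed.
final class ConnectionHandle {
  @volatile private var finalState: Option[String] = None
  private val connectionDone = new CountDownLatch(1)

  // Called by the connection thread once all incoming data is processed.
  // Only the error path disposes here; normally the handle disposes itself.
  def connectionFinished(state: String, failed: Boolean): Unit = {
    if (failed) dispose(state)
    connectionDone.countDown()
  }

  // waitFor()-style check: block until the connection thread is done
  // before reading the final state, avoiding the early-return race.
  def waitForFinalState(timeoutMs: Long): Option[String] = {
    connectionDone.await(timeoutMs, TimeUnit.MILLISECONDS)
    finalState
  }

  // First writer wins; later attempts cannot override an established state.
  def dispose(state: String): Unit = synchronized {
    if (finalState.isEmpty) finalState = Some(state)
  }
}
```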
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20297 from vanzin/SPARK-23020.
(cherry picked from commit ec228976156619ed8df21a85bceb5fd3bdeb5855)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 57c320a0dcc6ca784331af0191438e252d418075)
The file was modified core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java (diff)
The file was modified launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java (diff)
The file was modified launcher/src/test/java/org/apache/spark/launcher/BaseSuite.java (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java (diff)
Commit cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7 by sshao
[MINOR][DOC] Fix the path to the examples jar
## What changes were proposed in this pull request?
The examples jar file is now in the ./examples/jars directory of the
Spark distribution.
Author: Arseniy Tashoyan <tashoyan@users.noreply.github.com>
Closes #20349 from tashoyan/patch-1.
(cherry picked from commit 60175e959f275d2961798fbc5a9150dac9de51ff)
Signed-off-by: jerryshao <sshao@hortonworks.com>
(commit: cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7)
The file was modified docs/running-on-yarn.md (diff)
Commit 743b9173f8feaed8e594961aa85d61fb3f8e5e70 by gatorsmile
[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
## What changes were proposed in this pull request?
This PR updates the docs for UDF registration.
## How was this patch tested?
N/A
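For context, a hedged sketch of the registration path the updated docs point to: register UDFs through the session's `udf` handle rather than the deprecated SQLContext `register*` entry points (shown here in Scala; the PR itself touches the PySpark docs, so treat this as an illustrative equivalent).
```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: registers a UDF via spark.udf so it is callable
// from SQL; names like "plusOne" are invented for the example.
object UdfRegistrationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-docs").getOrCreate()
    spark.udf.register("plusOne", (x: Int) => x + 1)
    spark.sql("SELECT plusOne(41)").show() // prints 42
    spark.stop()
  }
}
```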
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20348 from gatorsmile/testUpdateDoc.
(cherry picked from commit 73281161fc7fddd645c712986ec376ac2b1bd213)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 743b9173f8feaed8e594961aa85d61fb3f8e5e70)
The file was modified python/pyspark/sql/udf.py (diff)
Commit d933fcea6f3b1d2a5bfb03d808ec83db0f97298a by gatorsmile
[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and
optimizer rules
## What changes were proposed in this pull request?
Dump the statistics of effective runs of analyzer and optimizer rules.
## How was this patch tested?
Do a manual run of TPCDSQuerySuite
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds

Rule                                                                          Effective Time / Total Time   Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                         1603280450 / 2868461549       761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution               2045860009 / 2056602674       37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions     440719059 / 1693110949        38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries          1429834919 / 1446016225       39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters                          33273083 / 1389586938         3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences             821183615 / 1266668754        616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                           775837028 / 866238225         132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                       550683593 / 748854507         211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery               513075345 / 634370596         49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                33475731 / 606406532          12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts         193144298 / 545403925         86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                 18651497 / 495725004          7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin              369257217 / 489934378         709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                3707000 / 468291609           9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints           410155900 / 435254175         192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable                348885539 / 371855866         233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation                       11307645 / 307531225          26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions              120324545 / 304948785         294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  92323199 / 286695007        38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                     230084193 / 265845972         785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings            45938401 / 265144009          40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion              14888776 / 261499450          1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion          113796384 / 244913861         29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                       65008069 / 236548480          126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator              0 / 226338929                 0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                        98134906 / 221323770          417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator            0 / 208421703                 0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn                            8762534 / 199351958           16 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations        11980016 / 190779046          27 / 1982
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison              0 / 188887385                 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals                  0 / 186812106                 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions     0 / 183885230                 0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts                         17128295 / 182901910          69 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                  14579110 / 180309340          8 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality           0 / 176740516                 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                0 / 170781986                 0 / 1982
org.apache.spark.sql.catalyst.optimizer.LikeSimplification                    771605 / 164136736            1 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions          0 / 155958962                 0 / 1592
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct               0 / 151222943                 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder            7534632 / 146596355           14 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion               0 / 144488654                 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion            0 / 142403338                 0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame            12067635 / 141500665          21 / 1982
org.apache.spark.sql.catalyst.analysis.TimeWindowing                          0 / 140431958                 0 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion       0 / 125471960                 0 / 1982
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                    14226972 / 124922019          11 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion             0 / 123613887                 0 / 1982
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery       8491071 / 121179056           7 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics      55526073 / 120290529          11 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation                   0 / 113886790                 0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer           52383759 / 107160222          148 / 1982
org.apache.spark.sql.catalyst.analysis.CleanupAliases                         52543524 / 102091518          344 / 1086
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject                40682895 / 94403652           342 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions      38473816 / 89740578           23 / 1982
org.apache.spark.sql.catalyst.optimizer.CollapseProject                       46806090 / 83315506           281 / 1877
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation                   0 / 78750087                  0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases                13742765 / 77227258           47 / 1982
org.apache.spark.sql.catalyst.optimizer.CombineFilters                        53386729 / 76960344           448 / 1592
org.apache.spark.sql.execution.datasources.DataSourceAnalysis                 68034341 / 75724186           24 / 742
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions               0 / 71151084                  0 / 750
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences      12139848 / 67599140           8 / 1982
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates            45017938 / 65968777           23 / 285
org.apache.spark.sql.execution.datasources.v2.PushDownOperatorsToDataSource   0 / 60937767                  0 / 285
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition                   0 / 59897237                  0 / 1592
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion            8547262 / 53941370            10 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF        0 / 52735976                  0 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes    9797713 / 52401665            9 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic       0 / 51741500                  0 / 742
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations              28614911 / 51061186           233 / 1990
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions          0 / 50621510                  0 / 285
org.apache.spark.sql.catalyst.optimizer.CombineUnions                         2777800 / 50262112            17 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates              1640641 / 49633909            46 / 1982
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates                     20198374 / 48488419           100 / 385
org.apache.spark.sql.catalyst.optimizer.LimitPushDown                         0 / 45052523                  0 / 1592
org.apache.spark.sql.catalyst.optimizer.CombineLimits                         0 / 44719443                  0 / 1592
org.apache.spark.sql.catalyst.optimizer.EliminateSorts                        0 / 44216930                  0 / 1592
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery              36235699 / 44165786           148 / 285
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance            0 / 42750307                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                 0 / 41811748                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy  3819476 / 41776562       4 / 1982
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime                    0 / 40527808                  0 / 285
org.apache.spark.sql.catalyst.optimizer.CollapseWindow                        0 / 36832538                  0 / 1592
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization               0 / 36120667                  0 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy      0 / 32435826                  0 / 1982
org.apache.spark.sql.execution.datasources.PreprocessTableCreation            0 / 32145218                  0 / 742
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile                   0 / 30295614                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot                  0 / 30111655                  0 / 1982
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions  59930 / 28038201  26 / 8280
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables                    0 / 27808108                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases  0 / 27066690                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate               0 / 26660210                  0 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin    0 / 25255184                  0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions            0 / 24663088                  0 / 1990
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals           9709079 / 24450670            4 / 788
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints     0 / 23776535                  0 / 750
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions                    0 / 22697895                  0 / 285
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts                0 / 22523798                  0 / 285
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate          988593 / 21535410             15 / 300
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects                   0 / 20269996                  0 / 285
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates             0 / 19388592                  0 / 285
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases               17675532 / 18971185           215 / 285
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase                    0 / 18271152                  0 / 285
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation                2077097 / 17190855            3 / 288
org.apache.spark.sql.catalyst.analysis.EliminateBarriers                      0 / 16736359                  0 / 1086
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery                      0 / 16669341                  0 / 285
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences                  0 / 14470235                  0 / 742
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin             6715625 / 12190561            1 / 300
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin          3451793 / 11431432            7 / 300
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate           0 / 10810568                  0 / 285
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions  344198 / 10475276             1 / 286
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution           0 / 10386630                  0 / 788
org.apache.spark.sql.catalyst.analysis.EliminateUnions                        0 / 10096526                  0 / 788
org.apache.spark.sql.catalyst.analysis.AliasViewChild                         0 / 9991706                   0 / 742
org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation                0 / 9649334                   0 / 288
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints            0 / 8739109                   0 / 750
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion           0 / 8420889                   0 / 742
org.apache.spark.sql.catalyst.analysis.EliminateView                          0 / 8319134                   0 / 285
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions     0 / 7392627                   0 / 286
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter               0 / 7170516                   0 / 300
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateArrayOps                0 / 7109643                   0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateStructOps               0 / 6837590                   0 / 1592
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateMapOps                  0 / 6617848                   0 / 1592
org.apache.spark.sql.catalyst.optimizer.CombineConcats                        0 / 5768406                   0 / 1592
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate       0 / 5349831                   0 / 285
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters                   0 / 5186642                   0 / 285
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct                     0 / 2427686                   0 / 285
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder                  0 / 2420436                   0 / 285
```
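A hedged sketch of how such metering can work (invented helper names, not the actual RuleExecutor/QueryExecutionMetering internals): a run counts as "effective" only when the rule actually changed the plan, and effective/total time and run counters are kept per rule name.
```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Illustrative sketch only: wrap each rule application, time it, and
// record it as effective when the returned plan differs from the input.
object RuleMetering {
  private val effectiveTime = TrieMap.empty[String, AtomicLong]
  private val totalTime     = TrieMap.empty[String, AtomicLong]
  private val effectiveRuns = TrieMap.empty[String, AtomicLong]
  private val totalRuns     = TrieMap.empty[String, AtomicLong]

  def meter[Plan](ruleName: String, plan: Plan)(rule: Plan => Plan): Plan = {
    val start   = System.nanoTime()
    val result  = rule(plan)
    val elapsed = System.nanoTime() - start
    totalTime.getOrElseUpdate(ruleName, new AtomicLong).addAndGet(elapsed)
    totalRuns.getOrElseUpdate(ruleName, new AtomicLong).incrementAndGet()
    if (result != plan) { // the rule changed the plan: an effective run
      effectiveTime.getOrElseUpdate(ruleName, new AtomicLong).addAndGet(elapsed)
      effectiveRuns.getOrElseUpdate(ruleName, new AtomicLong).incrementAndGet()
    }
    result
  }

  // Dump one line per rule in the "effective / total" format shown above.
  def dump(): Unit = totalRuns.keys.toSeq.sorted.foreach { name =>
    def v(m: TrieMap[String, AtomicLong]) = m.get(name).fold(0L)(_.get)
    println(s"$name  ${v(effectiveTime)} / ${v(totalTime)}  ${v(effectiveRuns)} / ${v(totalRuns)}")
  }
}
```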
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20342 from gatorsmile/reportExecution.
(cherry picked from commit 78801881c405de47f7e53eea3e0420dd69593dbd)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: d933fcea6f3b1d2a5bfb03d808ec83db0f97298a)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/QueryExecutionMetering.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala (diff)
Commit 1069fad41fb6896fef4245e6ae6b5ba36115ad68 by gatorsmile
[MINOR][SQL][TEST] Test case cleanups for recent PRs
## What changes were proposed in this pull request?
Revert the unneeded test case changes made in SPARK-23000.
Also fix the test suites that do not call `super.afterAll()` in their
local `afterAll`; the `afterAll()` of `TestHiveSingleton` is what
actually resets the environment.
## How was this patch tested?
N/A
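The pattern being enforced, as a hedged sketch (an invented example suite, not one of the modified files): a local `afterAll` should always delegate to `super.afterAll()`, since the parent trait's `afterAll` performs the environment reset; wrapping the local cleanup in try/finally keeps the delegation safe even if cleanup throws.
```scala
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Illustrative suite only: names are hypothetical.
class ExampleCleanupSuite extends FunSuite with BeforeAndAfterAll {
  override def afterAll(): Unit = {
    try {
      // suite-local cleanup here (drop temp tables, delete temp dirs, ...)
    } finally {
      super.afterAll() // lets the parent trait reset the shared environment
    }
  }

  test("placeholder") { assert(1 + 1 == 2) }
}
```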
Author: gatorsmile <gatorsmile@gmail.com>
Closes #20341 from gatorsmile/testRelated.
(cherry picked from commit 896e45af5fea264683b1d7d20a1711f33908a06f)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 1069fad41fb6896fef4245e6ae6b5ba36115ad68)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/Hive_2_1_DDLSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala (diff)
Commit d963ba031748711ec7847ad0b702911eb7319c63 by wenchen
[SPARK-23090][SQL] polish ColumnVector
## What changes were proposed in this pull request?
Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it is only used to simplify the codegen,
which is an internal concern, so we should not add a public API for that
purpose
## How was this patch tested?
Existing tests.
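The first bullet, as a hedged sketch (a hypothetical trait, not the actual ColumnVector Java class): a batch getter can be given a default implementation in terms of the per-row accessor, so concrete vectors only override it when they have a faster bulk path.
```scala
// Illustrative only: a default batch getter built on the per-row getter.
trait IntVector {
  def getInt(rowId: Int): Int

  // Default implementation: loop over the per-row accessor. Backends with
  // contiguous storage can override this with a bulk copy.
  def getInts(rowId: Int, count: Int): Array[Int] = {
    val out = new Array[Int](count)
    var i = 0
    while (i < count) { out(i) = getInt(rowId + i); i += 1 }
    out
  }
}
```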
Author: Wenchen Fan <wenchen@databricks.com>
Closes #20277 from cloud-fan/column-vector.
(cherry picked from commit 5d680cae486c77cdb12dbe9e043710e49e8d51e4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d963ba031748711ec7847ad0b702911eb7319c63)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ArrowColumnVectorSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java (diff)