Failed Changes

Summary

  1. Revert "[SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer" (commit: cfd7ca9a06161f7622b5179a777f965c11892afa) (details)
  2. [SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is (commit: d7843dde0f82551f0481885feb15acd63dd554c0) (details)
  3. [SPARK-29392][CORE][SQL][FOLLOWUP] More removal of 'foo Symbol syntax (commit: 3cc55f6a0a560782f6e20296ac716ef68a412d26) (details)
  4. [SPARK-30211][INFRA] Use python3 in make-distribution.sh (commit: eb509968a72831c5bcab510b9b49ff5f3a48a4bb) (details)
  5. [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' (commit: beae14d5ed4c6f2f81949b852f990fc8b801b3e4) (details)
  6. [SPARK-27506][SQL] Allow deserialization of Avro data using compatible (commit: 99ea324b6f22e979d2b4238eef0effa3709d03bd) (details)
  7. [SPARK-30207][SQL][DOCS] Enhance the SQL NULL Semantics document (commit: 82418b419cfc89c8e2ade6a21b4a3b336c07bb51) (details)
  8. [SPARK-29460][WEBUI] Add tooltip for Jobs page (commit: d46c03c3d383eb3eaf9c80db87d48a20c7bcd24d) (details)
  9. [SPARK-30200][SQL][FOLLOWUP] Fix typo in ExplainMode (commit: a59cb13cda73b0d05f68181c66558d33298600c6) (details)
  10. [SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to (commit: e933539cdd557297daf97ff5e532a3f098896979) (details)
  11. [MINOR][SS][DOC] Fix the ss-kafka doc for availability of (commit: e39bb4c9fdeba05ee16c363f2183421fa49578c2) (details)
  12. [SPARK-30195][SQL][CORE][ML] Change some function, import definitions to (commit: 33f53cb2d51b62f4c294c8640dc069e42f36d686) (details)
  13. [SPARK-30038][SQL] DESCRIBE FUNCTION should do multi-catalog resolution (commit: 9cf9304e171aa03166957d2fc5dd3d2f14c94f9e) (details)
  14. [SPARK-30198][CORE] BytesToBytesMap does not grow internal long array as (commit: b4aeaf906fe1ece886a730ae7291384e297a3bfb) (details)
  15. [SPARK-30199][DSTREAM] Recover `spark.(ui|blockManager).port` from (commit: 40b9c895a4c64546b258e0079fc896baf4e78da7) (details)
  16. [SPARK-30213][SQL] Remove the mutable status in ShuffleQueryStageExec (commit: 1ced6c15448503a899be07afdb7f605a01bd70d1) (details)
  17. [SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3 (commit: b709091b4f488d4f08b0121e1a4c46e461ea032e) (details)
  18. [SPARK-30104][SQL][FOLLOWUP] V2 catalog named 'global_temp' should (commit: 3741a36ebf326b56956289e06922d178982e4879) (details)
  19. [SPARK-30150][SQL] ADD FILE, ADD JAR, LIST FILE & LIST JAR Command do (commit: 2936507f949030547cbe2bb310012b0f20f5e4da) (details)
  20. [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when (commit: 8e9bfea1070052ebdd20f4a19b53534533bed909) (details)
  21. [SPARK-30126][CORE] support space in file path and name for addFile and (commit: ce61ee89416ea2816f29e7feadd369424db0ff38) (details)
  22. [SPARK-30170][SQL][MLLIB][TESTS] Eliminate compilation warnings: part 1 (commit: 25de90e762500e4dbb30e9e1262ec513c3756c62) (details)
  23. [SQL] Typo in HashedRelation error (commit: fd39b6db346d8cfe592fb97653cb68df4f6d6434) (details)
  24. [SPARK-30162][SQL] Add PushedFilters to metadata in Parquet DSv2 (commit: cc087a3ac5591c43d6b861b69b10647594d21b89) (details)
  25. [MINOR] Fix google style guide address (commit: 39c0696a393e9cc1e3c4d56d3e69cb4bdc529be7) (details)
  26. [SPARK-30230][SQL] Like ESCAPE syntax can not use '_' and '%' (commit: cada5beef72530fa699b5ec13d67261be37730e4) (details)
  27. [SPARK-30238][SQL] hive partition pruning can only support string and (commit: 982f72f4c3c6f5ebd939753b50f44038fd6a83ca) (details)
  28. [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources (commit: 5114389aef2cacaacc82e6025696b33d6d20b2a6) (details)
  29. [SPARK-30040][SQL] DROP FUNCTION should do multi-catalog resolution (commit: cb6d2b3f836744b2b71e085949dd0ef485a4fa1a) (details)
Commit cfd7ca9a06161f7622b5179a777f965c11892afa by zsxwing
Revert "[SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer"
This reverts commit 3641c3dd69b2bd2beae028d52356450cc41f69ed.
(commit: cfd7ca9a06161f7622b5179a777f965c11892afa)
The file was removed external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/InternalKafkaConnectorPoolSuite.scala
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDataConsumerSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala (diff)
The file was removed external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/InternalKafkaProducerPool.scala
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/InternalKafkaConsumerPool.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTest.scala (diff)
The file was removed external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/InternalKafkaConnectorPool.scala
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataWriter.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala (diff)
The file was added external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/InternalKafkaConsumerPoolSuite.scala
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/CachedKafkaProducerSuite.scala (diff)
The file was modified external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala (diff)
Commit d7843dde0f82551f0481885feb15acd63dd554c0 by vanzin
[SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is
enabled
### What changes were proposed in this pull request? Added a shutdown hook for the executor plugin's shutdown method. This ensures that the shutdown method is always called.
### Why are the changes needed? Whenever executors do not go down gracefully, i.e. they are killed due to idle time or killed forcefully, the shutdown method of executor plugins is not called. The shutdown method can be used to release any resources that a plugin acquired during its initialisation, so it is important that it is called every time an executor goes down.
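A minimal sketch of the idea (illustrative only; the trait and object below are placeholders, not Spark's real plugin API or the actual Executor.scala change): register a JVM shutdown hook that calls `shutdown()` on every loaded plugin, so the call also happens when the executor JVM is terminated rather than stopped gracefully.
```scala
// Hedged sketch: placeholder plugin trait, not Spark's ExecutorPlugin interface.
trait PluginLike {
  def init(): Unit
  def shutdown(): Unit
}

object PluginShutdownSketch {
  def install(plugins: Seq[PluginLike]): Unit = {
    plugins.foreach(_.init())
    // Runs on normal JVM exit and on SIGTERM, e.g. when dynamic allocation removes the executor.
    Runtime.getRuntime.addShutdownHook(new Thread(() => plugins.foreach(_.shutdown())))
  }
}
```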
### Does this PR introduce any user-facing change? No.
### How was this patch tested?
Tested Manually.
Closes #26810 from iRakson/Executor_Plugin.
Authored-by: root1 <raksonrakesh@gmail.com> Signed-off-by: Marcelo
Vanzin <vanzin@cloudera.com>
(commit: d7843dde0f82551f0481885feb15acd63dd554c0)
The file was modified core/src/main/scala/org/apache/spark/executor/Executor.scala (diff)
Commit 3cc55f6a0a560782f6e20296ac716ef68a412d26 by dhyun
[SPARK-29392][CORE][SQL][FOLLOWUP] More removal of 'foo Symbol syntax
for Scala 2.13
### What changes were proposed in this pull request?
Another continuation of https://github.com/apache/spark/pull/26748
### Why are the changes needed?
To cleanly cross compile with Scala 2.13.
### Does this PR introduce any user-facing change?
None.
### How was this patch tested?
Existing tests
Closes #26842 from srowen/SPARK-29392.4.
Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 3cc55f6a0a560782f6e20296ac716ef68a412d26)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DSLHintSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala (diff)
Commit eb509968a72831c5bcab510b9b49ff5f3a48a4bb by dhyun
[SPARK-30211][INFRA] Use python3 in make-distribution.sh
### What changes were proposed in this pull request?
This PR switches python to python3 in `make-distribution.sh`.
### Why are the changes needed?
SPARK-29672 changed this:
- https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966
### Does this PR introduce any user-facing change? No
### How was this patch tested? N/A
Closes #26844 from wangyum/SPARK-30211.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: eb509968a72831c5bcab510b9b49ff5f3a48a4bb)
The file was modified docs/building-spark.md (diff)
The file was modified dev/make-distribution.sh (diff)
Commit beae14d5ed4c6f2f81949b852f990fc8b801b3e4 by wenchen
[SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
### What changes were proposed in this pull request?
`global_temp` is used as a database name to access global temp views. The current catalog lookup logic considers only the first element of a multi-part name when it resolves a catalog. This results in using the session catalog even when `global_temp` is used as a table name under a v2 catalog. This PR addresses this by making sure the multi-part name has two elements before using the session catalog.
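A simplified sketch of the rule described above (hypothetical helper, not the real `LookupCatalog` code): treat a name as a global temp view reference only when it has at least two parts, so a single-part `global_temp` table name under a v2 catalog is not routed to the session catalog.
```scala
// Hypothetical helper, for illustration only.
def isGlobalTempViewRef(nameParts: Seq[String], globalTempDB: String = "global_temp"): Boolean =
  nameParts.length > 1 && nameParts.head.equalsIgnoreCase(globalTempDB)

isGlobalTempViewRef(Seq("global_temp", "v1"))  // true: handled by the session catalog
isGlobalTempViewRef(Seq("global_temp"))        // false: resolved against the current (v2) catalog
```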
### Why are the changes needed?
Currently, 'global_temp' can be used as a table name in certain commands
(CREATE) but not in others (DESCRIBE):
```
// Assume "spark.sql.globalTempDatabase" is set to "global_temp".
sql(s"CREATE TABLE testcat.t (id bigint, data string) USING foo")
sql(s"CREATE TABLE testcat.global_temp (id bigint, data string) USING foo")
sql("USE testcat")
sql(s"DESCRIBE TABLE t").show
+---------------+---------+-------+
|       col_name|data_type|comment|
+---------------+---------+-------+
|             id|   bigint|       |
|           data|   string|       |
|               |         |       |
| # Partitioning|         |       |
|Not partitioned|         |       |
+---------------+---------+-------+
sql(s"DESCRIBE TABLE global_temp").show
org.apache.spark.sql.AnalysisException: Table not found: global_temp;;
'DescribeTable 'UnresolvedV2Relation [global_temp],
org.apache.spark.sql.connector.InMemoryTableSessionCatalog@2f1af64f,
`global_temp`, false
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:47)
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:46)
at
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:122)
```
### Does this PR introduce any user-facing change?
Yes, `sql(s"DESCRIBE TABLE global_temp").show` in the above example now
displays:
```
+---------------+---------+-------+
|       col_name|data_type|comment|
+---------------+---------+-------+
|             id|   bigint|       |
|           data|   string|       |
|               |         |       |
| # Partitioning|         |       |
|Not partitioned|         |       |
+---------------+---------+-------+
```
instead of throwing an exception.
### How was this patch tested?
Added new tests.
Closes #26741 from imback82/global_temp.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: beae14d5ed4c6f2f81949b852f990fc8b801b3e4)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
Commit 99ea324b6f22e979d2b4238eef0effa3709d03bd by gengliang.wang
[SPARK-27506][SQL] Allow deserialization of Avro data using compatible
schemas
Follow up of https://github.com/apache/spark/pull/24405
### What changes were proposed in this pull request? The current
implementation of _from_avro_ and _AvroDataToCatalyst_ doesn't allow
doing schema evolution since it requires the deserialization of an Avro
record with the exact same schema with which it was serialized.
The proposed change is to add a new option `actualSchema` to allow
passing the schema used to serialize the records. This allows using a
different compatible schema for reading by passing both schemas to
_GenericDatumReader_. If no writer's schema is provided, nothing changes
from before.
### Why are the changes needed? Consider the following example.
```
// schema ID: 1
val schema1 = """
{
   "type": "record",
   "name": "MySchema",
   "fields": [
       {"name": "col1", "type": "int"},
       {"name": "col2", "type": "string"}
    ]
}
"""
// schema ID: 2
val schema2 = """
{
   "type": "record",
   "name": "MySchema",
   "fields": [
       {"name": "col1", "type": "int"},
       {"name": "col2", "type": "string"},
       {"name": "col3", "type": "string", "default": ""}
    ]
}
"""
```
The two schemas are compatible - i.e. you can use `schema2` to
deserialize events serialized with `schema1`, in which case there will
be the field `col3` with the default value.
Now imagine that you have two dataframes (read from batch or streaming),
one with Avro events from schema1 and the other with events from
schema2. **We want to combine them into one dataframe** for storing or
further processing.
With the current `from_avro` function we can only decode each of them
with the corresponding schema:
```scala
val df1 = ... // Avro events created with schema1
df1: org.apache.spark.sql.DataFrame = [eventBytes: binary]

val decodedDf1 = df1.select(from_avro('eventBytes, schema1) as "decoded")
decodedDf1: org.apache.spark.sql.DataFrame = [decoded: struct<col1: int, col2: string>]

val df2 = ... // Avro events created with schema2
df2: org.apache.spark.sql.DataFrame = [eventBytes: binary]

val decodedDf2 = df2.select(from_avro('eventBytes, schema2) as "decoded")
decodedDf2: org.apache.spark.sql.DataFrame = [decoded: struct<col1: int, col2: string, col3: string>]
```
but then `decodedDf1` and `decodedDf2` have different Spark schemas and
we can't union them. Instead, with the proposed change we can decode
`df1` in the following way:
```scala
import scala.collection.JavaConverters._

val decodedDf1 = df1.select(from_avro(
  data = 'eventBytes,
  jsonFormatSchema = schema2,
  options = Map("actualSchema" -> schema1).asJava) as "decoded")
decodedDf1: org.apache.spark.sql.DataFrame = [decoded: struct<col1: int, col2: string, col3: string>]
```
so that both dataframes have the same schemas and can be merged.
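Continuing the hypothetical `decodedDf1`/`decodedDf2` from the example above, the combination is then a plain union:
```scala
// Both sides now expose the same struct<col1, col2, col3> schema.
val combined = decodedDf1.union(decodedDf2)
```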
### Does this PR introduce any user-facing change? This PR allows users
to pass a new configuration but it doesn't affect current code.
### How was this patch tested? A new unit test was added.
Closes #26780 from Fokko/SPARK-27506.
Lead-authored-by: Fokko Driesprong <fokko@apache.org> Co-authored-by:
Gianluca Amori <gianluca.amori@gmail.com> Signed-off-by: Gengliang Wang
<gengliang.wang@databricks.com>
(commit: 99ea324b6f22e979d2b4238eef0effa3709d03bd)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/functions.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala (diff)
The file was modified docs/sql-data-sources-avro.md (diff)
The file was modified python/pyspark/sql/avro/functions.py (diff)
Commit 82418b419cfc89c8e2ade6a21b4a3b336c07bb51 by wenchen
[SPARK-30207][SQL][DOCS] Enhance the SQL NULL Semantics document
### What changes were proposed in this pull request? Enhancement of the
SQL NULL Semantics document: sql-ref-null-semantics.html.
### Why are the changes needed? Clarify the behavior of `UNKNOWN` for both the `EXISTS` and `IN` operations.
### Does this PR introduce any user-facing change? No.
### How was this patch tested? Doc changes only.
Closes #26837 from xuanyuanking/SPARK-30207.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: 82418b419cfc89c8e2ade6a21b4a3b336c07bb51)
The file was modified docs/sql-ref-null-semantics.md (diff)
Commit d46c03c3d383eb3eaf9c80db87d48a20c7bcd24d by srowen
[SPARK-29460][WEBUI] Add tooltip for Jobs page
### What changes were proposed in this pull request? Adding tooltips for the Jobs tab columns: Job Id (Job Group), Description, Submitted, Duration, Stages, Tasks.
Before:
![Screenshot from 2019-11-04
11-31-02](https://user-images.githubusercontent.com/51401130/68102467-e8a54300-fef8-11e9-9f9e-48dd1b393ac8.png)
After:
![Screenshot from 2019-11-04
11-30-53](https://user-images.githubusercontent.com/51401130/68102478-f3f86e80-fef8-11e9-921a-357678229cb4.png)
### Why are the changes needed? The Jobs tab does not have any tooltips for its columns, while some other pages do provide them. This change resolves the inconsistency and makes for a better user experience.
### Does this PR introduce any user-facing change? No
### How was this patch tested? Manual
Closes #26384 from PavithraRamachandran/jobTab_tooltip.
Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com> Signed-off-by:
Sean Owen <srowen@gmail.com>
(commit: d46c03c3d383eb3eaf9c80db87d48a20c7bcd24d)
The file was modified core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala (diff)
Commit a59cb13cda73b0d05f68181c66558d33298600c6 by dhyun
[SPARK-30200][SQL][FOLLOWUP] Fix typo in ExplainMode
### What changes were proposed in this pull request?
This pr is a follow-up of #26829 to fix typos in ExplainMode.
### Why are the changes needed?
For better docs.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #26851 from maropu/SPARK-30200-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: a59cb13cda73b0d05f68181c66558d33298600c6)
The file was modified sql/core/src/main/java/org/apache/spark/sql/ExplainMode.java (diff)
Commit e933539cdd557297daf97ff5e532a3f098896979 by wenchen
[SPARK-29864][SPARK-29920][SQL] Strict parsing of day-time strings to
intervals
### What changes were proposed in this pull request? In the PR, I propose a new implementation of `fromDayTimeString` which strictly parses strings in day-time formats to intervals. The new implementation accepts only strings that match a pattern defined by the `from` and `to` bounds. Here is the mapping of user's bounds and patterns (a standalone sketch of the strict matching follows the list):
- `[+|-]D+ H[H]:m[m]:s[s][.SSSSSSSSS]` for **DAY TO SECOND**
- `[+|-]D+ H[H]:m[m]` for **DAY TO MINUTE**
- `[+|-]D+ H[H]` for **DAY TO HOUR**
- `[+|-]H[H]:m[m]:s[s][.SSSSSSSSS]` for **HOUR TO SECOND**
- `[+|-]H[H]:m[m]` for **HOUR TO MINUTE**
- `[+|-]m[m]:s[s][.SSSSSSSSS]` for **MINUTE TO SECOND**
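A standalone sketch of the strict matching (illustrative only, not the real `IntervalUtils` code), using the HOUR TO MINUTE pattern quoted in the error message shown in the example further down:
```scala
import java.util.regex.Pattern

// Sketch: strict HOUR TO MINUTE parsing; anything outside the pattern is rejected.
val hourToMinute = Pattern.compile("^(?<sign>[+|-])?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})$")

def parseHourToMinute(s: String): (Int, Int) = {
  val m = hourToMinute.matcher(s.trim)
  require(m.matches(),
    s"Interval string must match day-time format of '${hourToMinute.pattern}': $s")
  val sign = if (m.group("sign") == "-") -1 else 1
  (sign * m.group("hour").toInt, sign * m.group("minute").toInt)
}

parseHourToMinute("11:12")              // (11, 12)
// parseHourToMinute("10 11:12:13.123") // fails: does not match HOUR TO MINUTE
```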
Closes #26327 Closes #26358
### Why are the changes needed?
- Improve the user experience with Spark SQL, and respect the bounds specified by users.
- Behave the same as other broadly used DBMSs, such as Oracle and MySQL.
### Does this PR introduce any user-facing change? Yes, before:
```sql
spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
interval 1 weeks 3 days 11 hours 12 minutes
```
After:
```sql
spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
Error in query: requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<hour>\d{1,2}):(?<minute>\d{1,2})$': 10 11:12:13.123(line 1, pos 16)

== SQL ==
SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE
----------------^^^
```
### How was this patch tested?
- Added tests to `IntervalUtilsSuite`
- By `ExpressionParserSuite`
- Updated `literals.sql`
Closes #26473 from MaxGekk/strict-from-daytime-string.
Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: e933539cdd557297daf97ff5e532a3f098896979)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/interval.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/interval.sql.out (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/interval.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit e39bb4c9fdeba05ee16c363f2183421fa49578c2 by dhyun
[MINOR][SS][DOC] Fix the ss-kafka doc for availability of
'minPartitions' option
### What changes were proposed in this pull request?
This patch fixes the documented availability of the `minPartitions` option for the Kafka source, as it is only supported by the micro-batch source for now. There's a WIP PR for batch (#25436) as well, but there's been no progress on that PR so far, so it is safer to fix the doc first and let the option be added later when we address the batch case as well.
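For reference, a hedged example of where the option applies (placeholder servers and topic): it is set on a streaming read, since per this fix the batch read path does not honor it yet.
```scala
// Placeholder bootstrap servers and topic; minPartitions only takes effect for
// the micro-batch (readStream) Kafka source, per the doc fix described above.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092,host2:9092")
  .option("subscribe", "topic1")
  .option("minPartitions", "10")
  .load()
```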
### Why are the changes needed?
The doc is wrong and misleading.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Just a doc change.
Closes #26849 from HeartSaVioR/MINOR-FIX-minPartition-availability-doc.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: e39bb4c9fdeba05ee16c363f2183421fa49578c2)
The file was modified docs/structured-streaming-kafka-integration.md (diff)
Commit 33f53cb2d51b62f4c294c8640dc069e42f36d686 by dhyun
[SPARK-30195][SQL][CORE][ML] Change some function, import definitions to
work with stricter compiler in Scala 2.13
### What changes were proposed in this pull request?
See https://issues.apache.org/jira/browse/SPARK-30195 for the
background; I won't repeat it here. This is sort of a grab-bag of
related issues.
### Why are the changes needed?
To cross-compile with Scala 2.13 later.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests for 2.12. I've been manually checking that this actually
resolves the compile problems in 2.13 separately.
Closes #26826 from srowen/SPARK-30195.
Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 33f53cb2d51b62f4c294c8640dc069e42f36d686)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/IntegralDeltaSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/PassThroughEncodingSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/InternalAccumulatorSuite.scala (diff)
The file was modified mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PivotFirst.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala (diff)
Commit 9cf9304e171aa03166957d2fc5dd3d2f14c94f9e by dhyun
[SPARK-30038][SQL] DESCRIBE FUNCTION should do multi-catalog resolution
### What changes were proposed in this pull request?
Add DescribeFunctionsStatement and make DESCRIBE FUNCTIONS go through
the same catalog/table resolution framework of v2 commands.
### Why are the changes needed?
It's important to make all the commands have the same table resolution behavior, to avoid confusion when running `DESCRIBE FUNCTION namespace.function`.
### Does this PR introduce any user-facing change?
Yes. When running `DESCRIBE FUNCTION namespace.function`, Spark fails the command if the current catalog is set to a v2 catalog.
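An illustrative pair of statements for the behaviour described above (catalog, namespace and function names are placeholders):
```scala
// testcat is assumed to be registered as a v2 catalog; names are placeholders.
spark.sql("USE testcat")
spark.sql("DESCRIBE FUNCTION ns1.my_func")  // now fails: the current catalog is a v2 catalog
```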
### How was this patch tested?
Unit tests.
Closes #26840 from
planga82/feature/SPARK-30038_DescribeFunction_V2Catalog.
Authored-by: Pablo Langa <soypab@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 9cf9304e171aa03166957d2fc5dd3d2f14c94f9e)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
Commit b4aeaf906fe1ece886a730ae7291384e297a3bfb by dhyun
[SPARK-30198][CORE] BytesToBytesMap does not grow internal long array as
expected
### What changes were proposed in this pull request?
This patch changes the condition used to decide whether BytesToBytesMap should grow its internal array. Specifically, it now compares against the capacity of the array instead of its size.
### Why are the changes needed?
One Spark job on our cluster hangs forever at BytesToBytesMap.safeLookup. After inspecting it, the long array size is 536870912.
Currently, in BytesToBytesMap.append, we only grow the internal array if the size of the array is less than its MAX_CAPACITY, which is 536870912. So in the above case the array cannot be grown, and safeLookup can never find an empty slot.
This is wrong because we use two array entries per key, so the array size is twice the capacity. We should compare the current capacity of the array, instead of its size.
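A hedged arithmetic sketch of the check described above (illustrative only, not the actual Java code in `BytesToBytesMap`):
```scala
// The map keeps two long-array entries per key, so an array of size N holds at most N / 2 keys.
val MAX_CAPACITY = 1 << 29            // 536870912 keys

// Old check: compares the array *size* against MAX_CAPACITY, so growth stops once the
// array reaches 536870912 entries, i.e. at only half the intended key capacity.
def canGrowOld(arraySize: Long): Boolean = arraySize < MAX_CAPACITY

// Fixed check: compares the key *capacity* (arraySize / 2) against MAX_CAPACITY.
def canGrowFixed(arraySize: Long): Boolean = arraySize / 2 < MAX_CAPACITY

canGrowOld(536870912L)    // false: stuck, as in the hanging job described above
canGrowFixed(536870912L)  // true: the array can still grow
```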
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
This issue only happens when loading a large number of values into BytesToBytesMap, so it is hard to write a unit test for it. This was tested manually with an internal Spark job.
Closes #26828 from viirya/fix-bytemap.
Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Co-authored-by:
Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: b4aeaf906fe1ece886a730ae7291384e297a3bfb)
The file was modified core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java (diff)
Commit 40b9c895a4c64546b258e0079fc896baf4e78da7 by dhyun
[SPARK-30199][DSTREAM] Recover `spark.(ui|blockManager).port` from
checkpoint
### What changes were proposed in this pull request?
This PR aims to recover `spark.ui.port` and `spark.blockManager.port`
from checkpoint like `spark.driver.port`.
### Why are the changes needed?
When the user configures these values, we can respect them.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with the newly added test cases.
Closes #26827 from dongjoon-hyun/SPARK-30199.
Authored-by: Aaruna <aaruna@apple.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 40b9c895a4c64546b258e0079fc896baf4e78da7)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala (diff)
Commit 1ced6c15448503a899be07afdb7f605a01bd70d1 by dhyun
[SPARK-30213][SQL] Remove the mutable status in ShuffleQueryStageExec
### What changes were proposed in this pull request? Currently `ShuffleQueryStageExec` contains mutable state, e.g. the `mapOutputStatisticsFuture` variable, so it is not easy to carry it over when we copy `ShuffleQueryStageExec`. This PR moves the `mapOutputStatisticsFuture` variable from `ShuffleQueryStageExec` to `ShuffleExchangeExec`, so that its value is preserved when copying.
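A simplified illustration of the refactoring (hypothetical class shapes, not the real Spark plan nodes): with the future owned by the exchange node, copying the query-stage wrapper no longer loses in-flight state.
```scala
import scala.concurrent.Future

// Hypothetical shapes for illustration only.
case class ExchangeLike(desc: String) {
  // started once, owned by the node that actually runs the shuffle
  lazy val mapOutputStatisticsFuture: Future[Long] = Future.successful(0L)
}

case class QueryStageLike(id: Int, exchange: ExchangeLike) {
  def statsFuture: Future[Long] = exchange.mapOutputStatisticsFuture
}

val ex = ExchangeLike("shuffle")
val s1 = QueryStageLike(0, ex)
val s2 = s1.copy(id = 1)                    // the copy still points at the same in-flight future
assert(s1.statsFuture eq s2.statsFuture)
```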
### Why are the changes needed? In order to remove the mutable state in `ShuffleQueryStageExec`.
### Does this PR introduce any user-facing change? No
### How was this patch tested? Existing unit tests.
Closes #26846 from JkSelf/removeMutableVariable.
Authored-by: jiake <ke.a.jia@intel.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 1ced6c15448503a899be07afdb7f605a01bd70d1)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (diff)
Commit b709091b4f488d4f08b0121e1a4c46e461ea032e by gurwls223
[SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3
### What changes were proposed in this pull request?
This PR aims to update zstd-jni library to 1.4.4-3.
### Why are the changes needed?
This will bring the latest bug fixes in zstd itself and some performance
improvement.
- https://github.com/facebook/zstd/releases/tag/v1.4.4
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: b709091b4f488d4f08b0121e1a4c46e461ea032e)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-1.2 (diff)
Commit 3741a36ebf326b56956289e06922d178982e4879 by wenchen
[SPARK-30104][SQL][FOLLOWUP] V2 catalog named 'global_temp' should
always be masked
### What changes were proposed in this pull request?
This is a follow up to #26741 to address the following:
1. A V2 catalog named `global_temp` should always be masked.
2. #26741 introduces `CatalogAndIdentifier`, which supersedes `CatalogObjectIdentifier`. This PR removes `CatalogObjectIdentifier` and its usages and replaces them with `CatalogAndIdentifier`.
3. `CatalogObjectIdentifier(catalog, ident) if !isSessionCatalog(catalog)` and `CatalogObjectIdentifier(catalog, ident) if isSessionCatalog(catalog)` are replaced with `NonSessionCatalogAndIdentifier` and `SessionCatalogAndIdentifier` respectively.
### Why are the changes needed?
To fix an existing issue with handling a v2 catalog named `global_temp`, and to simplify the code base.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Added new tests.
Closes #26853 from imback82/lookup_table.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 3741a36ebf326b56956289e06922d178982e4879)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/LookupCatalogSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala (diff)
Commit 2936507f949030547cbe2bb310012b0f20f5e4da by wenchen
[SPARK-30150][SQL] ADD FILE, ADD JAR, LIST FILE & LIST JAR Command do
not accept quoted path
### What changes were proposed in this pull request?
`add file "abc.txt"` and `add file 'abc.txt'` are not supported. For these two, Spark SQL gives a `FileNotFoundException`. Only `add file abc.txt` is supported currently.
After these changes, the path can be given as quoted text for the ADD FILE, ADD JAR, LIST FILE and LIST JAR commands in spark-sql; a few examples follow.
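Examples of the now-accepted quoted forms, issued through the Scala API (paths are placeholders):
```scala
// Paths are placeholders; both quote styles are accepted after this change.
spark.sql("""ADD FILE "/tmp/some dir/abc.txt" """)
spark.sql("ADD JAR '/tmp/libs/my udfs.jar'")
spark.sql("LIST FILE")
spark.sql("LIST JAR")
```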
### Why are the changes needed?
In many of the spark-sql commands (like CREATE TABLE, etc.) we write the path only in quoted format. To maintain this consistency, we should support the quoted format with these commands as well.
### Does this PR introduce any user-facing change? Yes. Now users can write paths with quotes.
### How was this patch tested? Manually tested.
Closes #26779 from iRakson/SPARK-30150.
Authored-by: root1 <raksonrakesh@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 2936507f949030547cbe2bb310012b0f20f5e4da)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
Commit 8e9bfea1070052ebdd20f4a19b53534533bed909 by gurwls223
[SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when
applied on empty DF
### What changes were proposed in this pull request?
An empty Spark DataFrame converted to a Pandas DataFrame wouldn't have
the right column types. Several type mappings were missing.
### Why are the changes needed?
Empty Spark DataFrames can be used to write unit tests, and verified by
converting them to Pandas first. But this can fail when the column types
are wrong.
### Does this PR introduce any user-facing change?
Yes; the error reported in the JIRA issue should not happen anymore.
### How was this patch tested?
Through unit tests in
`pyspark.sql.tests.test_dataframe.DataFrameTests#test_to_pandas_from_empty_dataframe`
Closes #26747 from dlindelof/SPARK-29188.
Authored-by: David <dlindelof@expediagroup.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 8e9bfea1070052ebdd20f4a19b53534533bed909)
The file was modified python/pyspark/sql/tests/test_dataframe.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
Commit ce61ee89416ea2816f29e7feadd369424db0ff38 by wenchen
[SPARK-30126][CORE] support space in file path and name for addFile and
addJar function
### What changes were proposed in this pull request?
sparkContext.addFile and sparkContext.addJar fail when the file path contains spaces.
### Why are the changes needed? When uploading a file to the Spark context via the addFile and addJar functions, an exception is thrown when the file path contains a space character. Escaping the space with %20 or + doesn't change the result.
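After the fix, paths containing spaces can be passed directly (paths are placeholders):
```scala
// Paths are placeholders.
sc.addFile("/tmp/data files/lookup table.csv")
sc.addJar("/tmp/extra jars/udfs.jar")
```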
### Does this PR introduce any user-facing change? No
### How was this patch tested? Add test case.
Closes #26773 from 07ARB/SPARK-30126.
Authored-by: 07ARB <ankitrajboudh@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: ce61ee89416ea2816f29e7feadd369424db0ff38)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
Commit 25de90e762500e4dbb30e9e1262ec513c3756c62 by srowen
[SPARK-30170][SQL][MLLIB][TESTS] Eliminate compilation warnings: part 1
### What changes were proposed in this pull request?
- Replace `Seq[String]` by `Seq[_]` in `StopWordsRemoverSuite` because the `String` type is unchecked due to erasure.
- Throw an exception for default case in `MLTest.checkNominalOnDF`
because we don't expect other attribute types currently.
- Explicitly cast float to double in `BigDecimal(y)`. This is what the
`apply()` method does for `float`s.
- Replace deprecated `verifyZeroInteractions` by `verifyNoInteractions`.
- Equivalent replacement of `\0` by `\u0000` in `CSVExprUtilsSuite`
- Import `scala.language.implicitConversions` in
`CollectionExpressionsSuite`, `HashExpressionsSuite` and in
`ExpressionParserSuite`.
### Why are the changes needed? The changes fix compiler warnings shown in the JIRA ticket https://issues.apache.org/jira/browse/SPARK-30170. Eliminating these warnings highlights the remaining ones, which may deserve more attention as potential real problems.
### Does this PR introduce any user-facing change? No
### How was this patch tested? By existing test suites
`StopWordsRemoverSuite`, `AnalysisExternalCatalogSuite`,
`CSVExprUtilsSuite`, `CollectionExpressionsSuite`,
`HashExpressionsSuite`, `ExpressionParserSuite` and sub-tests of
`MLTest`.
Closes #26799 from MaxGekk/eliminate-warning-2.
Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: 25de90e762500e4dbb30e9e1262ec513c3756c62)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/FloatType.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisExternalCatalogSuite.scala (diff)
Commit fd39b6db346d8cfe592fb97653cb68df4f6d6434 by srowen
[SQL] Typo in HashedRelation error
### What changes were proposed in this pull request?
Fixed typo in exception message of HashedRelations
### Why are the changes needed?
Better exception messages
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
No tests needed
Closes #26822 from aaron-lau/master.
Authored-by: Aaron Lau <aaron.lau@datadoghq.com> Signed-off-by: Sean
Owen <srowen@gmail.com>
(commit: fd39b6db346d8cfe592fb97653cb68df4f6d6434)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala (diff)
Commit cc087a3ac5591c43d6b861b69b10647594d21b89 by dhyun
[SPARK-30162][SQL] Add PushedFilters to metadata in Parquet DSv2
implementation
### What changes were proposed in this pull request?
This PR proposes to add `PushedFilters` into metadata to show the pushed
filters in Parquet DSv2 implementation. In case of ORC, it is already
added at
https://github.com/apache/spark/pull/24719/files#diff-0fc82694b20da3cd2cbb07206920eef7R62-R64
### Why are the changes needed?
In order for users to be able to debug, and to match with ORC.
### Does this PR introduce any user-facing change?
```scala
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").filter("5 > id").explain()
```
**Before:**
```
== Physical Plan ==
*(1) Project [id#20L]
+- *(1) Filter (isnotnull(id#20L) AND (5 > id#20L))
  +- *(1) ColumnarToRow
     +- BatchScan[id#20L] ParquetScan Location:
InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>
```
**After:**
```
== Physical Plan ==
*(1) Project [id#13L]
+- *(1) Filter (isnotnull(id#13L) AND (5 > id#13L))
  +- *(1) ColumnarToRow
     +- BatchScan[id#13L] ParquetScan Location:
InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>,
PushedFilters: [IsNotNull(id), LessThan(id,5)]
```
### How was this patch tested? Unit tests were added and the change was manually tested.
Closes #26857 from HyukjinKwon/SPARK-30162.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: cc087a3ac5591c43d6b861b69b10647594d21b89)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala (diff)
Commit 39c0696a393e9cc1e3c4d56d3e69cb4bdc529be7 by srowen
[MINOR] Fix google style guide address
### What changes were proposed in this pull request?
This PR updates the Google style guide address to `https://google.github.io/styleguide/javaguide.html`.
### Why are the changes needed?
`https://google-styleguide.googlecode.com/svn-history/r130/trunk/javaguide.html`
**404**:
![image](https://user-images.githubusercontent.com/5399861/70717915-431c9500-1d2a-11ea-895b-024be953a116.png)
### Does this PR introduce any user-facing change? No
### How was this patch tested?
Closes #26865 from wangyum/fix-google-styleguide.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: 39c0696a393e9cc1e3c4d56d3e69cb4bdc529be7)
The file was modified dev/checkstyle.xml (diff)
Commit cada5beef72530fa699b5ec13d67261be37730e4 by gengliang.wang
[SPARK-30230][SQL] Like ESCAPE syntax can not use '_' and '%'
### What changes were proposed in this pull request?
Since [25001](https://github.com/apache/spark/pull/25001), Spark supports the LIKE ... ESCAPE syntax. But '%' and '_' are reserved characters in the `Like` expression, so we cannot use them as the escape character.
### Why are the changes needed?
Avoid unexpected problems when using the LIKE ... ESCAPE syntax.
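An illustrative pair of queries under the rule described above (a hedged sketch, not output from the added test):
```scala
// '/' is a legal escape character; '_' is reserved by LIKE and is now rejected.
spark.sql("SELECT 'a_b' LIKE 'a/_b' ESCAPE '/'").show()  // true
spark.sql("SELECT 'a_b' LIKE 'a__b' ESCAPE '_'")         // now raises an error
```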
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Add UT.
Closes #26860 from ulysses-you/SPARK-30230.
Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Gengliang
Wang <gengliang.wang@databricks.com>
(commit: cada5beef72530fa699b5ec13d67261be37730e4)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
Commit 982f72f4c3c6f5ebd939753b50f44038fd6a83ca by dhyun
[SPARK-30238][SQL] hive partition pruning can only support string and
integral types
### What changes were proposed in this pull request?
Check the partition column data type and only allow string and integral
types in hive partition pruning.
### Why are the changes needed?
Currently we only support string and integral types in hive partition
pruning, but the check is done for literals. If the predicate is
`InSet`, then there is no literal and we may pass an unsupported
partition predicate to Hive and cause problems.
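A hedged sketch of the type-based check (not the actual `HiveShim` code): decide pushability from the partition column's data type, so it applies equally to `In` (literals) and `InSet` (a value set with no literals).
```scala
import org.apache.spark.sql.types._

// Sketch only: push a partition predicate down to the Hive metastore only when
// the partition column is a string or an integral type.
def isPushableColumnType(dt: DataType): Boolean = dt match {
  case StringType                                    => true
  case ByteType | ShortType | IntegerType | LongType => true
  case _                                             => false
}
```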
### Does this PR introduce any user-facing change?
Yes, it fixes a bug: a query that failed before can run now.
### How was this patch tested?
A new test.
Closes #26871 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: 982f72f4c3c6f5ebd939753b50f44038fd6a83ca)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
Commit 5114389aef2cacaacc82e6025696b33d6d20b2a6 by gengliang.wang
[SPARK-30107][SQL] Expose nested schema pruning to all V2 sources
### What changes were proposed in this pull request?
This PR exposes the existing logic for nested schema pruning to all sources, which is in line with the description of `SupportsPushDownRequiredColumns`.
Right now, `SchemaPruning` (rule, not helper utility) is applied in the
optimizer directly on certain instances of `Table` ignoring
`SupportsPushDownRequiredColumns` that is part of `ScanBuilder`. I think
it would be cleaner to perform schema pruning and filter push-down in
one place. Therefore, this PR moves all the logic into
`V2ScanRelationPushDown`.
### Why are the changes needed?
This change allows all V2 data sources to benefit from nested column
pruning (if they support it).
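For context, a minimal sketch of a V2 source that can take advantage of this (names like `MyScanBuilder` and `MyScan` are placeholders): the push-down rule calls `pruneColumns` with the pruned, possibly nested, schema, and the source only has to read that schema.
```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Placeholder source; only the pruning hook matters for this sketch.
class MyScan(prunedSchema: StructType) extends Scan {
  override def readSchema(): StructType = prunedSchema
}

class MyScanBuilder(fullSchema: StructType) extends ScanBuilder with SupportsPushDownRequiredColumns {
  private var requiredSchema: StructType = fullSchema

  override def pruneColumns(requiredSchema: StructType): Unit =
    this.requiredSchema = requiredSchema

  override def build(): Scan = new MyScan(requiredSchema)
}
```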
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR mostly relies on existing tests. On top of that, it adds one test to verify that top-level schema pruning works, as well as one test for predicates with subqueries.
Closes #26751 from aokolnychyi/nested-schema-pruning-ds-v2.
Authored-by: Anton Okolnychyi <aokolnychyi@apple.com> Signed-off-by:
Gengliang Wang <gengliang.wang@databricks.com>
(commit: 5114389aef2cacaacc82e6025696b33d6d20b2a6)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala (diff)
Commit cb6d2b3f836744b2b71e085949dd0ef485a4fa1a by dhyun
[SPARK-30040][SQL] DROP FUNCTION should do multi-catalog resolution
### What changes were proposed in this pull request?
Add DropFunctionStatement and make DROP FUNCTION go through the same
catalog/table resolution framework of v2 commands.
### Why are the changes needed?
It's important to make all the commands have the same table resolution behavior, to avoid confusion when running `DROP FUNCTION namespace.function`.
### Does this PR introduce any user-facing change?
Yes. When running `DROP FUNCTION namespace.function`, Spark fails the command if the current catalog is set to a v2 catalog.
### How was this patch tested?
Unit tests.
Closes #26854 from planga82/feature/SPARK-30040_DropFunctionV2Catalog.
Authored-by: Pablo Langa <soypab@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: cb6d2b3f836744b2b71e085949dd0ef485a4fa1a)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)