SuccessChanges

Summary

  1. [SPARK-30620][SQL] avoid unnecessary serialization in AggregateExpression
  2. [SPARK-28794][SQL][DOC] Documentation for Create table Command
  3. [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature
  4. [SPARK-29947][SQL][FOLLOWUP] Fix table lookup cache
  5. [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util
  6. [SPARK-30298][SQL] Respect aliases in output partitioning of projects and aggregates
  7. [SPARK-27083][SQL][FOLLOW-UP] Rename spark.sql.subquery.reuse to spark.sql.execution.subquery.reuse.enabled
  8. [SPARK-28962][SQL][FOLLOW-UP] Add the parameter description for the Scala function API filter
Commit 3c8b3609a123ed1ffd11b46f37b7fdd5b780bba3 by wenchen
[SPARK-30620][SQL] avoid unnecessary serialization in
AggregateExpression
### What changes were proposed in this pull request?
Expressions are very likely to be serialized and sent to executors, we
should avoid unnecessary serialization overhead as much as we can.
This PR fixes `AggregateExpression`.
### Why are the changes needed?
small improvement
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
existing tests
Closes #27342 from cloud-fan/fix.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala (diff)
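As a rough illustration of the technique this kind of PR applies (the sketch below uses invented names, not Spark's actual `AggregateExpression`), a field derived from constructor arguments can be marked `@transient lazy val` so it is skipped by Java serialization and recomputed on the deserializing side:

```scala
import java.io._

// Hypothetical sketch: `prettyName` is derived from the constructor
// arguments, so serializing it would be redundant. Marking it
// @transient lazy means it is not written to the byte stream and is
// rebuilt lazily after deserialization.
case class AggExprSketch(name: String, isDistinct: Boolean) {
  @transient lazy val prettyName: String =
    if (isDistinct) s"$name(distinct)" else name
}

// Serialize and deserialize through an in-memory buffer.
def roundTrip[T <: Serializable](obj: T): T = {
  val buf = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buf)
  out.writeObject(obj)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
  in.readObject().asInstanceOf[T]
}

val copy = roundTrip(AggExprSketch("sum", isDistinct = true))
println(copy.prettyName) // recomputed after deserialization: sum(distinct)
```

The payload shipped to executors shrinks because only the constructor arguments travel over the wire, not the derived state.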
Commit afe70b3b5321439318a456c7d19b7074171a286a by srowen
[SPARK-28794][SQL][DOC] Documentation for Create table Command
### What changes were proposed in this pull request? Document CREATE
TABLE statement in SQL Reference Guide.
### Why are the changes needed? Adding documentation for SQL reference.
### Does this PR introduce any user-facing change? yes
Before: There was no documentation for this.
### How was this patch tested? Used jekyll build and serve to verify.
Closes #26759 from PavithraRamachandran/create_doc.
Authored-by: Pavithra Ramachandran <pavi.rams@gmail.com> Signed-off-by:
Sean Owen <srowen@gmail.com>
The file was added docs/sql-ref-syntax-ddl-create-table-like.md
The file was added docs/sql-ref-syntax-ddl-create-table-hiveformat.md
The file was added docs/sql-ref-syntax-ddl-create-table-datasource.md
The file was modified docs/sql-ref-syntax-ddl-create-table.md (diff)
Commit 843224ebd473508cd52e362a55d0e17492257c2a by dhyun
[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with
onlyChangedFiles feature
### What changes were proposed in this pull request? Update the scalafmt
plugin to 1.0.3 and use its new onlyChangedFiles feature rather than
--diff
### Why are the changes needed? Older versions of the plugin either
didn't work with scala 2.13, or got rid of the --diff argument and
didn't allow for formatting only changed files
### Does this PR introduce any user-facing change? The /dev/scalafmt
script no longer passes through arbitrary args, instead using the arg to
select scala version.  The issue here is the plugin name literally
contains the scala version, and doesn't appear to have a shorter way to
refer to it.   If srowen or someone else with better maven-fu has an
idea I'm all ears.
### How was this patch tested? Manually, e.g. edited a file and ran
dev/scalafmt
or
dev/scalafmt 2.13
Closes #27279 from koeninger/SPARK-30570.
Authored-by: cody koeninger <cody@koeninger.org> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
The file was modified pom.xml (diff)
The file was modified dev/scalafmt (diff)
Commit 976946a910d877c22213df8fe4508969f6472aa0 by dhyun
[SPARK-29947][SQL][FOLLOWUP] Fix table lookup cache
### What changes were proposed in this pull request?
Fix a bug in https://github.com/apache/spark/pull/26589 to make this
feature work.
### Why are the changes needed?
This feature doesn't actually work.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
new test
Closes #27341 from cloud-fan/cache.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TableLookupCacheSuite.scala
Commit 3228d723a4637d188a3918c22e2ad9eb17eb00ac by dhyun
[SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and
TableCatalog to CatalogV2Util
### What changes were proposed in this pull request? In this PR, I
propose to move the `RESERVED_PROPERTIES` constants from
`SupportsNamespaces` and `TableCatalog` to `CatalogV2Util`, which keeps
`RESERVED_PROPERTIES` safe for internal use only.
### Why are the changes needed?
The `RESERVED_PROPERTIES` should not be changed by subclasses.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
existing uts
Closes #27318 from yaooqinn/SPARK-30603.
Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
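The encapsulation pattern behind this change can be sketched as follows; the object and property names below are illustrative assumptions, not Spark's actual members (the real list lives in `CatalogV2Util`):

```scala
// Hypothetical sketch of the pattern: constants declared on a public
// interface are part of its API surface and visible to every implementor,
// while constants on an internal util object stay framework-private and
// cannot be shadowed or changed by subclasses.
object CatalogUtilSketch {
  // Illustrative property names only.
  val ReservedProperties: Set[String] =
    Set("comment", "location", "provider", "owner")

  // Strip reserved keys before exposing user-settable properties.
  def withoutReserved(props: Map[String, String]): Map[String, String] =
    props.filter { case (k, _) => !ReservedProperties.contains(k) }
}

println(CatalogUtilSketch.withoutReserved(
  Map("comment" -> "c", "retention" -> "7"))) // only "retention" survives
```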
Commit 4847f7380d7559f693c6604f7e1e4b4a17d0dfed by yamamuro
[SPARK-30298][SQL] Respect aliases in output partitioning of projects
and aggregates
### What changes were proposed in this pull request?
Currently, in the following scenario, bucket join is not utilized:
```scala
val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
df.write.format("parquet").bucketBy(8, "i").saveAsTable("t")
sql("CREATE VIEW v AS SELECT * FROM t")
sql("SELECT * FROM t a JOIN v b ON a.i = b.i").explain
```
```
== Physical Plan ==
*(4) SortMergeJoin [i#13], [i#15], Inner
:- *(1) Sort [i#13 ASC NULLS FIRST], false, 0
:  +- *(1) Project [i#13, j#14]
:     +- *(1) Filter isnotnull(i#13)
:        +- *(1) ColumnarToRow
:           +- FileScan parquet default.t[i#13,j#14] Batched: true, DataFilters: [isnotnull(i#13)], Format: Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int,j:int>, SelectedBucketsCount: 8 out of 8
+- *(3) Sort [i#15 ASC NULLS FIRST], false, 0
  +- Exchange hashpartitioning(i#15, 8), true, [id=#64] <----- Exchange node introduced
     +- *(2) Project [i#13 AS i#15, j#14 AS j#16]
        +- *(2) Filter isnotnull(i#13)
           +- *(2) ColumnarToRow
              +- FileScan parquet default.t[i#13,j#14] Batched: true, DataFilters: [isnotnull(i#13)], Format: Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int,j:int>, SelectedBucketsCount: 8 out of 8
```
Notice that `Exchange` is present. This is because `Project` introduces
aliases, and `outputPartitioning` and `requiredChildDistribution` do not
take those aliases into account when `EnsureRequirements` considers
bucket join. This PR addresses this scenario.
### Why are the changes needed?
This allows bucket join to be utilized in the above example.
### Does this PR introduce any user-facing change?
Yes, now with the fix, the `explain` output is as follows:
```
== Physical Plan ==
*(3) SortMergeJoin [i#13], [i#15], Inner
:- *(1) Sort [i#13 ASC NULLS FIRST], false, 0
:  +- *(1) Project [i#13, j#14]
:     +- *(1) Filter isnotnull(i#13)
:        +- *(1) ColumnarToRow
:           +- FileScan parquet default.t[i#13,j#14] Batched: true, DataFilters: [isnotnull(i#13)], Format: Parquet, Location: InMemoryFileIndex[file:.., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int,j:int>, SelectedBucketsCount: 8 out of 8
+- *(2) Sort [i#15 ASC NULLS FIRST], false, 0
  +- *(2) Project [i#13 AS i#15, j#14 AS j#16]
     +- *(2) Filter isnotnull(i#13)
        +- *(2) ColumnarToRow
           +- FileScan parquet default.t[i#13,j#14] Batched: true, DataFilters: [isnotnull(i#13)], Format: Parquet, Location: InMemoryFileIndex[file:.., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int,j:int>, SelectedBucketsCount: 8 out of 8
```
Note that the `Exchange` is no longer present.
### How was this patch tested?
Closes #26943 from imback82/bucket_alias.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioning.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (diff)
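The core idea of alias-aware output partitioning can be sketched in a heavily simplified form. All names below are invented for illustration; Spark's real implementation rewrites `Expression` trees (see the `AliasAwareOutputPartitioning.scala` file added above):

```scala
// Hypothetical, simplified model: a hash partitioning over a set of
// column names.
case class HashPartSketch(columns: Seq[String])

// If a Project emits `i AS i2`, a child partitioning on `i` is still
// valid when re-expressed over `i2`, so the planner need not insert an
// Exchange between the Project and a downstream SortMergeJoin.
def rewriteThroughAliases(
    part: HashPartSketch,
    aliases: Map[String, String]): HashPartSketch =
  HashPartSketch(part.columns.map(c => aliases.getOrElse(c, c)))

// Partitioning on "i" survives the projection `i AS i2` as
// a partitioning on "i2".
println(rewriteThroughAliases(HashPartSketch(Seq("i")), Map("i" -> "i2")))
```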
Commit 3f76bd40025181841de70a11e576d0ee948de5c0 by gatorsmile
[SPARK-27083][SQL][FOLLOW-UP] Rename spark.sql.subquery.reuse to
spark.sql.execution.subquery.reuse.enabled
### What changes were proposed in this pull request? This PR is to
rename spark.sql.subquery.reuse to
spark.sql.execution.subquery.reuse.enabled
### Why are the changes needed? Make it consistent and clear.
### Does this PR introduce any user-facing change? N/A. This is a [new
conf added in Spark 3.0](https://github.com/apache/spark/pull/23998)
### How was this patch tested? N/A
Closes #27346 from gatorsmile/spark27083.
Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Xiao Li
<gatorsmile@gmail.com>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit ddf83159a8c61fa12237a60124f7a7aa4e3a53c1 by dhyun
[SPARK-28962][SQL][FOLLOW-UP] Add the parameter description for the
Scala function API filter
### What changes were proposed in this pull request? This PR is a
follow-up to https://github.com/apache/spark/pull/25666, adding the
parameter description and an example for the Scala function API `filter`.
### Why are the changes needed? It is hard to tell which parameter is
the index column.
### Does this PR introduce any user-facing change? No
### How was this patch tested? N/A
Closes #27336 from gatorsmile/spark28962.
Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
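For intuition, here is a plain-Scala analogue of a filter whose predicate also receives the element index, which is the shape of the two-argument variant being documented. This is a sketch of the concept only, not Spark's `Column`-based `filter` API:

```scala
// Plain-Scala analogue: the second predicate argument is the 0-based
// index of the element within the sequence.
def filterWithIndex[A](xs: Seq[A])(p: (A, Int) => Boolean): Seq[A] =
  xs.zipWithIndex.collect { case (x, i) if p(x, i) => x }

// Keep elements at even positions (indices 0 and 2).
println(filterWithIndex(Seq(10, 20, 30, 40))((_, i) => i % 2 == 0))
// → List(10, 30)
```

The PR's added description clarifies exactly this point: which of the two lambda parameters is the index.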