Changes

Summary

  1. [SPARK-35940][SQL] Refactor EquivalentExpressions to make it more efficient (commit: e6ce220) (details)
  2. [SPARK-35986][PYSPARK] Fix type hint for RDD.histogram's buckets (commit: 495d234) (details)
  3. [SPARK-35980][CORE] ThreadAudit logs whether thread is daemon (commit: e4273f7) (details)
  4. [SPARK-36007][INFRA] Failed to run benchmark in GA (commit: 11fcbc7) (details)
  5. [SPARK-35987][SQL] The ANSI flags of Sum and Avg should be kept after being copied (commit: 51103cd) (details)
  6. [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals (commit: 6474226) (details)
  7. [SPARK-36010][BUILD] Upgrade sbt-antlr4 from 0.8.2 to 0.8.3 (commit: 554d5fe) (details)
  8. [SPARK-36013][BUILD] Upgrade Dropwizard Metrics to 4.2.2 (commit: 7f70350) (details)
  9. [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator (commit: 044dddf) (details)
  10. [SPARK-35989][SQL] Only remove redundant shuffle if shuffle origin is REPARTITION_BY_COL in AQE (commit: 7fe4c4a) (details)
  11. [SPARK-33996][BUILD][FOLLOW-UP] Match SBT's plugin checkstyle version to Maven's (commit: 2fe6c94) (details)
  12. [SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly (commit: f4237af) (details)
  13. [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType (commit: c605ba2) (details)
  14. [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type (commit: 2fffec7) (details)
  15. [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ (commit: 5f44acf) (details)
  16. [SPARK-34302][SQL][FOLLOWUP] More code cleanup (commit: 8b46e26) (details)
  17. [SPARK-35788][SS] Metrics support for RocksDB instance (commit: 9544277) (details)
  18. [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ (commit: b0b9643) (details)
  19. [SPARK-36011][SQL] Disallow altering permanent views based on temporary views or UDFs (commit: e0c6b2e) (details)
  20. [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals (commit: 87282f0) (details)
  21. [SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals (commit: c8ff613) (details)
Commit e6ce220690937718bdd9046a9ed69bf2feb57595 by viirya
[SPARK-35940][SQL] Refactor EquivalentExpressions to make it more efficient

### What changes were proposed in this pull request?

This PR uses 2 ideas to make `EquivalentExpressions` more efficient:
1. do not keep all the equivalent expressions, we only need a count
2. track the "height" of common subexpressions, to quickly do child-parent sort, and filter out non-child expressions in `addCommonExprs`

This PR also fixes several small bugs (exposed by the refactoring), please see PR comments.
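
Below is a standalone sketch of those two ideas (`Expr` and `Stats` are made-up stand-ins, not the actual Catalyst classes in `EquivalentExpressions.scala`): repeated occurrences only bump a counter, and the recorded height gives a cheap child-before-parent ordering for the candidates.

```scala
object EquivalenceSketch {
  // Stand-in for a Catalyst expression tree node.
  final case class Expr(name: String, children: Seq[Expr] = Nil) {
    def height: Int = if (children.isEmpty) 0 else children.map(_.height).max + 1
  }

  // Per-expression bookkeeping: a use count and the node's height,
  // instead of storing every equivalent expression instance.
  final case class Stats(var useCount: Int, height: Int)

  def collectCommonExprs(roots: Seq[Expr]): Seq[(Expr, Stats)] = {
    val stats = scala.collection.mutable.LinkedHashMap.empty[Expr, Stats]
    def visit(e: Expr): Unit = {
      val s = stats.getOrElseUpdate(e, Stats(0, e.height))
      s.useCount += 1
      // Recurse only on the first occurrence; later occurrences just bump the count.
      if (s.useCount == 1) e.children.foreach(visit)
    }
    roots.foreach(visit)
    // Expressions seen more than once are candidates; sorting by height
    // yields children before their parents.
    stats.toSeq.filter(_._2.useCount > 1).sortBy(_._2.height)
  }

  def main(args: Array[String]): Unit = {
    val ab = Expr("a + b")
    val roots = Seq(Expr("(a + b) * 2", Seq(ab)), Expr("(a + b) + c", Seq(ab)))
    collectCommonExprs(roots).foreach { case (e, s) =>
      println(s"${e.name}: count=${s.useCount}, height=${s.height}")
    }
  }
}
```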

### Why are the changes needed?

code cleanup and small perf improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33142 from cloud-fan/codegen.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: e6ce220)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala (diff)
Commit 495d234c6e06c8e1d1e551805afefb104027ac9f by gurwls223
[SPARK-35986][PYSPARK] Fix type hint for RDD.histogram's buckets

### What changes were proposed in this pull request?
Fix the type hint for `pyspark.rdd.RDD.histogram`'s `buckets` argument

### Why are the changes needed?
The current type hint is incomplete.
![image](https://user-images.githubusercontent.com/17701527/124248180-df7fd580-db22-11eb-8391-ba0bb51d689b.png)
From `pyspark.rdd.RDD.histogram`'s source:
```python
if isinstance(buckets, int):
    ...
elif isinstance(buckets, (list, tuple)):
    ...
else:
    raise TypeError("buckets should be a list or tuple or number(int or long)")
```

### Does this PR introduce _any_ user-facing change?
Fixed the warning displayed above.

### How was this patch tested?
Fixed warning above with this change.

Closes #33185 from tpvasconcelos/master.

Authored-by: Tomas Pereira de Vasconcelos <tomasvasconcelos1@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 495d234)
The file was modified python/pyspark/rdd.pyi (diff)
Commit e4273f709889869ef371cfbf71b7761530778355 by gurwls223
[SPARK-35980][CORE] ThreadAudit logs whether thread is daemon

### What changes were proposed in this pull request?
Add `daemon={true|false}` to the POSSIBLE THREAD LEAK IN SUITE warning printed by the test framework.
### Why are the changes needed?
This is to slightly accelerate interpretation of that warning, since non-daemon threads can block the process from exiting and are likely to be problematic.

Only affects test code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually ran some tests, inspected the output log line.

Closes #33178 from timarmstrong/thread-leak.

Authored-by: Tim Armstrong <tim.armstrong@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: e4273f7)
The file was modified core/src/test/scala/org/apache/spark/ThreadAudit.scala (diff)
Commit 11fcbc73cbcbb1bdf5ba5d90eba0aba1edebb15d by gurwls223
[SPARK-36007][INFRA] Failed to run benchmark in GA

### What changes were proposed in this pull request?

While running the benchmark in GA, I hit the error below.

https://github.com/pingsutw/spark/runs/2867617238?check_suite_focus=true
```
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
21/06/20 07:40:02 ERROR SparkContext: Error initializing SparkContext.
java.lang.AssertionError: assertion failed: spark.test.home is not set!
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
  at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:68)
  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
  at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2954)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:559)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
  at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
  at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
```

### Why are the changes needed?

The benchmark job in GA fails because `spark.test.home` is not set, so this change sets it in the benchmark workflow.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Rerun the benchmark in my fork.
https://github.com/pingsutw/spark/actions/runs/996067851

Closes #33203 from pingsutw/SPARK-36007.

Lead-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 11fcbc7)
The file was modified .github/workflows/benchmark.yml (diff)
Commit 51103cdcddfb62e6a066e53576ddb170c6784f54 by wenchen
[SPARK-35987][SQL] The ANSI flags of Sum and Avg should be kept after being copied

### What changes were proposed in this pull request?

Make the ANSI flag part of the parameter lists of the expressions `Sum` and `Average`, instead of fetching it from the session SQLConf.

### Why are the changes needed?

For views, it is important to show consistent results even if the ANSI configuration differs in the running session. This is why many expressions, such as `Add`/`Divide`, make the ANSI flag part of their case class parameter lists.

We should make the expressions `Sum` and `Average` consistent with them.
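
A minimal standalone sketch of the difference (`FakeConf`, `SumFromConf`, and `SumWithFlag` are made-up illustrations, not the actual Catalyst classes):

```scala
object AnsiFlagSketch {
  // Made-up stand-in for the session SQLConf.
  object FakeConf { var ansiEnabled: Boolean = false }

  // Before: the flag is read from the conf on every access, so a copy evaluated
  // in a different session silently changes behavior.
  case class SumFromConf(column: String) {
    def failOnError: Boolean = FakeConf.ansiEnabled
  }

  // After: the flag is a constructor parameter defaulted from the conf at creation
  // time, so copy() and later execution keep the value it was created with.
  case class SumWithFlag(column: String, failOnError: Boolean = FakeConf.ansiEnabled)

  def main(args: Array[String]): Unit = {
    FakeConf.ansiEnabled = true                       // e.g. the view was defined under ANSI mode
    val captured = SumWithFlag("x")
    FakeConf.ansiEnabled = false                      // the current session is non-ANSI
    println(SumFromConf("x").failOnError)             // false: follows the current session
    println(captured.copy(column = "y").failOnError)  // true: the ANSI flag is kept after being copied
  }
}
```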

### Does this PR introduce _any_ user-facing change?

Yes, `Sum` and `Average` inside a view always behave the same, independent of the ANSI mode SQL configuration in the current session.

### How was this patch tested?

Existing UT

Closes #33186 from gengliangwang/sumAndAvg.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 51103cd)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala (diff)
Commit 647422685292cd1a46766afa9b07b6fcfc181bbd by max.gekk
[SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals

### What changes were proposed in this pull request?

This PR fixes two issues. One is that `to_json` doesn't support `map` types where value types are `year-month` interval types like:
```
spark-sql> select to_json(map('a', interval '1-2' year to  month));
21/07/02 11:38:15 ERROR SparkSQLDriver: Failed in [select to_json(map('a', interval '1-2' year to  month))]
java.lang.RuntimeException: Failed to convert value 14 (class of class java.lang.Integer) with the type of YearMonthIntervalType(0,1) to JSON.
```
The other issue is that, even if the `to_json` issue is resolved, `from_json` doesn't support converting a JSON string back to a `year-month` interval value, so the result of the following query will be `null`.
```
spark-sql> select from_json(to_json(map('a', interval '1-2' year to month)), 'a interval year to month');
{"a":null}
```
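
A hedged round-trip sketch of the intended behavior after this change, assuming a spark-shell session on a build that contains the patch (the exact JSON output form is assumed to be the ANSI interval string):

```scala
spark.sql("SELECT to_json(map('a', interval '1-2' year to month)) AS j").show(false)
// expected (assumed form): {"a":"INTERVAL '1-2' YEAR TO MONTH"}

spark.sql(
  "SELECT from_json(to_json(map('a', interval '1-2' year to month)), 'a interval year to month') AS m"
).show(false)
// expected: a non-null year-month interval value instead of {"a":null}
```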

### Why are the changes needed?

There should be no reason why year-month intervals cannot be used as map value types; values of `CalendarIntervalType` already can be.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33181 from sarutak/map-json-yminterval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 6474226)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala (diff)
Commit 554d5fef13579cefea083e69b98bce32c0c99126 by sarutak
[SPARK-36010][BUILD] Upgrade sbt-antlr4 from 0.8.2 to 0.8.3

### What changes were proposed in this pull request?

This PR proposes to upgrade sbt-antlr4 from 0.8.2 to 0.8.3 per the guide at https://github.com/ihji/sbt-antlr4.
I couldn't find proper official docs for this.

### Why are the changes needed?

To stick to the guides in https://github.com/ihji/sbt-antlr4, and leverage the fixes included.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI in this PR should test it out.

Closes #33208 from HyukjinKwon/SPARK-36010.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 554d5fe)
The file was modified project/plugins.sbt (diff)
Commit 7f703509297fb5b6984a99e6f3bb96b77ca74b10 by gurwls223
[SPARK-36013][BUILD] Upgrade Dropwizard Metrics to 4.2.2

### What changes were proposed in this pull request?

This PR aims to upgrade Dropwizard Metrics from `4.2.0` to `4.2.2`.

### Why are the changes needed?

Dropwizard `4.2.1` fixes a bug related to `JMXReporter`, but `4.2.1` also contains a bug of its own, so upgrading to `4.2.2` seems better.
https://github.com/dropwizard/metrics/releases/tag/v4.2.1
https://github.com/dropwizard/metrics/releases/tag/v4.2.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33209 from sarutak/upgrade-metrics-4.2.2.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 7f70350)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit 044dddf28860cabb813d6ccf9489ea6ef21dd11e by wenchen
[SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

### What changes were proposed in this pull request?

Currently AQE has a cost evaluator to decide whether to use the new plan after replanning. The evaluator used today is `SimpleCostEvaluator`, which makes the decision based on the number of shuffles in the query plan. This is not a perfect cost evaluator, and different production environments might want to use different custom evaluators. E.g., sometimes we might want to still do the skew join even though it might introduce an extra shuffle (trading resources for better latency), and sometimes we might want to take sort into consideration for the cost as well. Taking our own setup as an example, we are using a custom remote shuffle service (Cosco), and the cost model is more complicated. So we want to make the cost evaluator pluggable, so developers can implement their own `CostEvaluator` subclass and plug it in dynamically based on configuration.

The approach is to introduce a new config, `spark.sql.adaptive.customCostEvaluatorClass`, for specifying the subclass name of `CostEvaluator`, and to add `CostEvaluator.instantiate` to instantiate the cost evaluator class in `AdaptiveSparkPlanExec.costEvaluator`.
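
A hedged configuration sketch: `com.example.MyCostEvaluator` is a hypothetical class implementing the `CostEvaluator` interface added here (see `costing.scala` in the diff for the exact API), and the config key is taken from this description.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("custom-aqe-cost")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.customCostEvaluatorClass", "com.example.MyCostEvaluator")
  .getOrCreate()
```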

### Why are the changes needed?

Make AQE cost evaluation more flexible.

### Does this PR introduce _any_ user-facing change?

No, but an internal config, `spark.sql.adaptive.customCostEvaluatorClass`, is introduced to allow a custom implementation of `CostEvaluator`.

### How was this patch tested?

Added unit test in `AdaptiveQueryExecSuite.scala`.

Closes #32944 from c21/aqe-cost.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 044dddf)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/costing.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
Commit 7fe4c4a9ad92cedd398a13a1781649dff57ca4d9 by wenchen
[SPARK-35989][SQL] Only remove redundant shuffle if shuffle origin is REPARTITION_BY_COL in AQE

### What changes were proposed in this pull request?

Skip removing a shuffle in AQE if its shuffle origin is not `REPARTITION_BY_COL`.

### Why are the changes needed?

`REPARTITION_BY_COL` doesn't guarantee the output partition number, so we can remove it safely in AQE.

For `REPARTITION_BY_NUM`, we should retain the shuffle whose partition number is specified by the user.
For `REBALANCE_PARTITIONS_BY_COL`, it is a special shuffle used to rebalance partitions, so we should not remove it.
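
A hedged mapping from user-facing APIs to the shuffle origins named above, assuming a spark-shell session (the API-to-origin mapping is my reading of this description):

```scala
import spark.implicits._

val df = spark.range(100).toDF("id")
df.repartition($"id")      // REPARTITION_BY_COL: AQE may remove this shuffle if it is redundant
df.repartition(10, $"id")  // REPARTITION_BY_NUM: the user asked for 10 partitions, so it is retained
spark.sql("SELECT /*+ REBALANCE(id) */ * FROM range(100)")  // REBALANCE_PARTITIONS_BY_COL: retained
```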

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

add test

Closes #33188 from ulysses-you/SPARK-35989.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: ulysses <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7fe4c4a)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala (diff)
Commit 2fe6c9454465fbac58ecb90bde6af42a489d8bbf by gurwls223
[SPARK-33996][BUILD][FOLLOW-UP] Match SBT's plugin checkstyle version to Maven's

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/31019, which forgot to update SBT's checkstyle plugin version to match.

### Why are the changes needed?

To use the same version in both Maven and SBT.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should test them.

Closes #33207 from HyukjinKwon/SPARK-33996.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 2fe6c94)
The file was modified project/plugins.sbt (diff)
Commit f4237aff7ebece0b8d61e680ecbe759850f25af8 by max.gekk
[SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly

### What changes were proposed in this pull request?

This PR fixes an issue where `from_csv`/`to_csv` don't handle year-month intervals properly.
`from_csv` throws an exception if a year-month interval type is given:
```
spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month");
21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")]
java.lang.Exception: Unsupported type: interval year to month
at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775)
at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224)
at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134)
```

Also, `to_csv` doesn't handle year-month interval types properly, though no exception is thrown.
The result of `to_csv` for year-month interval types is not in the ANSI interval compliant form.

```
spark-sql> select to_csv(named_struct("a", interval '1-2' year to month));
14
```
The result above should be `INTERVAL '1-2' YEAR TO MONTH`.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33210 from sarutak/csv-yminterval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: f4237af)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala (diff)
Commit c605ba2d46742ca13db794ca1be136a4b10b652e by gengliang
[SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType

### What changes were proposed in this pull request?
This PR fixes the incorrect comment for `TimestampNTZType`.

### Why are the changes needed?
Fix the incorrect comment

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
No need.

Closes #33218 from beliefer/SPARK-35664-followup.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: c605ba2)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala (diff)
Commit 2fffec7de8d31bd01c8acd8bca72acacaf189c97 by gengliang
[SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type

### What changes were proposed in this pull request?

For the timestamp literal, it should have the following behavior.
1. When `spark.sql.timestampType` is TIMESTAMP_NTZ: if there is no time zone part, return timestamp without time zone literal; otherwise, return timestamp with local time zone literal

2. When `spark.sql.timestampType` is TIMESTAMP_LTZ: return timestamp with local time zone literal
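
A hedged illustration of the rules above, assuming a spark-shell session on a build that contains the patch (the `typeof` outputs are assumed):

```scala
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
spark.sql("SELECT typeof(TIMESTAMP '2021-07-01 00:00:00')").show()        // expected: timestamp_ntz (no time zone part)
spark.sql("SELECT typeof(TIMESTAMP '2021-07-01 00:00:00+08:00')").show()  // expected: timestamp (zone part -> LTZ)

spark.conf.set("spark.sql.timestampType", "TIMESTAMP_LTZ")
spark.sql("SELECT typeof(TIMESTAMP '2021-07-01 00:00:00')").show()        // expected: timestamp
```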

### Why are the changes needed?

When the default timestamp type is TIMESTAMP_NTZ, a timestamp literal should be resolved as TIMESTAMP_NTZ when there is no time zone part in the string.

From section 5.3 "literal" of the ANSI SQL standard (2011):
```
27) The declared type of a <timestamp literal> that does not specify <time zone interval> is TIMESTAMP(P) WITHOUT TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise. The declared type of a <timestamp literal> that specifies <time zone interval> is TIMESTAMP(P) WITH TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise.
```
Since we don't have "timestamp with time zone", we use timestamp with local time zone instead.
### Does this PR introduce _any_ user-facing change?

No, the new timestamp type and the default timestamp type configuration are not released yet.

### How was this patch tested?

Unit test

Closes #33215 from gengliangwang/tsLiteral.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 2fffec7)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 5f44acff3df51721fe891ea50c0a5bcf3a37a719 by max.gekk
[SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ

### What changes were proposed in this pull request?

Support new keyword TIMESTAMP_NTZ, which can be used for:

- timestamp without time zone data type in DDL
- timestamp without time zone data type in Cast clause.
- timestamp without time zone data type literal
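
Hedged usage sketches of the new keyword covering the three cases above, assuming a spark-shell session on a build that contains the patch:

```scala
spark.sql("SELECT TIMESTAMP_NTZ '2021-07-04 12:00:00'")            // literal
spark.sql("SELECT CAST('2021-07-04 12:00:00' AS TIMESTAMP_NTZ)")   // cast
spark.sql("CREATE TABLE ntz_tab (ts TIMESTAMP_NTZ) USING parquet") // DDL (data source support may still vary)
```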

### Why are the changes needed?

Users can use `TIMESTAMP_NTZ` in DDL/Cast/Literals for the timestamp without time zone type directly.

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit test

Closes #33221 from gengliangwang/timstamp_ntz.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 5f44acf)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala (diff)
Commit 8b46e26fc6770bd39d7fee070f88072b05d50df7 by wenchen
[SPARK-34302][SQL][FOLLOWUP] More code cleanup

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/33113, to do some code cleanup:
1. `UnresolvedFieldPosition` doesn't need to include the field name. We can get it through "context" (`AlterTableAlterColumn.column.name`).
2. Run `ResolveAlterTableCommands` in the main resolution batch, so that the column/field resolution is also unified between v1 and v2 commands (same error message).
3. Fail immediately in `ResolveAlterTableCommands` if we can't resolve the field, instead of waiting until `CheckAnalysis`. We don't expect other rules to resolve fields in ALTER TABLE commands, so failing immediately is simpler and we can remove duplicated code in `CheckAnalysis`.

### Why are the changes needed?

code simplification.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33213 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8b46e26)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/change-column.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V2CommandsCaseSensitivitySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 9544277b0a3ab4ed401092fcae751e20cfc62316 by kabhwan.opensource
[SPARK-35788][SS] Metrics support for RocksDB instance

### What changes were proposed in this pull request?
Add more metrics for the RocksDB instance; the values are derived from the native state reported by RocksDB.

### Why are the changes needed?
Improve usability with more metrics for the RocksDB instance.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #32934 from xuanyuanking/SPARK-35788.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: 9544277)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala (diff)
Commit b0b9643cd76da48ed90e958e40717a664bc7494b by gengliang
[SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ

### What changes were proposed in this pull request?

Support new keyword `TIMESTAMP_LTZ`, which can be used for:

- timestamp with local time zone data type in DDL
- timestamp with local time zone data type in Cast clause.
- timestamp with local time zone data type literal

### Why are the changes needed?

Users can use `TIMESTAMP_LTZ` in DDL/Cast/Literals for the timestamp with local time zone type directly. The new keyword is independent of the SQL configuration `spark.sql.timestampType`.

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit test

Closes #33224 from gengliangwang/TIMESTAMP_LTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: b0b9643)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
Commit e0c6b2e9655331e308a78d972db4a18d24b73e37 by wenchen
[SPARK-36011][SQL] Disallow altering permanent views based on temporary views or UDFs

### What changes were proposed in this pull request?
PR #15764 disabled creating permanent views based on temporary views or UDFs, but `AlterViewCommand` didn't block such temporary objects.
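
A hedged sketch of the newly-blocked pattern, assuming a spark-shell session on a build that contains the patch:

```scala
spark.sql("CREATE TEMPORARY VIEW tmp_v AS SELECT 1 AS id")
spark.sql("CREATE VIEW perm_v AS SELECT 1 AS id")
// With this change, redefining the permanent view on top of the temporary view
// is expected to fail analysis rather than succeed:
spark.sql("ALTER VIEW perm_v AS SELECT * FROM tmp_v")
```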

### Why are the changes needed?
More robust view canonicalization.

### Does this PR introduce _any_ user-facing change?
Yes, now if you alter a permanent view based on temporary views or UDFs, the operation will fail.

### How was this patch tested?
Add new unit tests.

Closes #33204 from jerqi/alter_view.

Authored-by: RoryQi <1242949407@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e0c6b2e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala (diff)
Commit 87282f04bf8810882187f6759d4a675190ca9b0b by viirya
[SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

### What changes were proposed in this pull request?
Ideally, nested columns in a SQL query should result in `GetStructField` with a non-None name. But there are places that can create `GetStructField` with a None name, such as `UnresolvedStar.expand`, the Dataset encoder code, etc.
The current `nestedFieldToAlias` cannot handle these cases and will cause the job to fail.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT.

Closes #33183 from AngersZhuuuu/SPARK-35972.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 87282f0)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala (diff)
Commit c8ff613c3cd0d04ebfaf57feeb67e21b3c8410a2 by max.gekk
[SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals

### What changes were proposed in this pull request?

This PR fixes two issues. One is that `to_json` doesn't support `map` types where value types are `day-time` interval types like:
```
spark-sql> select to_json(map('a', interval '1 2:3:4' day to second));
21/07/06 14:53:58 ERROR SparkSQLDriver: Failed in [select to_json(map('a', interval '1 2:3:4' day to second))]
java.lang.RuntimeException: Failed to convert value 93784000000 (class of class java.lang.Long) with the type of DayTimeIntervalType(0,3) to JSON.
```
The other issue is that, even if the `to_json` issue is resolved, `from_json` doesn't support converting a JSON string back to a `day-time` interval value, so the result of the following query will be `null`.
```
spark-sql> select from_json(to_json(map('a', interval '1 2:3:4' day to second)), 'a interval day to second');
{"a":null}
```

### Why are the changes needed?

There should be no reason why day-time intervals cannot be used as map value types; values of `CalendarIntervalType` already can be.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33225 from sarutak/json-dtinterval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: c8ff613)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala (diff)