Changes

Summary

  1. [SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index (details)
  2. [SPARK-36647][SQL][TESTS] Push down Aggregate (Min/Max/Count) for (details)
  3. [SPARK-37031][SQL][TESTS][FOLLOWUP] Add a missing test to (details)
  4. [SPARK-37125][SQL] Support AnsiInterval radix sort (details)
  5. [SPARK-37115][SQL] HiveClientImpl should use shim to wrap all hive (details)
  6. [SPARK-16280][SPARK-37082][SQL] Implements histogram_numeric aggregation (details)
Commit c7d9bd2e70c29781678f7809b6848ba3c5bba4ea by gurwls223
[SPARK-36348][PYTHON][FOLLOWUP] Complete test_astype for index

### What changes were proposed in this pull request?

This is follow-up for https://github.com/apache/spark/pull/34335.

### Why are the changes needed?

The previous bug depends on the pandas version, not the Spark version.

So the difference still exists with pandas < 1.3.

For example,

```python
# Spark 3.2 with pandas 1.2.
>>> pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
>>> psidx = ps.Index(pidx)

>>> pidx
Index([10, 20, 15, 30, 45, None], dtype='object', name='x')
>>> psidx
Float64Index([10.0, 20.0, 15.0, 30.0, 45.0, nan], dtype='float64', name='x')

>>> pidx.astype(str)
Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')
>>> psidx.astype(str)
Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
```

I think many people are still using pandas < 1.3, so maybe we'd better separate the test for old versions of pandas for now.
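As a rough illustration, the version gate could look like the following sketch (helper names are hypothetical, not the actual test code):

```python
# Hypothetical helpers sketching how a test could branch on the pandas
# version, since the astype behavior differs for pandas < 1.3.
def pandas_version_tuple(version):
    # "1.2.5" -> (1, 2, 5); ignores non-numeric suffixes for simplicity.
    parts = []
    for piece in version.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def uses_legacy_astype(version):
    # pandas < 1.3 keeps the object-dtype path shown above, so a test
    # would assert the old string formatting on these versions.
    return pandas_version_tuple(version) < (1, 3)
```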

### Does this PR introduce _any_ user-facing change?

No, it's test-only.

### How was this patch tested?

Unittest

Closes #34397 from itholic/SPARK-36348-followup.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
The file was modified python/pyspark/pandas/tests/indexes/test_base.py (diff)
Commit 4aec9d7daca7a1a146ff1fb1e7541c9443905725 by viirya
[SPARK-36647][SQL][TESTS] Push down Aggregate (Min/Max/Count) for Parquet if filter is on partition col

### What changes were proposed in this pull request?
I just realized that with the changes in https://github.com/apache/spark/pull/33650, the restriction against pushing down Min/Max/Count when the filter is on a partition column was already removed. This PR just adds a test to make sure Min/Max/Count in Parquet are pushed down if the filter is on a partition column.

### Why are the changes needed?
To complete the work for Aggregate (Min/Max/Count) push down for Parquet
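The intuition can be sketched outside Spark: when the aggregated column is the partition column, Min/Max/Count are answerable from catalog metadata alone, even under a filter on that column. This toy model (hypothetical partition map, not Spark internals) shows why no row scan is needed:

```python
# Toy model: partition value -> row count, as a catalog might record it.
partitions = {2019: 1000, 2020: 2500, 2021: 1800}


def agg_on_partition_col(filter_pred):
    # Apply the partition filter, then answer Min/Max/Count from metadata.
    keep = [value for value in partitions if filter_pred(value)]
    return {
        "min": min(keep),
        "max": max(keep),
        "count": sum(partitions[value] for value in keep),
    }


result = agg_on_partition_col(lambda year: year >= 2020)
```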

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
new test

Closes #34248 from huaxingao/partitionFilter.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetAggregatePushDownSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala (diff)
Commit 7954a91d76b5f60234b2f8628c0330a65653297a by wenchen
[SPARK-37031][SQL][TESTS][FOLLOWUP] Add a missing test to DescribeNamespaceSuite

### What changes were proposed in this pull request?

This PR proposes to add a missing test on "keeping the legacy output schema" to `DescribeNamespaceSuite`. (#31705 didn't seem to add it).

### Why are the changes needed?

To increase the test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a new test.

Closes #34399 from imback82/SPARK-37031-followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeNamespaceSuite.scala (diff)
Commit 101dd6bbff2491a608e1ab51541a120a1f08e942 by wenchen
[SPARK-37125][SQL] Support AnsiInterval radix sort

### What changes were proposed in this pull request?

- Make `AnsiInterval` data type support radix sort in SQL.
- Enhance `SortSuite` to also cover sorting with radix disabled.

### Why are the changes needed?

Radix sort is faster than Timsort; the benchmark results can be seen in `SortBenchmark`.

Since the `AnsiInterval` data type is comparable:

- `YearMonthIntervalType` -> int ordering
- `DayTimeIntervalType` -> long ordering

And we also support radix sort when the ordering column's data type is int or long.

So `AnsiInterval` radix sort can be supported.
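The mapping to an integral sort key can be sketched like this (illustrative names, not Spark's actual prefix computation):

```python
# Each ANSI interval reduces to one integral key, so the existing
# int/long radix sort path applies directly.
def year_month_prefix(years, months):
    # YearMonthIntervalType reduces to a total month count (int ordering).
    return years * 12 + months


def day_time_prefix(days, hours, minutes, seconds):
    # DayTimeIntervalType reduces to total microseconds (long ordering).
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1_000_000


intervals = [(1, 11), (2, 0), (0, 5)]  # (years, months)
ordered = sorted(intervals, key=lambda ym: year_month_prefix(*ym))
```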

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

- The data correctness should be ensured in `SortSuite`
- Add a new benchmark

Closes #34398 from ulysses-you/ansi-interval-sort.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AnsiIntervalSortBenchmark.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SortPrefixUtils.scala (diff)
The file was added sql/core/benchmarks/AnsiIntervalSortBenchmark-results.txt
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala (diff)
The file was added sql/core/benchmarks/AnsiIntervalSortBenchmark-jdk11-results.txt
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SortSuite.scala (diff)
Commit 1099fd342075b53ad9ddb2787911f2dabb340a3d by wenchen
[SPARK-37115][SQL] HiveClientImpl should use shim to wrap all hive client calls

### What changes were proposed in this pull request?
In this PR, we use the `shim` layer to wrap all Hive client API calls.

### Why are the changes needed?
Wrapping all Hive client API calls in the `shim` makes it easier to handle version-specific behavior in one place.
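The shim pattern can be sketched as follows (class and method names are hypothetical, not `HiveClientImpl`'s real API): every call funnels through one wrapper, so version-specific handling lives in a single place.

```python
# Hypothetical stand-in for the raw Hive client.
class RawClient:
    def get_table(self, name):
        return {"name": name}


class Shim:
    """Single choke point for all client calls."""

    def __init__(self, client):
        self._client = client
        self.calls = []  # record of wrapped calls, for illustration

    def _wrap(self, method, *args):
        # Retries, error translation, or per-version adaptation go here.
        self.calls.append(method)
        return getattr(self._client, method)(*args)

    def get_table(self, name):
        return self._wrap("get_table", name)


shim = Shim(RawClient())
table = shim.get_table("t1")
```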

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs

Closes #34388 from AngersZhuuuu/SPARK-37115.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala (diff)
Commit de0d7fbb4f010bec8e457d0dc00b5618e7a43750 by wenchen
[SPARK-16280][SPARK-37082][SQL] Implements histogram_numeric aggregation function which supports partial aggregation

### What changes were proposed in this pull request?
This PR implements the aggregation function `histogram_numeric`, which returns an approximate histogram of a numerical column using a user-specified number of bins, for example the histogram of column `col` split into 3 bins.

Syntax:

```sql
-- Returns an approximate histogram of column `col` using nBins bins.
histogram_numeric(col, nBins)

-- Returns an approximate histogram of column `col` in 3 bins.
SELECT histogram_numeric(col, 3) FROM table;

-- Returns an approximate histogram of column `col` in 5 bins.
SELECT histogram_numeric(col, 5) FROM table;
```
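For intuition, the bin-merging idea behind such an approximate histogram (in the style of Hive's `NumericHistogram`) can be sketched as follows; this is a simplified model, not the code added by this PR:

```python
# Keep at most n_bins (value, count) pairs; on overflow, merge the two
# closest bins into their weighted centroid.
def add_value(bins, x, n_bins):
    bins.append((x, 1))
    bins.sort()
    while len(bins) > n_bins:
        # Find the adjacent pair with the smallest gap and merge it.
        i = min(range(len(bins) - 1),
                key=lambda j: bins[j + 1][0] - bins[j][0])
        (v1, c1), (v2, c2) = bins[i], bins[i + 1]
        merged = ((v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2)
        bins[i:i + 2] = [merged]
    return bins


bins = []
for x in [1.0, 2.0, 10.0, 11.0, 20.0]:
    add_value(bins, x, n_bins=3)
```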

### Why are the changes needed?

To provide a native `histogram_numeric` aggregate that supports partial aggregation, as tracked by SPARK-16280 and SPARK-37082.
### Does this PR introduce _any_ user-facing change?
No change from the user's side.

### How was this patch tested?
Added UT

Closes #34380 from AngersZhuuuu/SPARK-37082.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HistogramNumeric.scala
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HistogramNumericSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-by.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala (diff)