SuccessChanges

Summary

  1. [SPARK-30315][SQL] Add adaptive execution context (details)
  2. [SPARK-30440][CORE][TESTS] Avoid race condition in TaskSetManagerSuite by not using resourceOffer (details)
  3. [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package (details)
  4. [SPARK-30464][PYTHON][DOCS] Explicitly note that we don't add "pandas compatible" aliases (details)
  5. [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax (details)
Commit af2d3d01792f5dd43c62d5a6dc4e939c864d134a by gatorsmile
[SPARK-30315][SQL] Add adaptive execution context
### What changes were proposed in this pull request?
This is a minor code refactoring PR. It creates an adaptive execution context class to wrap objects shared across the main query and its sub-queries.
### Why are the changes needed?
This refactoring will improve code readability and reduce the number of parameters used to initialize `AdaptiveSparkPlanExec`.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Passed existing UTs.
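The refactoring pattern described here (bundling objects shared by the main query and its sub-queries into one context object, rather than passing each one as a constructor parameter) can be sketched as follows. All names below are illustrative stand-ins, not Spark's actual Scala API:

```python
from dataclasses import dataclass, field

@dataclass
class AdaptiveContext:
    """Hypothetical context bundling state shared across main and sub-queries."""
    session: str                                      # stand-in for the SparkSession
    stage_cache: dict = field(default_factory=dict)   # shared plan-stage cache

class AdaptivePlanExec:
    # Before such a refactoring, each shared object would be a separate
    # constructor parameter; the context collapses them into one argument.
    def __init__(self, plan: str, context: AdaptiveContext):
        self.plan = plan
        self.context = context

ctx = AdaptiveContext(session="spark")
main = AdaptivePlanExec("main-query-plan", ctx)
sub = AdaptivePlanExec("sub-query-plan", ctx)  # shares the same cache
print(sub.context.stage_cache is main.context.stage_cache)  # True
```

Because both executors hold the same context instance, state such as the stage cache is shared without widening any constructor signature.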
Closes #26959 from maryannxue/aqe-context.
Authored-by: maryannxue <maryannxue@apache.org> Signed-off-by: Xiao Li
<gatorsmile@gmail.com>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
Commit 18daa37cdb9839740816c0d1426a1d27aed218e3 by dhyun
[SPARK-30440][CORE][TESTS] Avoid race condition in TaskSetManagerSuite
by not using resourceOffer
### What changes were proposed in this pull request?
There is a race condition in the test case introduced in SPARK-30359, between reviveOffers in org.apache.spark.scheduler.TaskSchedulerImpl#submitTasks and org.apache.spark.scheduler.TaskSetManager#resourceOffer. In the test case there is no need to call resourceOffer, as submitTasks will revive offers for the task set.
### Why are the changes needed?
To fix a flaky test.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
The test case passes after the change.
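The shape of the race can be sketched in a single-threaded toy (all names are illustrative, not Spark's real scheduler): once the submit path has revived offers and launched the task, a manual resourceOffer call is redundant, and in the real concurrent scheduler the two paths race over which one launches the task, making test assertions flaky.

```python
class TinyScheduler:
    """Toy scheduler: submit() internally revives offers, as submitTasks does."""
    def __init__(self):
        self.pending = ["task-0"]
        self.launched_by = []

    def submit(self):
        # Submitting a task set revives offers, which can launch the task.
        self.resource_offer(source="revive")

    def resource_offer(self, source):
        if self.pending:
            self.pending.pop()
            self.launched_by.append(source)

s = TinyScheduler()
s.submit()                        # launched via the internal revive
s.resource_offer(source="test")   # redundant: nothing left to launch
print(s.launched_by)              # ['revive']
```

In the toy the calls are sequential, so the manual offer harmlessly finds nothing; in the real scheduler the revive runs on another thread, so either path may win, which is exactly why the test dropped its manual resourceOffer call.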
Closes #27115 from ajithme/testflaky.
Authored-by: Ajith <ajith2489@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
The file was modified core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (diff)
Commit ee8d66105885929ac0c0c087843d70bf32de31a1 by gurwls223
[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into
'pandas' sub-package
### What changes were proposed in this pull request?
This PR proposes to move pandas related functionalities into pandas
package. Namely:
```bash
pyspark/sql/pandas
├── __init__.py
├── conversion.py  # Conversion between pandas <> PySpark DataFrames
├── functions.py   # pandas_udf
├── group_ops.py   # Grouped UDF / Cogrouped UDF + groupby.apply, groupby.cogroup.apply
├── map_ops.py     # Map Iter UDF + mapInPandas
├── serializers.py # pandas <> PyArrow serializers
├── types.py       # Type utils between pandas <> PyArrow
└── utils.py       # Version requirement checks
```
In order to separately locate `groupby.apply`, `groupby.cogroup.apply`, `mapInPandas`, `toPandas`, and `createDataFrame(pdf)` under the `pandas` sub-package, I had to use a mix-in approach, which the Scala side often uses via `trait`s, and which pandas itself also uses to group related functionalities (see `IndexOpsMixin` as an example). You can think of it as similar to Scala's self-typed trait. See the structure below:
```python
class PandasMapOpsMixin(object):
    def mapInPandas(self, ...):
        ...
        return ...
    # other pandas <> PySpark APIs
```
```python
class DataFrame(PandasMapOpsMixin):
    # other DataFrame APIs equivalent to the Scala side.
```
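The mix-in composition above can be made concrete in a minimal, self-contained sketch. The class bodies are illustrative stand-ins, not the actual PySpark implementations; only the structure mirrors the PR:

```python
class PandasMapOpsMixin:
    """Hypothetical mix-in grouping pandas-related DataFrame methods."""

    def mapInPandas(self, func):
        # In PySpark this applies `func` over batches of pandas DataFrames;
        # here we simply apply it to the stored rows for illustration.
        return DataFrame(func(self.rows))

class DataFrame(PandasMapOpsMixin):
    """Hypothetical DataFrame that gains pandas ops by inheriting the mix-in."""

    def __init__(self, rows):
        self.rows = rows

df = DataFrame([1, 2, 3])
doubled = df.mapInPandas(lambda rows: [r * 2 for r in rows])
print(doubled.rows)  # [2, 4, 6]
```

The mix-in defines the pandas-facing methods in its own module while relying on attributes (`self.rows` here) that the host class provides, which is what makes it behave like a self-typed trait.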
Yes, this is a big PR, but it is mostly just moving code around, except for `createDataFrame`, where I had to split the methods.
### Why are the changes needed?
pandas functionality is currently scattered across the codebase, and I myself get lost finding it. It is also almost impossible now to make a change that applies to all pandas-related features at once.
Also, after this change, `DataFrame` and `SparkSession` become more
consistent with Scala side since pandas is specific to Python, and this
change separates pandas-specific APIs away from `DataFrame` or
`SparkSession`.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests should cover this. I also manually built the PySpark API documentation and checked it.
Closes #27109 from HyukjinKwon/pandas-refactoring.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
The file was removed python/pyspark/sql/cogroup.py
The file was modified examples/src/main/python/sql/arrow.py (diff)
The file was modified dev/sparktestsupport/modules.py (diff)
The file was modified python/pyspark/sql/utils.py (diff)
The file was modified python/setup.py (diff)
The file was added python/pyspark/sql/pandas/serializers.py
The file was modified sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala (diff)
The file was modified python/pyspark/sql/tests/test_arrow.py (diff)
The file was added python/pyspark/sql/pandas/functions.py
The file was added python/pyspark/sql/pandas/utils.py
The file was modified python/pyspark/sql/session.py (diff)
The file was added python/pyspark/sql/pandas/group_ops.py
The file was added python/pyspark/sql/pandas/map_ops.py
The file was modified python/pyspark/sql/udf.py (diff)
The file was modified python/pyspark/testing/sqlutils.py (diff)
The file was modified python/pyspark/worker.py (diff)
The file was modified python/pyspark/serializers.py (diff)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was added python/pyspark/sql/pandas/conversion.py
The file was added python/pyspark/sql/pandas/types.py
The file was modified python/pyspark/sql/functions.py (diff)
The file was added python/pyspark/sql/pandas/__init__.py
The file was modified python/pyspark/sql/group.py (diff)
The file was modified python/docs/pyspark.sql.rst (diff)
The file was modified python/pyspark/sql/__init__.py (diff)
Commit 92a0877ee194a1709790f23678a449e1b7e8beb5 by gurwls223
[SPARK-30464][PYTHON][DOCS] Explicitly note that we don't add "pandas
compatible" aliases
### What changes were proposed in this pull request?
This PR adds a note that we're not adding "pandas compatible" aliases
anymore.
### Why are the changes needed?
We added "pandas compatible" aliases as of
https://github.com/apache/spark/pull/5544 and
https://github.com/apache/spark/pull/6066. There are too many differences, and I don't think it makes sense to add such aliases anymore at this point.
I was even considering deprecating them, but decided to take a more conservative approach by just documenting it.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests should cover this.
Closes #27142 from HyukjinKwon/SPARK-30464.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
The file was modified python/pyspark/sql/dataframe.py (diff)
Commit c37312342e00d741ea66f1bf465336835e248f94 by wenchen
[SPARK-30183][SQL] Disallow to specify reserved properties in
CREATE/ALTER NAMESPACE syntax
### What changes were proposed in this pull request?
Currently, COMMENT and LOCATION are reserved properties for Data Source V2 namespaces. They can be set either via their dedicated clauses or via the properties list, with the values specified in clauses taking precedence. Since these properties are reserved, they cannot be accessed directly, so they should be set through the COMMENT/LOCATION clauses only.
### Why are the changes needed?
To make reserved properties actually reserved.
### Does this PR introduce any user-facing change?
Yes. 'location' and 'comment' are no longer allowed in database properties.
### How was this patch tested?
Unit tests.
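The rule can be sketched as a small validation function (an illustrative model of the behavior, not Spark's implementation): reserved keys are rejected from the generic properties list and may only arrive through their dedicated clauses.

```python
# Reserved namespace properties, settable only via COMMENT/LOCATION clauses.
RESERVED = {"comment", "location"}

def build_namespace_properties(clause_values, properties):
    """Merge clause-provided values with generic properties, rejecting
    reserved keys that appear in the generic properties list."""
    bad = RESERVED & {k.lower() for k in properties}
    if bad:
        raise ValueError(
            f"{sorted(bad)} are reserved; use the COMMENT/LOCATION clauses")
    # Clause values win, mirroring the precedence described above.
    return {**properties, **clause_values}

props = build_namespace_properties({"comment": "sales db"}, {"owner": "kent"})
print(props)  # {'owner': 'kent', 'comment': 'sales db'}

try:
    build_namespace_properties({}, {"location": "/tmp/db"})
except ValueError as e:
    print("rejected:", e)
```

After this PR the parser rejects the reserved keys outright rather than silently letting the clause value shadow them.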
Closes #26806 from yaooqinn/SPARK-30183.
Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)