SuccessChanges

Summary

  1. [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 (commit: e319a624e2f366a941bd92a685e1b48504c887b1)
  2. [MINOR][PYTHON] Use a helper in `PythonUtils` instead of direct accessing Scala package (commit: dad5c48b2a229bf6f9e6b8548f9335f04a15c818)
  3. [SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation (commit: 7edfdfcecd07f02ecd9bda8f62c00d32884e4de8)
Commit e319a624e2f366a941bd92a685e1b48504c887b1 by hyukjinkwon
[SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python
3.6 and Pandas 0.23
## What changes were proposed in this pull request?
Fix a test that constructs a Pandas DataFrame by specifying the column
order explicitly. Previously this test assumed the columns would be
sorted alphabetically; however, when using Python 3.6 with Pandas 0.23
or higher, the original column order is maintained. This causes the
columns to get mixed up and the test fails.
Manually tested with `python/run-tests` using Python 3.6.6 and Pandas
0.23.4
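The version-dependent ordering behavior described above can be sketched as follows. This is a hypothetical illustration, not the actual Spark test; the DataFrame contents are invented:

```python
import pandas as pd

# On Python 3.6+ with pandas >= 0.23, a DataFrame built from a dict
# keeps the dict's insertion order instead of sorting column names
# alphabetically, so here the columns may come out as ["b", "a"].
pdf = pd.DataFrame({"b": [1], "a": [2]})

# Passing columns= pins the order explicitly, regardless of the
# Python/pandas version, which is the style of fix applied here:
pdf_fixed = pd.DataFrame({"b": [1], "a": [2]}, columns=["a", "b"])
print(list(pdf_fixed.columns))  # ['a', 'b'] on any version
```

Tests that compare against a Spark DataFrame with a fixed schema should construct the Pandas side with an explicit `columns=` list rather than relying on key ordering.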
Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471.
Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: hyukjinkwon
<gurwls223@apache.org>
(cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: e319a624e2f366a941bd92a685e1b48504c887b1)
The file was modified python/pyspark/sql/tests.py (diff)
Commit dad5c48b2a229bf6f9e6b8548f9335f04a15c818 by hyukjinkwon
[MINOR][PYTHON] Use a helper in `PythonUtils` instead of direct
accessing Scala package
## What changes were proposed in this pull request?
This PR proposes to add a helper in `PythonUtils` instead of directly
accessing the Scala package.
## How was this patch tested?
Jenkins tests.
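The general shape of this refactoring can be sketched in pure Python. All names below are invented for illustration; the real change routes py4j calls through the Scala-side `PythonUtils` object rather than reaching into internal Scala packages:

```python
# Toy facade sketch (invented names, not Spark code): callers go through
# one stable helper instead of touching internal modules directly.

class _InternalPackage:
    """Stands in for an internal Scala package whose layout may change."""
    @staticmethod
    def is_encryption_enabled(conf):
        return conf.get("spark.io.encryption.enabled", "false") == "true"

class PythonUtilsSketch:
    """Single entry point, analogous to the PythonUtils helper."""
    @staticmethod
    def get_encryption_enabled(conf):
        # The internal call can move or be renamed without breaking callers.
        return _InternalPackage.is_encryption_enabled(conf)

print(PythonUtilsSketch.get_encryption_enabled(
    {"spark.io.encryption.enabled": "true"}))  # True
```

The benefit is that the Python side depends on one documented helper rather than on the internal Scala package structure.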
Closes #22483 from HyukjinKwon/minor-refactoring.
Authored-by: hyukjinkwon <gurwls223@apache.org> Signed-off-by:
hyukjinkwon <gurwls223@apache.org>
(cherry picked from commit 88e7e87bd5c052e10f52d4bb97a9d78f5b524128)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: dad5c48b2a229bf6f9e6b8548f9335f04a15c818)
The file was modified python/pyspark/context.py (diff)
The file was modified core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala (diff)
Commit 7edfdfcecd07f02ecd9bda8f62c00d32884e4de8 by gatorsmile
[SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId for
project expressions in each Union child, causing mistakes in constant
propagation
## What changes were proposed in this pull request?
The problem was caused by the PushProjectThroughUnion rule, which, when
creating a new Project for each child of the Union, uses the same exprId
for expressions at the same position. This is wrong because the
expressions in each child of the Union are independent, and it can lead
to a wrong result if other rules like FoldablePropagation kick in and
treat two different expressions as the same.
The fix is to create new expressions in the new Project for each child
of the Union.
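The failure mode can be modeled outside Catalyst. The sketch below is a toy illustration (not optimizer code) of why an exprId shared across independent Union children misleads a constant-folding rule keyed on exprIds:

```python
from itertools import count

# Toy model: each attribute carries an exprId, and a
# FoldablePropagation-like rule treats two attributes with the same
# exprId as the same expression.
_ids = count()

def fresh_expr_id():
    return next(_ids)

# Buggy push-down: both Union children get projections whose output
# attribute reuses one exprId, even though the values are independent.
shared = fresh_expr_id()
child1_out = {"name": "a", "exprId": shared, "value": 1}  # 'a' is 1 here
child2_out = {"name": "a", "exprId": shared, "value": 2}  # but 2 here

# Constant propagation records child1's constant, keyed on exprId...
constants = {child1_out["exprId"]: child1_out["value"]}
# ...and then wrongly folds child2's column to 1, although it is 2:
wrong = constants.get(child2_out["exprId"])

# The fix: allocate a fresh exprId per child, so no bogus match occurs.
child2_fixed = {"name": "a", "exprId": fresh_expr_id(), "value": 2}
safe = constants.get(child2_fixed["exprId"])  # None: nothing to fold
```

With fresh exprIds per child, downstream rules can no longer conflate expressions that merely sit at the same position in different Union branches.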
## How was this patch tested?
Added UT.
Closes #22447 from maryannxue/push-project-thru-union-bug.
Authored-by: maryannxue <maryannxue@apache.org> Signed-off-by:
gatorsmile <gatorsmile@gmail.com>
(cherry picked from commit 88446b6ad19371f15d06ef67052f6c1a8072c04a)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 7edfdfcecd07f02ecd9bda8f62c00d32884e4de8)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushProjectThroughUnionSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)