Success

Changes

Summary

  1. [SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7
Commit 64c72b4de00b241fb4785e89ec3c4bbe90818293 by hyukjinkwon
[SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7
## What changes were proposed in this pull request?
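This PR proposes to make PySpark compatible with Python 3.7. Python 3.7
makes a rather radical change to the semantics of `StopIteration` inside
a generator: if it escapes the generator body, it is now re-raised as a
`RuntimeError`.
To stay compatible, such code should catch `StopIteration` explicitly and
return from the generator: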
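```python
def gen(iterator):  # hypothetical generator, for illustration
    try:
        item = next(iterator)
    except StopIteration:
        return  # end the generator instead of leaking StopIteration
    yield item
```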
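A minimal sketch of the failure mode being guarded against; `first_item_leaky` and `demo` are hypothetical names, not PySpark code. A generator that lets `next()` on an exhausted iterator propagate is ended silently on Python 3.6 and earlier (possibly with a deprecation warning), but on Python 3.7+ the caller receives a `RuntimeError` instead.

```python
import sys


def first_item_leaky(iterator):
    # Unguarded pattern: if `iterator` is empty, next() raises
    # StopIteration inside the generator body and nothing catches it.
    yield next(iterator)


def demo():
    try:
        result = list(first_item_leaky(iter([])))
    except RuntimeError as exc:
        # Python 3.7+ (PEP 479): the escaping StopIteration is
        # converted into a RuntimeError.
        return "RuntimeError: %s" % exc
    # Python < 3.7: the StopIteration silently ended the generator.
    return "ended silently with %r" % (result,)


print(sys.version_info[:2], demo())
```

Wrapping the `next()` call in `try`/`except StopIteration: return`, as above, behaves the same on every version.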
See the Python 3.7 [release
notes](https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7)
and [PEP 479](https://www.python.org/dev/peps/pep-0479/).
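For a generator that pulls a bounded number of items in a loop, the same guard goes around each `next()` call. This is a sketch under assumptions: `take_up_to` is a hypothetical helper, and it is not the actual patched code in `python/pyspark/rdd.py`.

```python
def take_up_to(iterable, limit):
    """Yield at most `limit` items from `iterable` (hypothetical helper)."""
    iterator = iter(iterable)
    taken = 0
    while taken < limit:
        try:
            item = next(iterator)
        except StopIteration:
            # Source exhausted before `limit` items: end the generator
            # explicitly rather than letting StopIteration escape (PEP 479).
            return
        yield item
        taken += 1


print(list(take_up_to(range(10), 3)))  # [0, 1, 2]
print(list(take_up_to(iter([7, 8]), 5)))  # [7, 8]
```

Where no extra per-item logic is needed, `itertools.islice(iterable, limit)` gives the same result without calling `next()` by hand.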
## How was this patch tested?
Manually tested:
```
$ ./run-tests --python-executables=python3.7
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python3.7']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python3.7): pyspark.mllib.tests
Starting test(python3.7): pyspark.sql.tests
Starting test(python3.7): pyspark.streaming.tests
Starting test(python3.7): pyspark.tests
Finished test(python3.7): pyspark.streaming.tests (130s)
Starting test(python3.7): pyspark.accumulators
Finished test(python3.7): pyspark.accumulators (8s)
Starting test(python3.7): pyspark.broadcast
Finished test(python3.7): pyspark.broadcast (9s)
Starting test(python3.7): pyspark.conf
Finished test(python3.7): pyspark.conf (6s)
Starting test(python3.7): pyspark.context
Finished test(python3.7): pyspark.context (27s)
Starting test(python3.7): pyspark.ml.classification
Finished test(python3.7): pyspark.tests (200s) ... 3 tests were skipped
Starting test(python3.7): pyspark.ml.clustering
Finished test(python3.7): pyspark.mllib.tests (244s)
Starting test(python3.7): pyspark.ml.evaluation
Finished test(python3.7): pyspark.ml.classification (63s)
Starting test(python3.7): pyspark.ml.feature
Finished test(python3.7): pyspark.ml.clustering (48s)
Starting test(python3.7): pyspark.ml.fpm
Finished test(python3.7): pyspark.ml.fpm (0s)
Starting test(python3.7): pyspark.ml.image
Finished test(python3.7): pyspark.ml.evaluation (23s)
Starting test(python3.7): pyspark.ml.linalg.__init__
Finished test(python3.7): pyspark.ml.linalg.__init__ (0s)
Starting test(python3.7): pyspark.ml.recommendation
Finished test(python3.7): pyspark.ml.image (20s)
Starting test(python3.7): pyspark.ml.regression
Finished test(python3.7): pyspark.ml.regression (58s)
Starting test(python3.7): pyspark.ml.stat
Finished test(python3.7): pyspark.ml.feature (90s)
Starting test(python3.7): pyspark.ml.tests
Finished test(python3.7): pyspark.ml.recommendation (82s)
Starting test(python3.7): pyspark.ml.tuning
Finished test(python3.7): pyspark.ml.stat (27s)
Starting test(python3.7): pyspark.mllib.classification
Finished test(python3.7): pyspark.sql.tests (362s) ... 102 tests were skipped
Starting test(python3.7): pyspark.mllib.clustering
Finished test(python3.7): pyspark.ml.tuning (29s)
Starting test(python3.7): pyspark.mllib.evaluation
Finished test(python3.7): pyspark.mllib.classification (39s)
Starting test(python3.7): pyspark.mllib.feature
Finished test(python3.7): pyspark.mllib.evaluation (30s)
Starting test(python3.7): pyspark.mllib.fpm
Finished test(python3.7): pyspark.mllib.feature (44s)
Starting test(python3.7): pyspark.mllib.linalg.__init__
Finished test(python3.7): pyspark.mllib.linalg.__init__ (0s)
Starting test(python3.7): pyspark.mllib.linalg.distributed
Finished test(python3.7): pyspark.mllib.clustering (78s)
Starting test(python3.7): pyspark.mllib.random
Finished test(python3.7): pyspark.mllib.fpm (33s)
Starting test(python3.7): pyspark.mllib.recommendation
Finished test(python3.7): pyspark.mllib.random (12s)
Starting test(python3.7): pyspark.mllib.regression
Finished test(python3.7): pyspark.mllib.linalg.distributed (45s)
Starting test(python3.7): pyspark.mllib.stat.KernelDensity
Finished test(python3.7): pyspark.mllib.stat.KernelDensity (0s)
Starting test(python3.7): pyspark.mllib.stat._statistics
Finished test(python3.7): pyspark.mllib.recommendation (41s)
Starting test(python3.7): pyspark.mllib.tree
Finished test(python3.7): pyspark.mllib.regression (44s)
Starting test(python3.7): pyspark.mllib.util
Finished test(python3.7): pyspark.mllib.stat._statistics (20s)
Starting test(python3.7): pyspark.profiler
Finished test(python3.7): pyspark.mllib.tree (26s)
Starting test(python3.7): pyspark.rdd
Finished test(python3.7): pyspark.profiler (11s)
Starting test(python3.7): pyspark.serializers
Finished test(python3.7): pyspark.mllib.util (24s)
Starting test(python3.7): pyspark.shuffle
Finished test(python3.7): pyspark.shuffle (0s)
Starting test(python3.7): pyspark.sql.catalog
Finished test(python3.7): pyspark.serializers (15s)
Starting test(python3.7): pyspark.sql.column
Finished test(python3.7): pyspark.rdd (27s)
Starting test(python3.7): pyspark.sql.conf
Finished test(python3.7): pyspark.sql.catalog (24s)
Starting test(python3.7): pyspark.sql.context
Finished test(python3.7): pyspark.sql.conf (8s)
Starting test(python3.7): pyspark.sql.dataframe
Finished test(python3.7): pyspark.sql.column (29s)
Starting test(python3.7): pyspark.sql.functions
Finished test(python3.7): pyspark.sql.context (26s)
Starting test(python3.7): pyspark.sql.group
Finished test(python3.7): pyspark.sql.dataframe (51s)
Starting test(python3.7): pyspark.sql.readwriter
Finished test(python3.7): pyspark.ml.tests (266s)
Starting test(python3.7): pyspark.sql.session
Finished test(python3.7): pyspark.sql.group (36s)
Starting test(python3.7): pyspark.sql.streaming
Finished test(python3.7): pyspark.sql.functions (57s)
Starting test(python3.7): pyspark.sql.types
Finished test(python3.7): pyspark.sql.session (25s)
Starting test(python3.7): pyspark.sql.udf
Finished test(python3.7): pyspark.sql.types (10s)
Starting test(python3.7): pyspark.sql.window
Finished test(python3.7): pyspark.sql.readwriter (31s)
Starting test(python3.7): pyspark.streaming.util
Finished test(python3.7): pyspark.sql.streaming (22s)
Starting test(python3.7): pyspark.util
Finished test(python3.7): pyspark.util (0s)
Finished test(python3.7): pyspark.streaming.util (0s)
Finished test(python3.7): pyspark.sql.udf (16s)
Finished test(python3.7): pyspark.sql.window (12s)
```
On my local machines (I have two Macs, and both have the same problem), I
have not yet been able to install the extra dependencies PyArrow and
Pandas at the same versions as Jenkins against Python 3.7.
Author: hyukjinkwon <gurwls223@apache.org>
Closes #21714 from HyukjinKwon/SPARK-24739.
(cherry picked from commit 74f6a92fcea9196d62c2d531c11ec7efd580b760)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified: python/setup.py
The file was modified: python/pyspark/rdd.py