SuccessChanges

Summary

  1. [SPARK-24740][PYTHON][ML][BACKPORT-2.3] Make PySpark's tests compatible (commit: bf3cdeae3f27effb50f874cfe05f14192be47783) (details)
Commit bf3cdeae3f27effb50f874cfe05f14192be47783 by yamamuro
[SPARK-24740][PYTHON][ML][BACKPORT-2.3] Make PySpark's tests compatible
with NumPy 1.14+
## What changes were proposed in this pull request?
This PR backports SPARK-24740 to branch-2.3. It makes PySpark's tests
compatible with NumPy 1.14+. NumPy 1.14.x made rather radical changes to
the string representation of its arrays.
For example, the tests below fail:
```
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 895, in __main__.DenseMatrix.__str__
Failed example:
    print(dm)
Expected:
    DenseMatrix([[ 0.,  2.],
                 [ 1.,  3.]])
Got:
    DenseMatrix([[0., 2.],
                 [1., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 899, in __main__.DenseMatrix.__str__
Failed example:
    print(dm)
Expected:
    DenseMatrix([[ 0.,  1.],
                 [ 2.,  3.]])
Got:
    DenseMatrix([[0., 1.],
                 [2., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 939, in __main__.DenseMatrix.toArray
Failed example:
    m.toArray()
Expected:
    array([[ 0.,  2.],
           [ 1.,  3.]])
Got:
    array([[0., 2.],
           [1., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 324, in __main__.DenseVector.dot
Failed example:
    dense.dot(np.reshape([1., 2., 3., 4.], (2, 2), order='F'))
Expected:
    array([  5.,  11.])
Got:
    array([ 5., 11.])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 567, in __main__.SparseVector.dot
Failed example:
    a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
Expected:
    array([ 22.,  22.])
Got:
    array([22., 22.])
```
See the [release note](https://docs.scipy.org/doc/numpy-1.14.0/release.html#compatibility-notes).
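The mismatch can be reproduced, and worked around, outside Spark. The sketch below shows one common way to keep old expected output valid: asking NumPy for its pre-1.14 formatting through the `legacy` print option. This page does not include the diff, so treat this as an illustration under that assumption rather than as the patch itself; the helper name `use_legacy_numpy_repr` is hypothetical.

```python
import numpy as np


def use_legacy_numpy_repr():
    """Switch NumPy to its pre-1.14 array formatting when that option exists.

    Returns True if legacy mode was enabled, False on older NumPy releases
    that have no `legacy` print option (and therefore need no change).
    """
    try:
        np.set_printoptions(legacy="1.13")
        return True
    except TypeError:
        # NumPy < 1.14 rejects the unknown `legacy` keyword.
        return False


use_legacy_numpy_repr()
m = np.array([[0.0, 2.0], [1.0, 3.0]])
legacy_text = repr(m)  # padded style, matching the "Expected" blocks above
print(legacy_text)
```

Calling such a helper once before the doctests run would leave the expected strings untouched; the alternative is to rewrite every expected output in the new style, which would then fail on older NumPy.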
## How was this patch tested?
Manually tested:
```
$ ./run-tests --python-executables=python3.6,python2.7 --modules=pyspark-ml,pyspark-mllib
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'python2.7']
Will test the following Python modules: ['pyspark-ml', 'pyspark-mllib']
Starting test(python2.7): pyspark.mllib.tests
Starting test(python2.7): pyspark.ml.classification
Starting test(python3.6): pyspark.mllib.tests
Starting test(python2.7): pyspark.ml.clustering
Finished test(python2.7): pyspark.ml.clustering (54s)
Starting test(python2.7): pyspark.ml.evaluation
Finished test(python2.7): pyspark.ml.classification (74s)
Starting test(python2.7): pyspark.ml.feature
Finished test(python2.7): pyspark.ml.evaluation (27s)
Starting test(python2.7): pyspark.ml.fpm
Finished test(python2.7): pyspark.ml.fpm (0s)
Starting test(python2.7): pyspark.ml.image
Finished test(python2.7): pyspark.ml.image (17s)
Starting test(python2.7): pyspark.ml.linalg.__init__
Finished test(python2.7): pyspark.ml.linalg.__init__ (1s)
Starting test(python2.7): pyspark.ml.recommendation
Finished test(python2.7): pyspark.ml.feature (76s)
Starting test(python2.7): pyspark.ml.regression
Finished test(python2.7): pyspark.ml.recommendation (69s)
Starting test(python2.7): pyspark.ml.stat
Finished test(python2.7): pyspark.ml.regression (45s)
Starting test(python2.7): pyspark.ml.tests
Finished test(python2.7): pyspark.ml.stat (28s)
Starting test(python2.7): pyspark.ml.tuning
Finished test(python2.7): pyspark.ml.tuning (20s)
Starting test(python2.7): pyspark.mllib.classification
Finished test(python2.7): pyspark.mllib.classification (31s)
Starting test(python2.7): pyspark.mllib.clustering
Finished test(python2.7): pyspark.mllib.tests (260s)
Starting test(python2.7): pyspark.mllib.evaluation
Finished test(python3.6): pyspark.mllib.tests (266s)
Starting test(python2.7): pyspark.mllib.feature
Finished test(python2.7): pyspark.mllib.evaluation (21s)
Starting test(python2.7): pyspark.mllib.fpm
Finished test(python2.7): pyspark.mllib.feature (38s)
Starting test(python2.7): pyspark.mllib.linalg.__init__
Finished test(python2.7): pyspark.mllib.linalg.__init__ (1s)
Starting test(python2.7): pyspark.mllib.linalg.distributed
Finished test(python2.7): pyspark.mllib.fpm (34s)
Starting test(python2.7): pyspark.mllib.random
Finished test(python2.7): pyspark.mllib.clustering (64s)
Starting test(python2.7): pyspark.mllib.recommendation
Finished test(python2.7): pyspark.mllib.random (15s)
Starting test(python2.7): pyspark.mllib.regression
Finished test(python2.7): pyspark.mllib.linalg.distributed (47s)
Starting test(python2.7): pyspark.mllib.stat.KernelDensity
Finished test(python2.7): pyspark.mllib.stat.KernelDensity (0s)
Starting test(python2.7): pyspark.mllib.stat._statistics
Finished test(python2.7): pyspark.mllib.recommendation (40s)
Starting test(python2.7): pyspark.mllib.tree
Finished test(python2.7): pyspark.mllib.regression (38s)
Starting test(python2.7): pyspark.mllib.util
Finished test(python2.7): pyspark.mllib.stat._statistics (19s)
Starting test(python3.6): pyspark.ml.classification
Finished test(python2.7): pyspark.mllib.tree (26s)
Starting test(python3.6): pyspark.ml.clustering
Finished test(python2.7): pyspark.mllib.util (27s)
Starting test(python3.6): pyspark.ml.evaluation
Finished test(python3.6): pyspark.ml.evaluation (30s)
Starting test(python3.6): pyspark.ml.feature
Finished test(python2.7): pyspark.ml.tests (234s)
Starting test(python3.6): pyspark.ml.fpm
Finished test(python3.6): pyspark.ml.fpm (1s)
Starting test(python3.6): pyspark.ml.image
Finished test(python3.6): pyspark.ml.clustering (55s)
Starting test(python3.6): pyspark.ml.linalg.__init__
Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
Starting test(python3.6): pyspark.ml.recommendation
Finished test(python3.6): pyspark.ml.classification (71s)
Starting test(python3.6): pyspark.ml.regression
Finished test(python3.6): pyspark.ml.image (18s)
Starting test(python3.6): pyspark.ml.stat
Finished test(python3.6): pyspark.ml.stat (37s)
Starting test(python3.6): pyspark.ml.tests
Finished test(python3.6): pyspark.ml.regression (59s)
Starting test(python3.6): pyspark.ml.tuning
Finished test(python3.6): pyspark.ml.feature (93s)
Starting test(python3.6): pyspark.mllib.classification
Finished test(python3.6): pyspark.ml.recommendation (83s)
Starting test(python3.6): pyspark.mllib.clustering
Finished test(python3.6): pyspark.ml.tuning (29s)
Starting test(python3.6): pyspark.mllib.evaluation
Finished test(python3.6): pyspark.mllib.evaluation (26s)
Starting test(python3.6): pyspark.mllib.feature
Finished test(python3.6): pyspark.mllib.classification (43s)
Starting test(python3.6): pyspark.mllib.fpm
Finished test(python3.6): pyspark.mllib.clustering (81s)
Starting test(python3.6): pyspark.mllib.linalg.__init__
Finished test(python3.6): pyspark.mllib.linalg.__init__ (2s)
Starting test(python3.6): pyspark.mllib.linalg.distributed
Finished test(python3.6): pyspark.mllib.fpm (48s)
Starting test(python3.6): pyspark.mllib.random
Finished test(python3.6): pyspark.mllib.feature (54s)
Starting test(python3.6): pyspark.mllib.recommendation
Finished test(python3.6): pyspark.mllib.random (18s)
Starting test(python3.6): pyspark.mllib.regression
Finished test(python3.6): pyspark.mllib.linalg.distributed (55s)
Starting test(python3.6): pyspark.mllib.stat.KernelDensity
Finished test(python3.6): pyspark.mllib.stat.KernelDensity (1s)
Starting test(python3.6): pyspark.mllib.stat._statistics
Finished test(python3.6): pyspark.mllib.recommendation (51s)
Starting test(python3.6): pyspark.mllib.tree
Finished test(python3.6): pyspark.mllib.regression (45s)
Starting test(python3.6): pyspark.mllib.util
Finished test(python3.6): pyspark.mllib.stat._statistics (21s)
Finished test(python3.6): pyspark.mllib.tree (27s)
Finished test(python3.6): pyspark.mllib.util (27s)
Finished test(python3.6): pyspark.ml.tests (264s)
```
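A run log like this is easier to review once the durations are pulled out. The following is a minimal sketch, assuming every completed test is reported as `Finished test(<python>): <module> (<N>s)`, as in the log above; the function names `parse_finished` and `total_seconds_by_python` are hypothetical and not part of `run-tests`.

```python
import re
from collections import defaultdict

# Whitespace is matched with \s+ so entries split across line breaks still parse.
FINISHED = re.compile(r"Finished\s+test\(([^)]+)\):\s+(\S+)\s+\((\d+)s\)")


def parse_finished(log_text):
    """Return (python_executable, module, seconds) for each finished test."""
    return [(py, mod, int(sec)) for py, mod, sec in FINISHED.findall(log_text)]


def total_seconds_by_python(entries):
    """Sum the reported durations per Python executable."""
    totals = defaultdict(int)
    for py, _module, seconds in entries:
        totals[py] += seconds
    return dict(totals)


sample = """Finished test(python2.7): pyspark.ml.clustering
(54s) Starting test(python2.7): pyspark.ml.evaluation Finished
test(python2.7): pyspark.ml.classification (74s)"""

entries = parse_finished(sample)
totals = total_seconds_by_python(entries)
slowest = max(entries, key=lambda e: e[2])
print(totals, slowest)
```

Because the tests run in parallel, the per-executable totals are summed test time, not wall-clock time.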
Closes #23591 from maropu/BACKPORT-24740.
Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: bf3cdeae3f27effb50f874cfe05f14192be47783)
The file was modified: python/pyspark/mllib/linalg/__init__.py (diff)
The file was modified: python/pyspark/mllib/stat/_statistics.py (diff)
The file was modified: python/pyspark/ml/linalg/__init__.py (diff)
The file was modified: python/pyspark/mllib/linalg/distributed.py (diff)
The file was modified: python/pyspark/mllib/clustering.py (diff)
The file was modified: python/pyspark/mllib/evaluation.py (diff)
The file was modified: python/pyspark/ml/stat.py (diff)
The file was modified: python/pyspark/ml/clustering.py (diff)