Changes (failed build)

Summary

  1. [SPARK-25659][PYTHON][TEST] Test type inference specification for createDataFrame in PySpark
  2. [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce test time of LogisticRegressionSuite
Commit f3fed28230e4e5e08d182715e8cf901daf8f3b73 by hyukjinkwon
[SPARK-25659][PYTHON][TEST] Test type inference specification for
createDataFrame in PySpark
## What changes were proposed in this pull request?
This PR proposes to add tests specifying the type inference behavior, plus simple
end-to-end tests. It looks like we are not cleanly testing that logic.
For instance, see
https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/types.py#L894-L905
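For reference, the mapping those lines point at looks roughly like the sketch below (paraphrased from pyspark/sql/types.py, not a verbatim excerpt of the linked commit):
```
import datetime
import decimal

from pyspark.sql.types import (NullType, BooleanType, LongType, DoubleType,
                               StringType, DecimalType, DateType, TimestampType)

# Paraphrased sketch of the internal _type_mappings dict: both
# type(None) and datetime.time are listed, which suggests inference
# was meant to handle them.
_type_mappings = {
    type(None): NullType,
    bool: BooleanType,
    int: LongType,
    float: DoubleType,
    str: StringType,
    decimal.Decimal: DecimalType,
    datetime.date: DateType,
    datetime.datetime: TimestampType,
    datetime.time: TimestampType,
}
```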
It looks like we intended to support datetime.time and None in type inference
too, but neither works:
```
>>> spark.createDataFrame([[datetime.time()]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/.../spark/python/pyspark/sql/session.py", line 432, in _createFromLocal
    data = [schema.toInternal(row) for row in data]
  File "/.../spark/python/pyspark/sql/types.py", line 604, in toInternal
    for f, v, c in zip(self.fields, obj, self._needConversion))
  File "/.../spark/python/pyspark/sql/types.py", line 604, in <genexpr>
    for f, v, c in zip(self.fields, obj, self._needConversion))
  File "/.../spark/python/pyspark/sql/types.py", line 442, in toInternal
    return self.dataType.toInternal(obj)
  File "/.../spark/python/pyspark/sql/types.py", line 193, in toInternal
    else time.mktime(dt.timetuple()))
AttributeError: 'datetime.time' object has no attribute 'timetuple'
>>> spark.createDataFrame([[None]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/.../spark/python/pyspark/sql/session.py", line 419, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/.../python/pyspark/sql/session.py", line 353, in _inferSchemaFromList
    raise ValueError("Some of types cannot be determined after inferring")
ValueError: Some of types cannot be determined after inferring
```
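For what it's worth, the usual workaround (not part of this PR) is to pass an explicit schema so inference is skipped entirely; a minimal sketch:
```
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# With an explicit schema, createDataFrame does not call
# _inferSchemaFromList, so an all-None column no longer fails with
# "Some of types cannot be determined after inferring".
spark = SparkSession.builder.getOrCreate()
schema = StructType([StructField("value", StringType(), nullable=True)])
spark.createDataFrame([[None]], schema=schema).show()
```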
## How was this patch tested?
Manually tested, and unit tests were added.
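To illustrate the kind of specification being pinned down (these asserts are an illustrative sketch, not the exact tests added in python/pyspark/sql/tests.py):
```
import datetime
from pyspark.sql.types import _infer_type, LongType, StringType, TimestampType

# _infer_type is PySpark's internal helper that maps a Python value to
# a Spark SQL type; asserting its outputs pins down the inference contract.
assert _infer_type(1) == LongType()
assert _infer_type("a") == StringType()
assert _infer_type(datetime.datetime(2018, 10, 9)) == TimestampType()
```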
Closes #22653 from HyukjinKwon/SPARK-25659.
Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified: python/pyspark/sql/tests.py (diff)
Commit a4b14a9cf828572829ad74743e68a06eb376ba28 by sean.owen
[SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce test time of
LogisticRegressionSuite
...with intercept with L1 regularization
## What changes were proposed in this pull request?
In the test, "multinomial logistic regression with intercept with L1
regularization" in the "LogisticRegressionSuite", taking more than a
minute due to training of 2 logistic regression model. However after
analysing the training cost over iteration, we can reduce the
computation time by 50%. Training cost vs iteration for model 1
![image](https://user-images.githubusercontent.com/23054875/46573805-ddab7680-c9b7-11e8-9ee9-63a99d498475.png)
So, model 1 converges after about 150 iterations.
Training cost vs. iteration for model 2:
![image](https://user-images.githubusercontent.com/23054875/46573790-b3f24f80-c9b7-11e8-89c0-81045ad647cb.png)
After around 100 iterations, model 2 converges. So, if we cap the maximum
iterations for model 1 and model 2 at 175 and 125 respectively, we can reduce
the computation time by half (a sketch of the idea follows below).
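As an illustration of the change's idea (a PySpark sketch with made-up data and regularization values, not the Scala suite itself):
```
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)), (1.0, Vectors.dense(2.0, 1.0)),
     (2.0, Vectors.dense(2.0, -1.0))],
    ["label", "features"])

# Capping maxIter at the point where the loss curve flattens bounds
# training time without changing what the test asserts.
lr = LogisticRegression(maxIter=175, regParam=0.12, elasticNetParam=1.0,
                        family="multinomial")
model = lr.fit(train)
```
Here elasticNetParam=1.0 corresponds to pure L1 regularization, matching the test's name.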
## How was this patch tested?
Computation time in a local setup:
Before change: ~53 sec
After change: ~26 sec
Closes #22659 from shahidki31/SPARK-25623.
Authored-by: Shahid <shahidki31@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
The file was modified: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala (diff)