Changes

Summary

  1. [SPARK-23446][PYTHON] Explicitly check supported types in toPandas (commit: ccb0a59d7383db451b86aee67423eb6e28f1f982)
  2. [SPARK-23381][CORE] Murmur3 hash generates a different value from other (commit: 8360da07110d847a01b243e6d786922a5057ad9f)
Commit ccb0a59d7383db451b86aee67423eb6e28f1f982 by gatorsmile
[SPARK-23446][PYTHON] Explicitly check supported types in toPandas
## What changes were proposed in this pull request?
This PR explicitly specifies and checks the types we support in
`toPandas`. This was a hole: for example, we haven't finished binary
type support on the Python side yet, but it is currently allowed, as below:
```python
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df = spark.createDataFrame([[bytearray(b"a")]])
df.toPandas()  # conversion without Arrow
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()  # conversion with Arrow
```
```
     _1
0  [97]
  _1
0  a
```
This should be disallowed. I think the same applies to nested
timestamps.
I also added a clearer note about
`spark.sql.execution.arrow.enabled` to the error message.
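A minimal sketch of the kind of explicit type check described above. The allow-list `SUPPORTED_TYPES`, the function name `check_supported`, the schema representation (pairs of column name and type name) and the message wording are all hypothetical; they are not taken from the patch.

```python
# Hypothetical allow-list of type names; the real set lives in PySpark
# and may differ from this one.
SUPPORTED_TYPES = {
    "boolean", "byte", "short", "int", "long",
    "float", "double", "string", "date", "timestamp",
}


def check_supported(schema):
    """Raise TypeError if any column has a type outside the allow-list.

    `schema` is a list of (column_name, type_name) pairs. Nested types
    such as "array<timestamp>" are not in the allow-list, so they are
    rejected as well.
    """
    unsupported = [(name, t) for name, t in schema if t not in SUPPORTED_TYPES]
    if unsupported:
        details = ", ".join("%s: %s" % (name, t) for name, t in unsupported)
        # Mention the config key so the user knows how to fall back.
        raise TypeError(
            "Unsupported type(s) in conversion to pandas: %s. "
            "Consider setting 'spark.sql.execution.arrow.enabled' to "
            "'false' to use the non-Arrow path." % details
        )


# Passes silently for supported types.
check_supported([("id", "long"), ("name", "string")])
```

Checking against an explicit allow-list, rather than rejecting known-bad types, means a type whose support is unfinished fails loudly instead of slipping through.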
## How was this patch tested?
Manually tested and tests added in `python/pyspark/sql/tests.py`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #20625 from HyukjinKwon/pandas_convertion_supported_type.
(cherry picked from commit c5857e496ff0d170ed0339f14afc7d36b192da6d)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: ccb0a59d7383db451b86aee67423eb6e28f1f982)
The file was modified python/pyspark/sql/dataframe.py
The file was modified python/pyspark/sql/tests.py
Commit 8360da07110d847a01b243e6d786922a5057ad9f by gatorsmile
[SPARK-23381][CORE] Murmur3 hash generates a different value from other
implementations
## What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other
implementations (such as the Scala standard library and Guava) when the
length of a byte array is not a multiple of 4.
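To illustrate why the length matters, here is a sketch of 32-bit Murmur3 in Python. `murmur3_32` follows the reference algorithm, which packs the 1 to 3 trailing bytes into a single little-endian word. `murmur3_32_per_byte_tail` is a hypothetical variant that mixes each trailing byte as its own block; it is an assumption used for comparison, not a copy of Spark's code. All function names here are illustrative.

```python
MASK = 0xFFFFFFFF
C1, C2 = 0xCC9E2D51, 0x1B873593


def _rotl(x, r):
    # 32-bit left rotation.
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    # Final avalanche step; the input length is folded in first.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _hash_blocks(data, seed):
    # Mix all complete 4-byte little-endian blocks.
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(k1))
    return h1, aligned


def murmur3_32(data, seed=0):
    """Reference-style tail: trailing bytes form one partial word."""
    h1, aligned = _hash_blocks(data, seed)
    tail = data[aligned:]
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail, "little"))
    return _fmix(h1, len(data))


def murmur3_32_per_byte_tail(data, seed=0):
    """Hypothetical variant: each trailing byte is mixed as a full block."""
    h1, aligned = _hash_blocks(data, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b  # sign-extend, as a Java byte would
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _fmix(h1, len(data))
```

The two functions agree whenever `len(data)` is a multiple of 4, because the tail is then empty; they only diverge on inputs with 1 to 3 trailing bytes.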
## How was this patch tested?
Added a unit test.
**Note: When we merge this PR, please give all the credit to Shintaro
Murakami.**
Author: Shintaro Murakami <mrkm4ntrgmail.com>
Author: gatorsmile <gatorsmile@gmail.com>
Author: Shintaro Murakami <mrkm4ntr@gmail.com>
Closes #20630 from gatorsmile/pr-20568.
(cherry picked from commit d5ed2108d32e1d95b26ee7fed39e8a733e935e2c)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 8360da07110d847a01b243e6d786922a5057ad9f)
The file was modified common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java
The file was modified common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java
The file was modified python/pyspark/ml/feature.py
The file was modified mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala