SuccessChanges

Summary

  1. [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator (details)
  2. [SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.ZstdCompressionCodec throwing IllegalArgumentException Exception (details)
  3. [SPARK-25666][PYTHON] Internally document type conversion between Python data and SQL types in normal UDFs (details)
Commit cb90617f894fd51a092710271823ec7d1cd3a668 by hyukjinkwon
[SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator
## What changes were proposed in this pull request?
If we use accumulators in more than one UDF, it is possible to
overwrite deserialized accumulators and their values. We should check
whether an accumulator was deserialized before overwriting it in the
accumulator registry.
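A minimal sketch of the guard described above (hypothetical helper name; the real logic lives in python/pyspark/accumulators.py):
```python
# Hypothetical sketch of the registration guard: keep the existing
# deserialized accumulator instead of overwriting it, so two UDFs
# sharing an accumulator also share its accumulated value.
_accumulatorRegistry = {}

def register_if_absent(aid, accum):
    # aid: accumulator id; accum: freshly deserialized accumulator.
    if aid not in _accumulatorRegistry:
        _accumulatorRegistry[aid] = accum
    return _accumulatorRegistry[aid]
```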
## How was this patch tested?
Added test.
Closes #22635 from viirya/SPARK-25591.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified python/pyspark/sql/tests.py (diff)
The file was modified python/pyspark/accumulators.py (diff)
Commit 1a6815cd9f421a106f8d96a36a53042a00f02386 by hyukjinkwon
[SPARK-25677][DOC] spark.io.compression.codec =
org.apache.spark.io.ZstdCompressionCodec throwing
IllegalArgumentException Exception
## What changes were proposed in this pull request?
The documentation is updated with the proper class name,
org.apache.spark.io.ZStdCompressionCodec.
## How was this patch tested?
We set spark.io.compression.codec =
org.apache.spark.io.ZStdCompressionCodec and verified the logs.
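For reference, a minimal sketch of setting the codec with the corrected class name (assuming a standard PySpark session; the short alias "zstd" works as well):
```python
# Sketch: configure Zstandard compression via the fully qualified
# class name (note the capital 'S' in ZStdCompressionCodec).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.io.compression.codec",
            "org.apache.spark.io.ZStdCompressionCodec")
    .getOrCreate()
)
print(spark.conf.get("spark.io.compression.codec"))
```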
Closes #22669 from shivusondur/CompressionIssue.
Authored-by: shivusondur <shivusondur@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified docs/configuration.md (diff)
Commit a853a80202032083ad411eec5ec97b304f732a61 by hyukjinkwon
[SPARK-25666][PYTHON] Internally document type conversion between Python
data and SQL types in normal UDFs
### What changes were proposed in this pull request?
We are facing some problems with type conversions between Python data
and SQL types in UDFs (Pandas UDFs as well). It is even difficult to
identify the problems (see https://github.com/apache/spark/pull/20163
and https://github.com/apache/spark/pull/22610).
This PR internally documents the type conversion table. Some of the
conversions look buggy and should be fixed.
```python
import sys
import array
import datetime
from decimal import Decimal

from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import udf

if sys.version >= '3':
    long = int

data = [
    None,
    True,
    1,
    long(1),
    "a",
    u"a",
    datetime.date(1970, 1, 1),
    datetime.datetime(1970, 1, 1, 0, 0),
    1.0,
    array.array("i", [1]),
    [1],
    (1,),
    bytearray([65, 66, 67]),
    Decimal(1),
    {"a": 1},
    Row(kwargs=1),
    Row("namedtuple")(1),
]

types = [
    BooleanType(),
    ByteType(),
    ShortType(),
    IntegerType(),
    LongType(),
    StringType(),
    DateType(),
    TimestampType(),
    FloatType(),
    DoubleType(),
    ArrayType(IntegerType()),
    BinaryType(),
    DecimalType(10, 0),
    MapType(StringType(), IntegerType()),
    StructType([StructField("_1", IntegerType())]),
]

df = spark.range(1)
results = []
count = 0
total = len(types) * len(data)
spark.sparkContext.setLogLevel("FATAL")
for t in types:
    result = []
    for v in data:
        try:
            row = df.select(udf(lambda: v, t)()).first()
            ret_str = repr(row[0])
        except Exception:
            ret_str = "X"
        result.append(ret_str)
        progress = "SQL Type: [%s]\n  Python Value: [%s(%s)]\n  Result Python Value: [%s]" % (
            t.simpleString(), str(v), type(v).__name__, ret_str)
        count += 1
        print("%s/%s:\n  %s" % (count, total, progress))
    results.append([t.simpleString()] + list(map(str, result)))

schema = ["SQL Type \\ Python Value(Type)"] + list(map(lambda v: "%s(%s)" % (str(v), type(v).__name__), data))
strings = spark.createDataFrame(results, schema=schema)._jdf.showString(20, 20, False)
print("\n".join(map(lambda line: "    # %s  # noqa" % line, strings.strip().split("\n"))))
```
This table was generated under Python 2, but the code above is
compatible with Python 3 as well.
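As a single illustration of what the table documents (a sketch assuming an active SparkSession named `spark`): a Python int returned from a UDF declared with StringType comes back as a SQL string.
```python
# One row of the conversion table as a runnable sketch; assumes an
# active SparkSession bound to `spark` (e.g. a PySpark shell).
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

to_str = udf(lambda: 1, StringType())  # Python int, declared as string
print(spark.range(1).select(to_str()).first()[0])  # prints '1'
```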
## How was this patch tested?
Manually tested and lint-checked.
Closes #22655 from HyukjinKwon/SPARK-25666.
Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
The file was modified python/pyspark/sql/functions.py (diff)