1. [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_udf` with keyword args (commit: 328dea6f8ffcd515face7d64c29f7af71abd88a2) (details)
2. [SPARK-23599][SQL][BACKPORT-2.3] Use RandomUUIDGenerator in Uuid (commit: 1c39dfaef09538ad63e4d5d6d9a343c9bfe9f8d3) (details)
Commit 328dea6f8ffcd515face7d64c29f7af71abd88a2 by hyukjinkwon
[SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_udf` with keyword args
## What changes were proposed in this pull request?
Add documentation about the limitations of `pandas_udf` with keyword
arguments and related concepts, like `functools.partial` function objects.
NOTE: intermediate commits on this PR show some of the steps that can be
taken to fix some (but not all) of these pain points.
### Survey of problems we face today:

Note: the setup below uses Python 3.6 and Spark 2.4-SNAPSHOT.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit
import inspect, functools

spark = SparkSession.builder.getOrCreate()
df = spark.range(1, 6).withColumn('b', col('id') * 2)

def ok(a, b):
    return a + b
```
Using a keyword argument at the call site (`b=...`) fails immediately
(traceback excerpt):

```
---> 14 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id', b='id')).show()  # no kwargs

TypeError: wrapper() got an unexpected keyword argument 'b'
```
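The `TypeError` comes from how the returned UDF forwards its arguments. A minimal standalone sketch of the calling convention (an assumption about the wrapper's shape for illustration, not pyspark's exact source): the wrapper accepts only positional `*cols`, so any keyword at the call site is rejected.

```python
import functools

# Hypothetical sketch (assumed shape, not pyspark's actual code): the udf
# wrapper forwards columns positionally only, with no **kwargs.
def make_wrapper(func):
    @functools.wraps(func)
    def wrapper(*cols):        # no **kwargs -> keyword calls raise TypeError
        return func(*cols)
    return wrapper

def ok(a, b):
    return a + b

w = make_wrapper(ok)
print(w(1, 2))                 # positional call works: prints 3
try:
    w(1, b=2)                  # mirrors the b='id' call site above
except TypeError as err:
    print(err)                 # got an unexpected keyword argument 'b'
```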
Using `partial` with a keyword argument where the kw-arg is the first
argument of the function:

*(Aside: kind of interesting that lines 15 and 16 work great and then 17
fails.)*

```
ValueError                                Traceback (most recent call last)
<ipython-input-9-e9f31b8799c1> in <module>()
     15 df.withColumn('ok', pandas_udf(f=functools.partial(ok, 7),
     16 df.withColumn('ok', pandas_udf(f=functools.partial(ok, b=7),
---> 17 df.withColumn('ok', pandas_udf(f=functools.partial(ok, a=7),

/Users/stu/ZZ/spark/python/pyspark/sql/ in pandas_udf(f, returnType, functionType)
   2378         return functools.partial(_create_udf, returnType=return_type, evalType=eval_type)
   2379     else:
-> 2380         return _create_udf(f=f, returnType=return_type,

/Users/stu/ZZ/spark/python/pyspark/sql/ in _create_udf(f, returnType, evalType)
     54                 argspec.varargs is None:
     55             raise ValueError(
---> 56                 "Invalid function: 0-arg pandas_udfs are not supported. "
     57                 "Instead, create a 1-arg pandas_udf and ignore the arg in your function."
     58             )

ValueError: Invalid function: 0-arg pandas_udfs are not supported.
Instead, create a 1-arg pandas_udf and ignore the arg in your function.
```
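The `ValueError` arises because `_create_udf` inspects the function's positional argspec, and binding the *first* argument by keyword leaves the `partial` with zero positionally-fillable parameters. A small pure-Python probe (no Spark needed; `inspect.signature` here is a stand-in for the argspec check, not the exact code):

```python
import functools
import inspect

def ok(a, b):
    return a + b

# Count parameters that can still be filled positionally; this approximates
# the check behind "0-arg pandas_udfs are not supported".
def n_positional(f):
    sig = inspect.signature(f)
    return sum(p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
               for p in sig.parameters.values())

print(n_positional(functools.partial(ok, 7)))    # a bound positionally -> 1 left (b)
print(n_positional(functools.partial(ok, b=7)))  # b keyword-bound -> 1 left (a)
print(n_positional(functools.partial(ok, a=7)))  # a keyword-bound -> 0: b becomes keyword-only
```

The last line is the "line 17" case above: once `a` is bound by keyword, `b` can no longer be passed positionally either, so the UDF machinery sees a 0-arg function.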
Author: Michael (Stu) Stewart <>
Closes #20900 from mstewart141/udfkw2.
(cherry picked from commit 087fb3142028d679524e22596b0ad4f74ff47e8d)
Signed-off-by: hyukjinkwon <>
Signed-off-by: hyukjinkwon <>
(commit: 328dea6f8ffcd515face7d64c29f7af71abd88a2)
The file was modified python/pyspark/sql/ (diff)
Commit 1c39dfaef09538ad63e4d5d6d9a343c9bfe9f8d3 by hvanhovell
[SPARK-23599][SQL][BACKPORT-2.3] Use RandomUUIDGenerator in Uuid
## What changes were proposed in this pull request?
As stated in the Jira ticket, there are problems with the current `Uuid`
expression, which uses `java.util.UUID.randomUUID` for UUID generation.
This patch uses the newly added `RandomUUIDGenerator` for UUID generation
instead, so we can make `Uuid` deterministic between retries.
This backports SPARK-23599 to Spark 2.3.
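For intuition, the property being added is that UUIDs are drawn from a PRNG seeded per query, so a retried task regenerates identical values. A rough Python analogue of a seeded version-4 generator (an illustration of the idea only, not Spark's `RandomUUIDGenerator` implementation):

```python
import random
import uuid

def seeded_uuid4(rng):
    """Version-4 UUID drawn from a caller-supplied, seeded PRNG."""
    b = bytearray(rng.getrandbits(8) for _ in range(16))
    b[6] = (b[6] & 0x0F) | 0x40   # set version nibble to 4
    b[8] = (b[8] & 0x3F) | 0x80   # set RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(b))

first = seeded_uuid4(random.Random(42))
retry = seeded_uuid4(random.Random(42))  # a "retry" reseeds identically
print(first == retry)                    # True: deterministic between retries
print(first.version)                     # 4
```

With `java.util.UUID.randomUUID` there is no seed to replay, which is why retries could observe different values.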
## How was this patch tested?
Added tests.
Author: Liang-Chi Hsieh <>
Closes #20903 from viirya/SPARK-23599-2.3.
(commit: 1c39dfaef09538ad63e4d5d6d9a343c9bfe9f8d3)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolvedUuidExpressionsSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (diff)