Changes

Summary

  1. [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF stop iteration
Commit 470cacd4982ca369ffd294ee37abfa1864d39967 by hyukjinkwon
[SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF stop iteration
wrapping from driver to executor
SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the
user function, but this required a hack to save its argspec. This PR
reverts this change and fixes the `StopIteration` bug in the worker.
The root of the problem is that when a user-supplied function raises a
`StopIteration`, pyspark might silently stop processing data if this
function is used in a for-loop, because the loop treats the exception as
the end of the input. The solution is to catch `StopIteration` exceptions
and re-raise them as `RuntimeError`s, so that the execution fails and
the error is reported to the user. This is done using the
`fail_on_stopiteration` wrapper, in different ways depending on where
the function is used:
- In RDDs, the user function is wrapped in the driver, because this
function is also called in the driver itself.
- In SQL UDFs, the function is wrapped in the worker, since all
processing happens there. Moreover, the worker needs the signature of
the user function, which is lost when the function is wrapped in the
driver, and passing this signature to the worker separately requires a
not-so-nice hack.
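
The behaviour described above can be reproduced outside Spark. The following is a minimal sketch, not the actual code in python/pyspark/util.py: the body of `fail_on_stopiteration` and the error message are assumptions, and `user_func` is a hypothetical buggy user function. It shows the silent truncation, the wrapper turning it into a failure, and the loss of the argspec on the wrapped function.

```python
import inspect


def fail_on_stopiteration(f):
    """Sketch of a wrapper that re-raises StopIteration as RuntimeError.

    Assumption: this mirrors the idea of pyspark's wrapper, not its exact code.
    """
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            # Fail loudly instead of letting the caller's loop end early.
            raise RuntimeError("StopIteration raised by user function") from exc
    return wrapper


def user_func(x):
    # Hypothetical user function that wrongly raises StopIteration.
    if x == 3:
        raise StopIteration()
    return x * 10


# Unwrapped: the StopIteration escapes from map()'s __next__, so the consumer
# believes the input is exhausted and the remaining elements are dropped.
silently_truncated = list(map(user_func, [1, 2, 3, 4, 5]))

# Wrapped: the same mistake now surfaces as an error.
try:
    list(map(fail_on_stopiteration(user_func), [1, 2, 3, 4, 5]))
    failed = False
except RuntimeError:
    failed = True

# The wrapper only exposes (*args, **kwargs), so the original argspec is lost.
original_args = inspect.getfullargspec(user_func).args
wrapped_args = inspect.getfullargspec(fail_on_stopiteration(user_func)).args
```

Here `silently_truncated` contains only the results for the first two elements, `failed` is `True`, and `wrapped_args` is empty while `original_args` is `['x']`, which is why wrapping in the worker avoids having to ship the signature separately.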
HyukjinKwon
Author: edorigatti <emilio.dorigatti@gmail.com>
Author: e-dorigatti <emilio.dorigatti@gmail.com>
Closes #21538 from e-dorigatti/branch-2.3.
The file was modified: python/pyspark/sql/tests.py
The file was modified: python/pyspark/tests.py
The file was modified: python/pyspark/sql/udf.py
The file was modified: python/pyspark/worker.py
The file was modified: python/pyspark/util.py