Changes (Success)

Summary

  1. [SPARK-17952][SQL] Nested Java beans support in createDataFrame (details)
Commit 434ada12a06d1d2d3cb19c4eac5a52f330bb236c by ueshin
[SPARK-17952][SQL] Nested Java beans support in createDataFrame
## What changes were proposed in this pull request?
When constructing a DataFrame from a Java bean, using nested beans
throws an error despite
[documentation](http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection)
stating otherwise. This PR aims to add that support.
This PR does not yet add nested bean support in array or List fields; that can be added later or in a follow-up PR.
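For concreteness, a minimal sketch of the case that remains out of scope. The `CategoryWithSubs` class here is hypothetical (it is not part of this PR) and reuses the `SubCategory` bean defined in the shell session below:
```
import scala.beans.BeanProperty

// Hypothetical bean with a List-of-beans field. After this PR the outer
// bean converts, but the SubCategory elements inside `subCategories`
// are still not inferred as nested structs.
class CategoryWithSubs(
    @BeanProperty var id: String,
    @BeanProperty var subCategories: java.util.List[SubCategory])
  extends Serializable
```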
## How was this patch tested?
A nested bean was added to the appropriate unit test. The change was also tested manually in the Spark shell, on code emulating the referenced JIRA:
```
scala> import scala.beans.BeanProperty
import scala.beans.BeanProperty

scala> class SubCategory(@BeanProperty var id: String, @BeanProperty var name: String) extends Serializable
defined class SubCategory

scala> class Category(@BeanProperty var id: String, @BeanProperty var subCategory: SubCategory) extends Serializable
defined class Category

scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._

scala> spark.createDataFrame(Seq(new Category("s-111", new SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
java.lang.IllegalArgumentException: The value (SubCategory65130cf2) of the type (SubCategory) cannot be converted to struct<id:string,name:string>
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:262)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:396)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1108)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$class.toStream(Iterator.scala:1320)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1334)
  at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1334)
  at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:423)
  ... 51 elided
```
New behavior:
```
scala> spark.createDataFrame(Seq(new Category("s-111", new SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
res0: org.apache.spark.sql.DataFrame = [id: string, subCategory: struct<id: string, name: string>]

scala> res0.show()
+-----+---------------+
|   id|    subCategory|
+-----+---------------+
|s-111|[sc-111, Sub-1]|
+-----+---------------+
```
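As a quick follow-up sketch (not part of the original test; just the standard Dataset column-path API), the fields of the nested struct can be selected with dot notation:
```
// Select a field of the nested struct column by its dot-separated path;
// given the row above, this should show a single `name` column with "Sub-1".
res0.select("subCategory.name").show()
```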
Closes #22527 from michalsenkyr/SPARK-17952.
Authored-by: Michal Senkyr <mike.senkyr@gmail.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala (diff)