Changes (build status: Success)

Summary

  1. [SPARK-30380][ML] Refactor RandomForest.findSplits
  2. [SPARK-30321][ML] Log weightSum in Algo that has weights support
  3. [SPARK-30329][ML] add iterator/foreach methods for Vectors
Commit 0b561a7f4686dfb2194cdcfe3efc407f9e2b039a by ruifengz
[SPARK-30380][ML] Refactor RandomForest.findSplits
### What changes were proposed in this pull request?
Refactor `RandomForest.findSplits` to use `aggregateByKey` instead of
`groupByKey`.
### Why are the changes needed?
The current implementation of `RandomForest.findSplits` uses `groupByKey`
to collect all non-zero values for each feature. This is risky because
every non-zero value of a feature must be held in memory at once. After
looking into the split-finding logic that follows, I found that collecting
all non-zero values is unnecessary: only the weight sums of the distinct
values are needed.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing test suites.
Closes #27040 from zhengruifeng/rf_opt.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
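The idea behind the refactor can be sketched without Spark. A `groupByKey` approach keeps every non-zero value per feature. Here, each partition instead keeps only a map from each distinct value to its summed weight, and the partial maps are then merged. This is a minimal sketch with plain Scala collections, not the actual `RandomForest.findSplits` code. `SplitStatsSketch`, `seqOp`, `combOp` and the `(featureIndex, (value, weight))` record layout are hypothetical. On a real `RDD[(Int, (Double, Double))]`, the same two functions could be passed as `rdd.aggregateByKey(mutable.HashMap.empty[Double, Double])(seqOp, combOp)`.

```scala
import scala.collection.mutable

// Hypothetical sketch (not Spark's RandomForest code): emulates, with local
// collections, what aggregateByKey does on an RDD keyed by feature index.
object SplitStatsSketch {
  // Per-feature accumulator: distinct non-zero value -> sum of instance weights.
  type Acc = mutable.HashMap[Double, Double]

  // seqOp: fold one (value, weight) pair into a partition-local accumulator.
  def seqOp(acc: Acc, vw: (Double, Double)): Acc = {
    val (v, w) = vw
    acc.update(v, acc.getOrElse(v, 0.0) + w)
    acc
  }

  // combOp: merge two partition-local accumulators for the same feature.
  def combOp(a: Acc, b: Acc): Acc = {
    b.foreach { case (v, w) => a.update(v, a.getOrElse(v, 0.0) + w) }
    a
  }

  // partitions: each inner Seq holds (featureIndex, (value, weight)) records.
  def aggregate(partitions: Seq[Seq[(Int, (Double, Double))]]): Map[Int, Map[Double, Double]] = {
    // Map side: one accumulator per feature within each partition.
    val perPartition = partitions.map { part =>
      val local = mutable.HashMap.empty[Int, Acc]
      part.foreach { case (f, vw) =>
        seqOp(local.getOrElseUpdate(f, mutable.HashMap.empty[Double, Double]), vw)
      }
      local
    }
    // Reduce side: merge the per-partition accumulators feature by feature.
    val merged = mutable.HashMap.empty[Int, Acc]
    perPartition.foreach { local =>
      local.foreach { case (f, acc) =>
        merged.update(f, merged.get(f).map(combOp(_, acc)).getOrElse(acc))
      }
    }
    merged.iterator.map { case (f, acc) => f -> acc.toMap }.toMap
  }
}
```

With this layout, the state kept per feature grows with the number of distinct values rather than the number of non-zero entries. That is the memory saving the commit message describes.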
Commit 694da0382e31cb06b7138225fea791efd547f2ca by ruifengz
[SPARK-30321][ML] Log weightSum in Algo that has weights support
### What changes were proposed in this pull request?
Add `instr.logSumOfWeights` to the algorithms that support `weightCol`.
### Why are the changes needed?
Many algorithms now support `weightCol`, and I think the sum of weights
is useful information to add to the log.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually tested.
Closes #26972 from huaxingao/spark-30321.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
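As a rough illustration of the quantity being logged: with a weight column, the logged value is the sum of the per-instance weights. When every weight is 1.0, that sum equals the instance count. This is a minimal sketch that computes both in a single pass. `WeightSumSketch`, `WeightedPoint` and the message format are hypothetical. This is not Spark's `Instrumentation` API, and its exact log text is not reproduced here.

```scala
// Hypothetical sketch: count instances and sum their weights in one pass,
// then format a log line. Not Spark's Instance class or Instrumentation output.
object WeightSumSketch {
  // Minimal weighted instance; features are omitted because they do not
  // affect the weight sum.
  final case class WeightedPoint(label: Double, weight: Double)

  // Returns (instance count, sum of weights) from a single traversal.
  def countAndWeightSum(points: Iterator[WeightedPoint]): (Long, Double) =
    points.foldLeft((0L, 0.0)) { case ((n, ws), p) =>
      require(p.weight >= 0.0, s"weight must be non-negative, got ${p.weight}")
      (n + 1L, ws + p.weight)
    }

  // Hypothetical log message format.
  def logMessage(points: Seq[WeightedPoint]): String = {
    val (n, ws) = countAndWeightSum(points.iterator)
    s"numInstances=$n sumOfWeights=$ws"
  }
}
```

Logging both numbers side by side shows at a glance whether a weight column actually changed the effective size of the training data.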
Commit 23a49aff27075cbed3bf507e0d7cc42373aed5cf by ruifengz
[SPARK-30329][ML] add iterator/foreach methods for Vectors
### What changes were proposed in this pull request?
1. Add new foreach-like methods: `foreach`/`foreachNonZero`.
2. Add iterators: `iterator`/`activeIterator`/`nonZeroIterator`.
### Why are the changes needed?
See the [ticket](https://issues.apache.org/jira/browse/SPARK-30329) for details.
`foreach`/`foreachNonZero`: for both convenience and performance
(`SparseVector.foreach` should be faster than the current traversal method).
`iterator`/`activeIterator`/`nonZeroIterator`: these three iterators let us
later add or change implementations based on them on both the ml and mllib
sides, avoiding vector conversions.
### Does this PR introduce any user-facing change?
Yes, new methods are added.
### How was this patch tested?
Added test suites.
Closes #26982 from zhengruifeng/vector_iter.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
The file was modified: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
The file was modified: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
The file was modified: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/feature/RobustScaler.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
The file was modified: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/feature/Binarizer.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HingeAggregator.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala
The file was modified: project/MimaExcludes.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/FMRegressor.scala
The file was modified: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
The file was modified: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
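To make the difference between the traversal flavours concrete, here is a standalone sparse vector with methods named after the ones described in this commit:

- "all" visits every position, filling in implicit zeros.
- "active" visits only the explicitly stored entries, which may include stored zeros.
- "non-zero" skips every zero.

This is a minimal sketch assuming strictly increasing indices within `[0, size)`. `SimpleSparseVector` is hypothetical, and its signatures are assumptions rather than the code of `org.apache.spark.ml.linalg.SparseVector`.

```scala
// Hypothetical standalone sparse vector; not Spark's ml.linalg.SparseVector.
// Assumes `indices` is strictly increasing and every index is in [0, size).
final class SimpleSparseVector(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")

  // All positions 0 until size, passing 0.0 for positions that are not stored.
  def foreach(f: (Int, Double) => Unit): Unit = {
    var k = 0
    var i = 0
    while (i < size) {
      if (k < indices.length && indices(k) == i) { f(i, values(k)); k += 1 }
      else f(i, 0.0)
      i += 1
    }
  }

  // Only the explicitly stored (active) entries, including stored zeros.
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) { f(indices(k), values(k)); k += 1 }
  }

  // Only entries whose value is non-zero.
  def foreachNonZero(f: (Int, Double) => Unit): Unit =
    foreachActive { (i, v) => if (v != 0.0) f(i, v) }

  // Iterator over all (index, value) pairs, filling in implicit zeros.
  def iterator: Iterator[(Int, Double)] = {
    val n = size // captured here because Iterator also defines a `size` member
    new Iterator[(Int, Double)] {
      private var pos = 0 // next dense position to emit
      private var k = 0   // next stored entry to consider
      def hasNext: Boolean = pos < n
      def next(): (Int, Double) = {
        if (!hasNext) throw new NoSuchElementException("end of vector")
        val v =
          if (k < indices.length && indices(k) == pos) { val x = values(k); k += 1; x }
          else 0.0
        pos += 1
        (pos - 1, v)
      }
    }
  }

  // Iterator over the stored entries only.
  def activeIterator: Iterator[(Int, Double)] = indices.iterator.zip(values.iterator)

  // Iterator over the non-zero entries only.
  def nonZeroIterator: Iterator[(Int, Double)] = activeIterator.filter { case (_, v) => v != 0.0 }
}
```

The `foreach` methods here use plain `while` loops and pass the index and value straight to the callback, so no tuple is allocated per element. That is one plausible reason a dedicated `foreach` could outperform an iterator-based traversal. The iterators trade that speed for composability with the standard collection operations.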