Build #144463

Environment variables

Name	Value
ANDROID_HOME	/home/android-sdk/
AWS_ACCESS_KEY_ID	[*******]
AWS_SECRET_ACCESS_KEY	[*******]
BUILD_CAUSE	GHPRBCAUSE
BUILD_CAUSE_GHPRBCAUSE	true
BUILD_DISPLAY_NAME	#144463
BUILD_ID	144463
BUILD_NUMBER	144463
BUILD_TAG	jenkins-SparkPullRequestBuilder-144463
BUILD_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144463/
CLASSPATH	$CLASSPATH
DBUS_SESSION_BUS_ADDRESS	unix:path=/run/user/1001/bus
EXECUTOR_NUMBER	3
GITHUB_OAUTH_KEY	[*******]
GIT_BRANCH	SPARK-36579
GIT_COMMIT	aa95a22dc54a909da8e96dbaaa14ce8911a765c3
GIT_PREVIOUS_COMMIT	36e4a089a2f352542f2405d6f564508302d13d0a
GIT_PREVIOUS_SUCCESSFUL_COMMIT	8fd104806eac482d62d46defc29201267f166c45
GIT_URL	https://github.com/apache/spark.git
HOME	/home/jenkins
HUDSON_HOME	/var/lib/jenkins
HUDSON_SERVER_COOKIE	472906e9832aeb79
HUDSON_URL	https://amplab.cs.berkeley.edu/jenkins/
JAVA_HOME	/usr/java/latest
JENKINS_HOME	/var/lib/jenkins
JENKINS_SERVER_COOKIE	472906e9832aeb79
JENKINS_URL	https://amplab.cs.berkeley.edu/jenkins/
JOB_BASE_NAME	SparkPullRequestBuilder
JOB_NAME	SparkPullRequestBuilder
JOB_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
LANG	en_US.UTF-8
LOGNAME	jenkins
MOTD_SHOWN	pam
NODE_LABELS	research-jenkins-worker-04 ubuntu ubuntu20
NODE_NAME	research-jenkins-worker-04
OLDPWD	/home/jenkins
PATH	/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/:/home/jenkins/gems/bin:/usr/local/go/bin:/home/jenkins/go-projects/bin:/home/jenkins/anaconda2/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/:/home/jenkins/gems/bin:/usr/local/go/bin:/home/jenkins/go-projects/bin:/home/jenkins/anaconda2/bin:$PATH
PWD	/home/jenkins
ROOT_BUILD_CAUSE	GHPRBCAUSE
ROOT_BUILD_CAUSE_GHPRBCAUSE	true
RUN_ARTIFACTS_DISPLAY_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144463/display/redirect?page=artifacts
RUN_CHANGES_DISPLAY_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144463/display/redirect?page=changes
RUN_DISPLAY_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144463/display/redirect
RUN_TESTS_DISPLAY_URL	https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144463/display/redirect?page=tests
SHELL	/bin/bash
SHLVL	0
SPARK_TEST_KEY	[*******]
SSH_CLIENT	192.168.10.11 39236 22
SSH_CONNECTION	192.168.10.11 39236 192.168.10.24 22
USER	jenkins
WORKSPACE	/home/jenkins/workspace/SparkPullRequestBuilder
XDG_RUNTIME_DIR	/run/user/1001
XDG_SESSION_CLASS	user
XDG_SESSION_ID	1
XDG_SESSION_TYPE	tty
_	/usr/java/latest/bin/java
ghprbActualCommit	9e4a5a89d08188836233028594f1aafde50322f8
ghprbActualCommitAuthor	Angerszhuuuu
ghprbActualCommitAuthorEmail	angers.zhu@gmail.com
ghprbAuthorRepoGitUrl	https://github.com/AngersZhuuuu/spark.git
ghprbCommentBody	null
ghprbCredentialsId	b7d94526-9e9b-435f-9275-d7dbf209f4a3
ghprbGhRepository	apache/spark
ghprbPullAuthorEmail	angers.zhu@gmail.com
ghprbPullAuthorLogin	AngersZhuuuu
ghprbPullAuthorLoginMention	@AngersZhuuuu
ghprbPullDescription	GitHub pull request #33828 of commit 9e4a5a89d08188836233028594f1aafde50322f8, no merge conflicts.
ghprbPullId	33828
ghprbPullLink	https://github.com/apache/spark/pull/33828
ghprbPullLongDescription	### What changes were proposed in this pull request?\r\nConsider such cases:\r\n\r\n1. we close a job when it is doing dynamic partition insert, it will remain such staging dir under table's path.  So we make the staging dir customized like hive can avoid remain such staging dir under table path.\r\n2. In hive's API, if we specify a staging dir, not use default staging dir (under table path), it can directly rename to target path and can avoid many hdfs file operations. In spark currently only dynamic partition insert support staging dir, we can do this like https://github.com/apache/spark/pull/33811\r\n3. We can support add a file commit protocol that support staging dir for all types of insert, then when we use that commit protocol, wen can do:\r\n    - Insert into non-partitioned table form it self\r\n    - Insert into partition table's statistic partition and read data from target partition\r\n    - Insert into different partition using statistic partition together\r\n\r\n### Why are the changes needed?\r\nMake spark data source insert's  stagingDir can be customized and then we can do more optimize base on this.\r\n\r\n\r\n### Does this PR introduce _any_ user-facing change?\r\nUser can define staging dir by `spark.exec.stagingDir`\r\n\r\n### How was this patch tested?\r\nAdded UT\r\n
ghprbPullTitle	[SPARK-36579][CORE][SQL] Make spark source stagingDir can be customized
ghprbSourceBranch	SPARK-36579
ghprbTargetBranch	master
ghprbTriggerAuthor	
ghprbTriggerAuthorEmail	angers.zhu@gmail.com
ghprbTriggerAuthorLogin	AngersZhuuuu
ghprbTriggerAuthorLoginMention	@AngersZhuuuu
sha1	origin/pr/33828/merge