Console Output

[EnvInject] - Mask passwords passed as build parameters.
GitHub pull request #6465 of commit 7492b311c2e8f74c82317d2e5dad5a9847e3fed2 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-staging-worker-02 (ubuntu ubuntu-gpu ubuntu-avx2 staging-02 staging) in workspace /home/jenkins/workspace/Tune-Tests
[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Done
Cloning the remote Git repository
Cloning repository https://github.com/ray-project/ray.git
 > git init /home/jenkins/workspace/Tune-Tests # timeout=10
Fetching upstream changes from https://github.com/ray-project/ray.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/ray-project/ray.git +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/ray-project/ray.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/ray-project/ray.git # timeout=10
Fetching upstream changes from https://github.com/ray-project/ray.git
 > git fetch --tags --progress https://github.com/ray-project/ray.git +refs/pull/*:refs/remotes/origin/pr/*
 > git rev-parse origin/pr/6465/merge^{commit} # timeout=10
 > git branch -a -v --no-abbrev --contains 5c64644f547d67af4fd6488947314bd06ea4317f # timeout=10
Checking out Revision 5c64644f547d67af4fd6488947314bd06ea4317f (origin/pr/6465/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5c64644f547d67af4fd6488947314bd06ea4317f
First time build. Skipping changelog.
[Tune-Tests] $ /bin/sh -xe /tmp/hudson2937175834182542997.sh
+ ./ci/jenkins_tests/run_tune_tests.sh
+ MEMORY_SIZE=
+ SHM_SIZE=
+ DOCKER_SHA=
+++ dirname ./ci/jenkins_tests/run_tune_tests.sh
++ cd ./ci/jenkins_tests
++ pwd
+ ROOT_DIR=/home/jenkins/workspace/Tune-Tests/ci/jenkins_tests
+ SUPPRESS_OUTPUT=/home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output
+ '[' '' == '' ']'
+ MEMORY_SIZE=20G
+ '[' '' == '' ']'
+ SHM_SIZE=20G
+ '[' '' == '' ']'
+ echo 'Building application docker.'
Building application docker.
+ docker build -q --no-cache -t ray-project/base-deps docker/base-deps
sha256:582bd3a95fa3987465e32fc8ba42e5f2ff01dc488affbeed5a2df4f7295aac49
+ git rev-parse HEAD
++ git rev-parse HEAD
+ git archive -o ./docker/tune_test/ray.tar 5c64644f547d67af4fd6488947314bd06ea4317f
++ docker build --no-cache -q -t ray-project/tune_test docker/tune_test
+ DOCKER_SHA=sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769
+ echo 'Using Docker image' sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769
Using Docker image sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 pytest /ray/python/ray/tune/tests/test_cluster.py
This command has been running for more than 5 minutes...

real	5m16.537s
user	0m0.076s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 pytest /ray/python/ray/tune/tests/test_actor_reuse.py

real	0m48.470s
user	0m0.048s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 pytest /ray/python/ray/tune/tests/test_tune_restore.py
This command has been running for more than 5 minutes...

real	6m36.026s
user	0m0.048s
sys	0m0.028s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/tests/example.py

real	1m33.169s
user	0m0.048s
sys	0m0.008s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 bash -c 'pip install -U tensorflow && python /ray/python/ray/tune/tests/test_logger.py'

real	0m25.766s
user	0m0.048s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 bash -c 'pip install -U tensorflow==1.15 && python /ray/python/ray/tune/tests/test_logger.py'

real	1m1.631s
user	0m0.040s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 bash -c 'pip install -U tensorflow==1.14 && python /ray/python/ray/tune/tests/test_logger.py'

real	0m58.664s
user	0m0.040s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 bash -c 'pip install -U tensorflow==1.12 && python /ray/python/ray/tune/tests/test_logger.py'

real	0m42.357s
user	0m0.040s
sys	0m0.044s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G -e MPLBACKEND=Agg sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/tests/tutorial.py
This command has been running for more than 5 minutes...

real	8m47.389s
user	0m0.076s
sys	0m0.032s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/pbt_example.py --smoke-test

real	0m25.153s
user	0m0.036s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/hyperband_example.py --smoke-test

real	0m18.313s
user	0m0.036s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/async_hyperband_example.py --smoke-test

real	0m18.476s
user	0m0.040s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/tf_mnist_example.py --smoke-test

real	0m30.040s
user	0m0.040s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/lightgbm_example.py

real	0m58.596s
user	0m0.036s
sys	0m0.028s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/xgboost_example.py

real	0m19.008s
user	0m0.048s
sys	0m0.008s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/logging_example.py --smoke-test

real	0m12.415s
user	0m0.032s
sys	0m0.032s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/mlflow_example.py

real	0m20.623s
user	0m0.040s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/bayesopt_example.py --smoke-test

real	0m28.615s
user	0m0.040s
sys	0m0.024s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/hyperopt_example.py --smoke-test

real	0m18.461s
user	0m0.048s
sys	0m0.016s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G -e SIGOPT_KEY sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/sigopt_example.py --smoke-test

real	0m42.328s
user	0m0.052s
sys	0m0.032s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/nevergrad_example.py --smoke-test

real	0m19.522s
user	0m0.028s
sys	0m0.028s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/tune_mnist_keras.py --smoke-test

real	0m21.956s
user	0m0.048s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/mnist_pytorch.py --smoke-test

real	0m44.963s
user	0m0.036s
sys	0m0.044s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/mnist_pytorch_trainable.py --smoke-test

real	0m35.899s
user	0m0.040s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/genetic_example.py --smoke-test

real	0m18.208s
user	0m0.044s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/skopt_example.py --smoke-test

real	0m31.954s
user	0m0.040s
sys	0m0.028s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/pbt_memnn_example.py --smoke-test

real	1m24.053s
user	0m0.036s
sys	0m0.032s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/pbt_convnet_example.py --smoke-test

real	1m11.642s
user	0m0.052s
sys	0m0.020s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python /ray/python/ray/tune/examples/pbt_dcgan_mnist/pbt_dcgan_mnist.py --smoke-test

real	4m28.687s
user	0m0.040s
sys	0m0.032s
+ /home/jenkins/workspace/Tune-Tests/ci/jenkins_tests/../suppress_output docker run --rm --shm-size=20G --memory=20G sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 python -m pytest /ray/python/ray/experimental/sgd/tests

real	2m47.187s
user	0m0.048s
sys	0m0.024s
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
============================= test session starts ==============================
platform linux -- Python 3.6.8, pytest-5.3.2, py-1.5.3, pluggy-0.13.1
rootdir: /ray/python
plugins: remotedata-0.3.2, timeout-1.3.4, doctestplus-0.1.3, openfiles-0.3.0, arraydiff-0.2
collected 23 items

python/ray/experimental/sgd/tests/test_pytorch.py .........(pid=1397) /opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
(pid=1397)   out=out, **kwargs)
(pid=1397) /opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
(pid=1397)   ret = ret.dtype.type(ret / rcount)
F(pid=1507) /opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
(pid=1507)   out=out, **kwargs)
(pid=1507) /opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
(pid=1507)   ret = ret.dtype.type(ret / rcount)
F            [ 47%]
python/ray/experimental/sgd/tests/test_pytorch_runner.py ......          [ 73%]
python/ray/experimental/sgd/tests/test_tensorflow.py ......              [100%]

=================================== FAILURES ===================================
_________________________________ test_resize __________________________________

ray_start_2_cpus = {'node_ip_address': '172.17.0.4', 'object_store_address': '/tmp/ray/session_2020-01-16_07-24-11_277740_1/sockets/plasm...socket_name': '/tmp/ray/session_2020-01-16_07-24-11_277740_1/sockets/raylet', 'redis_address': '172.17.0.4:41272', ...}

    def test_resize(ray_start_2_cpus):
        if not dist.is_available():
            return
    
        def step_with_fail(self):
            worker_stats = [w.step.remote() for w in self.workers]
            if self._num_failures < 1:
                self.workers[0].__ray_kill__()
            success = check_for_failure(worker_stats)
            return success, worker_stats
    
        with patch.object(PyTorchTrainer, "_train_step", step_with_fail):
            trainer1 = PyTorchTrainer(
                model_creator,
                data_creator,
                optimizer_creator,
                loss_creator=lambda config: nn.MSELoss(),
                num_replicas=2)
    
            @ray.remote
            def try_test():
                import time
                time.sleep(100)
    
            try_test.remote()
>           trainer1.train(max_retries=1)

python/ray/experimental/sgd/tests/test_pytorch.py:221: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/pytorch_trainer.py:197: in train
    success, worker_stats = self._train_step()
python/ray/experimental/sgd/tests/test_pytorch.py:204: in step_with_fail
    success = check_for_failure(worker_stats)
/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/utils.py:147: in check_for_failure
    finished = ray.get(finished)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

object_ids = [ObjectID(ceb264be3405891f6dabbe8e010000c801000000)]
timeout = None

    def get(object_ids, timeout=None):
        """Get a remote object or a list of remote objects from the object store.
    
        This method blocks until the object corresponding to the object ID is
        available in the local object store. If this object is not in the local
        object store, it will be shipped from an object store that has it (once the
        object has been created). If object_ids is a list, then the objects
        corresponding to each object in the list will be returned.
    
        Args:
            object_ids: Object ID of the object to get or a list of object IDs to
                get.
            timeout (Optional[float]): The maximum amount of time in seconds to
                wait before returning.
    
        Returns:
            A Python object or a list of Python objects.
    
        Raises:
            RayTimeoutError: A RayTimeoutError is raised if a timeout is set and
                the get takes longer than timeout to return.
            Exception: An exception is raised if the task that created the object
                or that created one of the objects raised an exception.
        """
        worker = global_worker
        worker.check_connected()
    
        if hasattr(
                worker,
                "core_worker") and worker.core_worker.current_actor_is_asyncio():
            raise RayError("Using blocking ray.get inside async actor. "
                           "This blocks the event loop. Please "
                           "use `await` on object id with asyncio.gather.")
    
        with profiling.profile("ray.get"):
            is_individual_id = isinstance(object_ids, ray.ObjectID)
            if is_individual_id:
                object_ids = [object_ids]
    
            if not isinstance(object_ids, list):
                raise ValueError("'object_ids' must either be an object ID "
                                 "or a list of object IDs.")
    
            global last_task_error_raise_time
            # TODO(ujvl): Consider how to allow user to retrieve the ready objects.
            values = worker.get_objects(object_ids, timeout=timeout)
            for i, value in enumerate(values):
                if isinstance(value, RayError):
                    last_task_error_raise_time = time.time()
                    if isinstance(value, ray.exceptions.UnreconstructableError):
                        worker.core_worker.dump_object_store_memory_usage()
                    if isinstance(value, RayTaskError):
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(RuntimeError): ray_worker (pid=1397, ip=172.17.0.4)
E                         File "python/ray/_raylet.pyx", line 638, in ray._raylet.execute_task
E                           with core_worker.profile_event(b"task:execute"):
E                         File "python/ray/_raylet.pyx", line 643, in ray._raylet.execute_task
E                           outputs = next(outputs)
E                         File "python/ray/_raylet.pyx", line 623, in function_executor
E                           yield function(actor, *arguments, **kwarguments)
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/distributed_pytorch_runner.py", line 87, in step
E                           return super(DistributedPyTorchRunner, self).step()
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/pytorch_runner.py", line 117, in step
E                           self.given_optimizers, self.config)
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/utils.py", line 27, in train
E                           for i, (features, target) in enumerate(train_iterator):
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 277, in __iter__
E                           return _SingleProcessDataLoaderIter(self)
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 376, in __init__
E                           super(_SingleProcessDataLoaderIter, self).__init__(loader)
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 332, in __init__
E                           self._base_seed = torch.empty((), dtype=torch.int64).random_().item()
E                       RuntimeError: FunctionParameter(): invalid type string: IntArrayRef

/opt/conda/lib/python3.6/site-packages/ray/worker.py:1492: RayTaskError(RuntimeError)
---------------------------- Captured stderr setup -----------------------------
2020-01-16 07:24:11,277	WARNING worker.py:682 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-01-16 07:24:11,278	WARNING services.py:592 -- setpgrp failed, processes may not be cleaned up properly: [Errno 1] Operation not permitted.
2020-01-16 07:24:11,279	INFO resource_spec.py:212 -- Starting Ray with 287.26 GiB memory available for workers and up to 0.15 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-01-16 07:24:11,613	WARNING services.py:1080 -- Failed to start the dashboard. The dashboard requires Python 3 as well as 'pip install aiohttp psutil setproctitle grpcio'.
----------------------------- Captured stderr call -----------------------------
2020-01-16 07:24:11,786	INFO pytorch_trainer.py:92 -- Using gloo as backend.
2020-01-16 07:24:11,786	INFO pytorch_trainer.py:103 -- start_workers: Setting 2 replicas.
--------------------------- Captured stderr teardown ---------------------------
2020-01-16 07:24:14,537	WARNING worker.py:1063 -- A worker died or was killed while executing task ffffffffffffffff6dabbe8e0100.
_______________________________ test_fail_thrice _______________________________

ray_start_2_cpus = {'node_ip_address': '172.17.0.4', 'object_store_address': '/tmp/ray/session_2020-01-16_07-24-16_581646_1/sockets/plasm...socket_name': '/tmp/ray/session_2020-01-16_07-24-16_581646_1/sockets/raylet', 'redis_address': '172.17.0.4:32789', ...}

    def test_fail_thrice(ray_start_2_cpus):
        if not dist.is_available():
            return
    
        def step_with_fail(self):
            worker_stats = [w.step.remote() for w in self.workers]
            if self._num_failures < 2:
                self.workers[0].__ray_kill__()
            success = check_for_failure(worker_stats)
            return success, worker_stats
    
        with patch.object(PyTorchTrainer, "_train_step", step_with_fail):
            trainer1 = PyTorchTrainer(
                model_creator,
                data_creator,
                optimizer_creator,
                loss_creator=lambda config: nn.MSELoss(),
                num_replicas=2)
    
>           trainer1.train(max_retries=2)

python/ray/experimental/sgd/tests/test_pytorch.py:244: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/pytorch_trainer.py:197: in train
    success, worker_stats = self._train_step()
python/ray/experimental/sgd/tests/test_pytorch.py:233: in step_with_fail
    success = check_for_failure(worker_stats)
/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/utils.py:147: in check_for_failure
    finished = ray.get(finished)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

object_ids = [ObjectID(d5bc30934c636f1270cbb26f010000c801000000)]
timeout = None

    def get(object_ids, timeout=None):
        """Get a remote object or a list of remote objects from the object store.
    
        This method blocks until the object corresponding to the object ID is
        available in the local object store. If this object is not in the local
        object store, it will be shipped from an object store that has it (once the
        object has been created). If object_ids is a list, then the objects
        corresponding to each object in the list will be returned.
    
        Args:
            object_ids: Object ID of the object to get or a list of object IDs to
                get.
            timeout (Optional[float]): The maximum amount of time in seconds to
                wait before returning.
    
        Returns:
            A Python object or a list of Python objects.
    
        Raises:
            RayTimeoutError: A RayTimeoutError is raised if a timeout is set and
                the get takes longer than timeout to return.
            Exception: An exception is raised if the task that created the object
                or that created one of the objects raised an exception.
        """
        worker = global_worker
        worker.check_connected()
    
        if hasattr(
                worker,
                "core_worker") and worker.core_worker.current_actor_is_asyncio():
            raise RayError("Using blocking ray.get inside async actor. "
                           "This blocks the event loop. Please "
                           "use `await` on object id with asyncio.gather.")
    
        with profiling.profile("ray.get"):
            is_individual_id = isinstance(object_ids, ray.ObjectID)
            if is_individual_id:
                object_ids = [object_ids]
    
            if not isinstance(object_ids, list):
                raise ValueError("'object_ids' must either be an object ID "
                                 "or a list of object IDs.")
    
            global last_task_error_raise_time
            # TODO(ujvl): Consider how to allow user to retrieve the ready objects.
            values = worker.get_objects(object_ids, timeout=timeout)
            for i, value in enumerate(values):
                if isinstance(value, RayError):
                    last_task_error_raise_time = time.time()
                    if isinstance(value, ray.exceptions.UnreconstructableError):
                        worker.core_worker.dump_object_store_memory_usage()
                    if isinstance(value, RayTaskError):
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(RuntimeError): ray_worker (pid=1507, ip=172.17.0.4)
E                         File "python/ray/_raylet.pyx", line 638, in ray._raylet.execute_task
E                           with core_worker.profile_event(b"task:execute"):
E                         File "python/ray/_raylet.pyx", line 643, in ray._raylet.execute_task
E                           outputs = next(outputs)
E                         File "python/ray/_raylet.pyx", line 623, in function_executor
E                           yield function(actor, *arguments, **kwarguments)
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/distributed_pytorch_runner.py", line 87, in step
E                           return super(DistributedPyTorchRunner, self).step()
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/pytorch_runner.py", line 117, in step
E                           self.given_optimizers, self.config)
E                         File "/opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/utils.py", line 27, in train
E                           for i, (features, target) in enumerate(train_iterator):
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 277, in __iter__
E                           return _SingleProcessDataLoaderIter(self)
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 376, in __init__
E                           super(_SingleProcessDataLoaderIter, self).__init__(loader)
E                         File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 332, in __init__
E                           self._base_seed = torch.empty((), dtype=torch.int64).random_().item()
E                       RuntimeError: FunctionParameter(): invalid type string: IntArrayRef

/opt/conda/lib/python3.6/site-packages/ray/worker.py:1492: RayTaskError(RuntimeError)
---------------------------- Captured stderr setup -----------------------------
2020-01-16 07:24:16,581	WARNING worker.py:682 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-01-16 07:24:16,582	WARNING services.py:592 -- setpgrp failed, processes may not be cleaned up properly: [Errno 1] Operation not permitted.
2020-01-16 07:24:16,583	INFO resource_spec.py:212 -- Starting Ray with 287.7 GiB memory available for workers and up to 0.15 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-01-16 07:24:16,932	WARNING services.py:1080 -- Failed to start the dashboard. The dashboard requires Python 3 as well as 'pip install aiohttp psutil setproctitle grpcio'.
----------------------------- Captured stderr call -----------------------------
2020-01-16 07:24:17,163	INFO pytorch_trainer.py:92 -- Using gloo as backend.
2020-01-16 07:24:17,164	INFO pytorch_trainer.py:103 -- start_workers: Setting 2 replicas.
--------------------------- Captured stderr teardown ---------------------------
2020-01-16 07:24:20,141	WARNING worker.py:1063 -- A worker died or was killed while executing task ffffffffffffffff70cbb26f0100.
=============================== warnings summary ===============================
/opt/conda/lib/python3.6/site-packages/IPython/lib/pretty.py:91
  /opt/conda/lib/python3.6/site-packages/IPython/lib/pretty.py:91: DeprecationWarning: IPython.utils.signatures backport for Python 2 is deprecated in IPython 6, which only supports Python 3
    from IPython.utils.signatures import signature

/opt/conda/lib/python3.6/site-packages/h5py/__init__.py:36
  /opt/conda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters

/opt/conda/lib/python3.6/site-packages/h5py/tests/old/test_attrs_data.py:251
  /opt/conda/lib/python3.6/site-packages/h5py/tests/old/test_attrs_data.py:251: DeprecationWarning: invalid escape sequence \H
    s = b"Hello\x00\Hello"

ray/experimental/sgd/tests/test_pytorch.py::test_train[1]
ray/experimental/sgd/tests/test_pytorch.py::test_train[2]
  /opt/conda/lib/python3.6/site-packages/ray/experimental/sgd/pytorch/pytorch_trainer.py:239: RuntimeWarning: Mean of empty slice
    [s.get(stat_key, np.nan) for s in worker_stats])

ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testSingleLoader
ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testStep
ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testValidate
  /opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
    out=out, **kwargs)

ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testSingleLoader
ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testStep
ray/experimental/sgd/tests/test_pytorch_runner.py::TestPyTorchRunner::testValidate
  /opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
    ret = ret.dtype.type(ret / rcount)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============ 2 failed, 21 passed, 11 warnings in 163.91s (0:02:43) =============
FAILED 1
Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information
Build step 'Execute shell' marked build as failure
Sending e-mails to: rliaw@berkeley.edu
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/Tune-Tests/348/
Tune tests failed.
Finished: FAILURE
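
Note on the two failures above: both originate in the same place, PyTorch's DataLoader seeding step (torch.empty((), dtype=torch.int64).random_()) raising RuntimeError: FunctionParameter(): invalid type string: IntArrayRef, which suggests a problem with the torch installation inside the ray-project/tune_test image rather than with the Ray SGD test logic itself. Below is a minimal sketch for re-running only those two tests against the same image; the docker run flags, image digest, and test path are copied from this log (the digest may no longer be available and the image may need to be rebuilt from docker/tune_test at the same commit), and the -k selection expression is an assumption about how to pick out the two failing tests.

# Hypothetical local reproduction sketch (assumes Docker and that the image
# digest produced by this build is still present on the host).
docker run --rm --shm-size=20G --memory=20G \
  sha256:45989197278cce30f96680c15fee6404ef1402cef9ef05df75e0d5594edd0769 \
  python -m pytest /ray/python/ray/experimental/sgd/tests/test_pytorch.py \
  -k "resize or fail_thrice"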