Embarrassingly Parallel Time Series Analysis for Large Scale Weak Memory Systems

Distributed_time_series_analysis

Second order stationary models in time series analysis are based on the analysis of essential statistics whose computations follow a common pattern. In particular, with a map-reduce nomenclature, most of these operations can be modeled as mapping a kernel that only depends on short windows of consecutive data and reducing the results produced by each computation. This computational pattern stems from the ergodicity of the model under consideration and is often referred to as weak or short memory when it comes to data indexed with respect to time. In the following we will show how studying weak memory systems can be done in a scalable manner thanks to a framework relying on specifically designed overlapping distributed data structures that enable fragmentation and replication of the data across many machines as well as parallelism in computations. This scheme has been implemented for Apache Spark but is certainly not system specific. Indeed we prove it is also adapted to leveraging high bandwidth fragmented memory blocks on GPUs.econd order stationary models in time series analysis are based on the analysis of essential statistics whose computations follow a common pattern. In particular, with a map-reduce nomenclature, most of these operations can be modeled as mapping a kernel that only depends on short windows of consecutive data and reducing the results produced by each computation. This computational pattern stems from the ergodicity of the model under consideration and is often referred to as weak or short memory when it comes to data indexed with respect to time. In the following we will show how studying weak memory systems can be done in a scalable manner thanks to a framework relying on specifically designed overlapping distributed data structures that enable fragmentation and replication of the data across many machines as well as parallelism in computations. This scheme has been implemented for Apache Spark but is certainly not system specific. Indeed we prove it is also adapted to leveraging high bandwidth fragmented memory blocks on GPUs.

Arxiv link.