Prospects for Wideband VLBI Correlation in the Cloud

DOI: 10.1088/1538-3873/ab32a8
Gill, Ajay, Blackburn, Lindy, Roshanineshat, Arash, Chan, Chi-kwan, Doeleman, Sheperd S., Johnson, Michael D., Raymond, Alexander W., and Weintroub, Jonathan. 2019. "Prospects for Wideband VLBI Correlation in the Cloud." Publications of the Astronomical Society of the Pacific 131:124501.
ID: 154713
Type: article
Authors: Gill, Ajay; Blackburn, Lindy; Roshanineshat, Arash; Chan, Chi-kwan; Doeleman, Sheperd S.; Johnson, Michael D.; Raymond, Alexander W.; Weintroub, Jonathan
Abstract: This paper proposes a cloud architecture for the correlation of wide bandwidth Very Long Baseline Interferometry (VLBI) data. Cloud correlation facilitates processing of entire experiments in parallel using flexibly allocated and practically unlimited compute resources. This approach offers a potential improvement over dedicated correlation clusters, which are constrained by a fixed number of installed processor nodes and playback units. Additionally, cloud storage offers an alternative to maintaining a fleet of hard disk drives that might be utilized intermittently. Here, we describe benchmarks of VLBI correlation using the DiFX-2.5.2 software on the Google Cloud Platform to assess cloud-based correlation performance. In our analysis, the number of virtual central processing units per virtual machine was varied to determine the optimum configuration of cloud resources. The number of stations was varied to determine the scaling of correlation time with VLBI arrays of different sizes. Data transfer rates from Google cloud storage to the virtual machines performing the correlation were also measured. Based on the results, we present an example cloud correlation configuration. Current cloud service and equipment pricing data is used to compile cost estimates allowing an approximate economic comparison between cloud and cluster processing. We note that the economic comparisons are based on cost figures which are a moving target, and are highly dependent on factors such as the utilization of cluster and media, which are a challenge to estimate. Our model suggests that shifting to the cloud is an alternative path for high data rate, low duty cycle wideband VLBI correlation that should continue to be explored. 
In the production phase of VLBI correlation, the cloud has the potential to significantly reduce data processing times and allow the processing of more science experiments in a given year for the petabyte-scale data sets increasingly common in both astronomy and geodesy VLBI applications.