Gateways 2020 is scheduled from October 12 to October 23, with the tutorial and workshop track during the first week and the main conference track during the second week. This fifth Gateways annual conference is an opportunity for gateway creators and enthusiasts to learn, share, connect, and shape the future of gateways, while supporting and growing our community. Register for the conference by October 5.

Thursday, October 22 • 2:45pm - 3:00pm
G2: A Scalable Cloud-Based Analysis Platform for Survey Astronomy

Research in astronomy is undergoing a major paradigm shift, transformed by the advent of large, automated sky surveys into a data-rich field where multi-TB to PB-sized spatio-temporal data sets are commonplace. For example, the Legacy Survey of Space and Time (LSST) is about to begin delivering observations of >10^10 objects, including a database with >4 x 10^13 rows of time series data. This volume presents a challenge: how should a domain scientist with little experience in data management or distributed computing access data and perform analyses at PB scale?

We present a possible solution to this problem built on (adapted) industry-standard tools and made accessible through web gateways. We have i) developed Astronomy eXtensions for Spark (AXS), a series of astronomy-specific modifications to Apache Spark that allow astronomers to tap into its computational scalability; ii) deployed datasets in AXS-queryable format in Amazon S3, leveraging its I/O scalability; iii) developed a deployment of Spark on Kubernetes with auto-scaling configurations requiring no end-user interaction; and iv) provided a web-accessible Jupyter notebook front end via JupyterHub, including a rich library of pre-installed common astronomical software (accessible at http://hub.dirac.institute).
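
As a rough illustration of item iii), executor auto-scaling for Spark on Kubernetes is commonly expressed through Spark's dynamic allocation properties. The sketch below is not the authors' configuration: the helper names, the API server URL, the image name, and every numeric value are hypothetical placeholders. Only the Spark property keys are standard, and shuffle tracking assumes Spark 3.0 or later.

```python
def autoscaling_spark_conf(k8s_api_url, image, max_executors=50):
    """Build Spark properties for executor auto-scaling on Kubernetes.

    All values are illustrative assumptions, not the settings used by the
    system described in the abstract.
    """
    if max_executors < 1:
        raise ValueError("max_executors must be at least 1")
    return {
        # Point Spark at the Kubernetes API server (hypothetical URL).
        "spark.master": f"k8s://{k8s_api_url}",
        # Container image used for executor pods (hypothetical name).
        "spark.kubernetes.container.image": image,
        # Let Spark add and remove executors based on pending work.
        "spark.dynamicAllocation.enabled": "true",
        # Track shuffle data so no external shuffle service is needed.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Scale down to zero executors when a notebook is idle.
        "spark.dynamicAllocation.minExecutors": "0",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Release executors that have been idle this long.
        "spark.dynamicAllocation.executorIdleTimeout": "60s",
    }


def to_spark_submit_args(conf):
    """Flatten a property dict into repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    conf = autoscaling_spark_conf(
        "https://kubernetes.example.invalid:6443",  # placeholder API server
        "example/spark-py:latest",                  # placeholder image
        max_executors=8,
    )
    print(" ".join(to_spark_submit_args(conf)))
```

These properties only scale the number of executor pods. Growing or shrinking the underlying cluster nodes is handled separately by the Kubernetes cluster itself, so pre-setting both layers is one way the end user can be spared any manual interaction.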

We use this system to enable the analysis of data from the Zwicky Transient Facility, presently the closest precursor survey to the LSST, and discuss initial results. To our knowledge, this is a first application of cloud-based scalable analytics to astronomical datasets approaching LSST-scale. The code is available at https://github.com/astronomy-commons.


Thursday October 22, 2020 2:45pm - 3:00pm EDT
Concurrent Room 1