JUWELS Cluster and Booster: Exascale Pathfinder with Modular Supercomputing Architecture at Juelich Supercomputing Centre

Authors

DOI:

https://doi.org/10.17815/jlsrf-7-183

Abstract

JUWELS is a multi-petaflop modular supercomputer operated by Juelich Supercomputing Centre at Forschungszentrum Juelich as a European and national supercomputing resource for the Gauss Centre for Supercomputing. In addition, JUWELS serves the Earth system modeling community and the AI community within the Helmholtz Association. JUWELS currently consists of two modules. The first module, deployed in 2018, is the Cluster module: a BullSequana X1000 system with Intel Xeon Skylake-SP processors and Mellanox EDR InfiniBand. The second module, deployed in 2020, is the Booster module: a BullSequana XH2000 system with 2nd generation AMD EPYC processors, NVIDIA Ampere GPUs and NVIDIA/Mellanox HDR InfiniBand. This paper describes the architecture of the system in detail from a user's perspective and provides further insights into the administrative infrastructure used to operate the supercomputer.

Author Biography

Damian Alvarez, Juelich Supercomputing Centre

High Performance Computing Systems

References

Atos. (2021a). Atos BullSequana X1000 product webpage. Retrieved from https://atos.net/en/products/high-performance-computing-hpc/bullsequana-x-supercomputers/bullsequana-x1000

Atos. (2021b). Atos BullSequana XH2000 product webpage. Retrieved from https://atos.net/en/solutions/high-performance-computing-hpc/bullsequana-x-supercomputers#bullsequana-xh2000

Ceph. (2021). Ceph distributed storage system. Retrieved from https://ceph.io

ClusterLabs. (2021). ClusterLabs Stack webpage. Retrieved from https://clusterlabs.org

Eicker, N., Lippert, T., Moschny, T., & Suarez, E. (2016). The DEEP Project: An alternative approach to heterogeneous cluster-computing in the many-core era. Concurrency and Computation: Practice and Experience, 28(8), 2394–2411. http://dx.doi.org/10.1002/cpe.3562

Elasticsearch B.V. (2021). ELK Stack. Retrieved from https://www.elastic.co

Forschungszentrum Jülich. (2015). JUQUEEN: IBM BlueGene/Q Supercomputer System at the Jülich Supercomputing Centre. Journal of large-scale research facilities, 1, A1. http://dx.doi.org/10.17815/jlsrf-1-18

Forschungszentrum Jülich. (2021a). Forschungszentrum Jülich webpage. Retrieved from https://www.fz-juelich.de

Forschungszentrum Jülich. (2021b). Jülich Supercomputing Centre webpage. Retrieved from https://www.fz-juelich.de/ias/jsc

Forschungszentrum Jülich. (2021c). Jupyter@JSC webpage. Retrieved from https://jupyter-jsc.fz-juelich.de

Forschungszentrum Jülich. (2021d). JUST webpage. Retrieved from https://www.fz-juelich.de/ias/jsc/just

Forschungszentrum Jülich. (2021e). JUWELS webpage. Retrieved from https://www.fz-juelich.de/ias/jsc/juwels

Forschungszentrum Jülich. (2021f). LLview webpage. Retrieved from https://www.fz-juelich.de/jsc/llview

Gauss Centre for Supercomputing. (2021a). Gauss Centre for Supercomputing webpage. Retrieved from https://www.gauss-centre.eu

Gauss Centre for Supercomputing. (2021b). HPC Access Gauss Centre for Supercomputing e.V. Retrieved from https://www.gauss-centre.eu/for-users/hpc-access/

Grafana Labs. (2021). Grafana: The open observability platform. Retrieved from https://grafana.com

Helmholtz Association. (2021). Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (HGF) webpage. Retrieved from https://www.helmholtz.de

High-Performance Computing Center Stuttgart. (2021). High-Performance Computing Center Stuttgart webpage. Retrieved from https://www.hlrs.de

Hoste, K., Timmerman, J., Georges, A., & De Weirdt, S. (2012). EasyBuild: Building Software with Ease. In 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (pp. 572–582). http://dx.doi.org/10.1109/SC.Companion.2012.81

IBM. (2021). IBM Spectrum Scale product webpage. Retrieved from https://www.ibm.com/de-de/products/spectrum-scale

John von Neumann Institute for Computing. (2021). John von Neumann Institute for Computing (NIC) webpage. Retrieved from http://www.john-von-neumann-institut.de

Jülich Supercomputing Centre. (2018). JURECA: Modular supercomputer at Jülich Supercomputing Centre. Journal of large-scale research facilities, 4, A132. http://dx.doi.org/10.17815/jlsrf-4-121-1

Jülich Supercomputing Centre. (2019). JUWELS: Modular Tier-0/1 Supercomputer at the Jülich Supercomputing Centre. Journal of large-scale research facilities, 5, A135. http://dx.doi.org/10.17815/jlsrf-5-171

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities. (2021). Leibniz Supercomputing Centre webpage. Retrieved from https://lrz.de

ParTec Cluster Competence Center GmbH. (2021). ParTec webpage. Retrieved from https://www.par-tec.com

Partnership for Advanced Computing in Europe. (2021). Partnership for Advanced Computing in Europe webpage. Retrieved from https://www.prace-ri.eu

Prometheus. (2021). Prometheus - Monitoring system & time series database. Retrieved from https://prometheus.io/

Red Hat Inc. (2021). Ansible Configuration Manager webpage. Retrieved from https://www.ansible.com

SchedMD LLC. (2021). Slurm Workload Manager webpage. Retrieved from https://slurm.schedmd.com

Top500. (2021). Top500 June 2021 list. Retrieved from https://www.top500.org/lists/2021/06

UNICORE. (2021). Uniform Interface to Computing Resources (UNICORE) webpage. Retrieved from https://www.unicore.eu


Published

2021-10-29


Section

Articles
