
Recent advances in Machine and Deep Learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL and Data Science frameworks including TensorFlow, PyTorch, and Dask have emerged that offer high-performance training and deployment for various types of ML models and Deep Neural Networks (DNNs). This talk provides an overview of parallelization strategies for distributed training, and highlights new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to efficiently support such workloads.
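
To make one of the parallelization strategies mentioned above concrete, the sketch below shows data-parallel distributed training in PyTorch, where the communication runtime (NCCL or Gloo here) averages gradients across workers on every backward pass. This is a minimal illustration, not code from the talk; the model, synthetic data, and launch settings are placeholder assumptions.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
# Assumes launch via `torchrun --nproc_per_node=<N> ddp_example.py`, which sets
# RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # The communication backend (NCCL for GPUs, Gloo for CPUs) is the runtime
    # that performs the gradient allreduce across ranks.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # Placeholder model and synthetic data; a real job would use a DistributedSampler.
    model = nn.Linear(1024, 10).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10):
        inputs = torch.randn(32, 1024, device=device)
        targets = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()   # gradients are averaged across all ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
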
Dr. Hari Subramoni has been a research scientist in the Department of Computer Science and Engineering at The Ohio State University, USA, since September 2015. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, deep learning and cloud computing. He has published over 70 papers in international journals and conferences related to these research areas, and he has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of the MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC and CAF)) software packages.

Manjunath Gorentla Venkata is a Principal Software Architect at NVIDIA. His focus is on architecting features for NVIDIA's current and next-generation networking products, programming models, and network libraries to address the needs of HPC and AI/DL systems and workloads. Previously, he was a research scientist and the Languages Team lead at Oak Ridge National Laboratory. While at ORNL, he researched, designed, and developed several innovative and high-performing communication middleware for HPC systems, including InfiniBand systems and Cray (XK7, XE). He has served on open standards committees for parallel programming models, including OpenSHMEM and MPI, for many years, and he is the author of more than 50 research papers in this area. He received his degrees in computer science from the University of New Mexico.

Devendar Bureddy is a Principal SW Engineer at Mellanox Technologies. At Mellanox, Devendar was instrumental in building several key technologies such as SHARP, UCX, and HCOLL. Previously, he was a software developer at The Ohio State University in the Network-Based Computing Laboratory led by Dr. Panda, where he was involved in the design and development of MVAPICH. He received his Master's degree in Computer Science and Engineering from the Indian Institute of Technology, Kanpur. His research interests include high speed interconnects, parallel programming models, and HPC/DL software.

UCC and SHARP are important building blocks for collective operations for HPC and AI/DL workloads. In this talk, we will provide a brief overview of both solutions. UCC is a community-driven effort to develop a collective API and library implementation for applications in various domains, including High-Performance Computing, Artificial Intelligence, Data Center, and I/O. Over the last year, the UCC working group has met weekly to develop the UCC specification. We will highlight some of the design principles of the UCC v1.0 specification, and we will also share the status of the UCC implementation and the upcoming plans of the working group.

UCC provides a user-facing public API and a library implementation that leverages software protocols and hardware solutions to implement collective operations. One of the important and successful hardware implementations of collective operations is SHARP. After introducing UCC, in the last part of the talk we provide a brief overview of SHARP. SHARP has been successfully powering HPC and AI/DL workloads through collective libraries such as HCOLL.
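
For readers unfamiliar with collective operations, the hedged sketch below uses an MPI allreduce via mpi4py to show the class of operation that UCC exposes through its API and that SHARP can offload to the network. mpi4py is used here only as a familiar stand-in; it is not part of UCC, and the buffer sizes and ranks are illustrative assumptions.

```python
# Illustrative allreduce: every rank contributes a local buffer and receives the
# element-wise sum across all ranks. Run with: mpirun -np 4 python allreduce_demo.py
# (UCC/SHARP accelerate this class of operation underneath MPI and other
# collective libraries; this example only shows the semantics.)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank fills a buffer with its own rank id.
local = np.full(8, rank, dtype=np.float64)
result = np.empty_like(local)

# The collective: all ranks cooperate, and all ranks receive the reduced result.
comm.Allreduce(local, result, op=MPI.SUM)

# Every rank now holds 0 + 1 + ... + (size - 1) in each element.
expected = size * (size - 1) / 2
assert np.allclose(result, expected)
if rank == 0:
    print(f"Allreduce across {size} ranks OK: {result[0]}")
```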
