In this session we will discuss the importance of parallel and high performance computing. We will show, by example, the basic concepts of parallel computing, and discuss its advantages and disadvantages. We will present an overview of current and future trends in HPC hardware, and provide a brief comparison and contrast of some of the paradigms of HPC, including OpenMP, the Message Passing Interface (MPI), GPU programming, and programming for Knights Landing.
Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation (with Unix shell). Participants will be encouraged to help one another and to apply what they have learned to their own research problems.
The course is aimed at undergraduate student researchers, graduate students, faculty, postdocs, and other researchers from RMACC. You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (we will send out information ahead of time). They are also required to abide by Software Carpentry's Code of Conduct.
The Unix Shell
The Rocky Mountain region's newest supercomputer, named Summit, will be installed this summer. Any researcher from an RMACC-affiliated institution will be eligible to use it, with preference to CU-Boulder, Colorado State, and institutions that do not have their own supercomputing resources. We will describe Summit's overall architecture and outline the installation and availability schedule. We will also discuss changes that might be necessary for your applications and workflow in order to best take advantage of Summit's advanced features, which include Omni-Path high-performance network interconnect and some nodes with Intel "Knights Landing" Phi processors.
Target audience: Any current or prospective user of large-scale computing, and anyone interested in testing their applications on Omni-Path and Phi.
In this session we will go a bit deeper into the current paradigms of HPC. We will look at simple examples in OpenMP, the Message Passing Interface (MPI), GPU programming (CUDA and OpenACC), and programming for Knights Landing. We will go into additional depth on topics based on participant interest and available time. This session can be taken with or without the Intro to High Performance Computing session. Source code for in-class and more advanced examples will be provided.
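As a small taste of the shared-memory paradigm this session covers, here is a minimal OpenMP sketch (the function name is illustrative, not from the session materials): a "saxpy" loop parallelized with a single pragma. Compiled without OpenMP support the pragma is simply ignored and the loop runs serially with identical results.

```c
/* y = a*x + y, with each independent iteration eligible to run on a
 * different thread when compiled with OpenMP (e.g., gcc -fopenmp). */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* iterations are independent: safe to split */
}
```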
In the last few decades, there has been a tremendous explosion of research and data. Scientific visualization plays an increasingly important role in making new discoveries and in gaining new and better insight into, and validation from, these data. In this session we will cover the basic foundations of modern scientific visualization. We will discuss some of the basic principles, methods, and techniques for transforming scientific data into state-of-the-art visualizations. Topics will include data preparation, the modern graphics pipeline, visualization tools and methods, as well as a brief introduction to color theory and perception.
Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation (with Unix shell). Participants will be encouraged to help one another and to apply what they have learned to their own research problems.
The course is aimed at undergraduate student researchers, graduate students, faculty, postdocs, and other researchers from RMACC. You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (we will send out information ahead of time). They are also required to abide by Software Carpentry's Code of Conduct.
Version Control with Git

Research cyberinfrastructure for computational and data science is increasingly a key enabler of leadership in science and engineering research. NSF is not the dominant funder of research cyberinfrastructure. Its support of research cyberinfrastructure is determined by research priorities and strong community ties, as well as by recognizing the uniqueness of its role. This presentation will give an updated view of recent ACI investments and directions. It will also present recommendations from the recent National Academies’ report, Future Directions for NSF Advanced Computing Infrastructure.
More and more data of public interest is being generated by government and private organizations at all levels. Frequently this data is made available through custom portals operated by individual organizations and usually all one can do with the data is download and/or select a subset for downloading. Storage of public data in the Cloud offers greater opportunities for collaboration and computation against the data. This session will discuss what programs are available to support and use public datasets in the Cloud.
Everyone has skills. They get you in the door, but not necessarily the job. There can be 100 or more applicants per job posting, all with skills as good as yours or better. It’s not just about the skills; it’s about putting your best foot forward to stand out as “the one.” In today’s extremely competitive job environment it is increasingly important to create a clear, concise statement of who you are, what professional skills you offer, and why you are the best candidate for the position. Hear from marketing and industry professionals about what employers want to know and how to prepare for the questions employers are going to ask.
Deep learning is a rapidly growing segment of artificial intelligence. It is increasingly used to deliver near-human-level accuracy for image classification, voice recognition, natural language processing, sentiment analysis, recommendation engines, and more. Application areas include facial recognition, scene detection, advanced medical and pharmaceutical research, and autonomous, self-driving vehicles. This talk focuses on the role GPUs play in accelerating all aspects of deep learning and where NVIDIA technologies play a key role in academia, supercomputing, and industry.
What social and technical infrastructure elements are necessary to support research data management at the campus level? How does the campus, including Research Computing, Office of the VP for Research, the CIO, and the Library, collaborate to meet current and future needs? Join panelists from Colorado State University, the University of Colorado, the Colorado School of Mines, and the University of Wyoming for a discussion of key issues and challenges in managing and preserving scholarly and scientific data at their respective institutions.
The Jetstream cloud is a collaboration between Indiana University, TACC, the University of Arizona, and several domain-specific partners that expands the community of users who benefit from National Science Foundation investment in shared computing resources. It is an Infrastructure-as-a-Service platform comprised of two geographically isolated OpenStack+Ceph clusters, each supporting hundreds of virtual machines and data volumes. The two cloud systems are integrated via a user-friendly web application that provides a task-oriented user interface for common cloud computing operations, authentication to XSEDE via Globus, and an expressive set of web service APIs. Users are supported by expert staff drawn from the partner institutions and from the XSEDE national computing infrastructure. Jetstream enables on-demand access to interactive, user-configurable computing and analysis capability. Because Jetstream is easy to access and use, it democratizes access to cloud capabilities and technologies, and with its focus on sharing, discovery, and use of useful virtual machine images, it helps promote sharable, reproducible research in nearly any research domain. This talk will describe Jetstream in greater detail, as well as how its unique combination of hardware, software, and user engagement position it well to support the "long tail of science".
Representatives and system administrators from RMACC institutions will participate in a discussion and share their experiences, best practices, preferred system tools and challenges in managing large HPC clusters.
Campus Champions from RMACC institutions and from the XSEDE program will lead a discussion with an overview of XSEDE resources and policies, along with what being a part of the Campus Champion program involves and how to get started. Our regional champion program can provide information and support to schools without champions.
What are you doing on your campus about high-performance networking (HPN), HPC, data, security, tools, and research IT environments? What issues and problems are you facing?
The rapid growth and availability of large sets of structured and unstructured data (“big data”) has created significant opportunities to discover patterns, trends, and interactions that are applicable to many research fields. A significant amount of research accessing big data is focused on human health and behavior. A number of initial reports arising from collection and query of big data were challenged by data error. Traditionally, data error arises from a lack of suitable manpower, planning, materials, documentation collection and archiving practices, and equipment. Data error results in a lack of repeatability and reproducibility, which can have dire consequences both for the research team, and, when occurring in human research studies, potentially licensed products which don’t maintain their claims once used by the general public. Reproducibility issues are magnified when big data is used due to additional challenges pertaining to its collection, management, storage and appropriate archiving. This talk will focus both on practical and ethical issues of reproducibility for any research project, as well as topics specific to studies collecting and utilizing big data, using both historical and contemporary content. A brief discussion of regulations addressing data reproducibility will be included in the discussion.
Modern mathematical algorithms, when wed with high performance computing resources, allow one to treat large scale problems in Data Science. In particular, dimensionality reduction is one of the cornerstones of data analytics, and such ideas form the foundation for many other approaches commonly used in industry and academics. In this talk, we will describe how techniques such as low-rank matrix completion can be used to reduce the dimensionality of large scale data sets in an efficient and distributed fashion.
Meet with staff from RMACC institutions to share information about approaches to user training, education, and managing other user support issues. Are there coordination opportunities to share resources, training, and support provided at RMACC member institutions?
How does ESnet engage researchers?
Case Studies:
- NCAR/UCAR - CMIP6 data access/analysis
- UU - MRI domain science and computational science at scale
- CU-B - Engaging researchers for the design of a shared supercomputer
- NYSERNET - Incentives for University Collaboration
For the last few decades (including most of the careers of current practitioners in science and engineering) one could count on Moore’s Law to provide progressively more powerful individual machines. Therefore, if one’s code or problem didn’t run fast enough on today’s machine, one could count on substantially faster performance within a few years. Frequently one just had to wait a little while and new hardware would solve the problem for you. However, around 2010, clock speeds of individual CPUs stopped getting faster and they are not likely to get faster in the foreseeable future. The best option available today for solving problems faster and/or solving bigger problems is to put more computers on the problem in parallel. Parallel computing has been the domain of HPC for decades and there are many lessons and skills that can be learned that are now relevant to a much wider audience. Come to this session to discuss what those are and whether current training is providing the needed instruction.
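Amdahl's law makes the session's point quantitative: once clock speeds stall, the speedup from adding processors is capped by whatever fraction of the code stays serial. A one-line helper (the name is illustrative) shows the formula S(n) = 1 / ((1 - p) + p/n), where p is the parallelizable fraction:

```c
/* Amdahl's law: ideal speedup of a program whose parallelizable
 * fraction p runs on n processors, with fraction (1 - p) serial. */
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

Even with p = 0.95, the speedup on unlimited processors can never exceed 20x, which is why both parallelizing *and* reducing serial bottlenecks are now core skills.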
In high performance computing, data sets are increasing in size and workflows are growing in complexity. Additionally, it is becoming too costly to keep copies of that data and, perhaps more importantly, too time- and energy-intensive to move them. Thus, the novel Zero Copy Architecture (ZCA) was developed, in which each process in a multi-stage workflow writes data locally for performance, yet other stages can access the data globally. The result is accelerated workflows with the ability to perform burst-buffer operations and in-situ analytics and visualization without the need for data copies or movement.
In today’s world we face an ever growing increase in the size and complexity of our data as well as the globalization of interactive collaboration on projects. The data often requires high performance hardware and software available only on large clusters. In addition, researchers and analysts need the ability to work interactively in a collaborative manner from each of their individual stations. Modern remote collaborative visualization systems allow for this blending of the modern workflows. In this session we will explore some of the current state of the art collaborative visualization systems, including both hardware and software solutions, and how they can help improve your workflow.
R is a free and open source programming language that is the most common language in statistical programming today. In this workshop we will introduce R as a language from a conceptual and practical perspective, discussing why R is so popular for data science applications, and introducing the basic elements of the language. We will cover common data structures in R, reading and writing data files, and basic plotting functionality. We will also introduce some important packages that exist in the R ecosystem that help with data processing, analysis, and visualization.
Fortran is one of the primary languages of HPC. There are many advantages of Fortran 90, 95, 2003, and 2008 over Fortran 77, including enhanced performance, portability, reliability, and maintainability. This session will primarily cover the new features up to Fortran 95, such as the kind facility, modules, interfaces, pointers, array operations, dynamic memory allocation, function overloading, internal I/O, and the forall statement. We will “build” a program that demonstrates these new features. We will mention the additions (object-oriented features) in Fortran 2003 and 2008, and look at stream I/O from Fortran 2003 and C interoperability in more detail. Source code will be available.
The Linux shell is much more than just a way to enter individual commands. In this session, we'll learn to use bash's built-in programming elements, including loops, tests and conditions, variables, and functions. With the full power of the shell at your fingertips, your efficiency and productivity will skyrocket! If you would like to follow along with the examples, please bring a laptop that a) runs Linux or Mac OSX, or b) allows you to log in to a Linux server using ssh. Previous experience with the Linux command line would be helpful.
Introduction to Science Engagement-
Overview of the process, and what will be accomplished during the session.
Discussing the intersection between IT research environment support and research and scientific needs-
Overview of the items that matter from the research world (process of science, instrumentation, collaborations) and the IT world (networks, computational hardware, storage, software, and protocols).
As processors evolve, it is becoming more and more critical to both vectorize (use AVX or SIMD instructions) and thread software to realize the full performance potential of the processor. In some cases, code that is vectorized and threaded can be more than 175X faster than unthreaded / unvectorized code and about 7X faster than code that is only threaded or vectorized. And that gap is growing with every new processor generation.
Session 1, Intel Compilers: Boost your application's performance with the Intel® C++ Compiler and Intel® Fortran Compiler for Windows*, Linux*, and OS X*. The built-in OpenMP* parallel models combined with performance libraries simplify the implementation of fast, parallel code.
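The vectorize-and-thread advice above can be made concrete. Below is a hedged sketch (the function name is illustrative) of a loop written so a vectorizing compiler can use SIMD instructions: a `restrict`-qualified pointer rules out aliasing, and a standard OpenMP `simd` hint marks the reduction. Compiled without SIMD/OpenMP flags the pragma is ignored and the result is identical.

```c
/* Sum of squares over a vectorization-friendly loop: unit stride,
 * no aliasing (restrict), and a reduction the compiler can unroll
 * into SIMD lanes. */
float sum_of_squares(int n, const float *restrict x) {
    float total = 0.0f;
    #pragma omp simd reduction(+:total)
    for (int i = 0; i < n; i++)
        total += x[i] * x[i];
    return total;
}
```

Compiler reports (e.g., the Intel compilers' optimization reports) can confirm whether such a loop actually vectorized.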
During this year's RMACC High Performance Computing Symposium, the Open Science Grid User Support team is offering a session on Thursday morning to integrate your campus HPC cluster into the OSG, in real time. What are the benefits of doing this?
Requirements are minimal and light!
The requirements and pre-workshop preparation are minimal: http://bit.ly/osgrmacc. No OSG software will need to be installed on your system for the basic connection; a gateway service dedicated to your campus will be hosted by the OSG. All that is required is a normal user account and SSH access to your login host. You control the policy as for any user. Many sites will offer a low priority backfill queue (evicted jobs will be rescheduled to other sites automatically if preemption is used).
If interested, please complete this form https://goo.gl/forms/u82tRfSmW4aYHX1h1 before the start of the Symposium (the sooner the better, so we can work out kinks or answer technical questions in advance). Please send any questions to user-support@opensciencegrid.org.
Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation (with Unix shell). Participants will be encouraged to help one another and to apply what they have learned to their own research problems.
The course is aimed at undergraduate student researchers, graduate students, faculty, postdocs, and other researchers from RMACC. You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (we will send out information ahead of time). They are also required to abide by Software Carpentry's Code of Conduct.
This course will focus on building cohesive and efficient data processing pipelines for analysis. Specific focus will be placed on handling common data processing tasks, including import/export, filtering, grouping, and summarizing, using the dplyr family of packages (dplyr, tidyr, magrittr, etc.). The course will conclude by looking at creating efficient visualizations with this pipeline using ggplot2. Slides and code examples of common tasks will be provided.
Gathering Network Requirements via Technology-
Determining technology triggers via scientific needs. Learning the roles of network, compute, and storage monitoring and reporting.
Gathering Science Requirements via Social Engineering-
Collaboration between technology and science. Conducting requirements interviews and holding discussions with academic early adopters of the requirements-review process.
As processors evolve, it is becoming more and more critical to both vectorize (use AVX or SIMD instructions) and thread software to realize the full performance potential of the processor. In some cases, code that is vectorized and threaded can be more than 175X faster than unthreaded / unvectorized code and about 7X faster than code that is only threaded or vectorized. And that gap is growing with every new processor generation.
Session 2, Intel VTune Amplifier: Optimize serial and parallel performance with an advanced performance and thread profiler (Intel® VTune™ Amplifier). Tune C, C++, Fortran, assembly, and Java* applications.
CloudLab is a testbed where researchers can build their own clouds, giving them control of the parts of the cloud computing stack that would be "givens" if using someone else's cloud: virtualization, storage, networking, management, etc. This enables research that seeks to transform the cloud, not just to use it as-is.
This tutorial will cover the major features of CloudLab, and participants will learn how to create their own instance of OpenStack, over which they have complete administrative control, inside of CloudLab.
This will be a hands-on tutorial, and participants should bring a laptop with either the Google Chrome or Mozilla Firefox browser.
This course introduces the fundamentals of shared memory programming, teaching you how to code using OpenMP and providing hands-on experience of parallel computing geared toward numerical applications.
Topics:
For both the OpenMP and MPI tutorials we assume no expertise in parallel programming. It is expected that you are familiar with a compiled language like C, C++, or Fortran. These tutorials are hands-on; please bring a sufficiently recent (multi-core) laptop so that you can participate.
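A typical first exercise in an OpenMP course of this kind is a parallel reduction, where the `reduction` clause gives each thread a private accumulator and combines them at the end, avoiding a data race on the shared sum. This sketch is illustrative (the function name is an assumption); without `-fopenmp` the pragma is ignored and the loop runs serially with the same result.

```c
/* Dot product with OpenMP's parallel-for + reduction: each thread
 * accumulates a private partial sum; OpenMP combines them safely. */
double dot(int n, const double *x, const double *y) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```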
Summit, the newly installed CU/CSU/RMACC supercomputer, offers several modern architectural features including multi- and many-core processors and Omni-Path high-performance network interconnect. Getting the maximum performance from these components requires some care when developing, compiling, and running applications on Summit. In this tutorial we will quickly cover introductory concepts such as optimization, parallelization, and vectorization. We will also give a variety of examples of how to structure vectorization-friendly code. Next, we'll show how to use the compiler to squeeze the most performance from your own codes and from programs you download and compile. In addition, we'll provide some hints on optimizing communication between nodes and to the scratch storage over the Omni-Path fabric. Finally, we'll consider the "high-throughput computing" issue for applications that are not well-suited for parallelization.
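One common pattern for vectorization-friendly code, of the kind this tutorial's examples address, is choosing a struct-of-arrays (SoA) layout: each field sits in its own contiguous array, so inner loops make the unit-stride accesses the vectorizer prefers, unlike an array-of-structs whose fields are interleaved in memory. The particle example below is hypothetical, not from the tutorial materials.

```c
enum { MAXP = 1024 };

/* Struct-of-arrays: positions and velocities each contiguous. */
typedef struct {
    double x[MAXP];   /* positions  */
    double v[MAXP];   /* velocities */
} ParticlesSoA;

/* The update loop touches x[] and v[] with unit stride, so the
 * compiler can load/store whole SIMD vectors at a time. */
void advance(ParticlesSoA *p, int n, double dt) {
    for (int i = 0; i < n; i++)
        p->x[i] += p->v[i] * dt;
}
```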
Determining Trends in Case Studies-
Reviewing case study and interview trends. Drafting long-term support strategies.
Dedicating personnel for the process as part of a campus CI-plan.
Use Cases and Examples-
Review of several example case studies, and the mechanisms that were used to improve the process of science.
As processors evolve, it is becoming more and more critical to both vectorize (use AVX or SIMD instructions) and thread software to realize the full performance potential of the processor. In some cases, code that is vectorized and threaded can be more than 175X faster than unthreaded / unvectorized code and about 7X faster than code that is only threaded or vectorized. And that gap is growing with every new processor generation.
Intel Inspector: Find bugs before they happen with Intel® Inspector, an easy-to-use memory and threading debugger for C, C++, and Fortran applications.
Intel Advisor: Find the greatest parallel performance potential and identify critical synchronization issues quickly with Intel® Advisor, a vectorization optimization and thread prototyping tool for C, C++, and Fortran applications.
Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation (with Unix shell). Participants will be encouraged to help one another and to apply what they have learned to their own research problems.
The course is aimed at undergraduate student researchers, graduate students, faculty, postdocs, and other researchers from RMACC. You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (we will send out information ahead of time). They are also required to abide by Software Carpentry's Code of Conduct.
Jupyter Notebook is a web application that integrates live code, visualizations, and documentation. Through use of a familiar WYSIWYG interface on the ubiquitous web platform, Jupyter Notebook enables scientific computing for a broader community of researchers than traditional cluster computing interfaces alone; however, a typical installation of the Jupyter Notebook software is troublesome for many of the classes of user that the software is meant to support, even in the simplest case confined to a local workstation. This complication is compounded when researchers wish to use centralized remote resources, particularly when those resources are moderated by a traditional batch queueing system.
By deploying Jupyter Notebook with JupyterHub, University of Colorado Boulder Research Computing provides a centralized Jupyter Notebook service to simplify access to the service for the target audience. We have further tailored the system to our environment by integrating third party code, and now support Jupyter Notebooks and parallel IPython clusters dispatched directly and automatically in our HPC compute cluster environments. This trivializes access to HPC resources while providing a common interface that can be deployed in any environment.
This course introduces the fundamentals of distributed memory programming, using the Message Passing Interface (MPI) standard. Similar to the OpenMP tutorial, we will be using a hands-on approach.
Topics:
For both the OpenMP and MPI tutorials we assume no expertise in parallel programming. It is expected that you are familiar with a compiled language like C, C++, or Fortran. These tutorials are hands-on; please bring a sufficiently recent (multi-core) laptop so that you can participate.
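The heart of the distributed-memory model this course teaches is that processes have separate address spaces and cooperate only by exchanging messages. Since compiling real MPI code requires an MPI installation, the sketch below imitates the same send/receive structure with `fork()` and a POSIX pipe instead of `MPI_Send`/`MPI_Recv`; it is purely illustrative of the pattern, not the MPI API, and the function names are assumptions.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sum of the integers lo .. hi-1. */
static long sum_range(long lo, long hi) {
    long s = 0;
    for (long i = lo; i < hi; i++) s += i;
    return s;
}

/* Two processes, separate address spaces, one message: the child
 * ("rank 1") sums the upper half and sends its partial result through
 * a pipe; the parent ("rank 0") sums the lower half, receives, and
 * combines -- the same shape as an MPI send/receive pair. */
long two_process_sum(long n) {
    int fd[2];
    if (pipe(fd) != 0) return -1;
    pid_t child = fork();
    if (child == 0) {                       /* "rank 1" */
        long part = sum_range(n / 2, n);
        write(fd[1], &part, sizeof part);   /* "send" */
        _exit(0);
    }
    long mine = sum_range(0, n / 2);        /* "rank 0" */
    long theirs = 0;
    read(fd[0], &theirs, sizeof theirs);    /* blocking "receive" */
    waitpid(child, (int *)0, 0);
    close(fd[0]);
    close(fd[1]);
    return mine + theirs;
}
```

In the tutorial itself the same decomposition is written with MPI ranks, where it scales past one machine.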
As scientists and engineers focus on larger computational problems, the time spent accessing disk continues to grow. In this course, we will explore the fundamentals of using parallelization to optimize file input/output. A basic knowledge of parallel programming paradigms will be useful but is not required.
This tutorial will include hands-on exercises. As such, participants should bring a multi-core laptop, or have access to a remote parallel compute environment (e.g., Janus), in order to get as much out of the tutorial as possible. Some software packages may be necessary as well; we will send out information on such packages in advance.
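A fundamental step in parallel I/O of the kind this course covers is dividing one shared file among processes: each rank computes the byte offset of its own contiguous block so that writes never overlap. The helper below is a hypothetical illustration of that offset arithmetic (the name and decomposition are assumptions, not course materials); it spreads any remainder elements across the lowest-numbered ranks.

```c
/* Byte offset at which `rank` (of `nprocs`) should write its contiguous
 * block of a file holding `total_elems` elements of `elem_size` bytes.
 * Ranks below the remainder each take one extra element. */
long block_offset(int rank, long total_elems, int nprocs, long elem_size) {
    long base  = total_elems / nprocs;   /* elements every rank gets   */
    long rem   = total_elems % nprocs;   /* leftovers for low ranks    */
    long start = rank * base + (rank < rem ? rank : rem);
    return start * elem_size;
}
```

With offsets computed this way, all ranks can write concurrently to disjoint regions of the same file.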
Topics to be covered include:
Motivation, benefits, challenges, lessons learned, future planning-
Discussing the benefits and challenges that come with this approach.
Open Discussion and Hands On-
Walking through some live examples of a review.
As processors evolve, it is becoming more and more critical to both vectorize (use AVX or SIMD instructions) and thread software to realize the full performance potential of the processor. In some cases, code that is vectorized and threaded can be more than 175X faster than unthreaded / unvectorized code and about 7X faster than code that is only threaded or vectorized. And that gap is growing with every new processor generation.