Description: Deep learning is a rapidly growing segment of artificial intelligence. It is increasingly used to deliver near-human accuracy for image classification, voice recognition, natural language processing, sentiment analysis, recommendation engines, and more. Application areas include facial recognition, scene detection, advanced medical and pharmaceutical research, and autonomous vehicles. This talk focuses on the role GPUs play in accelerating all aspects of deep learning and where NVIDIA technologies play a key role in academia, supercomputing, and industry.
In high-performance computing, data sets are growing in size and workflows in complexity. It is becoming too costly to keep multiple copies of that data and, perhaps more importantly, too time- and energy-intensive to move it. Thus, the novel Zero Copy Architecture (ZCA™) was developed: each process in a multi-stage workflow writes data locally for performance, while other stages can still access that data globally. The result is accelerated workflows that support burst-buffer operations and in-situ analytics and visualization without any data copy or movement.
As processors evolve, it is increasingly critical both to vectorize (use AVX or SIMD instructions) and to thread software in order to realize the processor's full performance potential. In some cases, code that is both vectorized and threaded can be more than 175X faster than unthreaded, unvectorized code, and about 7X faster than code that is only threaded or only vectorized. That gap grows with every new processor generation.
Session 1, Intel Compilers: Boost your application's performance with the Intel® C++ Compiler and Intel® Fortran Compiler for Windows*, Linux*, and OS X*. The built-in OpenMP* parallel models, combined with performance libraries, simplify the implementation of fast, parallel code.
Session 2, Intel VTune Amplifier: Optimize serial and parallel performance with an advanced performance and thread profiler, Intel® VTune™ Amplifier. Tune C, C++, Fortran, assembly, and Java* applications.
Intel Inspector: Find bugs before they happen with Intel® Inspector, an easy-to-use memory and threading debugger for C, C++, and Fortran applications.
Intel Advisor: Find the greatest parallel performance potential and identify critical synchronization issues quickly with Intel® Advisor, a vectorization optimization and thread prototyping tool for C, C++, and Fortran applications.