1994 — 1998 |
Adve, Sarita |
Research Initiation Award: Reducing the Impact of Synchronization Latency in Shared-Memory Multiprocessors @ William Marsh Rice University
9410457 (Adve): The objective of this research is to develop and evaluate techniques for reducing the impact of synchronization latency in shared-memory multiprocessors. Recent trends indicate that future microprocessors will employ aggressive techniques (e.g., multiple instruction issue, dynamic scheduling) to exploit instruction-level parallelism. Current shared-memory multiprocessors, however, usually require a processor to stall on synchronization reads until the read completes, precluding the full exploitation of future uniprocessors. The research studies two approaches for reducing the impact of synchronization latency: (a) overlapping the latency of explicit synchronization, and (b) using implicit synchronization (where possible) to eliminate synchronization latency. The key components of the study will be: (1) a practical hardware technique based on guarded instructions for overlapping acquire operations, (2) alleviating the implementation overheads incurred with implicit synchronization by using a novel combination of explicit and implicit synchronization, (3) determining information, in the form of high-level annotations, that programmers can provide to enhance detection of instruction-level parallelism, and (4) a quantitative evaluation of the efficacy of the above techniques using extensive instruction-level simulations of real applications. This research is a necessary step toward enabling wider acceptance of parallel machines: it will make machines more cost-effective by more fully exploiting the potential of next-generation microprocessors, and it will broaden the supported class of applications by allowing the efficient execution of finer-grained applications.
1995 — 1999 |
Adve, Sarita |
Career: An Integrated Approach For Improving the Performance, Programmability, and Portability of Shared Memory Multiprocessors @ William Marsh Rice University
While shared-memory multiprocessors simplify many aspects of programming, it is widely believed that naive shared-memory programs are unlikely to achieve high performance on most shared-memory systems. The broad objective is to develop and evaluate techniques for improving the performance of shared-memory systems while simultaneously enhancing programmability and portability. A key problem that inhibits performance in shared-memory systems is that, unlike message-passing programs, the exact communication requirements of shared-memory programs are not obvious. Therefore, compilers, runtime systems, and hardware typically must make conservative assumptions, resulting in excessive communication and synchronization. Moreover, because programs are written using the low-level primitives provided on a particular machine (e.g., locks and barriers), the intended sharing behavior is obscured by these details. Furthermore, the low-level primitives that must be used to obtain good performance vary from platform to platform, thus inhibiting program portability. This research takes an integrated approach to improving the performance, programmability, and portability of shared-memory systems. The approach broadly is: (1) to identify common implementation-independent communication patterns along with primitives to express those patterns in a programming language, (2) to develop heuristics that can be used by the compiler, runtime system, hardware, or some combination thereof to map these high-level primitives to the appropriate low-level primitives supported by the particular target system, and (3) to evaluate the resulting performance benefits on the system. By exposing the intended sharing behavior, many instances of unnecessary communication should be avoided, thus achieving higher performance. 
Furthermore, despite the larger variety of synchronization patterns that a programmer must choose from, his or her task is simplified because details of implementing the patterns are left to the programming support system (note that the sharing patterns themselves must be understood by the programmer anyway). Finally, in combination with other projects within the overall research program, this research should demonstrate that these common, high-level synchronization patterns can be efficiently supported on a variety of platforms, thus enhancing the portability of the program. This part of the research program focuses on a specific kind of target system, a hardware distributed shared-memory multiprocessor that provides software control for protocols and efficient active-message support (e.g., Stanford Flash and Wisconsin Typhoon), using execution-driven simulation. Other projects in this research program are already exploring similar issues on other types of target systems. Education: The educational objectives fall within three broad areas: curriculum development, teaching methods, and education outside the classroom. The objectives of the curriculum development part are (1) to introduce a course in parallel computing involving computational scientists and engineers along with computer scientists (both in terms of students attending and faculty teaching the course), (2) to introduce hands-on parallel programming experience as part of the existing undergraduate computer architecture course, (3) to impart a sound education in performance analysis techniques, and (4) to provide a research experience to undergraduates by including a modest research project in the senior-level architecture course and by actively recruiting undergraduates into the proposed research program through the Rice senior honors thesis program. 
The teaching-methods part of the education plan addresses growth as an effective teacher through exposure to research on teaching methods. The final part seeks to (1) encourage women and other underrepresented groups to pursue computer science by actively interacting with such groups informally and through formal workshops, and (2) encourage an exchange between industry and academia through tutorials and short courses.
1997 |
Bennett, John (co-PI); Adve, Sarita; Adve, Vikram (co-PI); Aazhang, Behnaam (co-PI); Baraniuk, Richard (co-PI) |
Cise Research Instrumentation: Design and Evaluation of Architectures, Programming Environments, and Applications For Shared-Memory Systems @ William Marsh Rice University
This research instrumentation grant facilitates acquiring a shared-memory multiprocessor to support the following research projects:
- Design and evaluation of ILP-based shared-memory multiprocessors
- Parallelizing compilers for shared-memory systems
- Interactive and adaptive techniques for tuning the performance of shared-memory parallel programs
- Parallel algorithms for communications systems
- Signal and image processing
The following research is thus enabled:
- Architectural techniques to exploit instruction-level parallelism in shared-memory multiprocessors: Fast simulation methods are the key enabling technology for this research. The proposed multiprocessor enables the development and use of high-performance parallel simulators.
- Compilation techniques for High Performance Fortran (HPF): The proposed multiprocessor is a cost-effective platform for HPF compiler development and an important target for evaluating compiler-generated parallel code.
- Runtime techniques to identify and remedy performance bottlenecks in shared-memory programs: The multiprocessor is the desired platform to develop and evaluate the techniques.
- Algorithms for wireless and network communication systems: Most such algorithms must meet stringent real-time constraints. The multiprocessor enables the development of parallel algorithms to meet these constraints.
- Algorithms for signal and image processing for applications including geophysics, radar, and medical imaging diagnostics: The multiprocessor is needed to test and develop parallel and resource-intensive sequential algorithms on real data sets. 
Overall, the proposed system enables the above research in three critical ways: enables parallelization of applications that cannot be run sequentially, provides a testbed for compiler and tools research, and provides cost-effective resources for sequential but resource-intensive tasks.
1999 — 2003 |
Adve, Sarita |
Architectures For Emerging Applications @ William Marsh Rice University
The broad objective of this research is to develop and evaluate general-purpose architectures for emerging and future media-processing applications. These applications are expected to require orders-of-magnitude higher performance than available on today's general-purpose systems. A key challenge is to design systems that will provide such performance improvements for media applications, without sacrificing performance for more conventional applications.
This research avails of a unique opportunity to bring together architecture researchers and applications researchers working on the technologies underlying media processing. The research focuses on three classes of applications: (1) signal processing; specifically, video, image, audio, speech processing, and high-speed modems, (2) wireless communication, and (3) virtual environments and visualization. The approach is to use simulation to develop a quantitative understanding of the behavior of these applications, and use this understanding to develop new architectural techniques. An initial quantitative study of image and video applications has motivated two architectural directions: (I) design of dynamically partitionable caches, and (II) use of reconfigurable logic for specialized media functions for a general-purpose system.
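The dynamically partitionable cache direction can be illustrated with a toy model. The class below, including all names and the way-partitioning policy, is a hypothetical sketch for illustration, not the design studied in the project.

```python
# Toy model of a way-partitionable cache: the ways in each set are split
# between two workload classes (media vs. conventional), and the split
# can change at run time. All details here are illustrative assumptions.

class PartitionableCache:
    def __init__(self, num_sets, num_ways, media_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.media_ways = media_ways  # ways reserved for media accesses
        # tags[set][way] holds the cached block tag (or None if empty)
        self.tags = [[None] * num_ways for _ in range(num_sets)]

    def ways_for(self, is_media):
        # Media streams use ways [0, media_ways); others use the rest.
        return (range(self.media_ways) if is_media
                else range(self.media_ways, self.num_ways))

    def access(self, addr, is_media):
        s = addr % self.num_sets
        tag = addr // self.num_sets
        ways = list(self.ways_for(is_media))
        for w in ways:
            if self.tags[s][w] == tag:
                return True  # hit within this class's partition
        # Miss: fill into the first way of the partition (no LRU, for brevity).
        self.tags[s][ways[0]] = tag
        return False

    def repartition(self, media_ways):
        # Dynamic repartitioning: a streaming phase can shrink the media
        # partition so conventional data keeps more of the cache.
        self.media_ways = media_ways
```

The point of the sketch is that streaming media data, which has little reuse, can be confined to a few ways so it does not evict conventional working sets.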
In the past, such architectural research has been impeded by a lack of access to applications. The proposed collaboration with industry and other researchers will overcome this impediment and enable the design of future architectures based on future applications.
2001 — 2006 |
Padua, David; Kale, Laxmikant; Adve, Sarita; Geubelle, Philippe (co-PI) |
Ngs: Performance Modeling and Programming Environments For Petaflop Computers and the Blue Gene Machine @ University of Illinois At Urbana-Champaign
EIA-0103645 Laxmikant V. Kale University of Illinois
Performance Modeling and Programming Environments for PetaFlop Computers and the Blue Gene Machine
The objective of the proposal is to develop performance-simulation capabilities that allow system-level analysis and performance prediction for the next generation of complex PetaFlop machines, which include multiple levels of memory hierarchy and interconnect. The performance simulator to be developed will be used to test parallel data structures and algorithms implemented in the programming environments used on these machines, as well as frameworks that enable the development of applications for these machine classes. A number of important applications will be used to test and validate the CS technology advances.
2002 — 2006 |
Jones, Douglas (co-PI); Adve, Sarita; Nahrstedt, Klara (co-PI); Kravets, Robin (co-PI) |
Itr: Collaborative Hardware-Software Adaptation For Multimedia Applications @ University of Illinois At Urbana-Champaign
Mobile systems primarily processing multimedia data are expected to become a dominant computing platform for many application domains. The design of such systems imposes several new challenges, as it must consider demanding, dynamic, and multidimensional resource requirements and constraints, with energy becoming a first-class resource. At the same time, the ability of multimedia applications to trade off output quality for system resources and the difference between their peak and average demands offers a huge opportunity for optimization.
A promising approach to meet the challenges of mobile multimedia systems, therefore, is to design all system layers with an ability to adapt in response to system or application changes. Further, to reap the full benefits of these adaptations, all system layers must cooperate to reach a system-wide globally-optimal configuration. This research seeks to develop and demonstrate an integrated cross-layer adaptive system where hardware and all software layers cooperatively adapt to changing system resources and application demands, seeking to maximize user satisfaction while meeting resource constraints of energy, time, and bandwidth. This work is expected to have a large impact because it will expose sources of substantial performance improvement not available before, for a platform of increasing importance to many application domains.
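The idea of cooperating layers converging on a system-wide optimal configuration can be sketched as a small search problem. Everything below, the layer names, the energy/quality numbers, and the multiplicative utility model, is invented for illustration; the project's actual adaptation mechanisms are not specified here.

```python
import itertools

# Hypothetical sketch of cross-layer adaptation: each layer (hardware,
# network policy, application) exposes a few discrete operating points,
# each with an energy cost and an output-quality factor. A global
# coordinator picks the combination that maximizes quality within an
# energy budget. Numbers are illustrative only.

LAYERS = {
    "cpu_freq":   [("low", 2.0, 0.6), ("high", 5.0, 1.0)],    # (name, energy, quality)
    "net_policy": [("sleepy", 1.0, 0.8), ("always_on", 3.0, 1.0)],
    "app_codec":  [("coarse", 1.5, 0.7), ("fine", 4.0, 1.0)],
}

def best_config(energy_budget):
    best, best_quality = None, -1.0
    for combo in itertools.product(*LAYERS.values()):
        energy = sum(e for _, e, _ in combo)
        quality = 1.0
        for _, _, q in combo:
            quality *= q  # quality losses compound across layers
        if energy <= energy_budget and quality > best_quality:
            best, best_quality = combo, quality
    return best, best_quality
```

A per-layer greedy choice could miss the jointly best point (e.g., spending energy on a fine codec is wasted if the network layer drops frames), which is why the layers must coordinate rather than adapt independently.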
2002 — 2006 |
Adve, Sarita |
Using Simultaneous Multithreaded Processors For Soft Real-Time Applications @ University of Illinois At Urbana-Champaign
This research concerns the use of simultaneous multithreaded (SMT) processors for soft real-time applications such as multimedia applications. SMT processors have the potential to provide high throughput by running multiple threads at the same time, and soft real-time applications are an increasingly important workload. The use of SMT processors for real-time applications, however, has largely been unexplored.
Most work on SMT has been driven by the goal of increasing throughput. Real-time applications additionally require high schedulability (i.e., the ability to meet deadlines) and predictability. Further, such applications often run in energy and thermal power constrained environments. This work seeks to develop co-schedule selection and resource sharing algorithms (and consequent admission tests) for SMT processors that will (1) maximize instruction throughput, (2) maximize schedulability, (3) maximize execution time predictability, (4) minimize energy, and (5) minimize thermal power, for soft real-time applications such as multimedia applications. This is the first work that considers the issues of temporal schedulability and predictability, and integrates them with energy and thermal considerations, for real-time applications and SMT. Without this research, an increasingly important class of workloads would be unable to exploit an architectural advance that has provided large benefits in other domains.
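A co-schedule selection and admission test of the kind described above might look roughly like the following sketch; the pairwise slowdown model and the task parameters are illustrative assumptions, not the project's algorithms.

```python
import itertools

# Illustrative co-schedule selection for an SMT processor: each soft
# real-time task has a worst-case execution time (wcet), a period
# (implicit deadline), and a resource "intensity". Co-running with a
# partner stretches execution time in proportion to the partner's
# intensity (a placeholder model). A pair is admitted only if both
# tasks still meet their deadlines; among admissible pairs, pick the
# one with the best throughput proxy.

def co_run_time(task, partner):
    return task["wcet"] * (1 + 0.3 * partner["intensity"])

def admissible(a, b):
    # Admission test: each task's stretched time must fit in its period.
    return (co_run_time(a, b) <= a["period"] and
            co_run_time(b, a) <= b["period"])

def pick_coschedule(tasks):
    best, best_tput = None, -1.0
    for a, b in itertools.combinations(tasks, 2):
        if not admissible(a, b):
            continue
        # Throughput proxy: useful work per unit of co-run time.
        tput = (a["wcet"] / co_run_time(a, b) +
                b["wcet"] / co_run_time(b, a))
        if tput > best_tput:
            best, best_tput = (a["name"], b["name"]), tput
    return best
```

Extending the score with predictability, energy, and thermal terms, as the abstract proposes, would amount to adding further penalty terms to the same selection loop.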
2006 — 2010 |
Adve, Sarita |
Lifetime Reliability Aware Microprocessors @ University of Illinois At Urbana-Champaign
As CMOS scaling continues, increasingly smaller feature sizes and increasing power densities are accelerating the onset of wear-out or aging-related hard failures in processors. Current lower-level solution strategies will likely be inadequate to address this lifetime reliability problem. This research advocates higher-level, microarchitectural solutions for processor lifetime reliability. The first component of this work is the development and validation of microarchitecture-level models, metrics, and tools that incorporate key failure mechanisms and their scaling behavior. The second component develops novel architectural solutions for the lifetime reliability problem, including dynamic reliability management and selective structural redundancy.
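As a rough illustration of dynamic reliability management, the sketch below throttles frequency whenever a simple Arrhenius-style temperature acceleration model (a standard functional form in reliability physics) projects lifetime below a target. All constants and the control policy are invented for illustration.

```python
import math

# Sketch of dynamic reliability management (DRM): project expected
# lifetime from temperature using an Arrhenius acceleration factor,
# then trade performance for reliability when the projection misses
# the design target. Constants are illustrative placeholders.

BASE_MTTF_YEARS = 30.0   # lifetime at the reference temperature
T_REF = 345.0            # reference temperature (K)
EA_OVER_K = 5000.0       # activation energy / Boltzmann constant (K)

def projected_mttf(temp_k):
    # Hotter than reference -> shorter projected life.
    accel = math.exp(EA_OVER_K * (1.0 / T_REF - 1.0 / temp_k))
    return BASE_MTTF_YEARS / accel

def drm_step(freq_ghz, temp_k, target_years=10.0):
    # One control step: throttle 5% if the lifetime projection misses
    # the target; otherwise allow a small frequency boost (capped).
    if projected_mttf(temp_k) < target_years:
        return freq_ghz * 0.95
    return min(freq_ghz * 1.02, 4.0)
```

The key property is that reliability slack at low temperatures is converted into performance, while sustained hot phases trigger throttling before the lifetime budget is consumed.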
The performance benefits from CMOS technology scaling over the last several decades have enabled the information revolution that has affected virtually every aspect of society. The problem of lifetime reliability addressed in this proposal is one of the key impediments to seeing continued benefits from CMOS scaling. The proposed work seeks to develop a fundamentally new approach to address this problem that will enable meeting the reliability goals critical for all processor manufacturers. This work is in collaboration with researchers from IBM which will provide needed industrial expertise as well as a path for technology transfer.
2007 — 2010 |
Adve, Sarita; Adve, Vikram (co-PI); Zhou, Yuanyuan |
Csr---Pdos: Online Production-Run Software Failure Diagnosis At the User Site @ University of Illinois At Urbana-Champaign
As software systems have grown in size, complexity, and cost, it has become increasingly difficult to deliver bug-free software to end users, resulting in many software failures during production runs at the user site. While much work has been conducted on software failure diagnosis, most previous work focuses on off-site diagnosis (i.e., diagnosis at the development site with the involvement of programmers) and is therefore insufficient for diagnosing production-run software failures at the user site.
To effectively address production-run failures, we propose a novel approach that automatically performs on-site software failure diagnosis at the moment a failure occurs and provides programmers a detailed diagnosis report, including the bug type, bug location, likely root cause, fault propagation chain, failure-triggering input, failure-triggering execution environment, potential temporary fixes, etc., without violating users' privacy or imposing large overhead during normal execution. To achieve this ambitious goal, the proposed research tightly integrates innovations at multiple layers: (1) low-overhead operating system and runtime support to capture the failure moment without imposing large overhead during normal execution; (2) a novel, extensible, customizable, human-like failure diagnosis protocol; (3) novel program analysis techniques specifically designed for on-site failure diagnosis; (4) leveraging existing and emerging hardware support and simple hardware extensions to reduce overhead; and (5) a library-based API to allow applications to control or customize the diagnosis process if necessary.
2008 — 2013 |
Adve, Sarita; Adve, Vikram (co-PI); Zhou, Yuanyuan (co-PI) |
Cpa-Csa-T: Low Cost and Comprehensive Hardware Reliability @ University of Illinois At Urbana-Champaign
Hardware reliability is becoming an increasing concern in the late CMOS era. Components in shipped chips will fail for many reasons, requiring mechanisms to detect, diagnose, recover from, and repair/reconfigure around these failed components so that the system can provide reliable operation. The pervasiveness of the problem across a broad market demands low-cost and general reliability solutions that can be deployed in general-purpose, commodity systems running applications with varying reliability requirements. Traditional reliability solutions involving excessive redundancy are too expensive, as are piecemeal solutions that address individual failure modes. This work proposes a full system solution that aims to provide a common framework for error detection, diagnosis, recovery, and repair/reconfiguration for a variety of hardware failure modes, with a customizable reliability vs. overhead tradeoff.
Two key high-level observations motivate the approach. First, the hardware reliability solution need only handle the device faults that propagate through higher levels of the system and become observable to software. Second, in spite of the reliability threat, fault-free operation remains the common case and must be optimized, possibly at the cost of increased overhead once a fault is detected. The proposed system therefore detects faults by watching for anomalous software behavior (symptoms of faults), using novel zero- to low-cost hardware and software monitors. After a fault is detected, it invokes an innovative, but potentially expensive, procedure for diagnosing the fault source to enable reconfiguration/repair (in the case of hard faults). For recovery, it relies on a checkpoint/replay mechanism, including pure hardware and hybrid software-assisted recovery depending on detection latency. Coordinating all of the above is a thin firmware layer that provides flexibility and customizability. A major component of the work is a much-needed formulation and validation of microarchitecture-level fault models, required to drive high-level reliability solutions.
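A minimal sketch of symptom-based detection follows, assuming an invented event-stream format: the monitor reacts only to anomalous behavior (fatal traps, hangs, wild memory accesses), leaving the fault-free common case unaffected. The event fields and thresholds are illustrative, not from the project.

```python
# Sketch of symptom-based fault detection: rather than duplicating
# execution, cheap monitors watch the running software for behavior
# that a hardware fault would typically cause, and trigger the (more
# expensive) diagnosis/recovery path only when a symptom appears.

SYMPTOMS = {"fatal_trap", "kernel_panic"}
HANG_THRESHOLD = 1_000_000  # instructions with no forward progress

def detect(events):
    """Scan an event stream; return the first symptom found, or None."""
    for ev in events:
        if ev["type"] in SYMPTOMS:
            return ev["type"]
        if ev["type"] == "no_progress" and ev["count"] >= HANG_THRESHOLD:
            return "hang"
        if ev["type"] == "mem_access" and not (0 <= ev["addr"] < ev["bound"]):
            return "out_of_range_access"
    return None  # fault-free common case: nothing beyond cheap monitoring
```

Detection latency matters here: the longer a fault runs before producing a symptom, the further back the checkpoint/replay recovery must rewind, which is the tradeoff the abstract's hybrid recovery schemes address.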
2010 — 2014 |
Adve, Sarita; Adve, Vikram (co-PI) |
Shf: Small: Denovo: Rethinking Hardware For Disciplined Parallelism @ University of Illinois At Urbana-Champaign
Designing parallel systems that are scalable, low-cost, and power efficient, and yet easily programmable, is arguably one of the biggest challenges facing the computing industry today. This proposal describes DeNovo, a hardware architecture and framework that rethinks shared memory system design from the ground up to take advantage of long term trends in disciplined parallel software. It takes the stance that, if shared memory multicore systems with hundreds of cores are to become widely used, programming languages and environments must evolve to enforce highly disciplined programming practices that greatly simplify the programmer's view of shared memory. Such languages must restrict shared memory interactions, enforcing data-race-freedom and determinism-by-default. Moreover, disciplined programming models communicate extensive information about shared memory access patterns (so the discipline can be enforced). Exploiting the parallelism discipline and the communicated information can enable far simpler and more efficient hardware design than possible today.
DeNovo proposes an extensive redesign of the memory hierarchy based on three ideas. First, the coherence protocol can be vastly simplified by taking advantage of the absence of software races to virtually eliminate races from the protocol and greatly reduce the number of hidden protocol states. Second, DeNovo uses application-level data sharing granularity (rather than software-oblivious cache lines) as the organizing principle for addressing, communication, and coherence granularities. Third, DeNovo uses more efficient, point-to-point communication (close to explicit message passing) even for shared memory programs, by minimizing indirections through the directory and exploiting information about sharing granularity for bulk data transfers. These changes will simultaneously simplify the hardware design, reduce power consumption, and improve performance. Such a solution is highly unlikely without a fundamental rethinking of the memory system design, but is required to continue to reap the benefits of Moore's law.
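The first idea, replacing directory-initiated invalidations with writer registration plus self-invalidation at phase boundaries, can be illustrated with a toy two-core model. This is a deliberately simplified sketch of the concept (registered data is also dropped at each boundary here), not the published DeNovo protocol.

```python
# Toy sketch of race-free coherence: because software phases are
# data-race-free, a writer simply "registers" itself for an address,
# and instead of the directory sending invalidation messages, each
# core self-invalidates its unwritten cached copies at a phase
# boundary. Readers in the next phase fetch from the registered owner.

class Core:
    def __init__(self, cid, registry, memory):
        self.cid, self.registry, self.memory = cid, registry, memory
        self.cache = {}       # addr -> value
        self.written = set()  # addrs this core registered this phase

    def write(self, addr, value):
        self.cache[addr] = value
        self.written.add(addr)
        self.registry[addr] = self.cid  # register ownership; no invalidations sent

    def read(self, addr, cores):
        if addr in self.cache:
            return self.cache[addr]
        owner = self.registry.get(addr)
        if owner is not None and owner != self.cid:
            self.cache[addr] = cores[owner].cache[addr]  # fetch from registered owner
        else:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def phase_boundary(self):
        # Self-invalidate everything this core did not write this phase;
        # data it registered stays valid into the next phase.
        self.cache = {a: v for a, v in self.cache.items() if a in self.written}
        self.written = set()
```

Note what is absent: no invalidation messages and no transient "pending invalidation" states, which is the protocol simplification the abstract describes.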
2013 — 2017 |
Adve, Sarita; Adve, Vikram; Rutenbar, Robin |
Shf: Medium: Programmability, Portability, Performance and Energy Efficiency For Heterogeneous Systems @ University of Illinois At Urbana-Champaign
To maximize energy efficiency, future mobile devices will include a diverse range of hardware, such as large and small general-purpose processor cores, vector units, graphics processing units (GPUs), digital signal processors (DSPs), and semi-custom and custom accelerator cores. This "heterogeneity" could power a new wave of innovation in mobile computing but is blocked by several fundamental challenges. Some of the biggest challenges are that such heterogeneous systems are highly challenging to program; that it is very difficult for software applications that use the diverse hardware to be portable across different mobile devices; that the memory systems in these devices are inflexible and inefficient; and that the semi-custom and custom accelerators are poorly integrated with the rest of the memory system and the programming environments.
A key insight behind this project is that a carefully designed hardware abstraction layer --- a "Virtual Instruction Set" --- that abstracts away the differences in parallelism and memory subsystems across the different compute units can provide a framework in which all of the above interrelated problems can be solved extremely effectively. The project is developing a framework called Virtual Instruction Set Computing that uses this approach to address the above challenges. The framework uses just two or three models of parallelism and a uniform, rich model of communication to capture the full spectrum of heterogeneous hardware. The hardware memory architecture supports specialized memory sub-systems and novel memory optimizations customized for those sub-systems, while compilers partition the memory used by applications to make use of these partitions; together, these specialization techniques will provide an order of magnitude improvement in memory efficiency. Semi-custom accelerators for the key domain of Machine Learning are driving new programming and memory system design techniques to integrate and use semi-custom accelerators in such systems. The overall research builds on the widely used LLVM virtual instruction set and compiler infrastructure (previously developed by members of this research team), which are already widely used in industry, enhancing the potential for technology transfer from this work. If this project is successful, it can enable far more powerful mobile phones, tablets, and other such devices, and far more advanced software applications that can make full use of the rich capabilities of these devices.
2013 — 2017 |
Adve, Sarita |
Shf: Small: Software-Driven Hardware Resiliency @ University of Illinois At Urbana-Champaign
Moore's law continues to provide abundant devices on chip, but they are increasingly subject to failures from many sources. The hardware reliability problem is expected to be pervasive, affecting markets from embedded systems to high performance computing. There is an urgent need for research to address this problem with extremely low overheads in area, performance, and power (precluding traditional redundancy based solutions). Recently, researchers have proposed a software-driven hardware reliability solution that handles only the device faults that become visible to software and cause anomalous software behavior. This line of work has been quite successful in detecting most faults at extremely low cost. Unfortunately, some hardware faults escape detection by the proposed anomaly monitors, resulting in silent data corruption or SDC. These remaining few SDCs have been the Achilles heel of the software-driven hardware resiliency approach and a hindrance to widespread adoption. The proposed research seeks to overcome this obstacle.
The research includes methodological innovations that can determine application sites vulnerable to SDCs within a practical workflow, and a resiliency solution that uses this information to develop low-cost detection and recovery techniques to mitigate the impact of SDCs. It builds on a recent resiliency analysis tool developed by the Principal Investigator's group called Relyzer. The key insight is that instead of trying to determine the outcome of each fault site, Relyzer can seek to determine which application sites will produce equivalent outcomes. This enables pruning a large number of sites and focusing fault injections on just one site per equivalence class, resulting in a significant reduction in resiliency evaluation time. In addition to providing a list of SDC-vulnerable instructions, Relyzer also provides a wealth of information on why they are vulnerable. This motivates the use of inexpensive application-specific detectors that exploit this information. However, Relyzer has several limitations in speed, accuracy, and generality, precluding its use in a practical workflow. This research will first develop new techniques to address these limitations and implement them in a tool. Second, this research will explore systematic techniques to develop practical resiliency solutions that exploit the wealth of fault-propagation information exposed by Relyzer. It will develop systematic low-cost detection and recovery techniques, with quantifiable tradeoffs between resiliency and performance overheads, that can be incorporated in a practical workflow for real applications. If successful, this work will address a key challenge in meeting the expectations of Moore's law performance for a wide variety of societal advances. Besides the research benefits, it will provide a concrete tool for practical full-application resiliency analysis and will also train graduate students.
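Relyzer's pruning idea can be sketched in a few lines: group fault sites by an equivalence-class signature and inject a fault into only one representative per class. The signature used below (static PC plus a control-flow context tag) is a simplified stand-in for Relyzer's actual analyses.

```python
from collections import defaultdict

# Sketch of equivalence-class fault-site pruning: instead of injecting
# at every dynamic fault site, partition the sites into classes expected
# to produce equivalent outcomes and evaluate one representative each.

def prune(fault_sites):
    classes = defaultdict(list)
    for site in fault_sites:
        key = (site["pc"], site["cfg_path"])  # equivalence-class signature
        classes[key].append(site)
    # One injection per class; its outcome generalizes to the whole class.
    representatives = [sites[0] for sites in classes.values()]
    return representatives, len(fault_sites) / len(representatives)
```

The ratio returned alongside the representatives is the pruning factor: the larger the classes, the fewer injections are needed for the same coverage, which is what makes full-application resiliency analysis tractable.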
2016 — 2019 |
Adve, Sarita |
Shf: Small: Hardware-Software Co-Designed Coherence: a Complete Coherence Solution For Performance-, Energy-, and Complexity-Efficiency @ University of Illinois At Urbana-Champaign
As the benefits from transistor scaling slow down, future performance increases in computing systems will increasingly rely on architectural advances. Today's processors use parallelism and increasing amounts of specialization to provide this performance growth. An efficient memory hierarchy is key to achieving the full potential of both of these techniques. The coherence protocol and memory consistency model are at the heart of the complexity-, performance-, and energy-efficiency of the memory hierarchy. Unfortunately, across a variety of systems, coherence protocols and consistency models continue to struggle to obtain an appropriate balance between complexity, performance, and energy consumption. Recently, there has been work on hybrid hardware-software co-designed protocols, exemplified by the DeNovo protocol, which takes a different approach, combining the best of pure hardware and pure software protocols. The key insight is that if software is disciplined, then it is possible to design more efficient hardware. Multiple versions of the DeNovo system have successively relaxed the software restrictions. Introducing this new technology in classrooms to both graduate and undergraduate students will better prepare them for future memory-system trends and challenges. Disseminating the results of this research via publications, seminars, tutorials, etc. will bring new technology awareness to the community and create more synergy between academia and industry. Prior work has established the potential for DeNovo as a general-purpose system with significant advantages over the state of the art. However, this work of necessity has been limited to simple workloads. Consideration of DeNovo as a viable system for widespread industrial adoption requires demonstrating an integrated system that can run complex workloads (e.g., operating systems) and legacy binaries. This project addresses the remaining research issues to achieve this goal. 
Although this work is driven by considerations for hybrid hardware-software coherence protocols, the intellectual contributions extend beyond those protocols as well; e.g., integrated support for efficient, coherent data accesses using a variety of disciplines ranging from completely unstructured to highly structured, statically analyzable accesses; a systematic exploration of relaxed atomics, a widely accepted difficulty in current memory consistency models; and understanding concurrent data structures and system code in a coherence neutral way.
|
1 |
2021 — 2024 |
Adve, Sarita |
Ccri: New: An Open End-to-End Extended Reality System Infrastructure: Enabling Domain-Specific Edge Systems Research @ University of Illinois At Urbana-Champaign
Extended reality (XR), which encompasses virtual, augmented, and mixed reality (VR, AR, MR) and is also referred to as immersive computing, is expected to pervade most human endeavors: it will affect the way we teach, conduct science, practice medicine, entertain ourselves, train professionals, interact socially, and more. Many have said it will be the next interface for most of computing. Although XR systems exist today, they are far from providing a tetherless experience that approaches the perceptual abilities of humans. There is a gap of several orders of magnitude between what is needed and what is achievable in performance, power, and usability, requiring deep innovations from systems researchers. At the same time, with the end of Dennard scaling and Moore's law, application-driven specialization, or domain-specific computing, has emerged as a key architectural technique for meeting the requirements of emerging applications. Computer architects have responded with an explosion of research on highly efficient accelerators targeting machine learning and other domains. To truly achieve the promise of efficient domain-specific computing in general, and for the XR domain in particular, systems researchers must broaden their portfolio beyond specialization for individual accelerators. Instead, researchers must develop the science of specializing for a domain-specific system, which may comprise multiple sub-domains requiring multiple parallel, heterogeneous accelerators that interact with each other to collectively meet end-user demands. A key obstacle to domain-specific systems research for XR is that (until our work) there have been no open-source benchmarks or testbeds covering the entire XR workflow to drive such research.
This project develops an open-source end-to-end infrastructure for XR devices. It builds on an initial research prototype, ILLIXR (Illinois Extended Reality Testbed). The system is designed to contain state-of-the-art components for a complete XR workflow, an extensible runtime that orchestrates the scheduling of these components, and extensive telemetry support to measure performance, power, and end-to-end quality-of-experience metrics. The system is extensible and supports a variety of operating systems (e.g., Linux, Android), heterogeneous platforms (e.g., NVIDIA Jetson, Qualcomm Snapdragon), sensors (e.g., cameras, IMUs), and various XR applications. It enables new research opportunities in all parts of the computing stack, tackling end-to-end XR system innovations that were previously not possible. Systems researchers benefit from using the infrastructure to drive new research in post-Moore domain-specific systems, in the areas of computer architecture, programming languages, compilers, runtime systems, and security and privacy. The end-to-end infrastructure drives new techniques in co-designed systems that are optimized for end-to-end user experiences. On the application side, XR encompasses multiple sub-domains such as computer vision, robotics, graphics, signal processing, and machine learning. Algorithms researchers in these areas can prototype and test new algorithms optimized for end-to-end system efficiency without having to implement the rest of the stack, and XR researchers in particular will be able to design systems optimized for the end-to-end user experience. This work addresses two of the most important problems in computing: dealing with the end of Moore's law and designing systems that achieve the potential of immersive computing. Both have the potential for tremendous impact on society at large.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2022 — 2027 |
Adve, Sarita; Adve, Vikram (co-PI); Misailovic, Sasa; Fletcher, Christopher; Mittal, Radhika |
Collaborative Research: Pposs: Large: Scalable Specialization in Distributed Edge-Cloud Systems – the Extended Reality Case @ University of Illinois At Urbana-Champaign
This project will develop design methodologies for a scalable, domain-specific, heterogeneous, distributed edge/cloud system with stringent constraints on latency, energy, thermal power, computational requirements, and size. The work will use a distributed multiparty augmented/virtual/mixed reality (collectively, extended reality or XR) experience as a target parallel and distributed application with challenging quality-of-experience goals, scalability requirements, design constraints, and diverse, fast-evolving algorithmic components. There are orders-of-magnitude gaps between desirable design goals and today's state of the art, making this a long-lived, multidisciplinary research challenge. The project brings together work in computer architecture, programming languages and compilers, systems, security and privacy, and accuracy and correctness. It will result in innovations that cut across the system stack to improve the scalability of quality of experience with the number of users, devices, and device resources; of XR device performance with hardware parallelism; and of design methodologies with system complexity.

The project will disseminate its research results through substantial open-source software artifacts, building on the team's previously released ILLIXR system (the first open-source end-to-end single-device XR system), in addition to publications in top venues and talks in academic and industry venues. High-performance, energy-efficient distributed applications such as multiparty XR (and numerous others) have the potential for transformative impact on a vast number of societal activities, including medicine, education, entertainment, manufacturing, and science. The team will work in close collaboration with industry partners to create direct technology-transfer avenues.
The PIs will continue their strong record of involving undergraduates, women, and minorities in research; their leadership in establishing the CARES movement; and other efforts to broaden participation in computing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |