1985 — 1991
Horowitz, Mark
Presidential Young Investigator Award: Computer Tools for VLSI
This research program explores the use of computer tools for VLSI. One problem being studied is expediting the synthesis-verification cycle inherent in computer-aided design. In all large designs, a substantial portion of the effort goes into implementing and verifying design changes. Although tools are available to aid in checking the design, these tools start the verification from scratch each time they are run. By designing a verification tool that accepts changes to the design interactively, it is possible to reduce the time and cost required to design integrated circuits. In addition to working on methods to improve tool performance, the research is looking at tools to improve chip performance. Drawing on background in both circuit design and CAD tools, the research investigates methods to incorporate design knowledge first into analysis tools, with the ultimate goal of carrying these ideas into synthesis tools.
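As an illustration of the incremental approach described above, here is a minimal sketch (hypothetical Python; `check`, the netlist format, and the dependency rule are assumptions for illustration, not the project's actual tool): cached per-module results are invalidated only for the edited module and the modules that instantiate it, so re-verification touches only what changed.

```python
# Hypothetical sketch: cache per-module verification results and
# invalidate only what a design change touches.
class IncrementalVerifier:
    def __init__(self, netlist, check):
        self.netlist = netlist   # module name -> {"instances": [...], ...}
        self.check = check       # function run on one module, returns a report
        self.results = {}        # module name -> cached verification result

    def parents_of(self, module):
        """Modules whose bodies instantiate `module` (assumed dependency rule)."""
        return [m for m, body in self.netlist.items()
                if module in body.get("instances", [])]

    def edit(self, module, new_body):
        """Apply a design change and invalidate only the stale results."""
        self.netlist[module] = new_body
        self.results.pop(module, None)
        for parent in self.parents_of(module):
            self.results.pop(parent, None)

    def verify(self):
        """Re-run checks only where no cached result survives."""
        for m, body in self.netlist.items():
            if m not in self.results:
                self.results[m] = self.check(m, body)
        return self.results
```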
2004 — 2007
Horowitz, Mark; Olukotun, Oyekunle; Kozyrakis, Christoforos (co-PI)
Extending the Limits of Large-Scale Shared Memory Multiprocessors
The objective of this research is to substantially improve the productivity of programmers writing applications for petaflop-scale systems by using programmer-defined lightweight transactions as the single abstraction for expressing parallelism, delineating communication, reasoning about memory consistency, providing failure recovery, and allowing performance optimization. Using transactions as the central abstraction for designing and programming parallel systems leads to a shared-memory programming and memory-coherence model called Transactional Coherence and Consistency (TCC). Transactions simplify parallel programming by providing a way of writing correct shared-memory programs without threads, locks, and semaphores. TCC systems provide high-performance communication and synchronization, with hardware mechanisms that keep memory coherent and consistent based on programmer-defined transactions.

To achieve the research objective, this research program will focus on five activities. First, the researchers will develop new abstractions that use transactions to provide a shared-memory programming model that makes it much easier to analyze and optimize application performance. Second, the researchers will develop performance monitoring systems that make use of transactions to detect performance bottlenecks and to provide intuitive feedback to programmers. Third, the researchers will use the transaction-based programming model to implement compiler-based static and dynamic feedback-directed optimizations that automatically detect and eliminate performance bottlenecks, and extend the scalability of transactional coherence to 10^5 processors. Fourth, the researchers will use transactions to optimize the performance of parallel storage I/O. Finally, the researchers will develop simulation and emulation technology that enables experimentation with petaflop-scale systems supporting lightweight transactions before such systems are available.

Broader Impacts: The broad impact of this research is to use transaction-based parallel programming to educate and enable a new class of parallel software developers who can implement parallel software with the same facility with which sequential software is written today. Enabling parallel software development will be critical to advancing computing performance, from desktop applications to large-scale scientific and commercial applications. While parallel processing has been essential for large-scale machines for some time, recent announcements by Intel, AMD, and IBM demonstrate that it will soon be critical for desktop applications as well. To educate students, other researchers, and industry about the benefits of transaction-based parallel programming, the researchers will incorporate transactional programming concepts into the parallel programming curriculum and make transaction-based applications available to the wider scientific community. They expect that releasing a suite of optimized transaction-based applications along with simulation technology will be instrumental in encouraging other researchers to experiment with and explore the benefits of transactions. To further promote the use of transaction-based parallel programming, they will organize a tutorial or workshop at a major scientific computing conference covering the principles of and experience with programming with transactions.
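As a rough illustration of the programming model only (a hypothetical Python sketch of optimistic transactions; TCC itself performs conflict detection and commit in the coherence hardware, not in software): each transaction buffers its writes, validates its reads at commit, and re-executes on conflict.

```python
import threading

class Conflict(Exception):
    pass

class TransactionalStore:
    """Toy optimistic transactions: buffer writes, validate reads at commit."""
    def __init__(self):
        self.data = {}               # key -> (version, value)
        self.lock = threading.Lock()

    def atomic(self, fn):
        """Run fn(txn) as a transaction, re-executing it until it commits."""
        while True:
            txn = _Txn(self)
            result = fn(txn)
            try:
                txn.commit()
                return result
            except Conflict:
                continue             # a conflicting commit won; retry

class _Txn:
    def __init__(self, store):
        self.store = store
        self.reads = {}              # key -> version observed
        self.writes = {}             # key -> buffered new value

    def read(self, key):
        if key in self.writes:       # read-your-own-writes
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.reads[key] = version
        return value

    def write(self, key, value):
        self.writes[key] = value     # buffered until commit

    def commit(self):
        with self.store.lock:
            for key, version in self.reads.items():
                if self.store.data.get(key, (0, None))[0] != version:
                    raise Conflict() # another transaction committed under us
            for key, value in self.writes.items():
                old = self.store.data.get(key, (0, None))[0]
                self.store.data[key] = (old + 1, value)

store = TransactionalStore()
store.atomic(lambda t: t.write("x", (t.read("x") or 0) + 1))  # increment without locks
```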
2006 — 2010
Levoy, Marc; Horowitz, Mark
Active Computational Imaging Using a Dense Array of Projectors and Cameras
Although arrays of cameras have existed in various forms for more than a century, arrays of projectors have been limited by available technology to small numbers of devices. As the number of projectors that can feasibly be assembled into an array increases, one can begin to treat the illumination produced by the array as a light field - radiance as a function of ray position and direction in free space. The light field has been studied abstractly by numerous researchers; however, physical systems for generating light fields have been severely limited in resolution by the cost and size of projectors. The investigators are building an array of 128 miniature, SVGA-resolution video projectors, and will interleave these projectors with an existing array of 128 video cameras, thereby producing a device that can both record and generate light fields - a fundamentally new capability. These new techniques will yield new methodologies for image-based modeling, inspection, and motion capture, and will have applications in areas such as entertainment, archaeology, and modeling and simulation for training and mission rehearsal.
Using the existing camera array, the investigators have studied several types of high-performance imaging, including high-speed videography, synthetic aperture photography (sketched in code after the list below), and tiled panoramic imaging. Using the new hybrid array, they will explore two additional problems:
(1) Measuring the 3D shape of an object from every direction at once, meaning capturing all sides of the object in parallel. No existing technology can do this, at least not at optical wavelengths. The goals are near-instantaneous shape capture of a moving object and full-shape motion capture of moving objects.
(2) Illuminating physical objects non-photorealistically, to enhance their appearance as seen by a human observer. Examples include recoloring, cloaking, or illumination that is everywhere-perpendicular or everywhere-grazing to the surface of the object.
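For intuition about one of the existing capabilities mentioned above, synthetic aperture photography, here is a minimal shift-and-add sketch (hypothetical Python/NumPy; the pinhole-parallax model and pixel-unit camera offsets are simplifying assumptions): each camera's image is shifted in proportion to its baseline divided by the chosen focal depth, then the images are averaged, so objects at that depth align while occluders blur away.

```python
import numpy as np

def synthetic_aperture(images, offsets_px, depth):
    """Shift-and-add refocusing for a planar camera array.

    images:     list of HxW arrays, one per camera
    offsets_px: (dx, dy) camera baselines in pixel-equivalent units
    depth:      chosen focal depth; parallax scales as baseline / depth
    """
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dx, dy) in zip(images, offsets_px):
        sx = int(round(dx / depth))   # pinhole model: shift ~ baseline / depth
        sy = int(round(dy / depth))
        acc += np.roll(np.roll(img, sy, axis=0), sx, axis=1)  # wraparound ignored
    return acc / len(images)
```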
2010 — 2014
Levoy, Marc; Horowitz, Mark
III: Medium: Collaborative Research: Frankencamera - An Open-Source Camera for Research and Teaching in Computational Photography
Computational photography refers broadly to sensing strategies and algorithmic techniques that enhance or extend the capabilities of digital photography. Representative techniques include high dynamic range (HDR) imaging, flash/no-flash imaging, panoramic stitching, and refocusable photography. Although interest in computational photography has steadily increased among graphics and vision researchers, progress has been hampered by the lack of a portable, programmable camera platform with enough image quality and computing power to be used outside the laboratory, i.e., for everyday photography. Similarly, courses in computational photography are offered at dozens of universities nationwide. However, none of these courses provides students with a camera on which they can implement the algorithms currently being published in the literature.
To address these two problems, we are building an open-source camera platform (called Frankencamera) that is portable, self-powered, connected to the Internet, and accommodates SLR-quality lenses and sensors. We also describe a software architecture based on Linux, and an API with bindings for C++, that permits control and synchronization of camera functions at the microsecond time scale. Our API includes pre-capture functions like metering and focusing, an image post-processing pipeline, a user interface toolkit for the viewfinder, and support for current and future I/O devices. Our plan is to distribute this platform at minimal cost to researchers and university instructors nationwide, using the computational photography courses they already teach as a natural distribution vehicle. Instructors will apply to us to be part of this outreach program, and a standing committee will evaluate these applications. Our long-term goal is to spur creation of a community of photographer-programmers who write plug-ins and apps for cameras.
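To give a flavor of the per-frame control such an API provides (the real Frankencamera API is C++; this Python sketch is purely illustrative, and `Shot`, `sensor.queue`, and `sensor.get_frame` are invented names, not the actual bindings): an HDR burst where every queued frame carries its own exposure, so the pipeline never has to stall and re-settle between settings.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    """One requested frame with its own capture parameters (illustrative)."""
    exposure_us: int      # exposure time, microseconds
    gain: float           # analog gain, 1.0 = unity
    flash: bool = False

def hdr_burst(sensor, base_exposure_us=10_000, stops=(-2, 0, 2)):
    """Queue an exposure-bracketed burst; per-frame parameters mean the
    pipeline streams continuously instead of stopping between settings."""
    shots = [Shot(exposure_us=int(base_exposure_us * 2.0 ** s), gain=1.0)
             for s in stops]
    for shot in shots:
        sensor.queue(shot)                      # assumed non-blocking enqueue
    return [sensor.get_frame() for _ in shots]  # assumed frames arrive in order
```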
2014 — 2017
Horowitz, Mark; Kozyrakis, Christoforos
SHF: Medium: Energy Efficient Memory Systems
The goal of this project is to develop energy-efficient memory systems. Improving energy efficiency is a defining challenge for information technology and the prerequisite to increasing the capabilities of all computing systems, from smartphones to warehouse-scale data centers. Most research in this area has focused on energy-efficient computing, using specialized cores or near-threshold-voltage circuits. To achieve end-to-end energy efficiency, the on-chip and off-chip memory system that feeds cores with data and instructions must also be optimized. The memory system includes large, leaky structures and communication operations that introduce energy overheads orders of magnitude higher than the overheads of compute operations.
This project proposes a holistic approach to energy-efficient memory systems that rethinks memory-system architecture, dynamic runtime management, and circuit design. At the architecture level, it will optimize for emerging, data-centric applications with limited temporal locality by placing computation close to the memory structures so that energy-intensive communication is minimized. It will also explore architectural support for specialization in the memory system, such as engines for specialized prefetch, data transformations, and data distribution. At the runtime management level, it will investigate scalable scheduling algorithms that minimize energy usage in the memory system by maximizing temporal and spatial locality and the use of on-chip and memory-side accelerators. At the circuit design level, it will aggressively optimize the energy consumption of the internal structures of memory chips and memory stacks for the access patterns that dominate after efficient runtime management. Finally, this project will create tools for joint exploration of the architecture-management-scheduling space in order to identify Pareto-optimal memory systems for different levels of performance.
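A back-of-the-envelope calculation shows why the memory system dominates (Python; the picojoule figures are rough, commonly cited order-of-magnitude numbers for ~45 nm technology, not results from this project):

```python
# Rough per-operation energy figures (assumptions, ~45 nm era).
PJ_FP32_OP  = 4      # one 32-bit floating-point operation
PJ_SRAM_32B = 10     # 32-bit read from a small on-chip SRAM
PJ_DRAM_32B = 640    # 32-bit read from off-chip DRAM

def energy_pj(flops, sram_reads, dram_reads):
    """Total energy in picojoules for a simple op/access mix."""
    return (flops * PJ_FP32_OP + sram_reads * PJ_SRAM_32B
            + dram_reads * PJ_DRAM_32B)

# A streaming kernel doing one flop per off-chip word: traffic dominates.
compute = energy_pj(flops=1_000_000, sram_reads=0, dram_reads=0)
traffic = energy_pj(flops=0, sram_reads=0, dram_reads=1_000_000)
print(f"compute: {compute/1e6:.1f} uJ, DRAM traffic: {traffic/1e6:.1f} uJ, "
      f"ratio: {traffic/compute:.0f}x")   # roughly two orders of magnitude
```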
2015 — 2018
Horowitz, Mark
CCSS: Emulating Mixed-Signal VLSI Systems
Proposal: 1509126
PI: Mark Horowitz
Institution: Stanford University
Title: Emulating Mixed-Signal VLSI Systems
Project Goals: To validate VLSI chips that combine analog and digital circuits, we create functional models of the analog components that run on digital logic emulators.
Nontechnical: Technology scaling has created an environment of ubiquitous computing: we now have singing cards and computer-controlled cars, and soon all of these devices will be connected to the net. The Internet of Things is one of the most promising areas in computing research today. Yet there is a fundamental problem that must be addressed if we are going to fully realize the promise of a fully connected world: how to validate the complex mixed-signal integrated circuits that are the foundation of this revolution. Since these chips tightly couple analog and digital blocks, we need to validate the entire analog/digital system together. Attacking the mixed-signal validation issue is a key challenge that must be addressed to enable this new future, and it is the topic of this research.
Technical: Recent work has started to address this issue by creating models of the analog blocks that can be run in a digital validation environment. This research extends that prior work in two ways. First, it will address some of the limitations of the current analog modeling work, both to make it more general (handling more types of situations that occur in real circuits) and to codify the procedure for creating these models. This first extension will make the current modeling technique much easier to use.

Unfortunately, chip systems are so complex today that software-based simulation is often too slow for system validation; as a result, most companies rely on hardware emulation platforms for validation. Thus our second goal is to further extend our analog functional modeling by creating a system that can map these functional models onto a platform currently used for hardware emulation. These emulation platforms consist of either field-programmable gate arrays (FPGAs) or custom-built emulation chips. We will explore two approaches to analog functional mapping - directly mapping all analog blocks to FPGA logic, and building an analog model interpreter - and choose the more cost-effective approach. Since the direct-mapped approach can maintain a fixed ratio between the rates of the clocks used for the analog and digital blocks, it might make sense to build simple oversampled analog models; here the challenge is in mapping the required computation to the FPGA. Most analog functions can be mapped to digital blocks implemented with lookup tables (LUTs) and filter/DSP blocks. In the interpreter approach, we will explore building an analog model evaluation engine that emulates a model when its inputs change. If the analog signals change more slowly than the main FPGA clock, we might have many cycles for each model evaluation, making this the more efficient approach. In addition, it is possible that we can build many fewer model evaluation engines than there are analog functional models.
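As a flavor of what an analog functional model looks like (a hypothetical Python sketch; a real flow would map the arithmetic onto FPGA LUT and DSP blocks): an RC low-pass filter that is evaluated only when its input changes, in the spirit of the event-driven interpreter approach.

```python
import math

class RCLowPassModel:
    """Event-driven functional model of an analog RC low-pass filter.

    Instead of integrating continuously, the model is evaluated only on
    input events: between events the output follows the closed-form
    exponential toward the held input value.
    """
    def __init__(self, tau_s):
        self.tau = tau_s     # RC time constant, seconds
        self.t = 0.0         # time of last evaluation
        self.vin = 0.0       # currently held input value
        self.vout = 0.0      # output state at time self.t

    def input_event(self, t, vin):
        """Advance the state to time t, then latch the new input."""
        self.vout = self.output(t)
        self.t, self.vin = t, vin

    def output(self, t):
        """Closed-form response: vout decays toward vin with constant tau."""
        dt = t - self.t
        return self.vin + (self.vout - self.vin) * math.exp(-dt / self.tau)

# Example: a 1 us time-constant filter driven by a unit step at t = 0.
m = RCLowPassModel(tau_s=1e-6)
m.input_event(0.0, 1.0)
print(round(m.output(1e-6), 3))   # ~0.632 of the step after one tau
```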
2015 — 2018
Boneh, Dan (co-PI); Engler, Dawson (co-PI); Winstein, Keith (co-PI); Horowitz, Mark; Levis, Philip
Synergy: Collaborative: CPS-Security: End-to-End Security for the Internet of Things
Computation is everywhere. Greeting cards have processors that play songs. Fireworks have processors for precisely timing their detonation. Computers are in engines, monitoring combustion and performance. They are in our homes, hospitals, offices, ovens, planes, trains, and automobiles. These computers, when networked, will form the Internet of Things (IoT). The resulting applications and services have the potential to be even more transformative than the World Wide Web. The security implications are enormous. Internet threats today steal credit cards. Internet threats tomorrow will disable home security systems, flood fields, and disrupt hospitals. The root problem is that these applications consist of software on tiny low-power devices and cloud servers, involve difficult networking, and collect sensitive data that deserves strong cryptography, but they are usually written by developers who have expertise in none of these areas. The goal of the research is to make it possible for two developers to build a complete, secure Internet of Things application in three months.
The research focuses on four important principles. The first is "distributed model-view-controller." A developer writes an application as a distributed pipeline of model-view-controller systems. A model specifies what data the application generates and stores, while a new abstraction called a transform specifies how data moves from one model to another. The second is "embedded-gateway-cloud." A common architecture dominates Internet of Things applications: embedded devices communicate with a gateway over low-power wireless, and the gateway processes data and communicates with cloud systems in the broader Internet. Focusing distributed model-view-controller on this dominant architecture constrains the problem sufficiently to make problems such as system security tractable. The third is "end-to-end security." Data emerges encrypted from embedded devices and can only be decrypted by end-user applications. Servers can compute on encrypted data, and many parties can collaboratively compute results without learning the inputs. Analysis of the data processing pipeline allows the system and runtime to assert and verify security properties of the whole application. The final principle is "software-defined hardware." Because designing new embedded device hardware is time consuming, developers rely on general, overkill solutions and ignore the resulting security implications. The data processing pipeline can be compiled into a prototype hardware design and supporting software, as well as test cases, diagnostics, and a debugging methodology for a developer to bring up the new device. These principles are grounded in Ravel, a software framework that the team collaborates on, jointly contributes to, and integrates into their courses and curricula on cyber-physical systems.
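A toy sketch of the pipeline idea (hypothetical Python; this is not Ravel's actual API, and the shared-key scheme is a stand-in for the richer cryptography described above): the device's model emits ciphertext, the gateway transform routes it without being able to read it, and only the end-user view decrypts.

```python
# Toy embedded-gateway-cloud pipeline with end-to-end encryption:
# the gateway (and any server behind it) only ever sees ciphertext.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # shared by device and end-user app only
device_box = Fernet(key)
app_box = Fernet(key)

def device_model(reading_c: float) -> bytes:
    """Embedded device: generate and encrypt a sensor reading."""
    return device_box.encrypt(f"{reading_c:.2f}".encode())

def gateway_transform(ciphertext: bytes) -> bytes:
    """Gateway: batch/route without decrypting (it has no key)."""
    return ciphertext

def app_view(ciphertext: bytes) -> float:
    """End-user application: the only other point where data is readable."""
    return float(app_box.decrypt(ciphertext).decode())

print(app_view(gateway_transform(device_model(21.5))))   # -> 21.5
```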
2016 — 2019
Horowitz, Mark; Poulson, Jack (co-PI)
SHF: Medium: PRISM: Platform for Rapid Investigation of Efficient Scientific-Computing & Machine-Learning
Today's systems demand acceleration of processing and learning over massive datasets. Unfortunately, because of poor energy scaling and power limits, the performance and power improvements that technology scaling and instruction-level parallelism once delivered in general-purpose processors have ended. It is well known that full-custom, application-specific hardware accelerators can provide orders-of-magnitude improvements in energy/op for a variety of application domains. Therefore, there is special interest in systems that can optimize and accelerate the building blocks of machine learning and data science routines. Many of these building blocks share the same characteristics as the building blocks of high-performance computing kernels operating on matrices.
Such application-specific solutions rely on joint optimization of the algorithms and the hardware, but cost hundreds of millions of dollars. PRISM (Platform for Rapid Investigation of efficient Scientific-computing and Machine-learning accelerators) is proposed to amortize these costs. PRISM enables application designers to get rapid feedback about both the available parallelism and locality of their algorithm, and the efficiency of the resulting application/hardware design. The PRISM platform consists of two coupled tools that incorporate design knowledge at both the hardware and algorithm levels. This knowledge enables the tools to give application designers the ability to quickly evaluate the performance of their applications on proposed or existing hardware, without the application designer needing to be an expert at hardware or algorithms. The platform will leverage tools created in the team's prior research.
Initially, these tools will be used to create an efficient solution for each application, followed by a comparison of the resulting hardware designs. The possibility of creating platforms that span multiple classes of algorithms can then be explored. Finally, these new architectures will be compared to existing heterogeneous architectures with GPUs and FPGAs, to understand what modifications those architectures need in order to achieve higher levels of efficiency when supporting these classes of algorithms. The work on key applications will lead to better insight into the computation and communication intrinsic to these workloads, and will provide algorithms for these applications that are effective on both conventional and new architectures.
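One concrete form that rapid parallelism/locality feedback can take is a roofline-style bound computed from an algorithm's arithmetic intensity (a minimal Python sketch; the peak-compute and bandwidth figures are illustrative placeholders, not PRISM outputs).

```python
def roofline_gflops(flops, bytes_moved, peak_gflops, peak_gbs):
    """Upper bound on performance from arithmetic intensity (flops/byte):
    either the machine's compute peak or bandwidth * intensity binds."""
    intensity = flops / bytes_moved
    return min(peak_gflops, intensity * peak_gbs)

# Dense n x n matrix multiply, double precision, assuming each operand
# matrix is streamed from memory exactly once (a crude locality assumption).
n = 4096
flops = 2 * n**3                     # one multiply + one add per inner step
bytes_moved = 3 * n * n * 8          # A, B read and C written once, 8 B/word
bound = roofline_gflops(flops, bytes_moved,
                        peak_gflops=1000,   # placeholder accelerator peak
                        peak_gbs=100)       # placeholder memory bandwidth
print(f"intensity = {flops / bytes_moved:.0f} flops/byte, "
      f"bound = {bound:.0f} GFLOP/s")       # compute-bound at this intensity
```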