2007 — 2015
Wang, Zhenlin
Career: Modeling Data Locality For Next Generation Systems @ Michigan Technological University
Although the introduction of multi-core systems has increased overall processor speed without significantly increasing CPU clock rates, a substantial speed disparity remains between the CPU core and main memory. Multi-level caches have long been used to bridge this gap. Conventional cache design favors applications with good locality. The community's understanding of locality, however, is more qualitative than quantitative. A quantitative understanding of locality is essential to exploit the memory hierarchy and achieve maximal performance. The new generation of multi-core systems adds the challenge of quantifying data locality for multi-threaded programs.
This research models data locality as a function of three parameters: data size, path history, and thread count, relying on close cooperation among the compiler, the profiler, and hardware just-in-time monitoring. The compiler provides a global view of the program. The profiler, using traces, has a view of the run-time behavior of a program, but this view is based on only a limited number of training inputs. Although the hardware's view is run specific, its prediction, often depending on hardware buffers, is not always effective due to buffer size limitations. The cooperative model being developed combines the advantages of static analysis and run-time sampling and profiling, providing an accurate view of program locality for both single-threaded and multi-threaded programs. Given this model, the project explores memory system performance, including managing data movement in conventional multi-level caches as well as non-uniform cache architecture (NUCA) caches, reducing the memory traffic of a state-of-the-art hardware-only region prefetcher, and improving the spatial locality of Java programs.
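A standard quantitative measure of locality is reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element. As a minimal illustration (a toy sketch, not the project's cooperative model), it can be computed directly from an access trace:

```python
def reuse_distances(trace):
    """Return the reuse distance of each access, or None for first accesses."""
    last_pos = {}          # element -> index of its previous access
    distances = []
    for i, x in enumerate(trace):
        if x in last_pos:
            # count distinct elements accessed strictly between the two uses
            between = set(trace[last_pos[x] + 1 : i])
            distances.append(len(between))
        else:
            distances.append(None)  # cold (first) access
        last_pos[x] = i
    return distances

trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(trace))  # [None, None, None, 2, 2, 0]
```

Small distances indicate strong temporal locality; a histogram of these distances characterizes how an application's working set scales with data size.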
2008 — 2010
Carr, Steven; Wang, Zhenlin
Cpa-Cpl: Feedback-Directed Resource Management in Virtual Private Machines @ Michigan Technological University
In modern computers with multiple CPUs per chip, multiple levels of computer memory are shared among the CPUs. This sharing may cause varying response times for computer applications, potentially affecting their usability. For instance, a video streaming application that has unpredictable response times due to interference from other applications may be essentially unusable. Thus, the computer system must attempt to allocate shared memory resources equitably to increase response consistency. Virtual Private Machines (VPMs), in which shared computer resources are partitioned to give each program the illusion of running in isolation on its own physical CPU, are one method of providing response consistency. This research develops software modeling techniques to predict the memory requirements of applications running on modern computers and uses those predictions to guide memory resource allocation, providing consistent, predictable performance.
This research focuses on developing reuse-distance-based memory locality analysis for multi-threaded applications to predict their resource requirements. The new memory locality models will be applied to manage the cache and bandwidth requirements of multi-threaded applications run on a VPM within a multi-programmed environment. Specifically, this research will examine reuse distance for Partitioned Global Address Space (PGAS) applications written in Unified Parallel C (UPC). The research will investigate the effect of the number of threads used per application on per-thread data size and the effect of thread interaction on the reuse distance of shared data. The result will be novel solutions to memory locality analysis for multi-threaded applications that allow the compiler to predict and specify the resources needed to give multi-threaded applications targeted performance.
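The thread-interaction effect mentioned above can be illustrated with a hypothetical toy example (not the project's analysis): when another thread's accesses interleave between one thread's reuses of shared data, the reuse distance of that shared data grows, even though neither thread's own access pattern changed.

```python
def reuse_distance_of_last_access(trace, target):
    """Reuse distance (distinct elements in between) of the final access
    to `target` in `trace`."""
    positions = [i for i, x in enumerate(trace) if x == target]
    prev, last = positions[-2], positions[-1]
    return len(set(trace[prev + 1 : last]))

t1 = ["s", "p1", "s"]                # thread 1: shared "s", private "p1"
t2 = ["q1", "q2"]                    # thread 2: private data only
interleaved = t1[:2] + t2 + t1[2:]   # t2 runs between t1's reuses of "s"

print(reuse_distance_of_last_access(t1, "s"))          # 1 (thread 1 alone)
print(reuse_distance_of_last_access(interleaved, "s")) # 3 (with interference)
```

Per-thread data size matters for the same reason: each additional thread's working set inflates the distances seen by every other thread sharing the cache.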
2008 — 2010
Seidel, Steven; Carr, Steven; Wang, Zhenlin
A Performance Model For Partitioned Global Address Space Languages @ Michigan Technological University
Writing a program to solve a large scale computational problem on a supercomputer is much more difficult than writing a program to solve the same problem on an ordinary computer. DARPA's High Productivity Computing Systems program has focused attention on a new family of parallel programming languages that reduce the effort required to develop supercomputer programs. One of the most widely used of these languages is Unified Parallel C (UPC). This research project develops a way to predict how long UPC programs will run so that programmers will know beforehand whether their programs will run efficiently. Predicting program run times reduces the need for the trial-and-error development of programs. This saves the time of the programmers and the supercomputer, both expensive commodities.
This project advances work on a performance model for UPC implementations that run on clusters. A model that describes the remote reuse distance for objects in the software cache is developed. This is a natural extension of local reuse distance for hardware cache. An analysis of remote reuse distance yields functions that predict cache behavior and its impact on performance. The effects of problem size, blocking factor, and the number of threads are included in the model. Common benchmarks, such as the NAS parallel benchmarks, are used to validate the model. The addition of a model of software cache behavior provides programmers with more accurate information about overall run times and suggests ways to improve software cache management for UPC and other languages in its family.
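The way a reuse-distance profile yields functions that predict cache behavior can be sketched for the simplest case (a local-cache toy example; the project's model extends the idea to remote reuse distance in the software cache): in a fully associative LRU cache, an access hits exactly when its reuse distance is smaller than the cache size.

```python
def miss_ratio(distances, cache_size):
    """Predicted miss ratio for a fully associative LRU cache: an access
    misses iff it is cold (None) or its reuse distance >= cache_size."""
    misses = sum(1 for d in distances if d is None or d >= cache_size)
    return misses / len(distances)

# Reuse distances from a profiled trace (None marks a cold access).
dists = [None, None, None, 2, 2, 0]
print([miss_ratio(dists, c) for c in (1, 2, 3)])
```

Evaluating this function over a range of cache sizes gives the predicted miss-ratio curve, from which the impact of problem size, blocking factor, and thread count on overall run time can be estimated.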
2014 — 2017
Wang, Zhenlin; Brown, Laura
Csr: Small: Collaborative Research: Adaptive Memory Resource Management in a Data Center - a Transfer Learning Approach @ Michigan Technological University
Cloud computing has become a dominant scalable computing platform for both online services and conventional data-intensive computing (examples include Amazon's EC2, Microsoft's Azure, and IBM's SmartCloud). Cloud computing data centers share computing resources among a large set of users, providing a cost-effective means of giving users access to computational power and data storage that would be impractical for an individual. A data center often has to over-commit its resources to meet Quality of Service contracts. The data center software needs to manage its resources effectively to meet the demands of users submitting a variety of applications, without any prior knowledge of those applications.
This work focuses on the management of memory resources in a data center. Recent progress in transfer learning methods inspires this work in the creation of dynamic models to predict the cache and memory requirements of an application. The project has four main tasks: (i) an investigation into how recent advancements in transfer learning can help solve data center resource management problems, (ii) development of a dynamic cache predictor using on-the-fly virtual machine measurements, (iii) creation of a dynamic memory predictor using runtime characteristics of a virtual machine, and (iv) development of a unified resource management scheme creating a set of heuristics that dynamically adjust cache and memory allocation to fulfill Quality of Service goals. In tasks (i)-(iii), transfer learning methods are employed and explored to facilitate the transfer of knowledge and models to new system environments and applications, based on extensive training on existing systems and benchmark applications. The prediction models and management scheme will be evaluated on common benchmarks, including SPEC WEB and CloudSuite 2.0. The results of this research will have broad impact on the design and implementation of cloud computing data centers. The results will help improve resource utilization, boost system throughput, and improve prediction performance in a cloud computing virtualization system. Additionally, the methods designed and the knowledge they impart will advance understanding in both systems research and machine learning.
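One common form of the knowledge transfer in tasks (i)-(iii) can be sketched as a toy example (an assumed illustration: the data, linear model form, and offset-recalibration method here are hypothetical, not the project's): fit a model on abundant source-system measurements, then recalibrate only its offset using a few samples from the new target system.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Source system: many (cache size in MB, miss ratio) observations.
src_x = [1, 2, 4, 8, 16]
src_y = [0.50, 0.42, 0.30, 0.14, 0.02]
a, b_source = fit_line(src_x, src_y)

# Target system: only two measurements; keep the slope, refit the offset.
tgt_x, tgt_y = [2, 8], [0.50, 0.26]
b_target = sum(y - a * x for x, y in zip(tgt_x, tgt_y)) / len(tgt_x)

def predict(cache_mb):
    return a * cache_mb + b_target

print(round(predict(4), 3))
```

Reusing the source model's structure while re-estimating only a few parameters is what lets the predictor adapt to a new environment from far less training data than building a model from scratch would require.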
2016 — 2019
Wang, Zhenlin
Csr: Small: Effective Sampling-Based Miss Ratio Curves: Theory and Practice @ Michigan Technological University
Caches, such as distributed in-memory caches for key-value stores, often play a key role in overall system performance. Miss ratio curves (MRCs), which relate cache miss ratio to cache size, are an effective tool for cache management. This project develops a new cache locality theory that significantly reduces the time and space overhead of MRC construction, making it suitable for online profiling. The research will influence system design in both software and hardware, as nearly every system involves multiple types of cache. The results can thus benefit a wide range of systems, from personal desktops to large-scale data centers. The results will be integrated into existing open-source infrastructure for industry to adopt. In addition, this project will offer new course materials that motivate core computer science research and practice.
The project investigates a new cache locality theory, applies it to several caching or memory management systems, and examines the impact of different online random sampling techniques. The theory introduces a concept of average eviction time that facilitates modeling data movement in cache. The new model constructs MRCs with data reuse distribution that can be effectively sampled. This model yields a constant space overhead and linear time complexity. The research is focused on theoretical properties and limitations of this model when compared with other recent MRC models. With this lightweight model, the project seeks to guide hardware cache partitioning, improve memory demand prediction and management in a virtualized system, and optimize key-value memory cache allocation.
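The average-eviction-time idea can be sketched in discrete form (a simplified illustration assuming a full reuse-time histogram rather than the sampled distributions the model targets in practice): if P(t) is the fraction of accesses whose reuse time exceeds t, cache occupancy grows at rate P(t), so the average eviction time for size c is the smallest T with P(1) + ... + P(T) >= c, and the predicted miss ratio at size c is P(T).

```python
def mrc_from_reuse_times(reuse_times, max_cache_size):
    """Build an MRC from per-access reuse times (None = cold access).
    Assumes some cold accesses exist so occupancy keeps growing."""
    n = len(reuse_times)

    def P(t):
        # fraction of accesses whose reuse time exceeds t (cold = infinite)
        return sum(1 for r in reuse_times if r is None or r > t) / n

    mrc = {}
    c, fill, T = 1, 0.0, 0
    while c <= max_cache_size:
        T += 1
        fill += P(T)          # cache fills at rate P(t)
        if fill >= c:         # average eviction time for size c reached
            mrc[c] = P(T)     # predicted miss ratio at size c
            c += 1
    return mrc

# Trace a b a b a b: reuse times [None, None, 2, 2, 2, 2].
# A cache of size 2 holds both blocks, leaving only the 2 cold misses.
print(mrc_from_reuse_times([None, None, 2, 2, 2, 2], 2))
```

Because the model needs only the reuse-time distribution, which random sampling can estimate on the fly, it achieves the constant space overhead and linear time complexity described above.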
2022 — 2025
Wang, Zhenlin; Chen, Bo
Satc: Core: Small: Hardware-Assisted Self-Repairing in Decentralized Cloud Storage Against Malicious Attacks @ Michigan Technological University
A decentralized cloud storage system eliminates the need for dedicated computing infrastructure by allowing peers with spare storage space to join the network and provide storage services. Compared to a conventional centralized cloud storage system, it can bring significant benefits, including cheaper storage, better fault tolerance, greater scalability, and more efficient data storage and retrieval. While bringing immense benefits, a decentralized cloud storage system also raises significant security concerns, as storage peers are more likely to misbehave: they are hosted by individual users who are less reputable and less skilled in security. This project thus takes an essential step towards protecting the long-term integrity of critical data outsourced to the emerging decentralized cloud. The project's novelties are 1) enabling a new self-repair concept in the decentralized cloud and 2) developing a hardware-assisted secure decentralized cloud storage system supporting self-repair. The project's broader significance and importance include protecting critical digital assets outsourced to the untrusted cloud, training graduate students, and reaching out to underrepresented minority students.

The project aims to develop the first hardware-assisted self-repairing decentralized cloud storage system against malicious attacks. It resolves a fundamental conflict between the requirement of a long-term integrity guarantee and the lack of trust in a decentralized setting by leveraging the trusted execution environment (TEE) and the flash translation layer (FTL). Specifically, the following research tasks are conducted: 1) enabling secure self-repair in each storage peer through collaboration between the TEE and the FTL; 2) enabling secure self-repair across multiple untrusted storage peers by leveraging the TEE; and 3) building a fully functioning secure decentralized cloud storage system with self-repair support.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2022 — 2024
Kc, Dukka; Nakamura, Issei (co-PI); Wang, Zhenlin; Jiang, Jingfeng; Brown, Laura
Mri: Acquisition of a Gpu-Accelerated Cluster For Research, Training and Outreach @ Michigan Technological University
This project will enable ground-breaking research at Michigan Technological University (Michigan Tech) by acquiring a high-performance computing cluster to be named DeepBlizzard. DeepBlizzard will accelerate scientific discoveries in basic research and technological innovations by addressing emergent and longer-term needs with broad societal impacts in multiple disciplines: chemistry, forestry, mathematics, physics, engineering (biomedical, mechanical, materials science), and computer science. DeepBlizzard will be utilized by over 125 users across 20 departments and 5 Colleges at Michigan Tech and by partners at North Carolina A&T University. DeepBlizzard will catalyze and accelerate research, enable dissemination of results, and expand opportunities for collaboration, thereby promoting the advancement of these diverse scientific domains. The project will also provide various training, teaching, and outreach activities to produce a highly trained and diverse technical workforce, including the next generation of scientists. Throughout its life, DeepBlizzard will serve as the epicenter of innovative research by enabling and supporting cross-disciplinary and collaborative research opportunities.

The DeepBlizzard high-performance computing cluster is designed by a team of experts from Computer Science, Physics, Chemistry, and Biomedical Engineering in coordination with Information Technology (IT) professionals. The instrument architecture is based on graphical processing unit (GPU) accelerators. DeepBlizzard is configured to meet three major requirements: high-performance deep learning and inference; high-performance single-, double-, and mixed-precision calculations; and the ability to execute codes using high levels of parallelism. These requirements map to the needs of ongoing and proposed computational research endeavors at Michigan Tech. In addition, several outreach and training activities – developed in partnership with Michigan Tech's existing NSF Research Experience for Undergraduates (REU) site, the NSF/NSA GenCyber Camp, and other programs involving K-12, undergraduate, and graduate students and historically marginalized groups in STEM – will provide seamless integration of research activities with outreach.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.