2007 — 2008 |
Zhou, Yuanyuan Xanthos, Spiros |
Sbir Phase I: Efficient Static Analysis Tools For Detecting Bugs in Large Software
This Small Business Innovation Research project investigates the feasibility of commercializing bug detection tools to improve the quality and productivity of software developed by various industry segments. The tools are based on state-of-the-art data-mining tools under development at the University of Illinois. The proposed project will improve the accuracy, usability, and robustness of the tools in order to make them more user-friendly and reliable.
The tools, once commercialized, can benefit a large market of IT departments in different business segments (IT, finance, government, entertainment, insurance, etc.), improving software quality and productivity and reducing software development cost via automatic bug detection. In contrast to traditional manual efforts, in which it usually takes a programmer 1-2 weeks to detect a bug, the proposed tools can automatically identify hundreds of bugs in millions of lines of code in 1-2 hours. In addition to detecting software bugs, the proposed tools could also be used to detect copyright infringement and plagiarism from open source or other software.
|
0.912 |
2007 — 2012 |
Adve, Sarita (co-PI) Adve, Vikram (co-PI) Zhou, Yuanyuan |
Csr---Pdos: Online Production-Run Software Failure Diagnosis At the User Site @ University of Illinois At Urbana-Champaign
As software systems have grown in size, complexity, and cost, it has become increasingly difficult to deliver bug-free software to end users, resulting in many software failures during production runs at the user site. While much work has been conducted on software failure diagnosis, most previous work focuses on off-site diagnosis (i.e., diagnosis at the development site with the involvement of programmers) and is therefore insufficient for diagnosing production-run software failures at the user site.
To effectively address production-run failures, we propose a novel approach that automatically performs on-site software failure diagnosis at the moment a software failure occurs and provides programmers with a detailed diagnosis report, including bug type, bug location, likely root cause, fault propagation chain, failure-triggering input, failure-triggering execution environment, potential temporary fixes, etc., without violating the user's privacy or imposing large overhead during normal execution. To achieve this ambitious goal, the proposed research tightly integrates innovations at multiple layers: (1) low-overhead operating and run-time system support to capture the failure moment without imposing large overhead during normal execution; (2) a novel, extensible, customizable, human-like failure diagnosis protocol; (3) novel program analysis techniques specifically designed for on-site failure diagnosis; (4) leveraging existing and emerging hardware support and simple hardware extensions to reduce overhead; and (5) a library-based API to allow applications to control or customize the diagnosis process if necessary.
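The kind of diagnosis report described above can be sketched in a few lines. The following is our own illustration, not the project's system: a Python exception hook that, at the failure moment, collects the bug type, location, and fault propagation chain from the traceback, and redacts environment variables we assume to be privacy-sensitive before the report leaves the user site. All names here (`REDACTED_KEYS`, the report fields) are hypothetical.

```python
import sys
import traceback
import platform

REDACTED_KEYS = {"HOME", "USER", "USERNAME"}  # assumed privacy-sensitive

def build_diagnosis_report(exc_type, exc_value, tb, environ):
    """Build a diagnosis report at the moment of failure."""
    frames = traceback.extract_tb(tb)
    last = frames[-1] if frames else None
    return {
        "bug_type": exc_type.__name__,
        "bug_location": f"{last.filename}:{last.lineno}" if last else "unknown",
        "fault_propagation_chain": [
            f"{f.name} ({f.filename}:{f.lineno})" for f in frames
        ],
        "likely_root_cause": str(exc_value),
        # Ship only redacted environment data off-site.
        "execution_environment": {
            k: ("<redacted>" if k in REDACTED_KEYS else v)
            for k, v in environ.items()
        },
        "platform": platform.platform(),
    }

def install_onsite_diagnosis(environ):
    """Hook the failure moment so diagnosis runs right at the user site."""
    def hook(exc_type, exc_value, tb):
        report = build_diagnosis_report(exc_type, exc_value, tb, environ)
        print("DIAGNOSIS REPORT:", report)  # in practice: queued for upload
    sys.excepthook = hook
```

A real system would add the failure-triggering input and checkpoint/replay support, which cannot be recovered from a traceback alone.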
|
0.951 |
2008 — 2013 |
Zhou, Yuanyuan Caesar, Matthew |
Nets-Neco: Collaborative Research: Fixing the Reliability Problem in Network Software From Its Root @ University of Illinois At Urbana-Champaign
Most of the Internet's complexity resides in software running on Internet routers. Bugs in this software are a highly critical problem, having led to a number of recent high-profile attacks and outages, and are increasingly becoming a bottleneck in building highly reliable networks. The PIs are designing and evaluating techniques to make the Internet resilient to software bugs. Their approach consists of two key components. First, they are building a highly reliable single instance of a network router. This involves performing a characteristic study of bugs in router software, using static and dynamic code analysis and taxonomizing publicly disclosed vulnerabilities. They also apply and extend techniques such as rollback, input reordering, microreboots, and automated debugging to construct a software router resilient to implementation bugs. Second, the PIs are developing and building an architecture for highly available, bug-resistant networks. Their design leverages the principle of "control and data diversity", which simultaneously runs multiple functionally equivalent instances of a piece of software or data. Each instance differs from the others in a way that makes it unlikely that multiple copies will be affected by the same bug at the same time, for example by randomizing the execution environment, making each instance responsible for a subset of routes, or having different programmers implement each instance. In addition to producing designs and algorithms that enable these networks, the PIs will also make available tools and implementations to enable their use. Successful completion of this project will significantly improve the Internet's ability to avoid and recover from failures.
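The diversity principle above can be sketched with a toy voter. This is our own illustration under simplified assumptions (functionally equivalent instances exposed as plain functions, a deterministic majority vote), not the project's design: each independently implemented instance computes an answer, a crashed instance simply loses its vote, and the majority answer wins, so a bug confined to one instance is outvoted.

```python
from collections import Counter

def diverse_vote(instances, packet):
    """Run functionally equivalent implementations and accept the majority.

    instances: list of callables, each an independent implementation
               of the same routing decision.
    """
    answers = []
    for compute_route in instances:
        try:
            answers.append(compute_route(packet))
        except Exception:
            pass  # a crashed instance simply loses its vote
    if not answers:
        raise RuntimeError("all instances failed")
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner
```

The same skeleton works whether diversity comes from different programmers, randomized execution environments, or per-instance route subsets; only the `instances` list changes.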
|
0.951 |
2008 — 2013 |
Adve, Sarita Adve, Vikram (co-PI) Zhou, Yuanyuan |
Cpa-Csa-T: Low Cost and Comprehensive Hardware Reliability @ University of Illinois At Urbana-Champaign
Hardware reliability is becoming an increasing concern in the late CMOS era. Components in shipped chips will fail for many reasons, requiring mechanisms to detect, diagnose, recover from, and repair/reconfigure around these failed components so that the system can provide reliable operation. The pervasiveness of the problem across a broad market demands low-cost and general reliability solutions that can be deployed in general-purpose, commodity systems running applications with varying reliability requirements. Traditional reliability solutions involving excessive redundancy are too expensive, as are piecemeal solutions that address individual failure modes. This work proposes a full system solution that aims to provide a common framework for error detection, diagnosis, recovery, and repair/reconfiguration for a variety of hardware failure modes, with a customizable reliability vs. overhead tradeoff.
Two key high-level observations motivate the approach. First, hardware reliability solutions need to handle only the device faults that propagate through higher levels of the system and become observable to software. Second, in spite of the reliability threat, fault-free operation remains the common case and must be optimized, possibly at the cost of increased overhead once a fault is detected. The proposed system therefore detects faults by watching for anomalous software behavior (symptoms of faults), using novel zero- to low-cost hardware and software monitors. After a fault is detected, it invokes an innovative, but potentially expensive, procedure for diagnosing the fault source to enable reconfiguration/repair (in the case of hard faults). For recovery, it relies on a checkpoint/replay mechanism, with pure-hardware and hybrid software-assisted recovery depending on detection latency. Coordinating all of the above is a thin firmware layer that provides flexibility and customizability. A major component of the work is a much-needed formulation and validation of microarchitecture-level fault models, required to drive high-level reliability solutions.
|
0.951 |
2009 — 2011 |
Zhou, Yuanyuan |
Career: Improving Storage System Performance, Dependability and Manageability Using System Mining Techniques @ University of California-San Diego |
0.951 |
2009 — 2010 |
Zhou, Yuanyuan |
Collaborative Research: Application-Adaptive I/O Stack For Data-Intensive Scientific Computing @ University of California-San Diego
Advances in computational sciences have been greatly accelerated by the rapid growth of high-end computing (HEC) facilities. However, the continuous speedup of end-to-end scientific discovery cycles relies on the ability to store, share, and analyze the terabytes and petabytes of data generated by today's supercomputers. With the growing performance gap between I/O systems and processor/memory units, data storage and accesses are inevitably becoming more bottleneck-prone.
In this proposal, we address the I/O stack performance problem with adaptive optimizations at multiple layers of the HEC I/O stack (from high-level scientific data libraries to secondary storage devices and archiving systems), and propose effective communication schemes to integrate such optimizations across layers. In particular, our proposed PATIO (Parallel AdapTive I/O) framework explores multi-layer caching/prefetching that coordinates storage resources ranging from processors to tape archiving systems. This novel approach will bridge existing disjoint optimization efforts at each individual layer and responds to the critical need to improve overall I/O system performance with increasingly deep HEC I/O stacks.
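The multi-layer caching/prefetching idea can be illustrated with a toy two-layer model. This is our own simplification, not the PATIO design: a fast upper layer (an LRU cache) and a slow lower layer (standing in for disk or tape); on a miss, the upper layer fetches the requested block plus a few sequential successors, anticipating the scan-style access patterns common in scientific workloads.

```python
from collections import OrderedDict

class TwoLayerCache:
    """Toy sketch of coordinated caching/prefetching across two I/O layers."""

    def __init__(self, capacity=8, prefetch_depth=2):
        self.fast = OrderedDict()          # LRU cache: block -> data
        self.capacity = capacity
        self.prefetch_depth = prefetch_depth
        self.slow_reads = 0                # counts expensive lower-layer reads

    def _read_slow(self, block):
        self.slow_reads += 1
        return f"data-{block}"             # stand-in for a disk/tape read

    def _install(self, block, data):
        self.fast[block] = data
        self.fast.move_to_end(block)
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict least-recently-used block

    def read(self, block):
        if block in self.fast:
            self.fast.move_to_end(block)   # refresh LRU position
            return self.fast[block]
        # Miss: fetch the block and prefetch its sequential successors,
        # turning the next few reads into fast-layer hits.
        for b in range(block, block + 1 + self.prefetch_depth):
            if b not in self.fast:
                self._install(b, self._read_slow(b))
        return self.fast[block]
```

In a deep HEC stack each layer would run a policy like this, with cross-layer hints deciding what to prefetch; the toy captures only the single coordination step.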
|
0.951 |
2010 — 2014 |
Zhou, Yuanyuan |
Shf: Small: Software and Hardware Support For Detecting Concurrency, Sequential and Distributed Bugs Via Data-Flow Invariants @ University of California-San Diego
Software reliability is critical for many applications. The pervasiveness of multi-core hardware and parallel programming makes parallel bugs an increasingly important and urgent issue. Concurrency bugs, together with other types of bugs, have significantly impacted software reliability. Although much effort has been devoted to detecting software bugs, existing work is still far from ideal, and many bugs, especially those in parallel or distributed programs, remain difficult to catch with existing tools. This proposal makes a major step toward improving the correctness of software, especially parallel and distributed software, by proposing a novel and widely applicable invariant, called the data-flow invariant, that can be used to detect various types of software bugs, including parallel bugs, and make software more reliable and secure. We believe that the proposed research can effectively improve our understanding of this challenge, provide substantial tool support for software development, and greatly improve the quality of parallel and distributed software. We have also planned various educational and outreach activities for students, especially women students in computer science.
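As we read the abstract, a data-flow invariant constrains where the value consumed by a given read may legitimately come from. The sketch below is our own minimal rendering of that idea, not the project's detector: during correct training runs it records, for each reader, the set of writers whose values that reader consumed; in checking runs, a read served by a writer outside the learned set is flagged as a likely bug (for example, an unexpected thread interleaving or memory corruption). Reader/writer labels are hypothetical.

```python
from collections import defaultdict

class DataFlowInvariantChecker:
    """Learn per-read definition sets in training runs; flag violations later."""

    def __init__(self):
        self.learned = defaultdict(set)   # reader -> set of allowed writers
        self.training = True

    def observe(self, reader, writer):
        """Called whenever `reader` consumes a value produced by `writer`.

        Returns None in training mode or when the invariant holds;
        returns a violation message otherwise.
        """
        if self.training:
            self.learned[reader].add(writer)
            return None
        if writer not in self.learned[reader]:
            return f"violation: {reader} read a value written by {writer}"
        return None
```

In a real tool, the observe events would come from compiler instrumentation or hardware support rather than manual calls, and statistical pruning would filter out invariants seen too rarely to trust.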
|
0.951 |
2010 — 2016 |
Zhou, Yuanyuan |
Csr: Small: Improving Software Diagnosability Via Automatic Log Inference and Informative Logging @ University of California-San Diego
Many applications require high reliability and availability. Unfortunately, as software has grown in size and complexity, many software bugs escape testing into production runs and cause computer failures in the real world. When a production system fails, software engineers are frequently called upon in an emergency to diagnose and solve the issue on a tight schedule. Because such errors directly impact customers' business, vendors treat diagnosing and fixing them as the highest priority. Since in many cases it is impossible to reproduce production-run failures in house for various reasons (privacy, execution environments, etc.), the common practice is for customers to send back the logs generated by the failed system. Such logs are the sole data source (in addition to source code) for the software engineers troubleshooting the failure. Based on what is in the logs, they manually infer what may have happened in order to narrow down the root cause.
Unfortunately, the above diagnosis process is mostly manual, very often a trial-and-error guessing game, and is therefore time-consuming, error-prone, and expensive in terms of both labor cost and system downtime. In particular, because log messages are added in an ad-hoc way, many of them do not provide precise, informative clues to help narrow down the root cause. Furthermore, rapidly growing size and complexity, as well as software aging, have greatly affected modern software's diagnosability.
To enable developers to quickly troubleshoot production-run failures and shorten system downtime, we propose automatic log inference and informative logging to make real-world software more diagnosable. We will not only investigate new diagnosis tools that analyze logs and source code together to help software engineers narrow down the possible root causes, but also explore new ways to automatically enhance software logging to make log messages more effective and efficient for diagnosis. As software has become pervasive in daily life, software reliability is becoming an important issue. Our proposed solutions will allow software engineers to quickly identify root causes and patches to fix the problem, significantly reducing system downtime. As such, the work benefits both software/system vendors and computer users, especially financial companies, where an hour of downtime can result in millions of dollars of lost business.
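One building block of analyzing "logs and source code together" is mapping each failure-time log line back to the logging statement that could have printed it. The sketch below is our own illustration of that step, under the assumption that logging statements use printf-style templates; the statements and source locations shown are invented.

```python
import re

# Hypothetical logging statements extracted from source code.
LOG_STATEMENTS = {
    "server.c:120": "accepted connection from %s",
    "server.c:245": "write failed on fd %d: %s",
    "cache.c:88":   "evicting entry %s",
}

def template_to_regex(template):
    """Turn a printf-style template into a regex matching its output."""
    pattern = re.escape(template)
    # Restore format specifiers as wildcards (re.escape-d forms handled too).
    pattern = pattern.replace(re.escape("%d"), r"(\d+)")
    pattern = pattern.replace(re.escape("%s"), r"(.+)")
    return re.compile("^" + pattern + "$")

COMPILED = {loc: template_to_regex(t) for loc, t in LOG_STATEMENTS.items()}

def localize(log_lines):
    """Map each log line to the source location(s) that could have printed it."""
    hits = []
    for line in log_lines:
        for loc, rx in COMPILED.items():
            if rx.match(line):
                hits.append((line, loc))
    return hits
```

Knowing which statements fired (and which did not) lets an analysis prune the set of feasible execution paths; automatically enriching the statements themselves with more variable values is the "informative logging" half of the proposal.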
|
0.951 |
2010 — 2017 |
Zhou, Yuanyuan Rosing, Tajana (co-PI) Jhala, Ranjit (co-PI) Gupta, Rajesh |
Collaborative Research: Variability-Aware Software For Efficient Computing With Nanoscale Devices @ University of California-San Diego
Abstract: The Variability Expedition Project: Variability-Aware Software for Efficient Computing with Nanoscale Devices
As semiconductor manufacturers build ever smaller components, circuits and chips at the nano scale become less reliable and more expensive to produce -- no longer behaving like precisely chiseled machines with tight tolerances. Modern computing is effectively ignorant of the variability in behavior of underlying system components from device to device, their wear-out over time, or the environment in which the computing system is placed. This makes them expensive, fragile and vulnerable to even the smallest changes in the environment or component failures. We envision a computing world where system components -- led by proactive software -- routinely monitor, predict and adapt to the variability of manufactured systems. Changing the way software interacts with hardware offers the best hope for perpetuating the fundamental gains in computing performance at lower cost of the past 40 years. The Variability Expedition fundamentally rethinks the rigid, deterministic hardware-software interface, to propose a new class of computing machines that are not only adaptive but also highly energy efficient. These machines will be able to discover the nature and extent of variation in hardware, develop abstractions to capture these variations, and drive adaptations in the software stack from compilers, runtime to applications. The resulting computer systems will work and continue working while using components that vary in performance or grow less reliable over time and across technology generations. A fluid software-hardware interface will thus mitigate the variability of manufactured systems and make machines robust, reliable and responsive to the changing operating conditions.
The Variability Expedition marshals the resources of researchers at the California Institute for Telecommunications and Information Technology (Calit2) at UC San Diego and UC Irvine, as well as UCLA, University of Michigan, Stanford and University of Illinois at Urbana-Champaign. With expertise in process technology, architecture, and design tools on the hardware side, and in operating systems, compilers and languages on the software side, the team also has the system implementation and applications expertise needed to drive and evaluate the research as well as transition the research accomplishments into practice via application drivers in wireless sensing, software radio and mobile platforms.
A successful Expedition will dramatically change the computing landscape. By re-architecting software to work in a world where monitoring and adaptation are the norm, it will achieve more robust, efficient and affordable systems that are able to predict and withstand not only hardware failures, but other kinds of software bugs or even attacks. The new paradigm will apply across the entire spectrum of embedded, mobile, desktop and server-class computing machines, yielding particular gains in sensor information processing, multimedia rendering, software radios, search, medical imaging and other important applications. Transforming the relationship between hardware and software presents valuable opportunities to integrate research and education, and this Expedition will build on established collaborations with educator-partners in formal and informal arenas to promote interdisciplinary teaching, training, learning and research. The team has built strong industrial and community outreach ties to ensure success and reach out to high-school students through a combination of tutoring and summer school programs. The Variability Expedition will engage undergraduate and graduate students in software, hardware and systems research, while promoting participation by underrepresented groups at all levels and broadly disseminating results within academia and industry.
|
0.951 |
2012 — 2015 |
Saul, Lawrence (co-PI) Zhou, Yuanyuan |
Csr: Small: Automatically Detecting, Diagnosing and Resolving Abnormal Battery Drain Issues On Smartphone Systems @ University of California-San Diego
This project presents a software system that enables the smartphone itself to deal with the abnormal battery drain (ABD) issues caused by 'battery bugs' in smartphone applications or system software, as well as by battery-related configuration errors and environmental changes. The proposed system architecture contains four subsystems, namely information collection, data analysis, diagnosis, and resolution, to self-detect, self-diagnose, and, where possible, self-recover with little user intervention when ABD events happen. Various technical methods, including machine learning and statistical approaches, will be investigated to achieve the design goal.
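The data-analysis subsystem's core question is when a drain reading counts as abnormal. As one statistical illustration (our own, not the project's algorithm), the sketch below learns the normal drain rate from history and flags a reading whose deviation from the mean exceeds a standard-deviation threshold.

```python
import statistics

def detect_abd(history, current, threshold=3.0):
    """Flag abnormal battery drain via a simple z-score test.

    history: past drain rates (e.g., %/hour) observed during normal use.
    current: the latest drain rate reading.
    """
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Degenerate history: any deviation at all is suspicious.
        return current != mean
    return abs(current - mean) / stdev > threshold
```

A deployed detector would condition on context (screen state, foreground app, radio activity) so that legitimately heavy use is not misreported; this sketch shows only the anomaly test itself.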
|
0.951 |
2012 — 2013 |
Zhou, Yuanyuan |
I-Corps: Automating People Research With Intelligent Analysis and Mining of Social Network Data On the Internet @ University of California-San Diego
The proposed work investigates the usefulness and feasibility of applying previously developed big-data analysis to the problem of mining people-related information: differentiating people with the same name, aggregating information from different sources, and inferring people-related information, such as connections, from various data sources. The biggest challenge is entity resolution (sometimes also referred to as entity disambiguation or record linkage), in which the same name may refer to different real-world entities. For instance, many, or even hundreds, of people are named "James Smith", so determining which data refer to the same "James Smith" and can be merged and aggregated together is not an easy question. The proposed solution aims to take on this problem and allow users to easily and quickly get information related to a target person on smartphones or tablets without spending one to two hours doing tedious, error-prone people research.
The Internet revolution has made a sea of information publicly available. A major part of this information relates to people and their social networks, and is valuable for targeted advertising, sales, marketing, expanding social networks, recruiting, job search, etc. Aggregating people-related information, however, is not an easy task. People information is valuable in various business functions such as recruiting, sales, and business development; according to major search engines, about one third of searches are people searches. The proposed work, if successful, could bring together people-related information that is currently scattered across many data sources, without the issue of name ambiguity. This work may also make such information quickly and conveniently available to assist business people in networking and in making new business connections more effectively and efficiently.
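The entity-resolution challenge can be made concrete with a toy clusterer. This is an illustration under strong simplifying assumptions (exact-match attributes, a fixed agreement threshold), not the proposed system: two records are merged only when the name matches and enough supporting attributes also agree, and union-find accumulates the merges into clusters of records believed to describe one real-world person. The attribute keys are hypothetical.

```python
def resolve(records, min_shared=2):
    """Cluster records that likely refer to the same real-world person.

    records: list of dicts with keys like "name", "employer", "city".
    min_shared: minimum number of agreeing attributes required to merge.
    """
    parent = list(range(len(records)))

    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    keys = ("name", "employer", "city")
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            shared = sum(
                records[i].get(k) is not None
                and records[i].get(k) == records[j].get(k)
                for k in keys
            )
            # Same name alone is not enough: require corroborating attributes.
            if records[i].get("name") == records[j].get("name") and shared >= min_shared:
                union(i, j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Real systems replace exact matching with string-similarity and probabilistic scoring, but the merge-only-with-corroboration structure is the same.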
|
0.951 |
2013 — 2017 |
Zhou, Yuanyuan |
Csr: Small: Proactive Methods in Handling Configuration Errors in Data Centers and Cloud Infrastructures @ University of California-San Diego
Configuration errors (i.e., misconfigurations) are a major cause of system failures according to several studies. For example, misconfigurations have caused serious crashes and center-wide outages in a number of data centers and commercial cloud infrastructures, affecting millions of customers. In addition to system downtime, misconfiguration also wastes engineers' and administrators' time on troubleshooting and correction, leading to significant maintenance and support costs.
Although recent work on detecting misconfigurations has improved the situation to some degree, the fundamental root cause needs to be better addressed. Based on insights gained from the PIs' recent empirical study of 546 real-world configuration errors in commercial and open-source systems, the intellectual merit of this project is to take a more fundamental approach, addressing misconfiguration problems at the root cause in a proactive, anticipatory way. This work has three objectives: (1) to improve configuration design to make configurations less error-prone; (2) to harden software systems to better tolerate and gracefully react to users' configuration errors; and (3) to detect hard-to-check configuration issues such as compatibility and cross-component parameter inconsistency.
The broader impacts include significantly reducing the amount of system downtime in data centers, decreasing vendors' customer support cost for troubleshooting configuration issues, and planned educational, outreach, and broadening participation activities.
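Objective (3), detecting cross-component parameter inconsistency, lends itself to a small sketch. The component names, parameters, and rules below are hypothetical, our own illustration of the error class rather than the project's tool: each rule ties a parameter in one component's configuration to a parameter in another's, so disagreement is reported proactively at check time instead of surfacing later as a production failure.

```python
# Each rule: (component A, key A, component B, key B, consistency predicate).
CROSS_COMPONENT_RULES = [
    ("web", "upload_max_mb", "storage", "max_object_mb",
     lambda a, b: a <= b),            # uploads must fit in the object store
    ("web", "backend_port", "app", "listen_port",
     lambda a, b: a == b),            # web tier must dial the app's port
]

def check_configs(configs):
    """Check cross-component consistency rules.

    configs: {component_name: {key: value}}; returns a list of error strings.
    """
    errors = []
    for comp_a, key_a, comp_b, key_b, ok in CROSS_COMPONENT_RULES:
        a = configs.get(comp_a, {}).get(key_a)
        b = configs.get(comp_b, {}).get(key_b)
        if a is None or b is None:
            continue                  # a parameter is absent: nothing to check
        if not ok(a, b):
            errors.append(
                f"{comp_a}.{key_a}={a} inconsistent with {comp_b}.{key_b}={b}"
            )
    return errors
```

The hard research problem, of course, is inferring such rules automatically rather than writing them by hand as done here.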
|
0.951 |
2015 — 2018 |
Zhou, Yuanyuan |
Csr: Small: Practical Methods For Removing Latent Configuration Errors in Cloud Platforms @ University of California-San Diego
Configuration errors are a major cause of computer failures. A special type of configuration error, the latent configuration error, often has the highest severity and has caused many serious, widespread outages in data centers and cloud infrastructures, affecting millions of customers. These configuration errors are prevalent and expensive to troubleshoot, and can result in millions of dollars in business losses. This project addresses the important latent configuration error problem that has caused many data center-wide outages in various cloud platforms. The methods developed by this project will significantly reduce the number of severe data center-wide outages and improve the availability of cloud services and applications.
Building on previous research experience studying thousands of real-world configuration errors in data centers, this project tackles the latent configuration error problem via three practical and innovative research thrusts: (1) automatically build configuration checkers to detect latent configuration errors at an early stage, before rolling out to thousands of nodes in data centers; (2) design and build an on-site configuration validation utility to allow data center administrators to easily validate their configuration settings, especially the complex, latent ones; and (3) improve configuration design to make configurations less prone to errors. The first thrust observes the hidden validation checks in existing usages and develops automatic ways to separate the checks from the latent usages. The second thrust enables data center administrators to have more control over their configurations. The third thrust is more fundamental, as it aims to systematically simplify the configuration space to reduce configuration errors. In addition, the project plans various educational and outreach activities for students, especially women students in computer science.
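What makes a configuration error "latent" is that it stays dormant until a rare code path (say, failover) finally uses the setting. Early-stage checking, the goal of the first two thrusts, means running every validation up front, at rollout time. The sketch below is our own illustration of that idea; the parameter names and rules are hypothetical, not drawn from the project.

```python
import os

# Validators return True, or an error string describing the latent problem.
VALIDATORS = {
    "backup_dir": lambda v: os.path.isdir(v)
                            or f"backup_dir does not exist: {v}",
    "failover_port": lambda v: (isinstance(v, int) and 1 <= v <= 65535)
                               or f"failover_port out of range: {v}",
    "retry_limit": lambda v: (isinstance(v, int) and v >= 0)
                             or f"retry_limit must be non-negative: {v}",
}

def validate_config(config):
    """Run every validator up front, before rollout, instead of at first use.

    Returns a list of all latent errors found; an empty list means the
    settings passed every known check.
    """
    errors = []
    for key, check in VALIDATORS.items():
        if key in config:
            result = check(config[key])
            if result is not True:
                errors.append(result)
    return errors
```

The research question behind the first thrust is deriving tables like `VALIDATORS` automatically, by mining the validation logic already implicit in how the program uses each parameter.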
|
0.951 |
2018 — 2021 |
Zhou, Yuanyuan |
Satc: Core: Small: Practical Methods For Detecting Access Permission Vulnerabilities Caused by Sysadmin's Configuration Errors @ University of California-San Diego
As data center systems become ever more complex, it has become increasingly daunting for system administrators to configure various permissions correctly without accidentally opening up permissions to unintended (and potentially malicious) users, resulting in catastrophic security disasters. Since data centers are used to store and manage data not only for finance, business, and communication, but also for our daily life, such as emails, photos, even home appliance and automobile data, it has become critically important to prevent human errors (system administrator errors) in access permission configurations and thereby avoid security attacks and privacy leaks. This project will develop new methods to detect and prevent permission configuration errors. The project will involve various educational and outreach activities for students, especially women students in computer science; the investigator has been a role model and a mentor for women high school students, undergraduates, graduate students, and junior faculty.
To address this access-control misconfiguration problem, the project has three main objectives: (i) providing sysadmins with precise, complete information; (ii) detecting suspicious accesses after access permission changes; and (iii) eliminating access-control configuration mistakes. These objectives will be achieved using a combination of static program analysis, binary instrumentation, profiling, statistical and quantitative methods, decision-tree machine learning, software testing, etc. The proposed research includes three synergistic thrusts: (1) informative logging for access permission-related errors; (2) intelligent monitoring and detection of suspicious accesses; and (3) holistic cross-component access-control management. Together, the three thrusts cover an important security problem that has not been addressed by prior work.
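One concrete instance of objective (ii) is checking, after a sysadmin changes permission bits, whether the change newly grants access to unintended users. The sketch below is our own illustration, not the project's method: it compares the "other" (world) permission bits before and after a change and flags newly granted world access on paths assumed sensitive. The path prefixes are hypothetical.

```python
import stat

# Hypothetical list of path prefixes holding sensitive data.
SENSITIVE_PREFIXES = ("/var/db", "/etc/secrets")

def widened_world_access(path, old_mode, new_mode):
    """Return True if a permission change newly grants world r/w/x access
    on a sensitive path.

    old_mode, new_mode: numeric permission modes (e.g., 0o640, 0o644).
    """
    world_bits = stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH
    # Bits present in the new mode but absent from the old one.
    newly_granted = (new_mode & world_bits) & ~(old_mode & world_bits)
    sensitive = any(path.startswith(p) for p in SENSITIVE_PREFIXES)
    return sensitive and newly_granted != 0
```

A deployed detector would combine a check like this with the access monitoring of thrust (2), so that a widening is judged by who actually starts reading the data, not by the bits alone.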
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.951 |