Year |
Citation |
Score |
2020 |
Li Z, Jia H, Zhang Y, Chen T, Yuan L, Vuduc R. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs IEEE Transactions on Parallel and Distributed Systems. 31: 1925-1941. DOI: 10.1109/Tpds.2020.2977629 |
0.522 |
|
2019 |
Sao P, Li XS, Vuduc R. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems Journal of Parallel and Distributed Computing. 131: 218-234. DOI: 10.1016/J.Jpdc.2019.03.004 |
0.358 |
|
2019 |
Ma Y, Li J, Wu X, Yan C, Sun J, Vuduc R. Optimizing sparse tensor times matrix on GPUs Journal of Parallel and Distributed Computing. 129: 99-109. DOI: 10.1016/J.Jpdc.2018.07.018 |
0.576 |
|
2018 |
Hossain MM, Nath C, Tucker TM, Vuduc RW, Kurfess TR. A Graphics Processor Unit-Accelerated Freeform Surface Offsetting Method for High-Resolution Subtractive Three-Dimensional Printing (Machining) Journal of Manufacturing Science and Engineering. 140. DOI: 10.1115/1.4038599 |
0.459 |
|
2017 |
Du Z, Ge R, Lee VW, Vuduc R, Bader DA, He L. Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems Scientific Programming. 2017: 1-13. DOI: 10.1155/2017/8686971 |
0.354 |
|
2017 |
You Y, Demmel J, Czechowski K, Song L, Vuduc R. Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines IEEE Transactions on Parallel and Distributed Systems. 28: 974-988. DOI: 10.1109/Tpds.2016.2608823 |
0.706 |
|
2016 |
Wu Z, Tucker TM, Nath C, Kurfess TR, Vuduc RW. Step Ring-Based Three-Dimensional Path Planning Via Graphics Processing Unit Simulation for Subtractive Three-Dimensional Printing Journal of Manufacturing Science and Engineering. 139. DOI: 10.1115/1.4034662 |
0.402 |
|
2016 |
Hossain MM, Tucker TM, Kurfess TR, Vuduc RW. Hybrid Dynamic Trees for Extreme-Resolution 3D Sparse Data Modeling Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 132-141. DOI: 10.1109/IPDPS.2016.75 |
0.305 |
|
2015 |
Park S, Vuduc R, Harrold MJ. UNICORN: A unified approach for localizing non-deadlock concurrency bugs Software Testing, Verification and Reliability. 25: 167-190. DOI: 10.1002/Stvr.1523 |
0.414 |
|
2014 |
Choi J, Chandramowlishwaran A, Madduri K, Vuduc R. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method ACM International Conference Proceeding Series. 64-71. DOI: 10.1145/2576779.2576787 |
0.442 |
|
2014 |
Choi J, Dukhan M, Liu X, Vuduc R. Algorithmic time, energy, and power on candidate HPC compute building blocks Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS. 447-457. DOI: 10.1109/IPDPS.2014.54 |
0.349 |
|
2014 |
Dukhan M, Vuduc R. Methods for high-throughput computation of elementary functions Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8384: 86-95. DOI: 10.1007/978-3-642-55224-3_9 |
0.366 |
|
2014 |
Sao P, Vuduc R, Li XS. A distributed CPU-GPU sparse direct solver Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8632: 487-498. DOI: 10.1007/978-3-319-09873-9_41 |
0.305 |
|
2014 |
Lee D, Sao P, Vuduc R, Gray AG. A distributed kernel summation framework for general-dimension machine learning Statistical Analysis and Data Mining. 7: 1-13. DOI: 10.1002/Sam.11207 |
0.463 |
|
2013 |
Czechowski K, Vuduc R. A theoretical framework for algorithm-architecture co-design Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium, IPDPS 2013. 791-802. DOI: 10.1109/IPDPS.2013.99 |
0.332 |
|
2012 |
Kim H, Vuduc R, Baghsorkhi S, Hwu WM, Choi J. Performance analysis and tuning for general purpose graphics processing units (GPGPU) Synthesis Lectures on Computer Architecture. 20: 1-94. DOI: 10.2200/S00451ED1V01Y201209CAC020 |
0.409 |
|
2012 |
Sim J, Dasgupta A, Kim H, Vuduc R. A performance analysis framework for identifying potential benefits in GPGPU applications ACM SIGPLAN Notices. 47: 11-21. DOI: 10.1145/2370036.2145819 |
0.362 |
|
2012 |
Chandramowlishwaran A, Choi JW, Madduri K, Vuduc R. Brief announcement: Towards a communication optimal Fast Multipole Method and its implications at exascale Annual ACM Symposium on Parallelism in Algorithms and Architectures. 182-184. DOI: 10.1145/2312005.2312039 |
0.412 |
|
2012 |
Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast multipole method on heterogeneous architectures Communications of the ACM. 55: 101-109. DOI: 10.1145/2160718.2160740 |
0.428 |
|
2012 |
Lee J, Kim H, Vuduc R. When prefetching works, when it doesn't, and why Transactions on Architecture and Code Optimization. 9. DOI: 10.1145/2133382.2133384 |
0.386 |
|
2012 |
Chandramowlishwaran A, Vuduc RW. Communication-optimal parallel N-body solvers Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012. 2462-2465. DOI: 10.1109/IPDPSW.2012.303 |
0.374 |
|
2012 |
Park S, Vuduc R, Harrold MJ. A unified approach for localizing non-deadlock concurrency bugs Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012. 51-60. DOI: 10.1109/ICST.2012.85 |
0.323 |
|
2011 |
Vuduc R, Czechowski K. What GPU computing means for high-end systems IEEE Micro. 31: 74-78. DOI: 10.1109/Mm.2011.78 |
0.427 |
|
2010 |
Chandramowlishwaran A, Knobe K, Vuduc R. Applying the concurrent collections programming model to asynchronous parallel dense linear algebra ACM SIGPLAN Notices. 45: 345-346. DOI: 10.1145/1837853.1693506 |
0.355 |
|
2010 |
Choi JW, Singh A, Vuduc RW. Model-driven autotuning of sparse matrix-vector multiply on GPUs ACM SIGPLAN Notices. 45: 115-125. DOI: 10.1145/1837853.1693471 |
0.47 |
|
2010 |
Chandramowlishwaran A, Madduri K, Vuduc R. Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010. DOI: 10.1109/SC.2010.19 |
0.349 |
|
2010 |
Lee J, Lakshminarayana NB, Kim H, Vuduc R. Many-thread aware prefetching mechanisms for GPGPU applications Proceedings of the Annual International Symposium on Microarchitecture, MICRO. 213-224. DOI: 10.1109/MICRO.2010.44 |
0.39 |
|
2010 |
Chandramowlishwaran A, Williams S, Oliker L, Lashuk I, Biros G, Vuduc R. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010. DOI: 10.1109/IPDPS.2010.5470415 |
0.394 |
|
2009 |
Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast-multipole method on heterogeneous architectures Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09. DOI: 10.1145/1654059.1654118 |
0.391 |
|
2009 |
Kang S, Bader DA, Vuduc R. Understanding the design trade-offs among current multicore systems for numerical computations IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. DOI: 10.1109/IPDPS.2009.5161055 |
0.325 |
|
2009 |
Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms Parallel Computing. 35: 178-194. DOI: 10.1016/j.parco.2008.12.006 |
0.728 |
|
2007 |
Nishtala R, Vuduc RW, Demmel JW, Yelick KA. When cache blocking of sparse matrix vector multiply works and why Applicable Algebra in Engineering, Communications and Computing. 18: 297-311. DOI: 10.1007/S00200-007-0038-9 |
0.646 |
|
2005 |
Demmel J, Dongarra J, Eijkhout V, Fuentes E, Petitet A, Vuduc R, Whaley RC, Yelick K. Self-adapting Linear Algebra algorithms and software Proceedings of the IEEE. 93: 293-311. DOI: 10.1109/JPROC.2004.840848 |
0.601 |
|
2005 |
Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels Journal of Physics: Conference Series. 16: 521-530. DOI: 10.1088/1742-6596/16/1/071 |
0.724 |
|
2005 |
Vuduc RW, Moon HJ. Fast sparse matrix-vector multiplication by exploiting variable block structure Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 3726: 807-816. DOI: 10.1007/11557654_91 |
0.497 |
|
2004 |
Im EJ, Yelick K, Vuduc R. Sparsity: Optimization framework for sparse matrix kernels International Journal of High Performance Computing Applications. 18: 135-158. |
0.72 |
|
2004 |
Vuduc R, Demmel JW, Bilmes JA. Statistical models for empirical search-based performance tuning International Journal of High Performance Computing Applications. 18: 65-94. |
0.674 |
|
2004 |
Lee BC, Vuduc RW, Demmel JW, Yelick KA. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply Proceedings of the International Conference on Parallel Processing. 169-176. |
0.75 |
|
2003 |
Vuduc R, Gyulassy A, Demmel JW, Yelick KA. Memory hierarchy optimizations and performance bounds for sparse A^T A x Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2659: 705-714. |
0.74 |
|
2001 |
Vuduc R, Demmel JW, Bilmes J. Statistical models for automatic performance tuning Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2073: 117-126. |
0.655 |
|
2000 |
Vuduc R, Demmel JW. Code generators for automatic tuning of Numerical Kernels: Experiences with FFTW position paper Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 1924: 190-211. |
0.637 |
|