## Joel S Emer

List of Publications by Year in descending order

Source: https://exaly.com/author-pdf/9597368/publications.pdf Version: 2024-02-01




| #  | Article                                                                                                                                                                         | IF   | CITATIONS |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-----------|
| 1  | Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proceedings of the IEEE, 2017, 105, 2295-2329.                                                             | 21.3 | 2,217     |
| 2  | Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 2017, 52, 127-138. | 5.4  | 1,877     |
| 3  | Eyeriss. Computer Architecture News, 2016, 44, 367-379.                                                                                                                         | 2.5  | 833       |
| 4  | Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9, 292-308. | 3.6  | 609       |
| 5  | SCNN., 2017,,.                                                                                                                                                                  |      | 550       |
| 6  | Exploiting choice. , 1996, , .                                                                                                                                                  |      | 547       |
| 7  | Adaptive insertion policies for high performance caching. , 2007, , .                                                                                                           |      | 415       |
| 8  | High performance cache replacement using re-reference interval prediction (RRIP). , 2010, , .                                                                                   |      | 404       |
| 9  | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. , 2016, , . |      | 258       |
| 10 | Timeloop: A Systematic Approach to DNN Accelerator Evaluation. , 2019, , .                                                                                                      |      | 251       |
| 11 | SCNN. Computer Architecture News, 2017, 45, 27-40.                                                                                                                              | 2.5  | 241       |
| 12 | Adaptive insertion policies for managing shared caches. , 2008, , .                                                                                                             |      | 239       |
| 13 | SHiP., 2011,,.                                                                                                                                                                  |      | 181       |
| 14 | High performance cache replacement using re-reference interval prediction (RRIP). Computer Architecture News, 2010, 38, 60-71. | 2.5  | 176       |
| 15 | There's plenty of room at the Top: What will drive computer performance after Moore's law?. Science, 2020, 368, .                                                               | 12.6 | 171       |
| 16 | Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. ACM Transactions on Computer Systems, 1997, 15, 322-354. | 0.8  | 161       |
| 17 | DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors. , 2018, , .                                                                                   |      | 151       |
| 18 | Adaptive insertion policies for high performance caching. Computer Architecture News, 2007, 35, 381-391.                                                                        | 2.5  | 129       |
| 19 | Performance of the VAX-11/780 translation buffer. ACM Transactions on Computer Systems, 1985, 3, 31-62.                                                                                            | 0.8 | 122       |
| 20 | Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor. Computer Architecture News, 2004, 32, 264.                                                                          | 2.5 | 122       |
| 21 | ExTensor. , 2019, , .                                                                                                                                                                              |     | 121       |
| 22 | Scheduling heterogeneous multi-cores through performance impact estimation (PIE). , 2012, , .                                                                                                      |     | 120       |
| 23 | Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators. IEEE Micro, 2017, 37, 12-21.                                                                                     | 1.8 | 105       |
| 24 | Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies. , 2010, , . |     | 93        |
| 25 | Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE). Computer Architecture News, 2012, 40, 213-224.                                                                   | 2.5 | 89        |
| 26 | PACMan., 2011,,.                                                                                                                                                                                   |     | 84        |
| 27 | Computing Architectural Vulnerability Factors for Address-Based Structures. Computer Architecture News, 2005, 33, 532-543.                                                                         | 2.5 | 77        |
| 28 | Reducing cache misses using hardware and software page placement. , 1999, , .                                                                                                                      |     | 76        |
| 29 | Triggered instructions. , 2013, , .                                                                                                                                                                |     | 76        |
| 30 | A Characterization of Processor Performance in the VAX-11/780. , 1984, , .                                                                                                                         |     | 75        |
| 31 | Leap scratchpads. , 2011, , .                                                                                                                                                                      |     | 74        |
| 32 | Efficient Processing of Deep Neural Networks. Synthesis Lectures on Computer Architecture, 2020, 15, 1-341.                                                                                        | 1.3 | 72        |
| 33 | HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing. , 2011, , .                                                                                                   |     | 64        |
| 34 | CAMP: A technique to estimate per-structure power at run-time using a few simple parameters. , 2009, ,                                                                                             |     | 59        |
| 35 | Efficient Spatial Processing Element Control via Triggered Instructions. IEEE Micro, 2014, 34, 120-137.                                                                                            | 1.8 | 58        |
| 36 | A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm. IEEE Journal of Solid-State Circuits, 2020, 55, 920-932. | 5.4 | 57        |
| 37 | Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures. ACM Transactions on Computer Systems, 2015, 33, 1-32. | 0.8 | 56        |
| 38 | Exploiting choice. Computer Architecture News, 1996, 24, 191-202.                                                                             | 2.5 | 53        |
| 39 | Gamma: leveraging Gustavson's algorithm to accelerate sparse matrix multiplication. , 2021, , .                                               |     | 52        |
| 40 | CRUISE. , 2012, , .                                                                                                                           |     | 50        |
| 41 | Instruction fetching. , 1995, , .                                                                                                             |     | 49        |
| 42 | Buffets., 2019,,.                                                                                                                             |     | 47        |
| 43 | How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful. IEEE Solid-State Circuits Magazine, 2020, 12, 28-41. | 0.4 | 40        |
| 44 | Computing Accurate AVFs using ACE Analysis on Performance Models: A Rebuttal. IEEE Computer Architecture Letters, 2008, 7, 21-24. | 1.5 | 36        |
| 45 | Towards closing the energy gap between HOG and CNN features for embedded vision. , 2017, , .                                                  |     | 35        |
| 46 | Freely scalable and reconfigurable optical hardware for deep learning. Scientific Reports, 2021, 11, 3144.                                    | 3.3 | 32        |
| 47 | The LEAP FPGA operating system. , 2014, , .                                                                                                   |     | 31        |
| 48 | A characterization of processor performance in the VAX-11/780. , 1998, , .                                                                    |     | 30        |
| 49 | Triggered instructions. Computer Architecture News, 2013, 41, 142-153.                                                                        | 2.5 | 30        |
| 50 | Tarantula. Computer Architecture News, 2002, 30, 281-292.                                                                                     | 2.5 | 28        |
| 51 | A-Ports. , 2008, , .                                                                                                                          |     | 26        |
| 52 | Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs. , 2008, , .                                                |     | 26        |
| 53 | Leveraging latency-insensitivity to ease multiple FPGA design. , 2012, , .                                                                    |     | 25        |
| 54 | Memory dependence prediction using store sets. Computer Architecture News, 1998, 26, 142-153.                                                 | 2.5 | 23        |
| 55 | High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches. , 2015, , . |     | 23        |
| 56 | A language for describing predictors and its application to automatic synthesis. , 1997, , .                                                |     | 20        |
| 57 | Unlocking Ordered Parallelism with the Swarm Architecture. IEEE Micro, 2016, 36, 105-117.                                                   | 1.8 | 18        |
| 58 | Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching. IEEE Micro, 2008, 28, 91-98.                                        | 1.8 | 17        |
| 59 | CRUISE. Computer Architecture News, 2012, 40, 249-260.                                                                                      | 2.5 | 17        |
| 60 | A-Port Networks. ACM Transactions on Reconfigurable Technology and Systems, 2009, 2, 1-26.                                                  | 2.5 | 15        |
| 61 | LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories. , 2014, , .                                                    |     | 14        |
| 62 | A modular digital VLSI flow for high-productivity SoC design. , 2018, , .                                                                   |     | 14        |
| 63 | SpZip: Architectural Support for Effective Data Compression In Irregular Applications. , 2021, , .                                          |     | 14        |
| 64 | The gradient-based cache partitioning algorithm. Transactions on Architecture and Code Optimization, 2012, 8, 1-21.                         | 2.0 | 11        |
| 65 | Exploiting spatial architectures for edit distance algorithms. , 2014, , .                                                                  |     | 11        |
| 66 | Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators. , 2021, , .                  |     | 11        |
| 67 | A comparative study of arbitration algorithms for the Alpha 21364 pipelined router. Computer Architecture News, 2002, 30, 223-234. | 2.5 | 10        |
| 68 | Design contest overview: Combined architecture for network stream categorization and intrusion detection (CANSCID). , 2010, , .             |     | 10        |
| 69 | Scavenger: Automating the construction of application-optimized memory hierarchies. , 2015, , .                                             |     | 10        |
| 70 | ZIP-IO: Architecture for application-specific compression of Big Data. , 2012, , .                                                          |     | 9         |
| 71 | Single-Threaded vs. Multithreaded: Where Should We Focus?. IEEE Micro, 2007, 27, 14-24.                                                     | 1.8 | 8         |
| 73 | The Future of Architectural Simulation. IEEE Micro, 2010, 30, 8-18.                                                                      | 1.8 | 8         |
| 74 | Optimizing under abstraction: Using prefetching to improve FPGA performance. , 2013, , .                                                 |     | 7         |
| 75 | LMC., 2016,,.                                                                                                                            |     | 7         |
| 76 | Automatic Construction of Program-Optimized FPGA Memory Networks. , 2017, , .                                                            |     | 5         |
| 77 | SAM: Optimizing Multithreaded Cores for Speculative Parallelism. , 2017, , .                                                             |     | 5         |
| 78 | CRUISE. ACM SIGPLAN Notices, 2012, 47, 249-260.                                                                                          | 0.2 | 5         |
| 79 | Instruction fetching. Computer Architecture News, 1995, 23, 345-356.                                                                     | 2.5 | 4         |
| 80 | A Characterization of Processor Performance in the VAX-11/780. Computer Architecture News, 1984, 12, 301-310.                            | 2.5 | 3         |
| 81 | Late-binding. Computer Architecture News, 2007, 35, 347-357.                                                                             | 2.5 | 3         |
| 82 | Accelerating architecture research. , 2009, , .                                                                                          |     | 3         |
| 83 | Using in-flight chains to build a scalable cache coherence protocol. Transactions on Architecture and Code Optimization, 2013, 10, 1-24. | 2.0 | 3         |
| 84 | (FPL 2015) Scavenger. ACM Transactions on Reconfigurable Technology and Systems, 2017, 10, 1-23.                                         | 2.5 | 3         |
| 85 | Retrospective: characterization of processor performance in the VAX-11/780. , 1998, , .                                                  |     | 2         |
| 86 | Architecture-Level Energy Estimation for Heterogeneous Computing Systems. , 2021, , .                                                    |     | 2         |
| 87 | Performance Potential of Effective Address Prediction of Load Instructions. , 2004, , 227-246.                                           |     | 1         |
| 88 | Design analysis of a heterogeneous distributed system. , 1986, , .                                                                       |     | 0         |
| 89 | Architecture of a flexible real-time video encoder/decoder: the DECchip 21230. , 1997, , .                                               |     | 0         |
| 90 | A language for describing predictors and its application to automatic synthesis. Computer Architecture News, 1997, 25, 304-314. | 2.5 | 0         |

| #  | Article                                                                                | IF   | CITATIONS |
|----|----------------------------------------------------------------------------------------|------|-----------|
| 91 | Top Picks from the 2008 Computer Architecture Conferences. IEEE Micro, 2009, 29, 6-9.  |      |           |
| 92 | A Hierarchical Architectural Framework for Reconfigurable Logic Computing. , 2013, , . |      | 0         |
| 93 | Single-threaded vs. multi-threaded. IEEE Micro, 2007, 27, x6.                          |      |           |
| 94 | Accelerating Simulation with FPGAs. , 2010, , 107-126.                                 |      | 0         |
| 95 | DEC Alpha. , 2011, , 535-545.                                                          |      |           |
| 96 | Incremental versus revolutionary research. ACM Computing Surveys, 1996, 28, 27.        | 23.0 | 0         |
