## Leonel Sousa

List of Publications by Citations in descending order

Source: https://exaly.com/author-pdf/6255434/publications.pdf Version: 2024-02-01



| #  | Article                                                                                                                                                                                                                                               | IF   | CITATIONS |
|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-----------|
| 1  | Communication contention in task scheduling. IEEE Transactions on Parallel and Distributed Systems, 2005, 16, 503-515.                                                                                                                                | 5.6  | 143       |
| 2  | Femtomolar limit of detection with a magnetoresistive biochip. Biosensors and Bioelectronics, 2009, 24, 2690-2695.                                                                                                                                    | 10.1 | 107       |
| 3  | Cache-aware Roofline model: Upgrading the loft. IEEE Computer Architecture Letters, 2014, 13, 21-24.                                                                                                                                                  | 1.5  | 89        |
| 4  | Massively LDPC Decoding on Multicore Architectures. IEEE Transactions on Parallel and Distributed Systems, 2011, 22, 309-322.                                                                                                                         | 5.6  | 85        |
| 5  | A Survey on Fully Homomorphic Encryption. ACM Computing Surveys, 2018, 50, 1-33.                                                                                                                                                                      | 23.0 | 82        |
| 6  | A Portable and Autonomous Magnetic Detection Platform for Biosensing. Sensors, 2009, 9, 4119-4137.                                                                                                                                                    | 3.8  | 76        |
| 7  | Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach. Journal of Computer Science and Technology, 2009, 24, 913-924. | 1.5  | 75        |
| 8  | General method for eliminating redundant computations in video coding. Electronics Letters, 2000, 36, 306. | 1.0  | 72        |
| 9  | Toward a realistic task scheduling model. IEEE Transactions on Parallel and Distributed Systems, 2006, 17, 263-275.                                                                                                                                   | 5.6  | 67        |
| 10 | A universal architecture for designing efficient modulo $2^n+1$ multipliers. IEEE Transactions on Circuits and Systems Part 1: Regular Papers, 2005, 52, 1166-1178. | 0.1  | 63        |
| 11 | List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures. Parallel Computing, 2004, 30, 81-101.                                                                                  | 2.1  | 61        |
| 12 | Improving residue number system multiplication with more balanced moduli sets and enhanced modular arithmetic structures. IET Computers and Digital Techniques, 2007, 1, 472.                                                                         | 1.2  | 55        |
| 13 | Challenges and trends in the development of a magnetoresistive biochip portable platform. Journal of Magnetism and Magnetic Materials, 2010, 322, 1655-1663. | 2.3  | 55        |
| 14 | Cost-Efficient SHA Hardware Accelerators. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2008, 16, 999-1008. | 3.1  | 52        |
| 15 | RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures. Computer Journal, 2012, 55, 629-647.                                                                                                                                | 2.4  | 49        |
| 16 | QCA-LG: A tool for the automatic layout generation of QCA combinational circuits. , 2007, , .                                                                                                                                                         |      | 46        |
| 17 | How GPUs can outperform ASICs for fast LDPC decoding. , 2009, , .                                                                                                                                                                                     |      | 46        |
| 18 | MRC-Based RNS Reverse Converters for the Four-Moduli Sets $\{2^{n}+1, 2^{n}-1, 2^{n}, 2^{2n+1}-1\}$ and $\{2^{n}+1, 2^{n}-1, 2^{2n}, 2^{2n+1}-1\}$. IEEE Transactions on Circuits and Systems II: Express Briefs, 2012, 59, 244-248. | 3.0  | 44        |
| 19 | Combining Residue Arithmetic to Design Efficient Cryptographic Circuits and Systems. IEEE Circuits and Systems Magazine, 2016, 16, 6-32.                                                        | 2.3 | 43        |
| 20 | Real-time implementation of remotely sensed hyperspectral image unmixing on GPUs. Journal of Real-Time Image Processing, 2015, 10, 469-483.                                                     | 3.5 | 42        |
| 21 | RNS Reverse Converters for Moduli Sets With Dynamic Ranges up to $(8n+1)$-bit. IEEE Transactions on Circuits and Systems I: Regular Papers, 2013, 60, 1487-1500. | 5.4 | 41        |
| 22 | Portable LDPC Decoding on Multicores Using OpenCL [Applications Corner]. IEEE Signal Processing Magazine, 2012, 29, 81-109.                                                                     | 5.6 | 40        |
| 23 | Efficient and configurable full-search block-matching processors. IEEE Transactions on Circuits and Systems for Video Technology, 2002, 12, 1160-1167.                                          | 8.3 | 38        |
| 24 | Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function. , 2009, , . |     | 37        |
| 25 | Visual neuroprosthesis: a non invasive system for stimulating the cortex. IEEE Transactions on Circuits and Systems Part 1: Regular Papers, 2005, 52, 2648-2662.                                | 0.1 | 36        |
| 26 | Massive parallel LDPC decoding on GPU. , 2008, , .                                                                                                                                              |     | 35        |
| 27 | Deep Learning Architectures for Accurate Millimeter Wave Positioning in 5G. Neural Processing Letters, 2020, 51, 487-514.                                                                       | 3.2 | 35        |
| 28 | Detection of 130nm magnetic particles by a portable electronic platform using spin valve and magnetic tunnel junction sensors. Journal of Applied Physics, 2008, 103, 07A310.                   | 2.5 | 32        |
| 29 | Dethroning GPS: Low-Power Accurate 5G Positioning Systems Using Machine Learning. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2020, 10, 240-252.                      | 3.6 | 32        |
| 30 | Performance and power modeling and evaluation of virtualized servers in IaaS clouds. Information Sciences, 2017, 394-395, 106-122.                                                              | 6.9 | 31        |
| 31 | Diode/magnetic tunnel junction cell for fully scalable matrix-based biochip. Journal of Applied Physics, 2006, 99, 08B307.                                                                      | 2.5 | 30        |
| 32 | Elliptic Curve point multiplication on GPUs. , 2010, , .                                                                                                                                        |     | 30        |
| 33 | 2-Axis Magnetometers Based on Full Wheatstone Bridges Incorporating Magnetic Tunnel Junctions Connected in Series. IEEE Transactions on Magnetics, 2012, 48, 4107-4110. | 2.1 | 30        |
| 34 | Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems. IEEE Transactions on Multimedia, 2014, 16, 108-121. | 7.2 | 30        |
| 35 | Implementation Strategy of Convolution Neural Networks on Field Programmable Gate Arrays for Appliance Classification Using the Voltage and Current (V-I) Trajectory. Energies, 2018, 11, 2460. | 3.1 | 30        |
| 36 | Efficient Method for Magnitude Comparison in RNS Based on Two Pairs of Conjugate Moduli. Computer Arithmetic, IEEE Symposium on, 2007, , . | 0.0 | 29        |
| 37 | A New Hand-Held Microsystem Architecture for Biological Analysis. IEEE Transactions on Circuits and Systems Part 1: Regular Papers, 2006, 53, 2384-2395.                                                                                                                                       | 0.1 | 28        |
| 38 | Caravela: A Novel Stream-Based Distributed Computing Environment. Computer, 2007, 40, 70-77.                                                                                                                                                                                                   | 1.1 | 28        |
| 39 | Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling. , 2017, , . |     | 28        |
| 40 | Nonconventional Computer Arithmetic Circuits, Systems and Applications. IEEE Circuits and Systems Magazine, 2021, 21, 6-40.                                                                                                                                                                    | 2.3 | 27        |
| 41 | A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing. Signal Processing, 2011, 91, 2443-2464.                                                                                                                                         | 3.7 | 26        |
| 42 | Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling. Eurasip Journal on Advances in Signal Processing, 2007, 2007, . | 1.7 | 24        |
| 43 | On the Design of RNS Reverse Converters for the Four-Moduli Set $\{2^{n}+1, 2^{n}-1, 2^{n}, 2^{n+1}+1\}$. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2013, 21, 1945-1949. | 3.1 | 24        |
| 44 | $2^n$ RNS Scalers for Extended 4-Moduli Sets. IEEE Transactions on Computers, 2015, 64, 3322-3334. | 3.4 | 24        |
| 45 | GPU-based DVB-S2 LDPC decoder with high throughput and fast error floor detection. Electronics Letters, 2011, 47, 542.                                                                                                                                                                         | 1.0 | 23        |
| 46 | The CRNS framework and its application to programmable and reconfigurable cryptography. Transactions on Architecture and Code Optimization, 2013, 9, 1-25. | 2.0 | 23        |
| 47 | TrustZone-backed bitcoin wallet. , 2017, , .                                                                                                                                                                                                                                                   |     | 23        |
| 48 | High coded data rate and multicodeword WiMAX LDPC decoding on Cell/BE. Electronics Letters, 2008, 44, 1415.                                                                                                                                                                                    | 1.0 | 21        |
| 49 | GHEVC: An Efficient HEVC Decoder for Graphics Processing Units. IEEE Transactions on Multimedia, 2017, 19, 459-474.                                                                                                                                                                            | 7.2 | 21        |
| 50 | Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems. Parallel Computing, 2012, 38, 365-390.                                                                                                                                                                                      | 2.1 | 20        |
| 51 | Reverse Converter Design via Parallel-Prefix Adders: Novel Components, Methodology, and Implementations. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2015, 23, 374-378. | 3.1 | 20        |
| 52 | Efficient Modular Adder Designs Based on Thermometer and One-Hot Coding. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27, 2142-2155. | 3.1 | 20        |
| 53 | A hybrid algorithm for task scheduling on heterogeneous multiprocessor embedded systems. Applied Soft Computing Journal, 2020, 91, 106202.                                                                                                                                                     | 7.2 | 20        |
| 54 | Method to Design General RNS Reverse Converters for Extended Moduli Sets. IEEE Transactions on Circuits and Systems II: Express Briefs, 2013, 60, 877-881.                                                                                                                                     | 3.0 | 19        |
| 55 | A Survey on Programmable LDPC Decoders. IEEE Access, 2016, 4, 6704-6718.                                                                                                                                                                                                | 4.2 | 19        |
| 56 | Arithmetic Units for RNS Moduli $\{2^n-3\}$ and $\{2^n+3\}$ Operations. , 2010, , . |     | 18        |
| 57 | Design Space Exploration of LDPC Decoders Using High-Level Synthesis. IEEE Access, 2017, 5, 14600-14615.                                                                                                                                                                | 4.2 | 18        |
| 58 | Design and implementation of a stream-based distributed computing platform using graphics processing units. , 2007, , . |     | 17        |
| 59 | Sign Detection and Number Comparison on RNS 3-Moduli Sets $\{2^n-1, 2^{n+x}, 2^n+1\}$. Circuits, Systems, and Signal Processing, 2017, 36, 1224-1246. | 2.0 | 17        |
| 60 | New energy-efficient hybrid wide-operand adder architecture. IET Circuits, Devices and Systems, 2019, 13, 1221-1231. | 1.4 | 17        |
| 61 | Beamformed Fingerprint Learning for Accurate Millimeter Wave Positioning. , 2018, , .                                                                                                                                                                                   |     | 16        |
| 62 | Data buffering optimization methods toward a uniform programming interface for gpu-based applications. , 2007, , .                                                                                                                                                      |     | 15        |
| 63 | Noise Characteristics and Particle Detection Limits in Diode+MTJ Matrix Elements for Biochip Applications. IEEE Transactions on Magnetics, 2007, 43, 2403-2405. | 2.1 | 15        |
| 64 | On Task Scheduling Accuracy: Evaluation Methodology and Results. Journal of Supercomputing, 2004, 27, 177-194.                                                                                                                                                          | 3.6 | 14        |
| 65 | An RNS based Specific Processor for Computing the Minimum Sum-of-Absolute-Differences. , 2008, , .                                                                                                                                                                      |     | 14        |
| 66 | Cooperative CPU+GPU deblocking filter parallelization for high performance HEVC video codecs. , 2014, , .                                                                                                                                                               |     | 14        |
| 67 | Arithmetic-based Binary-to-RNS Converter Modulo $\{2^{n} \pm k\}$ for $jn$-bit Dynamic Range. IEEE Transactions on Very Large Scale | 3.1 | 14        |
| 68 | Performance Analysis with Cache-Aware Roofline Model in Intel Advisor. , 2017, , .                                                                                                                                                                                      |     | 14        |
| 69 | NTT Architecture for a Linux-Ready RISC-V Fully-Homomorphic Encryption Accelerator. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69, 2669-2682. | 5.4 | 14        |
| 70 | p264. , 2010, , . |     | 13        |
| 71 | A Lab Project on the Design and Implementation of Programmable and Configurable Embedded Systems. IEEE Transactions on Education, 2013, 56, 322-328. | 2.4 | 13        |
| 72 | Beyond the Roofline: Cache-Aware Power and Energy-Efficiency Modeling for Multi-Cores. IEEE Transactions on Computers, 2017, 66, 52-58.                                                                                                                                 | 3.4 | 13        |
| 73 | Improving the Efficiency of SVM Classification With FHE. IEEE Transactions on Information Forensics and Security, 2020, 15, 1709-1722.                                                                                                                                    | 6.9 | 13        |
| 74 | Comparison of contention aware list scheduling heuristics for cluster computing. , 0, , .                                                                                                                                                                                 |     | 12        |
| 75 | A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures. , 2008, , .                                                                                                                                                                         |     | 12        |
| 76 | Efficient sign identification engines for integers represented in RNS extended 3-moduli set $\{2^{n}-1, 2^{n+k}, 2^{n}+1\}$. Electronics Letters, 2014, 50, 1138-1139. | 1.0 | 12        |
| 77 | HEVC in-loop filters GPU parallelization in embedded systems. , 2015, , .                                                                                                                                                                                                 |     | 12        |
| 78 | A Framework for Application-Guided Task Management on Heterogeneous Embedded Systems. Transactions on Architecture and Code Optimization, 2016, 12, 1-25. | 2.0 | 12        |
| 79 | A Multifunctional Unit for Designing Efficient RNS-Based Datapaths. IEEE Access, 2017, 5, 25972-25986.                                                                                                                                                                    | 4.2 | 12        |
| 80 | Modeling and Evaluation of Service Composition in Commercial Multiclouds Using Timed Colored Petri Nets. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50, 947-961. | 9.3 | 12        |
| 81 | RNS Arithmetic Units for Modulo $\{2^n+k\}$. , 2012, , . |     | 11        |
| 82 | On the Design of RNS Inter-Modulo Processing Units for the Arithmetic-Friendly Moduli Sets $\{2^{n+k}, 2^{n}-1, 2^{n+1}-1\}$. Computer Journal, 2019, 62, 292-300. | 2.4 | 11        |
| 83 | Modeling Non-Uniform Memory Access on Large Compute Nodes with the Cache-Aware Roofline Model. IEEE Transactions on Parallel and Distributed Systems, 2019, 30, 1374-1389. | 5.6 | 11        |
| 84 | Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study. Lecture Notes in Computer Science, 2009, , 237-246. | 1.3 | 11        |
| 85 | A genetic-based approach for service placement in fog computing. Journal of Supercomputing, 2022, 78, 10854-10875.                                                                                                                                                        | 3.6 | 11        |
| 86 | Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs. Lecture Notes in Computer Science, 2003, , 745-754.                                                                                                                                       | 1.3 | 10        |
| 87 | Efficient Independent Component Analysis on a GPU. , 2010, , .                                                                                                                                                                                                            |     | 10        |
| 88 | Simultaneous Multi-Level Divisible Load Balancing for Heterogeneous Desktop Systems. , 2012, , .                                                                                                                                                                          |     | 10        |
| 89 | Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs. , 2013, , .                                                                                                                                                                       |     | 10        |
| 90 | SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores. Lecture Notes in Computer Science, 2014, , 230-241.                                                                                                                                            | 1.3 | 10        |
| 91  | Combining flexibility with low power: Dataflow and wide-pipeline LDPC decoding engines in the Gbit/s era. , 2014, , .                                   |     | 10        |
| 92  | GPU-assisted HEVC intra decoder. Journal of Real-Time Image Processing, 2016, 12, 531-547.                                                              | 3.5 | 10        |
| 93  | Energy-aware mechanism for stencil-based MPDATA algorithm with constraints. Concurrency Computation Practice and Experience, 2017, 29, e4016. | 2.2 | 10        |
| 94  | Data-Aided Fast Beamforming Selection for 5G. , 2018, , .                                                                                               |     | 10        |
| 95  | Retargeting Tensor Accelerators for Epistasis Detection. IEEE Transactions on Parallel and Distributed Systems, 2021, 32, 2160-2174.                    | 5.6 | 10        |
| 96  | An ASIP approach for adaptive AVC Motion Estimation. , 2007, , .                                                                                        |     | 9         |
| 97  | Neural code metrics: Analysis and application to the assessment of neural models. Neurocomputing, 2009, 72, 2337-2350.                                  | 5.9 | 9         |
| 98  | Development and evaluation of scalable video motion estimators on GPU. , 2009, , .                                                                      |     | 9         |
| 99  | An Efficient Scalable RNS Architecture for Large Dynamic Ranges. Journal of Signal Processing Systems, 2014, 77, 191-205.                               | 2.1 | 9         |
| 100 | Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs. Eurasip Journal on Advances in Signal Processing, 2014, 2014, .     | 1.7 | 9         |
| 101 | Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision. , 2015, , .                                                                        |     | 9         |
| 102 | Towards GPU HEVC intra decoding: Seizing fine-grain parallelism. , 2015, , .                                                                            |     | 9         |
| 103 | Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platforms. Journal of Real-Time Image Processing, 2016, 11, 571-587. | 3.5 | 9         |
| 104 | Towards Efficient Modular Adders based on Reversible Circuits. , 2018, , .                                                                              |     | 9         |
| 105 | More efficient, provably-secure direct anonymous attestation from lattices. Future Generation Computer Systems, 2019, 99, 425-458.                      | 7.5 | 9         |
| 106 | On-the-fly attestation of reconfigurable hardware. , 2008, , .                                                                                          |     | 8         |
| 107 | Binary-to-RNS Conversion Units for moduli {2^n ± 3}. , 2011, , .                                                                                        |     | 8         |
| 108 | Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines. , 2013, , .                                              |     | 8         |
| 109 | Programmable RNS lattice-based parallel cryptographic decryption. , 2015, , .                                                                                                                                            |                 | 8         |
| 110 | Ubiquitous Multimedia: Emerging Research on Multimedia Computing. IEEE MultiMedia, 2016, 23, 12-15.                                                                                                                      | 1.7             | 8         |
| 111 | An Efficient Component for Designing Signed Reverse Converters for a Class of RNS Moduli Sets of Composite Form $\{2^{k}, 2^{P}-1\}$. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017, 25, 48-59. | 3.1             | 8         |
| 112 | Multiobjective Frog-Leaping Optimization for the Study of Ancestral Relationships in Protein Data. IEEE Transactions on Evolutionary Computation, 2018, 22, 879-893.                                                     | 10.0            | 8         |
| 113 | Exploring the Binary Precision Capabilities of Tensor Cores for Epistasis Detection. , 2020, , .                                                                                                                         |                 | 8         |
| 114 | Application-driven Cache-Aware Roofline Model. Future Generation Computer Systems, 2020, 107, 257-273.                                                                                                                   | 7.5             | 8         |
| 115 | Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters. Lecture Notes in Computer Science, 2012, , 489-501.                                                             | 1.3             | 8         |
| 116 | Application Specific Instruction Set Processor for Adaptive Video Motion Estimation. , 2006, , .                                                                                                                         |                 | 7         |
| 117 | Reconfigurable architectures and processors for real-time video motion estimation. Journal of Real-Time Image Processing, 2007, 2, 191-205.                                                                              | 3.5             | 7         |
| 118 | Compact and Flexible Microcoded Elliptic Curve Processor for Reconfigurable Devices. , 2009, , .                                                                                                                         |                 | 7         |
| 119 | On the Modeling of New Tunnel Junction Magnetoresistive Biosensors. IEEE Transactions on Instrumentation and Measurement, 2010, 59, 92-100.                                                                              | 4.7             | 7         |
| 120 | Real-time DVB-S2 LDPC decoding on many-core GPU accelerators. , 2011, , .                                                                                                                                                |                 | 7         |
| 121 | EFFICIENT METHOD FOR DESIGNING MODULO $\{2^n \pm k\}$ MULTIPLIERS. Journal of Circuits, Systems and Computers, 2014, 23, 1450001.                                                                                        | 1.5             | 7         |
| 122 | Reconfigurable data flow engine for HEVC motion estimation. , 2014, , .                                                                                                                                                  |                 | 7         |
| 123 | High-Level Designs of Complex FIR Filters on FPGAs for the SKA. , 2016, , .                                                                                                                                              |                 | 7         |
| 124 | A Reduced-Bias Approach With a Lightweight Hard-Multiple Generator to Design a Radix-8 Modulo $2^{n} + 1$ Multiplier. IEEE Transactions on Circuits and Systems II: Express Briefs, 2017, 64, 817-821.                  | 3.0             | 7         |
| 125 | Performability-Based Workflow Scheduling in Grids. Computer Journal, 2018, 61, 1479-1495.                                                                                                                                | 2.4             | 7         |
| 126 | Sign Identifier for the Enhanced Three Moduli Set $\{2^n + k, 2^n - 1, 2^{n+1} - 1\}$. Journal of Signal Processing Systems, 2019, 91, 953-961.                                                                          | 2.1             | 7         |
| 127 | The Role of Non-Positional Arithmetic on Efficient Emerging Cryptographic Algorithms. IEEE Access, 2020, 8, 59533-59549.                                                                     | 4.2 | 7         |
| 128 | Parallel LDPC Decoding on the Cell/B.E. Processor. Lecture Notes in Computer Science, 2009, , 389-403.                                                                                       | 1.3 | 7         |
| 129 | Accelerating 3-Way Epistasis Detection with CPU+GPU Processing. Lecture Notes in Computer Science, 2020, , 106-126.                                                                          | 1.3 | 7         |
| 130 | On Realistic Divisible Load Scheduling in Highly Heterogeneous Distributed Systems. , 2012, , .                                                                                              |     | 6         |
| 131 | Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26, 597-611.                      | 8.3 | 6         |
| 132 | Special issue on real-time energy-aware circuits and systems for HEVC and for its 3D and SVC extensions. Journal of Real-Time Image Processing, 2017, 13, 1-3.                               | 3.5 | 6         |
| 133 | Temperature-aware dynamic voltage and frequency scaling enabled MPSoC modeling using Stochastic Activity Networks. Microprocessors and Microsystems, 2018, 60, 15-23.                        | 2.8 | 6         |
| 134 | Enhancing Beamformed Fingerprint Outdoor Positioning with Hierarchical Convolutional Neural Networks. , 2019, , .                                                                            |     | 6         |
| 135 | Multi-level Parallelization of Advanced Video Coding on Hybrid CPU+GPU Platforms. Lecture Notes in Computer Science, 2013, , 165-174.                                                        | 1.3 | 6         |
| 136 | Monitoring Performance and Power for Application Characterization with the Cache-Aware Roofline Model. Lecture Notes in Computer Science, 2014, , 747-760.                                   | 1.3 | 6         |
| 137 | Efficient motion vector refinement architecture for sub-pixel motion estimation systems. , 0, , .                                                                                            |     | 5         |
| 138 | Adaptive Motion Estimation Algorithm for H.264/AVC. , 2007, , .                                                                                                                              |     | 5         |
| 139 | CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications. , 2009, , .                                                                                                      |     | 5         |
| 140 | Modelling and programming stream-based distributed computing based on the meta-pipeline approach. International Journal of Parallel, Emergent and Distributed Systems, 2009, 24, 311-330.    | 1.0 | 5         |
| 141 | Iterative induced dipoles computation for molecular mechanics on GPUs. , 2010, , .                                                                                                           |     | 5         |
| 142 | Randomised multi-modulo residue number system architecture for double-and-add to prevent power analysis side channel attacks. IET Circuits, Devices and Systems, 2013, 7, 283-293.           | 1.4 | 5         |
| 143 | Base Transformation With Injective Residue Mapping for Dynamic Range Reduction in RNS. IEEE Transactions on Circuits and Systems I: Regular Papers, 2015, 62, 2248-2259.                     | 5.4 | 5         |
| 144 | Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU. Signal Processing: Image Communication, 2018, 62, 93-105.                                                          | 3.2 | 5         |
| 145 | A methodical FHE-based cloud computing model. Future Generation Computer Systems, 2019, 95, 639-648.                                                                         | 7.5 | 5         |
| 146 | A multiobjective adaptive approach for the inference of evolutionary relationships in protein-based scenarios. Information Sciences, 2019, 485, 281-300.                     | 6.9 | 5         |
| 147 | Modeling Epidemic Routing: Capturing Frequently Visited Locations While Preserving Scalability. IEEE Transactions on Vehicular Technology, 2021, 70, 2713-2727.              | 6.3 | 5         |
| 148 | Rescheduling for Optimized SHA-1 Calculation. Lecture Notes in Computer Science, 2006, , 425-434.                                                                            | 1.3 | 5         |
| 149 | Algorithm for modulo (2^n+1) multiplication. Electronics Letters, 2003, 39, 752.                                                                                             | 1.0 | 4         |
| 150 | Meta-Pipeline: A New Execution Mechanism for Distributed Pipeline Processing. , 2007, , .                                                                                    |     | 4         |
| 151 | Integrated Spintronic Platforms for Biomolecular Recognition Detection. AIP Conference Proceedings, 2008, , .                                                                | 0.4 | 4         |
| 152 | Low power microarchitecture with instruction reuse. , 2008, , .                                                                                                              |     | 4         |
| 153 | Merged Computation for Whirlpool Hashing. , 2008, , .                                                                                                                        |     | 4         |
| 154 | Statistical Analysis of a Spike Train Distance in Poisson Models. IEEE Signal Processing Letters, 2008, 15, 357-360.                                                         | 3.6 | 4         |
| 155 | Efficient implementation of multi-moduli architectures for Binary-to-RNS conversion. , 2012, , .                                                                             |     | 4         |
| 156 | Configurable M-factor VLSI DVB-S2 LDPC decoder architecture with optimized memory tiling design. Eurasip Journal on Wireless Communications and Networking, 2012, 2012, .    | 2.4 | 4         |
| 157 | On the Evaluation of Multi-core Systems with SIMD Engines for Public-Key Cryptography. , 2014, , .                                                                           |     | 4         |
| 158 | High performance IP core for HEVC quantization. , 2015, , .                                                                                                                  |     | 4         |
| 159 | Stretching the limits of Programmable Embedded Devices for Public-key Cryptography. , 2015, , .                                                                              |     | 4         |
| 160 | Arithmetical Improvement of the Round-Off for Cryptosystems in High-Dimensional Lattices. IEEE Transactions on Computers, 2017, 66, 2005-2018.                               | 3.4 | 4         |
| 161 | MrBayes sMC3. International Journal of High Performance Computing Applications, 2018, 32, 246-265.                                                                           | 3.7 | 4         |
| 162 | Comparative assessment of GPGPU technologies to accelerate objective functions: A case study on parsimony. Journal of Parallel and Distributed Computing, 2019, 126, 67-81.  | 4.1 | 4         |
| 163 | Towards the Integration of Reverse Converters into the RNS Channels. IEEE Transactions on Computers, 2020, 69, 342-348.                                             | 3.4 | 4         |
| 164 | Raising the Abstraction Level of a Deep Learning Design on FPGAs. IEEE Access, 2020, 8, 205148-205161.                                                              | 4.2 | 4         |
| 165 | Enhancing Data Parallelism of Fully Homomorphic Encryption. Lecture Notes in Computer Science, 2017, , 194-207.                                                     | 1.3 | 4         |
| 166 | Modeling Large Compute Nodes with Heterogeneous Memories with Cache-Aware Roofline Model. Lecture Notes in Computer Science, 2018, , 91-113.                        | 1.3 | 4         |
| 167 | Uncertainty Estimation via Monte Carlo Dropout in CNN-Based mmWave MIMO Localization. IEEE Signal Processing Letters, 2022, 29, 269-273.                            | 3.6 | 4         |
| 168 | Fast transcoding architectures for insertion of non-regular shaped objects in the compressed DCT-domain. Signal Processing: Image Communication, 2003, 18, 659-683. | 3.2 | 3         |
| 169 | Task Scheduling: Considering the Processor Involvement in Communication. , 0, , .                                                                                   |     | 3         |
| 170 | Additive Logistic Regression Applied to Retina Modelling. , 2007, , .                                                                                               |     | 3         |
| 171 | Feature Selection for the Stochastic Integrate and Fire Model. , 2007, , .                                                                                          |     | 3         |
| 172 | Developing and Integrating Lab Projects as Important Learning Components in an Embedded Systems Course. , 2007, , .                                                 |     | 3         |
| 173 | Edge Stream Oriented LDPC Decoding. , 2008, , .                                                                                                                     |     | 3         |
| 174 | Efficient FPGA elliptic curve cryptographic processor over GF(2^m). , 2008, , .                                                                                     |     | 3         |
| 175 | On the design of distributed autonomous embedded systems for biomedical applications. , 2009, , .                                                                   |     | 3         |
| 176 | Exploiting SIMD extensions for linear image processing with OpenCL. , 2010, , .                                                                                     |     | 3         |
| 177 | Computation of Induced Dipoles in Molecular Mechanics Simulations Using Graphics Processors. Journal of Chemical Information and Modeling, 2012, 52, 1159-1166.     | 5.4 | 3         |
| 178 | Performance-Aware Task Management and Frequency Scaling in Embedded Systems. , 2014, , .                                                                            |     | 3         |
| 179 | Method for Designing Efficient Mixed Radix Multipliers. Circuits, Systems, and Signal Processing, 2014, 33, 3165-3193.                                              | 2.0 | 3         |
| 180 | Method for designing multi-channel RNS architectures to prevent power analysis SCA. , 2014, , .                                                                     |     | 3         |
| 181 | GPU acceleration of the HEVC decoder inter prediction module. , 2015, , .                                                                                                                               |     | 3         |
| 182 | CPU Parallelization of HEVC In-Loop Filters. International Journal of Parallel Programming, 2017, 45, 1515-1535.                                                                                        | 1.5 | 3         |
| 183 | Accelerating the phylogenetic parsimony function on heterogeneous systems. Concurrency Computation Practice and Experience, 2017, 29, e4046.                                                            | 2.2 | 3         |
| 184 | Introduction to Residue Number System: Structure and Teaching Methodology. , 2017, , 3-17.                                                                                                              |     | 3         |
| 185 | 3D-HEVC DMM-1 Parallelism Exploration Targeting Multicore Systems. , 2018, , .                                                                                                                          |     | 3         |
| 186 | Scalable Performance Analysis of Epidemic Routing Considering Skewed Location Visiting Preferences. , 2019, , .                                                                                         |     | 3         |
| 187 | Parallelism exploration for 3D high-efficiency video coding depth modeling mode one. Journal of Real-Time Image Processing, 2020, 17, 787-797.                                                          | 3.5 | 3         |
| 188 | Variable Latency Carry Speculative Adders with Input-based Dynamic Configuration. Computers and Electrical Engineering, 2021, 93, 107247.                                                               | 4.8 | 3         |
| 189 | Number Theoretic Transform Architecture suitable to Lattice-based Fully-Homomorphic Encryption. , 2021, , .                                                                                             |     | 3         |
| 190 | Low Power Distance Measurement Unit for Real-Time Hardware Motion Estimators. Lecture Notes in Computer Science, 2006, , 247-255.                                                                       | 1.3 | 3         |
| 191 | A programmable cellular neural network circuit. , 2004, , .                                                                                                                                             |     | 2         |
| 192 | The Midlifekicker Microarchitecture Evaluation Metric. , 0, , .                                                                                                                                         |     | 2         |
| 193 | On the Implementation and Evaluation of Berkeley Sockets on Maestro2 cluster computing environment. , 0, , .                                                                                            |     | 2         |
| 194 | Corrections to "A Universal Architecture for Designing Efficient Modulo $2^n+1$ Multipliers". IEEE Transactions on Circuits and Systems Part 1: Regular Papers, 2005, 52, 1982-1982.                    | 0.1 | 2         |
| 195 | A Run-Time Reconfigurable Processor for Video Motion Estimation. , 2007, , .                                                                                                                            |     | 2         |
| 196 | Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units. , 2008, , .                                                    |     | 2         |
| 197 | Merged computation for Whirlpool hashing. , 2008, , .                                                                                                                                                   |     | 2         |
| 198 | Multi-core platforms for signal processing: source and channel coding. , 2009, , .                                                                                                                      |     | 2         |
| 199 | A Feature Selection Algorithm for the Regularization of Neuron Models. IEEE Transactions on Instrumentation and Measurement, 2009, 58, 3824-3830.                                                       | 4.7 | 2         |
| 200 | Measuring and Extraction of Biological Information on New Handheld Biochip-Based Microsystem. IEEE Transactions on Instrumentation and Measurement, 2010, 59, 56-62.                                    | 4.7 | 2         |
| 201 | H.264/AVC framework for multi-core embedded video encoders. , 2010, , .                                                                                                        |     | 2         |
| 202 | Collaborative execution environment for heterogeneous parallel systems. , 2010, , .                                                                                            |     | 2         |
| 203 | Embedded multicore architectures for LDPC decoding. , 2010, , .                                                                                                                |     | 2         |
| 204 | Hardware/software co-design of H.264/AVC encoders for multi-core embedded systems. , 2010, , .                                                                                 |     | 2         |
| 205 | Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing. Journal of Signal Processing Systems, 2011, 62, 301-318. | 2.1 | 2         |
| 206 | High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systems. , 2011, , .                                                 |     | 2         |
| 207 | Scheduling Divisible Loads on Heterogeneous Desktop Systems with Limited Memory. Lecture Notes in Computer Science, 2012, , 491-501.                                           | 1.3 | 2         |
| 208 | High Performance Unified Architecture for Forward and Inverse Quantization in H.264/AVC. , 2012, , .                                                                           |     | 2         |
| 209 | A compact and scalable RNS architecture. , 2013, , .                                                                                                                           |     | 2         |
| 210 | A comparison of computing architectures and parallelization frameworks based on a two-dimensional FDTD. , 2013, , .                                                            |     | 2         |
| 211 | DARNS:A randomized multi-modulo RNS architecture for double-and-add in ECC to prevent power analysis side channel attacks. , 2013, , .                                         |     | 2         |
| 212 | Collaborative inter-prediction on CPU+GPU systems. , 2014, , .                                                                                                                 |     | 2         |
| 213 | FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems. , 2014, , .                                                                                   |     | 2         |
| 214 | A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS. Journal of Signal Processing Systems, 2014, 76, 249-259.                                    | 2.1 | 2         |
| 215 | Efficient HEVC decoder for heterogeneous CPU with GPU systems. , 2016, , .                                                                                                     |     | 2         |
| 216 | Method for designing two levels RNS reverse converters for large dynamic ranges. The Integration VLSI Journal, 2016, 55, 22-29.                                                | 2.1 | 2         |
| 217 | A stochastic number representation for fully homomorphic cryptography. , 2017, , .                                                                                    |     | 2         |
| 218 | HyPoRes: An Hybrid Representation System for ECC. , 2019, , .                                                                                                         |     | 2         |
| 219 | Parallel evolutionary computation for multiobjective gene interaction analysis. Journal of Computational Science, 2020, 40, 101068.                                   | 2.9 | 2         |
| 220 | Exploiting multi-level parallel metaheuristics and heterogeneous computing to boost phylogenetics. Future Generation Computer Systems, 2022, 127, 208-224.            | 7.5 | 2         |
| 221 | Bioinspired Stimulus Encoder for Cortical Visual Neuroprostheses. , 2005, , 279-290.                                                                                  |     | 2         |
| 222 | A Lattice-Based Enhanced Privacy ID. Lecture Notes in Computer Science, 2020, , 15-31.                                                                                | 1.3 | 2         |
| 223 | Mansard Roofline Model: Reinforcing the Accuracy of the Roofs. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2021, 6, 1-23.           | 0.9 | 2         |
| 224 | Fourth-Order Exhaustive Epistasis Detection for the xPU Era. , 2021, , .                                                                                              |     | 2         |
| 225 | Parallel LDPC Decoding. , 2011, , 619-628.                                                                                                                            |     | 2         |
| 226 | Efficient Reductions in Cyclotomic Rings - Application to Ring-LWE Based FHE Schemes. Lecture Notes in Computer Science, 2018, , 151-171.                             | 1.3 | 2         |
| 227 | Generic Architecture Designed for Biomedical Embedded Systems. , 2007, , 353-362.                                                                                     |     | 2         |
| 228 | Video coding by using the 3D zero-tree approach in the wavelet transform domain. , 0, , .                                                                             |     | 1         |
| 229 | Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures. Journal of Signal Processing Systems, 2003, 34, 277-290.          | 1.0 | 1         |
| 230 | An Efficient Expectation-Maximisation Algorithm for Spike Classification. , 2007, , .                                                                                 |     | 1         |
| 231 | An improved RNS generator 2^n ± k based on threshold logic. , 2010, , .                                                                                               |     | 1         |
| 232 | A flexible architecture for the computation of direct and inverse transforms in H.264/AVC video codecs. IEEE Transactions on Consumer Electronics, 2011, 57, 936-944. | 3.6 | 1         |
| 233 | VLSI Reverse Converter for RNS Based on the Moduli Set. , 2012, , .                                                                                                   |     | 1         |
| 234 | Energy efficient stream-based configurable architecture for embedded platforms. , 2012, , .                                                                           |     | 1         |
| 235 | Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems. International Journal of Parallel Programming, 2013, 41, 236-260.                 | 1.5 | 1         |
| 236 | An RNS-based architecture targeting hardware accelerators for modular arithmetic. , 2013, , .                                                                                                                                     |     | 1         |
| 237 | ROM-less RNS-to-binary converter moduli $\{2^{2n} - 1, 2^{2n} + 1, 2^n - 3, 2^n + 3\}$. , 2014, , .                                                                                                                               |     | 1         |
| 238 | RNS reverse converters based on the new Chinese Remainder Theorem I. , 2015, , .                                                                                                                                                  |     | 1         |
| 239 | On Boosting Energy-Efficiency of Heterogeneous Embedded Systems via Game Theory. , 2017, , .                                                                                                                                      |     | 1         |
| 240 | Inter-Algorithm Multiobjective Cooperation for Phylogenetic Reconstruction on Amino Acid Data. IEEE Transactions on Cybernetics, 2022, 52, 3577-3591.                                                                             | 9.5 | 1         |
| 241 | Temperature-aware core management in MPSoCs: modelling and evaluation using MRMs. IET Computers and Digital Techniques, 2020, 14, 17-26.                                                                                          | 1.2 | 1         |
| 242 | GPU acceleration of Fitch's parsimony on protein data: from Kepler to Turing. Journal of Supercomputing, 2020, 76, 9827-9853.                                                                                                     | 3.6 | 1         |
| 243 | BRAM-LUT Tradeoff on a Polymorphic DES Design. Lecture Notes in Computer Science, 2008, , 55-65.                                                                                                                                  | 1.3 | 1         |
| 244 | Distributed Shared Memory System Based on the Maestro2 High Performance Cluster Network. , 0, , .                                                                                                                                 |     | 0         |
| 245 | Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs. , 2008, , .                                                                                                           |     | 0         |
| 246 | Distributed Web-based Platform for Computer Architecture Simulation. , 2008, , .                                                                                                                                                  |     | 0         |
| 247 | Design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications. Parallel and Distributed Processing Symposium (IPDPS), Proceedings of the International Conference on, 2008, , .       | 1.0 | 0         |
| 248 | Magnetoresistive biochip-based portable platforms for biomolecular recognition detection. New Biotechnology, 2009, 25, S358-S359.                                                                                                 | 4.4 | 0         |
| 249 | Distributed Software Platform for Automation and Control of General Anaesthesia. , 2009, , .                                                                                                                                      |     | 0         |
| 250 | A quantitative analysis of firing rate estimators: Unveiling bias sources. Neurocomputing, 2010, 73, 2944-2954.                                                                                                                   | 5.9 | 0         |
| 251 | Unifying stream based and reconfigurable computing to design application accelerators. , 2010, , .                                                                                                                                |     | 0         |
| 252 | Programming Cell/BE and GPUs systems for real-time video encoding. Proceedings of SPIE, 2010, , .                                                                                                                                 | 0.8 | 0         |

| #   | Article                                                                                                                                                               | IF  | CITATIONS |
|-----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-----------|
| 271 | Software Emulation of Quantum Resistant Trusted Platform Modules. , 2020, , .                                                                                         |     | 0         |
| 272 | Massive Data Classification of Neural Responses. Advances in Medical Technologies and Clinical Practice Book Series, 0, , 278-298.                                    | 0.3 | 0         |
| 273 | Modeling and evaluation of dispatching policies in IaaS cloud data centers using SANs. Sustainable Computing: Informatics and Systems, 2022, 33, 100617.              | 2.2 | 0         |
| 274 | Editorial on the Special Section on Algorithms, Circuits, and Systems for Signal Processing at the Edge. IEEE Open Journal of Circuits and Systems, 2021, 2, 766-768. | 1.9 | 0         |