Research: Difference between revisions

From VLSILab
Latest revision as of 08:37, 15 October 2024

Drexel VANDAL consists of a research group of computer engineers and electrical engineers tackling big engineering problems of building sophisticated systems. Some of the projects are in:

  • Exa-scale computing systems
  • Smart energy/Smart home systems
  • IoT processor design
  • Bio-Implantable systems
  • 5G communication systems
  • Algorithms and software for IoT hardware and software design, including machine learning


Some project descriptions are as follows.

Resonant Clocking Technologies

Achieving high quality synchronization with low power dissipation is a major objective in synchronous VLSI circuit design at high frequency regimes. In order to meet this objective, conventional clock design methodologies are constantly being improved. Also, next-generation alternatives to conventional clocking have been emerging. Resonant clocking technologies provide operating frequencies and power dissipation levels that are unprecedented in state-of-the-art, bulk-CMOS VLSI IC implementations. These technologies must be characterized for on-chip variations, have robust simulation models and be supported by specific design flows in order to be viable in high-volume production. This project addresses such challenges in the design and design automation of resonant clocking technologies for high-volume IC production.

With improved nanoscale design characterization and design automation methodologies, resonant clocking technologies can be seamlessly integrated within the mainstream VLSI IC design flow. The broader impacts of this project are in revolutionizing the clock synchronization methodology of digital VLSI synchronous circuits for low-power, multi-GHz operation and sustaining it across semiconductor technology scaling. The proposed low-power, multi-GHz high-performance clocking operation will have a major impact on all microelectronic systems, from field-deployable low-power sensors to the world's fastest supercomputers.

Ph.D. Student(s): Ragh Kuttappa, Ying Teng (graduated), Vinayak Honkote (graduated), Ankit More (graduated), Jianchao Lu (graduated)

Sponsor(s): National Science Foundation (CCF-0845270), ACM SIGDA, Mosis

CMP-NoC Co-design

The advent of multi-core architectures has increased the popularity of chip multi-processors (CMPs) and the use of networks-on-chip (NoCs) as the fabric interconnecting cores in high performance computers. Traditionally, the NoC design space has been evaluated with traces; full system simulation is a less used alternative. Traces do not capture the message dependencies in real applications, which makes replaying a trace less accurate than a full system simulation. While full system simulations provide high accuracy, they are hindered by extremely long run times and limitations in the number of cores. Previous attempts at generating traces with message dependencies rely on full system simulations, which are platform dependent and extremely difficult to run, especially for massive multi-core systems (i.e., hundreds of cores).

In this work we present a platform-independent, dependency-tracked, event-based NoC evaluation methodology. Because the events track dependencies between multiple threads, the methodology can replay messages across the network in the correct order, which ensures accuracy, without simulating the functionality of a microprocessor as full system simulators do. In addition, the framework scales easily to evaluate future NoCs for massive multi-core CMPs comprising hundreds of nodes. The methodology is used to explore the design space in the CMP-NoC co-design process.
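As a rough illustration of what replaying dependency-tracked events in the correct order can mean, the sketch below orders traced events so that no event is replayed before the events it depends on. It is a minimal sketch using a standard topological sort (Kahn's algorithm), not the group's actual implementation; the `Event` record, its field names, and `replay_order` are hypothetical names introduced only for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """One traced message. All field names are hypothetical, for illustration only."""
    event_id: int
    thread: int
    depends_on: list = field(default_factory=list)  # ids that must be replayed first


def replay_order(events):
    """Return event ids in an order that respects every recorded dependency."""
    # Number of unresolved dependencies per event.
    pending = {e.event_id: len(e.depends_on) for e in events}
    # Reverse map: which events are waiting on a given event.
    dependents = {e.event_id: [] for e in events}
    for e in events:
        for dep in e.depends_on:
            dependents[dep].append(e.event_id)

    # Events with no dependencies can be injected into the network immediately.
    ready = deque(sorted(eid for eid, count in pending.items() if count == 0))
    order = []
    while ready:
        eid = ready.popleft()
        order.append(eid)
        for waiting in dependents[eid]:
            pending[waiting] -= 1
            if pending[waiting] == 0:
                ready.append(waiting)

    if len(order) != len(events):
        raise ValueError("cyclic dependency in trace")
    return order


# Two threads: events 1 and 2 wait on event 0, and event 3 waits on both.
trace = [Event(0, 0), Event(1, 1, [0]), Event(2, 0, [0]), Event(3, 1, [1, 2])]
order = replay_order(trace)
```

A plain trace would replay these four messages by timestamp alone; tracking the dependencies is what lets the replay stay correct when network latency changes the timing.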


Check our Software Release: SynchroTrace


Ph.D. Student(s): Sief Atari, Karthik Sangaiah (graduated), Michael Lui, Vasil Pano (graduated), Ankit More (graduated)

Sponsor(s): None

Collaborator(s): Mark Hempstead, Tufts (Computer Architecture)

Design and Automation of Low Swing Clocking

Operating the clock network with low swing is one of the techniques explored to reduce the power consumption attributed to the clock network of a high-performance architecture. Low-swing operation can be adopted at varying levels of a clock tree with different implications. However, low-swing applicability remains limited in practice due to a number of factors including (i) degradation in the skew performance, (ii) degradation in expected power reduction, (iii) degradation in data timing due to slew degradation, (iv) the need for level shifters of varying sizes, and (v) the need for low-swing FF designs. Furthermore, the automation of low-swing clocking has not been addressed. In this research, the effectiveness of exploiting fully/partially low-swing clock trees, the design of custom cell blocks needed for low-swing operation and the determination of the optimal low-swing voltage level are studied. The design flow is also targeted for automation in order to address the different performance, architecture and physical constraints.

Ph.D. Student(s): Scott Lerner, Leo Filippini (graduated), Can Sitik (graduated)

Sponsor(s): Semiconductor Research Corporation

Collaborator(s): Emre Salman, SUNY-Stony Brook (Circuits and Systems)

Wireless On-Chip Interconnects

Increasing functionality and complexity in the design of integrated circuits (ICs) requires careful planning for on-chip resources such as area and power. Critical design decisions are often made based on the availability of these resources within increasingly stringent design budgets. Among these typical IC design budgets, wire interconnects are one of the most expensive items. Significantly impacting the timing, power and area resources, wire interconnects constitute the complex infrastructure to establish communication and synchronization within a conventional, state-of-the-art IC.

In this project, wireless communication principles are investigated in order to replace the resource-demanding, conventional, wire-based interconnect networks within integrated circuits. By implementing one or many transmitter and receiver antennas on the same chip, wireless communication principles will be used to communicate between distant components within a chip. The proposed on-chip wireless communication implementations bear a constant overhead in area and power budgets in order to implement the antennas and surrounding circuitry. However, the increasing size and complexity of conventional wire interconnects (particularly for heavy-duty global interconnects such as clock and power lines) are mitigated, solving one of the major problems in the state-of-the-art IC design process. Wireless communication will provide a solution to the IC communication challenge that is highly scalable into the future, as increases in technology scaling and die size dimensions are forecast by the semiconductor industry.

Also see our article titled "What is Wireless Interconnect?" in the February 2012 edition of the ACM SIGDA newsletter, distributed to 3000+ recipients.

Ph.D. Student(s): Yilmaz Gonul, Ceyhun Kayan, Sief Atari (quit), Vasil Pano (graduated), Ankit More (graduated)

Sponsor(s): National Science Foundation (1232164, 2008629), Mosis

Collaborator(s): Kapil Dandekar (Wireless Communication Systems, Co-PI)

Clock Tree/Mesh Synthesis

In this research, the utilization of computing power to improve an essential step of integrated circuit (IC) physical design flow, clock network design, is investigated. Clock network design entails a series of computationally intensive, large-scale design and optimization tasks. Automation for conventional, zero skew, buffered clock trees is common. However, high performance clock tree design remains a tedious task with increasing requirements for higher speed through skew scheduling, variation-awareness and constrained power budgets. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research.

In the traditional integrated circuit design flow, the placement and clock network synthesis stages are performed sequentially. It is desirable to combine the placement and clock network synthesis stages to provide a better physical design. In this project, the integration of placement and clock network synthesis is investigated for the purpose of reducing clock power dissipation. Moreover, various types of novel clock distribution architectures are studied.

Ph.D. Student(s): Scott Lerner, Can Sitik (graduated), Jianchao Lu (graduated)

Sponsor(s): None

Ultra Low-Power Adiabatic Circuit Design

Adiabatic switching preserves energy by circulating the switching energy back into the circuit. The recirculation of energy has significantly limited the frequency of operation. The frequency of operation is dictated by a synchronizing clock signal called the power-clock, which also acts as the power source for the adiabatic logic. Some adiabatic logic families, however, require multiple phases of the power-clock for pipelined operation (alternatively, logic pipelining can be sacrificed). Also impeding the adoption of adiabatic logic is the recovery path resistance and its impact on the Q of the LC resonator, which degrades the quality of synchronization and the power recovery. Consequently, adiabatic circuit families have faced difficulties in being adopted in IC design due to:

  1. The low switching frequency of the power-clock signals,
  2. The difficulty in logic pipelining, primarily due to the power dissipation required to provide the complex clocking schemes with multiple phases.
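As a first-order illustration of why charge recovery ties energy savings to the operating frequency, the widely used textbook approximation below compares abrupt charging with ramped charging. It is not a result from this project; the symbols C (load capacitance), R (recovery path resistance), T (power-clock ramp time) and V_dd (full swing voltage) are introduced here only for illustration.

```latex
E_{\text{conv}} = \tfrac{1}{2}\, C V_{dd}^{2}
\qquad\qquad
E_{\text{adiabatic}} \approx \frac{RC}{T}\, C V_{dd}^{2} \quad (T \gg RC)
```

Under this approximation the dissipation shrinks only as the ramp time T grows relative to RC, so a faster power-clock or a higher recovery path resistance both reduce the energy that can be recovered.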

In this project, novel synchronous circuit implementation methodologies of adiabatic logic design are explored. This methodology enables unprecedented low power operation through charge recovery on the logic and the power-clock network. Ultimately, this research will resolve the well-known shortcomings of adiabatic logic, such as the operating frequency, and help improve the energy efficiency and applicability of adiabatic logic families.

Ph.D. Student(s): Yilmaz Gonul, Leo Filippini (graduated)

Sponsor(s): None

Collaborator(s): Emre Salman, SUNY-Stony Brook, Diane Lim (Penn School of Medicine), Lunal Khuon - Drexel Engineering Technology (RF, analog, and biomedical ICs).

Energy Efficient Computing with OptoElectronics

In order to achieve energy efficient computing for systems ranging from data centers down to mobile electronics, novel devices, techniques, and methodologies are necessary to reduce the terawatts of power consumed by computational devices. We are proposing an effort to bring together researchers from all levels of the device-to-systems hierarchy (Devices -> Circuits -> Architecture -> Systems -> Data Center) in a vertically integrated approach addressing the energy challenges of future computing devices. Our vision is to build upon novel optoelectronic devices capable of computing a bit while consuming attojoules (10^-18 J) of energy, and progress to energy efficient techniques and methodologies for data centers that consume terawatts of power from the electrical grid. Energy efficient innovations at the circuits, systems/interconnect, architecture, and server/mobile/data center platform levels have the potential to significantly reduce overall power consumption and address this grand challenge in energy needs. Our team will leverage the energy efficiency of novel optoelectronic elements and focus research efforts on reducing the total power consumption of electronic devices, from IC chips to devices and ultimately data centers.

Ph.D. Student(s): Ragh Kuttappa (graduated)

Sponsor(s): None

Collaborator(s): Bahram Nabet (Photonics), Ioannis Savidis (Circuits and Systems), Naga Kandasamy (HPC), Lunal Khuon - Drexel Engineering Technology (RF, analog, and biomedical ICs).

Previous Projects

GPU System Co-design

Similar to the CMP-NoC co-design challenges, the co-design of hardware and software on GPU systems is explored. Platform-independent dependencies of threads are analyzed on GPUs, leading to the analysis of software and hardware co-design principles.

Ph.D. Student(s): Michael Lui, Karthik Sangaiah (graduated)

Sponsor(s): Samsung GRO

Collaborator(s): Mark Hempstead (Computer Architecture)

Clock Skew Scheduling

Integrated circuit design at the sub-micron levels, particularly in the transition to 60nm and lower technologies, requires paradigm shifts. In order to achieve high-performance, robust and high-yield production, design and manufacturing techniques are being investigated more carefully. A successful design at a sub-60nm technology can be achieved by employing a combination of design principles. Investigation and improvement of each design principle is important and a contributing factor to prolonging the success of Moore's Law in CMOS based IC design.

In this research, an additional design principle, clock skew scheduling, is investigated to aid deep sub-micron IC design. The performance enhancing effects of clock skew scheduling have been known for over 20 years. Designers employ ad hoc tricks to delay clock signals on paths with timing violations to satisfy design budgets. Due to the limited scalability of the conventional application techniques, however, clock skew scheduling typically cannot be used to its full advantage. The common advantages of skew scheduling are known to be fixing timing violations and improving the operating frequencies of circuits. In the deep sub-micron design era, skew scheduling can also be used effectively to improve timing yield and enable low power design alternatives. Provided that the increasing computing power of multi-core systems can be applied to remedy the scalability problem, and by reformulating the objectives, clock skew scheduling can be used as an additional design principle to enable high-yield IC design at 45nm and lower technologies.
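To make the idea concrete, the sketch below checks whether a set of clock arrival times exists for a given clock period. It assumes the common textbook formulation in which each register-to-register path contributes one setup and one hold inequality on the difference of two clock arrival times, solved with Bellman-Ford relaxation; this is not necessarily the formulation used in this research. The function name `schedule_skew`, its parameters, and the example delays are hypothetical.

```python
def schedule_skew(num_regs, paths, period, setup=0.0, hold=0.0):
    """Return clock arrival times meeting all constraints, or None if infeasible.

    paths: list of (i, j, d_min, d_max) for a data path from register i to j.
    Assumed constraints (textbook form, t_k = clock arrival time at register k):
      setup: t_i + d_max + setup <= t_j + period
      hold:  t_i + d_min         >= t_j + hold
    """
    # Each constraint x_v - x_u <= w becomes a directed edge (u, v, w).
    edges = []
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max - setup))  # t_i - t_j <= period - d_max - setup
        edges.append((i, j, d_min - hold))            # t_j - t_i <= d_min - hold

    # Starting every node at 0 mimics a virtual source with zero-weight edges.
    t = [0.0] * num_regs
    for _ in range(num_regs):
        changed = False
        for u, v, w in edges:
            if t[u] + w < t[v] - 1e-12:
                t[v] = t[u] + w
                changed = True
        if not changed:
            return t  # converged: every inequality holds
    return None  # still relaxing after num_regs passes: negative cycle, infeasible


# Two registers in a loop; delays are made-up numbers in arbitrary time units.
loop = [(0, 1, 2.0, 8.0), (1, 0, 1.0, 4.0)]
feasible = schedule_skew(2, loop, period=6.0)
infeasible = schedule_skew(2, loop, period=5.0)
```

With zero skew this loop needs a period of at least 8 time units, the longest path delay. Delaying the clock at register 1 by 2 units relative to register 0 lets the same circuit run at a period of 6, while a period of 5 admits no schedule.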

Ph.D. Student(s): Jianchao Lu (graduated)

Quantum-Dot Cellular Automata (QCA) based Nanoarchitectures

It is expected that the physical barrier in the nanoscale implementation of CMOS devices will soon be reached. The development of next generation computation systems will stem from the exploration of nanoscale materials and biological systems. Properties and applications of several nanoscale technologies, such as Quantum-dot Cellular Automata (QCA) investigated in this work, are being explored intensively. Basic design methods and simulators have been developed to show the potential of QCA technology in meeting future computation needs. What is missing in the current agenda of QCA research are studies on layout optimization and system-level architecture design. The challenge in performing these studies is the necessity to address the high levels of pipelining, parallelism, and fault-tolerance required for high performance operation of QCA systems.

The objective of the proposed research is to investigate fault-tolerant QCA architectures using advanced clocking schemes for practical implementation of QCA-based nanocomputers. Towards this end, essential circuit components for such computers and system-level integration of these components will be investigated. In the project, the emphasis is on novel circuit architectures and clocking schemes to perform computations with this emerging technology. Manufacturing challenges will be addressed to capture the fault-tolerance properties for architecture design.

Ph.D. Student(s): None