Al Khanjari, Sharifa (2017) Advanced management techniques for many-core communication systems. PhD thesis, University of Glasgow.
Full text available as:
PDF
Download (8MB) |
Abstract
The way computer processors are built is changing. Nowadays, computer processor performance is increased by adding more processing cores on a single chip instead of making processors larger and faster. The traditional approach is no longer viable, due to limits in transistor scaling. Both industry and academia agree that scaling the number of processing cores to hundreds or thousands on a single chip is the only way to scale computer processor performance from now on. Consequently, the performance of these future many-core systems with thousands of cores will heavily depend on the Network-on-Chip (NoC) architecture to provide scalable communication. Therefore, as the number of cores increases the locality will only become more important. Communication locality is essential to reduce latency and increase performance. Many-core systems should be designed such that cores communicate mainly to the neighbouring cores, in order to minimise the communication cost.
We investigate the network performance of different topologies using the ITRS physical data for the year 2023. For this reason, we propose abstract synthetic traffic generation models to explore the locality behaviour in many-core NoC systems. Using the synthetic traffic models - group clustering model and ring clustering model - traffic distance metrics may be adjusted with locality parameters. We choose two many-core NoC architectures - distributed memory architecture and shared memory architecture - to examine whether enforcing locality on different architectures may have a diverse effect on the network performance of different topologies.
Distributed memory architecture uses the message passing method of communication to communicate between cores. Our results show that the degree of locality and the clustering model strongly affect the performance of the network. Scale-invariant topologies, such as the fat quadtree, perform worse than flat ones because the reduced hop count is outweighed by the longer wire delays.
In shared memory architecture, threads communicate with each other by storing data in shared cache lines. We design a hierarchical cache model that benefits from communication locality because many-core cache hierarchy that fails to exploit locality may end up having more cores delayed, thereby decreasing the network performance. Our results show that the locality model of thread placement and the distance of placing them significantly affect the NoC performance. Furthermore, they show that scale-invariant topologies perform better than flat topologies. Then, we demonstrate that implementing directory-based cache coherency has only a small overhead on the cache size. Using cache coherency protocol in our proposed hierarchical cache model, we show that network performance decreases only slightly. Hence, cache coherency scales, and it is possible to have shared memory architecture with thousands of cores.
Item Type: | Thesis (PhD) |
---|---|
Qualification Level: | Doctoral |
Keywords: | Many-core, network-on-chip, distributed-memory architecture , shared-memory architecture, quadtree. |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science T Technology > T Technology (General) |
Colleges/Schools: | College of Science and Engineering > School of Computing Science |
Supervisor's Name: | Vanderbauwhede, Dr. Wim and Mackenzie, Dr. Lewis |
Date of Award: | 2017 |
Depositing User: | Dr. Sharifa Al Khanjari |
Unique ID: | glathesis:2017-7952 |
Copyright: | Copyright of this thesis is held by the author. |
Date Deposited: | 21 Feb 2017 11:32 |
Last Modified: | 12 Apr 2017 09:42 |
URI: | https://theses.gla.ac.uk/id/eprint/7952 |
Actions (login required)
View Item |
Downloads
Downloads per month over past year