Alnowaiser, Khaled Abdulrahman (2016) Garbage collection optimization for non uniform memory access architectures. PhD thesis, University of Glasgow.
Full text available as: PDF Download (6MB)
Abstract
Cache-coherent non-uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications allocate data across the system’s distributed memory banks, and the automatic memory manager collects the garbage left in these banks. The garbage collector may need to access remote memory banks, which incurs access latency overhead and can saturate the bandwidth of the interconnect between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies the rich locality that exists naturally in ‘rooted sub-graphs’: connected objects consisting of a root object and its reachable set. Second, it leverages this locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism, in which a garbage collector thread processes a local root and its reachable set, which is likely to contain a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks from the established real-world DaCapo benchmark suite. In addition, the evaluation involves the widely used SPECjbb benchmark and a Neo4J graph database Java benchmark, as well as an artificial benchmark. On a multi-hop NUMA architecture, the NUMA-aware garbage collector shows an average performance improvement of 15%. Furthermore, this performance gain is shown to result from improved NUMA memory access in a ccNUMA system.
Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy relies on outdated assumptions and generates a constant thread count, yet the Hotspot JVM still uses it in the production version. This research shows that the optimal number of garbage collection threads is application-specific and that configuring this number yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique that uses heuristics derived from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average improvement of 21% in garbage collection performance for the DaCapo benchmarks.
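The node-local stealing idea in the third contribution can be illustrated with a minimal, single-threaded Java sketch. It is not the thesis implementation or Hotspot code: the `Worker` class, the `stealFromSibling` method, and the use of strings as stand-ins for object references are all hypothetical names chosen for illustration, and a real collector would use concurrent work queues.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class NodeLocalStealing {
    /** Hypothetical GC worker: a work queue tagged with the NUMA node its thread runs on. */
    static final class Worker {
        final int id;
        final int numaNode;
        // Pending references; strings stand in for heap object references.
        final Deque<String> queue = new ArrayDeque<>();

        Worker(int id, int numaNode) {
            this.id = id;
            this.numaNode = numaNode;
        }
    }

    /**
     * Tries to steal one reference for the thief, considering only sibling
     * workers on the same NUMA node. Returns null if no same-node work exists.
     */
    static String stealFromSibling(Worker thief, List<Worker> all) {
        for (Worker victim : all) {
            if (victim != thief && victim.numaNode == thief.numaNode) {
                // The owner pushes and pops at the head; the thief takes from the tail.
                String ref = victim.queue.pollLast();
                if (ref != null) {
                    return ref;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Worker w0 = new Worker(0, 0); // idle worker on node 0
        Worker w1 = new Worker(1, 0); // busy sibling on node 0
        Worker w2 = new Worker(2, 1); // busy worker on remote node 1
        w1.queue.push("a");
        w1.queue.push("b");
        w2.queue.push("c");
        List<Worker> all = Arrays.asList(w0, w1, w2);

        System.out.println(stealFromSibling(w0, all)); // prints "a"
        System.out.println(stealFromSibling(w0, all)); // prints "b"
        System.out.println(stealFromSibling(w0, all)); // prints "null": "c" is on a remote node
    }
}
```

In this sketch the idle worker never takes the reference queued on the remote node. What a collector should do once same-node work runs out (terminate, or fall back to remote stealing) is a policy decision that the sketch deliberately leaves open.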
| Item Type: | Thesis (PhD) | 
|---|---|
| Qualification Level: | Doctoral | 
| Keywords: | Garbage collection, NUMA, locality. | 
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software | 
| Colleges/Schools: | College of Science and Engineering > School of Computing Science | 
| Supervisor's Name: | Singer, Dr. Jeremy | 
| Date of Award: | 2016 | 
| Depositing User: | Mr. Khaled Alnowaiser | 
| Unique ID: | glathesis:2016-7495 | 
| Copyright: | Copyright of this thesis is held by the author. | 
| Date Deposited: | 28 Jul 2016 08:41 | 
| Last Modified: | 22 Aug 2016 10:59 | 
| URI: | https://theses.gla.ac.uk/id/eprint/7495 | 