Wu, Rui (2026) Adaptive distributed event-driven reinforcement learning for the dynamic flexible job shop scheduling problem. PhD thesis, University of Glasgow.
Full text available as: PDF (5MB)
Abstract
Manufacturing systems face growing complexity in the era of Industry 4.0, where competitiveness depends on real-time responsiveness, efficiency, and reliability. Disruptions such as random job arrivals, machine breakdowns, and sequence-dependent setups challenge traditional scheduling. We address the Dynamic Flexible Job Shop Scheduling Problem (DFJSP) by proposing a distributed, event-driven reinforcement learning (RL) framework that enables real-time, multi-objective decision-making. The adaptive policies improve throughput, reduce delays, and enhance system resilience, demonstrating reinforcement learning’s potential as a foundation for next-generation industrial scheduling.
This thesis investigates three increasingly complex scheduling scenarios, each reflecting key challenges in dynamic manufacturing environments. The first scenario addresses a baseline problem of decentralized scheduling, in which multiple work centers operate independently and each must select job and machine priority rules without global coordination. This setting captures the reality of distributed decision-making in modern production systems, where rapid local responses are crucial. A Double Deep Q-Network (DDQN)-based agent is employed, achieving the lowest mean tardiness and the highest win rate, outperforming the Shortest Processing Time (SPT) rule with statistical significance.
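The full agent design is given in the thesis itself; as a minimal illustrative sketch (all names, dimensions, and numbers here are hypothetical), the double-Q target that distinguishes DDQN from vanilla DQN in a dispatching-rule-selection setting could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a work center's agent picks one of four
# job-priority rules (e.g. SPT, EDD, FIFO, LPT) at each decision event.
N_RULES = 4
STATE_DIM = 6  # e.g. queue length, mean slack, utilisation, ...

# Stand-ins for the online and target Q-networks (a linear map here,
# purely for illustration -- the thesis uses deep networks).
W_online = rng.normal(size=(STATE_DIM, N_RULES))
W_target = W_online.copy()

def q_values(W, state):
    # Q-values for all candidate dispatching rules in this state.
    return state @ W

def ddqn_target(state_next, reward, gamma=0.99):
    """Double-DQN target: the ONLINE net selects the next action,
    the TARGET net evaluates it, reducing over-estimation bias."""
    a_star = int(np.argmax(q_values(W_online, state_next)))
    return reward + gamma * q_values(W_target, state_next)[a_star]

s_next = rng.normal(size=STATE_DIM)
y = ddqn_target(s_next, reward=-2.0)  # reward = negative tardiness
```

The online network would then be regressed toward `y` for the taken action, while the target network is updated only periodically.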
The second scenario introduces machine breakdowns, representing the uncertainty and disruptions inherent to physical manufacturing systems. Such breakdowns not only increase decision complexity but also require adaptive scheduling strategies that can reallocate resources in real time. Here, a Proximal Policy Optimization (PPO)-based agent with feature-weighted prioritization is used, achieving superior performance compared to PPORS and SPT.
The third scenario expands the problem to a multi-objective setting with sequence-dependent setup times, reflecting real-world trade-offs between performance metrics such as tardiness and changeover efficiency. A Universal Value Function Approximators (UVFA)-enhanced DDQN agent is applied to learn across different reward preferences, with the baseline-referenced reward (Set 4) achieving the best Pareto front.
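The mechanics of a UVFA-style agent can be sketched as follows (a toy illustration with a linear map in place of the deep network; all sizes and preference vectors are hypothetical): a single set of parameters is conditioned on the reward-preference weights, so one agent generalizes across objectives instead of training one policy per trade-off.

```python
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM, PREF_DIM, N_RULES = 6, 2, 4  # hypothetical sizes

# UVFA idea: one value function Q(s, a; w) takes the preference
# weights w (e.g. [w_tardiness, w_setup]) as extra input alongside
# the shop-floor state, so it generalises across reward settings.
W = rng.normal(size=(STATE_DIM + PREF_DIM, N_RULES))

def uvfa_q(state, pref):
    """Q-values for all rules, conditioned on the preference vector."""
    return np.concatenate([state, pref]) @ W

state = rng.normal(size=STATE_DIM)
q_tardy = uvfa_q(state, np.array([1.0, 0.0]))  # pure-tardiness goal
q_setup = uvfa_q(state, np.array([0.0, 1.0]))  # pure-setup goal
best_rule_for_tardiness = int(np.argmax(q_tardy))
```

Sweeping the preference vector at evaluation time is what lets a single trained agent trace out a Pareto front of tardiness-versus-setup trade-offs.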
SHapley Additive exPlanations (SHAP) analysis shows that the agent adaptively shifts attention across features based on the reward structure, prioritizing due-date and waiting-time features under tardiness objectives, and setup-related features when the goal is to minimize setup impact. The number of waiting jobs consistently remains one of the most influential indicators across all settings.
Overall, this thesis contributes a robust and generalizable RL-based scheduling architecture that effectively adapts to real-time disturbances and multi-objective trade-offs. By integrating distributed control, event-driven decision mechanisms, and interpretable learning, the proposed frameworks pave the way for scalable, intelligent scheduling systems for next-generation smart manufacturing.
| Item Type: | Thesis (PhD) |
|---|---|
| Qualification Level: | Doctoral |
| Subjects: | T Technology > T Technology (General); T Technology > TJ Mechanical engineering and machinery |
| Colleges/Schools: | College of Science and Engineering > School of Engineering |
| Supervisor's Name: | Yang, Professor Jin |
| Date of Award: | 2026 |
| Depositing User: | Theses Team |
| Unique ID: | glathesis:2026-85922 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 12 May 2026 14:28 |
| Last Modified: | 12 May 2026 14:28 |
| Thesis DOI: | 10.5525/gla.thesis.85922 |
| URI: | https://theses.gla.ac.uk/id/eprint/85922 |
| Related URLs: | |