
Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection

Original Article · Neural Computing and Applications

Abstract

This paper presents a new method for solving the scheduling problem of adaptive traffic signal control at an intersection. The method integrates recursive least-squares temporal difference learning, RLS-TD(λ), into an approximate dynamic programming framework. RLS-TD(λ) adapts a linear function approximation by updating its parameters in response to environmental feedback. The method is implemented on a discrete-time model of the traffic dynamics at the intersection. Within this model, several control schemes for the signal phase sequence are considered, in particular the proposed adaptive phase sequence (APS). Simulations of traffic scenarios show that RLS-TD(λ) is superior to TD(λ) for updating the parameters of the function approximation, and that APS outperforms conventional control schemes in reducing traffic delay. Compared with other traffic signal control algorithms, the proposed algorithm yields satisfactory results in terms of both traffic delay and computation time.
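
For orientation, the sketch below is a minimal illustration, not the paper's implementation, of the learning mechanism described above: a linear value-function approximation \(V(s) \approx \phi(s)^{\text{T}} \theta\) whose parameter vector \(\theta\) is adapted from environmental feedback by a standard TD(λ) update. The feature dimension, step size, and trace-decay values are hypothetical.

```python
import numpy as np

def td_lambda_step(theta, z, phi, phi_next, reward,
                   gamma=0.95, lam=0.8, alpha=0.01):
    """One TD(lambda) update of the parameters theta and eligibility trace z."""
    z = gamma * lam * z + phi                                  # accumulate eligibility
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # one-step TD error
    theta = theta + alpha * delta * z                          # feedback-driven correction
    return theta, z

# Illustrative call with 8 hypothetical intersection-state features.
theta, z = np.zeros(8), np.zeros(8)
phi, phi_next = np.random.rand(8), np.random.rand(8)
theta, z = td_lambda_step(theta, z, phi, phi_next, reward=-3.0)
```

RLS-TD(λ) replaces the fixed step size \(\alpha\) with a recursive least-squares gain, which is what the appendix derives for M-step planning.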


References

  1. Khan SG, Herrmann G, Lewis FL, Pipe T, Melhuish C (2012) Reinforcement learning and optimal adaptive control: an overview and implementation examples. Annu Rev Control 36(1):42–59

  2. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  3. Xu X, Zuo L, Huang Z (2014) Reinforcement learning algorithms with function approximation: recent advances and applications. Inform Sci 261:1–31

  4. Powell WB (2007) Approximate dynamic programming: solving the curses of dimensionality. Wiley, New York

  5. Wang FY, Zhang H, Liu D (2009) Adaptive dynamic programming: an introduction. IEEE Comput Intell Mag 4(2):39–47

  6. Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: Handbook of intelligent control: neural, fuzzy, and adaptive approaches 15:493–525

  7. Cai C, Wong CK, Heydecker BG (2009) Adaptive traffic signal control using approximate dynamic programming. Transp Res Part C Emerg Technol 17(5):456–474

  8. Haijema R, van der Wal J (2008) An MDP decomposition approach for traffic control at isolated signalized intersections. Probab Eng Inform Sci 22(4):587–602

  9. Yu XH, Recker WW (2006) Stochastic adaptive control model for traffic signal systems. Transp Res Part C Emerg Technol 14(4):263–282

  10. Baird L, Moore AW (1999) Gradient descent for general reinforcement learning. In: Advances in neural information processing systems, pp 968–974

  11. Tsitsiklis JN, Van Roy B (1997) An analysis of temporal-difference learning with function approximation. IEEE Trans Autom Control 42(5):674–690

  12. Xu X, He H, Hu D (2002) Efficient reinforcement learning using recursive least-squares methods. J Artif Intell Res 16(1):259–292

  13. Ormoneit D, Sen Ś (2002) Kernel-based reinforcement learning. Mach Learn 49(2–3):161–178

  14. Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1–3):33–57

  15. Boyan JA (2002) Technical update: least-squares temporal difference learning. Mach Learn 49(2–3):233–246

  16. Hunt PB, Robertson DI, Bretherton RD, Winton RI (1981) SCOOT – a traffic responsive method of coordinating signals. Technical report, Transport and Road Research Laboratory, Crowthorne

  17. Lowrie PR (1982) The Sydney coordinated adaptive traffic system: principles, methodology, algorithms. In: Proceedings of the international conference on road traffic signalling

  18. Mladenovic MN, Stevanovic A, Kosonen I, Glavic D (2015) Adaptive traffic control systems: guidelines for development of functional requirements. In: mobil.TUM, Munich, Germany

  19. Gartner NH, Pooran FJ, Andrews CM (2001) Implementation of the OPAC adaptive control strategy in a traffic signal network. In: Proceedings of the IEEE conference on intelligent transportation systems, pp 195–200

  20. Henry J, Farges J, Tuffal J (1984) The PRODYN real time traffic algorithm. In: IFAC/IFIP/IFORS conference on control in transportation systems. http://trid.trb.org/view.aspx?id=339694

  21. Mirchandani P, Head L (2001) A real-time traffic signal control system: architecture, algorithms, and analysis. Transp Res Part C Emerg Technol 9(6):415–432

  22. Heung TH, Ho TK, Fung YF (2005) Coordinated road-junction traffic control by dynamic programming. IEEE Trans Intell Transp Syst 6(3):341–350

  23. Wu J, Abbas-Turki A, El Moudni A (2009) Discrete methods for urban intersection traffic controlling. In: Proceedings of the IEEE vehicular technology conference, pp 1–5

  24. Park B, Chang M (2002) Realizing benefits of adaptive signal control at an isolated intersection. Transp Res Rec 1811:115–121

  25. Abdulhai B, Pringle R, Karakoulas GJ (2003) Reinforcement learning for true adaptive traffic signal control. J Transp Eng ASCE 129(3):278–285

  26. Lee J, Abdulhai B, Shalaby A, Chung EH (2005) Real-time optimization for adaptive traffic signal control using genetic algorithms. J Intell Transp Syst 9(3):111–122

  27. Kergaye C, Stevanovic A, Martin PT (2010) Comparative evaluation of adaptive traffic control system assessments through field and microsimulation. J Intell Transp Syst 14(2):109–124

  28. Li L, Lv Y, Wang FY (2016) Traffic signal timing via deep reinforcement learning. IEEE/CAA J Autom Sin 3(3):247–254

  29. Araghi S, Khosravi A, Creighton D (2015) A review on computational intelligence methods for controlling traffic signal timing. Expert Syst Appl 42(3):1538–1550

  30. García-Nieto J, Alba E, Carolina Olivera A (2012) Swarm intelligence for traffic light scheduling: application to real urban areas. Eng Appl Artif Intell 25(2):274–283

  31. Srinivasan D, Choy MC, Cheu RL (2006) Neural networks for real-time traffic signal control. IEEE Trans Intell Transp Syst 7(3):261–272

  32. Arel I, Liu C, Urbanik T, Kohls AG (2010) Reinforcement learning-based multi-agent system for network traffic signal control. IET Intell Transp Syst 4(2):128–135

  33. Bazzan ALC (2009) Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Auton Agent Multi-Agent Syst 18(3):342–375

  34. Box S, Waterson B (2013) An automated signalized junction controller that learns strategies by temporal difference reinforcement learning. Eng Appl Artif Intell 26(1):652–659

  35. Prashanth LA, Bhatnagar S (2011) Reinforcement learning with function approximation for traffic signal control. IEEE Trans Intell Transp Syst 12(2):412–421

  36. El-Tantawy S, Abdulhai B, Abdelgawad H (2013) Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto. IEEE Trans Intell Transp Syst 14(3):1140–1150

  37. Li T, Zhao D, Yi J (2008) Adaptive dynamic programming for multi-intersections traffic signal intelligent control. In: Proceedings of the IEEE conference on intelligent transportation systems, pp 286–291

  38. Zhao D, Hu Z, Xia Z, Alippi C, Zhu Y, Wang D (2014) Full-range adaptive cruise control based on supervised adaptive dynamic programming. Neurocomputing 125:57–67

  39. Huang YS, Weng YS, Zhou MC (2014) Modular design of urban traffic-light control systems based on synchronized timed Petri nets. IEEE Trans Intell Transp Syst 15(2):530–539

  40. El-Tantawy S, Abdulhai B, Abdelgawad H (2014) Design of reinforcement learning parameters for seamless application of adaptive traffic signal control. J Intell Transp Syst 18(3):227–245

  41. Busoniu L, Babuska R, De Schutter B (2008) A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern C 38(2):156–172

  42. Bertsekas DP (1995) Dynamic programming and optimal control, vol 1, no 2. Athena Scientific, Belmont

  43. Gartner NH, Tarnoff PJ, Andrews CM (1991) Evaluation of optimized policies for adaptive control strategy. Transp Res Rec 1324:105–114

  44. Yin B, Dridi M, El Moudni A (2015) Forward search algorithm based on dynamic programming for real-time adaptive traffic signal control. IET Intell Transp Syst 9(7):754–764

  45. Khamis MA, Gomaa W (2012) Enhanced multiagent multi-objective reinforcement learning for urban traffic light control. In: Proceedings of the IEEE conference on machine learning and applications, pp 586–591

  46. Khamis MA, Gomaa W (2014) Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework. Eng Appl Artif Intell 29:134–151

  47. Söderström T, Stoica P (2002) Instrumental variable methods for system identification. Circ Syst Signal Process 21(1):1–9


Author information

Corresponding author

Correspondence to Biao Yin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix: Derivation of RLS-TD(λ) in M-step planning


In M-step planning, the objective function of Eq. (11) is modified as:

$$O(\theta_{t} ) = \frac{1}{t}\sum\limits_{i = 1}^{t} {\left( {\sum\limits_{k = i}^{i + M - 1} {\gamma^{k - i} } r_{k} - (\phi_{i} - \gamma^{M} \phi_{i + M} )^{\text{T}} \theta_{t} } \right)^{2} } .$$
(32)
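
As a consistency check (Eq. (11) itself is not reproduced in this preview), setting \(M = 1\) collapses the inner reward sum to the single term \(r_{i}\) and \(\gamma^{M}\) to \(\gamma\), recovering the standard one-step least-squares TD objective:

$$O(\theta_{t} ) = \frac{1}{t}\sum\limits_{i = 1}^{t} {\left( {r_{i} - (\phi_{i} - \gamma \phi_{i + 1} )^{\text{T}} \theta_{t} } \right)^{2} } .$$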

Following the instrumental-variable theory in [14, 47], we can rewrite Eq. (12) by taking \(\phi_{t}\) as the instrumental variable in LS-TD. That is,

$$\theta_{t} = \left( {\frac{1}{t}\sum\limits_{i = 1}^{t} {\phi_{i} (\phi_{i} - \gamma^{M} \phi_{i + M} )^{\text{T}} } } \right)^{ - 1} \left( {\frac{1}{t}\sum\limits_{i = 1}^{t} {\phi_{i} \sum\limits_{k = i}^{i + M - 1} {\gamma^{k - i} } r_{k} } } \right) .$$
(33)
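
To make Eq. (33) concrete, here is a hypothetical batch implementation (the function name, inputs, and default values are assumptions, not the paper's code). The \(1/t\) factors cancel between the two parenthesized terms and are omitted.

```python
import numpy as np

def ls_td_m_step(features, rewards, gamma=0.95, M=3):
    """Solve Eq. (33): theta = A^{-1} b, with phi_i as the instrumental variable."""
    n = features.shape[1]
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(len(features) - M):  # transitions that allow an M-step lookahead
        phi_i, phi_iM = features[i], features[i + M]
        A += np.outer(phi_i, phi_i - gamma**M * phi_iM)   # phi_i (phi_i - g^M phi_{i+M})^T
        b += phi_i * sum(gamma**k * rewards[i + k] for k in range(M))  # M-step reward sum
    return np.linalg.solve(A, b)
```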

In LS-TD(λ), \(\theta_{t}\) can be estimated as

$$\begin{aligned} \theta_{t} &= \left( {\frac{1}{t}\sum\limits_{i = 1}^{t} {z_{i} (\phi_{i} - \gamma^{M} \phi_{i + M} )^{\text{T}} } } \right)^{ - 1} \left( {\frac{1}{t}\sum\limits_{i = 1}^{t} {z_{i} \sum\limits_{k = i}^{i + M - 1} {\gamma^{k - i} } r_{k} } } \right) \\ &\approx \left( {\sum\limits_{i = 1}^{t} {z_{i} (\phi_{i} - \gamma^{M} \phi_{i + M} )^{\text{T}} } } \right)^{ - 1} \left( {\sum\limits_{i = 1}^{t} {z_{i} \sum\limits_{k = i}^{i + M - 1} {\gamma^{k - i} } r_{k} } } \right) \end{aligned}$$
(34)

where the eligibility vector \(z_{t}\) defined in Eq. (10) is substituted for the instrumental variable \(\phi_{t}\). Applying the matrix inversion lemma, as in the derivation of RLS-TD(λ) [12], then yields the recursive update of the parameter vector \(\theta_{t}\) by RLS-TD(λ) in M-step planning, as given in Eqs. (27), (28), and (29).
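
Since Eqs. (27)–(29) are not reproduced in this preview, the following sketch shows the shape of such a recursive update, following the RLS-TD(λ) form of [12] and adapted to M-step planning via the M-step feature difference and discounted reward sum; all names and default values are assumptions.

```python
import numpy as np

def rls_td_lambda_step(theta, P, z, phi, phi_M, R_M,
                       gamma=0.95, lam=0.8, M=3, mu=1.0):
    """One recursive update; R_M is the discounted M-step reward sum."""
    z = gamma * lam * z + phi              # eligibility trace, as in Eq. (10)
    d = phi - gamma**M * phi_M             # M-step feature difference
    K = P @ z / (mu + d @ P @ z)           # gain via the matrix inversion lemma
    theta = theta + K * (R_M - d @ theta)  # correct theta by the M-step TD error
    P = (P - np.outer(K, d @ P)) / mu      # recursively maintained inverse matrix
    return theta, P, z
```

Here P plays the role of the inverse of the accumulated matrix in Eq. (34), so no explicit matrix inversion is needed at any step.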


About this article


Cite this article

Yin, B., Dridi, M. & Moudni, A.E. Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection. Neural Comput & Applic 31 (Suppl 2), 1013–1028 (2019). https://doi.org/10.1007/s00521-017-3066-9

