From c922969d22fd02cc5fcb3e15c2162500f9fcbe7d Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Tue, 11 Mar 2025 01:23:32 -0400 Subject: [PATCH 01/12] Create key-concepts-in-rl --- .../reinforcement-learning/key-concepts-in-rl | 92 +++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 wiki/reinforcement-learning/key-concepts-in-rl diff --git a/wiki/reinforcement-learning/key-concepts-in-rl b/wiki/reinforcement-learning/key-concepts-in-rl new file mode 100644 index 00000000..83c1c787 --- /dev/null +++ b/wiki/reinforcement-learning/key-concepts-in-rl @@ -0,0 +1,92 @@ +--- +# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be +overwritten except in special circumstances. +# You should set the date the article was last updated like this: +date: 2025-03-11 # YYYY-MM-DD +# This will be displayed at the bottom of the article +# You should set the article's title: +title: Key Concepts of Reinforcement Learning +# The 'title' is automatically displayed at the top of the page +# and used in other parts of the site. +--- + +This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). RL involves an agent interacting with an environment to learn optimal behaviors through trial and feedback. The main objective of RL is to maximize cumulative rewards over time. + +## Main Components of Reinforcement Learning + +### Agent and Environment +The agent is the learner or decision-maker, while the environment represents everything the agent interacts with. The agent receives observations from the environment and takes actions that influence the environment's state. + +### States and Observations +- A **state** (s) fully describes the world at a given moment. +- An **observation** (o) is a partial view of the state. +- Environments can be **fully observed** (complete information) or **partially observed** (limited information). + +### Action Spaces +- The **action space** defines all possible actions an agent can take. +- **Discrete action spaces** (e.g., Atari, Go) have a finite number of actions. +- **Continuous action spaces** (e.g., robotics control) allow real-valued actions. + +## Policies +A **policy** determines how an agent selects actions based on states: + +- **Deterministic policy**: Always selects the same action for a given state. + ```python + a_t = \mu(s_t) + ``` +- **Stochastic policy**: Samples actions from a probability distribution. + ```python + a_t \sim \pi(\cdot | s_t) + ``` + +### Example: Deterministic Policy in PyTorch +```python +import torch.nn as nn + +pi_net = nn.Sequential( + nn.Linear(obs_dim, 64), + nn.Tanh(), + nn.Linear(64, 64), + nn.Tanh(), + nn.Linear(64, act_dim) +) +``` + +## Trajectories +A **trajectory (\tau)** is a sequence of states and actions: +```math +\tau = (s_0, a_0, s_1, a_1, ...) +``` +State transitions follow deterministic or stochastic rules: +```math +s_{t+1} = f(s_t, a_t) +``` +or +```math +s_{t+1} \sim P(\cdot|s_t, a_t) +``` + +## Reward and Return +The **reward function (R)** determines the agent's objective: +```math +r_t = R(s_t, a_t, s_{t+1}) +``` +### Types of Return +1. **Finite-horizon undiscounted return**: + ```math + R(\tau) = \sum_{t=0}^T r_t + ``` +2. **Infinite-horizon discounted return**: + ```math + R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t + ``` + where \( \gamma \) (discount factor) balances immediate vs. future rewards. 
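+
+The two return definitions above can be evaluated directly from a recorded reward sequence. The minimal sketch below (plain Python; the `rewards` list is just an illustrative episode and `discounted_return` is a helper defined here, not a library function) computes both quantities for a single finite trajectory:
+
+```python
+def discounted_return(rewards, gamma=0.99):
+    """Sum of gamma^t * r_t over one recorded (finite) trajectory."""
+    return sum((gamma ** t) * r for t, r in enumerate(rewards))
+
+rewards = [1.0, 0.0, 0.5, 1.0]              # r_0 ... r_T from one episode
+undiscounted = sum(rewards)                 # finite-horizon undiscounted return
+discounted = discounted_return(rewards, gamma=0.9)
+```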
+ +## Summary +This tutorial introduced fundamental RL concepts, including agents, environments, policies, action spaces, trajectories, and rewards. These components are essential for designing RL algorithms. + +## Further Reading +- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. + +## References +- [Reinforcement Learning Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning) From e4c7ec22575224b44769c4f162f4394ac6fb0e98 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Tue, 11 Mar 2025 01:24:38 -0400 Subject: [PATCH 02/12] Rename key-concepts-in-rl to key-concepts-in-rl.md --- .../{key-concepts-in-rl => key-concepts-in-rl.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename wiki/reinforcement-learning/{key-concepts-in-rl => key-concepts-in-rl.md} (100%) diff --git a/wiki/reinforcement-learning/key-concepts-in-rl b/wiki/reinforcement-learning/key-concepts-in-rl.md similarity index 100% rename from wiki/reinforcement-learning/key-concepts-in-rl rename to wiki/reinforcement-learning/key-concepts-in-rl.md From 5d9d56ed89e15863238a03954361ec9a24ade1d1 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Tue, 11 Mar 2025 01:35:08 -0400 Subject: [PATCH 03/12] Update key-concepts-in-rl.md --- .../key-concepts-in-rl.md | 21 +++++++------------ 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/wiki/reinforcement-learning/key-concepts-in-rl.md b/wiki/reinforcement-learning/key-concepts-in-rl.md index 83c1c787..6335929e 100644 --- a/wiki/reinforcement-learning/key-concepts-in-rl.md +++ b/wiki/reinforcement-learning/key-concepts-in-rl.md @@ -1,13 +1,6 @@ --- -# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be -overwritten except in special circumstances. -# You should set the date the article was last updated like this: -date: 2025-03-11 # YYYY-MM-DD -# This will be displayed at the bottom of the article -# You should set the article's title: +date: 2025-03-11 # YYYY-MM-DD title: Key Concepts of Reinforcement Learning -# The 'title' is automatically displayed at the top of the page -# and used in other parts of the site. --- This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). RL involves an agent interacting with an environment to learn optimal behaviors through trial and feedback. The main objective of RL is to maximize cumulative rewards over time. @@ -31,13 +24,13 @@ The agent is the learner or decision-maker, while the environment represents eve A **policy** determines how an agent selects actions based on states: - **Deterministic policy**: Always selects the same action for a given state. - ```python - a_t = \mu(s_t) - ``` + + $a_t = \mu(s_t)$ + - **Stochastic policy**: Samples actions from a probability distribution. 
- ```python - a_t \sim \pi(\cdot | s_t) - ``` + + $a_t \sim \pi(\cdot | s_t)$ + ### Example: Deterministic Policy in PyTorch ```python From 231624c165c963c5f0a1a720488ec14c728f3575 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Tue, 11 Mar 2025 01:42:11 -0400 Subject: [PATCH 04/12] Added link of RL --- _data/navigation.yml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/_data/navigation.yml b/_data/navigation.yml index 6600d2a4..6c7408d9 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -181,6 +181,11 @@ wiki: url: /wiki/machine-learning/mediapipe-live-ml-anywhere.md/ - title: NLP for robotics url: /wiki/machine-learning/nlp_for_robotics.md/ + - title: Reinforcement Learning + url: /wiki/reinforcemnet-learning + children: + - title: Key Concepts in Reinforcemnet Learning (RL) + url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ - title: State Estimation url: /wiki/state-estimation/ children: From e8d62b2afb83cd9d93e3cf6e1c047a1d8519dbef Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sat, 26 Apr 2025 20:14:27 -0400 Subject: [PATCH 05/12] Added page of A Comprehensive Overview of Humanoid Robot Planning, Control, and Skill Learning in robotics-project-guide --- _data/navigation.yml | 2 + assets/images/Humanoid robot.drawio.png | Bin 0 -> 33402 bytes assets/images/multi_contact_planning.png | Bin 0 -> 14138 bytes wiki/robotics-project-guide/humanoid-robot.md | 185 ++++++++++++++++++ 4 files changed, 187 insertions(+) create mode 100644 assets/images/Humanoid robot.drawio.png create mode 100644 assets/images/multi_contact_planning.png create mode 100644 wiki/robotics-project-guide/humanoid-robot.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 6c7408d9..979b7834 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -29,6 +29,8 @@ wiki: url: /wiki/robotics-project-guide/test-and-debug/ - title: Demo day! 
url: /wiki/robotics-project-guide/demo-day/ + - title: A Comprehensive Overview of Humanoid Robot Planning, Control, and Skill Learning + url: /wiki/robotics-project-guide/humanoid-robot.md - title: System Design & Development url: /wiki/system-design-development/ children: diff --git a/assets/images/Humanoid robot.drawio.png b/assets/images/Humanoid robot.drawio.png new file mode 100644 index 0000000000000000000000000000000000000000..a698662e28b9e4ab50c8fe0a6af3218bb8bbeee0 GIT binary patch literal 33402 zcmeFZ2Ut_vwl5BdqM`zVq9R3*Cek}dZ%Xe-2}mGxi1ZSQ3Q7kNq!SdRMnDK1LQp_J znhKE)N-qJF5^CVBAfDqX@80`=@4oln=e+%GzwOFgbB#61ZK2r-Su$xv9Wi85U~m> z{Jdfn;Io6ed$0;BvI+=T-oMXdZEtDoX6fwC;{x#jhJgE)wh))22^vs8dnYGLRsm&U z9zNg}hmpODHPqW3xYX1K{>RS;3=7HwSHKMsp`VvNB0`+Nh}`}AP7osqSj`@wt1K)i z$|EQO40EX6*4I#H6;J@Ko$MVUz+Wm5D@Q2d5d}Lps53C4%r6MMkMJKbsAp+o>1O{M z2?CE^5I1*wsLM|=1Vn(T{QN(;cek>1g8Upm8UaJytRZechX8^Ctb$6cd~yI0!XH6} zBLx6I?5%%Jy6@(3+g8(7M_)L)B>mTH-!@!SB?5(RB7v zakIQ{rvbHwI027Y`y4SOCMx=qkhSm65n%#Z0%cpbUvEF+nlPYe@ApfTL&kw=p7z!d z_akaQpY(u2ojmOC|Lvd^)Wrp2b!5dOik5C}1k)X|e>7wRbvlyyXy85|*5Aw^4CwuX zN(rMz_SPPDKj(@G9En93zYVdswfpsEei8AXTsm9+ntUXQyPYL~q@U0KgH?X&@s~E; zpiqF`Z$|t5_OD6KJ_-;gg0g?<@#sAOaQ?qP`xkAv{lyjjcgP7?^4VJ1DTwG9tBM%w z>3V7@*g(1Yk1X%*;rk1)5Nkl$M?ax%9(GV$sEehO(qBjA-JqT>))0ct2!8k1IqFd8 zePECu7<7Pmc=#S6$I{aS3JlqKI6EEPh4|Qe+$B8DBMjUi{4)OSjv@i51j>XPiUf=j ze*Cl`fjM9R`0-D)b2Q`*^>nj3n)T0xKEkY}n=KG-4ypaqd560Bk4^qp!>d4msNvz} z3mD%G;$-Pz@AbEgS{{Mk_Am1Y9`|1X1~BIcSN}0Dgy{Nr@cXA?@eBTx|Br8T0QYXb zzX65tlYo>%a1eg{6)K0n{(r=Z5J9K^v3>p}vGNltUY1UtM{vIF>1^o&1>&0kpDq*( z#KgnQ@f()@8XX9Mz}wE=1EP1|^3Wl@fqVpv{T2y;y1+>R>ID6n83k=@AR<;)z!Y~6 zH>e}zw>#FN;$XhN#sMJpdO&_9w4W^8xBP3IJd6bY3)VQIC@6dsXbFG2b@wBq=0 z%pyls)_+<#i2a*pkwdxu#wq`_`Coc0hpznZmk#)kx@v!XNB;wq4*%5v@w0>c?+2D6 z`G2b&)FGB`gwlW;Z0Qa(69s@mRM5u~$hSbx-VJJLWe2pF|2umNe~?TsEcS1c>5rC+ zzu8p76W9gdCRA^hK;QrO^6$U1WD|ynSqlL``G+Qvm;hK%MCAWc$;K`Cw}uoJIMTUL|KV$xV{rxX&%%k%AH)9&vL7gDnU?{NP_it@P zfxm0p{tFxNsGk1Kh`JDa7aJ(BvE>X@Gk_g`rBC3JLsuQxxc>KZN}*pZ_M)@;COqL$EKO4R9{u*F%5b5&4HU%+cNqpOVjgh?_mc z1=xT2ztBwiV`S#PFR}iWadWh%{Tpr+pw3`>7edASJ4Q4hz)rpOe{;qB$1tbpuidvl zVi*tvju0vTPtJxM-us(VIKQv!4)6Xt_w)0{(F26TQa`ts)Z%yV5fL#F-BFa&_c32W zk#Er(_^zOJZzh>hJ-ko-{ed075(Q-h)5$zMSDtQjaHViXn41C96&E*ZZF!pOiX?Ky zk2p^h;dpg<6M9c85wkW@ojFPTLFwdm;;AG$d2aC@$!eT2x}_64i0tpI?X;EA$^-`r zI-*u)12+9Oy}v--zXsKL6-^u?J}FB?l5zd}hc*R=7MA`(40@Y(W3?@K0drtBc;_6y z7+xjlmRynO{F1{++IR_@%K?AI@~f(rdmL z(Kj2EovSxO`?fU3v%h41H)MVENGq9CG3>x)y2$}kmRUfXxd%@zZ@T8NS~vT|{-&WE zfs`yIDJ4uUgqQ^&bh2FK>!xF3ejZdpyGYc$B0*v`4%9IfXPLBO6}5j~ai( zni2gmM~9#>S87>o3%$(iP$E($?UV;s->+_M=wWcXV~wA?KJ>jqYbSb0yR8vB3?HL) zvs2sp_D-JgRMB}^Y(+#`H-XhHmhU5@nMNhWKDy33xcU947qiMIZ^G^u?_`IYBBH&; z>*8?sGyp?Hq!ZU43`7dL+&v2z>)t(*G9zew6+`iRm-4m7O*e%B(l7P*V@>wIU%-x% zORC=`17^MOBnd-(iR-;a(25M{#ULwYH8K1PXuT0C!96s##IBxK0wByp^w8%d6K*2> zLL$MKmm(U2jbe152aOL_fq^LGelr0hyUC0r~z_m#Y-puQ7TGUV09IS3tYned7Xe`CI;YzKH$eW z1nN(i9}Li)w)if}f_p?kb%MaJT?P@j#B<7nK&p{~xEap4?7R^%;20@{2W&$b7D@!> zh2OR0ByjQ)P|K5dw0m7)sILkk4+#|i-x2)`3l46)lRgx1pwQQX)YJ{yI9M7*PLBAF zYg63Zr`U)lNAo2(YmC%8Y+R{0c$;a^%D7RgBcf(;u!n{kw5I#PceW{tb<(uPs{2v6 zwf4MBddO$kI10J*cCV!H$K?BqsU}}6Qi8n4Hyu6aTx}7ILGH^OrY9he2Ivwl^(z3z zxSa_Y1Al3f0N{sFvi;5G#sjRTZSgZ*J7ywHIQZyT&Wu@uY?1H0_d-V=eOlMa?t^X$%okIaFKL!Y z+b`$#+sMj{6{L16zP&B+nZJbFQ$jd=y>&vod+0i-Ri z+>UEurIbC%n%S-W^R(1%RdRj%;*RK&?PYcgeL_5BPnO;FNZdyyj7TAhmfPu$1!Mu&UXV?g_ z_x1xoGY^?2NC=v_L24F;Dw~eRH^Zela@4UY9V{(*rF5@LT<~KTQee4E0=uc+UEY1M zL2jnd?R=MmI-~SqYM2|^mq|^gG!HLiORHr4K_xK{x2VMk&Tbitk*tU4A5=b7!Wy*S zD}Hzw@#KIQbb|H$2EnJ9SwB_|P`_khbQjMFd%;}l{XJBMZ=tzIK(24S6VB|^d7Jr5$X1?;7I@fe 
zUi14(<0+^cFxpS5#L=joK@N$1JvhH2kggB>6Q$8rC2x?H6&~2_~=^-pD5igRAEb z69dul#DkW@Kr0~+%!renWCILH+-yE1{$M~&vccyt3!MW7=p9B5gR?vkoKFUeJZKL? z5#Mt?46-pm2JpQ8v6_HBAa8q}W5Ruo;TtznmxRDFBgoFV5Z*07HrHnh~7* z1q;9<{V5?ALR^={=42cTLzPcPdK1*S01RMWzLz?*5+QT)*&R!c8lWS(m_(R8cM@Pq zg(NzSaGw%pKl__!N&o|%`L2h#=p+CW<}*V?Y%Kq;i~6u-JUPE|eDlNxPzn-X3a9ef zkTEJ$agml?`u^iWxbG9mLlpC!A#NVyTd$@CfF!F(3a<_nt!Ou;FtQRI5@$y6SDbz* z%sY~uDC7TlXr@sTVb(2t_}E7(02nc(S40VLyXCz(i$QR~_0t*xB>2v=yurS$S%_tG zDH0i*#x~z-qCbpUm!1Hn!6m;DazfN%CMk)!^@;%&QRCBX8eM+gw-SRTIDAUrRBZ=Svvb5!9G-)1I-gVtGvZT2n_s!_HHDi=azeDHt^ zs0od(+mjK*xlU@mGPWhUR5(_Se;Ic-Dn*GD9yE2MVqv>(0HG3b_<$@AaWg|WF#(-c zq=9IgaKD}0V;H~5O2f2LFlE+?+?suO&ak_15aNu9Q7tu z5?=0FF`}+IxD;9%a+&S*Vep{=p4R@>c_^vUKoFN2HqK9li zG)jG;dg#zhl+*xA>Yv^j9*U5!4up}*7ibBN|3Eei;641r#i~O@5Q@RY%MT6?`2g4> z`;$EYK7VQtz~~$BZmD*Ka?_nxc{h z5sK{xvTmgLi1x=cp(KD~k>8R{o=qZBYUd5|BP?%;9+scfHtLOF9}CeUDy0pKdh?Nm zIO3VgmIq-yN<`95^`e-KR`=@ZbRd&{C$fZFkkGKA_vYyV!bGH>nPov5TyP(1X@IRQ z{&(`Rq#UXY-+s0WPF`1eP@8IY+m+)!5EUAah4==Z(oWc0&GSC=-rJExp29^w6B@rN z5~=eqIDA92`VW2f=~dzd2e)f2FB^ftNkTz6v&|~2v-+pNe_HsTLb{*l;ULl$(W@z` zkpWw-SrE+Jj{2-L|BjqjpFP9lS*oyOqQw5z8qf`)U`95z5NSeXs7s_i>ffXPF0V7*zOgy9ryqBc%m>I&qk) zWSM*c&zn++3VsN9-f;$|PRGo~sLO(V21cNqSF$OY$ZiR>4BwfQB|62}LtGO*3_GQ5 z#Bz=ySB2nj0ViD>g^8(r>`ks4t=}N`-el^T7tvy5T0Ly5< z?dRjImOf-fX3$a5{C2u%14)-PW#J7iHiX8wSYCe7xw*~^WyahQ_)4(Q!iY#NskIGd~*wvSY;VMWhP3j8uY>e%`I0XihF*O`rzR@Z@*>;?GT$yNI#QsST1g&SDMXwsLP!WSgmB9(3 zUdc8JLtVDAa$v`Bh3Gsm6yZX!mg>%Ut&UX`mzYT@-C#K)jpJl`|^lS7o9IEG#)ii)r z4=cKO+b~pe)`bT}t=%YfRO9&QyEV7`YR_?^Qd_<|@ z4Ui-iZ2o;ERS$=1^u4pWIuW{jFB>Tj7p?gSd$Lr3Gk>e^=Spc65=hG8vS_sVuJAq1 z>Lb!mB`EjMs$K-E{zt4+!ZNx5ZQ3d@MoosH?#WYyUJFBsP3ZTlM(BdvMh;is48Q~_ zm(md>P&BLV{80FSnrLOTkFt&fsUu&sZo;Ee$DaV{oBkT$?~)&Z>R^&C*c(7X7R}RW zh|)Z}C|;J{^ZUb^FXj>lej74lRYBZZJm{@ko{Q+wd>%CJgy$E|*1{$4E$WSzv3n>9 zkD+~sFX>!;2^n67>>3A(xD}PF?z%D_QO4bTq7aV)lggv#bRPPYb+H ze)~4P!`fu+OpQy)4f28#i(8P|o$go{I)gRDSf)3J;z(lRp?Z3)Z7%y z9q64<2D(zX)YEb@rQYw8W^jtwR1BhzicwNFU{$ZRKtptx+330mBoE=msakyd^EpE0 z<3L*Y1#_dQ<~7P-$8q6Gv$>wNeN`l|i~vRLP)Zwdt9reSd$>ENaPEU2gP@H%t`BRp zIVR=6_AwpbiwvQ2;xk;@9z$z$DVt-vu+GROuZ5V*eZ^7V)*{vY{jQbqtQiYm9jJ}g z&Q2Z0a<;jjee@8xqX?!o_o}M|4zsao3f>$at~0VZQK20#GL0Or+w9)ija8vd0@_hv zt>s?9C>RzG1TE^fe*2wz$jXW?KV>&e1eND7`oPhqWOSuDJYGO%I(s`=5GK%@;*Aai zyW@0*K&9YUqSPh{K+$!r;p)g?N9DxBLeC1-UI@4(UqrReS9k&KT%~{`a<`sR+W2^E zvw)oUAaQ2~la$XrRv%rv&vJ|ucX(5nTmDSEB+Kjf@ z-DrEwK~air7rH*C^v_KfC~e;PCYba%IiNjg@m-n6yr2|1$HMad`j5QXotH_j5d$4{ z7ES`YR1Ir3J)I$?-<7eCe+0ZcO^D?Niw+lbmB2m4#$L{y2p!vi`K;Ib3las#wcV+=otcA`1a^Aizl|y>I{(k>uRQ4#GYa_Hl6IaQkccvf5n7sQqypP-zRC z^SwtGvgNKZsEqMs63Dc#VFp4zT6Sf;c~xh@;d$qDDjT6I@ha`!@S(Mk6cTZc>$;S3 zIZ1rxunkt!F=K1-2_`Ufm)`3t>s%+vHE<72T<)AelIz@?cD;5v8(io8wBuGITc-wl zbx>FJihI>8I)fn}PHU$O^pf=^gex_hKeA9Z1G*2-n6xX34P7hMv4dp!^W4Y%YH~r@ z1>${(I*x`5>aTs^H3`UXc@^)`ANj7A0G#HM4 zaL*W{1*8W)R^@V9*L@?_SZ`^tF-r>O#~Pb{?M0B)Tu#A^wBMSin!x}qs7avPa?Z7c z9B5oq2}aGllS&GcD0++EXj_a<;hj^K_#ttxvhS#6AygVJ6*Z@Atc$=R{VjHQBej zxPgVwFpF^SZ|M?JV8FdYaQjQGNdlPsHJ+x|Kf-i|T#2*twiPeX36XqQ z+=r?4Fl0bxw0!I{UimdGoKc0Lk(?4n#LJes;AzMac3I8D|6dVt_r& zQ%YB#06Vo|dLS-NRnK-;MnZb>0}A4e>J~RzPvOMdIgadL-q8V(3FFsF1>=bGo!RNf z(|xVoK~LRCD+5E#F(T_HyaM_>F}r3U6&*LFXI_NTMW5v>vRF04WHpVw%!s(1>$E&R zxrFibVE3)SK8!Z9P}1ufu9n)79lmo@wMk5y4Yxj?8!*Aax>6S(yG_zx`}laNsCx6y zrmyJ9twO{qPaSWjul$MaDcE|_D}M9F+J$2?{5=ACn|<-I2*Cy+(Yl_!sZ)}$CmIo1f%ApiZv9#N zQ1pfcjMwXVQk4t3DQ1D3oe^Qax3spPHD2L}quH3P8qJE4*lO&gk&5Ry;O33Awc&mB zy7eL3e&vEBi-V!#AC|r2+tbj?kLPV05Ps_9Mzb&SsjZL)bTxT^3W z`@x#M7YIdvCaHg)L4jkE8J}K;3^cIPNJ|kDxXYMO-dPo>VF62)BxH!(MNajbm@d8UwjV%-tGg#6rR8mxF6+PgXCWapbLi1F! 
zEU7;YD?a4=PS9Q`ya@OC9ID1F5>-aRH`fUVS!3VXUZfrkf)5}|>Ndt*u+9-xa4ro!7mb@qQwvK5OWCmDbous|!yF4U2~c8n>X9U`$u> z5O++s?hCK)PPGgv(KN5oFR12FOxX;V{TouIW0l#$b%HJ-NB}yvJ5yOg0Qmd zNV}ZKu{7Jvb0%!KPDrNG9UaI8Exl8Z2OJ}AE*Y{mcA6XLnk7-$)p6X(1nS|mM!`5- zsDfrhv*D*P>D>p3S-oPkW&!q4T|qN@Ao)FzrF!^r;Rd+Jxmed|boEum zTY&lr+b{Ey@?5GJ=ai3i>;1VX{ZALA?47$Q5>pQf98jL)lur$kCmuC+lv2TD9I+m_x`>=cM|UJ(o-8AfO?9p9NXlybzm+5v zlEJr%Om^N~VLit9b5oNYDrC4+-9?A#lS%2cylMI56apX;L;a%oeLPVg#s`csP@{r@ zqHhHnG4t7UUaIg~8dB|dqaq|6LUk|=DBt=d=@3vDIwpytl$0N2(HJ-K5%#l*NFTAu zg3e0Zx|GBOv_&7uwXk{v`xwi2KcD~k^n+T}bFwc$KmhRa?KQOq?!_HH6@LQ^4sz)=TL}7hcd}J$A@Q`V4g6iaspH%y)-N`DQi8 zdXHNmeRlFVTBs3}H=p|(*LgXg%)NJv^zv-tVI>FBBq*bmZV6M zQcX8yKS6wML~0c{E5+(!-oI8t4d#V<#TuiU(Sr)Lwlc?=EuN63ZY@ozO%Q7~ORWmU z$to|L*ro-Rvix-&+x?eIZ3}nKiVVndgK86uQHl+X8r5{r|eZf zMfK3G@-`&rSi2^zl?Vb`5X)U@qP&tB@hCsefX8V+l;9+Byj^cJN)1+7Tk>;fseMz; z)vZ>;_JCda`N1K~-eST4wLwrPe`|guK*t!h-q6%7#cYwzqERci$~ym@ZStybsyauu zN622JfK&UWQOsV)1z1msdEqi_VRqpYWD_xKB!e*D#&@)nN(ifb325?L(3lj_8+P6~ zSJ7)&hY*cU8>x+)Qd?|u^uFEDr`+F8r_e@g@%3`Qe;i}|IL%tZ9^EYp|53U9ZDuXB z{0r56lqPnxdd_rVEYHOVI%pj+b+d*;DL*%jlpbU&{t*NuTy zrG*UBrI)z0)v6+6wTAL1Lw8iBBB`S??3=wCwn)0FXCnPkfqET!t0m?PTeXpUYfA9I z$Fk4&7Q<1`(R4cF=3-jWAZV6ft09c(J9Fo?Q(Jhg(c&-l_44?xzHCg$2D|Xg@$Ju- zhQ3^3voKM}P*icAiR8T(<=D_xw$-BDFMT*_)Xolas>|@#Gz5p7=lIJL%+*=J#!zf#cx>(;~-+p%c*(H~! zyz?#PcVbgJlh~Gc9MfF8Xt!r}qB*tN(KD-!Mjc0_I``<-Mhb2F&9{5q`vN8D-+k^1 zSyf0pKjJekyzmC4>F#G2>2!tN;=F2tOtpN&E81Q*DTf;tspU1VoeA8fEr})qvZi(& zvnYxWYG~^%q2ybGc4{%y2QIA|s9_R{a3Pvt(u-1_X+zd~))gzm(sP~2G{&6<08ih} z6kc0p&A(3yRj1yb=&(N|xFY(5K(ObSh1AoZ1V3(z&P15@7}qt;)D=dvJDo~tK6$HZ z)xKgYr7nGWG+yfXlo~jEmCU;>NoaB?!AN6DO|KH?nB-c-_|oTqK>^U-j~OGGj&^il z$QCS~)MGz-do?x|)N#I>O=s89RO?%y3%aH|N^JA}{LO7oWx>}^+LGQ(P}K@+Ohxi$ z5-;ZlXI=yloG@-d1W5 zYr1TvFr{|?+NmL?en329f_TI8qU(ToL+!^bP)FiD-BZA_!pV!tE=TANolUD1hxJ#$KDMl1)bA38)LezSzJA) zQ7;a>hn#jYm3MV%oz4%wuvd4`K-M>F>u(OK{Hj~gKBaNvE&oc*H%ZZa6o30E4 zzY^b`vYZaGW#sE0dU$4!d&-YC+pwl@KH7sjb$6&ps zBSg7B$HQikMV?|544RGPwY@~pYu@-0 zaKWlx=$vmVJ8nH{m?x+*(&YQbaR>a_FGfgSNsF#q(^oe-dwcRJ_BV(Ju_^FRJ6myN z2)(?_L}E-(-E^*L-zPH0-r!r%{gws#75~tkISb8>p#lGnE(~qAj9M4mS573&=K!^d zsF^iS03Ce9Q$PZ`5BB$*5*@ZbPos8o8(>Bi;1*R+Tsl&v>t1^R1gzyGwG~W;7)Pzw zynl_b_xIm+2Vj|Ud=)Zor<3}nY{ zF7pHDCT^d1<;c2UCFR`_l6aDf>UnNhd?il`z5-4ugv`2)wJ-JANz&|4d}>xTC8{O> zTdrT~R&RFrkvA-cSIdhnld~CaX4{_IjgeUmVW94M`S)A|Y z+w4^!f{wtcjhJUApE@>pX;_rnM(~WzM)lgP@AM-5YlGZ-QrC^I<6Z?Jm&&BPYAt}1 z5Ibuqfwzbj`2M)kRYmOCNtLm&IyvA2xCB9fo0YWa76X8rLhXpXD~Awc)(VrJiRZO% z?XU0#AXX7zFl=RpP`%N;DBsgt21(y}lPCe!B8@O#=7@`*P69rlD`!f;+tl`;C9zi7 z=dSZBePIIknCafY4Ri?8j=37~zP08oya)^RUwG;yq8%xP2M#!l@dZw81G~d(b+MRw zzn9eFUu)ZbWOFQ)U$gieQ{uQ{HXG@&deV=G$dG>6}I6Gqxr zI@>&}Y3`#*DQ+=0kVzhlT!Y16wn+S1GGmj~S*)B;5B%#W2xKSA&I=f1Kkx%)b}%ZmtfybVw++deZA7i zOp=58kz6xk`i5)k*C>1g=7u-~nY{uwTFNxh8duZCt?6G+S`A8RM50_LTcd{-4J&PV zwXnjawpyfPo<>1jyG!h_vm1f$fSpQPvbWgES^Zf-Sn$qUIO>zJBSRKTP#h`)GTF;C z9$&`)Vs3MoX(_FLh=O)~Ji@#{rz*2>b5QV$!3x3;gaEH#<82G?K+}a5@cic+>w|c? 
z{BsA8NfZN{cXs{Oip?BLHj7VxsCT(rDuTra4`Fv&2j0j;M1akO0WE;Sq;>sGxEg(35~`2D&Y9by}Aug{$C3zP{bBD&g7_;&$r`%*G>uxd#xUf%$A z9l8U9+f_)P`vaR|0PwfJgNNAj3hIJfy}tn;_fk~MON>!*B5>#lv7mzP6{}#BDA5SAZnt&Fb4{HvGwgYW7;mwt-6+ z(<~3h3J2net>3*0UwC1EfY`h)+d!G-T$)ll9drO&85pmBsH`;|v`)>~6*Gzn?w=m7 zH3O|r*PtFYlIrT~FNFx$?yplD*{fvALl{;NOCRwa1$Wal$6k3~Efvpi4@wJ;ICIb9 zVA)_HYO;H7uq0N9LF6f9vu>YX1RKkY56QLQX}kA~md^Kzi*#3Fved>@@8Fr#Tb4H@ zhH9K_dM?6d^e$R*QCtW1{3C%qf8iW&lH@ViJ?R|YvJUv5pmV}5+UG4ThF!*gZZYQF z{=#$jGXGNYjIng{X4_={)7%zMemsVyv454YNUah>hg!B)6oR#ik>igI4TGoZN?K#X zsoAY76r^!fJ3FDdd)zB_Fg#z^)mCdS6${50D+ms|XDf+-nu*9rN>~`mw2{~6D5tj0 zrA@x5G>Gk5l|^(??1}{%ZBzQeMdE^r2c|2mc&VJrO4x>Z*0Sf0&*QOoen`QT-gWPR za)LzDtP-|SB|E37)+F$B#+%tKL1PHfR|jQAusK(a_aCIj>OxJ#%b0hiY)jV5f+!Yz zXrmohH|vek!&Y$BTvukw$a#BGg)E%G#kx1ItZbYqgo7}F>d9gan|@ETMw~#{J1>2_ z+L0-i&u*PDp0_bZ_%fiG*@D-4oc4O3z;;n!A*`C~7C$KR8@$uZCCj5d3km%6IZ920H+#FAm~b!^0!6$-#E;!w1#IXwBG za&HilZ45 z8ukX%IR$A0xyLj`N{NIYw})U}UNW*zE@{pO3{bO-587x@XGvD$p2l2V-38)M*7%Iy zViewi0)MY#PpV)z{|4^Ed*R(aGt_lstVk<}`}C4gnm+wX3&m&>!wRGZ`TfRjIS(#I z`iGtH!D^89Z1=7k&isO}Wls94bpu^X&cI^T&2IS8r&C3?Usq!`gv$HiG)4dLS(tGK1jCrIBg&WQUx*M(s2jvcM zAr)p3_%W8Hc;AxF`-{X%C%B9phk!sN=a%6(OhEY1=GO{_Q2PE=EDx|Zzg&b6fVXW~ z?kDr17jp-w={i$wWy%Fw-)H^E%9NT3)S-BOX+x#W!}DvFgl_fmbRZ>HeRu zQ|^cjLiZLoci)Y<+er64BO5fZxCS=2-8W1OUCtfo*mdUV*T>^U+f;emZ^LYmPO%w- zmH7R(dLSB0yt@cfnndj_$1he?BaOmo0$?Vzu(Fb`k(rXF^qwO+3u#8?h{|D0Nl3Td zfK14ecTqwYHvGO&lG%{T@maYyY9~5c&wpxJ9t zXBUTFWZ&G8WFb3qVNl2Gqt0`Ig<6gcE@lI#iPZb2s=H||0RPRyQH0bRkSdK^;Tu1h zy)TIU%6lH>)z!XR`l=P3b?kKEfE7>IXOThhN=jblfZ+LFH_+Rgzr)EU2*Zb}3c8ZDKQ080HmXQc>|9rY>8?+g=XFKG zod#vToX8lAm(yBt4%m9DQbxuqkZraRu6Ym^)$q1NuXVCL0DpMl9V)aED+Oo?~6dzd$N02Y0pdV6-UP7z(g zPJj>ZK?Dvv4=J_{obDV6_OL$D9oVJ^oK(5Y4TC%qM;cwn74GlTt#95NydKRRvXp`< za}|ZkcSujZ-S4{E(g_bt$-9}>8wyJ95S42@idw0n9aVY=u?LO%CoiP-T=5Yi}Hw8~x`hg#h z2Rpbgd!zPH3pyWxkQd>sVJ3*JPPElCKL_v;A1dZ|i(l*7O|Jc5?BJqkOgIHDBKa2V zgAL9a^RT9><#j0}@J*E9J_P7)G_@pSn!QJC(*ST8yhj5kcXpSkE-ueB6$+HvE-!Hn zBaPOFN6h41hPXuL(DQ5JQ{yL$=M994?@0P>=i?)&`vqa2u!GcZcIa96Zp zWMCn+v$UJ*np&Nbmv6HNH=xHCS)S0pFryGKIPthpreSwl-rQ+bb5ckPCzO(w)>iyLr=R&vvsg#Gpu-!k)Hd|no0pasYT1p> z^*i43&U+Sw6M(J-G~+eDVbmISxhMO8QVDJEL0=U_HX!Jtgg#iWMg)1cLp_Z-bAoK= z)CE@h4|0Q47RL8?%5|3N=Bw}#@%5Nsd^am6jDrmw?r|n0_RH8NKD*S++8 zpNHAxF#248q%8WorgdRVfh9}#jrUv%J-x|mJ^OJ%Z>N#SB?%T5uD%Jfi?*Q|^P%gW zvD2~qv+8N>=mC~XaC~q*vZCUc$-AaL`bfa4{IogCil~M>pwYE0bQhi)usjzJ3l(iA zx8I;3rMimyOulw|GEk*^k==e~#>Q9IuDdw7x%_^oB#Fez$=Se?v8R<4XIICEwA>BR z9~zx|_6Aizi;$)nN9%QjYWYApzPL7lzSQVIvL$2xR5e4t8H@%75@u8#oQ!>Q1$TEp z*GRS|CU2BsyI<&=OZhl72W*_du+|bJ4dzza&rRbfCQBKJ)UA#4nlv8X2EzwDf!;-o zA9n|^6trHMNN7q}7bvS|NorXuW0_U5o2Tucr9V|-VOV%-X!lx&rtoXG`%kal$iY9w zCj%@(A;8&1oVwlMr=1zy%kGzL2W{L?PKG$fR`1Q z&VT@2?T8(4&v4Bij+8n`&7lrj?8LS(HZ?E+i>WiG1L9NyuQ=ZH!=Dj^=>!!?LhBb@yM#XuEII%gq}5^V>lvwD zTeKendpEEdz8Ix|6;=~^{e=o<*^*7=XhvHD=#AwmwL|%mKh;}-*>rJ-=?DYD5%6i3cF%0!);!~J#aikemser2Ol<<&O!(0H2OU2N-LwJMyd5^fKq1XA6tH*gC zH>hJVz=xg71|6or@`u!85RS?+dUSi&4?SuK{39{7lW@-+nvD0pqnL4BH~E+TQ>T1C z>Gb>N`A-9}KlUrFg{_5GAk^X&R!+#m8#pr`((sku^N=~KL+0jF>RHjo&!Pin0v)&M z03Jooj+tY(MXOs!Ygz7=CVB4_geWt5n#_uA+;~}32F+EKyat-AZEF$Y7-%K!D7Xia zG23k^THftX{m>tX`?6gZ>))RCsVwPcVmPW$rJI~$O-OTXz%KrIz?D3_Nu)#x=_eVV zPCfCp2`jX1&vM;|g6th18&s5D5XHKEI;7-0rypKjho`1HNkV*a0-oC{yhg9VgDo0T z9_2^84nTF9#Be8F_Cb-Fj9WFV=3)cl{Z4tKt(p)!CSIU9-yx6vntOM@D9e+`li6)hG#;EP!D*iE}2>fmRJ|62O-QxJOuPH^9L+FogoT zzPlFj9)wc#O%AL>EQ0EqaNUd%-%q zq<+N<$;}q)r?mu>eT9TYKQ!tCox;X|)xJ%d)Y_xhsN)tEAL+Cz!d};cXA9Uf4R^jh zj|##iOIGJ~jZ-}z=#Y&1y4PWgppFqUpJxf|awN^HS;_q%S&t#Y4=;L^^|z128$e_XSlMrZrfNBUu)Nn} 
zGF|xM*-&ncTdcL0AhhK0_dYNrb?Vu8($#9tZ%QtZy`0b=gmX~AAS>X6T_vUu!JTS0 zK^k2^r?g*6xXO0oNe1A(r_&;W4xuUB^4tqS#dJ{UTEH%_SmU>tf5P{J0zh8s_3(B_6rpI#mOwt1Y%t?6~v4rkoAfG{{{z$PN4%WGY3f2WaM;eXaG-eQA${*8P&0=l! zeb?z+S{W#V@=+#?6aszQ@%0-L+186rk&X>3Jeg`8|ZqaU)?Rrgc|=<~>2Bw0ig*beot|> zvU4)v!uKWfx3~cp)CyhFb67J3nt!?<_N~@fzxd^L@0IFZ#{n1yn9=tXc{ihu^rD2opktmHV}uObk;_7q?<$xopxenaHr_lFN>;Nan<5 zi0vF^?w5wGS+?J^>UCbfzkc8UpYQAae4gj^e4fwy{r)^jtAz!;g8cr&qFqN!DDQ}k z6>UUfIBOn~8C5ede@K4S4h$fpB!>1PY`m;9pVgq8^5F1zknWc;8<$hen?%=PN+3}y zwk7Jy1R}goF{6WX4cxw0yiz@0R|b+p`fBUTkb5iUy@0l+fjp&xDSxCOS*6Yw^7i)7T-eqnj7LwTt~ac8 z*8xTvM$2XNHfZws9Uo&J#c<$LOTjnpJ6M>(NhdVz0?2kGI0X{fX_giND9bd{Nk*#8 zW6<2Ch*FH@C2*kh0CT}&IB#;BjAff^@LKn|{c z(ZBmxyeB&5jd|94hV^2N8)rwZD6dPZbRLlw0qDwswTAVxbr*|7!lPX!XC=+ajR)xW zZ`gCp@16_mO(y2#rfY4fHYiVVdSInn7afWC_`kR?S4*_uJky-$>}k2? zG86f$AS376>fk;MDw}wjId7jm@N46e`y3VJ;M0V1Z+*_$t%q^%in7Y}REUH4C5zcS z(c|i2;9IU6+vyhy?$)N((BBU2cx(%N8HNgb0WF{~`Du)r8hSAJ9@}*y+5;k6X%s%= z7GCC0eq0$Odz+jA4;b#OyN~a{4H;G9MQ`*MEo;|TQyhpT=%L^bAq4AfW@T&qLJf~A>v3CI89_qN_9}BQ_2TC>MFUlRS4EnGIq;kSD=97pc zGx=-8BYT2-HopOsO~pTY;*T{naJ9SbTX2cgMl`6>HeI1wt+sm0nF7f6LCK3SxF7AM zIAiAtNo$xhGr7Mv@c4wp&z;YvVOY8VSov)HYgK(?YaLsslOMZotxvb?3Lw{2^D%0 zo)LVc)WxnMH8rq_4(cj!?oC_1Kss*Rqr%_0u3fF?vN9l2{nMMgygO_8xPV*XKpu(| zbhi-^aR)+rLGw%;F^YpOelw|fY*{$cG2@=0j?A}%-w3)WX2(7fI;?Tkk)%*|M zoR6!5JK+)pgP$RNp<9NVh4N9Vv^t$lKy*Dc@+V34+fFEPR16-1t`)RF8NA&Q1Xxpo zVYyg~i==_HNBG*g6(f{*mlEkF%fs~niuiW(nrG}SlUNxGrsl6$^aRw=^8=YJ@dq1L zPDVBlsHNr>_1d1Q0W6@gXegyh`__c?{`nmB6=ggq*%22>c+^y;@$$#jXfi~_BKf$g z`lLlr--{>MM5DiAuplDR{+d3@w}^yvxts3FT_ZL+@)l_M zqE%|9;T6nY*^s;wNB)%IsOlICOsRZ(k!~WYB}h=D@LGvG0!9}4{m)2EA2Q+VJloc7 zk*tY7FMZf=yC1hd{#!{G4B0Hkp<}%;^rdSi*2U(! z*Vc7Lm7+0C&3ZJApyY^cV8qxjS>|bM_&SblEW6S$ziYW}m2XNJ8QNW}CurNrc)Cjq zb!X$51^ST;SW|w09E&*Kw;lLZ#AXYupTy%SW~&?560pAIwafWbcvw^E;H|gW={Bi< zMCcW`rWDlKs#mj17-sc?V#+r0$fm$^n=@dRclX`$%>_mq1N>KTs^qeC`skQ5;Hs`^K&Mv>8j-E;qEK(6O!cE>-P?mfJ@@( zTa^Ywxl5w>|F|T^T@m%d_XWh3o`zPzNrz#LYIZQN1L?pQ=fo^qRXMmr09^C$S{qO0 zzE?vrzJUv)Rm#DIX?zJG`~a7O_3bP7su$9&xlXXMXk=q(UfkZS^|w*x*C~TRGxQJ| z6gxe%Pta!KG`|@Kl#Fy(1|;dSxfqAk|FrNtEjGC2QG>bspo!vM zO$C3e12z={+;^Kw#UdFZbd_GQio^VEW{hr3no_?w+dP@C!cVaq;K-P>Azht%Os*Zm z!~O!+$=5p_mR1=4>aw87m}zs9830^Kbl^&AsE_w$I19SHz`T>b&-;Xe2}>)kz!P@weJFC7VlU`pk-n>d~Hv>e!5!aaYWds+*McYsFix-QYbr z-PEcI}%E>@)zHdDI>Ad<1E>9yPC}KLB}0sU{`|4k3Yrv nV{KsPD}ca6{V=$Ya9Eidac9~U7u@^sHQ=+cMp~7hymaTk9}0Qy literal 0 HcmV?d00001 diff --git a/assets/images/multi_contact_planning.png b/assets/images/multi_contact_planning.png new file mode 100644 index 0000000000000000000000000000000000000000..1e501fd771b994165691730d0e1e4089adfe7d66 GIT binary patch literal 14138 zcmeHu2|Uzq+b>d-B~nQdSyIV1V+q+pl(mG0EE6+hoe?u*-x4Y!JIPXtVk{Y3_7)^N z85wKYjV)^t&OP#f{yoq0Kkw(9^S_Gtkpy*>`jw4Gj&; zB`x)epL$8L88NqU!AIh%iCi)V2e1FG@pY#h@~Pd`esE zny!uzL=C)mv~xg!7fpn<1BSXq&DI6u1Sl6Jp#YEi4-gIENVtpLZz8BV;t?*cb{O<8 zGaxcxsifpD?OmJLoo#h7HV8+s&BlF4 zAqnYi1pvk47e$&XmMR(L@(22k)>ML_o#!8>?Ce9~>}(LOJJNpb#9}axSUcyxCt73B zXoU4n6n7-TU0kT4+g5*vgv2=R*t|n<24?-;3M#?ypIl0%n%LQ3ZGWvj2a)}y3ze>g zutVAYfh;K__lx5M|6}ovC9bw`kV(JxZ8*xvq&oW# zKkndw#QERf`!{d6{H+xJHR2#}36!<1nv5arvJA}dDo$Swi4lc>YM@5m73=Xw!Xj*d zvv(dbE?8R(3WJ6_YWzvP;DW)SZ4lI$QS0u{H98oKGayO=qCEnO_1MWAI1Y;eBwMVL z;|>?$Zil^2-7Y2#7}O`&Zya@MqEaPO8S2y=r9S-9ld2pbfQP^E&JM{HgLARoS@qXO z?<~Z^T~MIkY>WHrg-|N$zZ~>G@&i;G0`v{6iwAJ03&IhOwZs3ts_>n(NBwOb)${)= zsR7pPWY)i|%U>kmU)y!=KTFg>=DT?OmMYXoYOZXj2le64WZ8cDcg+^5|DxHl(>?!x zkNC?D3X1r@WRKWsl6bfyZl{mw;=ma%s)j*h;owwYtj)p9ci>EO4{6 z!y*iw;oCjg4U8m!_S<0qMngw6j3egPa0IoHMx3*e0!v)6E*J;IZ=7?Gn@|~86rXLC_uW5?9SQi_UFVQNfkit;kLg` 
z8!|hz|3KY-J43;^`e!5!5pWl4ThR+}S8$B~UkW5b5`_d1s@Gu{GZ(Uwt0VdyZ-&0+2;OnpZ~?!*+BgePd z>hMblJL3Ifn(XSI9(mv}{ZWB779yIN+ayTW>djZ)hDDL%(UoQ)jWtSMmr{>!*o4gB z_jz?3Z=Kyxt`<2aQ++yHrh;EQ&-^R{^Knt@5Wn_eE_QkP;*x9*sq z@P`TPVRUYlcL(dIUR;t~ zevyWV7ZrGKxG#VoPx>g4>Akpba?#CaB+d(+o8Xe;ryuwNL0WFQx~x38w1^nkSX3|S z+Ti}ysww68x!1n9*VD%$(u(A9uYEw8E}_VQ>qN>`#)Alw+WnNCO3i1j)BG%e!hM7`3PY~$R^;csIj&7JFZ-)nDq?5lQ?*y75ZtK;$Z z9*50XGBny8SH#;jcE?)pAy>%_(GYIDR&83*>1>A>59Q?v!&sg&$oP_E#lXuX>x_tr zWrm1*(Zc9XNlyz^_&++#j)bywE^aDcW`Ovt#NMM-`rgPe+L-D4yr46ugxKOh8e=Ux zdZQ>?#%*D0E3eeu9>GA+{d~SmrdfXDM2%x+UABx;{m}Cdk5v^xd+>UE{z8c+u%l{f zEPJG!Q#TUYqM$aLvZ%%d!QSQe962R|%n2JjUtr8jH#I}(La#<{`95=*&W-qxm*nc? zaw8|AyBtDU+T}RqZoz}eGcP(Rxw#bRnoy)T(_zRB%HZ5W(^(4JsSZL{WI@?@h5e|# z$4Ub#TDYv4$JP^-Gi#U4>vJ|4g)*n46e;zDe`M<@RtvKI1_+3C_OgAd^sJ`GcR_LNDQPjW*zR)f?uv zQT7@_qjl>CJm}N-a~+}M^q*l~DK+w6tV!^3+29^H=r4(KBq4uTx5fx=R}-1C#C~3U zL1h=)_lGn{p5g&KcRwBH@tNsX0GQXs4m?>xU)zE!d1* zsvdxBNIxldv8(FOCW**8JuteaoPfLXj4KlcKxISEZg^_c)F$JK;++!)Su?6WcK+1E z>@1Dz`Attcuj};ksXjU#i#C)}FIyD0%vN zB@k8~$NsfJn}*n;9k!x+V7`{n++REZv-F`b%WV~2x;k?x_6%&wLAUi07P+8nH*T*QtlXLWAQ`!J#ae+p=e4;kPXFg3WeK^3+!I>ZQy)$>yp!Sa;dx{m?wE;Lfoy znCvMfu|O5-`Lst%YHzB4D>7%m3b$}8Q!CEo8d=H#Px_L_gq&cdZ6YV%V7qjW{sP2d z@lb4YAI!D2O#KZ6iv^GqS6l96v(NVsy4Lf%#7fA(+ns$SeXhhpR$JW^nr_X(sg&hn z?!eWFp7aH&(jRB^XJN!TKC4t3q*Ll?MNh374dTk)^v!6P(xNi7ly?xz3?9$Qfg$yBUJOfJ;M6M-LOl7^U7x zJ)lAMOBG_L!NE<=bR3O1L~Lmd6Y1mkhn2_fe|q{JNWcT%ZwJ>BnCO}%bOrojvMi!J z>*_SbmD6TFtZNC2(BXY0AlcEQfwq^yu6X*YG`J+?whC0I6OeV)wCj~XP568?-n=eJ z`Po$ZRur5qWc5{Jj5E5yv-tcaho1fr z=X;V20kXXo@d(U$fz0E#E`?{??xX!v?R@9SjRJh9J(tq={3w2_&q(&`6<@E|%}A@% zO}pt3gVQV-rILZ8uO&T9-X1E-jRV#3L|01GBU-4!Em<(0IMeKf)DwuEe zEyAXMOm?(Ju|DR=Zna*)`MzwX=a>jz2>v^#6JFbr?bKu7-*YZ|u!;U(D?<#IjJuS629;#)Es5#_ z_Lf3(KJ#cn=ql!s0=haHSe?c^sjPF6St%x@;ivWq_HOEs#LlR$m1b6g?r3jk|I(BQ zq5la+qV+`Cr@=KcAQ?*b@^Y%BIfGbz`M||SH_8EeW-Beb3g3P(J?4^V-Fqgx36WL!s64B^2>CZa)j)kojyp7~)L!fj#IC2PehpFCMQyn>548CV z_<_UKzcaH06aWZBB8|>8O!PjW}=_IAl#3uY>5Cz0ne8 zI~4|M=7{~C+x`jw!_7riknID&Xt{7thK&*Mc&=Ya)SCx^A2MvBKT)ITAVotv2*g&Q z9)3U3MgvBA7M0-xj4?o8F?J<2wY0Oq6xawCC*1?!kz>PkMr`wsfVaT?LFocu-MNJN z2z(F1{S)-J^;q$zLH>t;R)@l{bOCRHDM3$=e2k?amUv-Kb-IAF0QzifTet*Co#(W~0=4iZ1nmW+;-aOY) zIs`DKk-xQ+%@%m1|A=hvZU1iIE$S4(4h{~`RN3yn`n)^x+{s8bDp4Ag z%^m7G_6)htr|Vx>HpyR?gGw@fI>~1JN^Sp_?M1kAYlxF(`noFp0B-2rKg@2^kazHBuTeUGZ*$WzqXtph;A+Qdl-6c2!*NE9^@_}YYk zM4l+dkZtqcwTnXoRFFVkd6Tp4#QALy1#xYOSm~6gj`;DA^JdeHH6k)u#;33-V??hX*6N_?n(j{fE+aU@T;Z{l8h2nUdd2v$=Y{{`X!JHSUjO(phZ zYDFM^g#c!6eFG#t05Rc52N&yXBgbi|yRpp&0GO7L3N~w*ZOvOUSWH#4GBHb_fj z6wL{%R4Ry)hUeMPZAUg*9{BC3p32ylXvRELf!uVMj5Q+Mc#~b;#9a-?Hu9WRmBml`(?_+iX9tAle z8hUq@4-6njLX-DGtnsHUKs3hPe~TaRK*5Memh!2u-P{6k>i#%`1Ma~^7x?fS7vZEXeD`!#4uMzx zTbPQc+gthhvjLGI$B4e^k`9Z8u+`aq#PU?vbe4eScS*fe9Gf-iU7~B#+Jid6@~r2| zbWcO)m%eOP6fHE`oBq@69o)kRuAKf8Y}vl)5QpWLjz7PDgE)}R;=5)Eiu|V&Xeen` zL(HgH6~bDMJ&Dp?+)#kBjO(^=S0m;1pPH*)4>0)s`VpNYlWcKw5-(#_`e(_+)#B`HF%qFEcsshH!hRyH@lh&1L?6K477-#@zs+R^Sr-!dT*=tE=^^h3@ zAhoFKFor_c?8i;3`^PNH)2s*Cy)F+1l*b%p3+xrnM|XL8+-lG6D{t{P?x$E& z;o#9nsBi`eTxz7X8GNsh@+9~$Qb4-((WF3(Vdn8U*hp=N0&2kkJbm zY^P%(cS)F%W5WHN-zKB$V-C;<%B0rhp)F6LEEIEp?D9~0t2;g@LfF+B{aNGtle}F^ z$8zyqO);Qn<-0x>6VB6S6Q}XTT4g=@ z6BL?_vX=^GSKy>#E%`qE35}13J?|%rMxj?Y)<96uQ!U2V5{g8lSXVhbBMX93{*BfpM_Fi92hhv+A|Eo)RT{VVR@#p`Y9ohk1Bx z_!a5dDp$8e<+<^NNO<#xo|EEAjCN+%1L75{C!8-YzJ8Y1Xga+~dBQLnL##G_{WED? 
zHx8=LWhon`d=@6iX@O7a!!`C7cs5Uc^h235+4uF%R>08J-xxZGnyqESi7YWKRrs~| z(RIVzNS~GCQ65eGj3gO5$X2vhcSqW~c-0 zhka$n3QeA7d&%WW+OM=JCKSh2)xgn5cF=O>Iw=ZakCOvF113AX>&2P-Q<~6I1w$zF z>;#;Q^r6IZM|SCfj|}F>WOKvjGSiOEv1ep%zqi#5ac;#}GA>)geN@^o`t?L|?!Y0( z&-tpGZaut++3E5qo|ixNUf$?^p~k9>-!Gu-dAI2`h4gI8G<7MP3t5(*;b!Q0!<>-n zzV=N}&y{)ck(y?BmiK~e-j%IQ^8M~R$vEQ%nLKkvVlKjctU7z*)2zpcDsOc8?CZ*> zy|fz)*YpxTO?Jc%zP_CMSW>zNA#S?hU0^-i-=TT@=ge&zo>Pipsn|N>vMvS6rNL4y zy>m`C78Ewpm%9e4?6RW?))8_R88_A{ZA4E!$igiJn$5*#V-hUna9Krry4;~iUuCYZ z5!&%s4|0sy{mK1}lbs@m;$`rzGmo|gjGWpWw42B8w)4h6a60xGX0}$=m)j(7K8NMG zcJ!r@Kv%V;%;-%IuZ>=p8co)iyl{m}$Uzq$Ve>>`rCp{BCn+!7<_7MeyCUS`R`w)s z#l^*C@y7YIAxQU0*N8G44!22fp6gx>@bMYFZ_}>0`6=C{=TQ7L{Ls-dvGZek7lmf^ z8{>rXf;V0p8iu$-`rVsQ3YDydFEb=pT(tQaqOqLPIQV=?G;=YpzC)f1_hCW@>4biJqa8u2km@4f*vx&uLYQA?TVHVc zApfL{uS9R5ufyFSf2v(RiQ`Wgk&{Z?8A1)JjDkU#vCF00}&2ov>-4KV$z$=n3Uo~II z3H$W3#?kkb9x)q_$tN)wK*c%Q;Tf>zV!UUMN+|-Et`?mxy@Ud0UrLVDY+w4}D(vXtRK~qTe#D)hwZWB<7oCc32P$ z6>GoP+a-dBHJc`7L#hc#C#7?7U4-oZY+XMI66IwaOe8DgG#>G+?3C{mN9xP-R+-A5 zEIejHO0t*Q@oVo~a!NmKxVq*{^lDA6Af3MVbL__~t8gJVzK~yJ4iZFOq`)X4K{!8o zRFVBNh=bwaq<0s@!IdYm`v$(Y*37Qi%^caRZP;u|ySW@#-8bd_R!aDgO-`m+XV44N z3$dq3y|AQ=T=12HzpnH<{d&8@g{HM{&SJCDCakegu1R-E3gN1gZ@Fi!7)8;+RnzLQ zl60WGeqg5CeOQxe^2D=Le?7|Qx)$<{)MdOx226PDTX(PF=a4xplglA6P86xg z-qy3^@c^t>in3&vF;~`P+VjYs)TPr_(HdDLj!?2&@3d^nJ1DTx7FH$>ALwrLi$WaV zJ@xh6;<_w)nN2s&dfrsQVey>7;JAT*SEbtWY5jijPnt6zsj~Moj45&w+p_eN`-m-P zy2pyMQ6#c|Q#DdcnI)I^e#a`gV1U>{BpUCZ(5akKNZx$PUpesUbN8UT?!x1X>8pfe z(hjaDpJ`OY;MB;+u*RPT*|{%qtz-hrnf+^mI)mEXlt?d22faEN`oJ%1Bp`nxz?KQfu+J`7|O^g znv9TikjAdvCH-Wg+2{-Mz2DVTPAZ&ACwC65y?N!SKxyYyxRINP>oPvLbZ$y5ca!Np z=CN=1`Hl6=sBwWMPxsCJt)ret&qu&Aw_3>F1-<8|au1snRcE{~I@Td)upzq9dG$4V zjUcwQ`Phrpb2z-g-0})#hHXxJ)NC z@vFrhH`70Zv7`WAo!ZtNz_LNkL|4}@HN<(`w)T^9doDGBqoLW|Nc~#?&-dW-H;qpI zy_H%$be!6+h_#I+z85SRCc0%c#zHxXE%l9!Br5i$C;vtxQOhG-kutAAv`qDv7wz4< z@w@Zxil2TJTS1^sUIi{Mx?5^**2}+$FPg6255~5^YP%Dk2$~3;g*Gda&f|)Pumk(@H0}2>n%@Ive?z4 zs_=ZW18uXcvyoS`X-@Z+^suxOLi^MbKO!_wyfz%ycD+swSiXwMyTwuz?nNZ(Rnx2U z1X3_k*(;_hMb8{6k$B2i?H z875a|gL)K6?_X$%%Ry#Ww?G>kr9K4}SrFrVoy?$?ok(%>2g?UNBkjX3k7wTy#S?ZZ zaIsP?rj3oSe<#@Qa_lWB(hig2@vZ9{g+&&f z8J6sy-7imDa)B{Zqp1scFF*QA8@PFF%qzU>>SgUjCa>a>a!I!^;cv1NmzU;R9^Mk> zwe(QN=BzSgyD_;flZz`ZwJbi8QFxy5&EkUi)d$f%YnM|)c_?XB`>4qvTMb(`t`7uX zvMhUnFImPtbj>y(9eszX<}Y111I%y9qR!ZZKan$LJga0vm&5L^%yj#_-P~>R`F8a~ z9u}R*MxCQG!b|*Jr{b4zN}LKLRi8uD*&E|aT*(Pnup=$UrznWt@Y6K|lXTpH8)xyw zW5(*rgh2jN-N$!rppW_SE``EmA6lncbOysv+7h*5$1<0Q0%Pxz#j0~9mT>;Q@47(^ zDA!wb&aiF8fFXt zSXg~g4fEN-{i_Sbn2W^S+Nu0P0UecR;83@$jqFU_Szat6`*O7q_JYtl{y zTW{3fCObWYdVNV8EzsUNEN3%)S8K0L>WU%L;k^_-1@TnVI&eSC)4$)unGFC-x*exW z=gM|-(~b>~dpr1GUHV>IBjQPi$A?7~YL*>(@8AsZho>WW8$t1`6b)a90QXqx9}E4Y zUfk<&GZ61j2Ap37_At>OR<6?VT%9>hy`M8x*0%4a@Kv0zF9aF%c}=&;RQF2%;zVzC z#I)Ct#h2!D1zeTi`$P)R&3k5zWFNYbRCl#;R|?jFEa5q1y$k+)*-VnM^1y5)pqDr9 zxM)A3I&kI7Q168if#;~XSwzT4|98T!xvQkpL)0rhiwXEV zfMj@0^R)Q+#i;35gM5?9?k0AVy-dUSb6Z&wMLE5K4#pKid7|d0*Lf@UYYrY>sFlyB zc~v8Jox$wg#p>PnTi+VMmUxuiT})rvG@y*r`JF2T z+76f=>#cs9301E}iA0SLgk7N$NqvVXX;7=BUYmB4a5R@R L^we`NSl#{~oO0du literal 0 HcmV?d00001 diff --git a/wiki/robotics-project-guide/humanoid-robot.md b/wiki/robotics-project-guide/humanoid-robot.md new file mode 100644 index 00000000..4a384d6b --- /dev/null +++ b/wiki/robotics-project-guide/humanoid-robot.md @@ -0,0 +1,185 @@ +--- +# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be +# overwritten except in special circumstances. 
+# You should set the date the article was last updated like this: +date: 2025-04-26 # YYYY-MM-DD +# This will be displayed at the bottom of the article +# You should set the article's title: +title: A Comprehensive Overview of Humanoid Robot Planning, Control, and Skill Learning +# The 'title' is automatically displayed at the top of the page +# and used in other parts of the site. +--- +Humanoid robots are uniquely well-suited for executing human-level tasks, as they are designed to closely replicate human motions across a variety of activities. This capability enables them to perform whole-body loco-manipulation tasks, ranging from industrial manufacturing operations to service-oriented applications. Their anthropomorphic structure resembling the human form provides a natural advantage when interacting with environments built for humans, setting them apart from other robotic platforms. + +Humanoids are particularly valuable for physical collaboration tasks with humans, such as jointly moving a heavy table upstairs or providing direct human assistance in daily living and healthcare scenarios. However, achieving these intricate tasks is far from straightforward. It requires managing the robot’s highly complex dynamics while ensuring safety and robustness, especially in unstructured, unpredictable environments. + +One promising path to address these challenges is to leverage the abundance of human-generated data including motion demonstrations, sensory feedback, and task strategies to accelerate the acquisition of motor and cognitive skills in humanoid robots. By learning from human behavior, humanoids can potentially develop adaptive, generalized capabilities more quickly. Thus, leveraging human knowledge for humanoid embodiment is seen as a fast and effective route toward achieving true embodied intelligence, bridging the gap between current robotic capabilities and natural human-like autonomy. + +This blog focuses on a subset of the vast humanoid robotics field. As illustrated in Figure 1 below, we specifically explore two major components critical to whole-body loco-manipulation: +- Traditional Planning and Control Approaches +- Emerging Learning-Based Methods + +Humanoid robotics spans a much larger landscape, including mechanical design, perception, and decision-making, but here we narrow the scope to planning, control, and skill learning. + +![Scope of This Blog: Traditional Planning and Control vs. Learning-Based Approaches](/assets/images/Humanoid%20robot.drawio.png) + +*Figure 1: Scope of this blog. Humanoid robots are complex systems. We focus on two key pillars: Traditional Planning and Control (multi-contact planning, model predictive control, whole-body control) and Learning-Based Approaches (reinforcement learning, imitation learning, and combined learning methods).* + +## Foundations of Humanoid Loco-Manipulation (HLM) + +Model-based methods serve as the cornerstone for enabling humanoid loco-manipulation (HLM) capabilities. These approaches depend critically on accurate physical models, which greatly influence the quality, speed, and safety guarantees of motion generation and control. Over the past decade, planning and control strategies have converged toward a predictive-reactive control hierarchy, employing a model predictive controller (MPC) at the high level and a whole-body controller (WBC) at the low level. + +These techniques are typically formulated as optimal control problems (OCPs) and solved using numerical optimization methods. 
While these methods are well-established, ongoing research continues to focus on improving computational efficiency, numerical stability, and scalability to high-dimensional humanoid systems. + +In parallel, learning-based approaches have witnessed a rapid surge in humanoid robotics, achieving impressive results that are attracting a growing research community. Among them, reinforcement learning (RL) has demonstrated the ability to develop robust motor skills through trial-and-error interactions. However, pure RL remains prohibitively inefficient for HLM tasks, given the high degrees of freedom and sparse reward settings typical in humanoids. To address this, RL is often trained in simulation environments and later transferred to real-world systems, though this introduces challenges in bridging the sim-to-real gap. + +On the other hand, imitation learning (IL) from expert demonstrations has proven to be an efficient method for acquiring diverse motor skills. Techniques such as behavior cloning (BC) have shown remarkable capabilities in mimicking a wide array of behaviors. As the quest for versatile and generalizable policies continues, researchers are increasingly focusing on scaling data. + +Although collecting robot experience data is highly valuable, it is expensive and time-consuming. Thus, learning from human data abundantly available from Internet videos and public datasets has emerged as a pivotal strategy for humanoid robotics. Leveraging human demonstrations is a unique advantage of humanoid embodiments, as their anthropomorphic form makes direct learning from human behavior feasible. + + +## Planning and Control + +### Multi-Contact Planning + +Multi-contact planning is a fundamental and challenging aspect of humanoid loco-manipulation. Planners must compute not only robot state trajectories but also determine contact locations, timings, and forces while maintaining balance and interaction with diverse environments and objects. + +The field has produced significant progress over the past decade, but most state-of-the-art (SOTA) methods still rely on pre-planned contact sequences. Solving contact planning and trajectory generation simultaneously, known as contact-implicit planning (CIP), remains computationally intensive due to the combinatorial complexity of contact modes. + +As illustrated in Figure 2, multi-contact planning methods can be broadly categorized into three major groups: + +- **Search-Based Planning**: + These methods expand robot states by exploring feasible contact sequences using heuristics. Techniques like Monte Carlo Tree Search and graph-based search offer practical solutions but face challenges with high computational demands and limited exploration horizons. + +- **Optimization-Based Planning**: + Contact-Implicit Trajectory Optimization (CITO) integrates contact dynamics directly into trajectory planning, allowing simultaneous optimization of contact modes, forces, and full-body motions. While CITO has been applied to quadrupeds and robotic arms, extending it to humanoid robots in real time remains a significant challenge. + +- **Learning-Based Planning**: + Learning-based approaches utilize reinforcement learning or supervised learning to predict contact sequences, task goals, or dynamic models, enhancing planning efficiency and enabling more flexible real-time (re)planning. 
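+
+As a toy illustration of the search-based category above, the sketch below runs an A*-style best-first search over discrete contact states (e.g., foothold or handhold identifiers). The `successors`, `cost`, and `heuristic` callbacks are placeholders for robot-specific reachability, effort, and balance checks; this is a simplified sketch, not an implementation of any particular planner:
+
+```python
+import heapq
+import itertools
+
+def plan_contacts(start, goal, successors, cost, heuristic, max_expansions=10000):
+    """Best-first search over discrete, hashable contact states."""
+    counter = itertools.count()              # tie-breaker so heapq never compares states
+    frontier = [(heuristic(start), next(counter), 0.0, start, [start])]
+    visited = set()
+    while frontier and max_expansions > 0:
+        _, _, g, state, path = heapq.heappop(frontier)
+        if state == goal:
+            return path                       # contact sequence from start to goal
+        if state in visited:
+            continue
+        visited.add(state)
+        max_expansions -= 1
+        for nxt in successors(state):         # candidate next contacts (reachable, balanced)
+            if nxt not in visited:
+                g_next = g + cost(state, nxt)
+                heapq.heappush(frontier, (g_next + heuristic(nxt), next(counter),
+                                          g_next, nxt, path + [nxt]))
+    return None                               # no feasible contact sequence found
+```
+Real planners replace these callbacks with reachability maps, balance and collision checks, or learned models, but the search skeleton stays the same.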
+ +Additionally, **Pose Optimization (PO)** plays a complementary role by computing optimal whole-body static or quasi-static poses to maximize interaction forces or stability during manipulation tasks. While PO techniques are effective for discrete tasks like object pushing, they are limited when it comes to dynamic, continuous motions — motivating the adoption of dynamic optimization approaches such as model predictive control. + +![Multi-Contact Planning Categories](/assets/images/multi_contact_planning.png) + +*Figure 2: An overview of multi-contact planning categories: Search-Based, Optimization-Based, and Learning-Based methods, each addressing the challenges of humanoid loco-manipulation planning with different strengths and limitations.* + +### Model Predictive Control for Humanoid Loco-Manipulation + +Model Predictive Control (MPC) has become a cornerstone of trajectory planning for humanoid loco-manipulation, valued for its flexibility in defining motion objectives, mathematical rigor, and the availability of efficient optimization solvers. + +MPC formulates motion planning as an Optimal Control Problem (OCP) over a finite future horizon, optimizing system states, control inputs, and constraint forces while ensuring dynamic feasibility and adherence to task-specific constraints. + +Depending on the dynamics model and problem structure, MPC methods generally fall into two categories: + +- **Simplified Models (Linear MPC)**: + Simplified dynamics such as the Single Rigid-Body Model (SRBM) and the Linear Inverted Pendulum Model (LIPM) allow high-frequency online planning through efficient convex optimization. These models enable dynamic locomotion and loco-manipulation but may sacrifice modeling accuracy. + +- **Nonlinear Models (NMPC)**: + Nonlinear MPC uses more detailed dynamics, such as Centroidal Dynamics (CD) and Whole-Body Dynamics (WBD), capturing full-body inertia and multi-contact behaviors. Although they improve motion fidelity, these models impose significant computational demands. Accelerated optimization methods like Sequential Quadratic Programming (SQP) and Differential Dynamic Programming (DDP) are often used to make NMPC practical for real-time control. + +### Whole-Body Control + +Whole-Body Control (WBC) refers to controllers that generate generalized accelerations, joint torques, and constraint forces to achieve dynamic tasks in humanoid robots. WBC is essential when trajectories are generated from reduced-order models, full-order plans are too computationally intensive to track directly, or when disturbances from environment uncertainty must be compensated in real time. + +The WBC solves an instantaneous control problem based on Euler-Lagrangian dynamics, optimizing decision variables such as accelerations, constraint forces, and joint torques. Due to the underactuated nature of humanoids, WBC must satisfy contact constraints while maintaining dynamic balance. + +Dynamic tasks within WBC are formulated as linear equations of the decision variables and can represent objectives such as: +- Joint acceleration tracking +- End-effector motion tracking +- Centroidal momentum regulation +- Capture point stabilization +- Reaction force optimization +- Collision avoidance + +Dynamic tasks can be precomputed (e.g., from MPC), generated online, or commanded interactively through teleoperation, including VR-based interfaces with haptic feedback. 
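+
+As a generic sketch (the exact task set, weights, and constraint handling differ from controller to controller), the instantaneous WBC problem can be written as a weighted least-squares program over accelerations, contact forces, and joint torques:
+
+```math
+\min_{\ddot{q},\, f_c,\, \tau} \;\; \sum_i w_i \left\| J_i \ddot{q} + \dot{J}_i \dot{q} - \ddot{x}_i^{des} \right\|^2
+\quad \text{subject to} \quad
+M(q)\ddot{q} + h(q,\dot{q}) = S^\top \tau + J_c^\top f_c, \quad
+J_c \ddot{q} + \dot{J}_c \dot{q} = 0, \quad
+f_c \in \mathcal{F}, \quad
+\tau_{min} \le \tau \le \tau_{max}
+```
+Here each dynamic task $i$ from the list above contributes a Jacobian $J_i$ and a desired task-space acceleration $\ddot{x}_i^{des}$; $M$ and $h$ are the floating-base mass matrix and nonlinear terms, $S$ selects the actuated joints, $J_c$ stacks the contact Jacobians, $\mathcal{F}$ denotes the friction cones, and $w_i$ are task weights (strict-hierarchy controllers solve a sequence of such programs instead).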
+
+WBC approaches fall into two main categories:
+
+- **Closed-Form Methods**:
+  Early methods like inverse dynamics control and operational space control project system dynamics into constraint-free manifolds, enabling efficient task tracking and strict task prioritization. However, closed-form methods struggle with incorporating inequality constraints such as joint limits and obstacle avoidance.
+
+- **Optimization-Based Methods**:
+  Modern WBC often formulates control as a quadratic programming (QP) problem, offering flexibility to handle multiple equality and inequality tasks. Conflicting tasks are resolved through strict hierarchies or soft weightings. Optimization-based WBC allows better handling of complex task requirements and robot constraints.
+
+Both approaches have contributed to advances in humanoid loco-manipulation, with optimization-based methods gaining popularity for their robustness and extensibility.
+
+## Learning-Based Approaches
+### Reinforcement Learning
+
+Reinforcement Learning (RL), empowered by deep learning algorithms, offers a model-free framework to acquire complex motor skills by rewarding desirable behaviors through environment interaction. RL eliminates the need for prior expert demonstrations but introduces significant challenges related to sample efficiency, reward design, and real-world deployment.
+
+A key advantage of RL is its end-to-end nature: policies can map raw sensory input directly to actuation commands in real time. However, learning from scratch is often inefficient and sensitive to the design of reward functions. Algorithms such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) have struggled to achieve whole-body loco-manipulation in humanoids due to the high degrees of freedom, unstable dynamics, and sparse rewards typical in these tasks.
+
+Several strategies have been developed to address RL challenges:
+- **Curriculum Learning**: Gradually increases task difficulty during training to accelerate learning.
+- **Curiosity-Driven Exploration**: Encourages exploration of novel states to improve learning without heavily engineered rewards.
+- **Constrained RL**: Replaces complex reward tuning with task-specific constraints, achieving better locomotion performance.
+
+The sim-to-real gap remains a critical bottleneck in RL for humanoid robotics. While quadrupeds have achieved notable sim-to-real success due to their simpler, more stable dynamics, humanoids face greater challenges including dynamic instability, high-dimensional control, and complex manipulation environments.
+
+Common strategies to bridge the sim-to-real gap include:
+- **Domain Randomization**: Training with randomized physical parameters to improve real-world robustness, though it requires careful tuning (a minimal sketch appears at the end of this section).
+- **System Identification**: Using real-world data to refine simulation parameters for greater fidelity.
+- **Domain Adaptation**: Fine-tuning simulation-trained policies on real hardware data with safety measures in place.
+
+RL offers a pathway to learn novel, complex behaviors for humanoid loco-manipulation but remains impractical for direct real-world training due to its inefficiency and sim-to-real difficulties. Consequently, RL is predominantly used in simulation, with real-world deployment heavily relying on additional techniques like imitation learning (IL) to close the gap.
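+
+As a concrete example of the domain-randomization strategy listed above, the sketch below resamples a few physical parameters at every episode reset. The `sim` handle and its attribute names are illustrative stand-ins rather than the API of any particular simulator:
+
+```python
+import random
+
+def randomize_dynamics(sim):
+    """Resample simulator parameters before each training episode (domain randomization)."""
+    sim.ground_friction = random.uniform(0.4, 1.2)          # unknown surface friction
+    sim.torso_mass_scale = random.uniform(0.85, 1.15)       # +/- 15% mass / payload error
+    sim.joint_damping_scale = random.uniform(0.8, 1.2)      # actuator model mismatch
+    sim.actuator_delay_steps = random.randint(0, 3)         # unmodeled control latency
+    sim.external_push_newtons = random.uniform(0.0, 50.0)   # random disturbance pushes
+
+# typically called inside the environment's reset(), before every rollout
+```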
+ +## Imitation Learning from Robot Experience + +Imitation Learning (IL) encompasses a range of techniques, including supervised, reinforcement, and unsupervised learning methods, that train policies using expert demonstrations. IL is particularly effective for complex tasks that are difficult to specify explicitly. + +The process typically involves three steps: capturing expert demonstrations, retargeting those demonstrations into robot-compatible motions if needed, and training policies using the retargeted data. Demonstrations can come from different sources, with robot experience data divided into two types: +(i) policy execution and (ii) teleoperation. + +**Policy Execution**: +Executing expert policies, either model-based or learned, can generate training data efficiently in simulation environments. While scalable, the fidelity gap between simulation and hardware introduces sim-to-real challenges. + +**Teleoperation**: +Teleoperation enables human experts to command robot motions in real time, providing smooth, natural, and diverse demonstrations across tasks such as object manipulation, locomotion, and handovers. Full-body teleoperation, however, often requires specialized hardware like IMU suits or exoskeletons and remains challenging for capturing dynamic motions like walking. + +Key differences exist between teleoperation and policy execution data: +- Teleoperation data is often **multimodal**, representing multiple valid ways to perform a task. +- Policy execution data tends to be **unimodal**, producing more consistent behaviors. + +From these datasets, several IL techniques are applied: +- **Behavior Cloning (BC)**: Casts IL as supervised learning, effective for direct skill transfer. +- **Diffusion Policies**: Learn multimodal, highly versatile skills by modeling action distributions. +- **Inverse Reinforcement Learning (IRL)**: Infers reward structures from expert demonstrations and retrains policies to generalize across environments. +- **Action Chunking Transformers (ACT)**: Address distribution shift and error compounding by predicting sequences of future actions. + +Although collecting high-quality robot data is resource-intensive, IL remains a reliable method to achieve expert-level performance. Scaling teleoperation data collection has become a major focus for industrial labs and companies like Tesla, Toyota Research, and Sanctuary, aiming to create large, diverse datasets for training increasingly versatile robotic policies. + +### Combining Imitation Learning (IL) and Reinforcement Learning (RL) + +One effective approach to sim-to-real transfer combines imitation learning (IL) and pure reinforcement learning (RL) through a two-stage teacher-student paradigm. In this framework, a teacher policy is first trained using pure RL with privileged observations in simulation. A student policy then clones the teacher's behavior using only onboard, partial observations, making it suitable for direct deployment on real hardware. + +An alternative two-stage framework reverses the order: IL is used to pre-train a policy from expert demonstrations, which is then fine-tuned using RL to surpass expert-level performance and adapt to varying tasks and environments. This combination leverages the efficiency of IL with the adaptability and performance potential of RL. + + +## Summary + +Humanoid robots offer tremendous potential for performing complex whole-body loco-manipulation tasks, but achieving robust and adaptable behavior remains a significant challenge. 
In this article, we explored two primary avenues: traditional model-based planning and control methods, and emerging learning-based approaches. + +Model-based methods, including multi-contact planning, model predictive control (MPC), and whole-body control (WBC), provide strong theoretical guarantees and precise physical feasibility but face challenges in computational scalability and adaptability to complex environments. Learning-based methods, particularly reinforcement learning (RL) and imitation learning (IL), offer flexibility and scalability but encounter issues such as sample inefficiency and the sim-to-real gap. + +Combining these two paradigms — using learning to enhance planning or using model-based structures to guide learning — is emerging as a powerful strategy. Techniques like curriculum learning, domain randomization, and hybrid IL-RL training frameworks show promise for bridging the gap between simulation and real-world deployment, especially for complex humanoid tasks. + +Going forward, advancing humanoid robotics will require innovations in both scalable learning techniques and efficient, adaptive control frameworks. Additionally, systematic approaches to address the sim-to-real gap and leverage large-scale data collection will be crucial to enable reliable humanoid performance in real-world, dynamic environments. + + +## See Also: +- ## See Also: +- [Model Predictive Control (MPC) for Robotics](https://roboticsknowledgebase.com/wiki/actuation/model-predictive-control/) + +## Further Reading +- [Sim-to-Real Transfer for Reinforcement Learning Policies in Robotics](https://arxiv.org/abs/2009.13303) — A detailed exploration of strategies to overcome the sim-to-real gap in robotic learning. +- [Curriculum Learning in Deep Reinforcement Learning: A Survey](https://arxiv.org/abs/2003.04664) — A review on curriculum learning approaches that accelerate policy training in complex environments. +- [How Tesla Trains its Humanoid Robot (Tesla AI Day Summary, 2022)](https://www.tesla.com/AI) — Insights into Tesla's Optimus humanoid, focusing on scalable learning from teleoperation and simulation. +- [Deep Reinforcement Learning: Pong from Pixels (OpenAI Blog)](https://openai.com/research/pong-from-pixels) — A simple but powerful introduction to how deep RL can learn complex behavior from raw visual inputs. + + +## References +[1] Z. Gu, A. Shamsah, J. Li, W. Shen, Z. Xie, S. McCrory, R. Griffin, X. Cheng, C. K. Liu, A. Kheddar, X. B. Peng, G. Shi, X. Wang, and W. Yu, "Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning," arXiv preprint arXiv:2501.02116, 2025. + +[2] J. Achiam, "Spinning Up in Deep Reinforcement Learning," OpenAI, 2018. [Online]. 
Available: https://spinningup.openai.com/en/latest/ + From 8d1d955221594a082a66ff5c00695dde2ac87bc8 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sat, 26 Apr 2025 20:24:07 -0400 Subject: [PATCH 06/12] Small correction in humanoid robot blog --- wiki/robotics-project-guide/humanoid-robot.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/wiki/robotics-project-guide/humanoid-robot.md b/wiki/robotics-project-guide/humanoid-robot.md index 4a384d6b..8aeb44bf 100644 --- a/wiki/robotics-project-guide/humanoid-robot.md +++ b/wiki/robotics-project-guide/humanoid-robot.md @@ -124,7 +124,7 @@ Common strategies to bridge the sim-to-real gap include: RL offers a pathway to learn novel, complex behaviors for humanoid loco-manipulation but remains impractical for direct real-world training due to its inefficiency and sim-to-real difficulties. Consequently, RL is predominantly used in simulation, with real-world deployment heavily relying on additional techniques like imitation learning (IL) to close the gap. -## Imitation Learning from Robot Experience +### Imitation Learning from Robot Experience Imitation Learning (IL) encompasses a range of techniques, including supervised, reinforcement, and unsupervised learning methods, that train policies using expert demonstrations. IL is particularly effective for complex tasks that are difficult to specify explicitly. @@ -168,7 +168,6 @@ Going forward, advancing humanoid robotics will require innovations in both scal ## See Also: -- ## See Also: - [Model Predictive Control (MPC) for Robotics](https://roboticsknowledgebase.com/wiki/actuation/model-predictive-control/) ## Further Reading From 67da8a84dce276ef71567e8a8dae1f638acc5a36 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sat, 26 Apr 2025 20:37:27 -0400 Subject: [PATCH 07/12] Separating Team Entry and Bonus attempt --- _data/navigation.yml | 5 -- .../key-concepts-in-rl.md | 85 ------------------- 2 files changed, 90 deletions(-) delete mode 100644 wiki/reinforcement-learning/key-concepts-in-rl.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 979b7834..1999a7ea 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -183,11 +183,6 @@ wiki: url: /wiki/machine-learning/mediapipe-live-ml-anywhere.md/ - title: NLP for robotics url: /wiki/machine-learning/nlp_for_robotics.md/ - - title: Reinforcement Learning - url: /wiki/reinforcemnet-learning - children: - - title: Key Concepts in Reinforcemnet Learning (RL) - url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ - title: State Estimation url: /wiki/state-estimation/ children: diff --git a/wiki/reinforcement-learning/key-concepts-in-rl.md b/wiki/reinforcement-learning/key-concepts-in-rl.md deleted file mode 100644 index 6335929e..00000000 --- a/wiki/reinforcement-learning/key-concepts-in-rl.md +++ /dev/null @@ -1,85 +0,0 @@ ---- -date: 2025-03-11 # YYYY-MM-DD -title: Key Concepts of Reinforcement Learning ---- - -This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). RL involves an agent interacting with an environment to learn optimal behaviors through trial and feedback. The main objective of RL is to maximize cumulative rewards over time. - -## Main Components of Reinforcement Learning - -### Agent and Environment -The agent is the learner or decision-maker, while the environment represents everything the agent interacts with. The agent receives observations from the environment and takes actions that influence the environment's state. 
- -### States and Observations -- A **state** (s) fully describes the world at a given moment. -- An **observation** (o) is a partial view of the state. -- Environments can be **fully observed** (complete information) or **partially observed** (limited information). - -### Action Spaces -- The **action space** defines all possible actions an agent can take. -- **Discrete action spaces** (e.g., Atari, Go) have a finite number of actions. -- **Continuous action spaces** (e.g., robotics control) allow real-valued actions. - -## Policies -A **policy** determines how an agent selects actions based on states: - -- **Deterministic policy**: Always selects the same action for a given state. - - $a_t = \mu(s_t)$ - -- **Stochastic policy**: Samples actions from a probability distribution. - - $a_t \sim \pi(\cdot | s_t)$ - - -### Example: Deterministic Policy in PyTorch -```python -import torch.nn as nn - -pi_net = nn.Sequential( - nn.Linear(obs_dim, 64), - nn.Tanh(), - nn.Linear(64, 64), - nn.Tanh(), - nn.Linear(64, act_dim) -) -``` - -## Trajectories -A **trajectory (\tau)** is a sequence of states and actions: -```math -\tau = (s_0, a_0, s_1, a_1, ...) -``` -State transitions follow deterministic or stochastic rules: -```math -s_{t+1} = f(s_t, a_t) -``` -or -```math -s_{t+1} \sim P(\cdot|s_t, a_t) -``` - -## Reward and Return -The **reward function (R)** determines the agent's objective: -```math -r_t = R(s_t, a_t, s_{t+1}) -``` -### Types of Return -1. **Finite-horizon undiscounted return**: - ```math - R(\tau) = \sum_{t=0}^T r_t - ``` -2. **Infinite-horizon discounted return**: - ```math - R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t - ``` - where \( \gamma \) (discount factor) balances immediate vs. future rewards. - -## Summary -This tutorial introduced fundamental RL concepts, including agents, environments, policies, action spaces, trajectories, and rewards. These components are essential for designing RL algorithms. - -## Further Reading -- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. - -## References -- [Reinforcement Learning Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning) From fb3bb6d044a04dca5737d46751072d73adbf8445 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sat, 26 Apr 2025 20:55:50 -0400 Subject: [PATCH 08/12] Separated Team entry and Bonus in different branch --- _data/navigation.yml | 5 ++ .../key-concepts-in-rl.md | 85 +++++++++++++++++++ 2 files changed, 90 insertions(+) create mode 100644 wiki/reinforcement-learning/key-concepts-in-rl.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 1999a7ea..979b7834 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -183,6 +183,11 @@ wiki: url: /wiki/machine-learning/mediapipe-live-ml-anywhere.md/ - title: NLP for robotics url: /wiki/machine-learning/nlp_for_robotics.md/ + - title: Reinforcement Learning + url: /wiki/reinforcemnet-learning + children: + - title: Key Concepts in Reinforcemnet Learning (RL) + url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ - title: State Estimation url: /wiki/state-estimation/ children: diff --git a/wiki/reinforcement-learning/key-concepts-in-rl.md b/wiki/reinforcement-learning/key-concepts-in-rl.md new file mode 100644 index 00000000..6335929e --- /dev/null +++ b/wiki/reinforcement-learning/key-concepts-in-rl.md @@ -0,0 +1,85 @@ +--- +date: 2025-03-11 # YYYY-MM-DD +title: Key Concepts of Reinforcement Learning +--- + +This tutorial provides an introduction to the fundamental concepts of Reinforcement Learning (RL). 
RL involves an agent interacting with an environment to learn optimal behaviors through trial and feedback. The main objective of RL is to maximize cumulative rewards over time. + +## Main Components of Reinforcement Learning + +### Agent and Environment +The agent is the learner or decision-maker, while the environment represents everything the agent interacts with. The agent receives observations from the environment and takes actions that influence the environment's state. + +### States and Observations +- A **state** (s) fully describes the world at a given moment. +- An **observation** (o) is a partial view of the state. +- Environments can be **fully observed** (complete information) or **partially observed** (limited information). + +### Action Spaces +- The **action space** defines all possible actions an agent can take. +- **Discrete action spaces** (e.g., Atari, Go) have a finite number of actions. +- **Continuous action spaces** (e.g., robotics control) allow real-valued actions. + +## Policies +A **policy** determines how an agent selects actions based on states: + +- **Deterministic policy**: Always selects the same action for a given state. + + $a_t = \mu(s_t)$ + +- **Stochastic policy**: Samples actions from a probability distribution. + + $a_t \sim \pi(\cdot | s_t)$ + + +### Example: Deterministic Policy in PyTorch +```python +import torch.nn as nn + +pi_net = nn.Sequential( + nn.Linear(obs_dim, 64), + nn.Tanh(), + nn.Linear(64, 64), + nn.Tanh(), + nn.Linear(64, act_dim) +) +``` + +## Trajectories +A **trajectory (\tau)** is a sequence of states and actions: +```math +\tau = (s_0, a_0, s_1, a_1, ...) +``` +State transitions follow deterministic or stochastic rules: +```math +s_{t+1} = f(s_t, a_t) +``` +or +```math +s_{t+1} \sim P(\cdot|s_t, a_t) +``` + +## Reward and Return +The **reward function (R)** determines the agent's objective: +```math +r_t = R(s_t, a_t, s_{t+1}) +``` +### Types of Return +1. **Finite-horizon undiscounted return**: + ```math + R(\tau) = \sum_{t=0}^T r_t + ``` +2. **Infinite-horizon discounted return**: + ```math + R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t + ``` + where \( \gamma \) (discount factor) balances immediate vs. future rewards. + +## Summary +This tutorial introduced fundamental RL concepts, including agents, environments, policies, action spaces, trajectories, and rewards. These components are essential for designing RL algorithms. + +## Further Reading +- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. + +## References +- [Reinforcement Learning Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning) From d605640bd915ce2cbd1c22f6c7086d2991a0e4aa Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sat, 26 Apr 2025 20:56:08 -0400 Subject: [PATCH 09/12] Separated Team entry and Bonus in different branch --- _data/navigation.yml | 2 - wiki/robotics-project-guide/humanoid-robot.md | 184 ------------------ 2 files changed, 186 deletions(-) delete mode 100644 wiki/robotics-project-guide/humanoid-robot.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 979b7834..6c7408d9 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -29,8 +29,6 @@ wiki: url: /wiki/robotics-project-guide/test-and-debug/ - title: Demo day! 
url: /wiki/robotics-project-guide/demo-day/ - - title: A Comprehensive Overview of Humanoid Robot Planning, Control, and Skill Learning - url: /wiki/robotics-project-guide/humanoid-robot.md - title: System Design & Development url: /wiki/system-design-development/ children: diff --git a/wiki/robotics-project-guide/humanoid-robot.md b/wiki/robotics-project-guide/humanoid-robot.md deleted file mode 100644 index 8aeb44bf..00000000 --- a/wiki/robotics-project-guide/humanoid-robot.md +++ /dev/null @@ -1,184 +0,0 @@ ---- -# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be -# overwritten except in special circumstances. -# You should set the date the article was last updated like this: -date: 2025-04-26 # YYYY-MM-DD -# This will be displayed at the bottom of the article -# You should set the article's title: -title: A Comprehensive Overview of Humanoid Robot Planning, Control, and Skill Learning -# The 'title' is automatically displayed at the top of the page -# and used in other parts of the site. ---- -Humanoid robots are uniquely well-suited for executing human-level tasks, as they are designed to closely replicate human motions across a variety of activities. This capability enables them to perform whole-body loco-manipulation tasks, ranging from industrial manufacturing operations to service-oriented applications. Their anthropomorphic structure resembling the human form provides a natural advantage when interacting with environments built for humans, setting them apart from other robotic platforms. - -Humanoids are particularly valuable for physical collaboration tasks with humans, such as jointly moving a heavy table upstairs or providing direct human assistance in daily living and healthcare scenarios. However, achieving these intricate tasks is far from straightforward. It requires managing the robot’s highly complex dynamics while ensuring safety and robustness, especially in unstructured, unpredictable environments. - -One promising path to address these challenges is to leverage the abundance of human-generated data including motion demonstrations, sensory feedback, and task strategies to accelerate the acquisition of motor and cognitive skills in humanoid robots. By learning from human behavior, humanoids can potentially develop adaptive, generalized capabilities more quickly. Thus, leveraging human knowledge for humanoid embodiment is seen as a fast and effective route toward achieving true embodied intelligence, bridging the gap between current robotic capabilities and natural human-like autonomy. - -This blog focuses on a subset of the vast humanoid robotics field. As illustrated in Figure 1 below, we specifically explore two major components critical to whole-body loco-manipulation: -- Traditional Planning and Control Approaches -- Emerging Learning-Based Methods - -Humanoid robotics spans a much larger landscape, including mechanical design, perception, and decision-making, but here we narrow the scope to planning, control, and skill learning. - -![Scope of This Blog: Traditional Planning and Control vs. Learning-Based Approaches](/assets/images/Humanoid%20robot.drawio.png) - -*Figure 1: Scope of this blog. Humanoid robots are complex systems. 
We focus on two key pillars: Traditional Planning and Control (multi-contact planning, model predictive control, whole-body control) and Learning-Based Approaches (reinforcement learning, imitation learning, and combined learning methods).* - -## Foundations of Humanoid Loco-Manipulation (HLM) - -Model-based methods serve as the cornerstone for enabling humanoid loco-manipulation (HLM) capabilities. These approaches depend critically on accurate physical models, which greatly influence the quality, speed, and safety guarantees of motion generation and control. Over the past decade, planning and control strategies have converged toward a predictive-reactive control hierarchy, employing a model predictive controller (MPC) at the high level and a whole-body controller (WBC) at the low level. - -These techniques are typically formulated as optimal control problems (OCPs) and solved using numerical optimization methods. While these methods are well-established, ongoing research continues to focus on improving computational efficiency, numerical stability, and scalability to high-dimensional humanoid systems. - -In parallel, learning-based approaches have witnessed a rapid surge in humanoid robotics, achieving impressive results that are attracting a growing research community. Among them, reinforcement learning (RL) has demonstrated the ability to develop robust motor skills through trial-and-error interactions. However, pure RL remains prohibitively inefficient for HLM tasks, given the high degrees of freedom and sparse reward settings typical in humanoids. To address this, RL is often trained in simulation environments and later transferred to real-world systems, though this introduces challenges in bridging the sim-to-real gap. - -On the other hand, imitation learning (IL) from expert demonstrations has proven to be an efficient method for acquiring diverse motor skills. Techniques such as behavior cloning (BC) have shown remarkable capabilities in mimicking a wide array of behaviors. As the quest for versatile and generalizable policies continues, researchers are increasingly focusing on scaling data. - -Although collecting robot experience data is highly valuable, it is expensive and time-consuming. Thus, learning from human data abundantly available from Internet videos and public datasets has emerged as a pivotal strategy for humanoid robotics. Leveraging human demonstrations is a unique advantage of humanoid embodiments, as their anthropomorphic form makes direct learning from human behavior feasible. - - -## Planning and Control - -### Multi-Contact Planning - -Multi-contact planning is a fundamental and challenging aspect of humanoid loco-manipulation. Planners must compute not only robot state trajectories but also determine contact locations, timings, and forces while maintaining balance and interaction with diverse environments and objects. - -The field has produced significant progress over the past decade, but most state-of-the-art (SOTA) methods still rely on pre-planned contact sequences. Solving contact planning and trajectory generation simultaneously, known as contact-implicit planning (CIP), remains computationally intensive due to the combinatorial complexity of contact modes. - -As illustrated in Figure 2, multi-contact planning methods can be broadly categorized into three major groups: - -- **Search-Based Planning**: - These methods expand robot states by exploring feasible contact sequences using heuristics. 
Techniques like Monte Carlo Tree Search and graph-based search offer practical solutions but face challenges with high computational demands and limited exploration horizons. - -- **Optimization-Based Planning**: - Contact-Implicit Trajectory Optimization (CITO) integrates contact dynamics directly into trajectory planning, allowing simultaneous optimization of contact modes, forces, and full-body motions. While CITO has been applied to quadrupeds and robotic arms, extending it to humanoid robots in real time remains a significant challenge. - -- **Learning-Based Planning**: - Learning-based approaches utilize reinforcement learning or supervised learning to predict contact sequences, task goals, or dynamic models, enhancing planning efficiency and enabling more flexible real-time (re)planning. - -Additionally, **Pose Optimization (PO)** plays a complementary role by computing optimal whole-body static or quasi-static poses to maximize interaction forces or stability during manipulation tasks. While PO techniques are effective for discrete tasks like object pushing, they are limited when it comes to dynamic, continuous motions — motivating the adoption of dynamic optimization approaches such as model predictive control. - -![Multi-Contact Planning Categories](/assets/images/multi_contact_planning.png) - -*Figure 2: An overview of multi-contact planning categories: Search-Based, Optimization-Based, and Learning-Based methods, each addressing the challenges of humanoid loco-manipulation planning with different strengths and limitations.* - -### Model Predictive Control for Humanoid Loco-Manipulation - -Model Predictive Control (MPC) has become a cornerstone of trajectory planning for humanoid loco-manipulation, valued for its flexibility in defining motion objectives, mathematical rigor, and the availability of efficient optimization solvers. - -MPC formulates motion planning as an Optimal Control Problem (OCP) over a finite future horizon, optimizing system states, control inputs, and constraint forces while ensuring dynamic feasibility and adherence to task-specific constraints. - -Depending on the dynamics model and problem structure, MPC methods generally fall into two categories: - -- **Simplified Models (Linear MPC)**: - Simplified dynamics such as the Single Rigid-Body Model (SRBM) and the Linear Inverted Pendulum Model (LIPM) allow high-frequency online planning through efficient convex optimization. These models enable dynamic locomotion and loco-manipulation but may sacrifice modeling accuracy. - -- **Nonlinear Models (NMPC)**: - Nonlinear MPC uses more detailed dynamics, such as Centroidal Dynamics (CD) and Whole-Body Dynamics (WBD), capturing full-body inertia and multi-contact behaviors. Although they improve motion fidelity, these models impose significant computational demands. Accelerated optimization methods like Sequential Quadratic Programming (SQP) and Differential Dynamic Programming (DDP) are often used to make NMPC practical for real-time control. - -### Whole-Body Control - -Whole-Body Control (WBC) refers to controllers that generate generalized accelerations, joint torques, and constraint forces to achieve dynamic tasks in humanoid robots. WBC is essential when trajectories are generated from reduced-order models, full-order plans are too computationally intensive to track directly, or when disturbances from environment uncertainty must be compensated in real time. 
- -The WBC solves an instantaneous control problem based on Euler-Lagrangian dynamics, optimizing decision variables such as accelerations, constraint forces, and joint torques. Due to the underactuated nature of humanoids, WBC must satisfy contact constraints while maintaining dynamic balance. - -Dynamic tasks within WBC are formulated as linear equations of the decision variables and can represent objectives such as: -- Joint acceleration tracking -- End-effector motion tracking -- Centroidal momentum regulation -- Capture point stabilization -- Reaction force optimization -- Collision avoidance - -Dynamic tasks can be precomputed (e.g., from MPC), generated online, or commanded interactively through teleoperation, including VR-based interfaces with haptic feedback. - -WBC approaches fall into two main categories: - -- **Closed-Form Methods**: - Early methods like inverse dynamics control and operational space control project system dynamics into constraint-free manifolds, enabling efficient task tracking and strict task prioritization. However, closed-form methods struggle with incorporating inequality constraints such as joint limits and obstacle avoidance. - -- **Optimization-Based Methods**: - Modern WBC often formulates control as a quadratic programming (QP) problem, offering flexibility to handle multiple equality and inequality tasks. Conflicting tasks are resolved through strict hierarchies or soft weightings. Optimization-based WBC allows better handling of complex task requirements and robot constraints. - -Both approaches have contributed to advances in humanoid loco-manipulation, with optimization-based methods gaining popularity for their robustness and extensibility. - -## Learning-based Approches -### Reinforcement Learning - -Reinforcement Learning (RL), empowered by deep learning algorithms, offers a model-free framework to acquire complex motor skills by rewarding desirable behaviors through environment interaction. RL eliminates the need for prior expert demonstrations but introduces significant challenges related to sample efficiency, reward design, and real-world deployment. - -A key advantage of RL is its end-to-end nature: policies can map raw sensory input directly to actuation commands in real time. However, learning from scratch is often inefficient and sensitive to the design of reward functions. Algorithms such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) have struggled to achieve whole-body loco-manipulation in humanoids due to the high degrees of freedom, unstable dynamics, and sparse rewards typical in these tasks. - -Several strategies have been developed to address RL challenges: -- **Curriculum Learning**: Gradually increases task difficulty during training to accelerate learning. -- **Curiosity-Driven Exploration**: Encourages exploration of novel states to improve learning without heavily engineered rewards. -- **Constrained RL**: Replaces complex reward tuning with task-specific constraints, achieving better locomotion performance. - -The sim-to-real gap remains a critical bottleneck in RL for humanoid robotics. While quadrupeds have achieved notable sim-to-real success due to their simpler, more stable dynamics, humanoids face greater challenges including dynamic instability, high-dimensional control, and complex manipulation environments. 
- -Common strategies to bridge the sim-to-real gap include: -- **Domain Randomization**: Training with randomized physical parameters to improve real-world robustness, though it requires careful tuning. -- **System Identification**: Using real-world data to refine simulation parameters for greater fidelity. -- **Domain Adaptation**: Fine-tuning simulation-trained policies on real hardware data with safety measures in place. - -RL offers a pathway to learn novel, complex behaviors for humanoid loco-manipulation but remains impractical for direct real-world training due to its inefficiency and sim-to-real difficulties. Consequently, RL is predominantly used in simulation, with real-world deployment heavily relying on additional techniques like imitation learning (IL) to close the gap. - -### Imitation Learning from Robot Experience - -Imitation Learning (IL) encompasses a range of techniques, including supervised, reinforcement, and unsupervised learning methods, that train policies using expert demonstrations. IL is particularly effective for complex tasks that are difficult to specify explicitly. - -The process typically involves three steps: capturing expert demonstrations, retargeting those demonstrations into robot-compatible motions if needed, and training policies using the retargeted data. Demonstrations can come from different sources, with robot experience data divided into two types: -(i) policy execution and (ii) teleoperation. - -**Policy Execution**: -Executing expert policies, either model-based or learned, can generate training data efficiently in simulation environments. While scalable, the fidelity gap between simulation and hardware introduces sim-to-real challenges. - -**Teleoperation**: -Teleoperation enables human experts to command robot motions in real time, providing smooth, natural, and diverse demonstrations across tasks such as object manipulation, locomotion, and handovers. Full-body teleoperation, however, often requires specialized hardware like IMU suits or exoskeletons and remains challenging for capturing dynamic motions like walking. - -Key differences exist between teleoperation and policy execution data: -- Teleoperation data is often **multimodal**, representing multiple valid ways to perform a task. -- Policy execution data tends to be **unimodal**, producing more consistent behaviors. - -From these datasets, several IL techniques are applied: -- **Behavior Cloning (BC)**: Casts IL as supervised learning, effective for direct skill transfer. -- **Diffusion Policies**: Learn multimodal, highly versatile skills by modeling action distributions. -- **Inverse Reinforcement Learning (IRL)**: Infers reward structures from expert demonstrations and retrains policies to generalize across environments. -- **Action Chunking Transformers (ACT)**: Address distribution shift and error compounding by predicting sequences of future actions. - -Although collecting high-quality robot data is resource-intensive, IL remains a reliable method to achieve expert-level performance. Scaling teleoperation data collection has become a major focus for industrial labs and companies like Tesla, Toyota Research, and Sanctuary, aiming to create large, diverse datasets for training increasingly versatile robotic policies. - -### Combining Imitation Learning (IL) and Reinforcement Learning (RL) - -One effective approach to sim-to-real transfer combines imitation learning (IL) and pure reinforcement learning (RL) through a two-stage teacher-student paradigm. 
In this framework, a teacher policy is first trained using pure RL with privileged observations in simulation. A student policy then clones the teacher's behavior using only onboard, partial observations, making it suitable for direct deployment on real hardware. - -An alternative two-stage framework reverses the order: IL is used to pre-train a policy from expert demonstrations, which is then fine-tuned using RL to surpass expert-level performance and adapt to varying tasks and environments. This combination leverages the efficiency of IL with the adaptability and performance potential of RL. - - -## Summary - -Humanoid robots offer tremendous potential for performing complex whole-body loco-manipulation tasks, but achieving robust and adaptable behavior remains a significant challenge. In this article, we explored two primary avenues: traditional model-based planning and control methods, and emerging learning-based approaches. - -Model-based methods, including multi-contact planning, model predictive control (MPC), and whole-body control (WBC), provide strong theoretical guarantees and precise physical feasibility but face challenges in computational scalability and adaptability to complex environments. Learning-based methods, particularly reinforcement learning (RL) and imitation learning (IL), offer flexibility and scalability but encounter issues such as sample inefficiency and the sim-to-real gap. - -Combining these two paradigms — using learning to enhance planning or using model-based structures to guide learning — is emerging as a powerful strategy. Techniques like curriculum learning, domain randomization, and hybrid IL-RL training frameworks show promise for bridging the gap between simulation and real-world deployment, especially for complex humanoid tasks. - -Going forward, advancing humanoid robotics will require innovations in both scalable learning techniques and efficient, adaptive control frameworks. Additionally, systematic approaches to address the sim-to-real gap and leverage large-scale data collection will be crucial to enable reliable humanoid performance in real-world, dynamic environments. - - -## See Also: -- [Model Predictive Control (MPC) for Robotics](https://roboticsknowledgebase.com/wiki/actuation/model-predictive-control/) - -## Further Reading -- [Sim-to-Real Transfer for Reinforcement Learning Policies in Robotics](https://arxiv.org/abs/2009.13303) — A detailed exploration of strategies to overcome the sim-to-real gap in robotic learning. -- [Curriculum Learning in Deep Reinforcement Learning: A Survey](https://arxiv.org/abs/2003.04664) — A review on curriculum learning approaches that accelerate policy training in complex environments. -- [How Tesla Trains its Humanoid Robot (Tesla AI Day Summary, 2022)](https://www.tesla.com/AI) — Insights into Tesla's Optimus humanoid, focusing on scalable learning from teleoperation and simulation. -- [Deep Reinforcement Learning: Pong from Pixels (OpenAI Blog)](https://openai.com/research/pong-from-pixels) — A simple but powerful introduction to how deep RL can learn complex behavior from raw visual inputs. - - -## References -[1] Z. Gu, A. Shamsah, J. Li, W. Shen, Z. Xie, S. McCrory, R. Griffin, X. Cheng, C. K. Liu, A. Kheddar, X. B. Peng, G. Shi, X. Wang, and W. Yu, "Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning," arXiv preprint arXiv:2501.02116, 2025. - -[2] J. Achiam, "Spinning Up in Deep Reinforcement Learning," OpenAI, 2018. [Online]. 
Available: https://spinningup.openai.com/en/latest/ - From db9df9d9dc6cd2d644eb692e24ac50aef0acff22 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sun, 4 May 2025 08:20:39 -0400 Subject: [PATCH 10/12] A Taxonomy of Reinforcement Learning Algorithms --- _data/navigation.yml | 2 + .../reinforcement-learning-algorithms.md | 177 ++++++++++++++++++ 2 files changed, 179 insertions(+) create mode 100644 wiki/reinforcement-learning/reinforcement-learning-algorithms.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 6c7408d9..83992c34 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -186,6 +186,8 @@ wiki: children: - title: Key Concepts in Reinforcemnet Learning (RL) url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ + - title: Reinforcement Learning Algorithms + url: /wiki/reinforcemnet-learning/reinforcement-learning-algorithms/ - title: State Estimation url: /wiki/state-estimation/ children: diff --git a/wiki/reinforcement-learning/reinforcement-learning-algorithms.md b/wiki/reinforcement-learning/reinforcement-learning-algorithms.md new file mode 100644 index 00000000..4187bbbb --- /dev/null +++ b/wiki/reinforcement-learning/reinforcement-learning-algorithms.md @@ -0,0 +1,177 @@ +--- +date: 2025-05-04 +title: A Taxonomy of Reinforcement Learning Algorithms +--- + +Reinforcement Learning (RL) is a foundational paradigm in artificial intelligence where agents learn to make decisions through trial and error, guided by rewards. Over the years, a rich variety of RL algorithms have been developed, each differing in the way they represent knowledge, interact with the environment, and generalize from data. This article presents a high-level taxonomy of RL algorithms with an emphasis on design trade-offs, learning objectives, and algorithmic categories. The goal is to provide a structured guide to the RL landscape for students and practitioners. + +## Model-Based vs Model-Free Reinforcement Learning + +One of the most fundamental distinctions among RL algorithms is whether or not the algorithm uses a model of the environment's dynamics. + +### Model-Free RL + +Model-free algorithms do not attempt to learn or use an internal model of the environment. Instead, they learn policies or value functions directly from experience. These methods are typically simpler to implement and tune, making them more widely adopted in practice. + +**Key Advantages:** +- Easier to apply when the environment is complex or high-dimensional. +- No need for a simulator or model-learning pipeline. + +**Drawbacks:** +- High sample complexity: requires many interactions with the real or simulated environment. +- Cannot perform planning or imagination-based reasoning. + +**Examples:** +- **DQN (Deep Q-Networks)**: First to combine Q-learning with deep networks for Atari games. +- **PPO (Proximal Policy Optimization)**: A robust policy gradient method widely used in robotics and games. + +### Model-Based RL + +In contrast, model-based algorithms explicitly learn or use a model of the environment that predicts future states and rewards. The agent can then plan ahead by simulating trajectories using this model. + +**Key Advantages:** +- Better sample efficiency through planning and simulation. +- Can separate learning from data collection, enabling "dream-based" training. + +**Challenges:** +- Learning accurate models is difficult. +- Errors in the model can lead to compounding errors during planning. + +**Use Cases:** +- High-stakes environments where sample efficiency is critical. 
+- Scenarios requiring imagination or foresight (e.g., robotics, strategic games). + +**Examples:** +- **MBVE (Model-Based Value Expansion)**: Uses a learned model to expand the value estimate of real transitions. +- **AlphaZero**: Combines MCTS with learned value/policy networks to dominate board games. + +## What to Learn: Policy, Value, Q, or Model? + +RL algorithms also differ based on what the agent is trying to learn: + +- **Policy** $\pi_\theta(a|s)$: A mapping from state to action, either deterministic or stochastic. +- **Value function** $V^\pi(s)$: The expected return starting from state $s$ under policy $\pi$. +- **Action-Value (Q) function** $Q^\pi(s, a)$: The expected return starting from state $s$ taking action $a$, then following $\pi$. +- **Model**: A transition function $f(s, a) \rightarrow s'$ and reward predictor $r(s, a)$. + +### Model-Free Learning Strategies + +#### 1. Policy Optimization + +These algorithms directly optimize the parameters of a policy using gradient ascent on a performance objective: + +$$ +J(\pi_\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^\infty \gamma^t r_t \right] +$$ + +They often require estimating the advantage function or value function to reduce variance. + +**Characteristics:** +- **On-policy**: Data must come from the current policy. +- **Stable and robust**: Optimizes directly for performance. + +**Popular Methods:** +- **A2C / A3C (Asynchronous Advantage Actor-Critic)**: Learns both policy and value function in parallel. +- **PPO (Proximal Policy Optimization)**: Ensures stable updates with clipped surrogate objectives. +- **TRPO (Trust Region Policy Optimization)**: Uses trust regions to prevent catastrophic policy changes. + +#### 2. Q-Learning + +Instead of learning a policy directly, Q-learning methods aim to learn the optimal action-value function: + +$$ +Q^*(s, a) = \mathbb{E} \left[ r + \gamma \max_{a'} Q^*(s', a') \right] +$$ + +Once $Q^*(s, a)$ is known, the policy is derived via: + +$$ +\pi(s) = \arg\max_a Q^*(s, a) +$$ + +**Characteristics:** +- **Off-policy**: Can use data from any past policy. +- **Data-efficient**, but prone to instability. + +**Variants:** +- **DQN**: Introduced experience replay and target networks. +- **C51 / QR-DQN**: Learn a distribution over returns, not just the mean. + +> **Trade-Off**: Policy gradient methods are more stable and principled; Q-learning methods are more sample-efficient but harder to stabilize due to the "deadly triad": function approximation, bootstrapping, and off-policy updates. + +#### Hybrid Algorithms + +Some methods blend policy optimization and Q-learning: + +- **DDPG (Deep Deterministic Policy Gradient)**: Actor-Critic method with off-policy Q-learning and deterministic policies. +- **TD3 (Twin Delayed DDPG)**: Addresses overestimation bias in DDPG. +- **SAC (Soft Actor-Critic)**: Adds entropy regularization to encourage exploration and stabilize learning. + +### Model-Based Learning Strategies + +Model-based RL allows a variety of architectures and learning techniques. + +#### 1. Pure Planning (e.g., MPC) + +The agent uses a learned or known model to plan a trajectory and execute the first action, then replan. No policy is explicitly learned. + +#### 2. Expert Iteration (ExIt) + +Combines planning and learning. Planning (e.g., via MCTS) provides strong actions ("experts"), which are used to train a policy via supervised learning. + +- **AlphaZero**: Exemplifies this method by using MCTS and neural nets in self-play. + +#### 3. 
Data Augmentation + +The learned model is used to synthesize additional training data. + +- **MBVE**: Augments true experiences with simulated rollouts. +- **World Models**: Trains entirely on imagined data ("dreaming"). + +#### 4. Imagination-Augmented Agents (I2A) + +Here, planning is embedded as a subroutine inside the policy network. The policy learns when and how to use imagination. + +> This technique can mitigate model bias because the policy can learn to ignore poor planning results. + +## Summary + +The landscape of RL algorithms is broad and evolving, but organizing them into categories based on model usage and learning targets helps build intuition: + +| Dimension | Model-Free RL | Model-Based RL | +|------------------------|--------------------------|------------------------------------------| +| Sample Efficiency | Low | High | +| Stability | High (Policy Gradient) | Variable (depends on model quality) | +| Planning Capability | None | Yes (MPC, MCTS, ExIt) | +| Real-World Deployment | Slower | Faster (if model is accurate) | +| Representative Methods | DQN, PPO, A2C | AlphaZero, MBVE, World Models, I2A | + +Understanding these trade-offs is key to selecting or designing an RL algorithm for your application. + + +## Further Reading +- [Spinning Up in Deep RL – OpenAI](https://spinningup.openai.com/en/latest/) +- [RL Course by David Silver](https://www.davidsilver.uk/teaching/) +- [RL Book – Sutton and Barto (2nd ed.)](http://incompleteideas.net/book/the-book-2nd.html) +- [CS285: Deep Reinforcement Learning – UC Berkeley (Sergey Levine)](https://rail.eecs.berkeley.edu/deeprlcourse/) +- [Deep RL Bootcamp (2017) – Stanford](https://sites.google.com/view/deep-rl-bootcamp/lectures) +- [Lil’Log – Reinforcement Learning Series by Lilian Weng](https://lilianweng.github.io/lil-log/) +- [RL Algorithms – Denny Britz’s GitHub](https://github.com/dennybritz/reinforcement-learning) +- [Reinforcement Learning Zoo – A curated collection of RL papers and code](https://github.com/instillai/reinforcement-learning-zoo) +- [Distill: Visualizing Reinforcement Learning](https://distill.pub/2019/visual-exploration/) +- [Deep Reinforcement Learning Nanodegree – Udacity](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) +- [Reinforcement Learning: State-of-the-Art (2019) – Arulkumaran et al.](https://arxiv.org/abs/1701.07274) +- [The RL Baselines3 Zoo – PyTorch Implementations of Popular RL Algorithms](https://github.com/DLR-RM/rl-baselines3-zoo) + + +## References +- [2] V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” ICML, 2016. +- [3] J. Schulman et al., “Proximal Policy Optimization Algorithms,” arXiv:1707.06347, 2017. +- [5] T. Lillicrap et al., “Continuous Control with Deep Reinforcement Learning,” ICLR, 2016. +- [7] T. Haarnoja et al., “Soft Actor-Critic: Off-Policy Maximum Entropy Deep RL,” ICML, 2018. +- [8] V. Mnih et al., “Playing Atari with Deep Reinforcement Learning,” NIPS Deep Learning Workshop, 2013. +- [9] M. Bellemare et al., “A Distributional Perspective on Reinforcement Learning,” ICML, 2017. +- [12] D. Ha and J. Schmidhuber, “World Models,” arXiv:1803.10122, 2018. +- [13] T. Weber et al., “Imagination-Augmented Agents,” NIPS, 2017. +- [14] A. Nagabandi et al., “Neural Network Dynamics for Model-Based Deep RL,” CoRL, 2017. +- [16] D. Silver et al., “Mastering the Game of Go without Human Knowledge,” Nature, 2017. 
From 78c08e5557ad058a3fa8eeadab7853c59189f615 Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sun, 4 May 2025 08:32:12 -0400 Subject: [PATCH 11/12] Proximal Policy Optimization (PPO): Concepts, Theory, and Insights --- _data/navigation.yml | 2 + .../intro-to-policy-gradient-methods.md | 152 ++++++++++++++++++ 2 files changed, 154 insertions(+) create mode 100644 wiki/reinforcement-learning/intro-to-policy-gradient-methods.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 83992c34..0bea0416 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -188,6 +188,8 @@ wiki: url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ - title: Reinforcement Learning Algorithms url: /wiki/reinforcemnet-learning/reinforcement-learning-algorithms/ + -title: Policy Gradient Methods + url: /wiki/reinforcemnet-learning/intro-to-policy-gradient-methods/ - title: State Estimation url: /wiki/state-estimation/ children: diff --git a/wiki/reinforcement-learning/intro-to-policy-gradient-methods.md b/wiki/reinforcement-learning/intro-to-policy-gradient-methods.md new file mode 100644 index 00000000..b4857545 --- /dev/null +++ b/wiki/reinforcement-learning/intro-to-policy-gradient-methods.md @@ -0,0 +1,152 @@ +--- +date: 2025-05-04 +title: Proximal Policy Optimization (PPO): Concepts, Theory, and Insights +--- + +Proximal Policy Optimization (PPO) is one of the most widely used algorithms in modern reinforcement learning. It combines the benefits of policy gradient methods with a set of improvements that make training more stable, sample-efficient, and easy to implement. PPO is used extensively in robotics, gaming, and simulated environments like MuJoCo and OpenAI Gym. This article explains PPO from the ground up: its motivation, theory, algorithmic structure, and practical considerations. + +## Motivation + +Traditional policy gradient methods suffer from instability due to large, unconstrained policy updates. While they optimize the expected return directly, updates can be so large that they lead to catastrophic performance collapse. + +Trust Region Policy Optimization (TRPO) proposed a solution by introducing a constraint on the size of the policy update using a KL-divergence penalty. However, TRPO is relatively complex to implement because it requires solving a constrained optimization problem using second-order methods. + +PPO was designed to simplify this by introducing a clipped surrogate objective that effectively limits how much the policy can change during each update—while retaining the benefits of trust-region-like behavior. + +## PPO Objective + +Let the old policy be $\pi_{\theta_{\text{old}}}$ and the new policy be $\pi_\theta$. PPO maximizes the following clipped surrogate objective: + +$$ +L^{\text{CLIP}}(\theta) = \mathbb{E}_t \left[ +\min\left( +r_t(\theta) \hat{A}_t, +\text{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon) \hat{A}_t +\right) +\right] +$$ + +where: + +- $r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)}$ is the probability ratio, +- $\hat{A}_t$ is the advantage estimate at time step $t$, +- $\epsilon$ is a small hyperparameter (e.g., 0.1 or 0.2). + +### Why Clipping? + +Without clipping, large changes in the policy could lead to very large or small values of $r_t(\theta)$, resulting in destructive updates. The **clip** operation ensures that updates do not push the new policy too far from the old one. 
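To make this concrete, below is a minimal PyTorch-style sketch of the clipped surrogate loss. It is an illustration rather than a reference implementation: it assumes you already have per-sample log-probabilities under the new and old policies plus advantage estimates, and the names (`logp_new`, `logp_old`, `adv`, `clip_eps`) are placeholders, not any particular library's API.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log-space for numerical stability.
    ratio = torch.exp(logp_new - logp_old)

    # Unclipped and clipped surrogate terms.
    surr1 = ratio * adv
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv

    # PPO maximizes the elementwise minimum of the two; return the negative
    # mean so a standard optimizer can minimize it.
    return -torch.min(surr1, surr2).mean()
```

For positive advantages the clip caps how much a larger ratio can increase the objective; for negative advantages it caps how much a smaller ratio can decrease it.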
+ +This introduces a **soft trust region**: when $r_t(\theta)$ is within $[1 - \epsilon, 1 + \epsilon]$, the update proceeds normally. If $r_t(\theta)$ exceeds this range, the objective is "flattened", preventing further change. + +## Full PPO Objective + +In practice, PPO uses a combination of multiple objectives: + +- **Clipped policy loss** (as above) +- **Value function loss**: typically a mean squared error between predicted value and empirical return +- **Entropy bonus**: to encourage exploration + +The full loss function is: + +$$ +L^{\text{PPO}}(\theta) = +\mathbb{E}_t \left[ +L^{\text{CLIP}}(\theta) +- c_1 \cdot (V_\theta(s_t) - \hat{V}_t)^2 ++ c_2 \cdot \mathcal{H}[\pi_\theta](s_t) +\right] +$$ + +where: + +- $c_1$ and $c_2$ are weighting coefficients, +- $\hat{V}_t$ is an empirical return (or bootstrapped target), +- $\mathcal{H}[\pi_\theta]$ is the entropy of the policy at state $s_t$. + +## Advantage Estimation + +PPO relies on high-quality advantage estimates $\hat{A}_t$ to guide policy updates. The most popular technique is **Generalized Advantage Estimation (GAE)**: + +$$ +\hat{A}_t = \sum_{l=0}^{T - t - 1} (\gamma \lambda)^l \delta_{t+l} +$$ + +with: + +$$ +\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) +$$ + +GAE balances the bias-variance trade-off via the $\lambda$ parameter (typically 0.95). + +## PPO Training Loop Overview + +At a high level, PPO training proceeds in the following way: + +1. **Collect rollouts** using the current policy for a fixed number of steps. +2. **Compute advantages** using GAE. +3. **Compute returns** for value function targets. +4. **Optimize the PPO objective** with multiple minibatch updates (typically using Adam). +5. **Update the old policy** to match the new one. + +Unlike TRPO, PPO allows **multiple passes through the same data**, improving sample efficiency. + +## Practical Tips + +- **Clip epsilon**: Usually 0.1 or 0.2. Too large allows harmful updates; too small restricts learning. +- **Number of epochs**: PPO uses multiple SGD epochs (3–10) per batch. +- **Batch size**: Typical values range from 2048 to 8192. +- **Value/policy loss scales**: The constants $c_1$ and $c_2$ are often 0.5 and 0.01 respectively. +- **Normalize advantages**: Empirically improves stability. + +> **Entropy Bonus**: Without sufficient entropy, the policy may prematurely converge to a suboptimal deterministic strategy. + +## Why PPO Works Well + +- **Stable updates**: Clipping constrains updates to a trust region without expensive computations. +- **On-policy training**: Leads to high-quality updates at the cost of sample reuse. +- **Good performance across domains**: PPO performs well in continuous control, discrete games, and real-world robotics. +- **Simplicity**: Easy to implement and debug compared to TRPO, ACER, or DDPG. + +## PPO vs TRPO + +| Feature | PPO | TRPO | +|---------------------------|--------------------------------------|--------------------------------------| +| Optimizer | First-order (SGD/Adam) | Second-order (constrained step) | +| Trust region enforcement | Clipping | Explicit KL constraint | +| Sample efficiency | Moderate | Low | +| Stability | High | Very high | +| Implementation | Simple | Complex | + +## Limitations + +- **On-policy nature** means PPO discards data after each update. +- **Entropy decay** can hurt long-term exploration unless tuned carefully. +- **Not optimal for sparse-reward environments** without additional exploration strategies (e.g., curiosity, count-based bonuses). 
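Before turning to applications, here is a minimal NumPy sketch of the GAE computation described above, included for reference. The function and argument names are illustrative assumptions; it expects a single rollout with a bootstrap value appended to `values`.

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a single rollout.

    rewards: shape (T,)
    values:  shape (T + 1,), including a bootstrap value for the final state
    dones:   shape (T,), 1.0 where the episode terminated at step t
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    # Value-function targets: R_t = A_t + V(s_t)
    returns = advantages + values[:-1]
    return advantages, returns
```

The resulting `returns` can serve as the value-function targets $\hat{V}_t$ in the full PPO loss, and normalizing `advantages` before the policy update is the stabilization step noted in the practical tips above.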
+ +## PPO in Robotics + +PPO has become a standard in sim-to-real training workflows: + +- Robust to partial observability +- Easy to stabilize on real robots +- Compatible with parallel simulation (e.g., Isaac Gym, MuJoCo) + +## Summary + +PPO offers a clean and reliable solution for training RL agents using policy gradient methods. Its clipping objective balances the need for learning speed with policy stability. PPO is widely regarded as a default choice for continuous control tasks, and has been proven to work well across a broad range of applications. + + +## Further Reading +- [Proximal Policy Optimization Algorithms – Schulman et al. (2017)](https://arxiv.org/abs/1707.06347) +- [Spinning Up PPO Overview – OpenAI](https://spinningup.openai.com/en/latest/algorithms/ppo.html) +- [CleanRL PPO Implementation](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py) +- [RL Course Lecture on PPO – UC Berkeley CS285](https://rail.eecs.berkeley.edu/deeprlcourse/) +- [OpenAI Gym PPO Examples](https://github.com/openai/baselines/tree/master/baselines/ppo2) +- [Generalized Advantage Estimation (GAE) – Schulman et al.](https://arxiv.org/abs/1506.02438) +- [PPO Implementation from Scratch – Andriy Mulyar](https://github.com/awjuliani/DeepRL-Agents) +- [Deep Reinforcement Learning Hands-On (PPO chapter)](https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On) +- [Stable Baselines3 PPO Documentation](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) +- [OpenReview: PPO vs TRPO Discussion](https://openreview.net/forum?id=r1etN1rtPB) +- [Reinforcement Learning: State-of-the-Art Survey (2019)](https://arxiv.org/abs/1701.07274) +- [RL Algorithms by Difficulty – RL Book Companion](https://huggingface.co/learn/deep-rl-course/unit2/ppo) From cadb38317be4ec43d93564e154b8641557c0c25b Mon Sep 17 00:00:00 2001 From: Shivang Vijay Date: Sun, 4 May 2025 08:36:15 -0400 Subject: [PATCH 12/12] Foundation of Value-Based Reinforcement Learning --- _data/navigation.yml | 4 +- .../value-based-reinforcement-learning.md | 127 ++++++++++++++++++ 2 files changed, 130 insertions(+), 1 deletion(-) create mode 100644 wiki/reinforcement-learning/value-based-reinforcement-learning.md diff --git a/_data/navigation.yml b/_data/navigation.yml index 0bea0416..ae308fd2 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -188,8 +188,10 @@ wiki: url: /wiki/reinforcemnet-learning/key-concepts-in-rl/ - title: Reinforcement Learning Algorithms url: /wiki/reinforcemnet-learning/reinforcement-learning-algorithms/ - -title: Policy Gradient Methods + - title: Policy Gradient Methods url: /wiki/reinforcemnet-learning/intro-to-policy-gradient-methods/ + - title: Foundation of Value-Based Reinforcement Learning + url: /wiki/reinforcemnet-learning/value-based-reinforcement-learning/ - title: State Estimation url: /wiki/state-estimation/ children: diff --git a/wiki/reinforcement-learning/value-based-reinforcement-learning.md b/wiki/reinforcement-learning/value-based-reinforcement-learning.md new file mode 100644 index 00000000..05d6d1fb --- /dev/null +++ b/wiki/reinforcement-learning/value-based-reinforcement-learning.md @@ -0,0 +1,127 @@ +--- +date: 2025-05-04 +title: Deep Q-Networks (DQN): A Foundation of Value-Based Reinforcement Learning +--- + +Deep Q-Networks (DQN) introduced the integration of Q-learning with deep neural networks, enabling reinforcement learning to scale to high-dimensional environments. 
Originally developed by DeepMind to play Atari games from raw pixels, DQN laid the groundwork for many modern value-based algorithms. This article explores the motivation, mathematical structure, algorithmic details, and practical insights for implementing and improving DQN. + +## Motivation + +Before DQN, classic Q-learning worked well in small, discrete environments. However, it couldn't generalize to high-dimensional or continuous state spaces. + +DQN addressed this by using a deep neural network as a function approximator for the Q-function, $Q(s, a; \theta)$. This allowed it to learn directly from visual input and approximate optimal action-values across thousands of states. + +The core idea: learn a parameterized Q-function that satisfies the Bellman optimality equation. + +## Q-Learning Recap + +Q-learning is a model-free, off-policy algorithm. It aims to learn the **optimal action-value function**: + +$$ +Q^*(s, a) = \mathbb{E} \left[ r + \gamma \max_{a'} Q^*(s', a') \middle| s, a \right] +$$ + +The Q-learning update rule is: + +$$ +Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) +$$ + +DQN replaces the tabular $Q(s, a)$ with a neural network $Q(s, a; \theta)$, trained to minimize: + +$$ +L(\theta) = \mathbb{E}_{(s, a, r, s')} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right] +$$ + +where $\theta^-$ is the parameter set of a **target network** that is held fixed for several steps. + +## Core Components of DQN + +### 1. Experience Replay + +Instead of learning from consecutive experiences (which are highly correlated), DQN stores them in a **replay buffer** and samples random minibatches. This reduces variance and stabilizes updates. + +### 2. Target Network + +DQN uses a separate target network $Q(s, a; \theta^-)$ whose parameters are updated less frequently (e.g., every 10,000 steps). This decouples the moving target in the loss function and improves convergence. + +### 3. $\epsilon$-Greedy Exploration + +To balance exploration and exploitation, DQN uses an $\epsilon$-greedy policy: + +- With probability $\epsilon$, choose a random action. +- With probability $1 - \epsilon$, choose $\arg\max_a Q(s, a; \theta)$. + +$\epsilon$ is typically decayed over time. + +## DQN Algorithm Overview + +1. Initialize Q-network with random weights $\theta$. +2. Initialize target network $\theta^- \leftarrow \theta$. +3. Initialize replay buffer $\mathcal{D}$. +4. For each step: + - Observe state $s_t$. + - Select action $a_t$ via $\epsilon$-greedy. + - Take action, observe reward $r_t$ and next state $s_{t+1}$. + - Store $(s_t, a_t, r_t, s_{t+1})$ in buffer. + - Sample random minibatch from $\mathcal{D}$. + - Compute targets: $y_t = r + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)$. + - Perform gradient descent on $(y_t - Q(s_t, a_t; \theta))^2$. + - Every $C$ steps, update $\theta^- \leftarrow \theta$. + +## Key Strengths + +- **Off-policy**: Allows experience reuse, increasing sample efficiency. +- **Stable with CNNs**: Effective in high-dimensional visual environments. +- **Simple to implement**: Core components are modular. + +## DQN Enhancements + +Several follow-up works improved on DQN: + +- **Double DQN**: Reduces overestimation bias in Q-learning. 
+ + $$ + y_t = r + \gamma Q(s', \arg\max_a Q(s', a; \theta); \theta^-) + $$ + +- **Dueling DQN**: Splits Q-function into state-value and advantage function: + + $$ + Q(s, a) = V(s) + A(s, a) + $$ + +- **Prioritized Experience Replay**: Samples transitions with high temporal-difference (TD) error more frequently. +- **Rainbow DQN**: Combines all the above + distributional Q-learning into a single framework. + +## Limitations + +- **Not suited for continuous actions**: Requires discretization or replacement with actor-critic methods. +- **Sample inefficiency**: Still requires many environment steps to learn effectively. +- **Hard to tune**: Sensitive to learning rate, replay buffer size, etc. + +## DQN in Robotics + +DQN is less commonly used in robotics due to continuous control challenges, but: + +- Can be used in discretized navigation tasks. +- Serves as a baseline in hybrid planning-learning pipelines. +- Inspires off-policy learning architectures in real-time control. + +## Summary + +DQN is a foundational deep RL algorithm that brought deep learning to Q-learning. By integrating function approximation, experience replay, and target networks, it opened the door to using RL in complex visual and sequential tasks. Understanding DQN provides a solid base for learning more advanced value-based and off-policy algorithms. + + +## Further Reading +- [Playing Atari with Deep Reinforcement Learning – Mnih et al. (2013)](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) +- [Human-level Control through Deep Reinforcement Learning – Nature 2015](https://www.nature.com/articles/nature14236) +- [Double Q-Learning – van Hasselt et al.](https://arxiv.org/abs/1509.06461) +- [Dueling Network Architectures – Wang et al.](https://arxiv.org/abs/1511.06581) +- [Rainbow: Combining Improvements in Deep RL – Hessel et al.](https://arxiv.org/abs/1710.02298) +- [Prioritized Experience Replay – Schaul et al.](https://arxiv.org/abs/1511.05952) +- [RL Course Lecture: Value-Based Methods – Berkeley CS285](https://rail.eecs.berkeley.edu/deeprlcourse/) +- [Deep RL Bootcamp – Value Iteration & DQN](https://sites.google.com/view/deep-rl-bootcamp/lectures) +- [CleanRL DQN Implementation](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py) +- [Spinning Up: Why Use Value-Based Methods](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html) +- [Reinforcement Learning: An Introduction]() pFad - Phonifier reborn