1 parent 53d5eba, commit da60e51
dl/reinforcement/reinforcement.md
@@ -17,7 +17,7 @@ Log-Likelihood: compute the probability of each action, $$log\pi_\theta(a|s) = log[P_\
**Diagonal Gaussian policies are typically used in continuous action spaces.**
-Sampling: a random action is generated as $$a = \mu_\theta(s) +\delta_\theta(s)\odot z$$ $$z\sim N(0,I)$$
+Sampling: a random action is generated as $$a = \mu_\theta(s) +\delta_\theta(s)\odot z$$, $$z\sim N(0,I)$$
Log-Likelihood: $$\log\pi_\theta(a|s) = -\frac{1}{2}\left(\sum_{i=1}^{k}\left(\frac{(a_i-\mu_i)^2}{\delta_i^2} + 2\log\delta_i\right) + k\log 2\pi\right)$$
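The two formulas in the hunk above can be sketched in plain Python (standard library only). This is a minimal illustration, not the author's code: it assumes $$\delta$$ denotes the per-dimension standard deviation (often written $$\sigma$$), that the policy network's outputs $$\mu_\theta(s)$$ and $$\delta_\theta(s)$$ are already available as plain lists, and that every $$\delta_i > 0$$. The function names `sample_action` and `log_likelihood` and the example numbers are hypothetical. The per-dimension $$2\log\delta_i$$ term comes from the normalizing constant of each univariate Gaussian; leaving it out gives a value that differs from the true log-density whenever $$\delta_i \neq 1$$.

```python
import math
import random
from typing import List, Sequence


def sample_action(mu: Sequence[float], delta: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw a = mu + delta ⊙ z with z ~ N(0, I) (diagonal Gaussian policy)."""
    # One independent standard-normal draw per action dimension: z ~ N(0, I).
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    # Elementwise (Hadamard) product delta ⊙ z, shifted by the mean mu.
    return [m + d * zi for m, d, zi in zip(mu, delta, z)]


def log_likelihood(a: Sequence[float], mu: Sequence[float],
                   delta: Sequence[float]) -> float:
    """log pi(a|s) for a diagonal Gaussian; assumes every delta_i > 0."""
    k = len(a)
    # Per dimension: squared standardized residual plus 2*log(delta_i),
    # the latter coming from each univariate Gaussian's normalizing constant.
    total = sum(((ai - mi) / di) ** 2 + 2.0 * math.log(di)
                for ai, mi, di in zip(a, mu, delta))
    return -0.5 * (total + k * math.log(2.0 * math.pi))


if __name__ == "__main__":
    # Hypothetical stand-ins for the network outputs mu_theta(s), delta_theta(s).
    mu = [0.0, 1.5]
    delta = [0.5, 2.0]
    rng = random.Random(42)
    a = sample_action(mu, delta, rng)
    print("action:", a)
    print("log pi(a|s):", log_likelihood(a, mu, delta))
```

Because the dimensions are independent, the log-likelihood is simply the sum of k univariate Gaussian log-densities. In practice a network frequently outputs $$\log\delta$$ instead of $$\delta$$ so the standard deviation stays positive; that is a common design choice rather than something stated in this note.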
@@ -122,7 +122,3 @@ $$s_{t+1} \sim P(\cdot|s_t, a_t)$$
| :--- |
-
-[^1]: Enter footnote here.