Commit ced4695

doc: miscellaneous improvements (#70)
* doc: miscellaneous improvements
* doc: miscellaneous improvements
1 parent 66afef5 commit ced4695

8 files changed: +111 -124 lines changed

CHANGELOG.rst

+15 -1
@@ -1,6 +1,20 @@
 Changelog
 =========
 
++---------------+-----------------------------------------------------------+
+| Badge         | Meaning                                                   |
++===============+===========================================================+
+| |Feature|     | Add something that cannot be achieved before.             |
++---------------+-----------------------------------------------------------+
+| |Efficiency|  | Improve the efficiency on the computation or memory.      |
++---------------+-----------------------------------------------------------+
+| |Enhancement| | Miscellaneous minor improvements.                         |
++---------------+-----------------------------------------------------------+
+| |Fix|         | Fix up something that does not work as expected.          |
++---------------+-----------------------------------------------------------+
+| |API|         | You will need to change the code to have the same effect. |
++---------------+-----------------------------------------------------------+
+
 Ver 0.1.*
 ---------
 

@@ -42,6 +56,6 @@ Beta
 .. |MajorFeature| replace:: :raw-html:`<span class="badge badge-success">Major Feature</span>` :raw-latex:`{\small\sc [Major Feature]}`
 .. |Feature| replace:: :raw-html:`<span class="badge badge-success">Feature</span>` :raw-latex:`{\small\sc [Feature]}`
 .. |Efficiency| replace:: :raw-html:`<span class="badge badge-info">Efficiency</span>` :raw-latex:`{\small\sc [Efficiency]}`
-.. |Enhancement| replace:: :raw-html:`<span class="badge badge-info">Enhancement</span>` :raw-latex:`{\small\sc [Enhancement]}`
+.. |Enhancement| replace:: :raw-html:`<span class="badge badge-primary">Enhancement</span>` :raw-latex:`{\small\sc [Enhancement]}`
 .. |Fix| replace:: :raw-html:`<span class="badge badge-danger">Fix</span>` :raw-latex:`{\small\sc [Fix]}`
 .. |API| replace:: :raw-html:`<span class="badge badge-warning">API Change</span>` :raw-latex:`{\small\sc [API Change]}`

README.rst

+14 -12
@@ -26,7 +26,14 @@
 Ensemble PyTorch
 ================
 
-Ensemble PyTorch is a unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model. Please refer to our `documentation <https://ensemble-pytorch.readthedocs.io/>`__ for details.
+Ensemble PyTorch is a unified ensemble framework for PyTorch to easily improve the performance and robustness of your deep learning model.
+
+Resources
+---------
+
+* `Document <https://ensemble-pytorch.readthedocs.io/>`__
+* `Source Code <https://github.com/xuyxu/Ensemble-Pytorch>`__
+* `Experiment <https://ensemble-pytorch.readthedocs.io/en/stable/experiment.html>`__
 
 Installation
 ------------

@@ -49,17 +56,17 @@ To use the latest version, you need to install the package from source:
 
    $ git clone https://github.com/xuyxu/Ensemble-Pytorch.git
    $ cd Ensemble-Pytorch
-   $ pip install -r requirements.txt (Optional)
+   $ pip install -r requirements.txt
    $ python setup.py install
 
-Minimal Example on How to Use
------------------------------
+Example
+-------
 
 .. code:: python
 
-   from torchensemble import VotingClassifier  # a classic ensemble method
+   from torchensemble import VotingClassifier  # Voting is a classic ensemble strategy
 
-   # Load your data
+   # Load data
    train_loader = DataLoader(...)
    test_loader = DataLoader(...)

@@ -73,7 +80,7 @@ Minimal Example on How to Use
                        weight_decay=weight_decay)  # weight decay of the optimizer
 
    # Set the scheduler
-   model.set_scheduler("CosineAnnealingLR", T_max=epochs)  # optional
+   model.set_scheduler("CosineAnnealingLR", T_max=epochs)  # (optional) learning rate scheduler
 
    # Train
    model.fit(train_loader,

@@ -103,11 +110,6 @@ Supported Ensemble
 | Fast Geometric Ensemble | Sequential | `[NeurIPS'18] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs <https://arxiv.org/pdf/1802.10026;Loss>`__ | fast_geometric.py |
 +-------------------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
 
-Experiment
-----------
-
-Please refer to the `experiment part <https://ensemble-pytorch.readthedocs.io/en/stable/experiment.html>`__ of our documentation.
-
 Package Dependency
 ------------------
 
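
The ``Example`` section shown in the README diff above leaves ``base_estimator`` and the two ``DataLoader(...)`` calls as placeholders. For reference, a self-contained sketch of the same workflow might look as follows; the ``MLP`` module, the synthetic tensors, and the hyperparameter values are illustrative assumptions, not part of this commit.

.. code:: python

   import torch
   import torch.nn as nn
   from torch.utils.data import DataLoader, TensorDataset

   from torchensemble import VotingClassifier

   class MLP(nn.Module):
       """Toy classifier standing in for ``base_estimator`` (assumption)."""
       def __init__(self):
           super().__init__()
           self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

       def forward(self, x):
           return self.net(x)

   # Synthetic data standing in for the DataLoader placeholders.
   X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
   train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
   test_loader = DataLoader(TensorDataset(X, y), batch_size=32)

   model = VotingClassifier(estimator=MLP, n_estimators=5)
   model.set_optimizer("Adam", lr=1e-3, weight_decay=5e-4)
   model.set_scheduler("CosineAnnealingLR", T_max=5)  # optional
   model.fit(train_loader, epochs=5)
   acc = model.predict(test_loader)  # testing accuracy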

docs/experiment.rst

+7 -7
@@ -4,7 +4,7 @@ Experiments
 Setup
 ~~~~~
 
-Experiments here are designed to evaluate the performance of each ensemble implemented in torchensemble. We have collected four configurations on dataset and base estimator, as shown in the table below. In addition, scripts on producing all figures below are available on `GitHub <https://github.com/xuyxu/Ensemble-Pytorch/tree/master/docs/plotting>`__.
+Experiments here are designed to evaluate the performance of each ensemble implemented in Ensemble-PyTorch. We have collected four different configurations on dataset and base estimator, as shown in the table below. In addition, scripts on producing all figures below are available on `GitHub <https://github.com/xuyxu/Ensemble-Pytorch/tree/master/docs/plotting>`__.
 
 .. table::
    :align: center

@@ -21,14 +21,14 @@ Experiments here are designed to evaluate the performance of each ensemble imple
 |ResNet\@CIFAR-100 | ResNet-18      | CIFAR-100 | 2, 5, 7, 10       |
 +------------------+----------------+-----------+-------------------+
 
-1. Data augmentations were adopted on **CIFAR-10** and **CIFAR-100** datasets.
+1. Data augmentations were adopted on both **CIFAR-10** and **CIFAR-100** datasets.
 2. For **LeNet-5**, the ``Adam`` optimizer with learning rate ``1e-3`` and weight decay ``5e-4`` was used.
 3. For **ResNet-18**, the ``SGD`` optimizer with learning rate ``1e-1``, weight decay ``5e-4``, and momentum ``0.9`` was used.
 4. Reference code: `ResNet-18 on CIFAR-10 <https://github.com/kuangliu/pytorch-cifar>`__ and `ResNet-18 on CIFAR-100 <https://github.com/weiaicunzai/pytorch-cifar100>`__.
 
 .. tip::
 
-   For each experiment shwon below, we have added some comments that may be worthy of your attention. Feel free to open an `issue <https://github.com/xuyxu/Ensemble-Pytorch/issues>`__ if you have any question on the results.
+   For each experiment shown below, we have added some comments that may be worthy of your attention. Feel free to open an `issue <https://github.com/xuyxu/Ensemble-Pytorch/issues>`__ if you have any question on the results.
 
 LeNet\@MNIST
 ~~~~~~~~~~~~
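
Notes 2 and 3 above map directly onto the ``set_optimizer`` API shown in the README. A sketch of the two optimizer configurations, assuming ``LeNet5`` and ``ResNet18`` network classes are defined elsewhere (e.g., following the reference code in note 4) and using ``VotingClassifier`` with 10 estimators purely for illustration:

.. code:: python

   from torchensemble import VotingClassifier

   # LeNet-5 runs: Adam with learning rate 1e-3 and weight decay 5e-4 (note 2).
   lenet_ensemble = VotingClassifier(estimator=LeNet5, n_estimators=10)
   lenet_ensemble.set_optimizer("Adam", lr=1e-3, weight_decay=5e-4)

   # ResNet-18 runs: SGD with learning rate 1e-1, weight decay 5e-4,
   # and momentum 0.9 (note 3).
   resnet_ensemble = VotingClassifier(estimator=ResNet18, n_estimators=10)
   resnet_ensemble.set_optimizer("SGD", lr=1e-1, weight_decay=5e-4, momentum=0.9)
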
@@ -39,8 +39,8 @@ LeNet\@MNIST
 
 * MNIST is a very easy dataset, and the testing acc of a single LeNet-5 estimator is over 99%
 * voting and bagging are the most effective ensemble in this case
-* bagging is even better than voting since the bootstrap sampling on training data ingests more diversity into the ensemble than voting
-* fusion does not perform well in this case, possibly because the model complexity of a single LeNet-5 estimator is already enough for MNIST. Therefore, simply encapsulating several LeNet-5 estimators into a large model will only make the over-fitting problem more severe.
+* bagging is even better than voting since the bootstrap sampling on training data ingests more diversity into the ensemble
+* fusion does not perform well in this case, possibly because the model complexity of a single LeNet-5 estimator is already sufficient for MNIST. Therefore, simply encapsulating several LeNet-5 estimators into a large model will only make the over-fitting problem more severe.
 
 LeNet\@CIFAR-10
 ~~~~~~~~~~~~~~~

@@ -50,7 +50,7 @@ LeNet\@CIFAR-10
    :width: 400
 
 * CIFAR-10 is a hard dataset for LeNet-5, and the testing acc of a single LeNet-5 estimator is around 70%
-* gradient boosting is the most effective ensemble because it is able to boost the performance of weak estimators by a large margin as a bias-reduction ensemble method
+* gradient boosting is the most effective ensemble because it is able to improve the performance of weak estimators by a large margin as a bias-reduction ensemble method
 * bagging is worse than voting since less training data are available
 * snapshot ensemble, more precisely, the customized learning rate scheduler in snapshot ensemble, does not adapt well with LeNet-5 (more training epochs are needed)
 

@@ -80,4 +80,4 @@ ResNet\@CIFAR-100
 Acknowledgement
 ~~~~~~~~~~~~~~~
 
-We would like to thank the `LAMDA Group <http://www.lamda.nju.edu.cn/MainPage.ashx>`__ from Nanjing University for providing us with the powerful V-100 GPU server.
+We would like to thank the `LAMDA Group <http://www.lamda.nju.edu.cn/MainPage.ashx>`__ from Nanjing University for providing us with the powerful V-100 GPU server.

docs/index.rst

+22 -49
@@ -2,84 +2,57 @@
    :align: center
    :width: 400
 
-|github|_ |readthedocs|_ |codecov|_ |python|_ |pypi|_ |license|_
-
-.. |github| image:: https://github.com/xuyxu/Ensemble-Pytorch/workflows/torchensemble-CI/badge.svg
-.. _github: https://github.com/xuyxu/Ensemble-Pytorch/actions
-
-.. |readthedocs| image:: https://readthedocs.org/projects/ensemble-pytorch/badge/?version=latest
-.. _readthedocs: https://ensemble-pytorch.readthedocs.io/en/latest/index.html
-
-.. |codecov| image:: https://codecov.io/gh/xuyxu/Ensemble-Pytorch/branch/master/graph/badge.svg?token=2FXCFRIDTV
-.. _codecov: https://codecov.io/gh/xuyxu/Ensemble-Pytorch
-
-.. |python| image:: https://img.shields.io/badge/python-3.6+-blue?logo=python
-.. _python: https://www.python.org/
-
-.. |pypi| image:: https://img.shields.io/pypi/v/torchensemble
-.. _pypi: https://pypi.org/project/torchensemble/
-
-.. |license| image:: https://img.shields.io/github/license/xuyxu/Ensemble-Pytorch
-.. _license: https://github.com/xuyxu/Ensemble-Pytorch/blob/master/LICENSE
-
 Ensemble PyTorch Documentation
 ==============================
 
-.. rst-class:: center
-
-| |:homes:| `GitHub <https://github.com/xuyxu/Ensemble-Pytorch>`__ | |:book:| `ReadtheDocs <https://readthedocs.org/projects/ensemble-pytorch/>`__ | |:hammer_and_wrench:| `Codecov <https://codecov.io/gh/xuyxu/Ensemble-Pytorch>`__
-|
-
-Ensemble PyTorch is a unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model. It provides:
+Ensemble PyTorch is a unified ensemble framework for PyTorch to easily improve the performance and robustness of your deep learning model. It provides:
 
 * |:arrow_up_small:| Easy ways to improve the performance and robustness of your deep learning model.
 * |:eyes:| Easy-to-use APIs on training and evaluating the ensemble.
 * |:zap:| High training efficiency with parallelization.
 
-| This package is under active development. Please feel free to open an `issue <https://github.com/xuyxu/Ensemble-Pytorch/issues>`__ if your have any problem. In addition, any feature request or `pull request <https://github.com/xuyxu/Ensemble-Pytorch/pulls>`__ would be highly welcomed.
-
 Guidepost
 ---------
 
 * To get started, please refer to `Quick Start <./quick_start.html>`__;
 * To learn more about ensemble methods supported, please refer to `Introduction <./introduction.html>`__;
 * If you are confused on which ensemble method to use, our `experiments <./experiment.html>`__ and the instructions in `guidance <./guide.html>`__ may be helpful.
 
-Minimal Example on How to Use
------------------------------
+Example
+-------
 
 .. code:: python
 
-   from torchensemble import VotingClassifier  # a classic ensemble method
+   from torchensemble import VotingClassifier  # Voting is a classic ensemble strategy
 
-   # Load your data
-   train_loader = DataLoader(...)
-   test_loader = DataLoader(...)
+   # Load data
+   train_loader = DataLoader(...)
+   test_loader = DataLoader(...)
 
-   # Define the ensemble
-   model = VotingClassifier(estimator=base_estimator,  # your deep learning model
-                            n_estimators=10)           # the number of base estimators
+   # Define the ensemble
+   model = VotingClassifier(estimator=base_estimator,  # your deep learning model
+                            n_estimators=10)           # the number of base estimators
 
-   # Set the optimizer
-   model.set_optimizer("Adam",                       # parameter optimizer
-                       lr=learning_rate,             # learning rate of the optimizer
-                       weight_decay=weight_decay)    # weight decay of the optimizer
+   # Set the optimizer
+   model.set_optimizer("Adam",                       # parameter optimizer
+                       lr=learning_rate,             # learning rate of the optimizer
+                       weight_decay=weight_decay)    # weight decay of the optimizer
 
-   # Set the scheduler
-   model.set_scheduler("CosineAnnealingLR", T_max=epochs)  # optional
+   # Set the scheduler
+   model.set_scheduler("CosineAnnealingLR", T_max=epochs)  # (optional) learning rate scheduler
 
-   # Train
-   model.fit(train_loader,
-             epochs=epochs)  # the number of training epochs
+   # Train
+   model.fit(train_loader,
+             epochs=epochs)  # the number of training epochs
 
-   # Evaluate
-   acc = model.predict(test_loader)  # testing accuracy
+   # Evaluate
+   acc = model.predict(test_loader)  # testing accuracy
 
 Content
 -------
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
    Quick Start <quick_start>
    Introduction <introduction>

docs/introduction.rst

+7 -1
@@ -70,7 +70,13 @@ During the training stage of each base estimator :math:`h^m`, an adversarial sam
 
 Same as Voting and Bagging, the output of ``AdversarialTrainingClassifier`` or ``AdversarialTrainingRegressor`` during the evaluating stage is the average over predictions from all base estimators.
 
+Fast Geometric Ensemble [3]_
+----------------------------
+
+Motivated by geometric insights on the loss surface of deep neural networks, Fast Geometric Ensembling (FGE) is an efficient ensemble that uses a customized learning rate scheduler to generate base estimators, similar to snapshot ensemble.
+
 **References**
 
 .. [1] Huang Gao, Sharon Yixuan Li, Geoff Pleiss, et al., "Snapshot ensembles: Train 1, get m for free." ICLR, 2017.
-.. [2] Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell., "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles." NIPS 2017.
+.. [2] Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell, "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles." NIPS, 2017.
+.. [3] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, et al., "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs." NeurIPS, 2018.
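
The customized learning rate scheduler mentioned above is, at its core, a cyclical schedule: the learning rate repeatedly ramps up and down, and a base estimator is snapshotted near each low point. A minimal sketch of the idea in plain PyTorch, using ``torch.optim.lr_scheduler.CyclicLR``; this illustrates the concept only, not Ensemble-PyTorch's internal implementation, and all bounds are assumed values:

.. code:: python

   import torch
   from torch.optim.lr_scheduler import CyclicLR

   params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in model parameters
   optimizer = torch.optim.SGD(params, lr=1e-1, momentum=0.9)

   # Cycle the learning rate between a low and a high bound, 100 steps per cycle.
   scheduler = CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-1, step_size_up=50)

   snapshots = []
   for step in range(400):
       # ... forward/backward pass on one training batch would go here ...
       optimizer.step()
       scheduler.step()
       if (step + 1) % 100 == 0:  # near a learning-rate minimum
           snapshots.append([p.detach().clone() for p in params])
   # Each snapshot would serve as one base estimator of the ensemble.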

docs/parameters.rst

+4 -4
@@ -1,7 +1,7 @@
 Parameters
 ==========
 
-This page provides the API reference of ``torchensemble``, please also refer to `Introduction <./introduction.html>`__ for details.
+This page provides the API reference of :mod:`torchensemble`.
 
 Fusion
 ------

@@ -48,7 +48,7 @@ Bagging
 In bagging-based ensemble methods, each base estimator is trained
 independently. In addition, sampling with replacement is conducted on the
 training data to further encourage the diversity between different base
-estimators in the ensemble model.
+estimators in the ensemble.
 
 BaggingClassifier
 *****************
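
The sampling with replacement described in the hunk above is plain bootstrap sampling. A tiny illustration of the mechanism, not Ensemble-PyTorch's internal code:

.. code:: python

   import torch

   n_samples = 8
   # Draw n_samples indices with replacement: some repeat, some never appear,
   # so each base estimator sees a slightly different training set.
   indices = torch.randint(high=n_samples, size=(n_samples,))
   print(indices)  # e.g. tensor([3, 1, 3, 7, 0, 0, 5, 2])
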
@@ -71,7 +71,7 @@ computed based on the ground truth and the output from base estimators
 fitted before, using ordinary least square.
 
 .. tip::
-   The input argument ``shrinkage_rate`` in :mod:`gradient_boosting` is also known as learning rate in other gradient boosting libraries such as `XGBoost <https://xgboost.readthedocs.io/en/latest/>`__. However, its meaning is totally different from the meaning of learning rate in the context of parameter optimizer in deep learning.
+   The input argument ``shrinkage_rate`` in :class:`gradient_boosting` is also known as learning rate in other gradient boosting libraries such as `XGBoost <https://xgboost.readthedocs.io/en/latest/>`__. However, its meaning is totally different from the meaning of learning rate in the context of parameter optimizer in deep learning.
 
 GradientBoostingClassifier
 **************************
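
To make the tip above concrete: gradient boosting grows the ensemble additively, F_m = F_{m-1} + shrinkage_rate * h_m, so ``shrinkage_rate`` damps each base estimator's contribution rather than scaling any gradient step. A toy numeric sketch, with a perfect-fit base estimator assumed for illustration:

.. code:: python

   shrinkage_rate = 0.5
   target, prediction = 10.0, 0.0
   for m in range(5):
       residual = target - prediction      # what the next base estimator fits
       h_m = residual                      # assume a perfect fit (toy)
       prediction += shrinkage_rate * h_m  # shrunken additive update
       print(f"round {m}: prediction = {prediction:.4f}")
   # 5.0 -> 7.5 -> 8.75 -> 9.375 -> 9.6875: approaches the target geometrically
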
@@ -139,7 +139,7 @@ Fast Geometric Ensemble
 -----------------------
 
 Motivated by geometric insights on the loss surface of deep neural networks,
-Fast Geometirc Ensembling (FGE) is an efficient ensemble that uses a
+Fast Geometric Ensembling (FGE) is an efficient ensemble that uses a
 customized learning rate scheduler to generate base estimators, similar to
 snapshot ensemble.
