Improve latency of a RandomForestRegressor.predict #16310

Closed
wants to merge 6 commits into from

Conversation

sdpython (Contributor)

Reference Issues/PRs

Proposes one fix for #16143.

What does this implement/fix? Explain your changes.

Allow the user to disable the parallelisation or to set check_input=False when calling predict on a RandomForestRegressor. The prediction time is divided by 3 when parallelisation is disabled and check_input=False, with 100 trees, 10 features, and 1 observation.

parallel=T check=T: r=1000 e=100 n=1 f=10 time=4.597580
parallel=F check=T: r=1000 e=100 n=1 f=10 time=3.539385
parallel=F check=F: r=1000 e=100 n=1 f=10 time=1.461284
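
For reference, a rough sketch of such a timing loop using only the released scikit-learn API (without the parallel_predict/check_input keywords added by this PR); r, e, n and f mirror the repetitions, number of trees, batch size and number of features reported above:

# Rough timing sketch using only released scikit-learn (no PR-specific keywords).
# r = repetitions, e = n_estimators, n = batch size, f = n_features.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

r, e, n, f = 1000, 100, 1, 10
X, y = make_regression(n_samples=10_000, n_features=f, random_state=0)
model = RandomForestRegressor(n_estimators=e, random_state=0).fit(X, y)

batch = X[:n]
start = time.perf_counter()
for _ in range(r):
    model.predict(batch)
print(f"r={r} e={e} n={n} f={f} time={time.perf_counter() - start:.6f}")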

Profile with py-spy:

[py-spy profile screenshot]

@@ -743,7 +743,7 @@ def __init__(self,
warm_start=warm_start,
max_samples=max_samples)

-    def predict(self, X):
+    def predict(self, X, check_input=True, parallel_predict=True):
@agramfort (Member) commented Jan 30, 2020

I would do this with context managers rather than adding public parameters.

e.g. using:

with parallel_backend('threading', n_jobs=1):
     clf.predict(X)

and we should be able to do:

with sklearn.config_context(check_input=False):
     ....
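
For illustration, a rough sketch of that context-manager approach with today's released API, assuming a fitted clf and input X as in the snippets above. Note that check_input is not currently a config_context option; the closest existing knob, assume_finite=True, which only skips the finiteness check, is used here instead:

# Sketch of the context-manager approach (no new public parameters).
# check_input is not an existing config_context option; assume_finite=True
# is the nearest released knob and only skips the NaN/inf check.
import sklearn
from joblib import parallel_backend

with parallel_backend('threading', n_jobs=1):
    with sklearn.config_context(assume_finite=True):
        y_pred = clf.predict(X)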

@sdpython (Contributor, Author) replied:

DecisionTreeRegressor.predict still has a check_input parameter (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor.predict). Which one should take precedence when the two check_input values disagree?
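
For illustration, the tree-level parameter can already be used to bypass both the forest-level validation and the joblib dispatch by averaging over estimators_ manually; a rough sketch, assuming a fitted RandomForestRegressor rf and a C-contiguous np.float32 batch x (the low-level tree code expects float32 once checks are skipped):

# Sketch: call each tree directly with the existing
# DecisionTreeRegressor.predict(X, check_input=False) and average.
import numpy as np

def fast_predict(rf, x):
    # x must be a C-contiguous np.float32 2D array when checks are skipped.
    out = np.zeros(x.shape[0], dtype=np.float64)
    for est in rf.estimators_:
        out += est.predict(x, check_input=False)
    return out / len(rf.estimators_)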

@sdpython (Contributor, Author) commented Jan 30, 2020

I extended the script to compare the predict method with two ONNX runtimes (onnxruntime, mlprodict), both of which work well on small batches:

parallel=T check=T:  r=1000 e=100 n=1 f=10 time=4.420036
parallel=F check=T:  r=1000 e=100 n=1 f=10 time=3.733469
parallel=F check=F:  r=1000 e=100 n=1 f=10 time=1.408071
onnxruntime predict: r=1000 e=100 n=1 f=10 time=0.029372
mlprodict_predict:   r=1000 e=100 n=1 f=10 time=0.014844

When the batch size is 1000 (instead of 1 above), results are comparable:

parallel=T check=T:  r=10 e=100 n=1000 f=10 time=0.315662
parallel=F check=T:  r=10 e=100 n=1000 f=10 time=0.299146
parallel=F check=F:  r=10 e=100 n=1000 f=10 time=0.274048
onnxruntime predict: r=10 e=100 n=1000 f=10 time=0.604990
mlprodict_predict:   r=10 e=100 n=1000 f=10 time=0.225800

When the batch size is 100,000, onnxruntime is the slowest; the other runtime is still fast.

parallel=T check=T:  r=10 e=100 n=100000 f=10 time=19.007561
parallel=F check=T:  r=10 e=100 n=100000 f=10 time=18.932476
parallel=F check=F:  r=10 e=100 n=100000 f=10 time=18.834580
onnxruntime predict: r=10 e=100 n=100000 f=10 time=59.601847
mlprodict_predict:   r=10 e=100 n=100000 f=10 time=19.402375
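
The ONNX side of the comparison could be reproduced along these lines (rough sketch, assuming skl2onnx and onnxruntime are installed and model/X come from a timing script like the one above; mlprodict ships its own Python runtime and is not shown):

# Sketch: export the fitted forest to ONNX and time onnxruntime on 1-row batches.
import time
import numpy as np
import onnxruntime as rt
from skl2onnx import to_onnx

X32 = X.astype(np.float32)
onx = to_onnx(model, X32[:1])
sess = rt.InferenceSession(onx.SerializeToString(),
                           providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

start = time.perf_counter()
for _ in range(1000):
    sess.run(None, {input_name: X32[:1]})
print(f"onnxruntime predict: time={time.perf_counter() - start:.6f}")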

@ogrisel (Member) commented Jan 30, 2020

Skipping the validation checks on specific estimators will be significantly easier once the PR on n_features_in_ is merged (#16112).

@ogrisel changed the title from "Improve performance of a RandomForestRegressor" to "Improve latency of a RandomForestRegressor.predict" on Jan 30, 2020
@thomasjpfan (Member) commented

On master, the default is check_input=False and prediction is parallel. What does the performance look like if we set parallel=False on master and keep check_input=False?

Base automatically changed from master to main January 22, 2021 10:51
@sdpython (Contributor, Author) commented

Not sure what the status of this PR is. I suggest closing it.

@ogrisel (Member) commented Jun 15, 2021

Let's close (but keep #16143 open). We need to redo profiling on the main branch.

I suspect that even when n_jobs=1 is set explicitly, joblib still imposes a very large overhead for individual predictions. We probably need to find a way to bypass the backend discovery and configuration mechanism completely when n_jobs is explicitly set to 1. That won't fix the default behavior though because the default value of n_jobs depends on the active backend machinery.
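
One rough way to isolate that dispatch overhead, independently of input validation, is to compare a Parallel(n_jobs=1) round-trip against a plain Python loop over the same per-tree calls (sketch, assuming a fitted forest rf and a 2D single-row batch x):

# Sketch: measure the per-call cost of the joblib machinery with n_jobs=1,
# which forest.predict pays even for a single-row batch.
import time
from joblib import Parallel, delayed

start = time.perf_counter()
for _ in range(1000):
    Parallel(n_jobs=1)(delayed(est.predict)(x) for est in rf.estimators_)
print(f"joblib n_jobs=1: {time.perf_counter() - start:.6f}")

start = time.perf_counter()
for _ in range(1000):
    [est.predict(x) for est in rf.estimators_]
print(f"plain loop:      {time.perf_counter() - start:.6f}")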

@ogrisel closed this Jun 15, 2021