Improve latency of a RandomForestRegressor.predict #16310

Closed
wants to merge 6 commits into from

Conversation

sdpython (Contributor)

Reference Issues/PRs

Proposes one fix for #16143.

What does this implement/fix? Explain your changes.

Allow the user to disable the parallelisation or to set check_input=False when calling predict on a RandomForestRegressor. The prediction time is divided by 3 when parallelisation is disabled and check_input=False, with 100 trees, 10 features, and 1 observation.

parallel=T check=T: r=1000 e=100 n=1 f=10 time=4.597580
parallel=F check=T: r=1000 e=100 n=1 f=10 time=3.539385
parallel=F check=F: r=1000 e=100 n=1 f=10 time=1.461284
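
For reference, a rough sketch of such a timing loop using only the released scikit-learn API (without the parallel_predict/check_input keywords added by this PR); r, e, n and f mirror the repetitions, number of trees, batch size and number of features reported above:

# Rough timing sketch using only released scikit-learn (no PR-specific keywords).
# r = repetitions, e = n_estimators, n = batch size, f = n_features.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

r, e, n, f = 1000, 100, 1, 10
X, y = make_regression(n_samples=10_000, n_features=f, random_state=0)
model = RandomForestRegressor(n_estimators=e, random_state=0).fit(X, y)

batch = X[:n]
start = time.perf_counter()
for _ in range(r):
    model.predict(batch)
print(f"r={r} e={e} n={n} f={f} time={time.perf_counter() - start:.6f}")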

Profile with py-spy:

[py-spy profile screenshot]

@@ -743,7 +743,7 @@ def __init__(self,
warm_start=warm_start,
max_samples=max_samples)

-    def predict(self, X):
+    def predict(self, X, check_input=True, parallel_predict=True):
@agramfort (Member) commented Jan 30, 2020

I would do this with context managers rather than adding public parameters.

e.g. using:

with parallel_backend('threading', n_jobs=1):
     clf.predict(X)

and we should be able to do:

with sklearn.config_context(check_input=False):
     ....
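
For illustration, a rough sketch of that context-manager approach with today's released API, assuming a fitted clf and input X as in the snippets above. Note that check_input is not currently a config_context option; the closest existing knob, assume_finite=True, which only skips the finiteness check, is used here instead:

# Sketch of the context-manager approach (no new public parameters).
# check_input is not an existing config_context option; assume_finite=True
# is the nearest released knob and only skips the NaN/inf check.
import sklearn
from joblib import parallel_backend

with parallel_backend('threading', n_jobs=1):
    with sklearn.config_context(assume_finite=True):
        y_pred = clf.predict(X)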

@sdpython (Contributor, Author) replied:

DecisionTreeRegressor.predict still has a check_input parameter (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor.predict). Which one should take precedence when the two check_input values disagree?
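
For illustration, the tree-level parameter can already be used to bypass both the forest-level validation and the joblib dispatch by averaging over estimators_ manually; a rough sketch, assuming a fitted RandomForestRegressor rf and a C-contiguous np.float32 batch x (the low-level tree code expects float32 once checks are skipped):

# Sketch: call each tree directly with the existing
# DecisionTreeRegressor.predict(X, check_input=False) and average.
import numpy as np

def fast_predict(rf, x):
    # x must be a C-contiguous np.float32 2D array when checks are skipped.
    out = np.zeros(x.shape[0], dtype=np.float64)
    for est in rf.estimators_:
        out += est.predict(x, check_input=False)
    return out / len(rf.estimators_)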

@sdpython (Contributor, Author) commented Jan 30, 2020

I extended the script to compare the predict method with two ONNX runtimes (onnxruntime, mlprodict), both of which work well on small batches:

parallel=T check=T:  r=1000 e=100 n=1 f=10 time=4.420036
parallel=F check=T:  r=1000 e=100 n=1 f=10 time=3.733469
parallel=F check=F:  r=1000 e=100 n=1 f=10 time=1.408071
onnxruntime predict: r=1000 e=100 n=1 f=10 time=0.029372
mlprodict_predict:   r=1000 e=100 n=1 f=10 time=0.014844

When the batch size is 1000 (instead of 1 above), results are comparable:

parallel=T check=T:  r=10 e=100 n=1000 f=10 time=0.315662
parallel=F check=T:  r=10 e=100 n=1000 f=10 time=0.299146
parallel=F check=F:  r=10 e=100 n=1000 f=10 time=0.274048
onnxruntime predict: r=10 e=100 n=1000 f=10 time=0.604990
mlprodict_predict:   r=10 e=100 n=1000 f=10 time=0.225800

When the batch size is 100,000, onnxruntime is the slowest; the other runtime is still fast.

parallel=T check=T:  r=10 e=100 n=100000 f=10 time=19.007561
parallel=F check=T:  r=10 e=100 n=100000 f=10 time=18.932476
parallel=F check=F:  r=10 e=100 n=100000 f=10 time=18.834580
onnxruntime predict: r=10 e=100 n=100000 f=10 time=59.601847
mlprodict_predict:   r=10 e=100 n=100000 f=10 time=19.402375
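
The ONNX side of the comparison could be reproduced along these lines (rough sketch, assuming skl2onnx and onnxruntime are installed and model/X come from a timing script like the one above; mlprodict ships its own Python runtime and is not shown):

# Sketch: export the fitted forest to ONNX and time onnxruntime on 1-row batches.
import time
import numpy as np
import onnxruntime as rt
from skl2onnx import to_onnx

X32 = X.astype(np.float32)
onx = to_onnx(model, X32[:1])
sess = rt.InferenceSession(onx.SerializeToString(),
                           providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

start = time.perf_counter()
for _ in range(1000):
    sess.run(None, {input_name: X32[:1]})
print(f"onnxruntime predict: time={time.perf_counter() - start:.6f}")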

@ogrisel (Member) commented Jan 30, 2020

Skipping the validation checks on specific estimators will be significantly easier once the PR on n_features_in_ is merged (#16112).

@ogrisel changed the title from "Improve performance of a RandomForestRegressor" to "Improve latency of a RandomForestRegressor.predict" on Jan 30, 2020
@thomasjpfan (Member) commented

On master, the default is check_input=False and prediction is parallel. What does the performance look like if we set parallel=False on master and keep check_input=False?

Base automatically changed from master to main January 22, 2021 10:51
@sdpython (Contributor, Author) commented

Not sure what the status of this PR is. I suggest closing it.

@ogrisel (Member) commented Jun 15, 2021

Let's close (but keep #16143 open). We need to redo profiling on the main branch.

I suspect that even when n_jobs=1 is set explicitly, joblib still imposes a very large overhead for individual predictions. We probably need to find a way to bypass the backend discovery and configuration mechanism completely when n_jobs is explicitly set to 1. That won't fix the default behavior though because the default value of n_jobs depends on the active backend machinery.
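
One rough way to isolate that dispatch overhead, independently of input validation, is to compare a Parallel(n_jobs=1) round-trip against a plain Python loop over the same per-tree calls (sketch, assuming a fitted forest rf and a 2D single-row batch x):

# Sketch: measure the per-call cost of the joblib machinery with n_jobs=1,
# which forest.predict pays even for a single-row batch.
import time
from joblib import Parallel, delayed

start = time.perf_counter()
for _ in range(1000):
    Parallel(n_jobs=1)(delayed(est.predict)(x) for est in rf.estimators_)
print(f"joblib n_jobs=1: {time.perf_counter() - start:.6f}")

start = time.perf_counter()
for _ in range(1000):
    [est.predict(x) for est in rf.estimators_]
print(f"plain loop:      {time.perf_counter() - start:.6f}")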

@ogrisel closed this Jun 15, 2021