Skip to content

JIT cause 100% cpu usage #11736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
andrii29 opened this issue Jul 18, 2023 · 10 comments
Open

JIT cause 100% cpu usage #11736

andrii29 opened this issue Jul 18, 2023 · 10 comments

Comments

@andrii29
Copy link

Description

The following code:

    [opcache]
    opcache.enable=1
    opcache.enable_cli=1
    opcache.memory_consumption=512
    opcache.jit_buffer_size=256M
    opcache.interned_strings_buffer=16
    opcache.max_accelerated_files=200000
    opcache.max_wasted_percentage=5
    opcache.use_cwd=1
    opcache.validate_timestamps=0
    opcache.revalidate_freq=600
    opcache.revalidate_path=0
    opcache.save_comments=1
    opcache.enable_file_override=0
    opcache.optimization_level=0x7FFFBFFF
    opcache.dups_fix=0
    opcache.max_file_size=0
    opcache.consistency_checks=0
    opcache.force_restart_timeout=180
    opcache.file_cache_only=0
    opcache.file_cache_consistency_checks=1
    opcache.huge_code_pages=1
    opcache.preload=/path/to/config/preload.php
    opcache.preload_user=user

Resulted in this output:

[18-Jul-2023 03:40:09] NOTICE: [pool www] child 179 started
Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /path/to/file.php on line 33
[18-Jul-2023 03:40:13] WARNING: [pool www] child 47 exited with code 124 after 381.082060 seconds from start
[18-Jul-2023 03:40:13] NOTICE: [pool www] child 182 started
Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /path/to/file.php on line 33
[18-Jul-2023 03:40:25] WARNING: [pool www] child 42 exited with code 124 after 393.292804 seconds from start

But I expected this output instead:

no errors in log

Hello,
after enabling JIT, our application start perform better (~10% faster). But, some php-fpm pods (~1 in 20) after start use all available cpu and timeout the requests. After pod restart, everything back to normal. Without JIT everything is ok and cpu utilization is at normal value.

Regards

opcache.blacklist_filename => no value => no value
opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => On => On
opcache.enable_file_override => Off => Off
opcache.error_log => no value => no value
opcache.file_cache => no value => no value
opcache.file_cache_consistency_checks => On => On
opcache.file_cache_only => Off => Off
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.huge_code_pages => On => On
opcache.interned_strings_buffer => 16 => 16
opcache.jit => tracing => tracing
opcache.jit_bisect_limit => 0 => 0
opcache.jit_blacklist_root_trace => 16 => 16
opcache.jit_blacklist_side_trace => 8 => 8
opcache.jit_buffer_size => 256M => 256M
opcache.jit_debug => 0 => 0
opcache.jit_hot_func => 127 => 127
opcache.jit_hot_loop => 64 => 64
opcache.jit_hot_return => 8 => 8
opcache.jit_hot_side_exit => 8 => 8
opcache.jit_max_exit_counters => 8192 => 8192
opcache.jit_max_loop_unrolls => 8 => 8
opcache.jit_max_polymorphic_calls => 2 => 2
opcache.jit_max_recursive_calls => 2 => 2
opcache.jit_max_recursive_returns => 2 => 2
opcache.jit_max_root_traces => 1024 => 1024
opcache.jit_max_side_traces => 128 => 128
opcache.jit_prof_threshold => 0.005 => 0.005
opcache.lockfile_path => /tmp => /tmp
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 200000 => 200000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 512 => 512
opcache.opt_debug_level => 0 => 0
opcache.optimization_level => 0x7FFFBFFF => 0x7FFFBFFF
opcache.preferred_memory_model => no value => no value
opcache.preload => /path/to/config/preload.php => //path/to/config/preload.php
opcache.preload_user => user => user
opcache.protect_memory => Off => Off
opcache.record_warnings => Off => Off
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 600 => 600
opcache.revalidate_path => Off => Off
opcache.save_comments => On => On
opcache.use_cwd => On => On
opcache.validate_permission => Off => Off
opcache.validate_root => Off => Off
opcache.validate_timestamps => Off => Off

PHP Version

PHP 8.1.20

Operating System

Debian GNU/Linux 10 (buster) docker container on Amazon Linux 2

@nielsdos
Copy link
Member

I think this will be pretty much impossible to debug and fix without having a reproducer or at least more info about where it goes wrong.

@maaarghk
Copy link
Contributor

looks like #11609 - @iluuu1994 do you know offhand if it'd be relevant to php 8.1.20 as indicated here?

@iluuu1994
Copy link
Member

@maaarghk I don't think this is related, as this reverting the tracing counter handlers was only added in 8.2.

@andrii29
Copy link
Author

what tool should i use to get relevant trace?

some trace logs from stuck workers

[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="5\1\0\0", iov_len=4}, {iov_base="\265\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="2\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`2\0\225\10\0\6\21?; \1)\10\301\360~\31A\220\302\36'uh\5\t\t"..., iov_len=284}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 309
[pid 12215] recvfrom(7, 0x5580d061efd0, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\312\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\272\2\210\17\265\2\0\0\334\7\0\0\335\7\0\0\320\2\0\0\3(\265/\375`\320\1=\r\0\366"..., 454, 0, NULL, NULL) = 454
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="j\1\0\0", iov_len=4}, {iov_base="\267\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="\207\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`\207\0=\n\0\26TIB \2156\10\263p\325!\337X\27z\201\332E\321\326\354"..., iov_len=337}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 362
[pid 12215] recvfrom(7, 0x5580d06b8150, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\274\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\273\2\210\17\267\2\0\0\334\7\0\0\335\7\0\0\23\2\0\0\3(\265/\375`\23\1\315\f\0\226"..., 440, 0, NULL, NULL) = 440
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141

@nielsdos
Copy link
Member

Do you have a script that reliably reproduces the problem, or is it unrelated to a script?
The syscall trace you show indicates it seems stuck in an infinite EAGAIN loop, but that doesn't include the source information...

@andrii29
Copy link
Author

we use symfony 5.4 + overblog/GraphQLBundle
I enabled slow log for 15 seconds and got next results

[18-Jul-2023 16:32:08] NOTICE: child 37 stopped for tracing
[18-Jul-2023 16:32:08] NOTICE: about to trace 37
[18-Jul-2023 16:32:08]  [pool www] pid 37
script_filename = /app/public/index.php
[0x00007f5f86a16460] getFirstPriority() /app/src/GraphQLBundle/CustomResolver.php:38
[0x00007f5f86a163a0] valueFromObjectOrArray() /app/src/GraphQLBundle/CustomResolver.php:19
[0x00007f5f86a162e0] __invoke() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:623
[0x00007f5f86a161f0] resolveFieldValueOrError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:550
[0x00007f5f86a160c0] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15fa0] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15ef0] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15e10] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15d30] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a15c50] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15b20] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15a00] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15950] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15870] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15790] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a156b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:887
[0x00007f5f86a15570] completeListValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:761
[0x00007f5f86a15490] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a153b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15280] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[18-Jul-2023 16:32:08] NOTICE: finished trace of 37

logs from workers without slow log

Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /app/src/Document/AbstractDocument.php on line 33
[18-Jul-2023 03:40:25] WARNING: [pool www] child 42 exited with code 124 after 393.292804 seconds from start

in normal state requests are finished in less than second

@andrii29
Copy link
Author

andrii29 commented Oct 5, 2023

same behaviour on php 8.2

@smepti
Copy link

smepti commented Jun 18, 2024

same behaviour on php 8.1.19, symfony 6.3.4

Same, word for word
"after enabling JIT, our application start perform better (~10% faster). But, some php-fpm pods (~1 in 20) after start use all available cpu and timeout the requests. After pod restart, everything back to normal. Without JIT everything is ok and cpu utilization is at normal value."

we using k8s and after deploy ~1 in 20 pods use 100% cpu
Снимок экрана 2024-06-19 в 01 57 37

@andrii29 may be you resolve issue?

@smepti
Copy link

smepti commented Jun 18, 2024

here is our opcache config

opcache.validate_timestamps=0
opcache.interned_strings_buffer=50
opcache.memory_consumption=300
opcache.max_accelerated_files=32531
opcache.jit_buffer_size=128M
opcache.preload=/app/symfony-app/config/preload.php
opcache.preload_user=www-data

@andrii29
Copy link
Author

@smepti nope(
we upgraded php to 8.3, but the issue is still present

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants