PHP-FPM: 8.2 random lockups of daemon #12449
Comments
A customer prepared a minimal Nette application that reproduces it (thanks @kravcik); I have attached it as a zip. The directories user1, user2 and user3 contain PHP code for three separate php-fpm users (pools). You need to replace image.jpg with a real file; our test photo is around 7 MB. The test script starts Apache Benchmark on the first two URLs and then uploads the file to url3 from 16 concurrent curl processes.
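For readers without the zip, the load pattern described in this comment can be sketched roughly as below. This is not the attached test script: the function name `run_load`, the `DRY_RUN` switch and the form field name `file` are assumptions made for illustration, and the URLs are placeholders. Only the `ab -c 2 -t 1800` invocation and the count of 16 concurrent uploads come from the thread.

```shell
#!/usr/bin/env bash
# Sketch of the described load: ab against two pools, concurrent curl
# uploads against a third. The form field name "file" is hypothetical.

run_load() {
    local url1=$1 url2=$2 url3=$3 image=$4 uploads=${5:-16}
    local run=() i
    # With DRY_RUN=1 the commands are printed instead of executed.
    [ "${DRY_RUN:-0}" = 1 ] && run=(echo)

    "${run[@]}" ab -c 2 -t 1800 "$url1" &
    "${run[@]}" ab -c 2 -t 1800 "$url2" &
    for ((i = 0; i < uploads; i++)); do
        "${run[@]}" curl -s -o /dev/null -F "file=@${image}" "$url3" &
    done
    # Block until every background job has finished.
    wait
}
```

Running `DRY_RUN=1 run_load url1 url2 url3 image.jpg` first shows what would be started before putting real load on a server.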
Hi, our test machine has been updated to PHP 8.2.12 and I'm unable to reproduce the problem anymore. Was there some change? I don't see anything related in the changelog.
I have been looking into this and don't see anything suspicious in the config or app code. I'm not exactly sure what could put a process into D (uninterruptible sleep) in FPM. From what I gather, it's usually caused by waiting on device I/O. I have been checking the diff between 8.2.11 and 8.2.12 and I don't see anything that could have any impact on this. I'm wondering whether you also updated other system packages that might be causing this? Are you able to recreate this locally with 8.2.11, ideally with a vanilla PHP build (compile PHP), or only on this machine / system with Debian packages? Also, is there anything filesystem-specific in use (e.g. NFS) or some device that could affect this? It would also be good to provide more info about the web server used (configuration) and the system if you can't recreate it locally.
Hi, we switched our production to php8.2 and the problem occurred again. I will have to be more creative/aggressive on our test server, I think. The filesystem is btrfs on production, ext4 on the test server. Apache is configured with mpm-event and proxy_fcgi to PHP. Both VMs are running on Dell servers inside VMware. Storage is a Dell ME5024 over SAS, all-flash. It always locks only php8.2-fpm; all other processes (like sites running under php8.1, behind the same Apache web server) are not affected.
When you say It's possible that some change in PHP might have caused this, but it might be just some other OS / kernel issue that it hit. Unfortunately it's hard to debug this because D means that the process is in kernel space waiting on some I/O, so it is not possible to attach a debugger to it. I found some hints here that might help with debugging: https://unix.stackexchange.com/questions/303613/find-the-cause-of-a-permanently-blocked-i-o-process-in-uninterruptible-sleep . So maybe try that.
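As a starting point for that kind of debugging, a minimal sketch of listing processes stuck in D state together with the kernel function they are waiting in. The helper name `filter_dstate` is made up for this example; the `ps` output columns (`pid`, `stat`, `wchan`, `comm`) are standard procps format specifiers.

```shell
# Read `ps -eo pid,stat,wchan:32,comm` style output on stdin and keep the
# header plus every process whose state starts with D (uninterruptible sleep).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

Typical use would be `ps -eo pid,stat,wchan:32,comm | filter_dstate`, repeated a few times during a lockup to see whether the same PIDs stay blocked in the same wait channel.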
No images, it's an ordinary web server with an apache2 daemon and some php-fpm daemons (each having multiple pools, with a unix socket, per user). When I say that we switched production, I mean we switched fcgi in Apache, for example from
to
Thanks for the link. I will try to figure something out. I noticed it's difficult to debug with strace: when I start the whole php-fpm under strace, it does not lock, but the children crash instead.
I captured this on our production today, just before restarting php-fpm. First capture - the last process is a new child?
Second capture - the last process is a new child? It did not change its caption to
I have been thinking about this and there is one change in PHP 8.2 that might potentially have an effect on this. The stream copy function is now using You might potentially see some hints which syscall is used by the locked up process so I would really recommend to check You could also verify if disabling
If you want to test it on your test server I would recommend you to set it up in the same system configuration as prod - especially with use of btrfs ( |
OK, I need to find out how to get our test server to lock up again. I will update it to Debian 12 and try to use btrfs.
I'm also seeing this exact same issue.
However, I tried the test Nette application and I'm unable to reproduce it with that.
Hi, try checking how much memory (and swap) you have assigned to the php-fpm unit (service). Most of our problems disappeared when we lifted the limits. When php-fpm ran out of memory, it started to swap heavily, causing I/O. It was hard to diagnose, because there was plenty of free memory in the OS.
There may be some problem with memory freeing in php8.2. We have one server where the php-fpm slice normally consumes around 2-3 GB of RAM. Sometimes it starts to creep up, and we have had to increase the memory limit of the slice/unit up to 12 GB. It acts strangely: newly spawned processes already have a bigger memory footprint than usual.
@Elkropac What limits are you referring to?
We have had file
When it uses all the memory available to it, it starts to swap heavily, even though the rest of the system has plenty of memory. We increased these limits and most problems went away.
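One way to check whether a service is running into such a limit is to compare the cgroup's current memory use with its cap. The sketch below assumes cgroup v2 (the default on recent Debian releases) and reads the standard `memory.current` and `memory.max` files. The helper name `cgroup_mem_report` is made up here, and the exact cgroup path for a given php-fpm unit is an assumption to verify on the machine in question.

```shell
# Report memory use of a cgroup v2 directory relative to its hard limit.
# memory.max contains either a byte count or the literal string "max".
cgroup_mem_report() {
    local dir=$1 cur max
    cur=$(cat "$dir/memory.current")
    max=$(cat "$dir/memory.max")
    if [ "$max" = max ]; then
        echo "current=$cur limit=none"
    else
        echo "current=$cur limit=$max used=$(( cur * 100 / max ))%"
    fi
}
```

For a systemd service the directory would typically be something like `/sys/fs/cgroup/system.slice/<unit>.service`. The configured values can also be read with `systemctl show -p MemoryHigh -p MemoryMax -p MemorySwapMax <unit>`.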
Hm, I don't have that set, and I'm not convinced that it actually does anything... it would just cap the memory usage of the php-fpm daemon itself, not the workers, surely?
I've enabled |
No, it should cap the memory of the entire unit, i.e. the main daemon and all children. You can see it in the output of
We use this setting for all users
but i think, i was experimenting with lowering |
Hello
The issue just struck again.
srv ~ # ps aux | grep 1451966
srv ~ # cat /proc/1451966/syscall
srv ~ # cat /proc/1451966/stack
All processes are in "S" state.
Is there anything which can be gleaned from this?
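The same three checks can be collected per PID in one go. This is only a sketch: the helper name `proc_snapshot` and its optional second argument (an alternative proc root, mainly useful for testing) are inventions for this example. In `/proc/<pid>/syscall` the first field is the number of the syscall the process is blocked in, and `/proc/<pid>/stack` normally needs root to read.

```shell
# Print name, state, wait channel, current syscall and kernel stack for a PID.
# Usage: proc_snapshot PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
proc_snapshot() {
    local pid=$1 root=${2:-/proc}
    local dir="$root/$pid" f
    grep -E '^(Name|State):' "$dir/status"
    for f in wchan syscall stack; do
        printf '%s: ' "$f"
        # Files that are missing or need more privileges are reported,
        # not treated as a fatal error.
        cat "$dir/$f" 2>/dev/null || printf '(unreadable)'
        printf '\n'
    done
}
```

Calling it in a loop over the worker PIDs of one pool during a lockup gives a quick view of whether they all wait in the same place.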
Hey - I found the root cause. The issue appears to be a kernel NFS client bug. I had a misconfiguration which was causing files to be served via NFS rather than the local filesystem. Normally this had little impact, but at times of heavy load it occasionally caused the kernel NFS driver to lock up completely, putting any process which tried to read from NFS into an uninterruptible sleep. When the problem occurs, it's a transient issue (subsequent attempts to read from NFS work fine) but the affected processes never recover - they have to be killed and restarted. So - not a PHP issue. But I thought I'd leave this here in case it helps anyone.
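For anyone wanting to rule out the same misconfiguration, a minimal sketch that lists NFS mount points from `/proc/mounts`-style input. The helper name `nfs_mounts` is made up for this example; it only looks at the filesystem type column.

```shell
# Read /proc/mounts-format lines on stdin and print the mount points whose
# filesystem type is nfs or nfs4.
nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }'
}
```

Run as `nfs_mounts < /proc/mounts`. To check which filesystem a single document root lives on, `findmnt -n -o FSTYPE -T /path/to/docroot` gives the type directly.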
I think I have the same problem. Randomly (more or less every 2 months) my server locks up and all services behind Apache (Nextcloud, wiki etc.) are not accessible. After restarting fpm it works again. I am running Debian 12 with ext4. Until now I have not looked deeper into the problem; I just restarted fpm and it worked again. But it would be nice if I didn't need to do this 😄
Description
Hi,
We have a long-standing problem with php-fpm 8.2 which forces us to stay on php8.1.
We experience random lockups of the php-fpm daemon, caused by all child processes staying in D or R state and never finishing.
Our production machine was running Debian 11, which we updated to Debian 12 recently. It's amd64 architecture, running VM in vmware esxi 6.7 on Dell Poweredge servers. We use php packages from deb.sury.org repository.
Our test machine is Debian 11 with same settings, running on same server.
We are testing it on PHP 8.2.11, but when we initially tried to switch to php8.2, it was version 8.2.3, I believe.
I can reproduce it by running concurrent load on 3 php-fpm pools under a single master process:
ab -c 2 -t 1800 url1, url1 has phpinfo() function in index.php
ab -c 2 -t 1800 url2, url2 has phpinfo() function in index.php
When I start to upload the images while the ab processes are hitting the other users, our php-fpm locks up. Here is the process list from one of the test runs:
You can see there is a new child at the end that is unable to switch to its final user.
When opcache is enabled, I can see these locks
These locks are not there with opcache disabled.
When I try to run the whole php-fpm inside strace, running the upload does not block the whole php-fpm, but the child processes of user user3 slowly die out and no new processes are spawned.
When we use php8.1, with the same settings as in php8.2, we cannot lock it, it runs ok.
I tried:
opcache.jit=0, as suggested in JIT tracing - thundering herd causes lockup after opcache reset #11609
opcache.enable=0 as second step
kernel.randomize_va_space=0, as suggested in php-fpm master processes runs 100% CPU and keeps spawning new ones #12157
pm=dynamic instead of pm=ondemand
No success with any of that, lock still occurs.
Our php ini modification is
our fpm pool config is the same for all users (only socket path changes)
I can add more information when asked.
Thanks
PHP Version
PHP 8.2.11
Operating System
Debian 11