z-push creates php threads (not processes) until server crashes
-
Hi smilbert,
did you check the nginx log for segfaults regarding those PIDs, or z-push.log to see whether they were running for longer than 30 seconds? Are these ProcessLoopDetectionPreviousConnectionFailed entries for the FolderSync command only, or also for other commands?
Do you have a proxy in front of nginx?
Do these android devices belong to different users?
When you disable Z-Push, are there still some WebApp users, or do you disable it as well? Maybe it would help to reduce the pm.max_requests value to e.g. 200.
Manfred
-
From a practical point of view - if your system is slow generating the responses to z-push you can run into a situation where the mobile has given up waiting, and re-issues the request while the server is still churning through gathering the data and preparing it. This can spiral out of control.
I would suggest you try reducing the
define('SYNC_MAX_ITEMS', 512);
in the config file to see if it alleviates the problem. Try 256 or even 128 and observe if it makes a difference. There is little downside for synced clients - obviously for new clients it will slow down the initial sync as more connection cycles will be needed to get the data over to the device.
If you find that it helps, you have a clue where to start looking: the performance of your collaboration suite/z-push.
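To put a rough number on that trade-off, here is a minimal sketch of how many Sync round trips an initial sync would need for different SYNC_MAX_ITEMS values. It assumes every response is filled up to the limit, which is a simplification; the function name and the 10,000-item mailbox are hypothetical.

```python
import math


def sync_cycles(total_items: int, max_items: int) -> int:
    """Round trips needed to transfer total_items when each
    response carries at most max_items (simplified model)."""
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    return math.ceil(total_items / max_items)


# Hypothetical mailbox of 10,000 items on a freshly added device.
for window in (512, 256, 128):
    print(window, sync_cycles(10_000, window))
```

Halving the window roughly doubles the number of cycles for a new device, while each individual response gets cheaper for the server to build.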
-
Thank you both for your replies. As you suggested, I tried the following steps:
- set “SYNC_MAX_ITEMS” to 25 and even down to 5
- set “pm.max_requests” to 200 and even down to 100
Neither attempt solved the problem. Before each change, I deactivated z-push for at least 6 minutes so that all clients would be disconnected. After the changes, I restarted php. Other things I recently and unsuccessfully tried:
- set “SCRIPT_TIMEOUT” to “100”
- set “PING_INTERVAL” to 1000
- set “SYNC_CONTACTS_MAXPICTURESIZE” to 1242880
- set “SEARCH_WAIT” and “SEARCH_MAXRESULTS” to 5 and even to 2
- deactivate “additionalFolders”
And to answer your questions, Manfred:
- My nginx error log shows plenty of entries like this:
2018/09/30 21:42:02 [error] 20417#20417: *249222 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 94.223.123.456, server: my.server.de, request: "POST /Microsoft-Server-ActiveSync?Cmd=Ping&User=my.user%40myDomain.de&DeviceId=android1234567890407&DeviceType=Android HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "my.server.de"
- I cannot find any interesting error messages in the z-push-logs. There are various entries with times of 300 seconds, though.
- Lately, the "ProcessLoopDetectionPreviousConnectionFailed" entries have stopped appearing. I only find one of those in the logs, and that one is for “Provisioning” instead of “FolderSync”
- I do not have a proxy in front of nginx.
- The Android devices belong (mostly) to different users. Two users use more than one device, about three devices sync for more than one user.
- When I disable z-push (by setting root-only permissions on its index.php), I leave Kopano Webapp fully accessible.
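To see which ActiveSync commands are behind nginx error-log entries like the one quoted above, the lines can be tallied by their Cmd parameter. This is only a sketch: it assumes the log format of that sample line, and the function names are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request field of an nginx error-log line.
REQUEST_RE = re.compile(r'request: "[A-Z]+ (\S+) HTTP/[\d.]+"')

# The sample entry from the list above.
SAMPLE = (
    '2018/09/30 21:42:02 [error] 20417#20417: *249222 recv() failed '
    '(104: Connection reset by peer) while reading response header from '
    'upstream, client: 94.223.123.456, server: my.server.de, request: '
    '"POST /Microsoft-Server-ActiveSync?Cmd=Ping&User=my.user%40myDomain.de'
    '&DeviceId=android1234567890407&DeviceType=Android HTTP/1.1", '
    'upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "my.server.de"'
)


def activesync_params(line: str) -> dict:
    """Return the query parameters of the logged request, or {} if none."""
    match = REQUEST_RE.search(line)
    if not match:
        return {}
    query = urlsplit(match.group(1)).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def count_commands(lines) -> Counter:
    """Count 'recv() failed' entries per ActiveSync command."""
    counts = Counter()
    for line in lines:
        if "recv() failed" not in line:
            continue
        cmd = activesync_params(line).get("Cmd")
        if cmd:
            counts[cmd] += 1
    return counts


print(activesync_params(SAMPLE))
print(count_commands([SAMPLE]))
```

Feeding it the lines of the real error log would show whether the resets are dominated by Ping, Sync or FolderSync requests.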
Thank you for your suggestions so far. Any additional help would be greatly appreciated!
Smilbert
-
Did you check your timeouts for:
- php-fpm: max_execution_time
- nginx: fastcgi_read_timeout
?
++umgfoin
-
fastcgi_read_timeout was already set to the pretty high value of 3660.
And I increased max_execution_time to 360.
Still no improvement…
-
Hi Smilbert,
the Ping and Sync-with-heartbeat requests in Z-Push may stay open for up to 59 minutes (3540 seconds). The fastcgi_read_timeout value in the nginx config already reflects that. However, max_execution_time and request_terminate_timeout are way below this value. I’m not very familiar with nginx and php-fpm, but maybe you have to increase these two values as well?
Alternatively you could set PING_HIGHER_BOUND_LIFETIME in Z-Push config to some value lower than max_execution_time/request_terminate_timeout. Just be aware that this will result in a request every PING_HIGHER_BOUND_LIFETIME if there were no changes.
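The comparison described here can be sketched as a small check that lists which limits are at or below the longest request lifetime. It is a plain numeric comparison under stated assumptions, not a model of how nginx, PHP or php-fpm actually enforce each timeout; the function name is hypothetical, and values of 0 are skipped on the assumption that they mean "no limit".

```python
def limits_below_lifetime(lifetime_s: int, limits: dict) -> list:
    """Names of limits (in seconds) that are not larger than lifetime_s.

    Assumption: a value of 0 means "no limit" and is ignored.
    """
    return sorted(
        name for name, seconds in limits.items() if 0 < seconds <= lifetime_s
    )


# Values reported earlier in this thread.
observed = {"fastcgi_read_timeout": 3660, "max_execution_time": 360}

# Longest Ping/Sync lifetime mentioned above: 59 minutes.
print(limits_below_lifetime(3540, observed))

# With PING_HIGHER_BOUND_LIFETIME lowered to 300 seconds instead.
print(limits_below_lifetime(300, observed))
```

An empty list for a given lifetime means none of the listed limits is shorter than the requests Z-Push intends to keep open.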
Manfred
-
Hi Manfred,
again, thank you for your advice. To sum it all up: After trying everything mentioned above, my problem still hasn’t changed. (However, the segmentation faults are gone now. They apparently weren’t connected to the thread-flooding problem after all.)
So I concentrated on further analyzing the problem using z-push-top.php. As it turns out, the threads are not spawned after the usual server side timeouts we discussed above. In z-push-top, I see many connections turn grey (–> terminated) after 30 seconds. And each time that happens, the php thread count goes up by one.
If I understand the comment in z-push-config correctly, then the 30-second limit is not set on the server side but is defined by the Android clients, right? Do you have any suggestions on what I can do to solve that?
Thank you and best regards
Smilbert
-
Hi Smilbert,
yes, the Android clients have a built-in 30 seconds timeout for a request and if they don’t receive a response they will issue a new request. However it’s different for Ping requests as the devices send Lifetime/HeartbeatInterval parameter (which could be up to 3540 seconds) and also honour it.
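As a rough illustration of how that 30-second retry can pile up work for non-Ping requests: if the server keeps working on a request the device has already abandoned, the number of overlapping requests per device grows with the response time. The sketch below assumes the device retries immediately after each timeout and the server never aborts the old request; the function name is hypothetical.

```python
import math

CLIENT_TIMEOUT_S = 30  # built-in Android request timeout mentioned above


def overlapping_requests(server_time_s: float,
                         client_timeout_s: float = CLIENT_TIMEOUT_S) -> int:
    """Requests from one device in flight at the same time, if each
    response takes server_time_s and the device retries on timeout."""
    if server_time_s <= 0 or client_timeout_s <= 0:
        raise ValueError("times must be positive")
    return math.ceil(server_time_s / client_timeout_s)


for seconds in (20, 100, 300):
    print(seconds, overlapping_requests(seconds))
```

A response that takes 300 seconds, like the entries seen in the z-push logs, would under these assumptions correspond to ten overlapping requests from a single device.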
Did you notice which connections are mostly turning grey? Ping? Sync? It might be that the processes aren’t being terminated correctly or there’s some issue with PHP-FPM, but unfortunately I don’t have any other suggestions.
Manfred
-
Hi Manfred,
I’d say that more than 90% of them are ping processes, the rest are Syncs and FolderSyncs.
In most cases, the column “Additional Information” just states “OK : lifetime 30s” or simply “OK” for the ping terminations.
Straaaaange!
Stefan
-
It is very strange that you are getting 30s lifetimes.
If you look at the active Ping or Sync commands, what do you see immediately after the Ping/Sync? It should show elapsed/maximum seconds and then the list of folders.
-
One month ago I updated Kopano Core from 8.4.90 to 8.6.82. That actually solved the problem. Now there’s only one thread per process. Better late than never. Thank you all for your help.