Navigation

    Kopano
    • Register
    • Login
    • Search
    • Categories
    • Get Official Kopano Support
    • Recent
    Statement regarding the closure of the Kopano community forum and the end of the community edition

    z-push creates php threads (not processes) until server crashes

    Z-Push when using Kopano
    4
    12
    2427
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • smilbert
      smilbert last edited by

      I’ve been struggling with the following problem for a while now and would very much appreciate your help.

      My webserver completely crashes on a regular basis due to too many php threads. The number of php processes is stable within a normal range (e.g. 35 processes). The number of threads per process constantly increases over time. After a couple of hours the total sum of all threads in all php processes gets greater than 900 - and that’s when normal server operations start to shut down.

      The threads are somehow tied to z-push. After disabling z-push and restarting php, the number of threads per process is 1 and stays that way. I am using z-push to sync kopano data to about 7 different android smartphones.

      I am running

      • a virtual server
      • 8 GB RAM
      • 4 vCores
      • ubuntu 16.04
      • nginx 1.10.3
      • php-fpm 7.0.3
      • kopano core 8.4.90
      • kopano webapp 3.4.3
      • z-push 2.4.4

      my …/fpm/pool.d/www.conf looks like this (alphabetical order):

      user = www-data
      request_terminate_timeout = 600s
      pm.start_servers = 20
      pm.min_spare_servers = 20
      pm.max_spare_servers = 35
      pm.max_requests = 500
      pm.max_children = 72
      pm = dynamic
      listen.owner = www-data
      listen.group = www-data
      listen = /run/php/php7.0-fpm.sock
      group = www-data
      env[TMP] = /tmp
      env[TMPDIR] = /tmp
      env[TEMP] = /tmp
      env[PATH] = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
      env[HOSTNAME] = $HOSTNAME
      

      And my nginx configuration looks like this:

      user www-data;
      worker_processes auto;
      pid /run/nginx.pid;
      
      events {
          worker_connections 1024;
          # multi_accept on;
      }
      
      http {
          client_max_body_size 501m;
          sendfile on;
          tcp_nopush on;
          tcp_nodelay on;
          keepalive_timeout 3;
          types_hash_max_size 2048;
          server_names_hash_max_size 8192;
          include /etc/nginx/mime.types;
          default_type application/octet-stream;  
          log_format  main  '$remote_addr - $host $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for"';     
          ssl_protocols TLSv1 TLSv1.1 TLSv1.2; 
          ssl_prefer_server_ciphers on;
          map $status $loggable {
              ~^404  0;
              default 1;
          }
          access_log /var/log/nginx/access.log;
          error_log /var/log/nginx/error.log ;
          gzip on;
          gzip_disable "msie6";
          application/xml+rss text/javascript;
          gzip_min_length 1100;
          gzip_vary on;
          gzip_comp_level 2;
          gzip_proxied any;
          gzip_types      text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;  
          include /etc/nginx/conf.d/*.conf;
          include /etc/nginx/sites-enabled/*;
          limit_conn_zone $binary_remote_addr zone=addr:10m;  
          limit_conn addr 50; 
          fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=YOURAPP:100m inactive=60m;
          fastcgi_cache_key "$scheme$request_method$host$request_uri";    
          upstream php {
              server unix:/run/php/php7.0-fpm.sock;
          }    
          map $http_upgrade $connection_upgrade {
                  default upgrade;
                  '' close;
          }     
      }
      
      server {
          listen 443;
          server_name my.server.de;
          # ssl specs here 
          # some general location stuff here
          root    /usr/share/z-push;
          index   index.php;
      
          fastcgi_param  QUERY_STRING       $query_string;
          fastcgi_param  REQUEST_METHOD     $request_method;
          fastcgi_param  CONTENT_TYPE       $content_type;
          fastcgi_param  CONTENT_LENGTH     $content_length;
          fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
          fastcgi_param  REQUEST_URI        $request_uri;
          fastcgi_param  DOCUMENT_URI       $document_uri;
          fastcgi_param  DOCUMENT_ROOT      $document_root;
          fastcgi_param  SERVER_PROTOCOL    $server_protocol;
          fastcgi_param  REQUEST_SCHEME     $scheme;
          fastcgi_param  HTTPS              $https if_not_empty;
          fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
          fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;
          fastcgi_param  REMOTE_ADDR        $remote_addr;
          fastcgi_param  REMOTE_PORT        $remote_port;
          fastcgi_param  SERVER_ADDR        $server_addr;
          fastcgi_param  SERVER_PORT        $server_port;
          fastcgi_param  SERVER_NAME        $server_name;
          fastcgi_param  REDIRECT_STATUS    200;
          fastcgi_index   index.php;
          fastcgi_param   SCRIPT_FILENAME $document_root$fastcgi_script_name;
          fastcgi_param   REQUEST_URI $1;
          fastcgi_param   PHP_FLAG "magic_quotes_gpc=off \n register_globals=off \n magic_quotes_runtime=off \n short_open_tag=on";
          fastcgi_param   PHP_VALUE "post_max_size=20M \n upload_max_filesize=20M \n max_execution_time=3660";
          fastcgi_param   HTTP_PROXY ""; 
          fastcgi_read_timeout 3660; 
      
          location ~* /Microsoft-Server-ActiveSync {
              alias       /usr/share/z-push/index.php;
              access_log  /var/log/nginx/z-push-access.log;
              error_log   /var/log/nginx/z-push-error.log;
              client_max_body_size 20m;
              client_body_buffer_size 128k;
              fastcgi_pass    unix:/var/run/php/php7.0-fpm.sock; 
          }
      
          location ~* /AutoDiscover/AutoDiscover.xml {
              alias       /usr/share/z-push/autodiscover/autodiscover.php;
              access_log  /var/log/nginx/z-push-autodiscover-access.log;
              error_log   /var/log/nginx/z-push-autodiscover-error.log;
              fastcgi_pass    unix:/var/run/php/php7.0-fpm.sock;
          }
      
          location / {
              try_files $uri $uri/ index.php; 
          }
      }
      

      And of course z-push-conf.php:

      <?php
      
          define('TIMEZONE', 'Europe/Berlin');
          setlocale(LC_CTYPE, "de_DE.UTF-8");  
          define('BASE_PATH', dirname($_SERVER['SCRIPT_FILENAME']). '/');
          define('SCRIPT_TIMEOUT', 0);
          define('USE_CUSTOM_REMOTE_IP_HEADER', false);
          define("CERTIFICATE_OWNER_PARAMETER", "SSL_CLIENT_S_DN_CN");
          define('USE_FULLEMAIL_FOR_LOGIN', true);
          define('STATE_MACHINE', 'FILE');
          define('STATE_DIR', '/var/lib/z-push/');
          define('IPC_PROVIDER', '');
          define('LOGBACKEND', 'filelog');
          define('LOGLEVEL', LOGLEVEL_ERROR);
          define('LOGAUTHFAIL', true);
          define('LOGUSERLEVEL', LOGLEVEL_DEVICEID);
          $specialLogUsers = array();
          define('LOGFILEDIR', '/var/log/z-push/');
          define('LOGFILE', LOGFILEDIR . 'z-push.log');
          define('LOGERRORFILE', LOGFILEDIR . 'z-push-error.log');
          define('LOG_SYSLOG_HOST', false);
          define('LOG_SYSLOG_PORT', 514);
          define('LOG_SYSLOG_PROGRAM', 'z-push');
          define('LOG_SYSLOG_FACILITY', LOG_LOCAL0);
          define('PROVISIONING', false);
          define('LOOSE_PROVISIONING', false);
          define('PROVISIONING_POLICYFILE', 'policies.ini');
          define('SYNC_CONFLICT_DEFAULT', SYNC_CONFLICT_OVERWRITE_PIM);
          define('SYNC_FILTERTIME_MAX', SYNC_FILTERTYPE_ALL);
          define('PING_INTERVAL', 300);
          define('FILEAS_ORDER', SYNC_FILEAS_LASTFIRST);
          define('SYNC_MAX_ITEMS', 25);
          define('UNSET_UNDEFINED_PROPERTIES', false);
          define('SYNC_CONTACTS_MAXPICTURESIZE', 5242880);
          define('ALLOW_WEBSERVICE_USERS_ACCESS', false);
          define('USE_PARTIAL_FOLDERSYNC', false);
          define('PING_LOWER_BOUND_LIFETIME', false);
          define('PING_HIGHER_BOUND_LIFETIME', 320); 
          define('SYNC_TIMEOUT_SHORT_DEVICETYPES', "");     
          define('SYNC_TIMEOUT_MEDIUM_DEVICETYPES', "");
          define('SYNC_TIMEOUT_LONG_DEVICETYPES',   "iPod, iPad, iPhone, WP, WindowsOutlook, WindowsMail");
          define('RETRY_AFTER_DELAY', 300);
          define('BACKEND_PROVIDER', '');
          define('SEARCH_PROVIDER', '');
          define('SEARCH_WAIT', 10);
          define('SEARCH_MAXRESULTS', 10);
          define('KOE_CAPABILITY_GAB', true);
          define('KOE_CAPABILITY_RECEIVEFLAGS', true);
          define('KOE_CAPABILITY_SENDFLAGS', true);
          define('KOE_CAPABILITY_OOF', true);
          define('KOE_CAPABILITY_OOFTIMES', true);
          define('KOE_CAPABILITY_NOTES', true);
          define('KOE_CAPABILITY_SHAREDFOLDER', true);
          define('KOE_CAPABILITY_SENDAS', true);
          define('KOE_CAPABILITY_SECONDARYCONTACTS', true);
          define('KOE_CAPABILITY_SIGNATURES', true);
          define('KOE_CAPABILITY_RECEIPTS', true);
          define('KOE_CAPABILITY_IMPERSONATE', true);
          define('KOE_GAB_STORE', 'SYSTEM');
          define('KOE_GAB_FOLDERID', '');
          define('KOE_GAB_NAME', 'Z-Push-KOE-GAB');
          $additionalFolders = array(
              array(
                  'store'     => "SYSTEM",
                  'folderid'  => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                  'name'      => "IWDR-Kontakte",
                  'type'      => SYNC_FOLDER_TYPE_USER_CONTACT,
                  'flags'     => DeviceManager::FLD_FLAGS_NONE
              ),        
          );
      

      And finally, I end up with plenty of entries in z-push.log which look like these:

      22/09/2018 07:22:07 [ 8782] [ERROR] [name@server.de] Please check your logs for this PID and errors like PHP-Fatals or Apache segmentation faults and report your results to the Z-Push dev team.
      22/09/2018 08:08:06 [16667] [ERROR] [name@server.de] LoopDetection->ProcessLoopDetectionPreviousConnectionFailed(): Command 'FolderSync' at 22/09/2018 08:08:01 with pid '15348' terminated unexpectedly or is still running.
      

      I understand that this kind of problem is often connected to timeouts. Can you give me advice on the best practice to avoid these timeouts?
      Are these timeouts also the reason for the ever growing number of threads?

      Any help would be very much appreciated. Thank you in advance!

      1 Reply Last reply Reply Quote 0
      • Manfred
        Manfred Kopano last edited by

        Hi smilbert,

        did you check nginx log for segfaults regarding those pids? Or z-push.log if they were running for longer than 30 seconds? Are theses ProcessLoopDetectionPreviousConnectionFailed entries for FolderSync command only or also for other commands?
        Do you have a proxy in front of nginx?
        Do these android devices belong to different users?
        When you disable Z-Push, are there some Webapp users? Or do you disable it as well?

        Maybe it would help to reduce pm.max_requests value to e.g. 200.

        Manfred

        1 Reply Last reply Reply Quote 0
        • liverpoolfcfan
          liverpoolfcfan last edited by

          From a practical point of view - if your system is slow generating the responses to z-push you can run into a situation where the mobile has given up waiting, and re-issues the request while the server is still churning through gathering the data and preparing it. This can spiral out of control.

          I would suggest you try reducing the

          define('SYNC_MAX_ITEMS', 512);
          

          in the config file to see if it alleviates the problem. Try 256 or even 128 and observe if it makes a difference. There is little downside for synced clients - obviously for new clients it will slow down the initial sync as more connection cycles will be needed to get the data over to the device.

          If you identify that is helps you have a clue where to start looking - performance of your collaboration suite/z-push. .

          1 Reply Last reply Reply Quote 0
          • smilbert
            smilbert last edited by

            Thank you both for your replies. As you suggested, I tried the following steps:

            • set “SYNC_MAX_ITEMS” to 25 and even down to 5
            • set “pm.max_requests” to 200 and even down to 100

            Both attempts did not solve the problem. Before each change, I deacviated z-push for at least 6 Minutes so all clients would be disconnected. After the changes, I restartet php. Other things I recently and unsuccessfully tried:

            • set “SCRIPT_TIMEOUT” to “100”
            • set “PING_INTERVAL” to 1000
            • set “SYNC_CONTACTS_MAXPICTURESIZE” to 1242880
            • set “SEARCH_WAIT” and “SEARCH_MAXRESULTS” to 5 and even to 2
            • deactivate “additionalFolders”

            And to answer your questions, Manfred:

            • My nginx error log shows plenty of entries like this: 2018/09/30 21:42:02 [error] 20417#20417: *249222 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 94.223.123.456, server: my.server.de, request: "POST /Microsoft-Server-ActiveSync?Cmd=Ping&User=my.user%40myDomain.de&DeviceId=android1234567890407&DeviceType=Android HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "my.server.de"
            • I cannot find any interesting error messages in the z-push-logs. There are various entries with times of 300 seconds, though.
            • Lately, the "ProcessLoopDetectionPreviousConnectionFailed " did not appear anymore. I only find one of those in the logs - and that one is for “Provisioning” instead of “FolderSync”
            • I do not have a proxy in front of nginx.
            • The Android devices belong (mostly) to different users. Two users use more than one device, about three devices sync for more than one user.
            • When I disable z-push (by setting root-only permissions on its index.php), I leave Kopano Webapp fully accessible.

            Thank you for your suggestions so far. Any additional help would be greatly appreciated!
            Smilbert

            umgfoin 1 Reply Last reply Reply Quote 0
            • umgfoin
              umgfoin @smilbert last edited by

              Did you check your timeouts for:

              • php-fpm: max_execution_time
              • nginx: fastcgi_read_timeout

              ?

              ++umgfoin

              1 Reply Last reply Reply Quote 0
              • smilbert
                smilbert last edited by

                fastcgi_read_timeout was already set to the pretty high value of 3660.
                And I increased max_execution_time to 360.
                Still no improvement…

                1 Reply Last reply Reply Quote 0
                • Manfred
                  Manfred Kopano last edited by

                  Hi Smilbert,

                  the Ping and Sync with heartbeat requests in Z-Push may be open up until 59 minutes (3540 seconds). fastcgi_read_timeout value in the nginx config already reflects that. However max_execution_time and request_terminate_timeout are way below this value. I’m not very familiar with nginx and php-fpm, but maybe you have to increase these two values as well?

                  Alternatively you could set PING_HIGHER_BOUND_LIFETIME in Z-Push config to some value lower than max_execution_time/request_terminate_timeout. Just be aware that this will result in a request every PING_HIGHER_BOUND_LIFETIME if there were no changes.

                  Manfred

                  1 Reply Last reply Reply Quote 0
                  • smilbert
                    smilbert last edited by

                    Hi Manfred,

                    again, thank you for your advice. To sum it all up: After trying everything mentioned above, my problem still hasn’t changed. (However, the segmentation faults are gone now. They appearently weren’t connected to the thread-flooding-problem after all.)

                    So I concentrated on further analyzing the problem using z-push-top.php. As it turns out, the threads are not spawned after the usual server side timeouts we discussed above. In z-push-top, I see many connections turn grey (–> terminated) after 30 seconds. And each time that happens, the php thread count goes up by one.
                    If I understand the comment in z-push-config correctly, then the 30s-Limit is not set on the server side but it is defined by the android clients, right? Do you have any suggestions on what I can do to solve that?

                    Thank you and best regards
                    Smilbert

                    1 Reply Last reply Reply Quote 0
                    • Manfred
                      Manfred Kopano last edited by

                      Hi Smilbert,

                      yes, the Android clients have a built-in 30 seconds timeout for a request and if they don’t receive a response they will issue a new request. However it’s different for Ping requests as the devices send Lifetime/HeartbeatInterval parameter (which could be up to 3540 seconds) and also honour it.
                      Did you notice which connections are mostly turning grey? Ping? Sync?

                      It might be that the processes aren’t being terminated correctly or there’s some issue with PHP-FPM, but unfortunately I don’t have any other suggestions.

                      Manfred

                      1 Reply Last reply Reply Quote 0
                      • smilbert
                        smilbert last edited by

                        Hi Manfred,

                        I’d say that more than 90% of them are ping processes, the rest are Syncs and FolderSyncs.
                        In most cases, the column “Additional Information” just states “OK : lifetime 30s” or simply “OK” for the ping terminations.
                        Straaaaange!

                        Stefan

                        1 Reply Last reply Reply Quote 0
                        • liverpoolfcfan
                          liverpoolfcfan last edited by

                          It is very strange that you are getting 30s lifetimes.

                          If you look at the active Ping or Sink commands, what do you see immediately after the Ping/Sync - it sould show elapsed/maximum seconds and then the list of folders

                          1 Reply Last reply Reply Quote 0
                          • smilbert
                            smilbert last edited by

                            One Month ago I updated Kopano Core from 8.4.90 to 8.6.82. That actually solved the problem. Now there’s only one thread per process. Better late than never. Thank you all for your help.

                            1 Reply Last reply Reply Quote 0
                            • First post
                              Last post