Z-push with Zimbra users of various configurations
-
How many messages are in your Inbox?
Tens of thousands, spanning twenty years. This is true of both users – the one that does not authenticate to Active Directory (cjm) and the one that does authenticate to Active Directory (mhr).
Have you set all timeouts to allow sessions up to 60 minutes?
I have made nearly zero changes to the stock configuration. I have indicated the name of the server, and I have enabled “ZIMBRA_URL_ALLOW_REDIRECT”. Please tell me your recommendations for settings.
Have you configured the z-push SYNC_MAX_ITEMS setting to 256 or even 128 to allow the server to respond more quickly to the client?
No. I have not played with the things I don’t understand. I would be happy to make any changes that you suggest. I would even perform some procedure to discover the values that you think should be changed, if you have a suggestion.
define('SYNC_MAX_ITEMS', 256);
In /usr/share/z-push/config.php I have “define('SYNC_MAX_ITEMS', 512);”. Are you recommending that I reduce this number to 256? Is this number “per user” or “per server”? I have a tiny number of users, fewer than a dozen. As a diagnostic, would it help to reduce this to a ridiculous number, like 8?
-
Are both users using the same mobile device type?
Yes and No. I have tried the following configurations:
- Both email accounts on the same device, under a single device user’s email profile. (device user cjm, viewing email for cjm, mhr)
- Both email accounts on the same device, under separate device users’ email profiles. (device user cjm viewing email for cjm, and device user mhr viewing email for mhr)
- Two devices
In all cases, the locally authenticated user works quite well, but as soon as I register the Active Directory authenticated user (mhr):
- Performance declines to nearly unusable
- Email folders are visible but no email contents are visible, nor are calendar events, nor contacts.
I have not experimented to see what happens with two locally authenticated users, because I only have the one, and even if the problem disappears under that configuration, it would not solve the problem I have with my Active Directory authenticated users.
-
The authentication method has nothing to do with it. I suspect the issue is sheer quantity of messages.
An unfortunate aspect of the third-party backend framework for z-push is that backends have to walk through all the messages in any synced folder to enumerate them before synchronization begins. If there are huge numbers of messages in the folder, this process can take quite long, and the client can give up waiting for a response, thinking that the communication had a problem. The client will then re-issue the same request, starting another thread on the z-push server that runs a parallel trawl through the user’s mailbox. You could end up with several of these running simultaneously, which of course will degrade the server performance.
Do you see any crashes in the Apache logs? Perhaps the z-push process is running out of memory.
If you monitor z-push-top.php you should be able to see what is happening for each connected client.
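As a rough illustration of how those parallel trawls pile up, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not taken from z-push itself: the client retries every time its timeout expires, and each earlier request keeps running until the folder enumeration finishes. The function name and the example durations are hypothetical.

```python
import math


def overlapping_requests(enumeration_seconds: float, client_timeout_seconds: float) -> int:
    """Estimate how many requests run at once under the simplified model.

    Assumption: one request is started immediately and another after every
    client timeout, and none of them finishes before the enumeration does.
    """
    if enumeration_seconds <= 0 or client_timeout_seconds <= 0:
        raise ValueError("both durations must be positive")
    return math.ceil(enumeration_seconds / client_timeout_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: a 3 minute enumeration and a 30 second client timeout.
    print(overlapping_requests(180, 30))
    # An enumeration that finishes inside the timeout never overlaps.
    print(overlapping_requests(20, 30))
```

Under this model the load grows with the ratio of enumeration time to client timeout, which is why a slow mailbox walk can hurt every user on the server and not just the one syncing.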
-
The authentication method has nothing to do with it. I suspect the issue is sheer quantity of messages.
I agree. Almost. Both users (cjm, mhr) have tens of thousands of messages. cjm (locally authenticated) has no problem. mhr (Active Directory authenticated) has never synced more than folder structure, even when mhr is the only user ActiveSync-ing on the entire server, meaning cjm is deleted from all devices.
Do you see any crashes in the Apache logs? Perhaps the z-push process is running out of memory.
I am using Nginx, but your question is still valid. I see nothing that looks like a report of process or thread abnormally ending.
For user mhr, I see endless reports of time-out. In “nginx.log”: Upstream timed out (110: Connection timed out). For cjm, I see this only occasionally.
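To put numbers on "endless" versus "occasionally", the time-outs can be counted per user. The sketch below is in Python and rests on two assumptions that are not confirmed by this thread: that each log line includes the request URL, and that the ActiveSync request carries a `User=` query parameter. The sample lines are invented for illustration and are not copied from a real log.

```python
import re
from collections import Counter

# Assumption: the ActiveSync request URL in the log line carries User=<name>.
USER_PARAM = re.compile(r'[?&]User=([^&\s"]+)')


def count_upstream_timeouts(lines):
    """Count 'upstream timed out' log lines per ActiveSync user."""
    counts = Counter()
    for line in lines:
        if "upstream timed out" not in line.lower():
            continue
        match = USER_PARAM.search(line)
        counts[match.group(1) if match else "(unknown)"] += 1
    return counts


# Invented sample lines, shaped like nginx error-log entries.
SAMPLE = [
    '2017/08/25 09:31:41 [error] 100#0: *1 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, '
    'request: "POST /Microsoft-Server-ActiveSync?Cmd=Sync&User=mhr&DeviceId=abc HTTP/1.1"',
    '2017/08/25 09:32:11 [error] 100#0: *2 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, '
    'request: "POST /Microsoft-Server-ActiveSync?Cmd=Sync&User=mhr&DeviceId=abc HTTP/1.1"',
    '2017/08/25 09:40:02 [error] 100#0: *3 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, '
    'request: "POST /Microsoft-Server-ActiveSync?Cmd=Ping&User=cjm&DeviceId=def HTTP/1.1"',
    '2017/08/25 09:41:00 [info] 100#0: *4 client closed connection',
]

if __name__ == "__main__":
    for user, total in count_upstream_timeouts(SAMPLE).most_common():
        print(user, total)
```

To run it against a real file, pass the open file object instead of `SAMPLE`, for example `count_upstream_timeouts(open(path))`.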
If you monitor z-push-top.php you should be able to see what is happening for each connected client.
I deleted cjm and I created mhr on my handheld device. mhr is the only user configured for ActiveSync. I ran z-push-top.php. I saw a variety of initialization activity and then I saw the command “folders…”.
Then I saw a repeating pattern: two threads, each turning from black to red at about 0:05. One will count to somewhere between 01:30 and 03:00, and the other will count to 0:30 and vanish, with a new thread immediately appearing to take its place. Periodically, the older thread will expire and the younger thread will continue past the normal expiration point of 0:30 and take the place of the senior thread.
This pattern continues forever with mhr, my Active Directory authenticated user, but by contrast terminates as expected with cjm, my locally authenticated user. I see a variety of commands including obvious requests for emails. The threads for cjm never turn red. This process completes for cjm in less than a minute.
I am beginning to wonder if this isn’t a plumbing problem…
-
Hi
I had a similar problem with Outlook and KOE. Could it be that the failing user has too many folders in his mailbox?
On my side I had a user with over 1600 email folders; after reducing them to about 900 it worked.
rg
Christian
-
My two users have vastly different numbers of folders.
cjm (locally authenticated, works) has several hundred folders, possibly more
mhr (Active Directory authenticated, fails) has fewer than a dozen folders.
I think this is plumbing. All the capacity limits would implicate the user that works, cjm, not the user that fails. I see endless time-outs logged for mhr, and although I also see time-outs for cjm, they are infrequent. I see that I can increase the logging level to “DEBUG”, but it is not clear to me how to do that. Can you advise me?
-
Hi @cjm51213 ,
https://wiki.z-hub.io/display/ZP/Debugging should provide you with what you need.
-
@fbartels Thanks. I have followed those instructions with some success. I am seeing strong indications of the problem, but the specifics are not clear. Here’s what I see:
Reasonable request:
25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->GetUserInfo(): START GetUserInfo
25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->SoapRequest(): SOAP Message:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
    <context xmlns="urn:zimbra">
      <authToken><Deleted for brevity></authToken>
      <session id="120" />
      <format type="js" />
      <userAgent name="Android-SAMSUNG-SM-G318ML/101.40404(...e75816) devip=10.1.1.105 ZPZB" version="66" />
    </context>
  </soap:Header>
  <soap:Body>
    <GetInfoRequest sections="mbox,prefs,attrs,idents,dsrcs,children" xmlns="urn:zimbraAccount"></GetInfoRequest>
  </soap:Body>
</soap:Envelope>
Denied:
25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->SoapRequest(): SOAP Response:
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
    <title>Error 502 Connection to Upstream is Refused</title>
  </head>
  <body>
    <h2>HTTP ERROR 502</h2>
    <p>Problem accessing ZCS upstream server. Cannot connect to the ZCS upstream server. Connection is refused.<br/>
    Possible reasons:
    <ul>
      <li>upstream server is unreachable</li>
      <li>upstream server is currently being upgraded</li>
      <li>upstream server is down</li>
    </ul>
    Please contact your ZCS administrator to fix the problem.
    </p><br/>
    <i><small>Powered by Nginx-Zimbra://</small></i><br/>
  </body>
</html>
None of the suggestions is reasonable since the server is currently up and reachable. So, there are a few questions:
- What is z-push trying to accomplish when this upstream access is refused?
- How is z-push authenticating to the upstream server, meaning protocol, port, and credentials?
I have to believe that this is z-push having been misdirected since at the time of these errors, the Active Directory user, mhr, is perfectly able to transact e-mail through the web interface.
Does anybody have any advice?
Chris.
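The 502 page quoted in this post arrived where a SOAP reply was expected, so one way to find every such occurrence in a large debug log is to classify each logged response body. The Python sketch below is only an illustration. It assumes that a healthy reply is JSON, because the request asks for `<format type="js" />`, and that a proxy error is an HTML page. The function name is hypothetical and nothing here comes from the z-push or Zimbra source.

```python
import json
import re

TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def classify_response(body: str):
    """Return (kind, detail) for a logged response body.

    kind is 'json' for a parseable JSON reply, 'html_error_page' for an HTML
    page (detail is its <title>, if any), and 'unknown' for anything else.
    """
    text = body.lstrip()
    try:
        json.loads(text)
        return ("json", None)
    except ValueError:
        pass
    if text.lower().startswith("<html"):
        match = TITLE.search(text)
        return ("html_error_page", match.group(1).strip() if match else None)
    return ("unknown", None)


if __name__ == "__main__":
    page = (
        "<html> <head> <title>Error 502 Connection to Upstream is Refused</title> "
        "</head> <body> <h2>HTTP ERROR 502</h2> </body> </html>"
    )
    print(classify_response(page))
    print(classify_response('{"Header": {}, "Body": {}}'))
```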
-
Do you have just one or multiple zimbra MTA servers?
Have you whitelisted the z-push server on the zimbra server(s)?
-
Also, is your zimbra server reasonably sized? It could be that it is struggling to handle all of the requests
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
Do you have just one or multiple zimbra MTA servers?
I have one server, and three users.
Have you whitelisted the z-push server on the zimbra server(s)
No. z-push is run by nginx, the same nginx that is known as “Zimbra Proxy”. Zimbra supports ActiveSync, though not in the Community Version; still, the proxy looked like the natural place to put my z-push configuration.
Is there something else that I need to configure? As far as Zimbra is concerned, z-push is trivial and is configured like this:
http {
    server {
        server_name mail.tclc.org;
        listen 10.1.1.12:443;
        location ^~ /Microsoft-Server-ActiveSync {
            alias /usr/share/z-push/index.php;
            fastcgi_param ... <edited for brevity>
            fastcgi_pass unix:/var/run/php-fpm/php5-fpm.sock;
        }
    }
}
-
I am afraid I have no experience in configuring nginx or using zimbra with FastCGI
Perhaps there are others looking on who can help with that aspect
I am slightly concerned that you will have nginx calling fastcgi calling nginx calling zimbra - I have no idea if nginx might have some kind of self preservation system to avoid loops/lockups. I don’t know how long fastCGI will hang about waiting for responses, etc.
I suggest you separate the z-push out onto an apache server first to see if everything settles down, then if and when it does you can go back to trying to get this config to work in the understanding that the core zimbra functionality and core z-push functionality are working as expected.
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
I am afraid I have no experience in configuring nginx or using zimbra with FastCGI
Perhaps there are others looking on who can help with that aspect
I am slightly concerned that you will have nginx calling fastcgi calling nginx calling zimbra - I have no idea if nginx might have some kind of self preservation system to avoid loops/lockups. I don’t know how long fastCGI will hang about waiting for responses, etc.
I suggest you separate the z-push out onto an apache server first to see if everything settles down, then if and when it does you can go back to trying to get this config to work in the understanding that the core zimbra functionality and core z-push functionality are working as expected.
Let’s not give up quite yet… This is an interesting problem, and we know a few things that might be helpful.
We know that my problem is only with Active Directory authenticated users; cjm works and mhr fails. I am not the only system in the world using Zimbra “External LDAP Authentication”, and I am not the only system in the world doing this with nginx, however, I might be the only system in the world using z-push and the Zimbra backend on nginx to do this. This might be a zimbra problem or it might be a z-push zimbra backend problem; it is too early to tell.
I’m willing to put in the work necessary to debug this. I’m going to need a little help.
I assume that provisioning an account is initiated by the handheld. So here is my guess at the flow; please correct me. The handheld requests “https://…/Microsoft-Server-ActiveSync”, which in my case calls “/usr/share/z-push/index.php”. Z-push, running in php-fpm, is still just PHP even though it is not in the context of nginx, and it dispatches the request to the zimbra backend, which runs in the same process. I assume the zimbra backend does the authentication, and that this is a SOAP request to an nginx-served location. Please explain a bit about how that happens. I also assume that the zimbra backend is stateless, meaning that over the course of a fairly complex provisioning request there will be many calls to authenticate.
At some point these requests to authenticate begin to fail. Interestingly, they fail consistently at the same procedural place regardless of efforts before and after the failure, meaning the failure seems to not be related to either the volume of e-mail or the number of folders.
This is where it gets confusing… Much of the provisioning is completed, meaning I can see mail folders, but not e-mail. Because I have a locally authenticated user, cjm, that works without any problem, and an Active Directory authenticated user that fails, it is tempting to conclude that something in the authentication path is failing. “z-push/zimbra backend” probably knows nothing about the various authentication mechanisms available to Zimbra. It is easy to conclude that Zimbra is doing something to try to authenticate that is failing, but since this seems to happen in the same procedural place, it is also possible that there is a bug in z-push/zimbra backend code that is corrupting authentication values, so unbeknownst to zimbra backend, he is trying to authenticate with garbage.
I just need a bit of a hint about “z-push/zimbra backend” processing, so I can pepper the approximate location of the action with a suitable set of "printf()"s and “var_dumps”. First, please confirm that no authentication is necessary to get as far as I have been getting, meaning creation of the email folder tree. If I am wrong and authentication is necessary, then please tell me the call that Zimbra backend is using, and I can break at that point to see what is happening, meaning why I am able to authenticate sufficiently to see folders, but not to see contents.
-
The zimbra backend is effectively equivalent to a web user connection to zimbra using the standard zimbra webmail client. The backend performs a curl request to the same HTTP(S) address as the webmail does when it issues requests.
There is nothing that can take place without authentication - so if any client has gotten as far as provisioning then they have been authenticated.
Have you captured a detailed debug log?
Use the $specialLogUsers array in the z-push config file to capture detailed logs for just that user.
If you turn the z-push logging up to WBXML level in the z-push config file for the specialLogUsers it will provide all of the z-push comings and goings.
In addition if you then turn ZIMBRA_DEBUG on (or set it to the particular username) in the zimbra config.php file then you will also get all of the zimbra SOAP requests and responses logged.
The first request issued in the zimbra backend is an AuthRequest. If that fails then it simply returns a failure to z-push and you are dead.
I am sure this is succeeding in every case. I am sure your issue is further down the line, and I suspect it is folder/message volume related somehow.
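For anyone following along, this is roughly what such an AuthRequest envelope looks like. The Python sketch below only builds and prints the XML; it does not contact any server. The envelope and namespace values match the GetInfoRequest logged earlier in this thread, while the `account by="name"` and `password` elements follow the general shape of the Zimbra SOAP API as I understand it. It is not copied from the zimbra backend, and the function name and credentials are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
ZIMBRA_NS = "urn:zimbra"
ACCOUNT_NS = "urn:zimbraAccount"


def build_auth_request(username: str, password: str) -> str:
    """Build a SOAP envelope carrying an AuthRequest (illustrative only)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    # Empty context: no authToken exists before the first authentication.
    ET.SubElement(header, f"{{{ZIMBRA_NS}}}context")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    auth = ET.SubElement(body, f"{{{ACCOUNT_NS}}}AuthRequest")
    account = ET.SubElement(auth, f"{{{ACCOUNT_NS}}}account", {"by": "name"})
    account.text = username
    secret = ET.SubElement(auth, f"{{{ACCOUNT_NS}}}password")
    secret.text = password
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Placeholder credentials, not real ones.
    print(build_auth_request("user@example.com", "not-a-real-password"))
```

Comparing an envelope like this with the first SOAP Message that ZIMBRA_DEBUG writes to the log is a quick way to confirm that authentication is where the conversation starts.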
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
The zimbra backend is effectively equivalent to a web user connection to zimbra using the standard zimbra webmail client. The backend performs a curl request to the same HTTP(S) address as the webmail does when it issues requests.
Thanks. This is quite helpful. I have a much better understanding with just that simple explanation. In this case, I’m pretty sure I have encountered a Zimbra bug. We shall see…
There is nothing that can take place without authentication - so if any client has gotten as far as provisioning then they have been authenticated.
This seems to be the case. Then somewhere around 10:43, I see complaints in z-push.log:
z-push.log:30/08/2017 10:43:24 [24394] [ERROR] [hollie@tclc.org] [sec1efd804e75816] LoopDetection->ProcessLoopDetectionPreviousConnectionFailed(): Command 'Sync' at 30/08/2017 10:42:18 with pid '22964' terminated unexpectedly or is still running.
z-push.log:30/08/2017 10:43:24 [24394] [ERROR] [hollie@tclc.org] [sec1efd804e75816] Please check your logs for this PID and errors like PHP-Fatals or Apache segmentation faults and report your results to the Z-Push dev team.
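Lines like these can be picked apart mechanically to find which PID to chase in the other logs. The Python sketch below parses the first error line quoted above and reports how long the earlier command had been running when the complaint was logged. The regular expression is written against that one line only, so treat the format as an assumption rather than a description of everything z-push can emit.

```python
import re
from datetime import datetime

STAMP = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"
LOOP_LINE = re.compile(
    rf"(?P<reported>{STAMP}) \[\s*(?P<reporter_pid>\d+)\] \[ERROR\] "
    r"\[(?P<user>[^\]]+)\] \[(?P<device>[^\]]+)\] LoopDetection->\w+\(\): "
    rf"Command '(?P<command>[^']+)' at (?P<started>{STAMP}) with pid '(?P<stuck_pid>\d+)'"
)
TIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_loop_detection(line: str):
    """Return details of a LoopDetection error line, or None if it does not match."""
    match = LOOP_LINE.search(line)
    if match is None:
        return None
    reported = datetime.strptime(match["reported"], TIME_FORMAT)
    started = datetime.strptime(match["started"], TIME_FORMAT)
    return {
        "user": match["user"],
        "command": match["command"],
        "stuck_pid": int(match["stuck_pid"]),
        "reporter_pid": int(match["reporter_pid"]),
        "age_seconds": int((reported - started).total_seconds()),
    }


# The error line quoted in this post.
SAMPLE_LINE = (
    "z-push.log:30/08/2017 10:43:24 [24394] [ERROR] [hollie@tclc.org] [sec1efd804e75816] "
    "LoopDetection->ProcessLoopDetectionPreviousConnectionFailed(): Command 'Sync' at "
    "30/08/2017 10:42:18 with pid '22964' terminated unexpectedly or is still running."
)

if __name__ == "__main__":
    print(parse_loop_detection(SAMPLE_LINE))
```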
Have you captured a detailed debug log?
Yes. I have z-push.log and z-push-error.log, logged at the WBXML level. I have some php logs, and some zimbra logs, as well. They are all trimmed to the relevant time frame to reduce the noise, but left unfiltered within that time frame so that any external influences remain visible. I also have an operator log, which details my user commands with time stamps.
Use the $specialLogUsers array in the z-push config file to capture detailed logs for just that user. If you turn the z-push logging up to WBXML level in the z-push config file for the specialLogUsers it will provide all of the z-push comings and goings.
Done: hollie_tclc_org.log. The file hollie_tclc_org.log.formatted has the JSON formatted in a human-readable form.
In addition if you then turn ZIMBRA_DEBUG on (or set it to the particular username) in the zimbra config.php file then you will also get all of the zimbra SOAP requests and responses logged.
This is new information for me, and I have not done this. I will repeat my experiment with this setting, but I suspect that it is now superfluous. It is clear to me that the complaints in z-push.log are because that process has become unresponsive. I looked and it was still running. I also see hints about mailbox locking, so I think that somehow, after successfully returning the folder structure and beginning to return the contents, the mailbox is locked and blocks subsequent accesses.
This deadlock will be on the “Authenticate with External LDAP” path in zimbra, because ALL such users fail, and my one Zimbra-authenticated user succeeds.
I am sure this is succeeding in every case. I am sure your issue is further down the line, and I suspect it is folder/message volume related somehow.
I have been quite sure since we first started working on this that it was not capacity related simply because the user that works, cjm, has a much higher capacity requirement than the user that fails, mhr. In fact, I have also tried experimental users with trivial amounts of volume, which also fail. However, since you clearly know more about this than I do, I have not asserted my belief strongly. I think we will see that it is a Zimbra logic bug causing a deadlock.
I have a set of logs that constitute about four megs and might be interesting to you, if you would like them, and if you instruct me how, I will provide them to you. As the operator, I kept a timestamped log of my actions. It is included below:
10.1.1.100: law.tclc.org (mhr desktop)
10.1.1.103: code.tclc.org (cjm desktop)
10.1.1.105: Android Cell Phone (mhr handheld)
10.1.1.12: mail.tclc.org (Zimbra Server)
2017.08.30 10:31:01 cleared logs
2017.08.30 10:31:54 Powered down white tablet to reduce noise
2017.08.30 10:33:36 Powered up white cell phone
2017.08.30 10:34:14 invoke phone email application
2017.08.30 10:36:01 "Next" (email application on phone)
2017.08.30 10:36:12 Server could not finish -- Operator Error: incorrect domain name (mail.itclc.org)
2017.08.30 10:36:49 "Next" (email application on phone)
2017.08.30 10:36:55 "Continue" (email application on phone)
2017.08.30 10:37:46 authentication failed -- Operator Error: incorrect username (hollie@tclc.orh)
2017.08.30 10:38:04 "Next" (email application on phone)
2017.08.30 10:38:09 "Continue" (email application on phone)
2017.08.30 10:39:29 "Next" (email application on phone) (provisioning)
2017.08.30 10:39:51 Device administrator
2017.08.30 10:41:47 Policies deployed
2017.08.30 10:43:01 Request contents of "inbox" ... endless "Loading..."
2017.08.30 10:44:48 exit mail application
2017.08.30 10:47:06 last of z-push-top threads disappears
2017.08.30 11:06:29 stop zimbra, php-fpm
-
If you click on the chats icon beside your profile avatar you should see a message from me.