Z-push with Zimbra users of various configurations
-
Hi @cjm51213 ,
https://wiki.z-hub.io/display/ZP/Debugging should provide you with what you need.
-
@fbartels Thanks. I have followed those instructions with some success. I am seeing strong indications of the problem, but the specifics are not clear. Here’s what I see:
Reasonable request:
25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->GetUserInfo(): START GetUserInfo 25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->SoapRequest(): SOAP Message: <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"> <soap:Header> <context xmlns="urn:zimbra"> <authToken><Deleted for brevity></authToken> <session id="120" /> <format type="js" /> <userAgent name="Android-SAMSUNG-SM-G318ML/101.40404(...e75816) devip=10.1.1.105 ZPZB" version="66" /> </context> </soap:Header> <soap:Body> <GetInfoRequest sections="mbox,prefs,attrs,idents,dsrcs,children" xmlns="urn:zimbraAccount"></GetInfoRequest> </soap:Body> </soap:Envelope>
Denied:
25/08/2017 09:31:41 [ 6044] [DEBUG] [hollie@tclc.org] [sec1efd804e75816] Zimbra->SoapRequest(): SOAP Response: <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 502 Connection to Upstream is Refused</title> </head> <body> <h2>HTTP ERROR 502</h2> <p>Problem accessing ZCS upstream server. Cannot connect to the ZCS upstream server. Connection is refused.<br/> Possible reasons: <ul> <li>upstream server is unreachable</li> <li>upstream server is currently being upgraded</li> <li>upstream server is down</li> </ul> Please contact your ZCS administrator to fix the problem. </p><br/> <i><small>Powered by Nginx-Zimbra://</small></i><br/> </body> </html>
None of the suggestions is reasonable since the server is currently up and reachable. So, there are a few questions:
- What is z-push trying to accomplish when this upstream access is refused?
- How is z-push authenticating to the upstream server, meaning protocol, port, and credentials?
I have to believe that this is z-push having been misdirected since at the time of these errors, the Active Directory user, mhr, is perfectly able to transact e-mail through the web interface.
Does anybody have any advice?
Chris.
-
Do you have just one or multiple zimbra MTA servers?
Have you whitelisted the z-push server on the zimbra server(s)
-
Also, is your zimbra server reasonably sized? It could be that it is struggling to handle all of the requests
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
Do you have just one or multiple zimbra MTA servers?
I have one server, and three users.
Have you whitelisted the z-push server on the zimbra server(s)
No. z-push is run by nginx, which is the same nginx that is known a “Zimbra Proxy”. Zimbra supports ActiveSync, but not the Community Version, however, it looked like the natural place to put my z-push configuration.
Is there something else that I need to configure? As far as Zimbra is concerned, z-push is trivial and is configured like this:
http { server { server_name mail.tclc.org; listen 10.1.1.12:443; location ^~ /Microsoft-Server-ActiveSync { alias /usr/share/z-push/index.php; fastcgi_param ... <edited for brevity> fastcgi_pass unix:/var/run/php-fpm/php5-fpm.sock; } } }
-
I am afraid I have no experience in configuring nginx or using zimbra with FastCGI
Perhaps there are others looking on who can help with that aspect
I am slightly concerned that you will have nginx calling fastcgi calling nginx calling zimbra - I have no idea if nginx might have some kind of self preservation system to avoid loops/lockups. I don’t know how long fastCGI will hang about waiting for responses, etc.
I suggest you separate the z-push out onto an apache server first to see if everything settles down, then if and when it does you can go back to trying to get this config to work in the understanding that the core zimbra functionality and core z-push functionality are working as expected.
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
I am afraid I have no experience in configuring nginx or using zimbra with FastCGI
Perhaps there are others looking on who can help with that aspect
I am slightly concerned that you will have nginx calling fastcgi calling nginx calling zimbra - I have no idea if nginx might have some kind of self preservation system to avoid loops/lockups. I don’t know how long fastCGI will hang about waiting for responses, etc.
I suggest you separate the z-push out onto an apache server first to see if everything settles down, then if and when it does you can go back to trying to get this config to work in the understanding that the core zimbra functionality and core z-push functionality are working as expected.
Let’s not give up quite yet… This is an interesting problem, and we know a few things that might be helpful.
We know that my problem is only with Active Directory authenticated users; cjm works and mhr fails. I am not the only system in the world using Zimbra “External LDAP Authentication”, and I am not the only system in the world doing this with nginx, however, I might be the only system in the world using z-push and the Zimbra backend on nginx to do this. This might be a zimbra problem or it might be a z-push zimbra backend problem; it is too early to tell.
I’m willing to put in the work necessary to debug this. I’m going to need a little help.
I assume that provisioning an account is initiated with the hand-held. So, my guess at the flow, and please correct me, handheld requests “https://…/Microsoft-Server-ActiveSync” which calls, in my case, “/usr/share/z-push/index.php”. Z-push, running in php-fpm is still just php even though not in the context of nginx, will dispatch this to the zimbra backend, which is going to be in the same process. I assume zimbra backend does the authentication. I assume that this will be a soap request to an nginx served location. Please explain a bit about how that happens. I also assume that zimbra backend is stateless, meaning over the course of a fairly complex provisioning request, there will be many calls to authenticate.
At some point these requests to authenticate begin to fail. Interestingly, they fail consistently at the same procedural place regardless of efforts before and after the failure, meaning the failure seems to not be related to either the volume of e-mail or the number of folders.
This is where it gets confusing… Much of the provisioning is completed, meaning I can see mail folders, but not e-mail. Because I have a locally authenticated user, cjm, that works without any problem, and an Active Directory authenticated user that fails, it is tempting to conclude that something in the authentication path is failing. “z-push/zimbra backend” probably knows nothing about the various authentication mechanisms available to Zimbra. It is easy to conclude that Zimbra is doing something to try to authenticate that is failing, but since this seems to happen in the same procedural place, it is also possible that there is a bug in z-push/zimbra backend code that is corrupting authentication values, so unbeknownst to zimbra backend, he is trying to authenticate with garbage.
I just need a bit of a hint about “z-push/zimbra backend” processing, so I can pepper the approximate location of the action with a suitable set of "printf()"s and “var_dumps”. First, please confirm that no authentication is necessary to get as far as I have been getting, meaning creation of the email folder tree. If I am wrong and authentication is necessary, then please tell me the call that Zimbra backend is using, and I can break at that point to see what is happening, meaning why I am able to authenticate sufficiently to see folders, but not to see contents.
-
The zimbra backend is effectively equivalent to a web user connection to zimbra using the standard zimbra webmail client. The backend performs a curl request to the same HTTP(S) address as the webmail does when it issues requests.
There is nothing that can take place without authentication - so if any client has gotten as far as provisioning then they have been authenticated.
Have you captured a detailed debug log?
Use the $specialLogUsers array in the z-push config file to capture detailed logs for just that user.
If you turn the z-push logging up to WBXML level in the z-push config file for the specialLogUsers it will provide all of the z-push comings and goings.
In addition if you then turn ZIMBRA_DEBUG on (or set it to the particular username) in the zimbra config.php file then you will also get all of the zimbra SOAP requests and responses logged.
The first request issued in the zimbra backend is an AuthRequest. If that fails then it simply returns a failure to z-push and you are dead.
I am sure this is succeeding in every case. I am sure your issue is further down the line, and I suspect it is folder/message volume related somehow.
-
@liverpoolfcfan said in Z-push with Zimbra users of various configurations:
The zimbra backend is effectively equivalent to a web user connection to zimbra using the standard zimbra webmail client. The backend performs a curl request to the same HTTP(S) address as the webmail does when it issues requests.
Thanks. This is quite helpful. I have a much better understanding with just that simple explanation. In this case, I’m pretty sure I have encountered a Zimbra bug. We shall see…
There is nothing that can take place without authentication - so if any client has gotten as far as provisioning then they have been authenticated.
This seems to be the case. Then somewhere around 10:43, I see complaints in z-push.log:
z-push.log:30/08/2017 10:43:24 [24394] [ERROR] [hollie@tclc.org] [sec1efd804e75816] LoopDetection->ProcessLoopDetectionPreviousConnectionFailed(): Command 'Sync' at 30/08/2017 10:42:18 with pid '22964' terminated unexpectedly or is still running.
z-push.log:30/08/2017 10:43:24 [24394] [ERROR] [hollie@tclc.org] [sec1efd804e75816] Please check your logs for this PID and errors like PHP-Fatals or Apache segmentation faults and report your results to the Z-Push dev team.
Have you captured a detailed debug log?
Yes. I have z-push.log, z-push-error.log, logged at the WBXML level. I have some php logs, and some zimbra logs, as well. They are all trimmed to the correct time frame to reduce the noise, but not within that time frame to prevent obscuring notice of external influences. I also have operator log, which details my user commands with time stamps.
Use the $specialLogUsers array in the z-push config file to capture detailed logs for just that user. If you turn the z-push logging up to WBXML level in the z-push config file for the specialLogUsers it will provide all of the z-push comings and goings.
Done. hollie_tclc_org.log hollie_tclc_org.log.formatted has the JSON formated in a human readable form.
In addition if you then turn ZIMBRA_DEBUG on (or set it to the particular username) in the zimbra config.php file then you will also get all of the zimbra SOAP requests and responses logged.
This is new information for me, and I have not done this. I will repeat my experiment with this setting, but I suspect that it is now superfluous. It is clear to me that the complaints in z-push.log are because that process has become unresponsive. I looked and it was still running. I also see hints about mailbox locking, so I think that somehow, after successfully returning the folder structure and beginning to return the contents, the mailbox is locked and blocks subsequent accesses.
This deadlock will be on the “Authenticate with External LDAP” path in zimbra, because ALL such users fail, and my one Zimbra authenticate user succeeds.
I am sure this is succeeding in every case. I am sure your issue is further down the line, and I suspect it is folder/message volume related somehow.
I have been quite sure since we first started working on this that it was not capacity related simply because the user that works, cjm, has a much higher capacity requirement than the user that fails, mhr. In fact, I have also tried experimental users with trivial amounts of volume, which also fail. However, since you clearly know more about this than I do, I have not asserted my belief strongly. I think we will see that it is a Zimbra logic bug causing a deadlock.
I have a set of logs that constitute about four megs and might be interesting to you, if you would like them, and if you instruct me how, I will provide them to you. As the operator, I kept a timestamped log of my actions. It is included below:
10.1.1.100: law.tclc.org (mhr desktop) 10.1.1.103: code.tclc.org (cjm desktop) 10.1.1.105: Android Cell Phone (mhr handheld) 10.1.1.12: mail.tclc.org (Zimbra Server) 2017.08.30 10:31:01 cleared logs 2017.08.30 10:31:54 Powered down white tablet to reduce noise 2017.08.30 10:33:36 Powered up white cell phone 201708.30 10:34:14 invoke phone email application 2017.08.30 10:36:01 "Next" (email application on phone) 2017.08.30 10:36:12 Server could not finish -- Operator Error: incorrect domain name (mail.itclc.org) 2017.08.30 10:36:49 "Next" (email application on phone) 2017.08.30 10:36:55 "Continue" (email application on phone) 2017.08.30 10:37:46 authentication failed -- Operator Error: incorrect username (hollie@tclc.orh) 2017.08.30 10:38:04 "Next" (email application on phone) 2017.08.30 10:38:09 "Continue" (email application on phone) 2017.08.30 10:39:29 "Next" (email application on phone) (provisioning) 2017.08.30 10:39:51 Device administrator 2017.08.30 10:41:47 Policies deployed 2017.08.30 10:43:01 Request contents of "inbox" ... endless "Loading..." 2017.08.30 10:44:48 exit mail application 2017.08.30 10:47:06 last of z-push-top threads dissappears 2017.08.30 11:06:29 stop zimbra, php-fpm
-
If you click on the chats icon beside your profile avatar you should see a message from me.