Synchronization repeatedly breaks on one device

Hello,

we are running kopano-core 8.3.4 on CentOS 7 7.4.1708 with a separate z-push server running z-push 2.3.9 with apache 2.4 (scl) and php5.6 (scl) on CentOS 7 7.4.1708.
We have successfully connected 11 BlackBerry DTEK50 devices via BlackBerry UEM server without any major issues, except for one device where synchronization broke every two weeks or so (with z-push 2.3.7). Since the update to z-push 2.3.9 (or for some other reason), synchronization on that device breaks every day with an endless FolderSync. All folders on the device are empty, but it can still send emails.
The log is full of entries like:

16/02/2018 01:11:18 [21350] [ INFO] [dmueller] [241516c6533a44219968086966457ccb] cmd='FolderSync' memory='2.90 MiB/3.25 MiB' time='0.83s' devType='BlackBerry' devId='241516c6533a44219968086966457ccb' getUser='dmueller' from='10.27.10.63' idle='0s' version='2.3.9+0' method='POST' httpcode='200'
16/02/2018 01:11:30 [21336] [ INFO] [dmueller] [241516c6533a44219968086966457ccb] cmd='FolderSync' memory='2.90 MiB/3.25 MiB' time='0.78s' devType='BlackBerry' devId='241516c6533a44219968086966457ccb' getUser='dmueller' from='10.27.10.63' idle='0s' version='2.3.9+0' method='POST' httpcode='200'
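
To put a number on how often the device repeats a command, the log could be tallied with something like the following. This is only a sketch: it assumes every request line carries the key='value' fields seen in the two entries above, and `count_commands` is a made-up helper name, not part of Z-Push.

```python
import re
from collections import Counter

# Matches key='value' pairs as they appear in the log entries above.
FIELD = re.compile(r"(\w+)='([^']*)'")


def count_commands(lines):
    """Count how often each (devId, cmd) pair occurs in the given log lines."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))
        if "cmd" in fields and "devId" in fields:
            counts[(fields["devId"], fields["cmd"])] += 1
    return counts


# Shortened copies of the two log entries above (some fields omitted).
sample = [
    "16/02/2018 01:11:18 [21350] [ INFO] [dmueller] cmd='FolderSync' "
    "time='0.83s' devType='BlackBerry' "
    "devId='241516c6533a44219968086966457ccb' httpcode='200'",
    "16/02/2018 01:11:30 [21336] [ INFO] [dmueller] cmd='FolderSync' "
    "time='0.78s' devType='BlackBerry' "
    "devId='241516c6533a44219968086966457ccb' httpcode='200'",
]

for (dev_id, cmd), n in count_commands(sample).items():
    print(dev_id, cmd, n)
```

Run against the real log file (for example by passing an open file object as `lines`), a FolderSync count far above that of the working devices would confirm the loop.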

After doing a hierarchy-sync the device works for some hours.
In the error log I have tons of loop detections from that device, and at the time of the last sync (according to z-push-admin) there are error messages like these:

15/02/2018 23:26:45 [17654] [WARN] [dmueller] [241516c6533a44219968086966457ccb] Mobile loop detected! Messages sent to the mobile will be restricted to 2 items in order to identify the conflict
15/02/2018 23:26:45 [17654] [WARN] [dmueller] [241516c6533a44219968086966457ccb] /usr/share/z-push/backend/kopano/mapiprovider.php:261 mapi_zarafa_getuser_by_name(): Unable to resolve the user: 8004010F (2)
15/02/2018 23:26:45 [17654] [ERROR] [dmueller] [241516c6533a44219968086966457ccb] Ignored broken message (SyncAppointment). Reason: '2' Folderid: 'U2e5d5' message id 'U2e5d5:26f48cc17b2d4ffd84a2d00ac82b836f187b3b000000'
15/02/2018 23:26:45 [17654] [WARN] [dmueller] [241516c6533a44219968086966457ccb] /usr/share/z-push/backend/kopano/mapiprovider.php:261 mapi_zarafa_getuser_by_name(): Unable to resolve the user: 8004010F (2)
15/02/2018 23:26:45 [17654] [WARN] [dmueller] [241516c6533a44219968086966457ccb] Mobile loop detected! Messages sent to the mobile will be restricted to 2 items in order to identify the conflict
15/02/2018 23:26:45 [17654] [ERROR] [dmueller] [241516c6533a44219968086966457ccb] Ignored broken message (SyncMail). Reason: '2' Folderid: 'U7631e' message id 'U7631e:26f48cc17b2d4ffd84a2d00ac82b836f0f6f3d000000'
15/02/2018 23:26:48 [17654] [WARN] [dmueller] [241516c6533a44219968086966457ccb] Mobile loop detected! Messages sent to the mobile will be restricted to 1 items in order to identify the conflict

This is followed by 10 minutes of loop detections, and then:

15/02/2018 23:36:42 [ 6218] [WARN] [dmueller] [241516c6533a44219968086966457ccb] Mobile loop detected! Messages sent to the mobile will be restricted to 2 items in order to identify the conflict
15/02/2018 23:36:42 [ 6218] [ERROR] [dmueller] [241516c6533a44219968086966457ccb] Ignored broken message (SyncAppointment). Reason: '2' Folderid: 'U2e5d5' message id 'U2e5d5:26f48cc17b2d4ffd84a2d00ac82b836fe1cf39000000'
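
To see which items keep getting skipped, the "Ignored broken message" entries could be collected into a unique list. The following is a minimal sketch that assumes the error lines always have the wording shown above; `broken_messages` is a hypothetical helper name, not part of Z-Push.

```python
import re

# Pattern built from the wording of the ERROR lines above.
BROKEN = re.compile(
    r"Ignored broken message \((?P<cls>\w+)\)\. Reason: '(?P<reason>[^']*)' "
    r"Folderid: '(?P<folder>[^']*)' message id '(?P<msgid>[^']*)'"
)


def broken_messages(lines):
    """Return the unique (class, folder id, message id) triples, sorted."""
    found = set()
    for line in lines:
        match = BROKEN.search(line)
        if match:
            found.add((match["cls"], match["folder"], match["msgid"]))
    return sorted(found)


# Shortened copies of the ERROR and WARN lines above (prefix fields omitted).
sample = [
    "[ERROR] [dmueller] Ignored broken message (SyncAppointment). Reason: '2' "
    "Folderid: 'U2e5d5' message id "
    "'U2e5d5:26f48cc17b2d4ffd84a2d00ac82b836f187b3b000000'",
    "[ERROR] [dmueller] Ignored broken message (SyncMail). Reason: '2' "
    "Folderid: 'U7631e' message id "
    "'U7631e:26f48cc17b2d4ffd84a2d00ac82b836f0f6f3d000000'",
    "[ERROR] [dmueller] Ignored broken message (SyncAppointment). Reason: '2' "
    "Folderid: 'U2e5d5' message id "
    "'U2e5d5:26f48cc17b2d4ffd84a2d00ac82b836fe1cf39000000'",
    "[WARN] [dmueller] Mobile loop detected! Messages sent to the mobile "
    "will be restricted to 2 items in order to identify the conflict",
]

for cls, folder, msgid in broken_messages(sample):
    print(cls, folder, msgid)
```

The resulting list is short enough to check by hand whether the same appointments and mails are involved each time the sync breaks.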

After that the error log goes silent, but the FolderSync messages still appear in the logfile.
What might be the problem with that device? I can provide wbxml-logs.

Best regards,
Achim

Hello @afischer,

if this account with “Folderid: ‘U2e5d5’ message id ‘U2e5d5:26f48cc17b2d4ffd84a2d00ac82b836fe1cf39000000’” can sync to other devices via z-push, I bet this is some kind of weird timeout issue.

In your z-push.conf there should be a line:

 define('SYNC_TIMEOUT_MEDIUM_DEVICETYPES', "SAMSUNGGTI");
[...]

I read a post in a BB forum about syncing via ActiveSync where the BB times out, the ActiveSync server answers to a session the client has already closed, and the answer is rejected. Kind of overlapping.

Since your log knows the deviceType (BlackBerry), try adding the device to the long or medium timeout group.
That could fix it (maybe). I have nothing to test with here.

coffee_is_life

Hi Achim,

have you run kopano-fsck for that user? Could you also post the WBXML log?

Manfred

Hello Manfred,

yes I ran kopano-fsck.
Which logs exactly do you need? The log is very big. Where can I upload it? I don’t want to share it publicly.

Best regards,
Achim

Hello Coffee_is_life,

thank you for your hint. My understanding of these parameters is different. I thought it is worse to have a device with a timeout that is too long than one with a timeout that is too short.
What will happen if the medium timeout is too long for the devices? Will it also break the synchronization? (I have 10 working devices at the moment.)

Best regards,
Achim

Like I said, I have no test device here, so I can't answer that. But yes, technically a longer timeout is worse, because the server then won't stop gathering data at 30 seconds, send it, open a new push and start gathering again.
I can't tell how BB handles timeouts etc. I just remembered a post in some BB forum about ActiveSync where the fix was setting a longer timeout on the server. (I will try to find that post again.)

If you are running z-push on a VM, you could easily clone it and create a small test environment with this device.
(Besides, it's a good way to identify update issues before applying them to the production environment.)

For further investigations without guessing, the WBXML-log is needed, just like @Manfred asked.

Btw, what does “z-push-admin -a list -u <user> -d <deviceid>” show?

coffee_is_life

Hi Achim,

@afischer said in Synchronization repeatedly breaks on one device:

Which logs exactly do you need? The log is very big. Where can I upload it? I don’t want to share it publicly.

Ideally everything from the last successful sync onward. You can upload the logs somewhere you have access to and send me the link, or open an issue with Kopano and add them as an attachment.

Does the device in question have the same firmware version as the others? Is the user's store bigger than those of the other users? Does that device switch a lot between Wi-Fi and 3G?

Manfred

The timeout is not the timeout on the server, but related to the timeout of the device. You should definitely not set a higher timeout for devices with short timeouts. The devices will never see an answer from the server.

I think the timeout is not an issue here. The FolderSync requests take 0.8 seconds, this is totally fine.
There are also lines indicating that some data is sent to the device, but that it runs into loop detection (which by default is not a bad thing).
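
The request durations can be checked across the whole log rather than by eye. Here is a rough sketch, assuming the `cmd='…'` and `time='…s'` fields always look like they do in the log lines posted earlier in the thread; `durations` and the 5-second threshold are illustrative choices, not Z-Push settings.

```python
import re

# Picks the command name and its duration in seconds out of a request line.
TIMING = re.compile(r"cmd='(?P<cmd>[^']*)'.*?\btime='(?P<secs>[0-9.]+)s'")


def durations(lines, cmd):
    """Return the durations (in seconds) of all requests with the given cmd."""
    result = []
    for line in lines:
        match = TIMING.search(line)
        if match and match["cmd"] == cmd:
            result.append(float(match["secs"]))
    return result


# Shortened copies of the two FolderSync entries from the first post.
sample = [
    "16/02/2018 01:11:18 [21350] [ INFO] [dmueller] cmd='FolderSync' "
    "memory='2.90 MiB/3.25 MiB' time='0.83s' devType='BlackBerry'",
    "16/02/2018 01:11:30 [21336] [ INFO] [dmueller] cmd='FolderSync' "
    "memory='2.90 MiB/3.25 MiB' time='0.78s' devType='BlackBerry'",
]

times = durations(sample, "FolderSync")
slow = [t for t in times if t > 5.0]  # 5.0 s is an arbitrary threshold
print("requests:", len(times), "slowest:", max(times), "slow:", len(slow))
```

If no FolderSync request comes anywhere near the device's timeout, that supports the view that the timeout groups are not the cause.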

More about loop detection here: https://wiki.z-hub.io/display/ZP/Loop+detection
Only a WBXML log will tell us more, see how to generate one here: https://wiki.z-hub.io/display/ZP/Debugging

Cheers,
Sebastian

Hello Manfred, hello Sebastian,

I sent you a link to the wbxml-log.

Best regards,
Achim

Hi Achim,

I wasn’t able to find the issue from the logs, but I do have some questions.

The device stopped issuing Sync requests shortly after midnight on the 16th of February. From then on it was only sending Options and FolderSync requests, until 08:30:06, when a hierarchy state was not found and a new full resync was issued.
Did you perform a z-push-admin resync or remove states from the database at that point? Or was the database briefly unavailable?

Have you removed the account from the device, removed the states and created the account again after the update to Z-Push 2.3.9+0?

Is there some kind of do not disturb set on the device, so that it won’t sync between 00:00-08:00?

Your KC version is 8.3.4, but on Z-Push server there’s “BackendKopano using PHP-MAPI version: 7.2.4-29”. Is there a reason not to use the same PHP-MAPI version as with the kopano server?

Manfred

Hello Manfred,

I did a z-push-admin -a resync -t hierarchy -u user -d device to make the synchronization work again at that point.
No, I did not remove the account nor remove the states after the upgrade.
There is no specific reason for the different PHP-MAPI versions. I had to do some maintenance work on the z-push server and did a ‘yum update’.
I will ask the user if there is a do not disturb function. But I don’t think so, because on other days it breaks at different times, some days twice a day.

Best regards,
Achim

Hi Achim,

could you ask the user to remove the account from the device, then remove the device with z-push-admin, and then add the account on the device again?

Manfred

@manfred said in Synchronization repeatedly breaks on one device:

Your KC version is 8.3.4, but on Z-Push server there’s “BackendKopano using PHP-MAPI version: 7.2.4-29”. Is there a reason not to use the same PHP-MAPI version as with the kopano server?

Ah, now I got you!
I think it’s because of using php5.6 (scl). I had some problems with dependencies and had contact with Sebastian. I think this might be fixed already. I will try to add the kopano-repo to the server and try to update.

Best regards,
Achim

@manfred said in Synchronization repeatedly breaks on one device:

Hi Achim,

could you ask the user to remove the account from the device, then remove the device with z-push-admin, and then add the account on the device again?

Manfred

Yes, I will do so tomorrow and gather fresh logs!