ZP-1157 / similar errors with Caldav/Carddav/IMAP and Windows 10 mail/people/calendar



  • Hi all,

    I’ve been using z-push 2.3.x happily with android and ios devices, but I’ve hit a problem adding a Windows 10 device.

    The common message appearing the logs is:

    18/06/2017 13:24:03 [12153] [FATAL] [myuser] Exception: (StateInvalidException) - Unexpected synckey value oldcounter: ‘1’ synckey: ‘{12345678-1234-1234-1234-123456789012}1’ internal key: ‘{12345678-1234-1234-1234-123456789012}’

    I’ve tried wiping the entire state and starting with a fresh user (empty calendar/contacts/tasks) but the problem seems to take hold immediate.

    I can see ZP returning a 500 error to the device, and it reports a synchronisation problem.

    ZP-1157 is the nearest I could find to this particular error, but it looks like it is a correction for a sync interrupted due to an error (and more to do with the recover?)

    I’m using:

    • SabreDav 3.2.2
    • Dovecot IMAP
    • Z-Push 2.3.7beta1

    I’m using the combined backend:

    • email -> Imap
    • Tasks, Calendars -> CalDav
    • Contacts -> CardDav

    This doesn’t seem to upset android/ios, but it is stopping Windows 10 / mail from syncing. So far I can see emails, contacts, but calendar events won’t sync from the device back to the server

    I’m still doing some more digging…



  • I’ve not made much progress yet. I think I’ve ruled out the plain streams fix (ZP-1180), but I’m not entirely sure how ZP-1157 resolves the problem I’m seeing.

    It seems when an unexpected synckey is received, ZP will fail with a 500… I think that means the state of the client doesn’t change, so it’s doomed to loop.

    I’m trying to understand how the client and the server got out of sync.

    I’ve capture the logs for a single Windows 10 client with a test user, with a few test mails messages, calendar events and contacts.

    https://maildebug.intrepid.co.uk/capture.log


  • Kopano

    Hi cpitchford,

    The problem might be that the caldav backend doesn’t set the main calendar, only user calendar. If you look at the FolderSync in your log at 18/06/2017 23:56:44 you’ll see that c/Ctesting is of type 13 (user calendar type).

    There is CALDAV_PERSONAL constant in the config.php of caldav. Setting it to ‘testing’ is worth trying. Or you set the calendar’s name to ‘PRINCIPAL’ which is the default value for CALDAV_PERSONAL.

    Btw, I suppose the tasks are also not working as it is the same issue.

    Manfred



  • Ah that’s interesting!!

    I think I’ve spotted the problem, will test and report back… but:

    My log file was actually filtered (sensitive terms and names changed to protect the innocent!)

    Lets pretend my calendar is actually called “Testing” not “testing”

    So, I’d already defined CALDAV_PERSONAL in my config:

    define('CALDAV_PERSONAL', 'Testing');
    

    However, there are two (identical) if statements that check the calendar:

    if (defined('CALDAV_PERSONAL') && strtolower(substr($id, 1)) == CALDAV_PERSONAL) {
    

    The if statement won’t match as its only converting the caldav calendar name, not the defined constant

    So, I propose (as a patch) changing the if statements in caldav.php into:

    if (defined('CALDAV_PERSONAL') && strcasecmp(substr($id, 1), CALDAV_PERSONAL) == 0) {
    

    So that the names are compared case-insensatively

    (the alternative is, perhaps, a comment in the caldav config file explaining CALDAV_PERSONAL must be entirely lower case.



  • I’ve removed the state in /var/lib/z-push, blanked the device and retried:

    I can see from the debug log that the folder types are now correct (so that patch might be good)

    O    <FolderHierarchy:Add>
    O     <FolderHierarchy:ServerEntryId>
    O     c/CTesting
    O     </FolderHierarchy:ServerEntryId>
    O     <FolderHierarchy:ParentId>
    O     0
    O     </FolderHierarchy:ParentId>
    O     <FolderHierarchy:DisplayName>
    O     test calendar
    O     </FolderHierarchy:DisplayName>
    O     <FolderHierarchy:Type>
    O     8
    O     </FolderHierarchy:Type>
    O    </FolderHierarchy:Add>
    O    <FolderHierarchy:Add>
    O     <FolderHierarchy:ServerEntryId>
    O     c/TTesting
    O     </FolderHierarchy:ServerEntryId>
    O     <FolderHierarchy:ParentId>
    O     0
    O     </FolderHierarchy:ParentId>
    O     <FolderHierarchy:DisplayName>
    O     test calendar
    O     </FolderHierarchy:DisplayName>
    O     <FolderHierarchy:Type>
    O     7
    O     </FolderHierarchy:Type>
    O    </FolderHierarchy:Add>
    

    However, I’m still stuck with the unexpected sync key:

    [ INFO] StateInvalidException: Unexpected synckey value oldcounter: '1' synckey: '{819bb07e-9854-47d5-9462-110b4e9b4a35}1' internal key: '{819bb07e-9854-47d5-9462-110b4e9b4a35}' - code: 0 - file: /usr/share/z-push/lib/core/statemanager.php:236
    [FATAL]  WBXML 10K debug data: AwFqAEVpAzkwMAABVQMyMDAAASYB
    [ INFO] User-agent: 'unknown'
    [FATAL] [test@testing.com] [3d8750c317150576af476d9dbd631ee1] Exception: (StateInvalidException) - Unexpected synckey value oldcounter: '1' synckey: '{819bb07e-9854-47d5-9462-110b4e9b4a35}1' internal key: '{819bb07e-9854-47d5-9462-110b4e9b4a35}'
    

    which leaves Z-Push returning a 500 error to the device

    Full log from first login to fatal error:

    https://maildebug.intrepid.co.uk/capture2.log

    I’m going to see if I can find out why the user-agent is reported as unknown too…



  • I’ve had a look at this commit:

    https://stash.z-hub.io/projects/ZP/repos/z-push/commits/14c1b661aff7cbeabe0d885db186c7e2b76e0ccd

    If I back it out, my test user is able to sync, though my regular user still hits the same unexpected synckey value.

    It definitely changes, though when this commit is removed. I’m wondering if it introduced another problem, or it’s not entirely fixed the bug it was addressing?

    I’ll play more tomorrow!


  • Kopano

    I looked at your log but I can’t figure the unexpected synckey value from it. I tried WP but I can’t make it happen.
    The only think that seems sure is that

     self::$deviceManager->GetStateManager()->SetSyncState($spa->GetNewSyncKey(), $state, $spa->GetFolderId());
    

    is being called in lib/request/sync.php line 1248 (in develop).
    This is a bit odd because $spa->HasNewSyncKey() from a few lines up shouldn’t really be true at this point.
    It could be that this is related to SyncParameters->uuidNewCounter which is checked there.
    You could debug this a little bit by added a few more debug outputs around these lines and printing values.
    It could also be that there is another UID in the game which was checked before, because in your log in the next request another state is missing:

    SyncParameters->GetLatestSyncKey(): '{54324d90-32e6-4816-ae74-efed41ecb7ba}1'
    StateNotFoundException: FileStateMachine->GetState(): Could not locate state '/var/lib/z-push/1/e/3d8750c317150576af476d9dbd631ee1-54324d90-32e6-4816-ae74-efed41ecb7ba-1'
    

    It should read counter 2 here, because 1 was already deleted. This would indicate that the -fd file is not correctly updated reflecting to an old state.


  • Kopano

    Don’t worry too much about the user agent. WP doesn’t send it in all requests and there we just print the header (which is not available).
    It’s because that’s an unexpected exception that is not being catched.


  • Kopano

    Hi cpitchford,

    @cpitchford said in ZP-1157 / similar errors with Caldav/Carddav/IMAP and Windows 10 mail/people/calendar:

    Ah that’s interesting!!

    I think I’ve spotted the problem, will test and report back… but:

    My log file was actually filtered (sensitive terms and names changed to protect the innocent!)

    Lets pretend my calendar is actually called “Testing” not “testing”

    So, I’d already defined CALDAV_PERSONAL in my config:

    define('CALDAV_PERSONAL', 'Testing');
    

    However, there are two (identical) if statements that check the calendar:

    if (defined('CALDAV_PERSONAL') && strtolower(substr($id, 1)) == CALDAV_PERSONAL) {
    

    The if statement won’t match as its only converting the caldav calendar name, not the defined constant

    So, I propose (as a patch) changing the if statements in caldav.php into:

    if (defined('CALDAV_PERSONAL') && strcasecmp(substr($id, 1), CALDAV_PERSONAL) == 0) {
    

    So that the names are compared case-insensatively

    (the alternative is, perhaps, a comment in the caldav config file explaining CALDAV_PERSONAL must be entirely lower case.

    The caldav and cardav backends were contributed by the community, so feel free to open a JIRA ticket, commit the change and send us a pull request.
    We have a wiki page on how to contribute to Z-Push: https://wiki.z-hub.io/display/ZP/Development+guidelines

    Manfred



  • @Manfred

    Hi Manfred,

    I’m hoping I’ve done things correctly and created ZP-1244 for the case-matching problem. It’s looking good on my set up so fingers crossed!

    I’m going to crack on with inserting more debug lines into the code to see if I can watch what’s happening to this missing states


  • Kopano

    Patch looks good to me. I’ve set the target version to 2.3.8.


  • Kopano

    Your patch is merged into develop and is also in 2.3.8beta0+22 (just published).

    Thanks!



  • I’ve narrowed down the original problem (now running 2.3.8beta0)

    I’ve set up a test account:

    zero emails
    zero calendar entries
    zero tasks
    one contact

    When I set up the account in MS Mail, it will sync, once. If I press “refresh” it faults (and Z-Push returns 500)

    I can’t figure out why [StateManager]->newStateCounter is either unset or “”

    It should be set to either 1 or 1+the current counter if [StateManager]->GetNewSyncKey is called…

    But for this StateManager, it appears GetNewSyncKey was not called.

    So if GetNewSyncKey was not called, why would it need to call [StateManager]->SetSyncState??

    I’ve an updated log of configuring the client, syncing once, retrying and hitting the error

    https://maildebug.intrepid.co.uk/capture3.log

    I’m wondering if it is related to loop detection? Or possibly a php issue.


  • Kopano

    Hey Chris,

    thanks for debugging this, your log gave some good indications what was going on. Disabling the FolderStat support I could also reproduce this with the Kopano backend.

    I’ve created a ticket and was able to fix the issue. The states were being confirmed which leads at a certain point to the deletion of the old state, but the folderdata (SyncParameters) are not saved/updated correctly.

    https://jira.z-hub.io/browse/ZP-1247

    The fix is already merged into develop. Could you test that?

    Cheers,
    Sebastian


  • Kopano

    There is also ZP-1248 which adds the windows mail app to the “long timeout device list”. It’s also in develop and will require to update your main config.



  • It’s looking great!!

    There were no problems syncing my test account (with nothing but one contact) and it’s now synced my regular personal email!

    Would you like me to capture the logs as before, for comparison?

    I’m going to go study the diff to see what you’ve changed, I’m keen to know more about how this works unde the hood!


  • Kopano

    No, no need for other logs at this point.
    Let me know if you need anything else.

    Cheers,
    Sebastian


Log in to reply
 

Looks like your connection to Kopano Community Forum was lost, please wait while we try to reconnect.