z-push 2.4.5 sync issue - 0xFFFFFFFF8004010F



  • @weini
    I can second your opinion about the Debian Kopano packages. No idea how they made it into the stable distribution in that poor condition - they should be removed asap.

    MySQL: I already had problems with MariaDB and inboxes with only about 1K items.



  • @bcoms because now, they can use buster-backports.
    If kopano was not in buster, no backport could be made and push into buster.



  • I’m also on Buster with Debian Kopano 8.7.0-3 and switched from MariaDB 10.3 to MySQL 5.7. It works fine since last wednesday now. I will file a bug report soon at debians bug reporting tool.

    I also dont know how these packages got into Debian Stable. I think I hold 90% of the bugs for kopano at debian …
    Kopano-Search has lots of apparmor issues
    Kopano-Server is missing config files (ldap.cfg and admin.cfg) also user creation, deletion, updating is not working with current config on ldap

    I also dont want to blame the Debian developers. They are very helpful, but it is just annoying to run from one issue into another.


  • Kopano

    @weini said in z-push 2.4.5 sync issue - 0xFFFFFFFF8004010F:

    But I now have two db schemas available, one working and one triggering the issue. Pls let me know how I may further supprt you debugging the issue.

    Sure, its worth a try.



  • @fbartels I have a debug log of this problem (with the ICS and SQL backend logs enabled). Something struck me as odd while comparing the logs with the code.

    I do see a log line with “MatchRestrictions: matching 1000 rows”, but I don’t see the line with “MatchRestrictions: %zu match(es) out of %d rows (%d properties)” that is logged a little later in that function. So somewhere in between we hit an error condition.

    I’m currently rebuilding kopanocore with some extra debugging and might have some additional information soon.



  • @fbartels The function that fails is the call to gcache->GetObject() in ECGetContentChangesHelper::MatchRestrictions()

    This is with 8.7.0, btw.



  • @fbartels Some more info: I see calls to MatchRestrictions() that succeed. In these cases I get the “matching 43 rows” log, then the loop that iters over index_objs runs with the same number of iterations as the number of rows, then finally I get a log saying "18 match(es) out of 43 rows (1 properties).

    Every time Z-push logs the sync issue, the log shows “matching 1000 rows”, the loop that iters over index_objs doesn’t run any iterations at all, and when gcache->GetObject() is called, ulObjId is 0.

    This is about as far as I’ll get, since I’m not a C++ programmer. I hope this is useful to you.
    I’d be happy to test any patches; I can rebuild kopanocore myself.

    Best regards, Roel



  • @fbartels I tried to get a bit further. Running with the cache debug log provided some additional info. Whenever the amount of rows is less than 1000, they can be retrieved from the cache:

    MatchRestrictions: matching 940 rows
    cache: Get object ids from props total ids 940 from disk 940
    ...
    MatchRestrictions: 28 match(es) out of 940 rows (1 properties)
    

    Whenever the number of rows is 1000, they cannot:

    MatchRestrictions: matching 1000 rows
    cache: Get objects ids warning 1000 objects not found
    cache: Get object ids from props total ids 1000 from disk 1000
    cache: Get object id 0 error 0x80000002
    


  • @rolek said in z-push 2.4.5 sync issue - 0xFFFFFFFF8004010F:

    @fbartels The function that fails is the call to gcache->GetObject() in ECGetContentChangesHelper::MatchRestrictions()

    This is with 8.7.0, btw.

    8.7.5 is current. The cache fixes (so far) were done months ago in 8.7.1.



  • Rebuilding kopano-libs with 500 instead of 1000 items in provider/libserver/ECICS.cpp, function getchanges_contents() does seem to solve the problem. That got me thinking: maybe we are hitting the max_allowed_packet size? The default for MariaDB is 16MB, the default for MySQL 8 is 64MB.



  • I can confirm that the problem still exists (with mail, calendar AND contacts) with z-push (2.4.5 and 2.5.1) and kopano-core 8.7.83.106.59c4d1516. If I interpret the thread correctly, the general theory is that it’s a database issue.

    My personal setup:

    • Debian 10
    • Apache 2.4
    • php7.3-fpm
    • MariaDB 10.3
    • Official z-push 2.5.1
    • Official kopanocore 8.7.83.106.59c4d1516

    I have not found a solution using the current software. Existing syncs work fine, new ones look like they’re choking but don’t know why. Topic 1678 is about the same issue and suggests disabling the enhanced ics, but this does not fix the issue for me. This probably works if the issue is only on the calendar. (Tried it anyway)

    Replacing the database software is a BIG step that I’m not willing to take … yet. I did test a full mysqldump and restore of the database. Maybe migrating from mariadb to mysql solved the problem because of the dump/restore…but nope, that’s not it.

    I’ve tested the max_allowed_packet setting suggested by rolek and increased it to the mysql8 default of 64M but this also does not solve the issue:

    10/10/2019 12:06:01 [28264] [WARN] [user] StatusException: ExportChangesICS->InitializeExporter(): Error, mapi_exportchanges_config() failed: 0xFFFFFFFF8004010F - code: 12 - file: /usr/share/z-push/backend/kopano/exporter.php:230
    

    Maybe there’s more size settings that we need to tweak.



  • I can report that the patch from https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=939751;filename=fix-zpush-sync-error.patch;msg=15 fixes the problem for us. It would be nice if somebody else could confirm this.

    Regards, Roel



  • If you think you hit max_allowed_packet_size, you want to test that theory:

    1. there should be log messages by k-server (possibly with log_level=0x20006 see manpage for the value)
    2. lower max_allowed_packet_size and observe even more queries being rejected
    3. conversely, raise max_allowed_packet and observe z-push succeeding(?)

    Moreover, if you raise the ICS value (in that patch) from 1000 to 500, you should be observing a change in SQL query lengths - and compare with max_allowed_packet_size too.



  • Finally found a solution. After looking for hours at the increased log level suggested by jengelh and messing with a lot of settings without much luck (even having to restore the whole DB at some point). It looks like my error was related to a corruption in a mailbox. By lowering the SYNC_MAX_ITEMS, in the z-push config, all the way down to 5, z-push started syncing that account on the three new devices (already connected devices were working fine). Maybe running into a corruption stops syncing of the whole batch, and if that batch contains part of the hierarchy, it never manages to set it up.

    If that is the case, it can be “fixed” by adding this situation to the loopdetection. It would make z-push more robust when dealing with these errors in a backend.

    Anyway, to permanently fix this problem for me:

    • kopano-backup user
    • kopano-storeadm -D -n user
    • kopano-storeadm -C -n user
    • kopano-backup --restore user

    This did take the better part of a day.

    Makes me wonder though, what changed in the last half year or so that caused this corruption/errors, or made z-push/kopano more sensitive? Connecting new devices worked fine a few months ago.

    P.S. btrfs filesystem snapshots are a life saver. :-)



  • @jengelh Jan, I tried increasing the MariaDB value for max_allowed_packet to 64MB, but that didn’t fix the issue. (This was a few days ago). So I don’t think that’s the cause.



  • That sounds familiar. Synchronization streams will currently abort on error (e.g. attachment strangely missing from database).



  • I had the same error synchronizing a folder in my android phone that had many and messages with big attachments. I store all attachments in the database to make backup easier (just do a mysqldump).

    I use Debian 10 and increasing max_allowed_packet for mariadb didn’t work.

    I used @vincentvana suggestion (backup of user + deleting / recreating store + restore user) and it worked for me. At least for now and I didn’t dare to resync the big folder - I may try this on the weekend, when I have more time to fix problems.


Log in to reply