kopano server crash

Hi,

my kopano-server 8.6.81 crashes several times a week with the following last log entries:

Oct 10 10:20:14 charlieb kopano-server[6656]: setSyncStatus(): collision
Oct 10 10:20:58 charlieb kopano-server[6656]: message repeated 7 times: [ setSyncStatus(): collision]
Oct 10 10:21:53 charlieb kopano-server[6656]: Replacing notification request for ID 5597770132109941427
Oct 10 10:25:28 charlieb kopano-server[6656]: Replacing notification request for ID 5597770132109941427
Oct 10 10:26:35 charlieb kopano-server[6656]: setSyncStatus(): collision
Oct 10 10:28:10 charlieb kopano-server[6656]: LDAP search error: Timed out. Will unbind, reconnect and retry.
Oct 10 10:29:11 charlieb kopano-server[6656]: message repeated 2 times: [ LDAP search error: Timed out. Will unbind, reconnect and retry.]
Oct 10 10:30:40 charlieb kopano-server[6656]: Failed to read data: timeout (80000026)
Oct 10 10:31:21 charlieb kopano-server[6656]: Session failed to shut down: skipping clean

That is a little bit strange: all other services which use that ldap system didn’t have any problems, slapd itself seems to run correctly.

Any idea what’s the reason for this?

Hi @Steffen ,

never seen ldap request time out, if the ldap on the other side is still responding. Maybe when you enable the user backend logging, this could give further insights. Also your ldap.cfg would be interesting. Maybe you are using a dns name that not always resolves properly?

from man kopano-server

DIAGNOSTICS
       If you run into problems, check the log for any errors. If you made a mistake in the configuration of the log method, this will be reported
       on standard error. You can also restart the server with a higher log level. Also, before starting the server, always make sure the database
       server is running at the right location and no other server is listening on the configured TCP port.

       For extended diagnostics, there are special extended log options available for enhanced debugging capabilities. The parameter log_level has
       special or-ed values which can be set to investigate different modules within the server process:

       SQL: 0x00010000, User backend: 0x00020000, Server cache: 0x00040000, SOAP: 0x00100000, ICS: 0x00200000

       For example, if you are using LDAP as the user plugin, you can set the log_level to 0x00020006 for extended LDAP logging (the last digit 6
       enables extended verbose logging). To enable SQL and LDAP logging at the same time, set log_level to 0x00030006

       WARNING: The log options create huge amounts of log entries in production environments, this results in abnormal large logfiles which can
       fill up available disk space very fast. Only use this with extreme caution.

Hi Felix.

I have set the log_level and will post the error messages the next time that happens . For now my ldap.cfg:

!include /usr/share/kopano/ldap.openldap.cfg

####################
# Verbindungseinstellungen
ldap_uri = ldap://localhost
ldap_starttls = yes
ldap_bind_user = ...
ldap_bind_passwd = ...
ldap_search_base = ou=kopano,...


####################
# Überschriebene Werte aus ldap.openldap.cfg
ldap_company_search_filter = (objectClass=kopano-company)

ldap_user_type_attribute_value = kopano-user
ldap_user_search_filter = (|(kopanoAccount=1)(objectClass=kopano-contact))
ldap_user_unique_attribute = mail
ldap_fullname_attribute = displayName
ldap_loginname_attribute = mail

ldap_group_type_attribute_value = kopano-group
ldap_group_search_filter = (objectClass=kopano-group)
ldap_group_unique_attribute = entryUUID
ldap_groupmembers_attribute = member
ldap_groupmembers_attribute_type = dn
ldap_groupmembers_relation_attribute =

Something new which I found this afternoon after the last crash: in syslog there are some oom-killer messages.

This is a testing environment which is slowly filled. So the cache settings in the server.cfg are still on there default values. If I look in top I see that the kopano-server uses 12.7G of the system ram. But why?

Looking in “kopano-stats --system” shows the messages bellow. But why is tc_allocated so big? And why is pt_allocated a negative value?

	0x3001001E		0x6740001E									0x6741001E	
	--------------------------------------------------------------------------------------------------------------------------------------------------
	server_start_date	Time when the server was started						Thu Oct 11 20:03:28 2018	
	cache_purge_date	Time when the cache was cleared								
	config_reload_date	Time when the configuration file was reloaded / logrotation (SIGHUP)			
	connections		Number of handled incoming connections						554299	
	max_socket		Highest socket number used							1047	
	redirections		Number of redirected requests							0	
	soap_request		Number of soap requests handled by server					2780927	
	response_time		Response time of soap requests handled in milliseconds (includes time in queue)	5919572	
	processing_time		Time taken to process soap requests in milliseconds (wallclock time)		5880438	
	searchfld_loaded	Total number of searchfolders							9	
	searchfld_threads	Current number of running searchfolder threads					0	
	searchupd_retry		The number of times a search folder update was restarted			0	
	searchupd_fail		The number of failed search folder updates after retrying			0	
	sql_connect		Number of connections made to SQL server					553784	
	sql_select		Number of SQL Select commands executed						8362530	
	sql_insert		Number of SQL Insert commands executed						374	
	sql_update		Number of SQL Update commands executed						572	
	sql_delete		Number of SQL Delete commands executed						493	
	sql_connect_fail	Number of failed connections made to SQL server					0	
	sql_select_fail		Number of failed SQL Select commands						0	
	sql_insert_fail		Number of failed SQL Insert commands						0	
	sql_update_fail		Number of failed SQL Update commands						0	
	sql_delete_fail		Number of failed SQL Delete commands						0	
	sql_last_fail_time	Timestamp of last failed SQL command							
	mwops			MAPI Write Operations								10	
	mrops			MAPI Read Operations								1109624	
	deferred_fetches	Number rows retrieved via deferred write table					1127	
	deferred_merges		Number of merges applied to the deferred write table				0	
	deferred_records	Number records merged in the deferred write table				0	
	row_reads		Number of table rows read in row order						1127	
	counter_resyncs		Number of time a counter resync was required					0	
	login_password		Number of logins through password authentication				331	
	login_ssl		Number of logins through SSL certificate authentication				0	
	login_sso		Number of logins through Single Sign-on						0	
	login_unix		Number of logins through Unix socket						553970	
	login_failed		Number of failed logins								0	
	sessions_created	Number of created sessions							554301	
	sessions_deleted	Number of deleted sessions							554256	
	sessions_timeout	Number of timed-out sessions							26	
	sess_int_created	Number of created internal sessions						18	
	sess_int_deleted	Number of deleted internal sessions						18	
	sess_grp_created	Number of created sessiongroups							251	
	sess_grp_deleted	Number of deleted sessiongroups							245	
	ldap_connect		Number of connections made to LDAP server					339	
	ldap_reconnect		Number of re-connections made to LDAP server					0	
	ldap_connect_fail	Number of failed connections made to LDAP server				0	
	ldap_connect_time	Total duration of connections made to LDAP server				1957938	
	ldap_max_connect	Longest connection time made to LDAP server					24723	
	ldap_auth		Number of LDAP authentications							331	
	ldap_auth_fail		Number of failed authentications						0	
	ldap_auth_time		Total authentication time							2153338	
	ldap_max_auth		Longest duration of authentication made to LDAP server				11113	
	ldap_avg_auth		Average duration of authentication made to LDAP server				6521	
	ldap_search		Number of searches made to LDAP server						1602	
	ldap_search_fail	Number of failed searches made to LDAP server					0	
	ldap_search_time	Total duration of LDAP searches							492986	
	ldap_max_search		Longest duration of LDAP search							3126	
	index_search_errors	Number of failed indexer queries						0	
	index_search_max	Maximum duration of an indexed search query					0	
	index_search_avg	Average duration of an indexed search query					0	
	search_indexed		Number of indexed searches performed						142	
	search_database		Number of database searches performed						0	
	machine_id												7576ff4f2a334486a71d4a64dad38ec8	
	server_guid												8F9B04C5E38F4A30A7EB229E85146D88	
	cache_acl_items		Cache acl items									18	
	cache_acl_size		Cache acl size									1752	
	cache_acl_maxsz		Cache acl maximum size								1048576	
	cache_acl_req		Cache acl requests								116	
	cache_acl_hit		Cache acl hits									98	
	cache_quota_items	Cache quota items								2	
	cache_quota_size	Cache quota size								208	
	cache_quota_maxsz	Cache quota maximum size							1048576	
	cache_quota_req		Cache quota requests								351	
	cache_quota_hit		Cache quota hits								200	
	cache_uquota_items	Cache uquota items								1	
	cache_uquota_size	Cache uquota size								104	
	cache_uquota_maxsz	Cache uquota maximum size							1048576	
	cache_uquota_req	Cache uquota requests								277	
	cache_uquota_hit	Cache uquota hits								203	
	cache_extern_items	Cache extern items								6	
	cache_extern_size	Cache extern size								978	
	cache_extern_maxsz	Cache extern maximum size							26214400	
	cache_extern_req	Cache extern requests								5313	
	cache_extern_hit	Cache extern hits								4351	
	cache_userid_items	Cache userid items								14	
	cache_userid_size	Cache userid size								2128	
	cache_userid_maxsz	Cache userid maximum size							26214400	
	cache_userid_req	Cache userid requests								9757	
	cache_userid_hit	Cache userid hits								9648	
	cache_abinfo_items	Cache abinfo items								14	
	cache_abinfo_size	Cache abinfo size								2576	
	cache_abinfo_maxsz	Cache abinfo maximum size							26214400	
	cache_abinfo_req	Cache abinfo requests								6260	
	cache_abinfo_hit	Cache abinfo hits								6246	
	cache_server_items	Cache server items								0	
	cache_server_size	Cache server size								0	
	cache_server_maxsz	Cache server maximum size							1048576	
	cache_server_req	Cache server requests								0	
	cache_server_hit	Cache server hits								0	
	cache_store_items	Cache store items								10896	
	cache_store_size	Cache store size								1046016	
	cache_store_maxsz	Cache store maximum size							1048576	
	cache_store_req		Cache store requests								7231780	
	cache_store_hit		Cache store hits								6675979	
	cache_obj_items		Cache obj items									39920	
	cache_obj_size		Cache obj size									3512960	
	cache_obj_maxsz		Cache obj maximum size								16777216	
	cache_obj_req		Cache obj requests								7797502	
	cache_obj_hit		Cache obj hits									7757687	
	cache_cell_items	Cache cell items								84252	
	cache_cell_size		Cache cell size									881005712	
	cache_cell_maxsz	Cache cell maximum size								4186265600	
	cache_cell_req		Cache cell requests								35169527	
	cache_cell_hit		Cache cell hits									35083231	
	cache_index1_items	Cache index1 items								79652	
	cache_index1_size	Cache index1 size								10434830	
	cache_index1_maxsz	Cache index1 maximum size							33554432	
	cache_index1_req	Cache index1 requests								3409669	
	cache_index1_hit	Cache index1 hits								3290223	
	cache_index2_items	Cache index2 items								79652	
	cache_index2_size	Cache index2 size								9797614	
	cache_index2_maxsz	Cache index2 maximum size							33554432	
	cache_index2_req	Cache index2 requests								5635821	
	cache_index2_hit	Cache index2 hits								4957909	
	sessions		Number of sessions								19	
	sessions_size		Memory usage of sessions							1368	
	sessiongroups		Number of session groups							6	
	sessiongroups_size	Memory usage of session groups							4984	
	persist_conn		Persistent connections								0	
	persist_conn_size	Memory usage of persistent connections						0	
	persist_sess		Persistent sessions								0	
	persist_sess_size	Memory usage of persistent sessions						0	
	tables_subscr		Tables subscribed								1	
	tables_subscr_size	Memory usage of subscribed tables						72	
	object_subscr		Objects subscribed								1	
	object_subscr_size	Memory usage of subscribed objects						64	
	searchfld_stores	Number of stores in use by search folders					5	
	searchfld_folders	Number of folders in use by search folders					9	
	searchfld_events	Number of events waiting for searchfolder updates				0	
	searchfld_size		Memory usage of search folders							8135	
	queuelen		Current queue length								0	
	queueage		Age of the front queue item							0.000	
	threads			Number of threads running to process items					0	
	threads_idle		Number of idle threads								8	
	usercnt_active		Number of active users								4	
	usercnt_nonactive	Number of total non-active objects						0	
	usercnt_na_user		Number of non-active users							0	
	usercnt_room		Number of rooms									0	
	usercnt_equipment	Number of equipment								0	
	usercnt_contact		Number of contacts								7	
	tc_allocated		Current allocated memory by TCMalloc						12301745216	
	tc_reserved		Bytes of system memory reserved by TCMalloc					12710084608	
	tc_page_map_free	Number of bytes in free, mapped pages in page heap				0	
	tc_page_unmap_free	Number of bytes in free, unmapped pages in page heap (released to OS)		185679872	
	tc_threadcache_max	A limit to how much memory TCMalloc dedicates for small objects			33554432	
	tc_threadcache_cur	Current allocated memory in bytes for thread cache				24689088	
	pt_allocated		Current allocated memory by libc ptmalloc, in bytes				-583155312	
	userplugin		User plugin used								ldap	

@steffen said in kopano server crash:

oom-killer messages

This has previously been discussed in https://forum.kopano.io/topic/1705/kopano-server-crashes