XWiki hangs intermittently

XWiki 11.2, docker distribution. Not very high load (total ~500 users). Maximum dozen of read requests per minute.
The issue is that from time to time the application hangs completely.
All I see is java process eating all available CPU resources. Logs do not indicate any errors, long-running processes, etc…

Not sure how to troubleshoot this. Could someone please point me to the right direction ?
Attaching threads dump, created with ‘kill -3 1’
threads.dump.txt (1.6 MB)

Hi, this is related to the notifications feature apparently. We fixed several bugs related to notification performances in 11.3, see https://www.xwiki.org/xwiki/bin/view/ReleaseNotes/Data/XWiki/11.3/#HPerformances

I suggest upgrading to 11.3 if you can and retesting.

Thanks and sorry for the problems.

Hi Vincent!

Thanks! Sorry, not sure why I could not find that in the list of known issue.
Will try upgrading ASAP!

If you mean you couldn’t find it in the Release notes for 11.3 then it’s because I’ve just added it (we were missing it) :wink:

11.3 seems to have fixed the issue. Thanks!

Awesome, thanks for posting back!

Yesterday night it hung again… 200% CPU - any connection times out.
Nothing in logs… Threads dump attached…
Any advise is very much appreciated
xwiki.txt (394.5 KB)

Hi @unadequate. I checked quickly the threads and they don’t seem to be blocked by XWiki but by Tomcat. You can see a lot of threads blocked by:

"http-nio-8080-exec-152" #763564 daemon prio=5 os_prio=0 tid=0x00007fc2d44ac800 nid=0x42a7 waiting for monitor entry [0x00007fc22306f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.register(AbstractProtocol.java:1020)
	- waiting to lock <0x00000006c0996be8> (a org.apache.coyote.AbstractProtocol$ConnectionHandler)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:797)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	- locked <0x000000078b1ce2e8> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

This could suggest some problem with your Tomcat setup or some bugs with the version of Tomcat you’re using.

@vmassol thank you for looking into it.
But I use official XWIki docker image, tomcat is part of it…

Sure but but we don’t code Tomcat (nor do we code Docker FWIW) :slight_smile: What I meant to say is that it seems the problem is coming from Tomcat and not from XWiki, at least I don’t see any hint about XWiki having a problem in your logs. There can be thousands of reasons why you’re having problems with Tomcat, including the version of Tomcat you use, the resources you give to it, the OS it runs on, how you configured it, etc. You could ask on the Tomcat forums to help you out too since they should know better than us here who are more XWiki experts than Tomcat experts.

Right now, I don’t have any clue for you about the problem except that a lot of threads are blocked by this lock.

Maybe someone else here will have a clue. Sorry for not helping more.

One idea would be to set up XWiki without the Docker image to rule out Docker too. Another would be to use Jetty to see if the problem is with Tomcat or not.

Maybe also try to give more memory to Tomcat.

Yes, I clearly understand that you are not responsible neither for Tomcat, nor for Docker.
But don’t think the issue is with Docker… Will research possible tomcat issues…

Appreciate your help as always =)

There are no “out of memory” errors logged whatsoever…
I’ve modified java opts to

XMX="-Xmx4096m -XX:PermSize=64m -XX:MaxPermSize=256m"

will see if that helps…

Yes I know, I have no clue about your problem :wink: Completely fishing here…

So, it hung multiple times during last 2 days. I wrote a script which is checking if xwiki is alive and if it is not - executes ‘kill -3’.
Looking into the most recent threads dump, I see that lots of threads are BLOCKED
- waiting to lock <0x00000006c1e88408> (a org.xwiki.classloader.NamespaceURLClassLoader)

And lock 0x00000006c1e88408 is held by the following thread:

"https://xwiki.domain.local/bin/get/XWiki/SuggestSolrService?query=fq%3Dtype%3ADOCUMENT%0Afq%3Dclass%3AXWiki.XWikiUsers%0Aqf%3Dproperty.XWiki.XWikiUsers.last_name%5E10%20property.XWiki.XWikiUsers.first_name%5E5%20name%5E2.5%20object.XWiki.XWikiUsers&nb=3&outputSyntax=plain&input=map%20job" #151066 daemon prio=5 os_prio=0 tid=0x00007f696c19f000 nid=0x5313 waiting for monitor entry [0x00007f6892344000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
	- locked <0x00000006c1e88408> (a org.xwiki.classloader.NamespaceURLClassLoader)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.xwiki.classloader.xwiki.internal.ContextNamespaceURLClassLoader.loadClass(ContextNamespaceURLClassLoader.java:176)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.commons.lang3.ClassUtils.getClass(ClassUtils.java:993)
	at org.apache.commons.lang3.ClassUtils.getClass(ClassUtils.java:1059)
	at org.apache.commons.lang3.ClassUtils.getClass(ClassUtils.java:1042)
	at org.apache.commons.configuration2.beanutils.BeanHelper.loadClass(BeanHelper.java:509)
	at org.apache.commons.configuration2.beanutils.BeanHelper.fetchBeanClass(BeanHelper.java:547)
	at org.apache.commons.configuration2.beanutils.BeanHelper.createBeanCreationContext(BeanHelper.java:612)
	at org.apache.commons.configuration2.beanutils.BeanHelper.createBean(BeanHelper.java:458)
	at org.apache.commons.configuration2.beanutils.BeanHelper.createBean(BeanHelper.java:481)
	at org.apache.commons.configuration2.beanutils.BeanHelper.createBean(BeanHelper.java:494)
	at org.apache.commons.configuration2.builder.BasicConfigurationBuilder.createResultInstance(BasicConfigurationBuilder.java:447)
	at org.apache.commons.configuration2.builder.BasicConfigurationBuilder.createResult(BasicConfigurationBuilder.java:417)
	at org.apache.commons.configuration2.builder.BasicConfigurationBuilder.getConfiguration(BasicConfigurationBuilder.java:285)
	- locked <0x00000007a38ee8a0> (a org.apache.commons.configuration2.builder.FileBasedConfigurationBuilder)
	at org.apache.commons.configuration2.builder.fluent.Configurations.properties(Configurations.java:300)
	at com.xpn.xwiki.internal.skin.EnvironmentSkin.getProperties(EnvironmentSkin.java:98)
	at com.xpn.xwiki.internal.skin.EnvironmentSkin.createParent(EnvironmentSkin.java:76)
	at com.xpn.xwiki.internal.skin.AbstractSkin.getParent(AbstractSkin.java:101)
	at com.xpn.xwiki.internal.skin.AbstractSkin.getParent(AbstractSkin.java:37)
	at com.xpn.xwiki.internal.skin.AbstractSkin.getResource(AbstractSkin.java:121)
	at com.xpn.xwiki.internal.template.InternalTemplateManager.getTemplate(InternalTemplateManager.java:939)
	at com.xpn.xwiki.internal.template.InternalTemplateManager.getTemplate(InternalTemplateManager.java:954)
	at com.xpn.xwiki.internal.template.InternalTemplateManager.renderFromSkin(InternalTemplateManager.java:720)
	at com.xpn.xwiki.internal.template.InternalTemplateManager.renderFromSkin(InternalTemplateManager.java:703)
	at com.xpn.xwiki.internal.template.InternalTemplateManager.render(InternalTemplateManager.java:689)
	at com.xpn.xwiki.internal.template.DefaultTemplateManager.render(DefaultTemplateManager.java:78)
	at com.xpn.xwiki.XWiki.evaluateTemplate(XWiki.java:2354)
	at com.xpn.xwiki.web.Utils.parseTemplate(Utils.java:179)
	at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:513)
	at com.xpn.xwiki.web.XWikiAction.execute(XWikiAction.java:215)
	at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:425)
	at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:228)
	at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1913)
	at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:449)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at com.xpn.xwiki.web.ActionFilter.doFilter(ActionFilter.java:112)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.xwiki.wysiwyg.filter.ConversionFilter.doFilter(ConversionFilter.java:109)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.xwiki.container.servlet.filters.internal.SetHTTPHeaderFilter.doFilter(SetHTTPHeaderFilter.java:63)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.xwiki.container.servlet.filters.internal.SavedRequestRestorerFilter.doFilter(SavedRequestRestorerFilter.java:208)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.xwiki.container.servlet.filters.internal.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.xwiki.resource.servlet.RoutingFilter.doFilter(RoutingFilter.java:132)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:660)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
	at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:679)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:798)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	- locked <0x00000007a356ff20> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

Could you please advice, how to properly interpret that ? does that mean that ClassLoader stucks for some reason ?
Sorry I’m not that good in java, my specialization is PHP =)

Usually a thread dump like this is the consequence of the whole JVM being super slow for some reason but hard to tell why. Could be the server itself being slow because of something else, just not powerful enough for the load, too many temporary memory created and the garbage collector having a hard time keeping track and blocking everything. You might get a bit more information using a tool like glowroot (https://glowroot.org/).

So I’m still struggling with XWiki hanging over and over again…
Situation became even worse - it usually hangs 3-4 times a day.
One time it was Java Out Of Memory Error (I had max heap size set to 2G).
It’s hard to imaging why so small installation like mine could require so many RAM.
But okay I’ve increased max heap size to 3Gigs… Last thread dumps show that there were hundreds if not thousands of threads like this:

"Thread-52233" #52297 daemon prio=5 os_prio=0 tid=0x00007f97f8188800 nid=0x4d9a runnable [0x00007f97543b4000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.net.SocketInputStream.read(SocketInputStream.java:224)
	at com.novell.ldap.asn1.ASN1Identifier.<init>(ASN1Identifier.java:129)
	at com.novell.ldap.Connection$ReaderThread.run(Connection.java:1269)
	at java.lang.Thread.run(Thread.java:748)

Seems to be LDAP authentication module?
Any clue why did it spawn to many threads. I guess the app hangs due to JVM eat all available RAM.

You should enable automatic memory dump in case of OOM, see https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts001.html ( HeapDumpOnOutOfMemoryError). It’s generally easy (at least for us :slight_smile:) to see what cause this from the memory dump.

Maybe https://jira.xwiki.org/browse/LDAP-6.

Hi Thomas,

Thanks for looking into this.
I feel that OOM might be a consequence of LDAP issue…
Unfortunately looks like no workaround atm…