Very long timeout in ehcache when pushing synchronous update over RMI to server that is down

65 views Asked by Raj At 03 October 2023 at 13:07

We have an ehcache cluster that distributes a cache to several Wildfly servers over RMI, using manual peer discovery. We replicate the cache synchronously for historical reasons. If one server in the pool is unreachable, e.g. because the box hangs or the network goes down, ehcache seems to wait for a very long timeout before giving up (about 128 seconds, timed with a stopwatch). If only the Wildfly server itself is down, there is no delay in updating the replicated cache.

We have the socketTimeoutMillis property set to 5000 on every node in the cluster in ehcache.xml, but it doesn't seem to help:

    <cacheManagerPeerProviderFactory class=
                          "net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
                          properties="peerDiscovery=manual,
                          rmiUrls=server1:41234|server2:41234"
                          propertySeparator="," />

    <cacheManagerPeerListenerFactory
            class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
            properties="port=41234,socketTimeoutMillis=5000"/>

...

    <cache name="sso.userDetails"
           maxElementsInMemory="20000"
           eternal="true"
           timeToIdleSeconds="0"
           timeToLiveSeconds="0"
           overflowToDisk="false">
        <cacheEventListenerFactory
                class="MISCore.sso.ZSCMonitorCacheEventListenerFactory"
                properties="replicateAsynchronously=true,
                            replicatePuts=true,
                            replicateUpdates=true,
                            replicateUpdatesViaCopy=true,
                            replicateRemovals=true,
                            asynchronousReplicationIntervalMillis=1000"
                propertySeparator=","/>
        <cacheEventListenerFactory
                class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
                properties="replicateAsynchronously=false"/>
        <bootstrapCacheLoaderFactory
                class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"/>
    </cache>

The (partial) exception I see is:

2023-10-03 13:22:42,475 DEBUG [net.sf.ehcache.distribution.ManualRMICacheManagerPeerProvider] (default task-1) Looking up rmiUrl //server1:41234/sso.userDetails through exception Connection refused to host: server1; nested exception is: 
    java.net.ConnectException: Connection timed out (Connection timed out). This may be normal if a node has gone offline. Or it may indicate network connectivity difficulties: java.rmi.ConnectException: Connection refused to host: server1; nested exception is: 
    java.net.ConnectException: Connection timed out (Connection timed out)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:623)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:343)
    at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
    at java.rmi.Naming.lookup(Naming.java:101)
    at net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(RMICacheManagerPeerProvider.java:127)
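The trace fails inside TCPEndpoint.newSocket, i.e. while the TCP connection is still being opened. To take ehcache and RMI out of the picture, a minimal sketch like the one below could be used to time a plain TCP connect to the same host and port, once with no explicit timeout (so the operating system default applies) and once with a 5000 ms connect timeout. The class name ConnectTimer and its helper names are made up for illustration; server1 and 41234 are taken from the config above. Whether the 128 seconds really comes from the OS-level connect timeout is an assumption this test is meant to check, not something the log confirms.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: times a plain TCP connect, independent of ehcache and RMI. */
public class ConnectTimer {

    /** Outcome of one connect attempt. */
    public static final class Result {
        public final boolean connected;
        public final long elapsedMillis;
        public final String error; // null when the connect succeeded

        Result(boolean connected, long elapsedMillis, String error) {
            this.connected = connected;
            this.elapsedMillis = elapsedMillis;
            this.error = error;
        }
    }

    /**
     * Attempts a TCP connect to host:port and reports how long it took.
     * A timeoutMillis of 0 sets no explicit timeout, so the OS default applies.
     */
    public static Result timeConnect(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return new Result(true, elapsedMillis(start), null);
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return new Result(false, elapsedMillis(start), e.toString());
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        // Defaults mirror the rmiUrls entry in ehcache.xml; override on the command line.
        String host = args.length > 0 ? args[0] : "server1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 41234;

        for (int timeout : new int[] {0, 5000}) {
            Result r = timeConnect(host, port, timeout);
            System.out.printf("timeout=%d ms connected=%b elapsed=%d ms error=%s%n",
                    timeout, r.connected, r.elapsedMillis, r.error);
        }
    }
}
```

If the run without an explicit timeout also takes roughly two minutes against the unreachable box while the 5000 ms run returns after about five seconds, that would suggest the delay comes from a connect made without any timeout rather than from the replication call itself.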

Any suggestions are welcome.
