Load balancing servers getting out of sync (v. 10.10.0.0)


Hi everyone 

We are having some issues with our load-balanced environment, where our two EPiServer instances (v. 10.10.0.0) don't always seem to agree on what to display.

/EPiServer can be accessed on both servers, so which server our content editors make their changes on is effectively random. For 99.99% of the changes this goes well. I am assuming that the provider-based event management system in EPiServer makes sure the two servers keep each other informed about the "truth".

Sometimes, however, we have problems such as:

  • an image seems to exist on one server and gives a 404 error on the other (we use a file share for blobs, so there is only one physical location of the image)
  • a page results in a 404 error on one server but not on the other server
  • a text is different on one server
  • jobs that create content in EPiServer based on an external source sometimes create the content twice (jobs were running on a random server, but have now been changed to always run on server 1 - this seems to have fixed this particular issue)

Stuff like this happens every other day, and we have been unable to fix it.

Also, it is extremely hard to debug. We would like to take steps to force all access to /EPiServer to only take place on server 1, but our DevOps reports that in our network setup, this is not as easy as it sounds. Also, we are not sure it would solve anything.
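To at least make a disagreement visible when it happens, a small script can request the same paths from each server directly (bypassing the load balancer) and report where the answers differ. The following is only a rough sketch: the function names, and any server IPs, host name, and paths passed to it, are hypothetical and not part of EPiServer, and it assumes each server answers plain HTTP on its own IP when given the site's Host header.

```python
import hashlib
import urllib.error
import urllib.request


def fingerprint(status, body):
    """Reduce a response to (status code, SHA-256 hex digest of the body)."""
    return status, hashlib.sha256(body).hexdigest()


def fetch(server_ip, host, path, timeout=10):
    """Request `path` from one specific server, bypassing the load balancer."""
    req = urllib.request.Request(
        f"http://{server_ip}{path}", headers={"Host": host}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return fingerprint(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # A 404 on one server only is one of the symptoms described above.
        return fingerprint(err.code, err.read())


def find_mismatches(results):
    """results maps path -> {server name: fingerprint}.

    Returns only the paths where the servers gave different answers.
    """
    return {
        path: per_server
        for path, per_server in results.items()
        if len(set(per_server.values())) > 1
    }


def check(servers, host, paths):
    """servers maps a (hypothetical) server name -> IP address."""
    results = {
        path: {name: fetch(ip, host, path) for name, ip in servers.items()}
        for path in paths
    }
    return find_mismatches(results)
```

Calling `check` with the two server IPs, the public host name, and a list of recently edited pages and images returns the paths on which the servers disagree. Pages that render per-request content (timestamps, tokens) will differ in body hash even when the servers are in sync, so for those it may be better to compare status codes only.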

Are there steps we can take to make sure our servers are always in sync?

Can we force a sync in some other way than recycling the server that seems to disagree with the truth?

Any feedback would be greatly appreciated.

Kind regards
Dennis

#180673
Edited, Jul 20, 2017 14:17

Have you tried using the DeveloperTools? It contains some tools for remote events, so you can check whether there is any misconfiguration. I haven't used a setup with a load balancer in a long time, but misconfiguration was usually the cause of issues like this. There's also a NuGet package.

#180674
Jul 20, 2017 15:22

Thank you @Jeroen, I will take a look at the tool.

This could very well be misconfiguration, but since it works 99.99% of the time when we make changes, I don't think we completely missed something in the configuration. The servers are communicating; it is just that once in a while it doesn't seem to work.

#180675
Jul 20, 2017 15:32

Have you checked that the clocks on all servers are synced?
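One rough way to check this without logging on to each server is to compare the `Date` header each server returns. The sketch below is an assumption-laden illustration, not an EPiServer feature: the function names are made up, and it assumes each server can be reached directly by IP over plain HTTP with the site's Host header.

```python
import urllib.error
import urllib.request
from email.utils import parsedate_to_datetime


def read_date_header(server_ip, host, timeout=10):
    """Return the HTTP Date header reported by one specific server."""
    req = urllib.request.Request(
        f"http://{server_ip}/", headers={"Host": host}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.headers["Date"]
    except urllib.error.HTTPError as err:
        # Error responses carry a Date header as well.
        return err.headers["Date"]


def max_clock_skew_seconds(date_headers):
    """date_headers maps server name -> HTTP Date header value.

    Returns the largest gap, in seconds, between any two servers.
    """
    times = [parsedate_to_datetime(value) for value in date_headers.values()]
    return (max(times) - min(times)).total_seconds()
```

The `Date` header only has one-second resolution and the requests are not sent at exactly the same moment, so this will only reveal drift of several seconds or more.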

I used to have a simple "secret" page deployed on all servers that provides the facility to clear the cache on that server. So if a server is not showing the up-to-date content, I browse to the page using the server's IP address and clear the cache. This will, however, affect performance and should only be used in rare cases like this :-)

Btw, could you mention which version of Episerver you are on?

#180676
Jul 20, 2017 16:09

@Thair, that is not a bad idea, and it may be something we have to resort to. It is slightly more efficient than recycling the web server.

And of course - we are on version 10.10.0.0. We have had the problem for a while now, and have been keeping up with upgrades on a monthly basis. 

#180677
Jul 20, 2017 16:16

Can anyone provide some more information about the reliability of using the default UDP multicast for load balancing? What are its benefits over the TCP approach?

#180699
Jul 21, 2017 13:19

It seems that you need to look at DRBD, glusterfs, and similar products for real-time replication of your files.

#181733
Aug 29, 2017 9:44