I'm running into a nasty problem in a client-server application. I don't have a solution yet, but I do have some ideas, and I'm looking for opinions to help find the right one.
The application is a card/board gaming platform with many additional functions for chat, management, and so on.
The client is healthy and works as a user interface for communicating with the server and other clients. A typical client, nothing special.
As you can see in the chart, the server is divided into parts. Listeners are essentially infinite loops awaiting connection requests. You can pick the channel you want to connect to and log in.
Channels have a maximum limit of 100 clients; a channel is where all the chatting and gaming happens. Clients can open new rooms and get to the gaming part. They can publicly chat with everyone in the channel, and they can perform management functions (kick, ban, etc.) if they are authorized. Each channel is a separate, independent executable; channels can't reach each other directly, although on some particular occasions they can reach each other indirectly via sockets.
Each channel has a "Client" object to represent a user and a "ClientCollection" to hold all of them. Since each client object runs on its own thread, I use SyncLock on the collection to avoid inconsistency.
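For readers unfamiliar with the pattern: the idea is that every mutation of the shared collection happens inside a lock, and iteration for broadcasting works on a copy taken under that lock. The original code is VB.NET (SyncLock); the sketch below is a Python stand-in with illustrative names, not the actual implementation.

```python
import threading

class ClientCollection:
    """Thread-safe holder for connected clients.

    Python sketch of the VB.NET SyncLock pattern described above;
    class and method names are hypothetical.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._clients = {}

    def add(self, client_id, client):
        with self._lock:  # equivalent of SyncLock ... End SyncLock
            self._clients[client_id] = client

    def remove(self, client_id):
        with self._lock:
            self._clients.pop(client_id, None)

    def snapshot(self):
        # Copy under the lock so iteration (e.g. broadcasting a chat
        # message) happens outside the critical section and does not
        # block add/remove for the duration of 80 socket writes.
        with self._lock:
            return list(self._clients.values())
```

One design note: if the broadcasting code iterates the collection while holding the lock, every connect/disconnect has to wait for all the sends to finish, which is a classic source of channel-wide stalls.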
The main problem is this:
When the channel reaches a certain number of online users (the limit is 100, but trouble starts at around 75-80 clients), things get interestingly unstable. For instance, when you type a single sentence in the chatbox, it can take a minute to be distributed to the other 80 people (instead of arriving instantly). I monitor the server's CPU usage; it certainly increases, but nothing unusual. Normally, a single channel process uses a minimum of 0%-4%. With 75-80 users it goes up to 5%-10%, which looks acceptable. It feels like a temporary lag: whatever is causing it can lag everything for a couple of minutes (it even disconnects users), and then things go back to normal. Somehow the channel freezes and runs out of breath, coughs, catches its breath, and keeps running. Sometimes we are even forced to restart individual channels.
When we have fewer users online, everything runs smoothly.
At this point, since a channel is unstable only when it is crowded, I tried to observe the differences between a crowded environment and an underpopulated one. I logged incoming and outgoing data so I could identify and eliminate the most repetitive messages, hoping that reducing traffic would bring some performance increase.
My theory is that this instability occurs more often when an action affects the whole channel, like chatting. Whenever you type something, if there are 80 clients inside, it has to be sent to the 79 others (each task is handled by a single thread, not a thread per client). This is my main suspect: it's a lot of distribution work for the server when 10-15 users are chatting in public at the same time. Currently I distribute all these messages with a single thread per task (new user connected, someone typed something, someone left the channel). Maybe another approach, such as TaskFactory, could solve this. I wonder if it makes sense to gather the required data and handle the distribution using TaskFactory. Would it solve my problem, what do you think?
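The concern above (one thread sequentially writing to 79 sockets, where a single slow client stalls everyone behind it) can also be addressed with a per-client outgoing queue: broadcasting then just enqueues, and each client's writes happen on its own worker. This is one alternative to TaskFactory, not the asker's actual code; the Python sketch below uses a plain callback as a stand-in for the real socket write, and all names are illustrative.

```python
import queue
import threading

class ClientSender:
    """Per-client outgoing queue with its own writer thread, so one slow
    socket cannot stall a channel-wide broadcast.

    `send` is a stand-in for the real socket write (hypothetical)."""

    def __init__(self, send):
        self._q = queue.Queue()
        self._send = send
        self._t = threading.Thread(target=self._run, daemon=True)
        self._t.start()

    def enqueue(self, msg):
        self._q.put(msg)

    def _run(self):
        while True:
            msg = self._q.get()
            if msg is None:  # sentinel: shut down this sender
                break
            self._send(msg)

    def close(self):
        self._q.put(None)
        self._t.join()

def broadcast(senders, msg):
    # Enqueueing is cheap and non-blocking; the actual (potentially slow)
    # writes happen concurrently on the per-client threads.
    for s in senders:
        s.enqueue(msg)
```

A bounded queue (with a drop-or-disconnect policy when it fills) would additionally protect the server from a client that stops reading entirely. In .NET, the same shape can be built with `Task`-based workers or `BlockingCollection` per client.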
.NET Client-Server application gets frozen, need ideas to fix!
We addressed a similar issue before, when we first started this project. We were advised to set the socket's NoDelay property to true. Maybe this problem, too, can be solved with a single smart trick.
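For context on that earlier trick: NoDelay disables Nagle's algorithm, so small packets (like individual chat messages) are sent immediately instead of being buffered and coalesced. In .NET this is `Socket.NoDelay = True`; the equivalent at the BSD-socket level, shown here in Python as an illustration, is the `TCP_NODELAY` option.

```python
import socket

# Disable Nagle's algorithm so small writes (chat messages) go out
# immediately instead of waiting to be coalesced with later data.
# Equivalent to setting Socket.NoDelay = True in .NET.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

Note that NoDelay helps per-message latency, but it would not by itself explain a whole channel stalling for minutes; that points more toward blocking sends or lock contention.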