Making Fast-Paced Multiplayer Networked Games is Hard

markmnl

Sep 1, 2015

CPOL

Overview of making fast-paced multiplayer networked games - the challenges and techniques to overcome them.

Introduction

Fast-paced multiplayer networked games over the Internet are difficult to develop to a standard where the experience is still fun. Nonetheless, fast-paced multiplayer games like Counter-Strike, Unreal Tournament and Quake have defined a generation. Where are all the Indie games in this category? I am not talking about an AAA-like graphics bonanza, just a fast-paced game I can play with my friends over the Internet. Some of my favourite childhood memories are of LAN parties, blasting my friends with a rocket launcher, and that was 1999 on a 10Mb/s hub. Now we have prevalent broadband Internet. There are some Indie games: Soldat is one, and others I can think of off the top of my head, such as Open Arena, are built on existing engines, but what standalone Indie titles are there? I suspect one reason there are not so many is the difficulty in developing a fast-paced multiplayer networked game.

I am a network programmer on Square Heroes – a fast-paced multiplayer arena shooter. So how did we (Gnomic Studios) do it? In this post, I cover the networking side of things. I try to stay at the game logic level, covering the problems with fast-paced multiplayer networked games over the Internet and some techniques to overcome them, without getting too much into the bits and bytes of working with UDP, IP, sockets, network topologies, NAT traversal... If you would like me to blog on them too, hit me up – I would be more than happy to.

Just to clarify what I mean by “fast-paced”: twitch shooters are the typical example and fall into the category I am talking about here (there are specific games in other genres too). RTS and online RPG games I would consider “medium-paced” and anything turn-based “slow-paced”. Even though a game may appear fast, the underlying actions that matter and need to be networked may be relatively few and not time critical in comparison. Take Final Fantasy as a turn-based game: while a hardcore fighting animation is playing out, nothing needs to be networked except the command that started the action: “player 1 attacks enemy 3 with gigantic sword”. Thereafter, both peers know the sequence of events and play them out, i.e., it is deterministic. (I do not play Final Fantasy but that seems to be the gist of it.) So how do multiplayer games do it? Well, the approach I just gave in the example is called “lock-step”.

How Did They Do It?

Lock-step is when the game “locks” until it has all the required information before continuing another step. This method is inherently suited to turn-based games where that is the core gameplay anyway. Another example is Chess – you cannot have your turn until your opponent has completed theirs! Lock-step can be used in any game though. In a shooter (say), the input could change in any frame, e.g., “fire rocket launcher” and/or “turn left”. So if playing against a network peer, the game could:

1. Get the local player’s input and send it to the remote peer
2. Wait for the remote input (i.e. “lock”)
3. Update the game with everyone’s input and repeat

In fact, this is the method used in DOOM. What is wrong with it? John Carmack in Development on Doom Classic:

Quote:

Doom PC was truly peer to peer ... It also stalled until valid data had arrived from every other player... The plus side of an arrangement like this is that it is truly fair, no client has any advantage over any other, even if one or more players are connected by a lower quality link. Everyone gets the worst common denominator behavior [sic].

It all depends on how long Step 2 takes and how that affects the gameplay. In Chess, if it takes 250 milliseconds (i.e., a quarter of a second) to be notified that your opponent has completed their turn, no problem. In a shooter, which you are used to updating at 60 frames per second (i.e., roughly every 16.7 milliseconds), the game would freeze for that quarter of a second, i.e., 15 frames! This is why DOOM multiplayer is fine over LAN but becomes more intolerable over the Internet the higher your ping gets.
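
The three lock-step steps and the cost of the wait can be sketched roughly as follows. This is only an illustration, not DOOM's code: `lockstep_frame`, `frames_stalled`, the `send` callback and the queue of remote inputs are all hypothetical names standing in for whatever your networking layer provides.

```python
import queue

FRAME_MS = 1000 / 60  # one frame at 60 updates per second


def lockstep_frame(local_input, send, remote_inputs, timeout_s):
    """Run one lock-step frame and return (local_input, remote_input).

    `send` and `remote_inputs` are hypothetical stand-ins for a real
    networking layer: a callable and a queue.Queue of received inputs.
    """
    send(local_input)                                     # step 1: send our input
    remote_input = remote_inputs.get(timeout=timeout_s)   # step 2: block ("lock")
    return local_input, remote_input                      # step 3: caller updates the game


def frames_stalled(wait_ms, frame_ms=FRAME_MS):
    """How many frames fit into a wait of wait_ms milliseconds."""
    return round(wait_ms / frame_ms)
```

With a 250 millisecond wait, `frames_stalled(250)` gives the 15 frames mentioned above. If nothing arrives within `timeout_s`, `queue.Queue.get` raises `queue.Empty`, which a real game would have to handle (for example by dropping the peer).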

What Are We Working With?

So what are we up against? How long does it take to network stuff over the Internet? That depends on the distance between the peers and on what the signals have to go through, some of which is variable: networking devices, quality of line, congestion, etc. Congestion can happen on any link when its bandwidth is exceeded and it starts backlogging data, queueing it for transmission later. Data can even be dropped when these queues become full. Anyway, the points are:

It takes a variable amount of time to reach each peer (latency) and, to a lesser degree, a variable amount of time each time (jitter).
If you are sending/receiving more than the weakest link can deal with (in addition to everything else going through that link), data will be delayed. So we have another limit: our available bandwidth.
Data is transmitted in discrete units, packets, which can take different routes and arrive in a different order from the one they were sent in (out-of-order), not arrive at all (packet loss), or arrive more than once (duplication)! Packet loss can occur for any number of reasons besides grossly exceeding bandwidth, e.g., dog ate my network cable; and things can just as quickly start working normally again, e.g., ISP patched network cable.

OK, so we cannot wait for data all the time (i.e., lock-step) as that means freezing our fast-paced game for potentially intolerable periods. Let’s just keep updating and use the data when we get it - when remote player tells us they are at this position, put them at that position! Sorry, that is not going to work either:

What if that information got re-ordered in transit and we receive newer information before the older and process the older later?
What if the information got lost in transit?
They are probably not at the position now, they were at that position some time ago. If you fire your rocket launcher and hit them at this position on your machine, your rocket could go whizzing past them on their machine!

I will quickly cover out-of-order, lost and duplicate packets to the extent they are a concern to the game logic. A networking layer usually abstracts away sending and receiving data so that duplicates are discarded and you can specify whether the data should be processed in-order, reliably (i.e., if lost, send it again till it is received) or both. Sending things reliably generates extra traffic because the receiver has to acknowledge receipt so we can be sure the other end got it, and should the other end not get it, we have to send it again. If we send something in-order and reliably and one packet gets lost (as it inevitably will over the Internet), we have to wait until that one packet is re-sent and gets through before subsequent packets can be processed. Since the sender only knows to re-send after some time, this delays things considerably, up to seconds, which is too long in a fast-paced game. (This is the main reason UDP is used: TCP sends everything reliably and in-order, which simply delays things too much. So throughout this article, I am only talking about UDP, as it really is the only option for fast-paced games over the Internet.) So we send most in-game data in-order but not reliably – packets are processed in the order they were sent, but we effectively lose more packets, because a packet that arrives out-of-order is not processed at all if a later packet already has been. Critical events in the game still need to be known (such as player kills and deaths and of course game over), so they are sent reliably (without specifying in-order as well, so future packets are not delayed even if the reliable one has to be re-sent). Phew!
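
To make "in-order but not reliable" concrete, here is a minimal sketch of the receiving side. It assumes every packet carries an increasing sequence number and, for brevity, ignores sequence number wrap-around; `SequencedChannel` is a made-up name, not the API of any particular networking library.

```python
class SequencedChannel:
    """In-order but unreliable: stale, duplicate or reordered packets are dropped."""

    def __init__(self):
        self.last_seq = -1  # highest sequence number processed so far

    def accept(self, seq):
        """Return True if the packet should be processed, False if discarded."""
        if seq <= self.last_seq:
            return False  # duplicate, or older than something already processed
        self.last_seq = seq
        return True
```

If packets arrive with sequence numbers 0, 1, 3, 2, 3, 4, then 2 is discarded (3 was already processed) and so is the duplicate 3. Nothing is ever re-sent, so later packets never wait for earlier ones.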

What are our constraints? From a study conducted by Bungie for Halo 3 Beta 2007:

200ms one-way latency (i.e. 400 ping) between any two peers with 10% jitter
8 KB/s bandwidth up and 8 KB/s down
up to 5% packet loss

99% of Xboxes had these values or better. We could probably sneak in a few KB more bandwidth nowadays, but remember, it is the weakest link that counts, taking into account everything else going through that link – if you are downloading from another peer, their upstream is probably the weakest link, and upstream is usually a lot less than downstream. Remember too that keeping 99% happy still leaves 1 in every 100 gamers unhappy, and they are likely to be more vocal about their experience than the happy ones!

First Attempt

OK, so let’s just send all our state in-order then! Simply sending a few attributes of a single player in an 8 player game, such as position, angle facing, weapon equipped and a firing flag, could exceed your bandwidth limits! No kidding, let's add it up:

Attribute	Data Type	Number of bytes
Position	3 Singles	12
Angle	Single	4
Weapon Equipped	byte	1
Is Firing	byte	1
Overhead	bytes	34
	Total	52 bytes

Wait, what is “Overhead”? It’s the IP header included on every packet (at least 20 bytes) plus the UDP header (8 bytes) plus the networking layer header (with information such as sequence numbers so packets can be delivered in-order; the 6 bytes allowed for it here is conservative – for example, XNA’s networking layer adds about 23 bytes). OK so now we are at 52 bytes:

52 bytes x 60 times a second x 7 other players = 21840 bytes per second, or approximately 21KB/s!

And that is only a single player! What about all the other objects that need to be networked?!
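
The same arithmetic as a small sketch, so you can plug in your own numbers. The byte counts come from the table above; the function names are made up for illustration, and the budget in the usage note assumes 1 KB = 1024 bytes.

```python
STATE_BYTES = 12 + 4 + 1 + 1   # position, angle, weapon equipped, is firing
OVERHEAD_BYTES = 34            # per-packet overhead figure from the table


def upstream_bytes_per_second(payload, overhead, sends_per_second, peers):
    """Bytes per second needed to unicast one packet to each peer at the given rate."""
    return (payload + overhead) * sends_per_second * peers


def max_sends_per_second(budget_bytes, payload, overhead, peers):
    """Highest whole number of sends per second that stays within the budget."""
    return budget_bytes // ((payload + overhead) * peers)
```

`upstream_bytes_per_second(STATE_BYTES, OVERHEAD_BYTES, 60, 7)` reproduces the 21840 figure, while `max_sends_per_second(8 * 1024, STATE_BYTES, OVERHEAD_BYTES, 7)` comes out at 22: even this tiny state could only go out about 22 times a second within an 8 KB/s upstream budget.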

Why did we multiply by the number of players? Over the Internet, you have to send unicast, i.e., to every peer individually. But what if I have a Server-Client topology, so that I only need to send to the server and the server relays the necessary info on to all the other peers? True, however “the server” can be one of the other players, so someone still has to send everything to everyone. Or it could be a dedicated server (note that, dedicated or not, the server is still going to be sending to everyone, so it is going to need the upstream bandwidth and the clients the downstream). A dedicated server comes with its own set of problems though, for instance increased latency – now if you play against your neighbour, you have to go via the dedicated server, which is great if it is close by, but if not, no matter how good your Internet connection is, gameplay will be laggy. For this reason, Titanfall multiplayer is not available worldwide. Personally, I think that sucks, having grown up in a remote corner of the world – it disenfranchises part of the world from your game. One solution is to allow anyone to run a dedicated server – preferably someone at your ISP does – but what if no one does? I digress; network topology can be a whole topic in itself. The point is we cannot send state every frame to keep things in sync, and furthermore that state would be old by the time remote peers get it.

At this point, you will find game network programmers cowering under their desk dreaming of simpler times when they were writing classical physics algorithms.

Most cases are better than the parameters I gave but some are worse. Generally, a game should be playable at these values without players noticing and being annoyed by anomalies. You can shake your fist at these all you like, I did, but it is not going to help. In fact, take a moment to reflect on how remarkable it is that we can send information half way around the world in a quarter of a second! Have a read of Shawn Hargreaves’ excellent presentation, Networking, Traffic Jams, and Schrödinger’s Cat, from which I learnt an enormous amount. Start thinking early about how you can get away with sending as little as possible, and don’t count on things getting better. Latency is here to stay: the speed of light is not getting any faster and it does not appear to be physically possible to communicate faster than that! Perhaps with more bandwidth, we can send more to predict more.

Prediction

By way of a simple example: if we know a player was at position: p, moving with a velocity v, t seconds ago - where are they now assuming constant velocity? Easy:

p2 = p + vt

Their position after t seconds is p2 which we can predict given p, v, and t. (Of course, your objects likely take a more complex path than the straight line of constant v, but the idea is the same given all the variables you can solve it). So a remote player only needs to tell us when their velocity changes and from what position they were at when it changed. Since a change in player’s movement relies on input from a slow human - that is going to happen far less frequently than 60 times a second – typically only a few times a second. Great, so we can cut down on our bandwidth considerably to solve the remote player’s position. However, there are some new problems now – we extrapolated their new position based on their old state with knowledge of how they move but what if over time t the player:

Collided with something, e.g., a wall, a projectile or a gorilla (hey I don’t know what game you are making)
Changed direction

To detect collisions that could have happened over the period, you will have to work it out by interpolating from their old position. In fact, depending on how the physics in your game works, rather than extrapolating their new position you could simply set the new information on the player, then pretend time t has elapsed so the physics plays out as it would have had it had the new information t seconds ago. Integrating prediction with your physics can be one of the most challenging parts of all this. Don’t be afraid to think outside the box: different solutions work for different games and for different objects within those games.
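
Both ideas can be sketched in a few lines: the straight extrapolation p2 = p + vt, and the alternative of applying the reported state and then fast-forwarding the physics by t in fixed steps so collisions along the way are caught. This is a toy 2D example under loud assumptions: the only "physics" is a hypothetical wall at x = `wall_x` that stops movement in x, and `extrapolate` and `replay` are made-up names.

```python
def extrapolate(p, v, t):
    """Constant-velocity prediction: p2 = p + v*t, applied per axis."""
    return tuple(pi + vi * t for pi, vi in zip(p, v))


def replay(p, v, t, wall_x, dt=1 / 60):
    """Fast-forward a reported state by t seconds in fixed physics steps.

    Toy physics (an assumption for illustration): a wall at x = wall_x
    stops the object's movement in x once it is reached.
    """
    x, y = p
    vx, vy = v
    remaining = t
    while remaining > 1e-9:
        step = min(dt, remaining)
        x += vx * step
        y += vy * step
        if x >= wall_x:          # reached the wall during this step: clamp and stop
            x, vx = wall_x, 0.0
        remaining -= step
    return (x, y), (vx, vy)
```

For a player reported at (0, 0) moving at (10, 5) with t = 0.2, `extrapolate` puts them at about (2, 1), straight through a wall at x = 1, whereas `replay` leaves them pressed against the wall at about (1, 1).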

If the player changed direction soon after the update, we should get that update soon too but in the meantime, we have the player moving in potentially the opposite direction. This amplifies a problem with the delayed updates: when we receive the new information and calculate their new position, remote players will appear to jerk or “pop” to their new position because we had them travelling in another direction for longer than they actually were. There is no 100% perfect solution to this beyond a lock-step approach – remotely controlled travelling objects will overshoot where there is latency and the greater the latency and change in velocity, the greater the amount they will overshoot by. The first step is to reduce the issue as much as we can: important objects such as players should send their changes in state (that remote peers should know about to make correct predictions) ASAP. Beyond that, we need to resolve the players “popping” issue – it ruins the virtual reality of the game.

Smoothing

When we receive an update telling us an object was here, we predict where it is now. If this is further away from the current position than the object normally moves in one frame, the object will appear to “pop” to the new position. We can reduce this effect by “smoothing” – track the position where the object most likely is and move the visual to it over a period of time (at faster than the normal rate so it can catch up) rather than all at once. It may also be an idea to move the physics to the most likely position quicker than the visual so hit boxes and such are more in sync – but that would be at the cost of having the physics out of sync with the visual on the local machine. This may all sound like smoke and mirrors because that is exactly what it is – we are just maintaining the illusion the game is playing out in wall clock time even though updates are arriving from the past.

Sometimes updates arrive so late and/or objects are moving so fast that having a more accurate state is preferable to having a normal-looking but out-of-sync state. So we give up and let the objects “pop” to the current position. This is often visible in FPSs with remote peers to which you have a high latency (possibly via a server), as players “pop” from one position to another. (Note this doesn’t mean the higher the latency, the higher the accuracy! The next update from a lagging peer will also be too late, so their position will have got way out again, causing them to “pop” again. At lower latencies, smoothing is preferable to not smoothing even though it temporarily leaves the remote player more out of sync than need be, because not only does it look better, but humans can better judge a normally moving thing than an erratic one.)
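
One possible shape for the per-frame smoothing step, including the give-up-and-pop case, is sketched below. The catch-up factor and snap distance are invented tuning values, and `smooth_step` is a hypothetical helper; real games will tune these per object.

```python
import math


def smooth_step(visual, target, normal_speed, dt, catch_up=2.0, snap_distance=5.0):
    """Move the drawn 2D position towards the predicted one for a single frame.

    catch_up:      how much faster than normal the visual may move (assumed value)
    snap_distance: beyond this error we give up and "pop" (assumed value)
    """
    dx, dy = target[0] - visual[0], target[1] - visual[1]
    dist = math.hypot(dx, dy)
    max_move = normal_speed * catch_up * dt
    if dist > snap_distance or dist <= max_move:
        return target  # too far out (pop), or close enough to arrive this frame
    scale = max_move / dist
    return (visual[0] + dx * scale, visual[1] + dy * scale)
```

Called once per frame, the visual closes the gap at up to twice its normal speed, and an error larger than `snap_distance` is corrected in one jump rather than chased.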

What Is the Time?

How do we know t used in the prediction calculations (i.e., the time elapsed between an update being sent and us receiving it)? We can measure Round Trip Times (RTT) on packets which we know solicit an immediate response (the networking layer may do this for you) and divide that by 2 (we are only interested in the one-way latency). Unfortunately, RTT is not constant; the variation from the average RTT is called jitter. So on receipt we can only estimate how long a packet took to arrive based on previous RTTs. More elaborate techniques can be used to try to deduce this period, such as including timestamps in the packet, however timestamps are only useful if you have clocks synchronized to within a few milliseconds – which system time is certainly not! A problem exacerbated by time only being an estimate and by receiving delayed updates is that things will not always play out the same on all peers. What if I hit the enemy remote peer with my sniper rifle on my machine but they had just moved out of the way on their machine?
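
A simple way to turn RTT samples into an estimate of t is a running average, sketched below. The exponential moving average and the `alpha` weight are my assumptions here, not something the networking layer necessarily does for you, and `LatencyEstimator` is a made-up name.

```python
class LatencyEstimator:
    """Tracks a smoothed RTT and jitter from individual RTT samples."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha     # weight given to each new sample (assumed value)
        self.avg_rtt = None    # no samples yet
        self.jitter = 0.0      # smoothed deviation from the average RTT

    def add_sample(self, rtt):
        if self.avg_rtt is None:
            self.avg_rtt = rtt
            return
        deviation = abs(rtt - self.avg_rtt)
        self.jitter += self.alpha * (deviation - self.jitter)
        self.avg_rtt += self.alpha * (rtt - self.avg_rtt)

    def one_way(self):
        """Estimated one-way latency: half the smoothed RTT, or None with no samples."""
        return None if self.avg_rtt is None else self.avg_rtt / 2
```

The value returned by `one_way()` is what you would feed in as t when predicting. It assumes the path is symmetric, which is not guaranteed, so treat it as an estimate only.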

Who is da Authority?

Ultimately we cannot have it both ways – play our game at wall clock time while receiving updates from the past and have all peers be perfectly in sync. Nor can we leave each peer to determine its own version of events – if we let one peer think a player was still alive while they are dead on another, do we show their dead body flying around shooting because the player is still alive on the controlling peer? Who wins the match? We can resolve what actually happened on one peer and say what happened on that peer is the version everyone else must go with. In Server-Client topologies, it makes sense for this authority to be the server since it is in direct contact with each peer already (logically in the centre of network). Also having a server authority can help prevent cheats who have modified their game to give themselves an unfair advantage because they cannot modify the server.

Anyway, in any topology it could be any peer, and furthermore different peers could be the authority on different things. What do we do when a peer who is not the authority gets it wrong? That peer has to recover from their “mistake”. One way is to simply wait for the authority to give us the “true” version of events so we cannot make mistakes. However for fast-paced games, that could mean delaying things players expect to happen immediately. Anticipate these “mistakes” in your gameplay. For example: if I push Humpty Dumpty on my machine, but from the authority’s view point, Humpty Dumpty moved aside immediately before I pushed him – what do we do then? How do we recover from Humpty Dumpty’s fall that didn’t happen – everyone knows you cannot stick Humpty Dumpty back together again! We anticipate we could get it “wrong” so we give ourselves some leeway by building in some delay till we know the “true” version – Humpty could wobble for half a second when you push him (providing immediate feedback) before he falls, giving you enough time to learn whether the Humpty Dumpty Authority also registered him as being pushed – if not (which will hopefully be rare because of your prediction) you can recover from a wobbling Humpty Dumpty more easily than from a broken one!
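
The wobbling Humpty can be sketched as a tiny state machine. Everything here is hypothetical (the class, the state names, the half-second constant taken from the example). One design choice is my own assumption: if no verdict arrives before the wobble ends, Humpty goes back to sitting, since that is the outcome we can still recover from.

```python
WOBBLE_SECONDS = 0.5  # leeway before an irreversible outcome, as in the example


class Humpty:
    def __init__(self):
        self.state = "sitting"
        self.wobble_left = 0.0
        self.confirmed = None  # authority's verdict: None until it arrives

    def push_locally(self):
        """Immediate feedback on the pushing peer: start wobbling, don't fall yet."""
        if self.state == "sitting":
            self.state = "wobbling"
            self.wobble_left = WOBBLE_SECONDS
            self.confirmed = None

    def on_authority_verdict(self, was_pushed):
        self.confirmed = was_pushed

    def update(self, dt):
        if self.state != "wobbling":
            return
        self.wobble_left -= dt
        if self.confirmed is False:
            self.state = "sitting"   # authority disagreed: recover from the wobble
        elif self.wobble_left <= 0:
            # Fall only if confirmed; with no verdict, stay recoverable (assumption).
            self.state = "fallen" if self.confirmed else "sitting"
```

The irreversible "fallen" state is only ever entered after the authority has agreed, while the player who pushed still sees something happen the instant they act.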

Old School Tricks

Prediction helps keep our data rate down considerably so we don’t have to constantly send state, but there are often lots of things that need to be networked which, when added up, can still cause us to exceed our available bandwidth. Unfortunately, we cannot just gzip our packets like we could, say, an HTML document – up to a few hundred bytes of binary data will compress very little and will just cost CPU. Notice how I said the “is firing” flag in the above example took one byte. It doesn’t need to (most programming languages just store Boolean data types as such): we could squeeze 8 Boolean flags into a byte using bitwise operations. Do we really need to know the precise angle other players are facing when they are not firing? We could use half precision floats instead of single precision (2 bytes instead of 4). We often store numbers as 4 byte integers, but will they always be non-negative and never exceed 255 (could fit in 1 byte) or 65535 (could fit in 2 bytes)? This compacting of values will get you some way, but before you do these optimisations consider:

Delaying non-time-critical updates so they can be sent at the same time as other stuff and appended to the same packet, reducing the number of packets and so saving on the overhead each packet has.
Only sending network updates when they matter – if another player rotates a small amount but is still in the same place, does everyone need to know this? Perhaps only when they rotate 1/16th of a circle need we let others know.
Does it need to be sent at all? Perhaps this is the most important question you can ask yourself – remote players won’t care so much if the leaves on the tree are out of synch. Better yet given enough thought many things can be made deterministic so do not need to be networked. For instance, even seemingly random events such as what power up does this crate contain can be based on a pre-determined pseudo random seed.
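
As an illustration of the compacting tricks from this section, the sketch below packs the example player state into 14 bytes instead of 18. The layout is an assumption made up for this example: position stays as three single precision floats, the weapon is one byte, and a final byte holds the facing angle quantised to 1/16th of a circle (high 4 bits) alongside four Boolean flags (low 4 bits). The flag names are hypothetical.

```python
import math
import struct

FIRING = 0x01       # hypothetical flag bits; four fit beside the angle nibble
CROUCHING = 0x02
ANGLE_STEPS = 16    # facing is only sent to the nearest 1/16th of a circle
_FORMAT = "<3fBB"   # x, y, z as single floats, weapon id, angle+flags byte


def pack_state(pos, angle_rad, weapon, flags):
    """Pack position, quantised angle, weapon id and flags into 14 bytes."""
    step = round(angle_rad / (2 * math.pi) * ANGLE_STEPS) % ANGLE_STEPS
    return struct.pack(_FORMAT, *pos, weapon, (step << 4) | (flags & 0x0F))


def unpack_state(data):
    """Inverse of pack_state; the angle comes back rounded to the nearest step."""
    x, y, z, weapon, packed = struct.unpack(_FORMAT, data)
    angle_rad = (packed >> 4) * 2 * math.pi / ANGLE_STEPS
    return (x, y, z), angle_rad, weapon, packed & 0x0F
```

Because the angle is quantised, a peer can also compare the new step with the last one it sent and skip the update entirely when the step has not changed, which combines value compacting with only sending when it matters.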

Wrapping Up

Fast-paced multiplayer games over the Internet are hard, but possible. First understanding your constraints then building within them is essential. I hope I have shed some light on what those constraints are and some of the techniques you can use to build within them. No doubt there are other ways out there and ways yet to be used. Each game is different and has its own set of priorities. Learning from what has been done before could help a great deal. Hopefully, we will see more Indie games standing on the shoulders of the giants that have come before them to explore the limits and create new multiplayer gaming experiences possible on the ever improving Internet.

Useful Links

(A lot of what I learnt is from these links and I have regurgitated them in some form or another indirectly after they have been percolating in my brain for a year or so.)

Hargreaves, Shawn. Date? Making Networked Games with the XNA Framework: http://www.shawnhargreaves.com/MakingNetworkedGames.pdf
Hargreaves, Shawn. Dates? Networking category in index of blogs http://www.shawnhargreaves.com/blogindex.html#networking
Fiedler, Glenn. Date? Game Networking (various in-depth programming game networking articles): http://gafferongames.com/networking-for-game-programmers/
Aldridge, David. 2011. I Shot You First: Networking the Gameplay of Halo: Reach: http://www.gdcvault.com/play/1014345/I-Shot-You-First-Networking
Sanglard, Fabien. 2012. Quake 3 Source Code Review: Network Model (Part 3 of 5): http://fabiensanglard.net/quake3/network.php
Valve Corp. Ongoing? Source Multiplayer Networking (many links hang off this page): https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking

History

1st September, 2015 - Added to Code Project from my original Gamasutra post with small amendments to bring the 2nd paragraph up-to-date now the game has been released