Lately I have been dealing a lot with security issues and, as I am thinking about security all the time, I decided to write this post. Yet, don't expect me to talk about the newest cases.
So, first, what is considered a security issue?
I don't have a perfect answer, but maybe we can say that if anything in an application can be used in a manner that wasn't expected, and that use can make you lose money or somehow cause you trouble, then it is a security issue. Note that the problem doesn't need to affect you directly: an issue that causes a problem for your customers can become a problem for you if they decide to sue you.
I know, such a generic description doesn't explain any technical detail of a security issue... but actually that's its best trait: it is not exactly what's technically wrong that matters. It is how bad the consequences are.
When I was 13 years old and people used paid BBSs, some of them had games where you could bet some of your remaining minutes/downloads and, if you were lucky, you would receive that bet as extra minutes/downloads. If not, well, you would lose them.
One game in particular was a race between four horses and, assuming any relatively good randomization, it was expected that people would lose 3/4 of the time and win only 1/4 of the time. But, "somehow", many people started to win most of the time.
I don't even remember how I ended up talking to the owner of one of those BBSs, but when he told me about the game and that people were winning, I decided to try it myself. He and some other people who knew about the case believed that a specially crafted hacker application could be predicting the randomization or something similar... but I didn't need to play three times to discover what was wrong.
I did a first, fair play and lost. Then I did what I considered an unfair play... I actually won the race, but ended up losing because of my unfair try. At that point I already knew what was wrong, but I played unfairly again: my horse lost, and I won!
So, what was I doing?
My bet was a negative value.
So, every time I chose the right horse I actually "lost". Every time I chose the wrong horse I actually "won". Considering the odds, by playing all the time choosing the same horse and "losing" most of the time, I was "winning".
Important things to observe
- The error was simply a missing if (value <= 0) check;
- People naturally thought about much more complex kinds of attack. Yet, the simplest one was biting them;
- This error would not be a security concern if it were a normal game. It would still be a bug, but it only became a security issue because people winning minutes/downloads that way weren't paying for those minutes/downloads anymore.
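The fix really is a one-line validation before the win/lose arithmetic. Here's a minimal sketch in Python; the function and variable names are illustrative, not from the original BBS game:

```python
def place_bet(balance_minutes: int, bet: int, won: bool) -> int:
    """Apply a bet to the user's remaining minutes.

    The crucial, originally missing check: reject non-positive bets,
    otherwise betting a negative value inverts winning and losing.
    """
    if bet <= 0:
        raise ValueError("bet must be positive")
    if bet > balance_minutes:
        raise ValueError("bet exceeds available minutes")
    return balance_minutes + bet if won else balance_minutes - bet
```

With the check in place, a negative bet is rejected instead of silently turning every lost race into a profit.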
I saw this second case some time later. Long before OneDrive and "the cloud", we already had sites that allowed us to store some files, either as backup or as a way to publish our own sites.
In this particular case, the site allowed people to store up to 1MB of files (at the time that was a lot) for free. To use more than that, users were required to pay.
One user was not supposed to read or write another user's files. When listing files, one would access a link similar to http://www.somesite.com/list.aspx?folder=/MyDocuments, which was mapped to a server folder in a path like C:\SiteUsers\LoggedUser\MyDocuments.
The site didn't use real Windows users or anything like that; yet, as the login used to access the site was part of the path, the developer believed he was constraining users to seeing only their own folders.
That is, if I had a user account named PauloZemek, I was only supposed to see files that are under C:\SiteUsers\PauloZemek. If a folder like /Documents/SecretDocuments was provided, the path would be translated to C:\SiteUsers\PauloZemek\Documents\SecretDocuments. So far, everything seemed fine.
Users simply didn't have access to the C:\SiteUsers string and they couldn't remove their own login name when accessing any path, so everything was safe, right? I (the PauloZemek login) wouldn't be able to access files from SomeOtherUser, right?
Well, what happens if I provided the following path ../SomeOtherUser/ to list.aspx?
What folder will I be accessing?
The page would translate that folder to C:\SiteUsers\PauloZemek\..\SomeOtherUser\, but the actual file system would resolve it to C:\SiteUsers\SomeOtherUser\. This happens because ".." means the parent folder. So, starting from my own folder, I would ask to go to the parent folder (which contains all users) and then I could choose which user's folder to enter.
Notice that the problem here is worse than simply accessing other users' files. If I provided only the .. path, I would be able to see all the users. It would be even worse if I used ../.., as I would reach the C:\ folder, gaining access to the entire drive.
I was talking only about list.aspx, which lists folder contents, but remember that the purpose of the site was to store files. Because that kind of mistake was made in every action, like read, write and delete, I was able to read, write and delete files anywhere, being able even to take over the entire site or to destroy important files of the operating system.
Maybe verifying whether the path included a ".." and blocking it would be a solution, yet there are many ways to solve the problem and verifying the string itself is one of the worst. Personally, I consider that the OS should only allow ".." at the beginning of a path, never in the middle or at the end. Nonetheless, we can't count on that behavior changing. So, in this case, we must be aware that it doesn't matter if we put a "prefix" on the path; people can always "go backward" past that prefix. Another kind of validation was required (and using different OS users to access those folders would greatly improve security).
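One common form of that validation is to canonicalize the final path first and only then check that it still lives inside the user's folder. A minimal sketch in Python, using a hypothetical storage root standing in for the site's C:\SiteUsers:

```python
import os

# Hypothetical storage root; the article's site used C:\SiteUsers.
BASE = "/srv/SiteUsers"

def resolve_user_path(login: str, requested: str) -> str:
    """Resolve a user-supplied folder and refuse anything that
    escapes the logged-in user's own folder."""
    user_root = os.path.realpath(os.path.join(BASE, login))
    # Strip leading separators so an absolute path can't replace the
    # prefix, then canonicalize: realpath collapses any ".." segments.
    candidate = os.path.realpath(
        os.path.join(user_root, requested.lstrip("/\\")))
    # After canonicalization the result must still be inside user_root.
    if candidate != user_root and not candidate.startswith(user_root + os.sep):
        raise PermissionError("path escapes the user folder: " + requested)
    return candidate
```

The key design point is that the check happens after canonicalization, so a "../SomeOtherUser" trick is already collapsed into its real target before we compare prefixes.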
The previous case is not SQL Injection, yet it shares the same root. When navigating folders, the developer was sure that by starting every path with C:\SiteUsers\UserName\ it would be impossible to access things from another user, but .. has a special meaning: instead of going deeper in the folder tree, it goes back a level. SQL Injection shares the issue that some characters have special meaning and, if you simply concatenate what users write into your own string, bad things can happen. In SQL, the ' character in particular is the problem. For example, a developer may be building this query:
SELECT Name FROM Users WHERE Name LIKE 'user provided string here';
And the user provided string could be:
'; DROP TABLE Sales; SELECT Name FROM Users WHERE NAME LIKE '
Which will become:
SELECT Name FROM Users WHERE Name LIKE ''; DROP TABLE Sales; SELECT Name FROM Users WHERE Name LIKE '';
Which is a valid SQL command. It will do the SELECT, then it will drop a table named Sales and then it will do another SELECT. That second SELECT is there only to guarantee that the entire command is seen as valid: the query template ends with an extra ' that must be consumed, otherwise stopping the injection right after the DROP statement would leave a dangling quote and invalidate the entire statement.
For the ' character alone, there's an easy patch: it is enough to replace every single ' with two of them (''). Inside a string literal, '' is read as a single, real ' and not as the string delimiter. Yet, there are other characters that can cause issues, so the best thing is to use named parameters, like:
SELECT Name FROM Users WHERE Name LIKE @Name
And then, the @Name is filled by another statement in the actual programming language, and a ' will always be seen as a ' and not as a string delimiter.
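To make this concrete, here's a small sketch using Python's sqlite3 module (SQLite's placeholder syntax is :Name rather than @Name; the table mirrors the article's example and is otherwise hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT)")
conn.executemany("INSERT INTO Users (Name) VALUES (:Name)",
                 [{"Name": "PauloZemek"}, {"Name": "SomeOtherUser"}])

# The article's malicious input, passed as a parameter: it is treated
# as a plain value and never spliced into the SQL text itself.
evil = "'; DROP TABLE Sales; SELECT Name FROM Users WHERE Name LIKE '"
rows = conn.execute("SELECT Name FROM Users WHERE Name LIKE :Name",
                    {"Name": evil}).fetchall()
# No row matches that literal string, and no DROP was ever executed.
```

The entire injection string simply becomes an (unmatched) LIKE pattern; there is no second statement to execute.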
The previous example can be hard to understand if you don't know SQL. Yet, as it is quite common, I decided to show it.
Actually, SQL Injection usually requires the attacker to know something about the database. How can someone drop a table they don't even know exists?
In the case of folders, everybody that ever used "cd" commands knows about the .. folder. In the case of SQL Injection, knowing SQL doesn't mean you will know the database structure. Yet, there are two things an attacker can do:
- Guessing. Some database structures are so simple that they are easy to guess (tables like Person, User or similar are very common);
- Information disclosure. This one may happen by misconfiguring a web server or by trying to give full error messages to your users. For example, in the case of that vulnerable SELECT, a user can provide the string 'a and the database can throw something like:
Invalid token at column 42: SELECT Name FROM Users WHERE Name LIKE ''a';
Great. With this, the attacker knows that a table named Users exists. If the table were named U1_Users, well, the attacker would get that instead.
In fact, there are lots of ways to cause an information disclosure. The SQL Injection itself can be used to read different tables and fields, which may include passwords, for example, causing an information disclosure... yet, some error messages show what's wrong in so much detail that attackers can benefit from them. So, if on one side it's terrible to receive generic messages like "Something went wrong", on the other side it is terrible to give attackers error messages so complete that they help improve their attacks.
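A common middle ground is to log the full detail on the server and return only a generic message carrying a correlation ID the user can report. A sketch of that idea (the incident-ID scheme is just one possible approach, not a standard):

```python
import logging
import sqlite3
import uuid

logger = logging.getLogger("app")

def run_query(conn, sql, params):
    """Run a query; on failure, keep the details server-side and
    return a generic, correlatable message for the user."""
    try:
        return conn.execute(sql, params).fetchall(), None
    except sqlite3.Error as exc:
        incident = uuid.uuid4().hex[:8]
        # Full details stay in the server log for the developers...
        logger.error("query failed [%s]: %s | sql=%s", incident, exc, sql)
        # ...while the user sees something actionable but not revealing.
        return None, f"The request failed (incident {incident})."
```

The user gets something better than "Something went wrong" (support can find the log entry by incident ID), while the attacker learns nothing about the schema.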
Even though all the examples I gave so far happened on servers and there was communication involved, they weren't caused by the communication itself. If those applications/sites could be used locally, you could exploit their errors locally too, independently of how bad the consequences would be.
Well, many attacks actually don't care about the application itself; they try to exploit vulnerabilities in the communication method being used. For example, if a site is using HTTP during login, someone can "sniff" the network (a process in which they receive all packets sent over the network) and they will easily see the login and password sent to the server. Do I need to say how bad this is?
But there's much more that can go wrong in the communication itself and, many times, frameworks that help write things "easier and faster" are the source of lots of vulnerabilities, independently of whether the application's main logic has any issues. So, let's see some of them:
An easy way to cause a DoS (Denial of Service) is for the incoming data to request the creation of an extremely big array (like 1GB) and for that data to actually arrive very, very slowly (possibly one byte every 50 seconds). Open some connections at the same time, make some of those requests until the server starts throwing out-of-memory exceptions... and keep your connections active. Real users will not be able to use the application/site anymore because all the memory is being held by those attack requests.
This kind of problem will happen on a naively implemented communication framework that uses serialization/deserialization over TCP/IP (or similar).
To those who don't know, serialization is the process of converting programming objects into bytes. Deserialization is the process of converting those bytes back into programming objects. Not all communication requires objects, yet many of the modern communication frameworks do it through objects because it is easier that way.
Actually the problem happens because of a combination of what are generally good practices (SRP - Single Responsibility Principle, SoC - Separation of Concerns and others):
- The application tells the communication framework which methods/functions can be invoked by a remote call;
- The communication framework keeps requesting to deserialize objects, which are supposed to tell which functions to call;
- Those objects are actually deserialized from data read from a transport/stream of bytes (like TCP/IP);
- That stream of bytes is only supposed to receive bytes and nothing else. It doesn't know what objects are going to be created or what methods/events will be invoked.
Actually, the stream of bytes usually has a configurable time-out, but if Separation of Concerns/Single Responsibility Principle is really followed, even the time-out will be done as a separate "layer" on top of the stream.
So, the problem is this:
- A time-out of 60 seconds is usually applied to every stream packet, and packets are limited to a small size (like 8KB). It doesn't matter if the expected object fits the 8KB limit or not. As long as there's a packet coming (and one byte can be all a packet contains), the connection stays alive;
- The deserializer doesn't care about packets. It reads bytes and usually determines the kind and size of the object to be created from the first bytes read (so, an attacker can send bytes telling it to create a byte array of any size he wants). It doesn't matter if the transport's packets are limited to 8KB; the deserializer will keep reading new packets of data until it can fill the array;
- The code that's going to use the objects may actually not support arrays at all. Yet, it only asks to deserialize an object and then casts the object to a specific interface or has a switch (or similar) code to decide what to do with the object just deserialized.
Actually, by the time the last step happens, it is too late. The attack is simply to keep lots of memory allocated while sending data very slowly. When the object is completely deserialized, an error may be raised and the memory may be reclaimed, but that will happen only after the server has already been out for a long time. And it is quite easy to request a new giant array as the next step.
What could be the solution?
- Before deserializing, it must be set which types of objects can be deserialized and the biggest array size/amount of memory that can be consumed. Instead of deserializing the entire object only to then discover it was the wrong one, don't even allocate memory for an object that's not supposed to be deserialized;
- The time-out must be set for the entire deserialization, not per packet. That is, it should not be on top of the stream of bytes, it should be on top of the deserializer.
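Both rules can be seen in this sketch of a length-prefixed message reader: the size cap is checked before any allocation, and a single deadline covers the whole message rather than each packet. The framing (a 4-byte big-endian length) is illustrative, not any specific framework's wire format:

```python
import socket
import struct
import time

MAX_MESSAGE = 64 * 1024    # upper bound decided by the application
TOTAL_DEADLINE = 30.0      # seconds for the WHOLE message, not per packet

def read_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message with a size cap and a
    deadline covering the entire read."""
    deadline = time.monotonic() + TOTAL_DEADLINE
    header = _read_exact(sock, 4, deadline)
    (size,) = struct.unpack("!I", header)
    if size > MAX_MESSAGE:
        # Refuse BEFORE allocating anything for the declared payload.
        raise ValueError(f"declared size {size} exceeds limit")
    return _read_exact(sock, size, deadline)

def _read_exact(sock, count, deadline):
    chunks, received = [], 0
    while received < count:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("message did not arrive within the deadline")
        sock.settimeout(remaining)
        chunk = sock.recv(min(4096, count - received))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

A one-byte-every-50-seconds attacker now hits the overall deadline, and a "1GB array, please" header is rejected before a single byte of it is buffered.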
The biggest problem is that most deserializers don't allow you to validate the objects that are about to be created or to tell which types are valid. They do their job and you are not supposed to interfere (actually, we can say they are correctly following the Single Responsibility Principle).
Notice that such a naive implementation of a communication protocol works quite fine when nobody is trying to attack the application (in local environments, for example). It is guaranteed that the application will always send entire objects as soon as possible and that only valid objects will be serialized. Yet, an attacker can completely ignore the process of object creation and data serialization and send his own bytes the way he sees fit.
Other deserialization issues
There are two other deserialization issues that may happen, and both are related to deserializers that can actually load assemblies. Think about it: when you are closing your application, you serialize some objects that represent the current application state. Those objects and their libraries are in memory. Yet, when you open your application again, those objects and libraries are not in memory yet, and the purpose of the deserialization is to recover the previous state. In that scenario, you obviously want to load the required libraries.
So, the problems that may happen are:
- An attacker may force your application to load lots of assemblies and types your application is not expecting, making it use more memory than expected and even causing conflicts, as many libraries and types have their own initialization, which is particularly problematic when two versions of the same library are loaded;
- Some deserializers will keep a cache even of failed libraries/types. So, it is easy to cause a denial of service by requesting to load an endless list of libraries, like a.dll, b.dll, c.dll etc. It doesn't matter that those libraries don't exist; all the memory will be consumed storing the errors in the cache.
Multiple Authentication Methods - The Attacker's Choice
Many communication frameworks allow the creation of multiple bindings in parallel. Many times, it is an easy decision like:
- Create a binary and unencrypted binding for local communications, as it is the fastest;
- Create an HTTPS binding for external communications, as it is more standard and secure.
But, sometimes, people forget a little detail: They keep the binary and unencrypted binding open to the internet. Some will actually give the excuse:
- "External users aren't supposed to use the binary serialization. If they do that and someone steals their credentials, it would be their fault."
And that's right. But we return to the deserialization problem. An attacker can naturally target the binary binding from the outside, simply because he knows how to cause a Denial of Service through it. It doesn't matter if all the real external users are only using HTTPS; the attacker still has the opportunity to use the vulnerable protocol.
Transparent Remoting versus Communication Framework
Maybe there's no official difference between a remoting framework and a communication framework, yet I usually see the "remoting" term being used when the communication is "transparent": a class was not created to be used in remote communication but, thanks to a remoting framework, it is possible to do so.
In this definition, a communication framework is different. Classes that are accessible by the communication framework must be created to be used that way.
I must say that I usually prefer remoting frameworks. They mean I can create a library that works well locally and, if my needs change, create a service on top of that library and make it accessible to many different servers. That is, they help me scale from a simple library to a Service-Oriented Architecture without too many issues. Yet, this is rarely suitable for making the objects available to any external network.
The issue here is subtle, but the fact is that most libraries allow the developer to do much more than usually required. A good application will only use the valid calls, though. But what happens when you expose such a library through a remoting framework?
The real application will keep using the library correctly. Yet the library is now open to be used by other applications. Who guarantees that other applications (including those created by an attacker) will not do bad things?
Are Communication Frameworks better?
If transparent remoting is bad, how is a communication framework better?
Well, actually, it is not really "better". It is "explicit".
In transparent remoting we simply get a class and say: It is available to be used from the external world. You don't even need to know all the methods of the class, you simply made it available. Very easy, but you may be exposing much more than needed.
With a communication framework, you must be explicit about what you want to expose. This will not guarantee you don't expose too much but, definitely, if you forget to expose a method, it will not be available. It is much easier to see that you forgot to expose a required method, which will make your application fail, than to see that you exposed too much, which will only become visible if an attacker actually exploits such a vulnerability.
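This "explicit" idea can be reduced to a tiny dispatcher that only routes calls whose names were registered; everything else in the class stays unreachable from the outside. The class and method names here are purely illustrative:

```python
class AccountService:
    def get_balance(self, user):
        """Meant to be callable remotely."""
        return 42  # placeholder value for the sketch

    def _wipe_all_accounts(self):
        """Internal helper that must never be reachable remotely."""
        raise RuntimeError("should be unreachable from outside")

# The explicit whitelist: forgetting a method here makes it fail
# visibly, while transparent remoting would expose everything.
EXPOSED = {"get_balance"}

def dispatch(service, method_name, *args):
    if method_name not in EXPOSED:
        raise PermissionError(f"method not exposed: {method_name}")
    return getattr(service, method_name)(*args)
```

With transparent remoting, _wipe_all_accounts would have ridden along for free; here, it simply isn't routable.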
Also, most developers have a different mindset when creating classes they know will be exposed to the internet than when creating classes to be used locally. The "security" traits of local calls are usually limited to validating input parameters and thread safety, while the security traits of objects exposed to the internet, well, are much more complex.
That's all for now
When I started writing this post I had more "low-level" attacks in mind but, personally, I think this list is enough to show how "easily" things can go wrong when we talk about security.
Maybe at another time I will explore more low-level cases and show some actual code.
I hope you liked it and that this post makes you think more about security and how easily things can go wrong, even when the application is "working as expected".