
I have been given the source code of a large, tailor-made .NET Windows application, and I'm required to scan through it and certify that it is safe and malware-free. Are there any tools out there that actually scan .NET source code to detect possible embedded or hidden malware code?

A logical strategy may be to look for code that sends sensitive data outside the application (such as by email, WCF, web services, etc.), correct? If not, what else should I look for?

Any advice would be greatly appreciated...


Sergey Alexandrovich Kryukov 25-Jan-11 3:04am    
Interesting question, even though in real life code that is delivered with its source is very unlikely to be malicious. Who knows, though, so I voted "5" for this Question. Looking for data sent outside is not enough.
Sandeep Mewara 25-Jan-11 3:15am    
My 5! too.

Strictly speaking, it is not possible in principle. To prove that, one counterexample is enough, right?

The example about sending out sensitive information is very good. This criterion is not sufficient on its own, but if such data is sent out, it is a problem. Suppose this situation is detected. There are non-malicious applications designed to send such information: for example, error information is sent (hopefully with the customer's consent) to the company for the purposes of support and bug fixing (the ugly truth is that customers are unable to describe what they see; instead they tell you what they think they see, which is usually not true). So a detector of malicious activity will flag a legitimate program as malicious.

Should an attempt to delete a file be considered malicious? Apparently not, because we should be able to implement file managers.

One can say: let's demand that a potentially harmful operation is only allowed with the user's consent. The problem is that this cannot, even in theory, be verified by reviewing the algorithm. Why? There is a fundamental result of computer science (computability theory; see the halting problem): it is impossible in general to predict what a program written in a Turing-complete language will do over an arbitrarily long time. Likewise, a program may contain code that asks for the user's consent, but how do you compute whether the program will ever reach this fragment of code? The detector is bound to produce false positives and false negatives.

At the same time, it may not be completely hopeless. In real life, a detector would be useful if it could classify all programs into certainly malicious, certainly non-malicious and uncertain.

The biggest problem is the definition of what to consider malicious -- I'm pretty skeptical about such a prospect.
GPUToaster™ 25-Jan-11 4:33am    
In reality this is all related to behaviors and patterns which are quite specific.
Sergey Alexandrovich Kryukov 25-Jan-11 13:51pm    
Does it justify my skepticism? :-)
GPUToaster™ 28-Jan-11 9:47am    
It was just a summary ... that's all! :)
Sergey Alexandrovich Kryukov 28-Jan-11 11:56am    
OK, thanks. Keep toasting and have fun :-)
Espen Harlinn 6-Feb-11 14:35pm    
Good answer, as usual :)
This is a great question. And I can say there is a huge gap in tools that assist, support or (semi-)automate security reviews (especially for smartphone apps).

I dream of a tool that is able to categorize, group and rank commands, functions and literals by danger. Maybe by using some ontology and natural language processing, with a plugin system and a community-driven heuristic AI and such...

In fact, we have all the building blocks for that by now.

There are ontology frameworks and NLP projects for the English language out there. On Android, the well-known security suites watch out for "spy functions" of apps, such as the "read contacts" and "send messages" permissions, and at least warn you. Furthermore, there are WordPress plugins that scan the source for "evil" eval() and other known harmful functions. We have static code analysis for things like proper escaping; take that further, and you get checks for missing input sanitization. We could create a database of hash sums of known libraries (instead of just publishing them on a project website) to ensure they have not been compromised (e.g. gather the MD5 and SHA-1 hash files from GitHub, or use hashes generated by the versioning system). We could also check for up-to-date dependencies (every package system does that) to ensure all known security holes are closed (like Secunia does).
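To make the hash database idea a bit more concrete, here is a minimal sketch in Python (just for brevity; the same few lines port to C#). The function names and the shape of the lookup table are my own assumptions, not an existing tool. It uses SHA-1 because that is what was mentioned above; swapping in hashlib.sha256 is a one-line change.

```python
import hashlib

def sha1_of_bytes(data):
    """Return the SHA-1 hex digest of a library's raw bytes."""
    return hashlib.sha1(data).hexdigest()

def check_library(name, data, known_hashes):
    """Compare a library against a table of trusted digests.

    known_hashes is a hypothetical database: file name -> expected
    SHA-1 hex digest, collected from a source you trust.
    Returns "ok", "mismatch" or "unknown".
    """
    expected = known_hashes.get(name)
    if expected is None:
        return "unknown"      # never seen before: needs manual review
    if sha1_of_bytes(data) == expected:
        return "ok"           # byte-for-byte identical to the trusted copy
    return "mismatch"         # same name, different content: suspicious
```

Anything that comes back as "mismatch" or "unknown" goes on the list for manual review; only "ok" libraries can be skipped.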

For a first step, I would take an abstract syntax tree (AST) parser, either for C#, VB or, even better, for the intermediate language, gather all function names, and list all unique string literals in a human-readable manner. This output can then be easily analysed for "http" or other keywords that indicate a connection string used to leak data. In the next step, this has to be atomized and checked against a library. The step after that is to put a "risk ranking" on each buzzword in the database. Then we would implement regex matching to search for patterns like IPv4 and IPv6 addresses, etc.
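A rough sketch of that first step, in Python. Note the shortcuts: it uses a regex instead of a real AST parser, it only understands simple double-quoted literals (no C# verbatim @"..." strings), and the keyword weights are made-up placeholders for the "risk ranking" database.

```python
import re

# Simple double-quoted literals with backslash escapes (not a real tokenizer).
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')
# Loose IPv4 shape; does not check that each octet is <= 255.
IPV4 = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
# Hypothetical risk weights per buzzword.
RISK_KEYWORDS = {"http": 2, "ftp": 2, "smtp": 3, "password": 3}

def unique_literals(source):
    """List every distinct string literal found in the source text."""
    return sorted(set(STRING_LITERAL.findall(source)))

def rank_literals(source):
    """Return (score, literal) pairs for risky literals, highest score first."""
    findings = []
    for literal in unique_literals(source):
        lowered = literal.lower()
        score = sum(w for word, w in RISK_KEYWORDS.items() if word in lowered)
        if IPV4.search(literal):
            score += 2
        if score:
            findings.append((score, literal))
    return sorted(findings, reverse=True)
```

Feeding it a whole source file gives a short, ranked list of literals to look at by hand, which is the "human-readable" output meant above.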

Maybe we end up looking into obfuscated code, like base64-encoded connection strings (a common method for hiding Trojans in PHP and JS code) or backdoor passwords. But even this is not new and therefore doable: see JS Beautifier, which can detect obfuscators, as most disassemblers can.
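The base64 case can be sketched like this (Python again; the 16-character minimum and the "http" check are arbitrary assumptions of mine to keep the noise down, not established thresholds):

```python
import base64
import binascii
import re

# At least 16 base64 characters plus optional padding (arbitrary cut-off).
BASE64_CANDIDATE = re.compile(r'[A-Za-z0-9+/]{16,}={0,2}')

def try_decode_base64(literal):
    """Return the decoded text if the literal is valid base64 holding UTF-8, else None."""
    if len(literal) % 4 != 0 or not BASE64_CANDIDATE.fullmatch(literal):
        return None
    try:
        return base64.b64decode(literal, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None

def hidden_urls(literals):
    """Map each base64 literal to its decoded text when that text mentions 'http'."""
    hits = {}
    for literal in literals:
        decoded = try_decode_base64(literal)
        if decoded and "http" in decoded.lower():
            hits[literal] = decoded
    return hits
```

Run it over the list of unique string literals; every hit is a literal that somebody took the trouble to encode, which is worth a closer look on its own.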

By analysing code that has already been optimized by a precompiler (assuming it folds constant concatenations), we may easily discover split strings like 'ht'+'tp'.
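If no precompiled output is at hand, the folding can be approximated on the source text. A minimal sketch, assuming plain double-quoted literals without escapes; since it is regex-based and not a tokenizer, it can be fooled by quotes inside comments or by unusual layouts:

```python
import re

# Two adjacent simple literals joined by '+', e.g. "ht" + "tp".
SPLIT_LITERALS = re.compile(r'"([^"\\\n]*)"\s*\+\s*"([^"\\\n]*)"')

def fold_concatenations(source):
    """Repeatedly merge "a" + "b" into "ab" until nothing changes."""
    while True:
        folded, count = SPLIT_LITERALS.subn(
            lambda m: '"' + m.group(1) + m.group(2) + '"', source
        )
        if count == 0:
            return source
        source = folded
```

For example, fold_concatenations('var u = "ht" + "tp" + "://x";') gives 'var u = "http://x";', so a later keyword scan sees the full "http".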

It should even be possible to "teach" a heuristic algorithm that a function like "saveToDisk" needs at least write permissions for local storage, which is indicated by the words "save" and "disk". Any other use of a function or external library should be reported.
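A toy version of that heuristic, to show the shape of it. The word-to-permission table and the permission names are invented for illustration; a real tool would learn or curate them.

```python
import re

# Hypothetical vocabulary: word in a function name -> permissions it justifies.
WORD_PERMISSIONS = {
    "save": {"storage.write"},
    "write": {"storage.write"},
    "disk": {"storage.write"},
    "read": {"storage.read"},
    "load": {"storage.read"},
    "send": {"network"},
    "upload": {"network"},
}

def split_identifier(name):
    """Split camelCase, PascalCase or snake_case: saveToDisk -> ['save', 'to', 'disk']."""
    parts = re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+', name)
    return [part.lower() for part in parts]

def expected_permissions(name):
    """Permissions that the words in the function name plausibly justify."""
    permissions = set()
    for word in split_identifier(name):
        permissions |= WORD_PERMISSIONS.get(word, set())
    return permissions

def unexpected_permissions(name, used):
    """Permissions the function actually uses that its name does not explain."""
    return set(used) - expected_permissions(name)
```

So a "saveToDisk" that is found to use both storage.write and network would get "network" reported, which is exactly the kind of mismatch a reviewer wants to see first.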

The heuristic search could be trained with well-reviewed and trusted code, just like a virus/malware scanner. In fact, such a tool would be nothing more than a virus scanner working at a higher abstraction level (e.g. IL or C# code).

Another feature of a real security audit tool would be long-term behaviour analysis of a program run in a sandbox where all network traffic and all file access are logged, so we know whether the program "reads our contacts" or "sends data to 3rd-party servers", etc.

As we can see, this is all a question of time, money and expertise.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
