How to remove non-ASCII characters from a string

Question

I receive a message from a remote device, but the string always contains some extra characters that make it invalid. For instance, if I receive the message hello, it is displayed with a heart, club, spade, or smiley face in front of the text. This breaks the operations that need to be performed on that string. How can I get rid of the non-ASCII characters in a string?

I don't know if I am looking at this totally wrong?

I need to get rid of that little image in front of the text.

Below I have posted an example of how the text looks before printing to the console, which shows a club before the text hello.

"|hello"

Another thing I have just noticed is that when I copy this value from the list while debugging, I just get the value "hello". So why do I see "|hello" in my list when debugging, but get hello when I copy the value?

Posted 24-May-12 22:11pm

Andrew797

2 solutions

Solution 1

You are probably looking at the problem the wrong way.
It depends on the remote device and how you are talking to it, but there is a good chance that the first character is part of the message structure rather than the message content. For example, it may be a start-of-message character, or a byte indicating the length of the data to follow. Look at where you get the data from: do you receive it as bytes and convert it to a string? If so, you need to check exactly what the remote device is sending, as there may be other characters as well, such as trailing ones.
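
One way to check what the device is really sending is to dump the raw bytes as hex before any string conversion. The sketch below is only an illustration: the class name `ByteDump` and the sample bytes are made up, and the real buffer would come from whatever read call you use. On a Windows console using the legacy OEM code page, control codes 0x01 to 0x06 are typically drawn as smiley, heart, diamond, club and spade glyphs, so symbols like those usually point to control bytes rather than text.

```csharp
using System;

static class ByteDump
{
    // Returns the bytes as hyphen-separated hex pairs, e.g. "05-68-65".
    // BitConverter.ToString leaves control bytes visible instead of
    // letting the console render them as odd glyphs.
    public static string ToHex(byte[] received)
    {
        return BitConverter.ToString(received);
    }

    static void Main()
    {
        // Hypothetical message: one control byte (0x05) followed by "hello".
        byte[] received = { 0x05, 0x68, 0x65, 0x6C, 0x6C, 0x6F };
        Console.WriteLine(ToHex(received)); // prints "05-68-65-6C-6C-6F"
    }
}
```

Comparing a few such dumps side by side should show whether the extra byte is constant, a counter, or a length.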

Posted 24-May-12 22:18pm

OriginalGriff

Comments

Andrew797 25-May-12 4:23am

Okay, I get what you are saying. Another problem is that sometimes it is in front of the text and other times at the end of the text. There is no specification stating that the bytes will have leading or trailing characters to mark where the message starts and ends.

OriginalGriff 25-May-12 4:30am

Ouch!
Then your only option is to remove all "odd" characters, which is a nasty task in Unicode. Do you get it as ASCII bytes?

Andrew797 25-May-12 4:37am

Yes, using the Encoding.ASCII.GetString() method. I was hoping I could avoid that process.

OriginalGriff 25-May-12 4:48am

Then do the compare and remove on the original ASCII bytes. It's a whole load simpler, as the test is basically char >= ' ' (space) AND char <= '~'.

I would probably log all received bytes for a couple of messages though, and look to see if there is any pattern to the "rubbish". It may be possible to do a more intelligent removal: if it is a length, then long strings will be prefixed by a valid printable character, which it would be nice to remove if you can identify it.
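
The printable-range test described above (space through '~', i.e. 0x20 to 0x7E) could be applied to the byte array before decoding, roughly as follows. This is a minimal sketch, not a definitive implementation: `AsciiCleaner`, `DecodePrintable` and the sample bytes are hypothetical names and data chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

static class AsciiCleaner
{
    // Keeps only printable ASCII bytes (0x20 ' ' through 0x7E '~'),
    // then decodes what is left with Encoding.ASCII.
    public static string DecodePrintable(byte[] received)
    {
        byte[] printable = received
            .Where(b => b >= 0x20 && b <= 0x7E)
            .ToArray();
        return Encoding.ASCII.GetString(printable);
    }

    static void Main()
    {
        // Hypothetical message: control byte 0x05 before "hello", 0x03 after it.
        byte[] received = { 0x05, 0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x03 };
        Console.WriteLine(DecodePrintable(received)); // prints "hello"
    }
}
```

Note that this range also drops tab, carriage return and line feed. If the device sends those as real content, the condition would need to allow them explicitly.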


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Volynsky Alex · Accepted Answer · 2012-05-25T10:29:00

I think the following posts can help you:
http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c
http://stackoverflow.com/questions/1342000/how-to-replace-non-ascii-characters-in-string
http://forums.devshed.com/asp-programming-51/regex-how-to-remove-non-ascii-characters-from-string-395202.html
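
If the data is already a string by the time you can touch it, a regular expression is one common way to strip everything outside the printable ASCII range. The sketch below is an assumption-laden illustration rather than the method from any of the linked posts: `StringCleaner` and `StripNonPrintable` are hypothetical names, and the character class simply mirrors the space-to-'~' range discussed in the comments above.

```csharp
using System;
using System.Text.RegularExpressions;

static class StringCleaner
{
    // Removes every character that is not printable ASCII
    // (anything outside U+0020 ' ' through U+007E '~').
    public static string StripNonPrintable(string text)
    {
        return Regex.Replace(text, @"[^\x20-\x7E]", string.Empty);
    }

    static void Main()
    {
        // Hypothetical input: control characters U+0005 and U+0003 around "hello".
        Console.WriteLine(StripNonPrintable("\u0005hello\u0003")); // prints "hello"
    }
}
```

Because Encoding.ASCII.GetString() maps bytes above 0x7F to '?', filtering after decoding cannot tell a genuine question mark from a replaced byte, so filtering the raw bytes is usually the safer place to do it.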