|
You don't need a Regex to remove a series of characters:
public static string MakeSafeFileName(string input)
{
    char[] invalidChars = Path.GetInvalidFileNameChars();
    int validCharsIndex = 0;
    // Worst case: every character is valid, so size the buffer to the input.
    char[] validChars = new char[input.Length];
    foreach (char c in input)
    {
        // Copy only the characters that are not in the invalid set.
        if (Array.IndexOf(invalidChars, c) == -1)
        {
            validChars[validCharsIndex] = c;
            validCharsIndex++;
        }
    }
    if (validCharsIndex == 0) return string.Empty;
    // Nothing was removed: return the original string without allocating.
    if (validCharsIndex == input.Length) return input;
    return new string(validChars, 0, validCharsIndex);
}
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
I've used Path.GetInvalidPathChars and Path.GetInvalidFileNameChars to sanitize file and path names on hundreds of thousands of files over the years and they've never let me down. Now that .NET has been open sourced (or whatever it has been), I thought it would be good to look at the source. Here's what I found. Each method points to an array:
public static readonly char[] InvalidPathChars = { '\"', '<', '>', '|', '\0', (Char)1, (Char)2, (Char)3, (Char)4, (Char)5, (Char)6, (Char)7, (Char)8, (Char)9, (Char)10, (Char)11, (Char)12, (Char)13, (Char)14, (Char)15, (Char)16, (Char)17, (Char)18, (Char)19, (Char)20, (Char)21, (Char)22, (Char)23, (Char)24, (Char)25, (Char)26, (Char)27, (Char)28, (Char)29, (Char)30, (Char)31 };
and
private static readonly char[] InvalidFileNameChars = { '\"', '<', '>', '|', '\0', (Char)1, (Char)2, (Char)3, (Char)4, (Char)5, (Char)6, (Char)7, (Char)8, (Char)9, (Char)10, (Char)11, (Char)12, (Char)13, (Char)14, (Char)15, (Char)16, (Char)17, (Char)18, (Char)19, (Char)20, (Char)21, (Char)22, (Char)23, (Char)24, (Char)25, (Char)26, (Char)27, (Char)28, (Char)29, (Char)30, (Char)31, ':', '*', '?', '\\', '/' };
So there you go ...
|
|
|
|
|
First, great tip. Thanks.
This is an assumption, so take it for what it's worth...
I'm assuming you need this because something, or someone, is trying to save a file and you want to make sure the path doesn't throw.
So if you modify the path, someone might not know the new path where it's being saved.
If a user were at the controls, would it not be better to show a warning saying "This path has some invalid characters"? Then the user could fix it and still know where their file ends up.
Just a thought.
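A minimal sketch of that warn-instead-of-modify idea, assuming a hypothetical helper named IsValidFileName (not part of the original tip). It only checks the name against Path.GetInvalidFileNameChars(); showing the warning is left to the caller.

```csharp
using System;
using System.IO;

public static class FileNameValidator
{
    // Hypothetical helper: returns true when the name is non-empty and contains
    // no character that Path.GetInvalidFileNameChars() reports on this platform.
    public static bool IsValidFileName(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        return name.IndexOfAny(Path.GetInvalidFileNameChars()) < 0;
    }
}
```

A caller could then do something like `if (!FileNameValidator.IsValidFileName(userInput)) { /* warn the user and let them fix it */ }`.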
If it's not broken, fix it until it is
|
|
|
|
|
I'd agree - and one way to do it would be to remove the illegal characters, then log the "old" and "new" values if they don't match so they can be tied up manually if necessary. (That's what my code is doing, anyway)
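Roughly, that could look like the sketch below. The SanitizeAndLog name and the Action<string> logging callback are assumptions for illustration, not the code referred to above; swap in whatever logger you actually use.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameSanitizer
{
    // Hypothetical helper: strips invalid file name characters and, if anything
    // was removed, passes a message with the old and new names to the callback.
    public static string SanitizeAndLog(string original, Action<string> log)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        string cleaned = new string(
            original.Where(c => Array.IndexOf(invalid, c) < 0).ToArray());

        if (!string.Equals(original, cleaned, StringComparison.Ordinal))
        {
            log($"File name changed: \"{original}\" -> \"{cleaned}\"");
        }
        return cleaned;
    }
}
```

For example, `FileNameSanitizer.SanitizeAndLog(name, Console.WriteLine)` writes a line only when the name had to be changed.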
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
|
|
|
|
It can be written more rationally
|
|
|
|
|
Please, do enlighten me - I'd be very interested to hear your thoughts on how to do that.
|
|
|
|
|
If you were processing a lot of strings, there might (will) be a better-performing solution than Regex. For cleaning bad characters out of a single file name, Regex works fine.
For a Windows Forms application, the better solution for file and path names would be to use the SaveFileDialog control, which does all path and file validation for you and will not allow the user to input invalid names or navigate to a directory they do not have permission for.
I ran tests a few times to get some timings. Setting the options to Compiled is slower, but if you blink you would not notice. The averages using your example:
With Compiled: 00:00:00.0007916
Without Compiled: 00:00:00.0000159
The advantage? Using the Path you get ALL the non-valid characters, not just the ones you can think of.
|
|
|
|
|
... quicker than looping Path.GetInvalidFileNameChars() and replacing.
|
|
|
|
|
RemoveAll makes sense as a String extension method too.
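As a rough illustration, an extension-method version might look like this. The signature is an assumption (the RemoveAll being discussed is not shown in this thread), and this sketch uses a StringBuilder rather than any particular approach from the tip.

```csharp
using System;
using System.Text;

public static class StringExtensions
{
    // Hypothetical signature: removes every occurrence of the given characters.
    public static string RemoveAll(this string source, params char[] charsToRemove)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (charsToRemove == null || charsToRemove.Length == 0) return source;

        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            // Keep only the characters that are not in the removal set.
            if (Array.IndexOf(charsToRemove, c) < 0) sb.Append(c);
        }
        // If nothing was removed, hand back the original instance.
        return sb.Length == source.Length ? source : sb.ToString();
    }
}
```

It could then be called as `fileName.RemoveAll(Path.GetInvalidFileNameChars())`.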
|
|
|
|
|
OriginalGriff wrote: ... not to mention wasteful, since it creates a new string for each character you try to remove.
You seem to imply that the regex would give you better performance, but wouldn't the regex engine create and discard even more strings while evaluating and replacing?
What is this talk of release? I do not release software. My software escapes leaving a bloody trail of designers and quality assurance people in its wake.
|
|
|
|
|
I don't think so - Regex doesn't seem to work like that: the NFA engine doesn't appear to "collect" substrings.
http://msdn.microsoft.com/en-us/library/e347654k(v=vs.110).aspx
Instead, it looks like this might be a poor idea for performance anyway: http://msdn.microsoft.com/en-us/library/gg578045(v=vs.110).aspx kinda implies that it might be better to use a fixed constant string rather than the concatenated version built from the char array that Path.GetInvalidFileNameChars returns.
Hadn't thought about that - might have to do some performance timing on this one, but that's gonna be complicated...maybe at the weekend...
Thanks for the idea - this needs a bit of thinking!
|
|
|
|
|
My regex knowledge is woefully lacking, so I'll take your word for that until I have time to read the pages you linked.
OriginalGriff wrote: might have to do some performance timing on this one
I did a very basic performance comparison (just running a few hundred thousand strings through each method and timing it) and the regex took about double the time of the chained Replace calls.
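For anyone who wants to repeat that kind of rough comparison, here is a sketch of a Stopwatch-based harness. It is not the code used for the numbers above: the class and method names are made up, and results will vary with machine, runtime, and input.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public static class RemovalBenchmark
{
    private static readonly char[] Invalid = Path.GetInvalidFileNameChars();

    // Regex built once, so construction cost is not part of the timing.
    private static readonly Regex InvalidRegex =
        new Regex("[" + Regex.Escape(new string(Invalid)) + "]");

    public static string WithRegex(string input) =>
        InvalidRegex.Replace(input, string.Empty);

    // Chained Replace calls: one pass over the string per invalid character.
    public static string WithReplace(string input)
    {
        foreach (char c in Invalid)
        {
            input = input.Replace(c.ToString(), string.Empty);
        }
        return input;
    }

    // Runs the method repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Func<string, string> method, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) method(input);
        return sw.Elapsed;
    }
}
```

Calling `RemovalBenchmark.Time(RemovalBenchmark.WithRegex, sample, 200000)` and the same for `WithReplace` gives two figures to compare; run each a few times and discard the first run to reduce JIT noise.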
|
|
|
|
|
|
Excellent trick. Liked it.
|
|
|
|
|
OG,
Very nicely written, with all the smileys, great content, and an enlightening use of Regex. I have bookmarked this one. I frequently got this error while unit testing.
My Thanks,
Rahul
|
|
|
|
|
|
OG, I recommend replacing:
Path.GetInvalidFileNameChars()
with:
new string (Path.GetInvalidFileNameChars()) + new string (Path.GetInvalidPathChars())
This would increase the utility of the tip.
Thanks,
/ravi
|
|
|
|
|
For Windows apps (which nearly all C# apps are) GetInvalidPathChars returns a subset of GetInvalidFileNameChars:
char[] fnOnly = Path.GetInvalidFileNameChars();
char[] pnOnly = Path.GetInvalidPathChars();
var diff1 = fnOnly.Except(pnOnly);
var diff2 = pnOnly.Except(fnOnly);
Console.WriteLine("{0} : {1}", diff1.Count(), diff2.Count());
Prints "5 : 0" on my system (Win 7)
So it doesn't add anything except visual complication and a sense of correctness
And concatenating strings is baaaaaad anyway!
|
|
|
|
|
Ah. Agreed on both counts!
/ravi
|
|
|
|
|
Instead of a fixed string why not use the character arrays defined in System.IO.Path to get the complete set of invalid characters:
Path.GetInvalidFileNameChars
Don't forget to Regex.Escape them when you create the expression as definitely \ is in the character list.
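A minimal sketch of that suggestion, with made-up class and method names. One caveat worth knowing: Regex.Escape does not escape ']' or '-', so wrapping the escaped text in a character class is only safe because neither character appears in the invalid-character lists shown in this thread; treat that as an assumption if you reuse this with another character set.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class InvalidCharRegex
{
    // Character class built once from the platform's invalid file name characters.
    // Regex.Escape protects metacharacters such as '\', '*', '?' and '|'.
    private static readonly Regex Pattern = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]");

    // Removes every invalid character from the supplied file name.
    public static string Clean(string fileName) =>
        Pattern.Replace(fileName, string.Empty);
}
```

Usage is then just `string safe = InvalidCharRegex.Clean(rawName);`.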
|
|
|
|
|
Because I didn't know it existed...
|
|
|
|
|
I've added your version - but it's a bit less readable!
Technically much, much better though.
|
|
|
|
|
|
You're welcome!
|
|
|
|
|