Another option, similar to Griff's, would be to process the file line by line without loading the whole file into memory.*
Read the file lazily to get an enumerable sequence of lines for the input, pass that through .Distinct(), which gives another enumerable sequence for the output.
Then you can write that sequence straight to an output file, or use a loop to write the lines out one at a time.
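The pipeline described above could be sketched roughly as follows. The specific calls are my assumption about what fits here, not something the text names: `File.ReadLines` for the lazy input and `File.WriteAllLines` for the output. `DedupeLines`, `inputPath` and `outputPath` are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DedupeLines
{
    // Streams inputPath line by line and writes each distinct line once.
    // Assumption: inputPath and outputPath are different files, since the
    // input is still open for reading while the output is being written.
    public static void Run(string inputPath, string outputPath)
    {
        // Lazy: lines are read only as they are requested.
        IEnumerable<string> input = File.ReadLines(inputPath);

        // Also lazy: nothing is read until the result is enumerated.
        IEnumerable<string> output = input.Distinct();

        // Enumerates the sequence and writes one line per element.
        File.WriteAllLines(outputPath, output);
    }
}
```

If you would rather write the lines yourself, a `foreach` over `output` with a `StreamWriter` should behave the same way.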
So now the exercise is to write a small class that implements an equality comparer for strings. It can split each string into its parts and use whatever a priori
information you have about them to decide whether two lines match (and ensure that matching inputs have the same HashCode).
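As a minimal sketch of such a class, assuming the interface meant here is `IEqualityComparer<string>` (which `.Distinct()` accepts as an optional argument): the matching rule below is entirely hypothetical. It assumes each line is comma-separated and that two lines match when their first field is the same, ignoring case. Substitute whatever rule fits your data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two lines are equal when their first
// comma-separated field matches, ignoring case.
public sealed class FirstFieldComparer : IEqualityComparer<string>
{
    // Extracts the part of the line that decides equality.
    private static string Key(string line)
    {
        int comma = line.IndexOf(',');
        return comma < 0 ? line : line.Substring(0, comma);
    }

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(Key(x), Key(y), StringComparison.OrdinalIgnoreCase);
    }

    // Must be derived from the same key as Equals, so that lines which
    // compare equal always produce the same hash code.
    public int GetHashCode(string obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Key(obj));
    }
}
```

You would then pass an instance to the call, for example `File.ReadLines(inputPath).Distinct(new FirstFieldComparer())`.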
There are a couple of other optimizations I can think of, but I'll leave those as "exercises".
* .Distinct() does internally build a representation that holds one entry for each unique string, but this is (potentially) much smaller than the whole file, and definitely smaller than holding both the whole-file collection and the .Distinct() internal representation at the same time.