Please see my comment to the question.
The problem is quite trivial, the only question is how efficient your search can be. First of all, your "in a way that I know which list it came from" means pretty big thing. As your solution will list all encountered words, and each word can be found in both lists or in a subset of lists (in case of just two, in one or another) "the way to know" would mean that each element of solution should contain the following components: 1) a word itself, 2) a set (or any other collection) of references to original lists. The whole solution would be something like
Dictionary<string, HashSet<string>> =
The key of your dictionary would represent a word, and the value type — the set of references to your original lists. I suggested the type
for purely "ideological" reasons: the order of references may not matter; but the reason to choose dictionary is more important: this is the performance of finding already added word you will need to do on each step, which will be of the time complexity
of O(1), in contrast to list's O(N).
Now, as you will need to perform the search many times, the
type is not the best choice. I would populate the data in
, which provides the time complexity
for search of O(1):
] (search method).
At the same time, if the lists are small, the overhead of re-population of data may not pay off, so using lists in all cases will be good enough; and you can use LINQ to Objects
You can iterate this collection is a usual way with the usual way,
, because this type implements
So, you would need to iterate one set, checking presence of each word in a second one and adding it in the solution dictionary. After that, you will need to do it with the second set, because some words may never appear in the first one. When you add data to the dictionary, you use the method
If the word is already found, you try to add the source set reference(s) to the value part of the dictionary element, if not, you add the element by the new key (source word).
That's all. Your overall time complexity will be of O(1), which is the fastest possible. However, the absolute time could be better for, say, lists, if the input data set is very small, but combination of set and dictionary will very quickly outperform any other solution as size of data grows.