Tip/Trick

Finding Items in a Collection that Match one of...

19 Oct 2016, CPOL, 1 min read
Use LINQ to get the members of a collection that match the members of another collection

Introduction

Recently, I answered a couple of Quick Answers questions in which the user wanted the items of one collection that match a condition against another collection (similar to SQL's IN operator). The trick to solving this is to reverse the usual order of the operands in the LINQ Where extension.

Using the Code

The normal structure of a Where predicate follows the pattern:

C#
myCollection.Where(item => item.Property == value); // '==' stands for whatever comparison suits the property

where Property is the property you want to match against a particular value, using whatever operator or method returns the required result. But what if value is a collection of values? The trick to extracting your items is to reverse the condition so that the values collection comes first and, depending on how the match is made, apply an Any, All, or Contains method to it. The basic pattern then becomes:

C#
myCollection.Where(item => values.Any(value => value == item.Property)); // or All/Contains, per the match required
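To make the pattern concrete, here is a minimal, self-contained sketch. The extension methods `WhereAnyValue` and `WhereAllValues` and their `matches` delegate are hypothetical helpers invented for illustration, not part of LINQ; each simply wraps the reversed `Where`/`Any` or `Where`/`All` call shown above, with the values collection as the left-hand operand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReversedWhere
{
    // Hypothetical helper: keep items for which at least one value matches.
    // Equivalent to source.Where(item => values.Any(value => matches(value, item))).
    public static IEnumerable<TItem> WhereAnyValue<TItem, TValue>(
        this IEnumerable<TItem> source,
        IEnumerable<TValue> values,
        Func<TValue, TItem, bool> matches)
    {
        return source.Where(item => values.Any(value => matches(value, item)));
    }

    // Hypothetical helper: keep items for which every value matches.
    // Equivalent to source.Where(item => values.All(value => matches(value, item))).
    public static IEnumerable<TItem> WhereAllValues<TItem, TValue>(
        this IEnumerable<TItem> source,
        IEnumerable<TValue> values,
        Func<TValue, TItem, bool> matches)
    {
        return source.Where(item => values.All(value => matches(value, item)));
    }
}
```

For example, `new[] { 1, 5, 12, 30 }.WhereAnyValue(new[] { 10, 20 }, (limit, n) => n > limit)` keeps 12 and 30, while `WhereAllValues` with the same arguments keeps only 30. When the match is plain equality, `values.Contains(item.Property)` does the same job as the `Any` form.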

Some Examples

Problem

Get all items from List1 that are one of the items in List2. Comparison should be case-insensitive. This is basically the same as a standard SQL IN clause:

SQL
SELECT Field FROM TABLE WHERE Field IN (value1, value2,....valueN)

Solution

C#
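using System;
using System.Collections.Generic;
using System.Linq;

var values = new List<string> { value1, value2, /* ... */ valueN }; // placeholders
var results = table
    .Select(rec => rec.field)
    .Where(field => values.Contains(field, StringComparer.CurrentCultureIgnoreCase));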
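A runnable version of the IN example, assuming a hypothetical `Row` record with a `Field` property in place of the table's records. `FieldsIn` follows the solution above; `FieldsInHashed` is an alternative (not part of the original tip) that loads the values into a `HashSet<string>` with the same case-insensitive comparer, so each lookup is a hash probe rather than a scan of the list. As the reader discussion at the end notes, that tends to pay off only once the value list is reasonably large.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type standing in for a row of "table".
public record Row(string Field);

public static class InClauseExample
{
    // Returns the Field of every row whose Field is one of the values,
    // ignoring case, mirroring SQL's "WHERE Field IN (...)".
    public static List<string> FieldsIn(IEnumerable<Row> table, IEnumerable<string> values)
    {
        return table
            .Select(rec => rec.Field)
            .Where(field => values.Contains(field, StringComparer.CurrentCultureIgnoreCase))
            .ToList();
    }

    // Alternative: build a case-insensitive HashSet once, then probe it per row.
    public static List<string> FieldsInHashed(IEnumerable<Row> table, IEnumerable<string> values)
    {
        var lookup = new HashSet<string>(values, StringComparer.CurrentCultureIgnoreCase);
        return table
            .Select(rec => rec.Field)
            .Where(field => lookup.Contains(field))
            .ToList();
    }
}
```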

Problem

Get all items whose inventory_code starts with one of a set of standard prefixes (strings).

Solution

C#
using System.Linq;

string[] prefixes = { "N01", "N02", "M01", "M02" };
var results = inventoryList
    .Where(item => prefixes.Any(prefix => item.inventory_code.StartsWith(prefix)));
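A self-contained sketch of the prefix example. `InventoryItem` is a hypothetical record standing in for the inventory type, and `PrefixExample.WithStandardPrefix` is a hypothetical wrapper. One deliberate deviation from the snippet above: it passes `StringComparison.Ordinal` to `StartsWith`, which compares characters exactly rather than by the current culture's rules, a common choice for fixed codes like these.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inventory type; only inventory_code matters here.
public record InventoryItem(string inventory_code);

public static class PrefixExample
{
    static readonly string[] Prefixes = { "N01", "N02", "M01", "M02" };

    // Keep items whose code starts with any of the standard prefixes.
    // The prefixes array is the left-hand operand; Any tests each prefix.
    public static List<InventoryItem> WithStandardPrefix(IEnumerable<InventoryItem> inventoryList)
    {
        return inventoryList
            .Where(item => Prefixes.Any(prefix =>
                item.inventory_code.StartsWith(prefix, StringComparison.Ordinal)))
            .ToList();
    }
}
```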

Problem

Get paragraphs that contain all of the pre-defined words.

Solution

This solution performs the trick twice. For each paragraph, the Where predicate starts with the searchWords array as the left-hand side (LHS) of the condition and applies All to it. Then, for each search word, the paragraph's words (produced by Split) are the LHS, and Contains checks whether the word is among them.

C#
using System;
using System.Linq;

string[] paragraphs = {
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy cat.\nNotice the substitution of 'cat' for 'dog'.",
    "The quick brown fox jumps over the lazy cat." };
string[] searchWords = { "FOX", "DOG" };
char[] wordDelimiters = { ' ', ',', '.', ';', ':', '\n', '"', '\'' };
// Keep paragraphs in which every search word appears as a whole word (case-insensitive)
var results = paragraphs.Where(paragraph => searchWords
    .All(word => paragraph.Split(wordDelimiters, StringSplitOptions.RemoveEmptyEntries)
        .Contains(word, StringComparer.CurrentCultureIgnoreCase)));
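The paragraph search can be wrapped in a method and checked against the sample data. This sketch (the `ParagraphSearch.ContainingAll` name is hypothetical) applies the same double reversal, but splits each paragraph once instead of once per search word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParagraphSearch
{
    static readonly char[] WordDelimiters = { ' ', ',', '.', ';', ':', '\n', '"', '\'' };

    // Returns the paragraphs in which every search word occurs as a whole word,
    // ignoring case. Each paragraph is split once, then All/Contains is applied.
    public static List<string> ContainingAll(IEnumerable<string> paragraphs, string[] searchWords)
    {
        return paragraphs
            .Where(paragraph =>
            {
                string[] words = paragraph.Split(WordDelimiters, StringSplitOptions.RemoveEmptyEntries);
                return searchWords.All(word =>
                    words.Contains(word, StringComparer.CurrentCultureIgnoreCase));
            })
            .ToList();
    }
}
```

With the sample paragraphs and `{ "FOX", "DOG" }`, it returns the first two paragraphs: the second qualifies because its note mentions 'dog'.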

As these examples show, using a pre-defined collection as the LHS operand in a Where predicate, although it may seem a little unnatural, can be a useful tool.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Australia
Been programming for 40 years now, starting when I was 13 on DEC PDP 11 (back in the day of paper tape storage, and hex switch boot procedures). Got right into micro-computers from an early age, with machines like the Dick Smith Sorcerer and the CompuColor II. Started CP/M and MS-DOS programming in the mid 1980's. By the end of the '80's, I was just starting to get a good grip on OOP (Had Zortech C++ V1.0).

Got into ATL and COM programming in early 2002. As a result, my gutter vocabulary has expanded, but it certainly keeps me off the streets.

Recently, I have had to stop working full time as a programmer due to permanent brain damage from a tumour (I just can't keep up the pace required to meet KPIs). I still like to keep my hand in, though, and will probably post more articles here as I discover various tricky things.

Comments and Discussions

 
Not most optimal
irneb, 21-Oct-16 0:15
Re: Not most optimal
Midi_Mick, 21-Oct-16 1:25
Re: Not most optimal
irneb, 21-Oct-16 2:44
I understand: the article is intended more as a LINQ tip, and for that it is definitely very good. I only mentioned this because I've seen so many people use such LINQ statements and then find they run extremely slowly. They then implement an imperative method using more conducive data structures, find it runs orders of magnitude faster, and use that to blame LINQ for the slowness, to the point of stipulating that LINQ "should never be used". Some even go so far as to claim that functional programming (i.e. what LINQ is attempting to bring into C#) is the culprit.

You're correct. If the search list is small, you'd probably see little if any speed improvement; it may even make things slower. I think the cutoff varies and could be anywhere between 5 and 500 items, so you'd need to profile to find where it lies.

For the word searches in paragraphs, it may help to generate a HashSet of the search-word list for each paragraph, i.e. just clone the standard word set. Then run through the paragraph, removing each of its words from the set. At the end, check whether the temporary set's Count is 0. Alternatively, you could do it the other way round: split the paragraph, build a HashSet from that, then use All on the word list to test whether every search word is in that set. My guess is that the first would perform better than the second, provided the clone operation is essentially a memory copy. But such on-the-fly set generation is definitely extra overhead, which may push the cutoff much higher before you see any improvement.
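The two set-based approaches described in this comment might look like the following sketch. The class and method names are hypothetical, and `HashSet<T>`'s copy constructor stands in for the "clone"; whether either version beats the plain LINQ query depends on the data, so profile before switching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetRemovalSearch
{
    static readonly char[] WordDelimiters = { ' ', ',', '.', ';', ':', '\n', '"', '\'' };

    // Approach 1: copy the search-word set, remove each paragraph word from the
    // copy, and report success once the copy is empty (stopping early if so).
    // The copy keeps the original set's comparer, e.g. case-insensitive.
    public static bool ContainsAllWords(string paragraph, HashSet<string> searchWords)
    {
        var remaining = new HashSet<string>(searchWords, searchWords.Comparer);
        foreach (string word in paragraph.Split(WordDelimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            remaining.Remove(word);
            if (remaining.Count == 0)
                return true;
        }
        return remaining.Count == 0;
    }

    // Approach 2: hash the paragraph's words, then test each search word with All.
    public static bool ContainsAllWordsViaParagraphSet(string paragraph, IEnumerable<string> searchWords)
    {
        var words = new HashSet<string>(
            paragraph.Split(WordDelimiters, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.CurrentCultureIgnoreCase);
        return searchWords.All(word => words.Contains(word));
    }
}
```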

For the variable-length prefix searches, I don't think a normal SortedSet would suffice; it has the same problem as a HashSet. My idea would be to create a custom tree structure (an N-ary tree) where each node is either a full string (at each leaf) or a partial string with a HashSet of child nodes. This way you'd get something like this:
-N
   -N0
      -N01
      -N02
   -N1
      -N11
      -N12
-M
   -M0
      -M01
      -M02
That way the prefixes can be any length. Your search would start with the first character of the item being filtered. Check whether that character is in the root's children set. If not, no prefix matches this item. If so, check whether the second character is in that node's children set, and repeat until you reach a node that has no more children. Only then have you found a full prefix.

Strictly, this would be O(N * K), where N is the number of items being filtered and K is the length of the longest prefix, at least if you use a HashSet or Hashtable for each node's list of children. If you use a sorted set instead, it's a bit more complicated, probably something like O(N * K * log(K)).
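The N-ary tree described here is essentially a trie. The sketch below is one way to build it, assuming per-character nodes with a `Dictionary<char, Node>` of children (rather than the comment's partial-string nodes) and an end-of-prefix flag instead of "no more children", so that overlapping prefixes such as N0 and N01 would both work. The names `PrefixTrie` and `HasPrefixOf` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A minimal prefix tree (trie): one node per character, with a flag marking
// nodes where a complete prefix ends. Children live in a Dictionary, giving
// the hash-based child lookup described in the comment.
public sealed class PrefixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool EndsPrefix;
    }

    private readonly Node _root = new Node();

    public PrefixTrie(IEnumerable<string> prefixes)
    {
        foreach (string prefix in prefixes)
        {
            Node node = _root;
            foreach (char c in prefix)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.EndsPrefix = true;
        }
    }

    // Walks the trie along the text's characters; returns true as soon as a
    // node that completes a prefix is reached, false if the path breaks off.
    public bool HasPrefixOf(string text)
    {
        Node node = _root;
        foreach (char c in text)
        {
            if (node.EndsPrefix) return true;
            if (!node.Children.TryGetValue(c, out node)) return false;
        }
        return node.EndsPrefix;
    }
}
```

Plugged into the prefix example, the filter becomes `inventoryList.Where(item => trie.HasPrefixOf(item.inventory_code))`, so each item costs at most one dictionary lookup per character of the longest prefix, in line with the O(N * K) estimate.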
Re: Not most optimal
Midi_Mick, 21-Oct-16 3:22
Re: Not most optimal
irneb, 21-Oct-16 4:10
Re: Thank you
Roy Ben Shabat, 20-Oct-16 23:22
Re: Thank you
Midi_Mick, 21-Oct-16 0:13
