
I have a list of keywords (sometimes with non-alphanumeric characters) that I’d like to find in a list of files. I can do that with the code below, but I want to avoid matching keywords if they are found inside another word, e.g.:

Keywords.csv:
Keywords
Lo.rem <-- Match if not prefixed by nor suffixed with a letter
is <-- Same
simply) <-- Match if not prefixed by a letter
printing. <-- Same
(text <-- Match if not suffixed with a letter
-and <-- Same

Files.csv:
Files
C:\AFolder\aFile.txt
C:\AFolder\AnotherFolder\anotherFile.txt
C:\AFolder\anotherFile2.txt

What I have tried:

Here's my code so far if useful:

PowerShell
$keywords = (((Import-Csv "C:\Keywords.csv" | Where Keywords).Keywords)-replace '[[+*?()\\.]','\$&') #Import list of keywords to search for
$paths = ((Import-Csv "C:\Files.csv" | Where Files).Files) #Import list of files to look for matching keywords
$count = 0

ForEach ($path in $paths) {
    $file = [System.IO.FileInfo]$path
    Add-Content -Path "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Value $file.FullName #Create a file in C:\Matches and insert the path of the file being searched

    $hash = @{}
    Get-Content $file |
        Select-String -Pattern $keywords -AllMatches |
        Foreach {$_.Matches.Value} |
        %{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} | #I don't remember what this does, probably fixes error messages I was getting
        Out-File -FilePath "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Append -Encoding UTF8 #Appends keywords that were found to the file created
    $count = $count +1
}
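
The hand-written character class in the first line of the script escapes only some of the regex metacharacters. A minimal sketch of an alternative, assuming the keywords are plain literals: the .NET method `[regex]::Escape` escapes every metacharacter for you. The `$rawKeywords` array is a hypothetical stand-in for the values read from Keywords.csv.

```powershell
# Hypothetical sample keywords, copied from the Keywords.csv shown in the question
$rawKeywords = @('Lo.rem', 'is', 'simply)', 'printing.', '(text', '-and')

# [regex]::Escape backslash-escapes the characters that are special in a .NET pattern
$escaped = $rawKeywords | ForEach-Object { [regex]::Escape($_) }

# Show the escaped keywords, one per line
$escaped
```

Characters with no special meaning outside a character class, such as the hyphen in `-and`, are left unchanged.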


I’ve tried playing with regex negative lookahead/lookbehind but did not get anywhere, especially since I’m a beginner in PowerShell, e.g.:

PowerShell
Select-String -Pattern "(?<![A-Za-z])$($keywords)(?![A-Za-z])" -AllMatches
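
One likely reason this attempt fails: an array expanded inside a double-quoted string is joined with spaces, so the lookarounds end up wrapped around a single long literal rather than an alternation. The sketch below shows the difference, using a hypothetical two-element `$keywords` array of already-escaped keywords.

```powershell
$keywords = @('Lo\.rem', 'is')   # hypothetical, already-escaped keywords

# An array expanded in a double-quoted string is joined with $OFS (a space by default)
$asTried = "(?<![A-Za-z])$($keywords)(?![A-Za-z])"

# Joining with '|' inside a non-capturing group builds a real alternation
$joined = "(?<![A-Za-z])(?:$($keywords -join '|'))(?![A-Za-z])"

$asTried   # (?<![A-Za-z])Lo\.rem is(?![A-Za-z])
$joined    # (?<![A-Za-z])(?:Lo\.rem|is)(?![A-Za-z])
```

The group matters: without it, the lookbehind would apply only to the first keyword and the lookahead only to the last.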


Any suggestions? Much appreciated
Posted
Updated 12-Jul-22 4:40am
Comments
Peter_in_2780 11-Jul-22 18:47pm
I don't know about PowerShell's regex engine, but all the others I know have atoms that match the edges of words. For example, "man grep" includes the following:
"The Backslash Character and Special Expressions: The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]]."
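For reference, PowerShell uses the .NET regex engine, which supports `\b`, `\B`, `\w` and `\W`. As far as I know, the grep-style `\<` and `\>` word anchors are not part of its syntax. A small sketch of `\b` in action, using a made-up sample sentence:

```powershell
$text = 'This is simply a thesis'   # hypothetical sample text

# \b matches at word boundaries, so only the standalone "is" is found
$whole = [regex]::Matches($text, '\bis\b')

# Without \b, "is" is also found inside "This" and "thesis"
$any = [regex]::Matches($text, 'is')

$whole.Count   # 1
$any.Count     # 3
```

Note that `\b` treats digits and the underscore as word characters too, which is slightly broader than the "letter" rule described in the question.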
Soskipic 12-Jul-22 10:46am
@Peter_in_2780 Thanks a lot for your suggestion. I've received an answer on Stack Overflow which works fine. Hope someone else in a similar situation may make use of your suggestion.
Soskipic 12-Jul-22 15:20pm
I've actually ended up combining both your suggestion and Stack Overflow's to come up with the exact desired outcome: Select-String -Pattern "\b($($keywords -join '|'))\b" -AllMatches

1 solution

Select-String -Pattern "\b($($keywords -join '|'))\b" -AllMatches
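
One caveat with the pattern above: `\b` only matches between a word character and a non-word character, so for keywords that begin or end with punctuation, such as `(text` or `simply)`, it requires a word character next to the punctuation. The sketch below is a possible alternative, not the posted solution. It adds a letter lookaround only on the ends of a keyword that are letters, following the rules listed in the question. `$rawKeywords` and `$sample` are hypothetical.

```powershell
# Hypothetical keywords, as listed in the question's Keywords.csv
$rawKeywords = @('Lo.rem', 'is', 'simply)', 'printing.', '(text', '-and')

$parts = foreach ($kw in $rawKeywords) {
    $p = [regex]::Escape($kw)
    # Guard only the ends of the keyword that are letters
    if ($kw -match '^[A-Za-z]') { $p = '(?<![A-Za-z])' + $p }
    if ($kw -match '[A-Za-z]$') { $p = $p + '(?![A-Za-z])' }
    $p
}
$pattern = $parts -join '|'

# Hypothetical sample line; "This" contains "is" but must not match
$sample = 'This is (text -and simply) printing. Lo.rem'
$found = ($sample | Select-String -Pattern $pattern -AllMatches).Matches.Value
$found
```

With this sample, each of the six keywords is found once, while the `is` inside `This` is skipped because it is preceded by a letter.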
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
