Hi all,
Here is a tricky regular expression question. I want to match a word, say "red", in the text body only, but NOT inside tags. For example, given "<font color="red">this is red, test red 123</font>", I only want to match the "red" occurrences in "this is red, test red", NOT the one in "<font color="red">". I also cannot remove the tags before or after the match.
Thanks,
Ning

how to NOT match text in tag
Kevinmac
Bill Brennan
Daniel,
two more points:
1. Your pattern
Regex regex = new Regex(@"(?<=\>[^\<]+)red");
will fail to match the first red in
<font color="red">red, test red 123</font>
because you used the "+" quantifier instead of "*".
2. You don't have to escape ">" and "<" when the characters are inside a character class:
Regex regex = new Regex(@"(?<=\>[^<]+)red");
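For anyone who wants to see point 1 in action, here is a minimal sketch that runs both quantifiers over the sample input. The class name QuantifierDemo and the helper MatchIndexes are made up for this illustration and are not from the original posts; the patterns are the look-behind from this thread with the unnecessary escapes dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original thread.
public static class QuantifierDemo
{
    // Returns the start index of every match of pattern in input.
    public static int[] MatchIndexes(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "<font color=\"red\">red, test red 123</font>";

        // "+" needs at least one character between '>' and "red",
        // so the "red" directly after the tag is skipped.
        int[] plus = MatchIndexes(@"(?<=>[^<]+)red", input);

        // "*" also allows zero characters, so both text occurrences match.
        int[] star = MatchIndexes(@"(?<=>[^<]*)red", input);

        Console.WriteLine("+ : " + string.Join(", ", plus)); // + : 28
        Console.WriteLine("* : " + string.Join(", ", star)); // * : 18, 28
    }
}
```

Neither variant matches the "red" inside the color attribute, because no ">" precedes it.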
ANS-Denver
Thanks guys,
I have sorted this out myself:
"red(?:[^>]*(<|$))"
Thanks,
Ning
Daniel Karanov
No, Daniel, you cannot [always] use the positive look-ahead to find those entries. For example, your logic will fail to match the first occurrence of RED in the following input:
RED <font color="red">this is red, test red 123</font>
For tasks like this you need to use a negative look-ahead OR look-behind, as I suggested earlier. It is more robust, especially when you are dealing with random chunks of HTML code.
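A rough sketch of that failure case, comparing the `(?<=>[^<]*)red` pattern from this thread with a negative look-ahead, `red(?![^<>]*>)`. Two things here are my assumptions rather than anything stated above: RegexOptions.IgnoreCase is turned on so that "RED" can match at all, and the names FragmentDemo and MatchIndexes are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original thread.
public static class FragmentDemo
{
    // Start indexes of all case-insensitive matches (IgnoreCase is an assumption).
    public static int[] MatchIndexes(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        // An HTML fragment that starts with plain text, not with a tag.
        string input = "RED <font color=\"red\">this is red, test red 123</font>";

        // Requires a preceding '>', so the leading RED is missed.
        int[] behind = MatchIndexes(@"(?<=>[^<]*)red", input);

        // Only rejects a "red" that is followed by '>' with no '<' or '>' in between,
        // i.e. one that sits inside a tag.
        int[] ahead = MatchIndexes(@"red(?![^<>]*>)", input);

        Console.WriteLine("positive: " + string.Join(", ", behind)); // positive: 30, 40
        Console.WriteLine("negative: " + string.Join(", ", ahead));  // negative: 0, 30, 40
    }
}
```

The negative form asks "am I inside a tag?" instead of "did a tag end before me?", which is why it does not depend on how the input begins.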
Marcin_Zawadzki
Sergei,
you're right, it must be a '*', not a '+'. But I still think that the positive look-ahead is a good choice. HTML will always begin with '<?xml', '<!DOCTYPE' or '<html', so you'll always have a tag at the beginning that ends with '>'.
--
Regards,
Daniel Kuppitz
Mike Batton
Try
red(?![^<>]*>)
with the Singleline option ON.
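A small sketch of this pattern on input where both the tag and the text contain line breaks. The sample string and the names LineBreakDemo and MatchIndexes are hypothetical. As far as I know, RegexOptions.Singleline only changes what "." matches, and this pattern contains no ".", so the sketch runs it with and without the option to compare.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original thread.
public static class LineBreakDemo
{
    // Start indexes of all matches of the negative look-ahead pattern.
    public static int[] MatchIndexes(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"red(?![^<>]*>)", options)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        // Line breaks inside the tag and inside the text (made-up sample).
        string input = "<font\n color=\"red\">this is red,\n test red 123</font>";

        int[] plain = MatchIndexes(input, RegexOptions.None);
        int[] single = MatchIndexes(input, RegexOptions.Singleline);

        // [^<>] is a negated character class, so it matches '\n' in both runs.
        Console.WriteLine("None:       " + string.Join(", ", plain));  // None:       27, 38
        Console.WriteLine("Singleline: " + string.Join(", ", single)); // Singleline: 27, 38
    }
}
```

Turning the option on does no harm; it just should not change the result for this particular pattern.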
srfitz2000
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string test = @"<font color=""red"">this is red, test red 123</font>";

            // Positive look-behind: "red" must be preceded by '>' and then
            // one or more characters that are not '<'.
            Regex regex = new Regex(@"(?<=\>[^\<]+)red");
            MatchCollection matches = regex.Matches(test);

            // Print the position and text of each match.
            foreach (Match m in matches)
            {
                Console.Write("Position/Index {0}: ", m.Index);
                Console.WriteLine(test.Substring(m.Index, m.Length));
            }
        }
    }
}
This will even work with line breaks in text and/or tags.
--
Regards,
Daniel Kuppitz