Regex match from hell

In the time I've spent wrestling with this, I could have coded an (inelegant) solution from scratch, but it's bugging me to distraction and I won't sleep until I figure it out.

I'm doing a conversion of some data pulled from a mainframe into a text file, turning it into an XML document. The rules are:

  1. Each line consists of a number of elements, separated by commas.
  2. Each element can either be a double-quote surrounded string, or nothing.
  3. The last element has no following comma.

So a line of data could look like this:

"Value1","Value2",,,"Value3",,"Value4","Value5"

I thought this would be easily handled with a regular expression:

(".*") ,

As usual, it was anything but easy. It all started to unravel once it occurred to me that the line could actually end with a trailing comma, if the final element is, in fact, nothing. I've gone back and forth with all sorts of grouping combinations, but the bottom line is I can't make it work, and I think I'm too close to it to see the (elegant) solution. Can somebody help? Thanks.
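
To make the failure concrete, here is a minimal sketch in Python (the language is an assumption, since none is named here). It uses the pattern above written without the space before the comma, which is a guess at what was intended. It also tries a variant that excludes quotes from the element body, so each match stays inside one element.

```python
import re

line = '"Value1","Value2",,,"Value3",,"Value4","Value5"'

# Greedy: .* runs as far right as it can, so a single match swallows
# everything up to the last quote that is followed by a comma.
greedy = re.findall(r'(".*"),', line)

# Bounded: [^"]* cannot cross a quote, so each match is one element.
# It still skips the empty elements and the last one (no comma after it).
bounded = re.findall(r'("[^"]*"),', line)

print(greedy)
print(bounded)
```

Neither form accounts for the empty elements or for the final element, which is where the grouping attempts kept going wrong.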




  • Abhishek_SE

    You may want to take a look at regexlib.com


  • Kamii47

    That pattern is similar to that of a CSV file. Why not just use an external CSV parser?

    http://www.codeproject.com/cs/database/CsvReader.asp

    That's one.

    If you want to write your own, go ahead, but I don't suggest doing that.




  • cunyalen

    Wasn't the point. I've already written a method to parse the files (not that big a deal). I'm just trying to understand regular expressions a little better, because I know my 25-line method could have been handled with one or two lines with a call to Regex...
  • NelG1

    Thanks for the link. Looked at the website. None of the patterns is an exact match, but the following was the closest:

    (("(\\\\|\\"|[^"])* ", )|([^"]* ,))|([^"]+)

    It leaves the trailing commas, but otherwise does what I need. Thanks.
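
For comparison, here is a minimal sketch of one way the three rules from the question could be handled without leaving the commas in the captured values. It is written in Python, which is an assumption, and the names FIELD and split_fields are hypothetical. It also assumes that a quoted value never contains a double quote. Each match is one element: a start-of-line or a comma, followed by an optional quoted string.

```python
import re

# Hypothetical pattern: an element starts at the beginning of the line or
# after a comma, and is either a quoted string or nothing at all.
FIELD = re.compile(r'(?:^|,)(?:"([^"]*)")?')

def split_fields(line):
    # group(1) is None for an empty element, so map it to "".
    return [m.group(1) or "" for m in FIELD.finditer(line)]

sample = '"Value1","Value2",,,"Value3",,"Value4","Value5"'
fields = split_fields(sample)
print(fields)

# A trailing comma means the final element is empty.
trailing = split_fields('"a",')
print(trailing)
```

Because the comma is matched outside the capture group, the values come back without quotes or separators, and a trailing comma simply yields one more empty element.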

