simple text processing e.g. regex search and replace

I've got an nvarchar(max) column that I need to transform with some simple text processing: insert some markup at the very beginning, and insert some markup just before a particular regular expression is matched (or at the end, if no match is found).

Since the SSIS expression language doesn't support anything like this, is a Script Component the only way to go Does Visual Basic .NET provide regular expression matching

Thanks!




Answer this question

simple text processing e.g. regex search and replace

  • LamptonWorm

    Duane Douglas wrote:
    Kevin Rodgers wrote:

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy

    did you try converting to DT_TEXT

    How would that help me DT_TEXT is not supported by the Regex component either, and is also limited to 4000 bytes (which in this case is 4000 characters).



  • Manoj Kumbhar

    Kevin, SQLIS.com has a great task to do this in a can here:http://www.sqlis.com/default.aspx 91

    I have a blog post here about how to do it through a script transform: http://www.whiteknighttechnology.com/cs/blogs/brian_knight/archive/2006/01/12/81.aspx

    If you find yourself needing to do this more than a few times, go with the transform that SQLIS has.

    -- Brian Knight



  • Anand Raman - MSFT

    Duane Douglas wrote:
    Kevin Rodgers wrote:
    Duane Douglas wrote:
    Kevin Rodgers wrote:

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy

    did you try converting to DT_TEXT

    How would that help me DT_TEXT is not supported by the Regex component either, and is also limited to 4000 bytes (which in this case is 4000 characters).

    actually, DT_TEXT is 2 GB, not 4000 bytes: http://msdn2.microsoft.com/en-us/library/ms141036.aspx

    Indeed, sorry for throwing that red herring in there.

    you can edit the component source code to support DT_TEXT

    If I could, I would. That's why I'm asking the experts here to do it (and submit the enhancement to Microsoft).



  • bslim

    Duane Douglas wrote:

    microsoft has a formal process for submitting suggestions. suggestions for sql server can be submitted here: http://connect.microsoft.com/feedback/default.aspx SiteID=68

    i would gladly vote for an appropriate suggestion for this issue.

    It took Microsoft only 4 hours to decide they won't do anything:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx FeedbackID=184284



  • Phil026

    Kevin Rodgers wrote:
    Duane Douglas wrote:
    Kevin Rodgers wrote:

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy

    did you try converting to DT_TEXT

    How would that help me DT_TEXT is not supported by the Regex component either, and is also limited to 4000 bytes (which in this case is 4000 characters).

    actually, DT_TEXT is 2 GB, not 4000 bytes: http://msdn2.microsoft.com/en-us/library/ms141036.aspx

    you can edit the component source code to support DT_TEXT



  • ColinZ

    KirkHaselden wrote:

    See SQLIS.com for an article about using REGEX in a custom component.

    http://www.sqlis.com/default.aspx 91

    That page only mentions validation and doesn't say whether it can be used for substring selection or replacement, which is what I need.



  • Spyder_Snyper

    Kevin Rodgers wrote:
    Duane Douglas wrote:

    microsoft has a formal process for submitting suggestions. suggestions for sql server can be submitted here: http://connect.microsoft.com/feedback/default.aspx SiteID=68

    i would gladly vote for an appropriate suggestion for this issue.

    It took Microsoft only 4 hours to decide they won't do anything:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx FeedbackID=184284

    ok. i suggest that you post this link in a new thread so that it's called to the attention to the microsoft ssis team. perhaps they will be of some assistance.

    good luck.



  • ClemensT

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy



  • David Cowell

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

  • Adil Hussain

    Kevin Rodgers wrote:

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy

    did you try converting to DT_TEXT

  • Taliesin*

    Kevin Rodgers wrote:
    Duane Douglas wrote:
    Kevin Rodgers wrote:
    Duane Douglas wrote:
    Kevin Rodgers wrote:

    Duane Douglas wrote:

    there is no need to use a script component or create a custom component because microsoft already provides a regex component for ssis: http://www.microsoft.com/downloads/details.aspx familyid=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en

    Unfortunately that component can only process DT_STR and DT_WSTR columns. Here's the actual error message:

    Component operates only on text input. Other types are forbidden.

    I can't convert my DT_NTEXT column to DT_WSTR for input to the Regex component because SSIS limits DT_WSTR columns to 4000 bytes (only 2000 characters, which is too lame for words). Is anyone here smart enough to enhance Regex.cs to handle DT_NTEXT columns, or am I at Microsoft's mercy

    did you try converting to DT_TEXT

    How would that help me DT_TEXT is not supported by the Regex component either, and is also limited to 4000 bytes (which in this case is 4000 characters).

    actually, DT_TEXT is 2 GB, not 4000 bytes: http://msdn2.microsoft.com/en-us/library/ms141036.aspx

    Indeed, sorry for throwing that red herring in there.

    you can edit the component source code to support DT_TEXT

    If I could, I would. That's why I'm asking the experts here to do it (and submit the enhancement to Microsoft).

    microsoft has a formal process for submitting suggestions. suggestions for sql server can be submitted here: http://connect.microsoft.com/feedback/default.aspx SiteID=68

    i would gladly vote for an appropriate suggestion for this issue.



  • Benorama

    See SQLIS.com for an article about using REGEX in a custom component.

    http://www.sqlis.com/default.aspx 91

    Kirk Haselden
    Author "SQL Server Integration Services"



  • simple text processing e.g. regex search and replace