What happens underneath the BulkCopy Object of ADO.NET 2.0

Hi ,

Does someknow how does new Bulkcopy work in ADO.NET 2.0

What happens underneath because of which the performance gets better with BulkCopy

Regards,

Anjana




Answer this question

What happens underneath the BulkCopy Object of ADO.NET 2.0

  • Abhishek Chadha

    There are a number of reasons why bulk-copy is faster. Here is a summary:

    • No per-row statement execution. When you do multiple inserts without bulk-copy, each insert is a statement in itself (regardless of whether it's batched together with other statements). With bulk-copy, we don't incur the cost of executing a statement for each row, the whole copy operation is a single thing.
    • No multiple network round-trips. Once the bulk-insert operation is setup, we send rows from the client to the server continously, without going back-and-forth over the wire.
    • Server storage engine also can greatly optimize how rows are inserted when performing a bulk-copy operation. How much can be optimized depends a lot on the recovery model the tarder database is set to; in "simple" and "bulk logged" the overhead of logging is greatly reduced during bulk-copy operations, helping a lot with performance.

    Now, as to "how" that happens, it's hard to describe without going down to the details of the SQL Server client-server protocol. The short story is that we setup a BCP operation by sending a special statement to the server that includes metadata about the row-set we're about to set; that switches the session to bulk-copy mode, at which point the client and start sending the row stream to the server one after the other; as rows come the server does minimal processing in the upper layers, rows go straight to the storage layer. Once the client is done it sends a "done" marker and the operation is completed in the server. This operation can happen in "batches" of rows or all at once.

    Hope this helps.

    Pablo Castro
    Program Manager - ADO.NET Team
    Microsoft Corporation



  • mig16

    Thanks Pablo..

    That was very helpful.

    Regards,

    Anjana



  • M.Glenn

    Hi,

    I'm using SqlBulkCopy class to copy the records from One table to another table. I want to count the no. of records copied during the bulkcopy. I tried with using SqlRowsCopied event, but it seems like the event will get trigger only based on NotifyAfter property i.e by setting a value for NotifyAfter.

    I dont want this to happen using NotifyAfter property, instead the event should get fired after all the records copied to the destination table.

    please share your ideas on this...

    Regards,
    Prabhu


  • DVAz

    Yeah, there should be another event that you can trap the invalid row, similar to the SqlRowsCopied event except you can access the invalid row and possibly try to correct it. It should have the SqlException or whatever as an inner exception, and also be able to specify whether to retry, skip, or abort (or maybe even try an update instead of an insert for that row) via the EventArgs similar to SqlRowsCopiedEventArgs.Abort.

    Possibly a wrapper class (since you can't inherit from it) could trap the exception, set batchsize and notifyafter = 1(brutal but hopefully effective), add an SqlRowsCopied event handler which will increment a row count and then retry, and when the exception occurs it could trigger this type of event, using the row count to extract the offending row from the original table, and on return after taking the specified action reset batchsize, notifyafter to their original values and remove the event handler. Problem is in this scenario, when an exception occurs on the current row, how do you know whether the SqlRowsCopied event was triggered for that row It appears that if the exception is SqlException then the event was triggered (which makes sense since that event comes from the server) and for other types of exception apparently not. This also assumes that in the original scenario (batchsize != 1) the entire batch was rolled back. This appears to be the case regardless of whether or not the UseInnerTransaction option is set. The documentation is very confusing on this, the examples contradict it and sometimes refer to a UseExternalTransaction option. As far as I can see setting the UseInnerTransaction option has no effect and the batch will always be rolled back.



  • LouArnold

    Wait just a second... SqlBulkCopy does not fire triggers... how do we "validate it after sending it" using triggers In the sql 2000 bulk copy or bcp operation, a hint FIRE_TRIGGERS is an option if you rely on table triggers. However, this becomes a major problem when ADO.Net 2.0 does not support FIRE_TRIGGERS as an option. How do we get around this issue

    thanks,

    Rob

  • raghu_grdr

    Thanks Pablo,

    I have few more doubts.

    What i undestand from your reply is..

    1.Client sends a statement to the server indicating to switch to bulk copy mode

    2. then probably server changes the mode and sends a response that the mode is changed

    3. then client sends the data stream

    4. server receives and does minimum processing and sends the data to the database

    5. once completed client sends a 'done' message to the server

    6.server switches back to normal mode.

    Correct me if my understanding is wrong somewhere..

    What i would like to know is what does it mean by " server does minimal processing in the upper layers, rows go straight to the storage layer". What happens otherwise

    Thanks in advance

    Regards,

    Anjana

     



  • Ron Liu

    Your high-level description of the sequence of events is correct.

    As for details on "server does minimal processing in the upper layers", here is some of the steps at a high-level that the server goes through for each batch/rpc request that's executed (I'm writing this off the top of my head so I'll probably miss something, but you'll get the idea):

    1. After the request is read parsed out from the network by an I/O thread it's turned into an internal representation and queued for execution
    2. One of the server worker tasks will pick it up
    3. Ad-hoc SQL requests need to be parsed; for RPCs the stored-proc name needs to be looked up. In either case, the request (along with some session-level information) is used to perform a look up in the cache
    4. If there is no compatible entry in the cache, then the query needs to go through various translation and optimization phases and the physical plan is created as a result
    5. Once the physical plan is there (from the cache or freshly built), the query execution infrastructure executes the query; the query execution code will interact with the storage engine as needed for reads (data, indexes, etc.) or for writes (for DML statements to affect data pages, index pages, the log, etc.)

    This process happens for each INSERT statement (which is the operation most closely comparable to bulk-insert) that you execute (if you put various INSERTs in a single batch them you save some cycles here because some of these are per-batch, or at least part of the cost is per batch).

    With bulk-insert, a lot of the query processing pipeline is bypassed once the session is receiving rows.

    In addition to this, when recovery model is set to simple or bulk-logged, even the work within the storage engine is simplified; extent logging is used instead of regular logging. I believe there is some details about this in SQL Server books online.

    Pablo Castro
    Program Manager - ADO.NET Team
    Microsoft Corporation



  • Thomas Petersen

    hi,

    Those were grt suggetions, but can i hav some more clarification, do i hav a provision to put validation for the data to be entered thrrough bulk copy, as its the real useful feture for me but i how can i ensure some validation(say.., i want value in col2=null if(col1==null) how can i ensure this)

    i cheked it by specying this in js/vb script file opeing during bulk copy wizard, but with no luck

    one more trial i made is ensured a triggerd to be fired on each new insert after, but its almost doubling the time.

    any suggetions frm ur side will be greatly appretiated.


  • Balbiesas

    I'm not sure which wizard you're mentioning, there is no wizard for ADO.NET SqlBulkCopy (note there ARE wizards for bulk-loading data into the server, they come with SQL Server; but those won't generate stand-alone C#/VB programs).

    If you want to validate the data sent to the server through SqlBulkCopy, you have two options:

    - Validate before sending it. Either by preprosessing the data before creating the input to BCP, or by intercepting the input. One way of doing this in a streaming fashion is to create your own DataReader class that consumes data from a real DataReader and simply exposes it again, but injects validations in-between. Creating a DataReader like this is relatively straightforward, but a bit time consuming because it has a lot of methods.

    - Validate after sending it. Either add a trigger to the table your copying to, or execute a SELECT statement to try to detect invalid rows.

    In all cases, validation will certainly slow-down the process. As you observed, triggers will have a measurable impact on the copy time. It's a trade-off between speed and functionality.

    Pablo Castro
    Program Manager - ADO.NET Team
    Microsoft Corporation



  • Zindros

    ok, so in a matter of speaking, if I insert 1000 rows using SqlBulkCopy, it will be one statement rather than 1000 statements (per row).

    Problem: Can I specify whether it should ignore duplicates It doesnt appear so.

    Assuming I have a unique constraint on a field, if one of my 1000 records has a match, the entire insert fails. Is there an option or switch somewhere to force it to ignore this Meaning, just dont insert that dup record, BUT continue inserting the other 999 valid records.

    Insight here would be appreciated.

  • What happens underneath the BulkCopy Object of ADO.NET 2.0