Hi ,
Does someknow how does new Bulkcopy work in ADO.NET 2.0
What happens underneath because of which the performance gets better with BulkCopy
Regards,
Anjana
Hi ,
Does someknow how does new Bulkcopy work in ADO.NET 2.0
What happens underneath because of which the performance gets better with BulkCopy
Regards,
Anjana
What happens underneath the BulkCopy Object of ADO.NET 2.0
Abhishek Chadha
There are a number of reasons why bulk-copy is faster. Here is a summary:
Now, as to "how" that happens, it's hard to describe without going down to the details of the SQL Server client-server protocol. The short story is that we setup a BCP operation by sending a special statement to the server that includes metadata about the row-set we're about to set; that switches the session to bulk-copy mode, at which point the client and start sending the row stream to the server one after the other; as rows come the server does minimal processing in the upper layers, rows go straight to the storage layer. Once the client is done it sends a "done" marker and the operation is completed in the server. This operation can happen in "batches" of rows or all at once.
Hope this helps.
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corporation
mig16
Thanks Pablo..
That was very helpful.
Regards,
Anjana
M.Glenn
Hi,
I'm using SqlBulkCopy class to copy the records from One table to another table. I want to count the no. of records copied during the bulkcopy. I tried with using SqlRowsCopied event, but it seems like the event will get trigger only based on NotifyAfter property i.e by setting a value for NotifyAfter.
I dont want this to happen using NotifyAfter property, instead the event should get fired after all the records copied to the destination table.
please share your ideas on this...
Regards,
Prabhu
DVAz
Yeah, there should be another event that you can trap the invalid row, similar to the SqlRowsCopied event except you can access the invalid row and possibly try to correct it. It should have the SqlException or whatever as an inner exception, and also be able to specify whether to retry, skip, or abort (or maybe even try an update instead of an insert for that row) via the EventArgs similar to SqlRowsCopiedEventArgs.Abort.
Possibly a wrapper class (since you can't inherit from it) could trap the exception, set batchsize and notifyafter = 1(brutal but hopefully effective), add an SqlRowsCopied event handler which will increment a row count and then retry, and when the exception occurs it could trigger this type of event, using the row count to extract the offending row from the original table, and on return after taking the specified action reset batchsize, notifyafter to their original values and remove the event handler. Problem is in this scenario, when an exception occurs on the current row, how do you know whether the SqlRowsCopied event was triggered for that row It appears that if the exception is SqlException then the event was triggered (which makes sense since that event comes from the server) and for other types of exception apparently not. This also assumes that in the original scenario (batchsize != 1) the entire batch was rolled back. This appears to be the case regardless of whether or not the UseInnerTransaction option is set. The documentation is very confusing on this, the examples contradict it and sometimes refer to a UseExternalTransaction option. As far as I can see setting the UseInnerTransaction option has no effect and the batch will always be rolled back.
LouArnold
thanks,
Rob
raghu_grdr
Thanks Pablo,
I have few more doubts.
What i undestand from your reply is..
1.Client sends a statement to the server indicating to switch to bulk copy mode
2. then probably server changes the mode and sends a response that the mode is changed
3. then client sends the data stream
4. server receives and does minimum processing and sends the data to the database
5. once completed client sends a 'done' message to the server
6.server switches back to normal mode.
Correct me if my understanding is wrong somewhere..
What i would like to know is what does it mean by " server does minimal processing in the upper layers, rows go straight to the storage layer". What happens otherwise
Thanks in advance
Regards,
Anjana
Ron Liu
Your high-level description of the sequence of events is correct.
As for details on "server does minimal processing in the upper layers", here is some of the steps at a high-level that the server goes through for each batch/rpc request that's executed (I'm writing this off the top of my head so I'll probably miss something, but you'll get the idea):
This process happens for each INSERT statement (which is the operation most closely comparable to bulk-insert) that you execute (if you put various INSERTs in a single batch them you save some cycles here because some of these are per-batch, or at least part of the cost is per batch).
With bulk-insert, a lot of the query processing pipeline is bypassed once the session is receiving rows.
In addition to this, when recovery model is set to simple or bulk-logged, even the work within the storage engine is simplified; extent logging is used instead of regular logging. I believe there is some details about this in SQL Server books online.
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corporation
Thomas Petersen
hi,
Those were grt suggetions, but can i hav some more clarification, do i hav a provision to put validation for the data to be entered thrrough bulk copy, as its the real useful feture for me but i how can i ensure some validation(say.., i want value in col2=null if(col1==null) how can i ensure this)
i cheked it by specying this in js/vb script file opeing during bulk copy wizard, but with no luck
one more trial i made is ensured a triggerd to be fired on each new insert after, but its almost doubling the time.
any suggetions frm ur side will be greatly appretiated.
Balbiesas
I'm not sure which wizard you're mentioning, there is no wizard for ADO.NET SqlBulkCopy (note there ARE wizards for bulk-loading data into the server, they come with SQL Server; but those won't generate stand-alone C#/VB programs).
If you want to validate the data sent to the server through SqlBulkCopy, you have two options:
- Validate before sending it. Either by preprosessing the data before creating the input to BCP, or by intercepting the input. One way of doing this in a streaming fashion is to create your own DataReader class that consumes data from a real DataReader and simply exposes it again, but injects validations in-between. Creating a DataReader like this is relatively straightforward, but a bit time consuming because it has a lot of methods.
- Validate after sending it. Either add a trigger to the table your copying to, or execute a SELECT statement to try to detect invalid rows.
In all cases, validation will certainly slow-down the process. As you observed, triggers will have a measurable impact on the copy time. It's a trade-off between speed and functionality.
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corporation
Zindros
Problem: Can I specify whether it should ignore duplicates It doesnt appear so.
Assuming I have a unique constraint on a field, if one of my 1000 records has a match, the entire insert fails. Is there an option or switch somewhere to force it to ignore this Meaning, just dont insert that dup record, BUT continue inserting the other 999 valid records.
Insight here would be appreciated.