Hi,
In order to be scalable we have two servers running our workflows. Either server should be able to take over at any time. The workflows running on the servers make asynchronous requests via MSMQ to other services. The replies should be able to come back to either server and still be processed.
What happens is that when the reply comes back to a server which did not make the original request (and then waited on an event), that server cannot fire the event on the persisted workflow, because it's still "locked" for use by the workflow. I have configured the workflow instance ownership timeout to 2 seconds, and the response takes longer than that to come back to the server, so I would expect the lock to have expired by then.
I'm using the released version of netfx3, and I've tried both the constructor that takes TimeSpans and the constructor that takes a NameValueCollection.

OwnershipTimeoutSeconds does not work
Wow, THE Jon Flanders responding to my post.... Yeah, it's definitely persisting; it takes quite a while to get a response back from the services we call. We use an external data exchange service to call another service facade using MSMQ (using WCF, not that it makes a difference). Then we immediately wait for the event using the wait for external event shape. At this point it persists. We actually see it in the tracking db before a response needs to be handled.
When the message comes back into the other server (we configured our machines to force the scenario, with the request-reply messaging pattern) it throws the well known exception that it can't trigger the event on the workflow (can't get the exact message now, I'm at home). In the persistence db the WF instance is still marked as locked. This is a concern because I do not want to bind a particular address to handle the response. Any host should be able to receive it and load the persisted workflow to fire the event on it, so that we can scale out as needed.
This was picked up in stress testing. We need to go to production pretty soon; we've been working with WF for the whole year and know it inside and out by now. According to the parameters the constructor takes, any WF runtime should be able to fire the event on the flow. It's a bit weird, and a huge concern for our team.
Shameless self promotion (the only way I can maybe get Jon F to look at my code):
For interest's sake, have a look at this custom ifelse activity we wrote: http://dotnet.org.za/hendrik/archive/2006/07/04/53953.aspx
We did not want our business users to have to learn code to direct the flow that we expose to them. So we developed a custom ifelse activity which takes activities as conditions (on the branches), and of course these activities can take input criteria themselves.
Thanks!
Hendrik
Yeah, the UnloadOnIdle flag is definitely set.
I've tried it with the following constructors:
NameValueCollection parameters = new NameValueCollection();
parameters.Add("ConnectionString", "theconnstring");
parameters.Add("OwnershipTimeoutSeconds", "1");
parameters.Add("UnloadOnIdle", true.ToString());
parameters.Add("LoadIntervalSeconds", "30");
SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(parameters);
and
NameValueCollection parameters = new NameValueCollection();
parameters.Add("ConnectionString", "theconnstring");
parameters.Add("UnloadOnIdle", true.ToString());
parameters.Add("LoadIntervalSeconds", "30");
SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(parameters);
and
SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(connstr, true, TimeSpan.FromMilliseconds(1), TimeSpan.FromSeconds(1));
I also subscribe to the WF runtime's WorkflowUnloaded event. This event is definitely firing, but if you go and look in the InstanceState table the instance has the following values:
unlocked = 1
blocked = 1
Make your ownership timeout something larger (like 1000 seconds). Try that.
I found this a few weeks ago but haven't had a chance to determine whether it is a bug or by design. If Joel or someone else from MS can tell us, that would be great - but I found that if the ownership timeout was set to a very small value, the instance stayed locked when it shouldn't have.
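For reference, a minimal sketch of what that suggestion might look like with the four-argument constructor used earlier in the thread. The class and method names (HostSetup, CreateRuntime) are hypothetical, and the 1000-second and 30-second values are just the figures mentioned in this thread, not tested recommendations:

```csharp
using System;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

// Hypothetical host helper; names are for illustration only.
class HostSetup
{
    static WorkflowRuntime CreateRuntime(string connectionString)
    {
        WorkflowRuntime runtime = new WorkflowRuntime();

        // Same constructor as in the earlier post, but with a much larger
        // ownership duration instead of 1 ms / 1 s.
        SqlWorkflowPersistenceService persistence = new SqlWorkflowPersistenceService(
            connectionString,
            true,                        // unloadOnIdle
            TimeSpan.FromSeconds(1000),  // instanceOwnershipDuration (value suggested above)
            TimeSpan.FromSeconds(30));   // loadingInterval (same as LoadIntervalSeconds = 30 above)

        runtime.AddService(persistence);
        runtime.StartRuntime();
        return runtime;
    }
}
```

Whether this clears the lock in the two-server scenario is exactly the open question here, so treat it as something to try rather than a confirmed fix.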