OwnershipTimeoutSeconds does not work

Hi,

To scale out, we have two servers running our workflows. Any server should be able to take over at any time. The workflows running on the servers make asynchronous requests via MSMQ to other services. The replies should be able to come back to either server and still be processed.

What happens is that when the reply comes back to a server which did not make the original request (and then wait on an event), that server cannot fire the event on the persisted workflow, because the instance is still "locked" for use by the workflow. I am currently configuring the workflow instance ownership timeout to be 2 seconds, and it takes longer than that before the response comes back to the server.

I'm using the released version of netfx3, and I've tried using the constructor that takes timespans and the constructor that takes a NameValueCollection.




  • N5GE

    Are your workflows actually persisting between messages? That will depend largely on the design of your activities that listen for messages and whether or not you have PersistOnIdle set to true.

  • Saurabh G

    Are you sure you have the UnloadOnIdle flag set to true on the SqlWorkflowPersistence provider? Unloading is a different concept from persisting: an instance can be persisted in the database while still being loaded in memory. Instances are removed from memory only when they are "unloaded", which always happens after a persistence operation. But a mere persistence operation doesn't mean the instance was unloaded.
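
    To see the difference on a running host, one option is to log the runtime's lifecycle events and check whether an unload actually follows each persist. This is only a minimal sketch: the class and method names (LifecycleLogging, Attach, Log) are made up for illustration, and it assumes you call Attach before starting the runtime.

    ```csharp
    using System;
    using System.Workflow.Runtime;

    // Hypothetical helper, not part of WF: logs idle/persist/unload per instance.
    static class LifecycleLogging
    {
        public static void Attach(WorkflowRuntime runtime)
        {
            runtime.WorkflowIdled += delegate(object sender, WorkflowEventArgs e) { Log("idled", e); };
            runtime.WorkflowPersisted += delegate(object sender, WorkflowEventArgs e) { Log("persisted", e); };
            runtime.WorkflowUnloaded += delegate(object sender, WorkflowEventArgs e) { Log("unloaded", e); };
        }

        private static void Log(string what, WorkflowEventArgs e)
        {
            // An instance that logs "persisted" but never "unloaded" is still held in memory by this host.
            Console.WriteLine("{0:o} {1} {2}", DateTime.UtcNow, e.WorkflowInstance.InstanceId, what);
        }
    }
    ```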

  • jkeele

    Wow, THE Jon Flanders responding to my post.... Yeah, it's definitely persisting, it takes quite a while to get a response back from the services we call. We use an external data exchange service to call another service facade using msmq (using WCF - not that it makes a difference). Then we immediately wait for the event using the wait for external event shape. At this point it persists. We actually see it in the tracking db before a response needs to be handled.

    When the message comes back into the other server (we configured our machines to force the scenario - with the request-reply messaging pattern) it throws the well known exception that it can't trigger the event on the workflow (can't get the exact message now, I'm at home). In the persistence db the wf instance is still marked as locked. The reason this is a concern is because I do not want to bind a particular address to handle the response. Any host should be able to get it and unpersist the workflow to fire the event on it, so that we can scale out as needed.

    This was picked up in stress testing. We need to go to production pretty soon, we've been working with WF for the whole year and know it in and out by now. According to the parameters the constructor takes, any wf runtime should be able to fire the event on the flow. It's a bit weird, and a huge concern for our team.

    Shameless self-promotion (the only way I can maybe get Jon F to look at my code):

    For interest's sake, have a look at this custom ifelse activity we wrote: http://dotnet.org.za/hendrik/archive/2006/07/04/53953.aspx
    We did not want our business users to have to learn code to direct the flow that we expose to them. So we developed a custom ifelse activity which takes activities as conditions (on the branches), and of course these activities can take input criteria themselves.

    Thanks!

    Hendrik


  • Asassin

    Hmmm, false alarm. It seems that we had a mismatch between the workflows on the two different machines, which caused the error. It's only on closer inspection that we realized this. When we fixed it, it worked just fine.

  • Álvaro Peñarrubia

    In the future, always remember to subscribe to the "ServicesExceptionNotHandled" event on the WorkflowRuntime. This will surface issues like this one, and in production it will help you debug problems more quickly.
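
    Roughly, the subscription could look like the sketch below. The class and method names (RuntimeDiagnostics, Attach) are placeholders of mine, and writing to Console.Error is just an assumption; swap in whatever logging your host uses.

    ```csharp
    using System;
    using System.Workflow.Runtime;

    // Hypothetical helper, not part of WF.
    static class RuntimeDiagnostics
    {
        // Call this before runtime.StartRuntime() so exceptions raised while services start are reported too.
        public static void Attach(WorkflowRuntime runtime)
        {
            runtime.ServicesExceptionNotHandled +=
                delegate(object sender, ServicesExceptionNotHandledEventArgs e)
                {
                    // Exceptions thrown inside runtime services (persistence, etc.) that nothing else handled.
                    Console.Error.WriteLine(
                        "Unhandled service exception for instance {0}: {1}",
                        e.WorkflowInstanceId,
                        e.Exception);
                };
        }
    }
    ```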

  • robinjam

    Yeah, the UnloadOnIdle flag is definitely set.

    I've tried it with the following constructors:

    NameValueCollection parameters = new NameValueCollection();
    parameters.Add("ConnectionString", "theconnstring");
    parameters.Add("OwnershipTimeoutSeconds", "1");
    parameters.Add("UnloadOnIdle", true.ToString());
    parameters.Add("LoadIntervalSeconds", "30");
    SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(parameters);

    and

    NameValueCollection parameters = new NameValueCollection();
    parameters.Add("ConnectionString", "theconnstring");
    parameters.Add("UnloadOnIdle", true.ToString());
    parameters.Add("LoadIntervalSeconds", "30");
    SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(parameters);

    and

    SqlWorkflowPersistenceService persistenceService = new SqlWorkflowPersistenceService(
        connstr, true, TimeSpan.FromMilliseconds(1), TimeSpan.FromSeconds(1));

    I also subscribe to the wf runtime's WorkflowUnloaded event. This event is definitely firing, but if you go and look in the InstanceState table the instance has the following values:

    unlocked = 1
    blocked = 1


  • SQLme

    So if your instances are marked as "locked", then it means the instance is still running from the POV of the WorkflowRuntime. If an instance on ServerA is idle (meaning there are no activities currently scheduled), it should persist if you have UnloadOnIdle = true. So I am confused why they are locked in memory.

  • DatabaseOgre

    Make your ownership timeout something larger (like 1000 seconds). Try that.

    I found this a few weeks ago but haven't had a chance to determine if this is a bug or by design. If Joel or someone else from MS can inform us, that would be great - but I found that if the ownership timeout was set to a very small value, it caused a lock when it shouldn't have.
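
    For what it's worth, here is a minimal sketch of that setup using the four-argument constructor already shown in this thread. HostSetup and Create are names I made up, and the 1000 second ownership duration and 30 second loading interval are just the values mentioned above, not recommendations.

    ```csharp
    using System;
    using System.Workflow.Runtime;
    using System.Workflow.Runtime.Hosting;

    // Hypothetical helper, not part of WF.
    static class HostSetup
    {
        public static WorkflowRuntime Create(string connectionString)
        {
            WorkflowRuntime runtime = new WorkflowRuntime();

            SqlWorkflowPersistenceService persistence = new SqlWorkflowPersistenceService(
                connectionString,
                true,                         // unloadOnIdle
                TimeSpan.FromSeconds(1000),   // instanceOwnershipDuration (value suggested above)
                TimeSpan.FromSeconds(30));    // loadingInterval (same as LoadIntervalSeconds = 30 earlier)

            // Services have to be added before the runtime is started.
            runtime.AddService(persistence);
            return runtime;
        }
    }
    ```

    Every host would call this with the same connection string so that all of them share one persistence database.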


