
microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework

License: MIT License

C# 86.76% TSQL 12.20% PowerShell 0.91% Dockerfile 0.13%

durabletask-mssql's Introduction

Durable Task SQL Provider

Microsoft SQL Provider for the Durable Task Framework and Durable Functions


The Microsoft SQL provider for the Durable Task Framework (DTFx) and Azure Durable Functions is a storage provider that persists all task hub state in a Microsoft SQL database, which can be hosted in the cloud or in your own infrastructure.

The key benefits of this storage provider include:

  • Data portability: This provider supports both Microsoft SQL Server and Azure SQL Database. Microsoft SQL Server is supported by all major cloud providers and can also be run in your own infrastructure. Because the data is stored in a single database, you can easily back up the data and migrate it to a new server or service as necessary.

  • Data control: You have full control over the database and its logins, and direct access to the runtime data, making it easy to protect and secure as necessary. Microsoft SQL also has strong support for encryption and business continuity, helping any apps you build meet the compliance requirements of your enterprise.

  • Multitenancy: Multiple applications can share the same database in a way that isolates the data between each app using low-privilege SQL login credentials.

  • 3rd party app integrations: This provider comes with a set of stored procedures, SQL functions, and views that allow you to easily integrate Durable orchestrations and entities into your existing SQL-based applications.
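As an illustration of the integration story, a SQL-based application can read orchestration status directly from the task hub schema. The sketch below is illustrative only: the `dt.Instances` object and its column names are assumptions based on this provider's schema and may differ by version, so check the deployed `dt` schema before relying on them.

```sql
-- Sketch: list the ten most recently created orchestrations in a task hub.
-- Object and column names are assumptions; verify against the deployed dt schema.
SELECT TOP (10)
    I.[InstanceID],
    I.[Name],
    I.[RuntimeStatus],
    I.[CreatedTime]
FROM dt.Instances AS I
ORDER BY I.[CreatedTime] DESC;
```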

Downloads

The Durable SQL provider for Durable Functions and DTFx is available as a set of NuGet packages.

| Package | Description |
| --- | --- |
| Microsoft.Azure.Functions.Worker.Extensions.DurableTask.SqlServer | Use this package if using Azure Durable Functions with the .NET out-of-process worker. |
| Microsoft.DurableTask.SqlServer.AzureFunctions | Use this package if building serverless Function apps with Azure Durable Functions (for everything except the .NET out-of-process worker). |
| Microsoft.DurableTask.SqlServer | Use this package if using DTFx to build .NET apps. |
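For the DTFx package, wiring up the SQL provider looks roughly like the following. This is a sketch: the exact constructor overloads of `SqlOrchestrationServiceSettings` and the connection string are assumptions, so consult the package documentation before use.

```csharp
// Sketch only: API names are from the DurableTask.SqlServer package as commonly used;
// verify constructor overloads and options against the package documentation.
using System.Threading.Tasks;
using DurableTask.Core;
using DurableTask.SqlServer;

class Program
{
    static async Task Main()
    {
        // Placeholder connection string; point this at your own database.
        var settings = new SqlOrchestrationServiceSettings(
            "Server=localhost;Database=DurableDB;Integrated Security=true");

        var service = new SqlOrchestrationService(settings);
        await service.CreateIfNotExistsAsync();   // provisions the dt schema if missing

        var worker = new TaskHubWorker(service);
        // worker.AddTaskOrchestrations(typeof(MyOrchestration));
        // worker.AddTaskActivities(typeof(MyActivity));
        await worker.StartAsync();
    }
}
```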

Documentation

Want to learn more? Detailed information about this provider and getting started instructions can be found here.

If you use Azure Durable Functions and want to learn more about all the supported storage provider options, see the Durable Functions Storage Providers documentation.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

Running Tests

Tests will attempt to connect to an instance of SQL Server installed on the local machine. When running on Windows, you'll need to ensure that SQL Server Mixed Mode Authentication is enabled.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

durabletask-mssql's People

Contributors

andreirr24, bachuv, bhugot, cgillum, chiangvincent, davidmrdavid, dependabot[bot], dmetzgar, greybird, hsnsalhi, igx89, jaah, jasonwun, jviau, matei-dorian, microsoft-github-policy-service[bot], mivano, surgupta-msft, tompostler, tsuyoshiushio, usemam, wsugarman


durabletask-mssql's Issues

Input payload duplication when IncludeParameters is set to true

When setting TaskHubWorker.TaskOrchestrationDispatcher.IncludeParameters to true, we store two copies of input payloads for activities and sub-orchestrators into the dt.Payloads table, one for the task message and one for the history record. This is a waste of space and I/O. Ideally both the task messages and the history events should reference the same data payload record.

Durable Entities - State Storage Location

Hi,

I'm testing out the new Azure SQL storage provider for Durable Functions, but I can't seem to find a table where the state of my Durable Entities is being persisted?

KR,
Dustyn

Timer events fire before completion

When there are multiple events, all timers fire before reaching their scheduled time, and the related tasks end up marked as completed.

I think the issue originates in the _LockNextOrchestration stored procedure: while the condition at line 38 prevents selecting an orchestration that is awaiting a single timer event before its scheduled time, it does not prevent selecting orchestrations that have multiple pending events (possibly including some timers).

To resolve the issue, I suggest adding the same condition at lines 65 and 98.

Thanks.
Gaetano.

Dynamic concurrency throttles

Concurrency settings are currently static and default to the number of cores for both activity tasks and orchestrations. For simple "hello, cities" scenarios, this often results in underutilized CPU, based on recent performance tests using Azure Functions.

The Azure Functions runtime team is working on a dynamic concurrency throttle feature that would dynamically determine an ideal concurrency limit based on a variety of factors, including CPU usage, memory, and possibly network connectivity. This work requires individual triggers to participate in the throttling behavior to ensure messages don't get picked up and then throttled in-memory. This issue tracks integration with this Azure Functions throttling feature, when it is available.

In the case of DTFx apps, an extension point will be made available to allow for custom concurrency throttling implementations.

NOTE: It's not yet clear whether the throttle participation work would need to happen in the Durable SQL provider or if it can happen in the core Durable Task Framework dispatcher, in which case different backends would benefit automatically. If it's the latter, then this issue can be migrated to the Azure/durabletask repo.

GetScaleRecommendation query issue

Hi @cgillum

I am currently using the mssql scaler with KEDA and my durable function application by following the process described here:
https://keda.sh/docs/2.0/migration/
https://microsoft.github.io/durabletask-mssql/#/

Everything works as expected except the dt.GetScaleRecommendation scaler function. When there are no events/requests it returns a value of 0. This causes the number of pods in the cluster to scale down to 0 which in turn makes the cluster unavailable when hitting the external IP as there are no instances of the durable function running to process the request.

To remedy this I added a line before the return statement in the script:

```sql
IF (@recommendedWorkersForOrchestrations + @recommendedWorkersForActivities) < 1
    SET @recommendedWorkersForOrchestrations = 1;
```

Which simply ensures that a minimum of 1 is returned by the scaler function. I'm not sure if this is best practice but it immediately fixed the issue and the cluster is scaling up and down (to a minimum of 1 pod) perfectly.
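An alternative that avoids patching the stored procedure is to set a replica floor on the KEDA side: the `ScaledObject` resource supports a `minReplicaCount` field. A minimal sketch, with placeholder app and connection names:

```yaml
# Sketch: keep at least one replica running even when the scaler returns 0.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-durable-function-app   # placeholder name
spec:
  scaleTargetRef:
    name: my-durable-function-app
  minReplicaCount: 1              # floor enforced by KEDA, not by the SQL function
  triggers:
  - type: mssql
    metadata:
      query: SELECT dt.GetScaleRecommendation(10, 1)
      targetValue: "1"
      connectionStringFromEnv: SQLDB_Connection
```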

I've opened this ticket in response to your comment in:
kedacore/keda-external-scaler-azure-durable-functions#18

Please let me know if I should provide any more information and thanks for the great scaler!

Cheers,
Sam

Execution slowed by deadlocks when using slow Kubernetes cluster

The following exceptions were seen in the Docker logs when running a 100 orchestration "HelloSequences" test on a Kubernetes cluster.

warn: DurableTask.SqlServer[308]
      20210413-111157-0000000000000019: A transient database failure occurred and will be retried. Current retry count: 0. Details: Microsoft.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 102) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
         at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
         at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
         at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
         at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
         at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
         at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
         at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
      --- End of stack trace from previous location where exception was thrown ---
         at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /durabletask-mssql/src/DurableTask.SqlServer/SqlUtils.cs:line 420
      ClientConnectionId:1bfd1928-097b-4daf-8bff-59a0c90cd87c
      Error Number:1205,State:51,Class:13.

Deadlocks are not normally seen when running on fast hardware, so this might be something that was missed during local testing. Apps in this cluster were running slowly, resulting in a variety of other issues. The slowness of this cluster is likely a contributor to the problem. Also, the app was scaled out to 5 replicas using KEDA.

Even for slowly running clusters, the sprocs should be designed in such a way that simple scenarios like these should not result in deadlocks.

DurableClient.ListInstancesAsync() forgets to convert TimeFrom/TimeTo to UTC

Issue is similar to this one.

Typically it is the callee method's responsibility to convert DateTime parameters into whatever time zone it expects. That's also how the default storage provider behaves.

MSSQL provider doesn't seem to be doing that. As a result, if you pass From/To values with Type=Local, the method treats it as UTC and produces incorrect results.
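Until the provider normalizes these values itself, a caller-side workaround is to convert to UTC before querying. The sketch below assumes the Durable Functions client API shape (`OrchestrationStatusQueryCondition` with `CreatedTimeFrom`/`CreatedTimeTo`); adjust the property names to your actual API surface.

```csharp
// Caller-side workaround sketch: normalize local times to UTC before querying.
// Type and property names are assumptions based on the Durable Functions client API.
var condition = new OrchestrationStatusQueryCondition
{
    CreatedTimeFrom = localFrom.ToUniversalTime(),
    CreatedTimeTo   = localTo.ToUniversalTime(),
};
var page = await client.ListInstancesAsync(condition, CancellationToken.None);
```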

Transient SQL deadlock on history insert

This was encountered during a load test run (StartManySequences) for v0.8.0-beta. The deadlock was auto-retried, which succeeded, so it's not a blocking issue. Out of 240K orchestrations, 11 were impacted by this (0.005%), and all around the same time.

Investigation notes

  • Each transaction should be referencing a completely different row/key in the PK_Payloads clustered index (all deadlocks happened on different instance IDs, and the instance ID is part of the primary key). I'm not sure how it's possible for there to be a deadlock here.
  • I believe the reason we grab a shared lock on PK_Payloads when inserting into dt.History is because of the foreign key relationship between these tables.
  • All 13 failed transactions seem to be deadlocked on the same resource (KEY: 5:72057594049134592)
  • The data has been purged so it's not possible to go back and figure out what the actual row was.

Deadlock graph
image

Deadlock XML

<deadlock>
  <victim-list>
    <victimProcess id="process184f978f088" />
  </victim-list>
  <process-list>
    <process id="process184f978f088" taskpriority="0" logused="1956" waitresource="KEY: 5:72057594049134592 (397656291a2e)" waittime="14900" ownerId="44313" transactionname="user_transaction" lasttranstarted="2021-05-20T03:58:30.610" XDES="0x184f7324428" lockMode="S" schedulerid="3" kpid="106672" status="suspended" spid="191" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-05-20T03:58:30.610" lastbatchcompleted="2021-05-20T03:58:30.603" lastattention="1900-01-01T00:00:00.603" clientapp="dfpref-premium-sql" hostname="RD2818784E5FA0" hostpid="2404" loginname="cgillum" isolationlevel="read committed (2)" xactid="44313" currentdb="5" currentdbname="DFPerfHub" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
      <executionStack>
        <frame procname="63615880-2989-4f6a-ba4d-c59ccc350797.dt._CheckpointOrchestration" queryhash="0x7a52845cc3192c5d" queryplanhash="0x7a52845cc3192c5d" line="230" stmtstart="17946" stmtend="19186" sqlhandle="0x03000500c8d5d14f878386012dad000001000000000000000000000000000000000000000000000000000000">
INSERT INTO History (
        [TaskHub],
        [InstanceID],
        [ExecutionID],
        [SequenceNumber],
        [EventType],
        [TaskID],
        [Timestamp],
        [IsPlayed],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [DataPayloadID])
    SELECT
        @TaskHub,
        H.[InstanceID],
        H.[ExecutionID],
        H.[SequenceNumber],
        H.[EventType],
        H.[TaskID],
        H.[Timestamp],
        H.[IsPlayed],
        H.[Name],
        H.[RuntimeStatus],
        H.[VisibleTime],
        H.[PayloadID]
    FROM @NewHistoryEvents    </frame>
      </executionStack>
      <inputbuf>
Proc [Database Id = 5 Object Id = 1339151816]   </inputbuf>
    </process>
    <process id="process1850b5c84e8" taskpriority="0" logused="7684" waitresource="KEY: 5:72057594049134592 (6cb689a1ff67)" waittime="108" ownerId="41401" transactionname="user_transaction" lasttranstarted="2021-05-20T03:58:30.470" XDES="0x1851c980428" lockMode="S" schedulerid="4" kpid="24312" status="suspended" spid="162" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-05-20T03:58:30.470" lastbatchcompleted="2021-05-20T03:58:30.463" lastattention="1900-01-01T00:00:00.463" clientapp="dfpref-premium-sql" hostname="RD2818784E5FA0" hostpid="2404" loginname="cgillum" isolationlevel="read committed (2)" xactid="41401" currentdb="5" currentdbname="DFPerfHub" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
      <executionStack>
        <frame procname="63615880-2989-4f6a-ba4d-c59ccc350797.dt._CheckpointOrchestration" queryhash="0x7a52845cc3192c5d" queryplanhash="0x7a52845cc3192c5d" line="230" stmtstart="17946" stmtend="19186" sqlhandle="0x03000500c8d5d14f878386012dad000001000000000000000000000000000000000000000000000000000000">
INSERT INTO History (
        [TaskHub],
        [InstanceID],
        [ExecutionID],
        [SequenceNumber],
        [EventType],
        [TaskID],
        [Timestamp],
        [IsPlayed],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [DataPayloadID])
    SELECT
        @TaskHub,
        H.[InstanceID],
        H.[ExecutionID],
        H.[SequenceNumber],
        H.[EventType],
        H.[TaskID],
        H.[Timestamp],
        H.[IsPlayed],
        H.[Name],
        H.[RuntimeStatus],
        H.[VisibleTime],
        H.[PayloadID]
    FROM @NewHistoryEvents    </frame>
      </executionStack>
      <inputbuf>
Proc [Database Id = 5 Object Id = 1339151816]   </inputbuf>
    </process>
  </process-list>
  <resource-list>
    <keylock hobtid="72057594049134592" dbid="5" objectname="63615880-2989-4f6a-ba4d-c59ccc350797.dt.Payloads" indexname="PK_Payloads" id="lock1850af6e200" mode="X" associatedObjectId="72057594049134592">
      <owner-list>
        <owner id="process1850b5c84e8" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process184f978f088" mode="S" requestType="wait" />
      </waiter-list>
    </keylock>
    <keylock hobtid="72057594049134592" dbid="5" objectname="63615880-2989-4f6a-ba4d-c59ccc350797.dt.Payloads" indexname="PK_Payloads" id="lock1850d0ce280" mode="X" associatedObjectId="72057594049134592">
      <owner-list>
        <owner id="process184f978f088" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process1850b5c84e8" mode="S" requestType="wait" />
      </waiter-list>
    </keylock>
  </resource-list>
</deadlock>

Kusto query for repro (Internal-only link)

DurableFunctionsEvents
| where TIMESTAMP between (datetime(2021-05-20 03:58:29) .. 30s)
| where AppName == "dfpref-premium-sql"
| where ProviderName != "WebJobs-Extensions-DurableTask"
| where InstanceId == 'EP1-max1-sql8-10000-20210520-035829-000000000000000D'
| take 5000

No support for custom schema name

The existing DB setup/migration capabilities do not offer the possibility to customize the schema name for DTF. So when multiple services are using the same database, you only have the option of using the dt schema in the multitenancy setup, but not the option of a schema-per-service approach.

Is there a plan for supporting this in the future?

Azure Sql serverless sku

Hello,

When using the serverless SKU of Azure SQL, there can be times when the database is sleeping/paused, and it may take a minute or so to spin up again. Does this have any impact on durable orchestrations, other than having to wait for the db to be ready, of course?

Cheers

datediff function resulted in an overflow

We use the durabletask-mssql extension for Azure Functions in a multi-tenant database. We use user connection strings to separate our tenants, which are Live Dev Env, Live Test Env, and individual ones for our developers.

Yesterday our Live Dev Env stopped responding to requests, and our Azure Functions host reports the error below over and over again when it tries to start up. This causes it to never accept any requests or fully start up. This has not affected our developers' local environments (unique task hubs in the DB), or our Live Test Env (which also has its own unique task hub in the same DB). No unique changes were made to the dev environment that would cause this.

It states that TaskActivityDispatcher failed to fetch a work-item because of a SQL exception: "The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart."

Because we don't do any manual interaction with the MSSQL queue database ourselves, it seems to be coming from the SQL Connection or Durable Task extension of Azure Functions.
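For context, T-SQL's DATEDIFF returns a 32-bit int, so a millisecond-granularity difference overflows once the two dates are roughly 24.8 days apart (which matches error number 535 in the log). A self-contained illustration:

```sql
-- ~31 days is about 2.68 billion ms, which exceeds int's max (~2.15 billion),
-- so this raises the overflow error (535):
SELECT DATEDIFF(millisecond, '2021-01-01', '2021-02-01');

-- DATEDIFF_BIG (SQL Server 2016+) returns bigint and handles the same span:
SELECT DATEDIFF_BIG(millisecond, '2021-01-01', '2021-02-01');
```

This suggests the overflow can be triggered by a very old timestamp (e.g. a stale row) reaching a millisecond-precision DATEDIFF in one of the provider's queries.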

@cgillum FYI we've been building this solution with Azure Durable Functions and the MSSQL Task Hub extension for over 6 months and we've been absolutely loving its capabilities <3 Especially when you add KEDA! We're moving this solution into production and scaling it globally very soon, so it is crucial that we get this resolved, understand the cause of the issue, and create a strategy for preventing/triaging it in the future. Any advice you can offer here is much appreciated.

Details:

  • Python 3.9 Azure Durable Functions Project
  • Containerized Function App hosted in AKS
  • Using Azure SQL for MSSQL Queue DB (currently using 48GB of storage space, I'm guessing mostly payloads)
  • Extensions versions are shown at the bottom

Full Exception:

ClientConnectionId:5093d664-5c8b-4308-a4a1-3c863255c4c6
Error Number:535,State:0,Class:16
ClientConnectionId before routing:6ff9c21f-29a9-43f0-8a7d-72f5e7779702
Routing Destination:bb08fcc51e7d.HS2.tr6036.centralus1-a.worker.database.windows.net,11005
fail: DurableTask.Core[25]
      TaskActivityDispatcher-37b55ccda45246ec9acf45def0121d13-0: Failed to fetch a work-item: Microsoft.Data.SqlClient.SqlException (0x80131904): The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
         at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
         at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
         at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
         at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
         at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
         at Microsoft.Data.SqlClient.SqlDataReader.ReadAsyncExecute(Task task, Object state)
         at Microsoft.Data.SqlClient.SqlDataReader.InvokeAsyncCall[T](AAsyncCallContext`1 context)
      --- End of stack trace from previous location where exception was thrown ---
         at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskActivityWorkItem(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 332
         at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context) in C:\source\durabletask\src\DurableTask.Core\WorkItemDispatcher.cs:line 262
      ClientConnectionId:5093d664-5c8b-4308-a4a1-3c863255c4c6
      Error Number:535,State:0,Class:16
      ClientConnectionId before routing:6ff9c21f-29a9-43f0-8a7d-72f5e7779702
      Routing Destination:bb08fcc51e7d.HS2.tr6036.centralus1-a.worker.database.windows.net,11005
Microsoft.Data.SqlClient.SqlException (0x80131904): The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
   at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
   at Microsoft.Data.SqlClient.SqlDataReader.ReadAsyncExecute(Task task, Object state)
   at Microsoft.Data.SqlClient.SqlDataReader.InvokeAsyncCall[T](AAsyncCallContext`1 context)
--- End of stack trace from previous location where exception was thrown ---
   at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskActivityWorkItem(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 332
   at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context) in C:\source\durabletask\src\DurableTask.Core\WorkItemDispatcher.cs:line 262

Extensions.csproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <WarningsAsErrors></WarningsAsErrors>
    <DefaultItemExcludes>**</DefaultItemExcludes>
  </PropertyGroup>

  <!-- Need to rename extensions.deps.json to function.deps.json for the native SQL dependencies to get loaded -->
  <Target Name="PostBuild" AfterTargets="PostBuildEvent">
    <Move SourceFiles="$(OutDir)/extensions.deps.json" DestinationFiles="$(OutDir)/function.deps.json" />
  </Target>

  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="2.5.0" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.SignalRService" Version="1.5.0" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Script.ExtensionsMetadataGenerator" Version="1.2.3" />
    <PackageReference Include="Microsoft.DurableTask.SqlServer.AzureFunctions" Version="0.10.0-beta" />
  </ItemGroup>
</Project>

Sync AKS ScaledObject with host.json

I have the following configuration in my host.json file to control my Function App's maxConcurrentOrchestratorFunctions and maxConcurrentActivityFunctions limits:

"extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 2,
      "maxConcurrentActivityFunctions": 2,
      "storageProvider": {
        "type": "mssql",
        "connectionStringName": "SQLDB_Connection",
        "taskEventLockTimeout": "00:02:00",
        "partitionCount": 3
      }
    }
  }

When I generate my deployment using func kubernetes deploy, it creates a ScaledObject for me with the GetScaleRecommendation function set up; however, the function parameters are out of sync with my host.json file.

I would expect the parameters to line up with my host.json settings like SELECT dt.GetScaleRecommendation(2, 2) though instead I get this ScaledObject configuration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: bat-artwork-review-api
  labels: {}
spec:
  scaleTargetRef:
    name: bat-artwork-review-api
  triggers:
  - type: mssql
    metadata:
      query: SELECT dt.GetScaleRecommendation(10, 1)
      targetValue: "1"
      connectionStringFromEnv: SQLDB_Connection

Is there any way to sync these parameters currently?
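Until the tooling reads these values from host.json, one workaround sketch is to patch the generated ScaledObject by hand so the query arguments match the host.json concurrency limits (only the trigger section shown; the rest of the object is unchanged):

```yaml
# Sketch: arguments mirror maxConcurrentOrchestratorFunctions (2) and
# maxConcurrentActivityFunctions (2) from host.json.
triggers:
- type: mssql
  metadata:
    query: SELECT dt.GetScaleRecommendation(2, 2)
    targetValue: "1"
    connectionStringFromEnv: SQLDB_Connection
```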

Support scale-out to 200 replicas

The MSSQL backend currently uses constant-rate polling to find new work to execute. As the number of replicas running this code increases, the load on the MSSQL database increases with it to the point where adding new replicas can reduce overall throughput.

This item tracks doing whatever work is necessary (like dynamically changing polling frequency, etc.) to allow scale-out to as many as 200 replicas sharing a single, fixed-size database while still increasing throughput. The 200 number was chosen to match the Azure Functions maximum replica scale limit.

Retrieve the total number of filtered instances by SqlOrchestrationQuery

Hello,
GetManyOrchestrationsAsync returns a list of orchestration states, and we can set the paging parameters (pageNumber and pageSize) with the SqlOrchestrationQuery object.
My problem is that this method cannot tell us how many instances match the defined query filter.
Could we add a GetOrchestrationsCountAsync method that accepts the same SqlOrchestrationQuery object as input (without paging information) and returns the total number of matching instances?
Thank you
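The requested addition might look like the following hypothetical signature. This is purely the requester's proposal sketched out, not an existing API in the DurableTask.SqlServer package:

```csharp
// Hypothetical API sketch -- not part of the current package.
// Counts all instances matching the query filter, ignoring paging parameters.
public Task<int> GetOrchestrationsCountAsync(
    SqlOrchestrationQuery query,
    CancellationToken cancellationToken = default);
```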

Support Orchestration restart

Trying to manage an Eternal Orchestration is difficult due to:

  • Must provide instance id at startup and check if already running.
  • If orchestration was somehow terminated, it is impossible to restart.

I believe there is a TODO item in the CreateInstance stored procedure which would address this, but it appears the database schema itself blocks this since it would cause a duplicate key. Perhaps I'm misunderstanding, shouldn't ExecutionId be used as part of the unique identifier? In other words, a combination of TaskHub, InstanceId, and ExecutionId signify a unique orchestration, with ContinueAsNew starting a new Orchestration with the same InstanceId and different ExecutionId.

This ticket requests a solution for restarting orchestrations using the same InstanceId.

KEDA scaler support

We want the Durable SQL provider to work great in Kubernetes. One of the strategies is making auto-scale work with KEDA (Kubernetes Event-Driven Autoscaler). Rather than requiring a scaler that is specific to the Durable SQL backend, a user should be able to use a generic mssql scaler. One doesn't exist yet, but it's in the committed backlog here: kedacore/keda#674

The work on the Durable SQL provider is to expose a stored procedure that can be consumed by a native KEDA mssql scaler to make scale decisions for Durable Functions and/or DTFx workloads running within a Kubernetes cluster.

Provide a way to correlate orchestrations with their suborchestrations

To show fancy diagrams like this:
image
DurableFunctionsMonitor needs to be able to fetch ids of suborchestrations that were called by a given orchestration.
DurableOrchestrationStatus.History doesn't provide that info, but fortunately the default storage provider stores those ids in XXXHistory table, and DfMon is able to fetch that info by doing a simple direct query against that table.
MsSql storage provider doesn't do that, and there seems to be no way to fetch that info from the database.

Can you please add that data, so that execution history can be populated with child orchestration ids?

Ideally, of course, it would be great if DurableOrchestrationStatus.History had that field included (so that DfMon didn't need to make any custom roundtrips), but I guess that's a much longer story...

Platform Not Supported Exception for Microsoft.Data.SqlClient

I'm running a brand new Python Durable Function app locally with the SQL Server Durable Task extension. I have provided the Connection String to my Azure SQL server in a local.settings.json environment variable and am referencing that variable in my host.json configuration (all shown below).

When I run func start I get the following error towards the bottom saying that Microsoft.Data.SqlClient is not supported on this platform:

Found Python version 3.9.5 (python3).

Azure Functions Core Tools
Core Tools Version:       3.0.3477 Commit hash: 5fbb9a76fc00e4168f2cc90d6ff0afe5373afc6d  (64-bit)
Function Runtime Version: 3.0.15584.0

[2021-07-22T00:12:50.711Z] Cannot create directory for shared memory usage: /dev/shm/AzureFunctions
[2021-07-22T00:12:50.711Z] System.IO.FileSystem: Access to the path '/dev/shm/AzureFunctions' is denied. Operation not permitted.
[2021-07-22T00:12:51.431Z] A host error has occurred during startup operation '4b33ffdd-e1a1-4a0e-b558-a0d07aa07d3f'.
[2021-07-22T00:12:51.431Z] Microsoft.Data.SqlClient: Microsoft.Data.SqlClient is not supported on this platform.
Value cannot be null. (Parameter 'provider')

Host.json

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    },
    "logLevel": {
      "DurableTask.SqlServer": "Information",
      "DurableTask.Core": "Warning"
    }
  },
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "type": "mssql",
        "connectionStringName": "SQLDB_Connection",
        "taskEventLockTimeout": "00:02:00"
      }
    }
  }
}

local.settings.json

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "SQLDB_Connection": "<REDACTED>"
  }
}

extensions.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <WarningsAsErrors></WarningsAsErrors>
    <DefaultItemExcludes>**</DefaultItemExcludes>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="2.5.0" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Script.ExtensionsMetadataGenerator" Version="1.1.3" />
    <PackageReference Include="Microsoft.DurableTask.SqlServer.AzureFunctions" Version="0.9.1-beta" />
  </ItemGroup>
</Project>

.NET version 5.0.302
.NET Core 3.1
Functions Core Tools 3.0.3477
Running on macOS Big Sur

This occurs with both the 0.9.1-beta and 0.10.0-beta versions of the Durable Functions SQL Server extension.

I'm really excited to use this new event source but I'm blocked here. Can someone please help?

Documentation: Guidance for DBAs

The MSSQL backend for the Durable Task Framework, more than other storage providers, could benefit from documentation that is oriented towards database administrators or operators. Existing features like multitenancy, KEDA support, and co-residency with other applications are documented, but not in a central place. This issue tracks creating documentation that is targeted towards DBAs that covers best practices and how-to guides for all these topics in a more intentional way.

Custom Task Hub name is ignored

Here is my host.json:

{
    "version": "2.0",
    "extensions": {
        "http": {
            "routePrefix": "tino"
        },
        "durableTask": {
            "hubName": "VerySpecialTaskHub",
            "storageProvider": {
                "type": "mssql",
                "connectionStringName": "SqlConnectionString",
                "taskEventLockTimeout": "00:02:00"
            }
        }
    }
}

And yet everything still goes into the default 'dbo' task hub, and VerySpecialTaskHub never gets created.

What am I doing wrong?

Automatically Create Database If Not Present

This extension is very easy to use with a simple configuration change and package import! But unlike the default Azure Storage option, the SQL provider requires that you create the database ahead of time. This isn't very convenient, especially because there are settings, like the collation, that you may only discover by reading the docs. I have also been playing around with Azure Functions using docker-compose, and it would require less overhead to simply spin up my SQL and Azure Functions containers without executing an additional script to create my DB.

If that sounds like something you'd like to include as part of this storage provider, I have created a pull request to demonstrate the start of such a change.
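In the meantime, a pre-deployment script can approximate this. The following is a minimal sketch, assuming the database name DurableDB and the Latin1_General_100_BIN2_UTF8 collation recommended in the provider docs (verify the collation against the docs for your version):

```sql
-- Illustrative pre-deployment step: create the database only if it
-- doesn't already exist, using the collation recommended in the docs.
IF DB_ID('DurableDB') IS NULL
BEGIN
    CREATE DATABASE [DurableDB] COLLATE Latin1_General_100_BIN2_UTF8;
END
```

In a docker-compose setup, a script like this could run in an init container before the Azure Functions container starts.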

Frequent SQL deadlocks on high sub-orchestration fan-out

It appears that fanning out to a large number of concurrent sub-orchestrations results in a high frequency of SQL deadlocks. Here is the deadlock graph of one such example:


<deadlock>
 <victim-list>
  <victimProcess id="process2173a075468" />
 </victim-list>
 <process-list>
  <process id="process2173a075468" taskpriority="0" logused="7792" waitresource="KEY: 5:72057594044547072 (d49f706572c3)" waittime="5763" ownerId="1101274" transactionname="user_transaction" lasttranstarted="2021-09-27T14:42:30.693" XDES="0x2173abe8428" lockMode="S" schedulerid="5" kpid="26996" status="suspended" spid="69" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-09-27T14:42:30.693" lastbatchcompleted="2021-09-27T14:42:30.687" lastattention="1900-01-01T00:00:00.687" clientapp="default" hostpid="22068" loginname="testlogin_ParallelSubOrchestrations_2021092709423010" isolationlevel="read committed (2)" xactid="1101274" currentdb="5" currentdbname="DurableDB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="DurableDB.dt._CheckpointOrchestration" line="194" stmtstart="15508" stmtend="16444" sqlhandle="0x03000500888ea41e28d30901a3ad000001000000000000000000000000000000000000000000000000000000">
INSERT INTO NewEvents (
        [TaskHub],
        [InstanceID],
        [ExecutionID],
        [EventType],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [TaskID],
        [PayloadID]
    ) 
    SELECT 
        @TaskHub,
        [InstanceID],
        [ExecutionID],
        [EventType],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [TaskID],
        [PayloadID]
    FROM @NewOrchestrationEvent    </frame>
   </executionStack>
   <inputbuf>
Proc [Database Id = 5 Object Id = 514100872]   </inputbuf>
  </process>
  <process id="process2172628c8c8" taskpriority="0" logused="30084" waitresource="KEY: 5:72057594044547072 (9f766baad1a1)" waittime="5762" ownerId="1101296" transactionname="user_transaction" lasttranstarted="2021-09-27T14:42:30.693" XDES="0x217261b0428" lockMode="S" schedulerid="6" kpid="11848" status="suspended" spid="65" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-09-27T14:42:30.693" lastbatchcompleted="2021-09-27T14:42:30.687" lastattention="1900-01-01T00:00:00.687" clientapp="default" hostpid="22068" loginname="testlogin_ParallelSubOrchestrations_2021092709423010" isolationlevel="read committed (2)" xactid="1101296" currentdb="5" currentdbname="DurableDB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="DurableDB.dt._CheckpointOrchestration" line="194" stmtstart="15508" stmtend="16444" sqlhandle="0x03000500888ea41e28d30901a3ad000001000000000000000000000000000000000000000000000000000000">
INSERT INTO NewEvents (
        [TaskHub],
        [InstanceID],
        [ExecutionID],
        [EventType],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [TaskID],
        [PayloadID]
    ) 
    SELECT 
        @TaskHub,
        [InstanceID],
        [ExecutionID],
        [EventType],
        [Name],
        [RuntimeStatus],
        [VisibleTime],
        [TaskID],
        [PayloadID]
    FROM @NewOrchestrationEvent    </frame>
   </executionStack>
   <inputbuf>
Proc [Database Id = 5 Object Id = 514100872]   </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <keylock hobtid="72057594044547072" dbid="5" objectname="DurableDB.dt.Payloads" indexname="PK_Payloads" id="lock21734a1d000" mode="X" associatedObjectId="72057594044547072">
   <owner-list>
    <owner id="process2172628c8c8" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process2173a075468" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
  <keylock hobtid="72057594044547072" dbid="5" objectname="DurableDB.dt.Payloads" indexname="PK_Payloads" id="lock2173a673580" mode="X" associatedObjectId="72057594044547072">
   <owner-list>
    <owner id="process2173a075468" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process2172628c8c8" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
 </resource-list>
</deadlock>

Extended Sessions

The MSSQL backend doesn't yet support extended sessions, which could provide significant performance benefits in many scenarios. This issue tracks adding extended session support.

ADO.NET failures under high load: BeginExecuteReader requires an open and available Connection

During a load test run by a partner team, the following exception was observed under heavy sustained load.

System.InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is closed.
   at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__169_0(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__274_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location where exception was thrown ---
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries)
   at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 469
   at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor)
   at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskOrchestrationWorkItemAsync(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 136
   at DurableTask.Core.WorkItemDispatcher`1.DispatchAsync(WorkItemDispatcherContext context) in C:\source\durabletask\src\DurableTask.Core\WorkItemDispatcher.cs:line 262

Trigger Orchestration from External Client App

@cgillum This is not exactly a SQL Server Durable Task Hub issue, but I would like to know the recommended approach for solving the issue below while using SQL Server Durable Task Hubs. You can refer to this public repo to reproduce this on AKS:

https://github.com/marcd123/durabletasktest

When we deploy our Durable Function app to AKS, HTTP (HTTP-Starter) and Non-HTTP (Orchestrator/Activity) functions are separated into two deployments.

Because both deployments have the same host.json and local.settings.json, and are thus pointing to the same SQL task hub, I would think having separate deployments would be fine.

However, when we trigger our HTTP-Starter (OrchClient), which is supposed to trigger the Orchestrator (HelloOrchestrator) in a separate deployment, we get this exception from HTTP-Starter saying the orchestrator function could not be found:

Exception: Exception: The function 'HelloOrchestrator' doesn't exist, is disabled, or is not an orchestrator function. Additional info: No orchestrator functions are currently registered!

From what I've read elsewhere, this is because my HTTP function app is checking for the Orchestrator functions locally, which have actually been separated out to another deployment.

I have also read that it may be possible to bypass this local check for the function by using some ExternalClient binding (Azure/azure-functions-core-tools#2345 (comment)), but I've only seen C# project examples and am unsure how/if this ExternalClient binding can be used within host.json or individual function.json bindings.

Is it possible to bypass this local check with ExternalClient binding in a Python Function app, and if so could you please provide an example of the host.json/function.json configuration?

Plans to support .NET Framework?

@cgillum
Any plans on retargeting the DurableTask.SqlServer package to netstandard2.0?
This would enable support for older .NET runtimes, such as .NET Framework 4.8.
Thanks!

Activity and sub-orchestration payload IDs are not saved in the history table

When an orchestration schedules an activity or sub-orchestration, the input payload references are correctly saved to the NewEvents or NewTasks tables, but not to the History table. This makes debugging trickier because you can't see what the inputs to activities or sub-orchestrations were when looking at an orchestration's history.

The workaround is to set TaskHubWorker.TaskOrchestrationDispatcher.IncludeParameters to true. However, this API is not at all discoverable and is very poorly designed. At the time of writing, it also exposes a data duplication bug, tracked here: #84.

Ideally, input payloads for activities and sub-orchestrations should be tracked in the History table by default.

Any good way to mock the data store for testing

Currently we have to provide an actual database connection string when running our unit tests. That is fine when running locally, but in a production environment we don't want to insert test data into the prod database.

There used to be an emulator for Azure Storage for unit testing, but I'm not sure if there is a similar mechanism for the SQL provider as well.

Azure Managed Identity support

Currently we support connecting to a SQL database using Windows Integrated Auth or username/password. Support for Azure Managed Identity would be helpful for users that are running in Azure (without Windows Integrated Auth support) and don't want to put database credentials in their application configuration.

Useful resource for instructions on how this works generally: https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/tutorial-windows-vm-access-sql

Useful code that can be borrowed for obtaining an access token: https://github.com/Azure/azure-functions-durable-extension/blob/47247f9ab322efdf9cd9216582bc5a7dce9bb37f/src/WebJobs.Extensions.DurableTask/ManagedIdentityTokenSource.cs#L55-L79
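Note that recent versions of Microsoft.Data.SqlClient (3.0+) support managed identity directly via the connection string's Authentication keyword, which may reduce the amount of token-acquisition code needed. A hedged sketch of what app configuration could look like (the server name and setting name SQLDB_Connection are placeholders):

```json
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "SQLDB_Connection": "Server=myserver.database.windows.net;Database=DurableDB;Authentication=Active Directory Managed Identity;"
  }
}
```

This assumes the provider flows the connection string through to SqlClient unmodified; whether the provider's retry and connection-management logic is compatible with token-based auth would still need to be verified.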

Documentation: Updated Performance Comparisons

Documentation exists that covers the performance numbers for the MSSQL backend, but it hasn't been updated in quite a while. Since then, several new features have been added and performance-improving fixes have been made. As part of the v1.0.0 release, we need to update this documentation with up-to-date numbers. Ideally, we'll also include a couple more scenarios in addition to "Hello Cities".

Flaky CI test: MultiInstancePurge

This test fails very frequently in CI, sometimes requiring multiple re-runs. Unfortunately, I'm not able to get it to fail locally, making it hard to root-cause the issue.

Support Rewind

Hello,

It seems that this provider does not support Rewind at the moment. Is this planned for the future? Even though it's still a preview feature with the default storage provider, it's quite useful.

Regards

Fix copyright license headers

The file headers in this project all mention .NET Foundation. However, this project isn't actually part of the .NET Foundation. All files should be updated with the following license header instead, which is consistent with official Microsoft guidelines:

// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

We should also update the .editorconfig file to help enforce this correctly:

file_header_template = Copyright (c) Microsoft Corporation.\nLicensed under the MIT License. 

A good way to migrate a running orchestration instance?

Hi,

Our team is implementing a solution for migrating an in-progress orchestration instance from one database to another so that it can be resumed and picked up by a worker afterwards. We believe it would be better to pause the entire orchestration instance before running the SQL commands that move the data.

The definition of pausing is quite simple: the state of the orchestration instance remains unchanged, and no worker will pick it up until we allow it to.

There is no official API in the SQL provider for this type of "pause" operation, so we are looking at the existing schema to see if we can utilize any of it. For now, LockExpiration in [dt].[Instances] and [dt].[NewTasks] and VisibleTime in [dt].[NewEvents] seem like strong candidates for logically pausing an orchestration. We have a PoC showing that setting LockExpiration and VisibleTime for NewEvents and NewTasks to some far-future time keeps the worker from picking them up, but we have not tested whether it works for Instances as well.

So my question to the owners of the SQL provider is: does this approach lead to any unexpected behavior for the worker, and do you have any suggestions on the migration?
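To make the PoC concrete, the "pause" step might be sketched as follows. The instance ID and far-future timestamp are illustrative, and, as noted above, the effect on the Instances table is untested; real code would also filter on TaskHub and wrap the updates in a transaction:

```sql
-- Illustrative "pause" sketch based on the PoC described above: push
-- lock/visibility times far into the future so workers skip this instance.
DECLARE @InstanceID varchar(100) = 'my-instance-id';  -- hypothetical instance
DECLARE @FarFuture datetime2 = '9999-12-31';

BEGIN TRANSACTION;
UPDATE dt.Instances SET [LockExpiration] = @FarFuture WHERE [InstanceID] = @InstanceID;
UPDATE dt.NewTasks  SET [LockExpiration] = @FarFuture WHERE [InstanceID] = @InstanceID;
UPDATE dt.NewEvents SET [VisibleTime]    = @FarFuture WHERE [InstanceID] = @InstanceID;
COMMIT TRANSACTION;
```

Unpausing would restore the original values (which the migration would need to have recorded beforehand).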

More robust orchestration versioning support

Orchestration versioning is challenging in the current implementation (as of v0.5.0) for a couple of reasons:

  • The DB schema doesn't currently expose DTFx version information, making it difficult to implement version-specific logic within an orchestration. More context on orchestration versioning can be found here.
  • Task hub names are not explicitly configurable, so side-by-side deployment strategies with a shared database for versioning require new credentials to be configured.

This issue tracks the work required to simplify versioning when using the SQL backend in DTFx.

Azure Functions Core Tools support for Kubernetes deployments

The Azure Functions Core Tools have explicit support for Kubernetes deployments. However, in order for this to work, the func CLI needs to have specific knowledge about triggers so that it can produce trigger-specific configuration in the generated deployment yaml. It currently has no understanding of Durable Functions triggers (of any backend).

This issue tracks doing the work to support Durable Functions triggers, with SQL backend support, for Kubernetes deployments that use the Functions Core Tools.

Related: #2

Deadlock during long-haul stress test

The long-haul stress test running on v1.0.0-rc shows various deadlocks in the warning logs. Looking at the local SQL Server, all the deadlocks seem to have the same cause. Here is an example:

<deadlock>
 <victim-list>
  <victimProcess id="process1f720001088" />
 </victim-list>
 <process-list>
  <process id="process1f720001088" taskpriority="0" logused="0" waitresource="PAGE: 5:1:250251 " waittime="104" ownerId="33084415" transactionname="user_transaction" lasttranstarted="2022-04-19T21:54:03.373" XDES="0x1f7200a8428" lockMode="U" schedulerid="5" kpid="34052" status="suspended" spid="83" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2022-04-19T21:54:03.373" lastbatchcompleted="2022-04-19T21:54:03.377" lastattention="1900-01-01T00:00:00.377" clientapp="TestHubName" hostname="{MACHINE_NAME}" hostpid="16064" loginname="{username}" isolationlevel="read committed (2)" xactid="33084415" currentdb="5" currentdbname="DurableDB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="DurableDB.dt._LockNextOrchestration" line="25" stmtstart="1784" stmtend="3092" sqlhandle="0x03000500f01ab84a4ef3a30078ae000001000000000000000000000000000000000000000000000000000000">
UPDATE TOP (1) Instances WITH (READPAST)
    SET
        [LockedBy] = @LockedBy,
	    [LockExpiration] = @LockExpiration,
        @instanceID = I.[InstanceID],
        @parentInstanceID = I.[ParentInstanceID],
        @version = I.[Version]
    FROM 
        dt.Instances I WITH (READPAST) INNER JOIN NewEvents E WITH (READPAST) ON
            E.[TaskHub] = @TaskHub AND
            E.[InstanceID] = I.[InstanceID]
    WHERE
        I.TaskHub = @TaskHub AND
        I.[RuntimeStatus] IN ('Pending', 'Running') AND
	    (I.[LockExpiration] IS NULL OR I.[LockExpiration] &lt; @now) AND
        (E.[VisibleTime] IS NULL OR E.[VisibleTime] &lt; @now    </frame>
   </executionStack>
   <inputbuf>
Proc [Database Id = 5 Object Id = 1253579504]   </inputbuf>
  </process>
  <process id="process1f713fb4108" taskpriority="0" logused="1392" waitresource="KEY: 5:72057594043826176 (aa52126f61e5)" waittime="104" ownerId="33084446" transactionname="user_transaction" lasttranstarted="2022-04-19T21:54:03.380" XDES="0x1f70df48428" lockMode="X" schedulerid="8" kpid="53204" status="suspended" spid="51" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2022-04-19T21:54:03.380" lastbatchcompleted="2022-04-19T21:54:03.377" lastattention="1900-01-01T00:00:00.377" clientapp="TestHubName" hostname="{MACHINE_NAME}" hostpid="16064" loginname="{username}" isolationlevel="read committed (2)" xactid="33084446" currentdb="5" currentdbname="DurableDB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="DurableDB.dt._CheckpointOrchestration" line="219" stmtstart="16706" stmtend="17362" sqlhandle="0x03000500293fac4b51f3a30078ae000001000000000000000000000000000000000000000000000000000000">
DELETE E
    OUTPUT DELETED.InstanceID, DELETED.SequenceNumber
    FROM dt.NewEvents E WITH (FORCESEEK(PK_NewEvents(TaskHub, InstanceID, SequenceNumber)))
        INNER JOIN @DeletedEvents D ON 
            D.InstanceID = E.InstanceID AND
            D.SequenceNumber = E.SequenceNumber AND
            E.TaskHub = @TaskHu    </frame>
   </executionStack>
   <inputbuf>
Proc [Database Id = 5 Object Id = 1269579561]   </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <pagelock fileid="1" pageid="250251" dbid="5" subresource="FULL" objectname="DurableDB.dt.Instances" id="lock1f717da2b00" mode="IX" associatedObjectId="72057594043629568">
   <owner-list>
    <owner id="process1f713fb4108" mode="IX" />
   </owner-list>
   <waiter-list>
    <waiter id="process1f720001088" mode="U" requestType="wait" />
   </waiter-list>
  </pagelock>
  <keylock hobtid="72057594043826176" dbid="5" objectname="DurableDB.dt.NewEvents" indexname="PK_NewEvents" id="lock1f70fd18f00" mode="U" associatedObjectId="72057594043826176">
   <owner-list>
    <owner id="process1f720001088" mode="S" />
   </owner-list>
   <waiter-list>
    <waiter id="process1f713fb4108" mode="X" requestType="convert" />
   </waiter-list>
  </keylock>
 </resource-list>
</deadlock>


The deadlocked transactions are automatically retried and the orchestrations continue to run successfully, but this does create delays and also creates noise in the logs. Ideally there should be no deadlocks at all during the long-haul stress test.

Fix documentation on task hub/multi-tenancy to match the v0.7.0 defaults

Regarding these two pages:

  • The Task Hubs article needs to point out that host.json configuration of task hub names only works if multitenant mode is explicitly disabled.

  • The Multitenancy article needs to be updated to explain that multitenancy is enabled by default, and should provide instructions for how to disable it.
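For the updated article, a concrete example of disabling multitenancy would help. A sketch, assuming the dt.SetGlobalSetting stored procedure and the TaskHubMode setting work as described in the current multitenancy docs (the value mapping should be verified against those docs):

```sql
-- Switch the database to single-tenant mode so task hub names come from
-- host.json configuration rather than the SQL login name.
-- (TaskHubMode value assumed from the multitenancy docs.)
EXECUTE dt.SetGlobalSetting @Name = 'TaskHubMode', @Value = 0;
```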

Problem terminating orchestration with running activity

Hello! When I terminate a (sub-)orchestration with a running activity, the orchestration gets properly marked as Terminated in the Instances table, but when the activity it was running finishes (less than a minute later), an exception is thrown and the task remains in the NewTasks table. The runtime then picks up the task again every 2 minutes (the lock interval), runs the activity again, and keeps throwing the same exception.

Exception and related logs (Reservation_0vl5ureu_BroadcastOrchestrator is the name of the terminated orchestration, ReservationBroadcast_Broadcast is the name of the activity):

[2022-04-27T20:39:43.664Z] Executing 'ReservationBroadcast_Broadcast' (Reason='(null)', Id=f6dced9c-1439-4af7-9b87-0d1e5750d030)
[2022-04-27T20:39:43.681Z] Executed 'ReservationBroadcast_Broadcast' (Succeeded, Id=f6dced9c-1439-4af7-9b87-0d1e5750d030, Duration=17ms)
[2022-04-27T20:39:43.699Z] TaskActivityDispatcher-6466eb79540a465b93abcb100bf5ad0d-0: Unhandled exception with work item 'Reservation_0vl5ureu_BroadcastOrchestrator:0000000000000000': Microsoft.Data.SqlClient.SqlException (0x80131904): The target instance is not running. It may have already completed (in which case this execution can be considered a duplicate) or been terminated. Any results of this task activity execution will be discarded.
[2022-04-27T20:39:43.702Z]    at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
[2022-04-27T20:39:43.706Z]    at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
[2022-04-27T20:39:43.711Z]    at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
[2022-04-27T20:39:43.713Z]    at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
[2022-04-27T20:39:43.716Z]    at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
[2022-04-27T20:39:43.718Z]    at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
[2022-04-27T20:39:43.723Z]    at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
[2022-04-27T20:39:43.726Z]    at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
[2022-04-27T20:39:43.729Z]    at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
[2022-04-27T20:39:43.731Z]    at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
[2022-04-27T20:39:43.733Z] --- End of stack trace from previous location ---
[2022-04-27T20:39:43.736Z]    at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 500
[2022-04-27T20:39:43.739Z]    at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func`1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 507
[2022-04-27T20:39:43.742Z]    at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func`2 executor) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 434
[2022-04-27T20:39:43.745Z]    at DurableTask.SqlServer.SqlOrchestrationService.CompleteTaskActivityWorkItemAsync(TaskActivityWorkItem workItem, TaskMessage responseMessage) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 387
[2022-04-27T20:39:43.747Z]    at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 240
[2022-04-27T20:39:43.749Z]    at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 263
[2022-04-27T20:39:43.753Z]    at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373
[2022-04-27T20:39:43.755Z] ClientConnectionId:8544a171-c7cc-4599-8d48-e9e852613774
[2022-04-27T20:39:43.757Z] Error Number:50003,State:1,Class:16
[2022-04-27T20:39:43.759Z]
[2022-04-27T20:39:43.761Z] Backing off for 1 seconds until 5 successful operations
[2022-04-27T20:39:43.762Z] Core Microsoft SqlClient Data Provider: The target instance is not running. It may have already completed (in which case this execution can be considered a duplicate) or been terminated. Any results of this task activity execution will be discarded.

This issue does not exist with the built-in Azure Storage provider. If the cause is not obvious and you need a repro, just let me know and I'll work on putting one together. It seems the situation is properly identified, but the logic just isn't handling it gracefully (by removing the task without logging an exception).

Thanks!

Clarify supported host.json settings

Can someone please clarify what Azure Function host.json settings are supported when using the MSSQL Durable Task Extension for Azure Functions?

I'm particularly interested in these settings:

- controlQueueBatchSize
- controlQueueBufferThreshold
- partitionCount
- controlQueueVisibilityTimeOut
- workItemQueueVisibilityTimeout
- maxConcurrentActivityFunctions
- maxConcurrentOrchestratorFunctions
- maxQueuePollingInterval

I'm also unfamiliar with taskEventLockTimeout, which appears in the Durable Task SQL guide website but isn't explained:

https://microsoft.github.io/durabletask-mssql/#/README

This explanation of each setting has been very helpful on previous Azure Functions apps that I've worked on, though those were not using AKS, KEDA, or the MSSQL Durable Task Extension:

https://github.com/MicrosoftDocs/azure-docs/blob/master/includes/functions-host-json-durabletask.md
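For reference, a host.json sketch that sticks to settings I believe apply to the MSSQL backend might look like this. The queue-related settings listed above appear to be Azure Storage-specific, while taskEventLockTimeout is the mssql-specific lock-renewal setting; the concurrency values here are illustrative, and the connection string name SQLDB_Connection is a placeholder:

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 10,
      "maxConcurrentOrchestratorFunctions": 10,
      "storageProvider": {
        "type": "mssql",
        "connectionStringName": "SQLDB_Connection",
        "taskEventLockTimeout": "00:02:00"
      }
    }
  }
}
```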
