This feels like the wrong solution to an age old problem solved by the DAG sched...

daxfohl · 2026-06-05T17:28:37 1780680517

Microsoft has its own Durable Task Framework [1] for that kind of stuff, and it supports both running as a self-hosted standalone service like Temporal and running serverless on Azure Functions. It actually predated Airflow, Temporal, etc., IIRC.

This one seems to target a more database-specific use case. The advantage is probably that you can track the exact state of the job in the database itself, rather than having to cross-reference the workflow log with the codebase and trace through it line by line to figure out what the state is. Plus I assume it has less overhead and latency, and operationally it's one less thing to spin up.

[1] https://learn.microsoft.com/en-us/azure/durable-task/common/...

affandar · 2026-06-05T18:10:01 1780683001

(Author of both durable task framework and pg_durable/duroxide here)

Indeed, Durable Task is an exceptional project and was a unique innovation at the time.

pg_durable brings the same reliability and durability semantics to long-running operations within the database.

We have tons of interesting scenarios on the roadmap. Stay tuned! :)

alex_hirner · 2026-06-05T18:44:14 1780685054

Does ai.backfill() fill incomplete/dirty rows or does pg_durable have some notion of partial completion?

abeomor · 2026-06-05T20:07:28 1780690048

Hi there! PM from the PG AI team, working on both pg_durable and the AI pipeline layer.

ai.backfill() ignores that row-level state entirely and reprocesses everything from scratch. https://learn.microsoft.com/en-us/azure/horizondb/ai/ai-pipe...

pg_durable answers "did this workflow instance finish, and if it crashed, where do I resume?" by tracking completed/running/pending/failed status per node, plus checkpoint replay. https://github.com/microsoft/pg_durable/blob/main/USER_GUIDE...
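To make the "where do I resume?" idea concrete, here is a minimal generic sketch of checkpoint replay in Python. It is not pg_durable's API: `run_workflow`, the step names, and the in-memory `checkpoints` dict are all hypothetical stand-ins for whatever durable store a real engine would use.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], object]]


def run_workflow(steps: List[Step], checkpoints: Dict[str, object]) -> List[str]:
    """Run steps in order, skipping any step whose result is already recorded.

    `checkpoints` stands in for a durable store (hypothetical; a real engine
    would persist it transactionally). Returns the names of the steps that
    actually executed during this call.
    """
    executed: List[str] = []
    for name, fn in steps:
        if name in checkpoints:
            # Replay: the result was recorded before the crash, so reuse it.
            continue
        # If fn() raises, nothing is recorded for this step, and a later
        # call resumes from exactly this point.
        checkpoints[name] = fn()
        executed.append(name)
    return executed
```

Calling `run_workflow` again with the same `checkpoints` after a failure re-runs only the steps that never recorded a result, which is roughly what per-node status plus replay gives you.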

If you want this problem addressed better, please open an issue on the open-source repo; we would love to dig in. https://github.com/microsoft/pg_durable/issues

sgarland · 2026-06-05T21:53:54 1780696434

For one, Airflow (or anything external, for that matter) has no insight into DB load, so when devs slam 200 concurrent workers at the DB, other workloads may be impacted. In contrast, this could (I don’t think it does at this time) get near-real-time feedback on performance without the RTT cost, and adjust itself accordingly.
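As a rough illustration of "adjust itself accordingly", one common pattern is additive-increase/multiplicative-decrease (AIMD) on the worker count, driven by an observed latency signal. The sketch below is an assumption about how such feedback could work, not something pg_durable is known to do; `AimdLimiter`, its defaults, and the latency target are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Hypothetical concurrency limiter driven by observed query latency."""

    limit: int = 8                   # current allowed concurrent workers
    min_limit: int = 1
    max_limit: int = 200
    target_latency_ms: float = 50.0  # assumed "healthy" latency threshold

    def observe(self, latency_ms: float) -> int:
        """Update the limit from one latency sample and return the new limit."""
        if latency_ms > self.target_latency_ms:
            # DB looks overloaded: back off quickly by halving.
            self.limit = max(self.min_limit, self.limit // 2)
        else:
            # DB looks healthy: probe for more capacity one worker at a time.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

A scheduler would call `observe()` after each batch and cap in-flight work at the returned value, so a latency spike sheds load within a few samples instead of holding 200 workers against a struggling database.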

booi · 2026-06-05T23:26:13 1780701973

It also feels strange to query for DB load before starting a job. I'm not even sure how you would do it, how you would adjust a job given a load value, or what you would do if there's too much load.