Node Maintenance & Cordoning
How to safely cordon Curio nodes for maintenance without disrupting in-progress sealing pipelines.
What cordoning does
Running curio cordon (or setting unschedulable = true in the WebUI) tells the Harmony scheduler to stop scheduling new tasks on a node. Tasks that are already running will finish, but no new work will be picked up.
This is intentionally simple: cordon → wait for running tasks to finish → do maintenance → restart → uncordon.
How cordoning affects the sealing pipeline
Cordoning blocks all new task scheduling on the node, including pipeline continuation tasks like TreeD, TreeRC, SyntheticProofs, and Finalize. If a sector's data is location-bound to the cordoned node (the sector cache lives on local storage), those follow-up tasks cannot run on another node either.
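The rule described above can be pictured with a small model. This is only a sketch of the behaviour as this page describes it, not Harmony's actual code; the `Node` and `Task` classes, their fields, and `eligible_nodes` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cordoned: bool = False
    # Sector IDs whose cache lives on this node's local storage (hypothetical field).
    local_sectors: set = field(default_factory=set)


@dataclass
class Task:
    stage: str            # e.g. "TreeRC"
    sector: int
    location_bound: bool  # True when the task needs the sector cache locally


def eligible_nodes(task, nodes):
    """Return names of nodes that could accept this task as new work."""
    result = []
    for node in nodes:
        if node.cordoned:
            continue  # cordoned nodes take no new tasks of any kind
        if task.location_bound and task.sector not in node.local_sectors:
            continue  # the sector data is not reachable from this node
        result.append(node.name)
    return result


nodes = [
    Node("seal-1", cordoned=True, local_sectors={42}),
    Node("seal-2"),
]
tree_rc = Task(stage="TreeRC", sector=42, location_bound=True)
print(eligible_nodes(tree_rc, nodes))  # [] -> sector 42 waits for seal-1

nodes[0].cordoned = False
print(eligible_nodes(tree_rc, nodes))  # ['seal-1'] -> pipeline resumes
```

In this model the location-bound task has no eligible node while `seal-1` is cordoned, which is why the sector sits paused rather than moving elsewhere.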
What this means in practice:
If you cordon a node that has sectors mid-pipeline (e.g., SDR completed but TreeRC not yet started), those sectors will be paused until the node is uncordoned.
The sectors are not lost — they will resume once the node is uncordoned and the scheduler picks them up again.
However, if the node stays cordoned for too long, sectors may expire (miss their precommit deadline), wasting the SDR work.
Non-batched sealing operators: SDR takes many hours. If you cordon a node and leave it cordoned past the sector's precommit deadline, that SDR work is lost. Plan maintenance windows accordingly.
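As a rough planning aid, the check below compares the time a sector still needs against the time left before its precommit deadline. It is a minimal sketch: the function name and all three inputs are hypothetical, and the real deadline and stage durations have to come from your own node and hardware.

```python
def cordon_slack_hours(hours_until_deadline, maintenance_hours, remaining_pipeline_hours):
    """Hours of margin left if the node stays cordoned for the whole window.

    A negative result means the sector would miss its precommit deadline.
    All three inputs are operator-supplied estimates.
    """
    for value in (hours_until_deadline, maintenance_hours, remaining_pipeline_hours):
        if value < 0:
            raise ValueError("durations must be non-negative")
    return hours_until_deadline - maintenance_hours - remaining_pipeline_hours


# Illustrative numbers only, not measured Curio timings.
slack = cordon_slack_hours(hours_until_deadline=20.0,
                           maintenance_hours=6.0,
                           remaining_pipeline_hours=3.0)
print(slack)       # 11.0
print(slack >= 0)  # True
```

If the slack is small or negative, drain the pipeline before cordoning instead of relying on a quick uncordon.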
Recommended workflows
Quick maintenance (restart, upgrade, config change)
For short interruptions where downtime is minutes, not hours:
Check the pipeline: in the WebUI, verify what stages are in progress on the node.
curio cordon <node>: stop new work from being scheduled.
Wait for currently-running tasks to complete (watch the WebUI or logs).
Do your maintenance (restart, upgrade, etc.).
curio uncordon <node>: resume scheduling.
Pipeline sectors that were paused (waiting for their next stage) will resume automatically after uncordon.
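The steps above can be sketched as a small wrapper. It assumes the `curio cordon <node>` and `curio uncordon <node>` invocations exactly as written on this page (check them against your installed version); `running_task_count` and `do_maintenance` are hypothetical callables you would supply yourself, for example a check based on what you see in the WebUI or logs. By default it only prints the commands instead of running them.

```python
import subprocess
import time


def quick_maintenance(node, running_task_count, do_maintenance,
                      dry_run=True, poll_seconds=30):
    """Cordon, wait for running tasks, do maintenance, then uncordon.

    Returns the list of commands that were (or would have been) run.
    If any step raises, the node stays cordoned and must be uncordoned by hand.
    """
    issued = []

    def run(cmd):
        issued.append(cmd)
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

    run(["curio", "cordon", node])

    # Running tasks finish on their own; wait until none remain.
    while running_task_count() > 0:
        time.sleep(poll_seconds)

    do_maintenance()
    run(["curio", "uncordon", node])
    return issued


# Dry run with a fake task counter that drains over successive polls.
counts = iter([2, 1, 0])
quick_maintenance("<node>", lambda: next(counts), lambda: None, poll_seconds=0)
```

Keeping cordon and uncordon in one script is a simple guard against forgetting the final uncordon step.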
Planned extended maintenance (hours)
If the node will be down for an extended period:
Stop starting new sectors first: pause deal intake or CC sector creation so no new SDR work begins on the node.
Wait for in-progress pipelines to clear: let sectors finish through Finalize before cordoning. Monitor via the pipeline view in the WebUI.
curio cordon <node> once pipelines are drained.
Do your maintenance.
curio uncordon <node> when ready.
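The "wait for in-progress pipelines to clear" step amounts to checking that no sector on the node is still somewhere between SDR and Finalize. A minimal sketch, assuming you have copied each sector's current stage from the WebUI pipeline view into a dictionary by hand; the helper name and the `"Done"` label for a sector that has completed Finalize are hypothetical, and the stage set only contains the stages named on this page.

```python
# Stage names as listed on this page; "Done" is a made-up label for
# sectors that have already completed Finalize.
IN_PIPELINE = {"SDR", "TreeD", "TreeRC", "SyntheticProofs", "Finalize"}


def sectors_blocking_cordon(stages_by_sector):
    """Return sector IDs that are still mid-pipeline, sorted for stable output."""
    return sorted(
        sector for sector, stage in stages_by_sector.items()
        if stage in IN_PIPELINE
    )


snapshot = {101: "Done", 102: "TreeRC", 103: "SDR"}
blocking = sectors_blocking_cordon(snapshot)
print(blocking)      # [102, 103]
print(not blocking)  # False -> not yet safe to cordon
```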
Decommissioning a node or long outage
If you know a node will be offline for a long time (longer than precommit deadlines):
Attach the node's storage to another node: use curio attach on a different machine to make the sector data accessible elsewhere.
Other nodes with access to the storage paths can then pick up the remaining pipeline tasks.
Cordon and shut down the original node.
See Storage Configuration for details on attaching storage.
Key points to remember
Running tasks: finish normally on the cordoned node.
New task scheduling: blocked; no new work starts.
Location-bound pipeline tasks: paused until uncordon (data is on local storage, other nodes can't run them).
Sector data: safe; nothing is deleted or moved by cordoning.
Precommit deadlines: still apply; sectors can expire if cordoned too long.
Common mistakes
Cordoning during active SDR without a plan to uncordon quickly. SDR is the longest pipeline stage. A running SDR task will still finish on a cordoned node, but the follow-up stages (TreeD, TreeRC, etc.) are then blocked until you uncordon; the same applies if you cordon right after SDR completes. Either wait for the full pipeline to clear first, or ensure you'll uncordon in time.
Forgetting to uncordon after maintenance. The node will sit idle, and any paused pipelines will remain stuck. Set a reminder or check the WebUI after maintenance.
Cordoning when you should be migrating storage. If the node is going away permanently, cordon alone won't help — you need to move or re-attach the storage paths so other nodes can access the sector data.