Skip to content

FollowUp: Move agreement negotiation off-chain, keep only mutual commitment on-chain#207

Open
danielbui12 wants to merge 26 commits into
devfrom
tung/provider_replay_window_fixes
Open

FollowUp: Move agreement negotiation off-chain, keep only mutual commitment on-chain#207
danielbui12 wants to merge 26 commits into
devfrom
tung/provider_replay_window_fixes

Conversation

@danielbui12

@danielbui12 danielbui12 commented Jun 17, 2026

Copy link
Copy Markdown
Member

Primary Changes

On-chain validity-window enforcement

  • valid_until must be in range [current block, current block + RequestTimeout]
  • Added an integrity_test asserting RequestTimeout < DeregisterAnnouncementPeriod — this is the core invariant that closes the re-register replay window (an old quote always expires before a provider can deregister + reregister).
  • ProviderMultiaddrUpdated event emit multiaddr.

Provider node: live chain-state sync

  • New chain_state_coordinator.rs introduces ChainState (atomic current_block + RwLock<provider_info>) holds state that frequently change.
  • Fetch RequestTimeout from pallet constant at startup.
  • Resolve valid_until = u32::MAX in /negotiate; stamps valid_until = current_block + request_timeout
  • Migrate @bkontur's reconciler to chain_state_coordiantor

Issue:

Add an upper bound on AgreementTerms.valid_until at both redemption
sites: a quote is now redeemable only within [now, now + RequestTimeout].
Previously valid_until was a lower bound only, leaving every quote
effectively immortal and replayable against a re-registered provider.

- New Error::TermsValidityTooLong rejecting quotes past the cap.
- integrity_test asserts RequestTimeout < DeregisterAnnouncementPeriod,
  making the timing invariant that closes the re-register replay hole a
  build-time check.
- Mock RequestTimeout lowered to 50 (< DeregisterAnnouncementPeriod=100);
  mock + benchmark term builders now produce in-window valid_until.
- Regression tests: too-long validity, re-register replay blocked by
  expiry, and nonce non-reuse after early termination.

No field rename, no ABI/wire change.
Added `client/src/block_subscription.rs`: a `BlockSubscriber` that subscribes to finalized
blocks, fetches all events per block once (every pallet), and dispatches them through a
registry of type-erased `EventPlugin`s. Each plugin pairs a parser with its own handler;
one plugin erroring or panicking is caught and logged — it never aborts the loop or starves
other plugins. The existing `EventSubscriber` / `EventParser` trait in `event_subscription.rs`
was left untouched; the new module reuses the `EventParser` seam.
Fixed a stale event type in `client/src/event_subscription.rs`: `StorageEvent::AgreementRequested`
was removed after agreement negotiation moved off-chain in commit `651726f`. The variant was dead
(parser arm could never fire) and the actual replacement events — `StorageAgreementEstablished` and
`ReplicaAgreementEstablished` — were falling through to `StorageEvent::Unknown`. The fix adds both
replacement variants with correct field shapes matching `pallet/src/lib.rs:626-641`, including a
`field_terms_scalars` helper that reads nested `AgreementTerms` scalar fields gracefully.
…r updates

Replace the plugin-based block subscriber with a Stream of raw finalized
blocks and have provider-node keep its ChainState live from it.

- pallet: ProviderSettingsUpdated / ProviderMultiaddrUpdated now carry the
  updated settings / multiaddr so off-chain consumers can apply changes
  without a re-fetch.
- storage-client: add matching StorageEvent variants + parser arms; replace
  the plugin-based BlockSubscriber (EventPlugin / ParserPlugin / isolation)
  with a Stream-based BlockSubscriberStream that yields raw finalized blocks;
  drop the now-unused plugin re-exports and example.
- provider-node: ChainStateCoordinator drives the stream in a single async
  loop — storing current_block every block and merging this provider's
  settings / multiaddr into the cached ProviderInfo in place, re-fetching the
  full info on registration.
Wire a finalized-block ChainStateCoordinator into the provider node so
it keeps current_block + provider_info in sync with the runtime, and use
current_block + request_timeout to issue bounded negotiate quotes instead
of valid_until: u32::MAX.

- ChainStateCoordinator/Handle subscribe to finalized blocks, update
  current_block on every block, and refresh provider_info on
  ProviderSettingsUpdated/ProviderMultiaddrUpdated events.
- request_timeout bootstrapped from the on-chain StorageProvider::RequestTimeout
  constant via ProviderClient::fetch_request_timeout + substrate constants module.
- negotiate_terms computes valid_until = current_block.saturating_add(request_timeout)
  and returns 503 ChainStateNotReady when either is 0.
…n chain-state readiness

- provider-node: seed a fresh NonceCounter when no on-chain replay state
  exists yet, so newly-registered (or soon-to-be) providers can negotiate
- provider-node: return ChainStateNotReady from /negotiate when the nonce
  counter is absent, and clarify the signing-unavailable error message
- papi demo: add getProviderNodeInfo and wait for the provider node to sync
  registration settings before proceeding; log block numbers; bump demo price
Comment thread examples/papi/common.js Fixed
@danielbui12 danielbui12 force-pushed the tung/provider_replay_window_fixes branch from 4308e19 to d955615 Compare June 17, 2026 07:37
@danielbui12 danielbui12 marked this pull request as ready for review June 17, 2026 08:00
@danielbui12 danielbui12 requested a review from mudigal June 17, 2026 08:01
@danielbui12

Copy link
Copy Markdown
Member Author

@mudigal there is a limitation in provider-node coverage: it is not able to run integration tests related to runtime due to the lack of blockchain startup in the check.yml job. Should we add start-chain to check.yml to increase test coverage, or do you want to keep it as simple as it is?

Reconcile the merge of origin/dev: adopt dev's reconciler-based provider_info
+ bootstrappable nonce_counter + readiness reporting, while keeping this
branch's replay-window fix (bounded valid_until = current_block +
request_timeout, ChainStateNotReady guard).

Move RequestTimeout fetching into the background reconciler (request_timeout
is now AtomicU32) so it self-heals if the chain was unreachable at boot, and
split reconcile_once into focused helpers.
The merge left provider registration info in two places: ProviderState's own
provider_info (reconciler-maintained) and chain_state.provider_info
(coordinator-maintained). Remove the former and route /negotiate, /info, and
the reconciler through chain_state.provider_info as the single source of truth.
…tate

Consolidate all on-chain state the node needs at runtime into ChainState,
with the chain-state coordinator as its only writer. Fixes the dual-writer
race between the polling reconciler and the event coordinator (review #2),
the dropped settings/multiaddr events when provider_info was None (#3), and
removes duplicated ChainState tests (#5).

- ChainState gains `constants` (PalletConstants/RequestTimeout) and
  `nonce_counter`; coordinator fetches constants once per connect, does an
  initial full sync for restart recovery, and re-fetches the full
  ProviderInfo on any relevant provider event so committed_bytes/stake/
  settings stay consistent (no field-patching).
- Drop ProviderState.request_timeout and .nonce_counter; /negotiate and
  /info now read from chain_state.
- Remove the polling reconciler (spawn_chain_reconciler + helpers) and the
  now-unused --reconcile-interval-secs CLI flag and its CI usage.
- examples/papi: replace the broken object-equality registration poll with a
  /info readiness-flag poll plus a hard timeout (review #1).
The method was never called anywhere (single definition, zero call sites).
The coordinator uses its own parse_pallet_events free function which
also filters by pallet name — the stronger, correct implementation.
Closes review finding #4.
- pallet: skip providers with deregister_at set in query_find_matching_providers
- client: surface deregister_at on ProviderInfo; add DeregisterAnnounced,
  ProviderDeregistered, and DeregisterCancelled StorageEvent variants + parsing
- provider-node: coordinator refreshes provider_info on all three deregister
  events; /negotiate returns 503 provider_deregistering when deregister_at is
  set; /info exposes a deregistering readiness flag
- tests: integration coverage for deregister lifecycle (announce, complete,
  cancel) and /info readiness flags
@danielbui12 danielbui12 requested a review from bkontur June 18, 2026 12:33
@danielbui12 danielbui12 self-assigned this Jun 18, 2026
@danielbui12 danielbui12 added the enhancement New feature or request label Jun 18, 2026
@bkontur

bkontur commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

@danielbui12 my assistant says:

The async connect loops are hard to unit-test, but parse_pallet_events and the relevance-filter in process_provider_events are pure and testable with hand-constructed events. Adding those should recover the gate.

Comment thread pallet/src/lib.rs
// more blocks), so an old quote cannot be replayed against the new
// incarnation.
assert!(
T::RequestTimeout::get() < T::DeregisterAnnouncementPeriod::get(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just compile-time check, but we allow to change those with pub storage and setStorage root calls (those calls also does not catch this invariant).

We should definitely add fn try_state checks also, to hit the stuff when doing runtime upgrade at least (@danielbui12 please, create follow-up issue for adding try_state checks and cover some important).

max_bytes: maxBytes,
duration,
price_per_byte: 1n,
price_per_byte: 100n,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielbui12 why 1->100?

Comment thread client/src/provider.rs
.ok_or_else(|| ClientError::Chain("Missing 'accepting_extensions'".to_string()))?,
agreements_total,
challenges_failed,
deregister_at: named_field(&value, "deregister_at").and_then(|v| match &v.value {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielbui12 @ilchu not sure, if this storage item parsing is the right thing, we should use runtime api's or something typed as mentioned here: #125 (comment)

Comment thread client/src/provider.rs
pub max_capacity: u64,
}

/// Mirror StorageProvider::ProviderSettings

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielbui12 where do we need there Default impl? this can easily go out of sync. Can we remove this default impl?

@bkontur

bkontur commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

@danielbui12 also please check the up-to-date docs, my assistant says:

/negotiate gate weakened to is_some(); stale is_bootstrapped comments + misleading test comment — restore gate or fix comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provider-side negotiation workflow (manual approve / reject / counter)

2 participants