Skip to content

feat: transactions with retry#5464

Open
sbackend123 wants to merge 21 commits into
masterfrom
feat/new-gas-estimation
Open

feat: transactions with retry#5464
sbackend123 wants to merge 21 commits into
masterfrom
feat/new-gas-estimation

Conversation

@sbackend123

@sbackend123 sbackend123 commented May 18, 2026

Copy link
Copy Markdown
Contributor

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Add automatic gas-fee retry for Ethereum transactions in Bee. Transactions that stay unconfirmed are re-broadcast with the same nonce and an escalated priority fee, using dynamic fees from eth_feeHistory. Retry behaviour is configurable, survives node restarts, and is used for redistribution and postage operations.

Diagram in miro

What changed

  1. Transaction retry (pkg/transaction)
    New SendWithRetry on the transaction service: sends EIP-1559 transactions with dynamic fee estimation from eth_feeHistory and automatic escalation across priority tiers. Requires automatic gas pricing: a request carrying an explicit Gas-Price is rejected.

Fees are derived from fresh fee history (default window: last 100 blocks) at percentiles 10 / 50 / 90, mapped to tiers low / market / aggressive. maxFeePerGas = 2 × baseFee + tip.

Sending walks a tier range [start_tier … end_tier]. Each tier gets 2 broadcast attempts (internal constant). If no receipt arrives within the retry delay, the tx is rebroadcast — reusing the same nonce — at the same tier (with fresh fee history) or escalated to the next tier once the per-tier budget is spent.

Persists retry state in the state store and resumes in-flight retries after restart, skipping nonces already confirmed on-chain.

Stops on non-retryable errors.

  1. Configuration (cmd/bee, pkg/node)
    CLI flags (defaults: start market, ceiling aggressive, 1 min delay, no price cap):

--transaction-fee-priority — starting fee tier (low / market / aggressive), default market
--transaction-fee-priority-max — escalation ceiling tier, default aggressive
--transaction-retry-delay — wait for a receipt before escalating, default 1m
--transaction-fee-max-tx-price-wei — max maxFeePerGas in wei, 0 = no limit
--fee-history-block-count — fee-history window depth, default 100

Constraint: priority <= priority-max, otherwise the node fails to start. Number of attempts per tier and the +15% bump are internal, not exposed as flags.

  1. Call sites
    Redistribution (pkg/storageincentives/redistribution): commit / reveal / claim use SendWithRetry via sendAndWait. Config-only — no header overrides.

Postage (pkg/postage/postagecontract): batch create, top-up, dilute, approve use SendWithRetry when retry is not disabled; otherwise fall back to Send + WaitForReceipt.

More detailed description is in doc

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

#5114

Screenshots (if appropriate):

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

@acud acud left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots and lots of code here, i think some complexity can be reduced and readability can be improved

Comment thread pkg/node/node.go Outdated
WhitelistedWithdrawalAddress []string
}

func txRetryConfigFromOptions(o *Options) transaction.TransactionsRetryConfig {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we already have the same logic in the cmd package (which precedes this package execution on runtime), wouldn't it make sense to build the config just once and pass it to this package as the right type?

Comment thread pkg/transaction/wrapped/wrapped.go Outdated
}

var rewardPerc []float64
if len(feeHistoryRewardPercentiles) >= 3 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this a bit confusing. why do we need this? why not just inject the default value and eliminate the choices here?

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
return new(big.Int).Div(new(big.Int).Mul(new(big.Int).Set(tip), big.NewInt(int64(100+increasePct))), big.NewInt(100))
}

// suggestGasFeeGasTipCapWithHistory returns maxFeePerGas (gasFeeCap) and maxPriorityFeePerGas (gasTipCap)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight headache from this method: long variable names and this comment that tries to explain it all makes it really difficult to read for the amount of lines that it has. i'd rather convert this comment to inline comments explaining the branching, and reduce the variable/method names in the PR in general (adding more words to the method name doesn't necessarily add value when reading it)

_ = t.store.Delete(pendingTransactionKey(state.LastTxHash))
}
}
func (t *transactionService) retry(ctx context.Context, txRetryKey string, request *TxRequest) (common.Hash, *types.Receipt, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really hard to review. i'm not sure about this. also, i'm not sure i fully understand why statestore persistence is needed. do we really need to persist all of the data every time we submit a transaction? apart from the nonce and the transaction data, there's no guarantee that the extra data persisted around a transaction will be relevant once a node restarts (it can just be again inferred via RPC calls and configuration)

@sbackend123 sbackend123 Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this briefly on the call, but leaving it here for the record:

I think persistence is needed because:

  1. We choose the next transaction nonce based in part on the transactions we are currently tracking. In this case, we cannot avoid persisting the transaction altogether. And we cannot persist it only once — with each rebroadcast the hash changes because the fees change, and from a monitoring perspective we would end up with several “stuck” transactions sharing the same nonce.

  2. With the default 1 min delay, in the worst case — even with just two adjacent tiers (low → market, market → aggressive) and a price spike — retries can take 3–4 minutes. If the node is shut down during that window, the transaction may already be in the mempool and even mined, but user would have no way to know. As a result, user would send it again and lose fees, and that is exactly the scenario we are trying to avoid in the first place.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mining while node is shut down is a separate use case that needs to be handled separately by the startup logic. what i meant is that if the node is offline, the fee pricing fee goes "out-of-fashion" just because the fee snapshot may be completely different with different values once you start the node next time.

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
gasFeeCapWithEscalatedTip := new(big.Int).Add(new(big.Int).Set(gasFeeCap), escalatedGasTip)
gasFeeCapWithPreviousTip := new(big.Int).Add(new(big.Int).Set(gasFeeCap), prevGasTipCap)

t.logger.V(1).Register().Debug("suggest gas fees for retry",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the register call? why can't you just use t.logger

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
signedTx, err := t.broadcastTx(ctx, request, nonce, txState.PreviousTip, attempt)
if err != nil {
if isErrCritical(err) {
t.logger.Error(err,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you using t.logger here and not loggerV1, not sure why both usages are needed

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
}

exhaustionErr := fmt.Errorf("transaction failed after %d attempts (nonce=%d, description=%s)", t.txMaxRetries, txState.Nonce, txState.Description)
t.logger.Error(exhaustionErr,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite a few cases here of log and return error which is against the style guide. in general a bit too much logging i'd say, not sure if this is needed for merging into trunk but ok for now if it is needed for debugging purposes

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
return nil
}

func isErrCritical(err error) bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notRetriable?

@sbackend123 sbackend123 self-assigned this Jun 3, 2026
@sbackend123 sbackend123 marked this pull request as ready for review June 3, 2026 23:07
// escalation. Each tier gets attemptsPerTier broadcast rounds with fresh eth_feeHistory
// data. A +15% mempool bump floor is applied to ensure replacement transactions are accepted.
func (t *transactionService) SendWithRetry(ctx context.Context, request *TxRequest) (txHash common.Hash, receipt *types.Receipt, err error) {
if request.GasPrice != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gas-Price header is broken for postage stamp ops. SendWithRetry rejects any request with an explicit GasPrice (send_tx_with_retry.go:51-56), but the postage call sites (postagecontract/contract.go:168 and :212) still copy GasPrice from the header and route to SendWithRetry by default. So POST /stamps/... with Gas-Price set and no Disable-Retry always fails and since the error isn't a matched sentinel, it surfaces as an opaque HTTP 500 instead of a 400. This is a regression (the old Send path honored Gas-Price) and it affects bee-js.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that.

I see three possible approaches:

  1. Make the error explicit and add Disable-Retry usage in bee-js when Gas-Price is set. This is a breaking change, but on the other hand we are introducing new functionality.

  2. If Gas-Price is set, automatically route to Send without retry.

  3. Keep the old behavior as the default, and instead of a Disable-Retry flag, introduce the opposite — e.g. a With-Retry opt-in flag.

Given existing clients, option 3 seems the most logical to me. I don’t like option 2 because it’s implicit behavior, and we may end up having to revisit it in a later release anyway — which would likely bring us back to option 1.

@gacevicljubisa WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gas-Price is a spend ceiling, retry contradicts it, and it should act as a client override. I would go with option 2. I would even consider droping the Disable-Retry, if you set Gas-Price you've already opted out of retry. If Gas-Price is not used, pricing is given to the node and retry is just part of that. But we should document this in Swarm.yaml that Gas-Price disables escalation.

@martinconic martinconic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds new API headers like Disable-Retry and Fee-Priority, but they're not documented in the OpenAPI spec. Please add all necessary parameters in the corresponding yaml files

Comment thread pkg/transaction/send_tx_with_retry.go Outdated
"tx_hash", txState.LastTxHash,
"nonce", txState.Nonce,
"to", request.To.String(),
"value", request.Value.String(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both To and Value and will cause a panic if nil, right?

Comment on lines +91 to +96
if previousTip != nil && previousTip.Sign() > 0 {
bumpedTip := applyMempoolBump(previousTip)
if tip.Cmp(bumpedTip) < 0 {
tip = bumpedTip
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same floor should apply to the fee cap as well, not only for the tip?

Comment on lines +134 to +137
gasLimit, err := t.estimateGasLimit(ctx, request)
if err != nil {
return nil, err
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous attempt for gas limit as well can be reused

if state.LastTxHash != (common.Hash{}) {
if !keepLast {
_ = t.store.Delete(pendingTransactionKey(state.LastTxHash))
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if keepLast, should we call t.waitForPendingTx(state.LastTxHash) because the last one may still be mined?

loggerName = "redistributionContract"
loggerName = "redistributionContract"
// BoostTipPercent is used where the node sends transactions without retry.
BoostTipPercent = 50

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now just used in one place (agent.go) and i wonder if that callsite also needs to be migrated or if it is covered in a subsequent PR

vals = append(vals, new(big.Int).Set(row[idx]))
}
if len(vals) == 0 {
return big.NewInt(0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to slow things down about this, but if there's no entries here, does it mean that all subsequent transactions sent using this quantile's tip value will fail because the value they're using is 0?


// ParseFeePriority validates a fee priority tier name (for API headers and config).
func ParseFeePriority(s string) (string, error) {
t, err := parseFeeTier(s)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this function needed for? why can't you just inline the inner unexported function here?

// SendWithRetry sends an EIP-1559 transaction using fee-history tiers with automatic
// escalation. Each tier gets attemptsPerTier broadcast rounds with fresh eth_feeHistory
// data. A +15% mempool bump floor is applied to ensure replacement transactions are accepted.
func (t *transactionService) SendWithRetry(ctx context.Context, request *TxRequest) (txHash common.Hash, receipt *types.Receipt, err error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i have a feeling this is a little bit misplaced. also, the fact that another metrics collector has been created here (which also partially overlaps with the metrics collector in the wrapped package smells like it should really be there alongside, and you should just reuse the existing metrics collector there.


gasFeeCap, gasTipCap, err := t.suggestGasFeeForTier(ctx, tier, previousTip)
if err != nil {
return nil, err

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if the suggest gas fee failed for whatever reason, then the whole transaction is fail and not even sent? there is a soft-fail there which fails when fh is empty

nonce = *fixedNonce
} else {
t.lock.Lock()
defer t.lock.Unlock()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

danger... this will only unlock once the whole method runs to completion.
see example here: https://go.dev/play/p/Cg6j6EpNtKl

_ = t.store.Delete(pendingTransactionKey(state.LastTxHash))
}
}
func (t *transactionService) retry(ctx context.Context, txRetryKey string, request *TxRequest) (common.Hash, *types.Receipt, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mining while node is shut down is a separate use case that needs to be handled separately by the startup logic. what i meant is that if the node is offline, the fee pricing fee goes "out-of-fashion" just because the fee snapshot may be completely different with different values once you start the node next time.

"description", request.Description)
}

monitorLast := errors.Is(terminateTxErr, ErrTxMaxPriceExceeded) || errors.Is(terminateTxErr, ErrAllAttemptsExhausted)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't understand what is monitorLast... it is just a bool, what does it mean

}

monitorLast := errors.Is(terminateTxErr, ErrTxMaxPriceExceeded) || errors.Is(terminateTxErr, ErrAllAttemptsExhausted)
t.deleteRetryStateAndPending(txRetryKey, txState, monitorLast)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure you're not ending up with stale zombie db records here? for me this bool toggle is a bit tricky since the downstream logic may or may not remove some of the database records and that may lead to statestore growth. i see that there is some unit test coverage for db entry assertion (for deletion) but it's another 1000 lines of machine written code to go through

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants