OCPBUGS-62790: Restore agent MASTER_MEMORY default to 16GiB#1819

Open
zaneb wants to merge 1 commit into openshift-metal3:master from zaneb:bootstrap-memory

zaneb commented Dec 1, 2025
Member

The default MASTER_MEMORY was changed in some agent scenarios in order
to work around a problem in OKD where FCOS would use only 20% of RAM for
the ephemeral storage instead of 50%. Other scenarios were later changed
to match when tests started failing in 4.19 due to OCPBUGS-62790.

Return the defaults to the minimum values validated by assisted-service
in ABI, so that any future regressions like OCPBUGS-62790 are caught
immediately.

This depends on openshift/installer#10133, including backports to 4.20 and 4.19.
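The arithmetic behind the original workaround can be sketched as follows. This is a minimal illustration, assuming the ephemeral storage is a RAM-backed tmpfs sized as a flat fraction of total RAM (20% on the affected FCOS builds, 50% otherwise, per the description above). The function names are hypothetical and not taken from FCOS or RHCOS. Real hosts also report slightly less usable RAM than the nominal figure, so actual tmpfs sizes come out a little smaller.

```python
"""Sketch: how the tmpfs size fraction changes the RAM needed for a given ephemeral store.

Assumption (not taken from FCOS/RHCOS source): the ephemeral storage is a tmpfs
sized as a fixed fraction of total RAM.
"""


def tmpfs_size_gib(ram_gib: float, fraction: float) -> float:
    """Nominal tmpfs size for a host with `ram_gib` of RAM and the given size fraction."""
    return ram_gib * fraction


def ram_needed_gib(target_gib: float, fraction: float) -> float:
    """RAM required so that a tmpfs capped at `fraction` of RAM reaches `target_gib`."""
    return target_gib / fraction


if __name__ == "__main__":
    ram = 16.0
    for fraction in (0.20, 0.50):
        size = tmpfs_size_gib(ram, fraction)
        print(f"{ram:.0f} GiB RAM at {fraction:.0%}: {size:.1f} GiB tmpfs")
    # To match the 8 GiB a 16 GiB host gets at 50%, a 20% host would need 40 GiB of RAM.
    print(f"RAM needed for 8 GiB at 20%: {ram_needed_gib(8.0, 0.20):.0f} GiB")
```

Under this model, a 20% cap needs 2.5 times as much RAM as a 50% cap for the same ephemeral capacity, which is why raising MASTER_MEMORY worked as a blunt workaround.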

openshift-ci Bot commented Dec 1, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci Bot added the do-not-merge/work-in-progress label Dec 1, 2025
zaneb marked this pull request as ready for review December 16, 2025 19:40
openshift-ci Bot removed the do-not-merge/work-in-progress label Dec 16, 2025
openshift-ci Bot requested review from bfournie and sadasu December 16, 2025 19:40
zaneb commented Dec 16, 2025
Member Author

openshift/installer#10133 (4.21), openshift/installer#10140 (4.20), and openshift/installer#10153 (4.19) have all merged.

zaneb commented Dec 18, 2025
Member Author

Tests still appear to be failing with "no space left on disk" when pulling images, even though the fix worked as intended and created a separate tmpfs for the ostree (and even though 815MiB is still free):

Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  122M  3.1G   4% /run
tmpfs           7.9G  6.8G  1.1G  87% /run/ephemeral_base
/dev/loop0      7.8G  7.0G  815M  90% /run/ephemeral
tmpfs           4.0G  2.4G  1.7G  60% /usr
tmpfs           7.9G   84M  7.8G   2% /tmp

This implies that, even without the CoreOS layering change, we would now be exceeding the limit for 16GiB servers. In my local testing, 16GiB servers worked OK, so I am not sure what is different here.
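To reason about snapshots like the one above numerically, a small parser can turn pasted `df -h` output into byte counts and flag nearly-full mounts. This is a hypothetical helper, not part of dev-scripts. It assumes GNU `df -h` formatting, where sizes use powers of 1024 and are rounded, so the reconstructed byte counts are approximate; the 85% threshold is an arbitrary example value.

```python
"""Hypothetical helper: parse pasted `df -h` output and flag nearly-full mounts."""

# GNU df -h uses binary (1024-based) multiples.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

DF_OUTPUT = """\
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  122M  3.1G   4% /run
tmpfs           7.9G  6.8G  1.1G  87% /run/ephemeral_base
/dev/loop0      7.8G  7.0G  815M  90% /run/ephemeral
tmpfs           4.0G  2.4G  1.7G  60% /usr
tmpfs           7.9G   84M  7.8G   2% /tmp
"""


def parse_size(text: str) -> int:
    """Convert a rounded `df -h` size such as '815M' or '7.8G' to approximate bytes."""
    multiplier = UNITS.get(text[-1])
    if multiplier is None:
        return int(text)  # plain number with no unit suffix
    return int(float(text[:-1]) * multiplier)


def parse_df(output: str) -> list[dict]:
    """Return one dict per filesystem row, skipping the header line."""
    rows = []
    for line in output.strip().splitlines()[1:]:
        # maxsplit=5 keeps mount points containing spaces intact.
        fs, size, used, avail, pct, mount = line.split(None, 5)
        rows.append({
            "filesystem": fs,
            "size": parse_size(size),
            "used": parse_size(used),
            "avail": parse_size(avail),
            "use_pct": int(pct.rstrip("%")),
            "mount": mount,
        })
    return rows


def nearly_full(rows: list[dict], threshold_pct: int = 85) -> list[dict]:
    """Rows whose Use% is at or above `threshold_pct`."""
    return [r for r in rows if r["use_pct"] >= threshold_pct]


if __name__ == "__main__":
    for row in nearly_full(parse_df(DF_OUTPUT)):
        print(f"{row['mount']}: {row['use_pct']}% used, ~{row['avail'] // 1024**2} MiB free")
```

Applied to the snapshot above, only `/run/ephemeral_base` and `/run/ephemeral` cross the example threshold.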

zaneb commented Jan 9, 2026
Member Author

/retest
This works fine locally with 4.21 nightly. Unclear why it is not working here (where we're also using 4.21 nightly).

zaneb commented Jan 9, 2026
Member Author

/retest

zaneb commented Feb 10, 2026
Member Author

Although it's still not clear to me why this always worked for me locally, it does appear that OCPBUGS-76450 is a likely reason for it failing here.

zaneb commented Feb 13, 2026
Member Author

/retest

2 similar comments
zaneb commented Feb 15, 2026
Member Author

/retest

zaneb commented Feb 23, 2026
Member Author

/retest

openshift-ci Bot added the needs-rebase label Mar 18, 2026
zaneb force-pushed the bootstrap-memory branch from 3678ebd to 6332002 March 30, 2026 22:01
openshift-ci Bot removed the needs-rebase label Mar 30, 2026
zaneb commented Mar 31, 2026
Member Author

/retest

zaneb commented Apr 1, 2026
Member Author

/test e2e-agent-compact-ipv4-iso-no-registry

1 similar comment
zaneb commented Apr 9, 2026
Member Author

/test e2e-agent-compact-ipv4-iso-no-registry

bfournie commented Apr 9, 2026
Contributor

Do we need more master memory when running iso-no-registry? It looks like it's getting stuck at the hosts discovery page, and the log shows host status: <pending-for-input>

openshift-ci Bot added the needs-rebase label Apr 22, 2026
zaneb force-pushed the bootstrap-memory branch from 6332002 to 804d1ef June 8, 2026 04:23
openshift-ci Bot removed the needs-rebase label Jun 8, 2026
zaneb commented Jun 8, 2026
Member Author

> Do we need more master memory when running iso-no-registry? It looks like it's getting stuck at the hosts discovery page, and the log shows host status: <pending-for-input>

Yes, the log shows:

{ID:has-memory-for-role Status:failure Message:Require at least 17.15 GiB RAM for role master, found only 16.00 GiB}

I've upped it to 17.5GiB in that scenario. 16GiB seems to be fine for workers.
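The failing `has-memory-for-role` validation compares the host's memory against a per-role minimum. A minimal sketch of that comparison, using the figures from the log line above, is shown here. It is not assisted-service's implementation: `has_memory_for_role` is a hypothetical stand-in, and the sketch assumes MASTER_MEMORY is expressed in MiB (as in dev-scripts' example configuration). How the 17.15 GiB minimum is derived for this scenario is not stated in this thread, so it is taken as a given.

```python
"""Hypothetical re-creation of a per-role memory check, using numbers from the CI log."""

MIB_PER_GIB = 1024


def has_memory_for_role(configured_mib: int, required_gib: float) -> bool:
    """True if the configured RAM (assumed to be in MiB) meets the role minimum in GiB."""
    return configured_mib / MIB_PER_GIB >= required_gib


REQUIRED_MASTER_GIB = 17.15  # from the has-memory-for-role failure message

if __name__ == "__main__":
    for gib in (16.0, 17.5):
        mib = int(gib * MIB_PER_GIB)  # 16 GiB -> 16384 MiB, 17.5 GiB -> 17920 MiB
        ok = has_memory_for_role(mib, REQUIRED_MASTER_GIB)
        print(f"MASTER_MEMORY={mib} ({gib} GiB): {'pass' if ok else 'failure'}")
```

Under these assumptions, 16 GiB (16384 MiB) fails the check, while 17.5 GiB (17920 MiB) passes with roughly 0.35 GiB of headroom.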

openshift-ci Bot commented Jun 8, 2026

@zaneb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/e2e-agent-5control-ipv4 | 804d1ef | false | /test e2e-agent-5control-ipv4 |
| ci/prow/e2e-agent-ha-dualstack | 804d1ef | false | /test e2e-agent-ha-dualstack |
| ci/prow/e2e-agent-compact-ipv4-iso-no-registry | 804d1ef | false | /test e2e-agent-compact-ipv4-iso-no-registry |
| ci/prow/e2e-metal-ipi-serial-ipv4-1of2 | 804d1ef | true | /test e2e-metal-ipi-serial-ipv4-1of2 |
| ci/prow/e2e-metal-ovn-arbiter | 804d1ef | false | /test e2e-metal-ovn-arbiter |
| ci/prow/e2e-metal-ipi-virtualmedia | 804d1ef | false | /test e2e-metal-ipi-virtualmedia |

bfournie commented Jun 8, 2026
Contributor

/approve

openshift-ci Bot commented Jun 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bfournie

openshift-ci Bot added the approved label Jun 8, 2026