Skip to content

refactor: migrate container_run_log_driver_syslog_test.go to nerdtest.Setup#4898

Open
ogulcanaydogan wants to merge 4 commits into
containerd:mainfrom
ogulcanaydogan:refactor/nerdctl-tigron-syslog-test
Open

refactor: migrate container_run_log_driver_syslog_test.go to nerdtest.Setup#4898
ogulcanaydogan wants to merge 4 commits into
containerd:mainfrom
ogulcanaydogan:refactor/nerdctl-tigron-syslog-test

Conversation

@ogulcanaydogan
Copy link
Copy Markdown
Contributor

Closes part of #4613.

What

Migrates cmd/nerdctl/container/container_run_log_driver_syslog_test.go from the legacy testutil.Base / testutil.NewBase pattern to the nerdtest.Setup / test.Case framework, following the same approach used in #4897 (multi_platform_linux_test.go) and #4641.

Design notes

The three tests (TestSyslogNetwork, TestSyslogFacilities, TestSyslogFormat) each expand a cross-product of parameters into independent sub-cases via buildSyslogSubTests:

  • Each sub-case manages its own syslog server lifecycle (start in Setup, receive in Cleanup) using closure-captured variables (addr, done, closer, containerName, tag, msg), which is necessary because the channel-based frame validation cannot be expressed through data.Labels().
  • CA and cert are shared from the outer fixture to sub-cases via pointer-to-pointer (**testca.CA, **testca.Cert) populated in the outer testCase.Setup.
  • testca.New requires testing.TB; it is passed as *testing.T through the newSyslogTestCase(t *testing.T) helper rather than helpers.T(), which only implements the narrower tig.T interface.
  • Network skip logic respects the rootless case with a distinct message per the existing convention in the repo.

Validator functions (rfc5424Validator, rfc3164Validator, emptyFormatValidator) are extracted as package-level helpers; their logic is identical to the originals.

Test plan

  • go vet ./cmd/nerdctl/container/... passes with no errors.
  • Logic is a mechanical translation of the existing test; no behaviour changes.

….Setup

Part of the ongoing Tigron migration tracked in issue containerd#4613.

Converts the three syslog log-driver tests (TestSyslogNetwork,
TestSyslogFacilities, TestSyslogFormat) from the legacy testutil.Base
pattern to the nerdtest.Setup / test.Case framework.

Key design decisions:
- The cross-product of (network x facility x format) is expanded into
  independent SubTests via buildSyslogSubTests, mirroring the structure
  of the original table-driven loop.
- Each sub-case owns its syslog server lifecycle (Start in Setup,
  receive in Cleanup) through closure-captured variables (addr, done,
  closer, containerName, tag, msg) so the channel-based validation
  can survive the Setup -> Command -> Cleanup split.
- CA and cert are shared from the outer fixture to sub-cases via
  pointer-to-pointer (**testca.CA, **testca.Cert) populated in the
  outer testCase.Setup and read by each sub-case.
- testca.New requires testing.TB; passed as *testing.T through the
  newSyslogTestCase(t) helper rather than helpers.T() which only
  implements the narrower tig.T interface.
- Network skip logic honours rootless: the rootless path produces a
  more descriptive skip message per the existing pattern in the repo.

Validator functions (rfc5424Validator, rfc3164Validator,
emptyFormatValidator) are extracted as package-level helpers, identical
in logic to the originals.

Signed-off-by: Ogulcan Aydogan <ogulcanaydogan@hotmail.com>
The runPacketSyslog goroutine in testsyslog has a 300ms window
(3 x 100ms deadlines) before it gives up and sends an empty string
on the done channel. When StartServer was called in Setup, the
Tigron framework overhead between Setup and Command could exceed
300ms, causing the goroutine to time out before the container sent
its first log entry -- resulting in rcvd="" and a validation failure.

Moving StartServer to the Command callback ensures the goroutine
starts immediately before nerdctl run -d, eliminating the gap.

Signed-off-by: Ogulcan Aydogan <ogulcanaydogan@hotmail.com>
@AkihiroSuda
Copy link
Copy Markdown
Member

=== Failing tests ===
TestLoadStdinFromPipe
TestSyslogNetwork
TestSyslogNetwork/udp_1_rfc5424
TestSyslogNetwork/udp_user_rfc5424
TestSyslogNetwork/unixgram_1_rfc5424
TestSyslogNetwork/unixgram_user_rfc5424

https://github.com/containerd/nerdctl/actions/runs/25862295284/job/75995732134?pr=4898

runPacketSyslog uses 4x100ms read deadlines (~400ms total). When
subtests run in parallel on slow arm runners, container startup latency
exceeds that window, causing the server to drain and send an empty
string on the done channel before the log entry arrives.

Setting NoParallel matches the original sequential t.Run behaviour.

Signed-off-by: Ogulcan Aydogan <ogulcanaydogan@hotmail.com>
@ogulcanaydogan
Copy link
Copy Markdown
Contributor Author

Thanks for catching this. The root cause: runPacketSyslog has a 4x100ms read window (~400ms total). Running the subtests in parallel (Tigron default) puts the arm runner under load, pushing container startup past that window so the server drains before the log entry arrives. Fixed by setting NoParallel: true on each case, matching the original sequential t.Run behaviour.

UDP and unixgram transports rely on a deadline-based loop in
runPacketSyslog. The original 4x100ms (400ms) window was too short
on slow ARM runners where container startup + syslog driver
initialization exceeds the budget, causing the server to close the
socket and send an empty string on the done channel.

Increase the pre-packet retry count from 3 to 20 (2s total). Once
the first datagram arrives, reset to the original 3 retries (400ms)
to drain any remaining bytes promptly.

Signed-off-by: Ogulcan Aydogan <ogulcanaydogan@hotmail.com>
@ogulcanaydogan
Copy link
Copy Markdown
Contributor Author

Thanks for catching this. The root cause is in runPacketSyslog: UDP and unixgram use a deadline-based read loop with a 4x100ms (400ms) hard cap. On slow ARM runners, container startup plus syslog driver initialization exceeds that budget, so the server closes the socket and sends an empty string on the channel before any datagram arrives.

Fixed in 66c2358 by increasing the pre-packet retry count from 3 to 20 (2s total). Once the first datagram arrives, it resets to 3 retries (400ms drain) to avoid lingering unnecessarily.

for fmtK, fmtValidFunc := range fmtValidFuncs {
fmtK := fmtK
fmtValidFunc := fmtValidFunc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can just create a combination structure first and then later iterate over and run over them. I think its much readable and better

type params struct {
		network      string
		fPriK        string
		fPriV        syslog.Priority
		fmtK         string
		fmtValidFunc func(string, string, string, string, syslog.Priority, bool) error
	}
	var cobintaions []params
	for _, network := range networks {
		for rFK, rFV := range syslogFacilities {
			for _, fPriK := range []string{rFK, strconv.Itoa(int(rFV) >> 3)} {
				for fmtK, fmtValidFunc := range fmtValidFuncs {
					combos = append(combos, params{network, fPriK, rFV, fmtK, fmtValidFunc})
				}
			}
		}
	}

args = append(args, testutil.CommonImage, "echo", msg)
return helpers.Command(args...)
},
Expected: test.Expects(0, nil, nil),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use exit code constants from the tigron framework

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants