Skip to content

Specify timeout when running cmd_list in CleanUnknownWorker#182

Open
FrostyX wants to merge 1 commit into
praiskup:mainfrom
FrostyX:cmd-list-timeout
Open

Specify timeout when running cmd_list in CleanUnknownWorker#182
FrostyX wants to merge 1 commit into
praiskup:mainfrom
FrostyX:cmd-list-timeout

Conversation

@FrostyX
Copy link
Copy Markdown
Collaborator

@FrostyX FrostyX commented Apr 2, 2026

Fix #178

Comment thread resallocserver/manager.py Fixed
@FrostyX FrostyX force-pushed the cmd-list-timeout branch from 48a3478 to a29dcc0 Compare April 2, 2026 11:33
Comment thread resallocserver/manager.py Outdated
Comment thread resallocserver/manager.py Outdated
Comment thread resallocserver/manager.py Outdated
@FrostyX
Copy link
Copy Markdown
Collaborator Author

FrostyX commented Apr 24, 2026

I have an easy reproducer with all corner-cases here
https://gist.github.com/FrostyX/73379ff510cd7acf897925199575299e

I am planning to incorporate the code into this PR if we agree that the approach is correct.

Comment thread resallocserver/manager.py Fixed
@FrostyX FrostyX force-pushed the cmd-list-timeout branch 2 times, most recently from 2f46abe to 6baa202 Compare May 3, 2026 15:06
Comment thread resallocserver/manager.py
while True:
# How many seconds are left before we want to timeout
remaining = deadline - time.monotonic()
if remaining <= 0:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we put here some log message that tells it timeouted? It could be useful for debugging

Comment thread resallocserver/manager.py Outdated
remaining = deadline - time.monotonic()
if remaining <= 0:
sp.kill()
break
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on line 108 there is return {'status': 124} which returned 124 on timeout.... now because we break here, it exits but with a -9 as sigkill signal instead (eventhough the reason was timeout)

Comment thread resallocserver/manager.py
break

return {
'status': sp.wait(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix 6baa202#r3201157553 I think we need to add here 124 if timed_out else sp.wait(), to be consistent

Comment thread resallocserver/manager.py Outdated
# We encountered EOF but we still have unprocessed bytes
elif buffer:
# Treat partial line as a complete line
# TODO do we want this or rather discard the incomplete line?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would discard it for catch_stdout_lines_securely. A truncated resource name would end up in the unknown_resources set and trigger a bogus cmd_delete.

A truncated line like vm-00<MISSING> (instead of e.g. vm-002) would end up in unknown_resources and trigger a cmd_delete for a resource that either doesn't exist or shouldn't be deleted.

Suggested change
# TODO do we want this or rather discard the incomplete line?
if not catch_stdout_lines_securely:
buffer += b'\n'

@FrostyX FrostyX force-pushed the cmd-list-timeout branch from 6baa202 to 51ce4b7 Compare May 7, 2026 14:58
Comment thread resallocserver/manager.py Fixed
@FrostyX FrostyX force-pushed the cmd-list-timeout branch from 51ce4b7 to 0d114df Compare May 7, 2026 15:02
@FrostyX
Copy link
Copy Markdown
Collaborator Author

FrostyX commented May 7, 2026

Thank you for the review @nikromen, all good points. Updated, PTAL

Comment thread resallocserver/manager.py

# Either we get a readable file descriptors back or we get nothing
# because the `select` timeouted on `read`
ready, _, _ = select.select([sp.stdout], [], [], remaining)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use fileno() here, too, just so we are consistent

Comment thread resallocserver/manager.py
catch_stdout_bytes=5120,
catch_stdout_lines_securely=True,
timeout=120,
discard_partial_line=False,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be true? With catch_stdout_lines_securely=True, we don't want a truncated resource name to end up in unknown_resources and trigger a bogus cmd_delete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't close tickets because of xmlrpc timeout

4 participants