Submission cleanup only recovers Running submissions, not Submitted/Preparing/Scoring #2413

Description

@hanane-ca

Problem

The submission_status_cleanup() task only recovers submissions stuck in the Running state. Submissions stuck in Submitted, Preparing, or Scoring are never cleaned up and remain in that state indefinitely.

Root Cause

In src/apps/competitions/tasks.py, the cleanup task filters for:

submissions = Submission.objects.filter(
    status=Submission.RUNNING,  # Only Running!
    has_children=False,
).select_related('phase', 'parent')

Additionally, the task uses started_when to calculate the deadline, which is null for submissions that never reached Running state.
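To illustrate why the null started_when matters: if the existing deadline arithmetic were applied unchanged to a submission that never reached Running, plain Python date arithmetic would fail. This is a minimal sketch of that failure mode only, not the project's actual code; the variable names are illustrative.

```python
from datetime import timedelta

# Illustrative stand-in for a submission that never reached Running,
# so its started_when field was never set.
started_when = None

error = None
try:
    # Same shape as the deadline calculation: reference time + 24h.
    deadline = started_when + timedelta(hours=24)
except TypeError as exc:
    # None cannot be added to a timedelta, so no deadline can be computed.
    error = exc
```

Widening the status filter alone would therefore not be enough; the reference time also needs a non-null fallback.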

Impact

  • Submissions can get stuck before reaching Running (during submission queue processing, preparation, or scoring re-enqueue)
  • No recovery mechanism exists for these states
  • Users see permanently stuck submissions with no way to recover

This bug was discovered during the EEG Foundation Challenge incident analysis.

Solution

  1. Extend cleanup to all non-terminal states: Submitted, Preparing, Running, Scoring
  2. Add fallback logic: use created_when as the reference time when started_when is null
  3. Keep the same deadline calculation: reference time + 24h + execution_time_limit

New Flow

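# Imports assumed for this snippet (likely already present in tasks.py)
from datetime import timedelta

from django.utils.timezone import now

non_terminal_statuses = [
    Submission.SUBMITTED,
    Submission.PREPARING,
    Submission.RUNNING,
    Submission.SCORING,
]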
submissions = Submission.objects.filter(
    status__in=non_terminal_statuses,
    has_children=False,
).select_related('phase', 'parent')

for sub in submissions:
    # Prefer started_when; fall back to created_when for submissions that never started
    reference_time = sub.started_when or sub.created_when
    # 24h grace period (in milliseconds) plus the phase's execution time limit
    deadline = reference_time + timedelta(
        milliseconds=(3600000 * 24) + sub.phase.execution_time_limit
    )

    if now() > deadline:
        sub.cancel(status=Submission.FAILED)
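The deadline logic above can also be sketched as a Django-free helper, which makes the fallback and the arithmetic easy to check in isolation. This is a hedged illustration, not the code in tasks.py: the names cleanup_deadline, is_stuck, and GRACE_PERIOD are hypothetical, and it assumes execution_time_limit is expressed in milliseconds, as the snippet above implies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant: the fixed 24h grace period from the issue.
GRACE_PERIOD = timedelta(hours=24)


def cleanup_deadline(started_when, created_when, execution_time_limit_ms):
    """Return the moment after which a non-terminal submission counts as stuck."""
    # Prefer started_when; fall back to created_when if it was never set.
    reference_time = started_when or created_when
    return reference_time + GRACE_PERIOD + timedelta(milliseconds=execution_time_limit_ms)


def is_stuck(started_when, created_when, execution_time_limit_ms, current_time):
    """True once current_time is past the cleanup deadline."""
    return current_time > cleanup_deadline(
        started_when, created_when, execution_time_limit_ms
    )


# Worked example: created at 12:00 UTC, never started, 10-minute limit.
created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
limit_ms = 600_000
deadline = cleanup_deadline(None, created, limit_ms)
# deadline is 2025-01-02 12:10 UTC: created + 24h + 10 minutes
```

With these inputs, a submission checked one hour after creation is left alone, while one checked 25 hours after creation is past its deadline and would be marked as failed.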

Testing

Tests included with the fix:

  • Unit tests: src/apps/competitions/tests/test_submissions.py (4 new tests)
  • Integration tests: tests/k6/ (K6 orchestrator + conservation harness)

Run integration tests:

cd tests/k6
./run_cleanup_test.sh
