Commit c7a2b59
Merge pull request #4 from renderinc/sync-skills
Sync skills from render-oss/skills
2 parents f928bb2 + 614706a commit c7a2b59

72 files changed

Lines changed: 7416 additions & 225 deletions

Lines changed: 115 additions & 0 deletions
---
name: render-background-workers
description: >-
  Sets up and configures background workers on Render for queue-based job
  processing. Use when the user needs to process async jobs, consume from a
  queue, run Celery/Sidekiq/BullMQ/Asynq/Oban workers, handle graceful
  shutdown with SIGTERM, wire a worker to Key Value (Redis), or choose between
  workers and cron jobs for background work.
  Trigger terms: background worker, async jobs, queue consumer, Celery,
  Sidekiq, BullMQ, Asynq, Oban, job processing, SIGTERM, graceful shutdown.
license: MIT
compatibility: Render background worker services
metadata:
  author: Render
  version: "1.0.0"
  category: compute
---

# Render Background Workers

This skill explains **worker** services on Render: processes that **consume jobs from a queue** instead of serving HTTP. Pair with **render-blueprints**, **render-env-vars**, and **render-networking** when wiring `render.yaml` and private connectivity.

## When to Use

- Designing or debugging **queue-backed workers** (Celery, Sidekiq, BullMQ, Asynq, etc.)
- Choosing between a **worker**, **Cron Job**, or **Workflow** for background work
- Configuring **Render Key Value** as a **broker** (not a cache) with the correct **eviction policy**
- Implementing **graceful shutdown** so in-flight jobs are not lost on deploy

Per-framework setup and signal-handling detail: `references/queue-framework-setup.md`, `references/graceful-shutdown.md`.

## How Workers Work

- **Long-running services** with **no inbound (HTTP) traffic**. Render does not expose a public URL or internal hostname for workers the way it does for web or private services—**workers cannot receive private network traffic directed at them**.
- The typical pattern is a **poll loop**: the process connects to a **queue backend** (often **Render Key Value**, Redis-compatible **Valkey 8**) and **pulls jobs**.
- Workers **can initiate outbound connections** on the private network—to **PostgreSQL**, **Key Value**, **private services**, **web services** (internal URLs), and the public internet—subject to your plan and firewall rules.

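In framework-free terms, the poll loop can be sketched as below. This is an illustrative sketch only: `pop_job` stands in for your framework's blocking fetch (for example a Redis `BRPOP`), and `max_idle_polls` exists only to make the demo terminate, since a real worker loops until told to stop.

```python
import time

def pop_job(queue):
    # Stands in for a blocking queue fetch (e.g. Redis BRPOP via your framework).
    return queue.pop(0) if queue else None

def worker_loop(queue, handle, max_idle_polls=3):
    # A real worker loops forever; max_idle_polls bounds this sketch for demos.
    idle = 0
    while idle < max_idle_polls:
        job = pop_job(queue)
        if job is None:
            idle += 1
            time.sleep(0.01)  # back off briefly when the queue is empty
            continue
        idle = 0
        handle(job)  # run the job to completion before fetching the next
```

Queue frameworks hide this loop behind their worker commands, but the shape is the same: fetch, execute, repeat.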
## Queue Framework Overview

| Framework | Language | Queue backend | Notes |
|-----------|----------|---------------|-------|
| Celery | Python | Redis / Key Value | Most common Python task queue |
| Sidekiq | Ruby | Redis / Key Value | Standard for Rails |
| BullMQ | Node.js | Redis / Key Value | Modern Node queue (Redis-based) |
| Asynq | Go | Redis / Key Value | Go async task processing |
| Oban | Elixir | **Postgres** (not Redis) | Queue stored in the database |

## Pairing with Key Value

- Use **Render Key Value** as the **job broker** when your framework expects Redis.
- Set the **maxmemory policy** to **`noeviction`**. **`allkeys-lru`** and similar policies are for **caches**; evicting queue keys **drops jobs**.
- Wire **`REDIS_URL`** (or your framework’s equivalent) via **`fromService`** with `type: keyvalue` and `property: connectionString` in the Blueprint.
- **Blueprints require `ipAllowList`** on Key Value—include the CIDRs that should reach the instance (often `[]` for private-network-only access; see **render-blueprints** / Key Value field reference).

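Most frameworks consume the injected connection string whole; if one ever needs the host and port separately, the standard library can split them. A minimal sketch, assuming `REDIS_URL` holds a `redis://` URL (the fallback value below is a placeholder, not a real instance):

```python
import os
from urllib.parse import urlparse

# Assumption: REDIS_URL is injected by the Blueprint's fromService wiring.
# The fallback here is a placeholder for local runs, not a real hostname.
url = urlparse(os.environ.get("REDIS_URL", "redis://placeholder-host:6379"))
broker_host = url.hostname
broker_port = url.port
```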
See `references/queue-framework-setup.md` for minimal app + YAML examples.

## Worker vs Cron vs Workflow

| Need | Use | Why |
|------|-----|-----|
| Always-on queue consumer | **Background Worker** | Polls continuously; long-lived process |
| Periodic scheduled task | **Cron Job** | Runs on a schedule, **exits**; **12h max** per run |
| Distributed parallel compute | **Workflow** | Each run gets its own instance; fan-out patterns |
| High-volume or bursty jobs | **Workflow** | Scales per run; **no idle instance cost** between runs |

## Graceful Shutdown

- Before stopping an instance, Render sends **`SIGTERM`**, then waits up to **`maxShutdownDelaySeconds`** (**1–300**, **default 30**) before **`SIGKILL`**.
- Workers should: **(1)** stop accepting new jobs, **(2)** finish the current job or **checkpoint** progress, **(3)** close connections, **(4)** exit **0**.
- Set **`maxShutdownDelaySeconds`** to at least your **longest safe job duration** (see Dashboard or Blueprint).

Language- and framework-specific handlers: `references/graceful-shutdown.md`.

## Blueprint Configuration

Minimal pattern: **`type: worker`**, **`runtime`**, **`buildCommand`**, **`startCommand`**, and **`envVars`** wired from Key Value.

```yaml
services:
  - type: keyvalue
    name: jobs
    plan: starter
    region: oregon
    ipAllowList: []

  - type: worker
    name: task-worker
    runtime: python
    region: oregon
    plan: starter
    buildCommand: pip install -r requirements.txt
    startCommand: celery -A tasks worker --loglevel=info
    envVars:
      - key: REDIS_URL
        fromService:
          name: jobs
          type: keyvalue
          property: connectionString
```

Optional: set **`maxShutdownDelaySeconds`** on the worker service for longer-draining jobs.

## References

| Topic | File |
|-------|------|
| Celery, Sidekiq, BullMQ, Asynq, Oban setup + YAML | `references/queue-framework-setup.md` |
| SIGTERM, `maxShutdownDelaySeconds`, per-language patterns | `references/graceful-shutdown.md` |

## Related Skills

- **render-deploy** — First deploy, CLI, service creation
- **render-blueprints** — Full `render.yaml` schema, `fromService`, projects
- **render-networking** — Private URLs, what can call what
- **render-scaling** — Worker plans, instance counts, limits
Lines changed: 57 additions & 0 deletions
{
  "skill_name": "render-background-workers",
  "models": [
    "claude-sonnet-4-6",
    "gpt-5.4",
    "gemini-3.1-pro-preview"
  ],
  "evals": [
    {
      "id": 1,
      "prompt": "I need to set up a BullMQ worker in Node.js on Render to process async jobs. I want it connected to a Redis-compatible queue. Can you give me a complete render.yaml blueprint and explain the Key Value eviction policy I should use?",
      "expected_output": "The agent provides a complete render.yaml with a Key Value service (ipAllowList, noeviction policy note) and a worker service (type: worker, runtime: node, buildCommand: npm ci, startCommand: node worker.js) with REDIS_URL wired via fromService. It explains that maxmemory-policy must be set to noeviction to prevent queue keys from being evicted.",
      "files": [],
      "integrations": {},
      "assertions": [
        "The response includes a render.yaml with type: keyvalue and type: worker services",
        "The worker service uses runtime: node and startCommand referencing node worker.js or equivalent",
        "REDIS_URL is wired using fromService with type: keyvalue and property: connectionString",
        "The response explicitly states the Key Value maxmemory policy must be set to noeviction",
        "The Key Value service includes an ipAllowList field",
        "The response explains that allkeys-lru or similar eviction policies are for caches and will drop jobs"
      ]
    },
    {
      "id": 2,
      "turns": [
        "I want to run background jobs on Render. Should I use a background worker or a cron job?",
        "The jobs are triggered by user actions and need to be processed as soon as possible. There could be bursts of hundreds of jobs at once."
      ],
      "expected_output": "The agent first asks clarifying questions about the job trigger pattern (scheduled vs event-driven) and volume. After the user clarifies that jobs are user-triggered and bursty, the agent recommends a Background Worker for always-on queue consumption, or a Workflow for high-volume bursty scenarios, explaining the tradeoffs. It explains cron jobs are for scheduled periodic tasks that exit, not queue consumers.",
      "files": [],
      "integrations": {},
      "assertions": [
        "In the first turn, the agent asks at least one clarifying question about the job trigger pattern or volume",
        "After the user clarifies bursty user-triggered jobs, the agent recommends a Background Worker or Workflow",
        "The agent explains that Cron Jobs are for scheduled periodic tasks and have a 12-hour max run limit",
        "The agent mentions that Workflows scale per run and have no idle instance cost, making them suitable for bursty workloads",
        "The agent distinguishes between always-on queue consumers (Background Worker) and scheduled tasks (Cron Job)"
      ]
    },
    {
      "id": 3,
      "prompt": "My Celery workers on Render are losing jobs during deploys. How do I implement graceful shutdown? I need jobs that can take up to 2 minutes to finish. Show me the signal handler code and the render.yaml config.",
      "expected_output": "The agent explains Render's SIGTERM -> wait -> SIGKILL sequence, provides a Python SIGTERM signal handler using signal.signal, recommends setting maxShutdownDelaySeconds to at least 120 (2 minutes) plus buffer, and shows the render.yaml worker config with maxShutdownDelaySeconds set appropriately. It also mentions Celery's worker_shutting_down lifecycle signal and the importance of idempotent tasks.",
      "files": [],
      "integrations": {},
      "assertions": [
        "The response explains that Render sends SIGTERM first, waits maxShutdownDelaySeconds, then sends SIGKILL",
        "The response recommends setting maxShutdownDelaySeconds to at least 120 seconds (2 minutes) to cover the longest job duration",
        "The response includes a Python signal handler using signal.signal(signal.SIGTERM, ...) or references Celery's worker_shutting_down signal",
        "The render.yaml snippet includes maxShutdownDelaySeconds on the worker service set to 120 or higher",
        "The response advises stopping the consumer loop (stop dequeuing) as the first step on SIGTERM",
        "The response mentions job idempotency or checkpointing to handle retries safely"
      ]
    }
  ]
}
Lines changed: 79 additions & 0 deletions
# Graceful shutdown for Render workers

Render stops worker instances during **deploys**, **manual restarts**, and **scale-in** events. Your process must **exit cleanly** within the configured window or work may be **lost** or **duplicated** (depending on your queue’s ack/retry semantics).

## Platform behavior

1. Render sends **`SIGTERM`** to your process.
2. The platform waits up to **`maxShutdownDelaySeconds`** (**1–300**, **default 30**).
3. If the process is still running, Render sends **`SIGKILL`** (not catchable).

Configure **`maxShutdownDelaySeconds`** in the **Dashboard** (service settings) or in **`render.yaml`** on the worker service. Set it to cover the **longest job** you are willing to let complete during shutdown (plus buffer for flushing metrics, closing DB pools, etc.).

## General pattern

1. **Stop accepting new jobs** — stop the consumer loop, pause polling, or drain the framework’s internal fetch.
2. **Finish the current job** or **checkpoint** durable progress so another worker can resume safely.
3. **Close connections** — Redis/Postgres pools, HTTP clients.
4. **Exit with code 0** when done.

## Python

**Low-level handler**

```python
import signal

shutting_down = False  # checked by the main loop

def handle_sigterm(signum, frame):
    # set a flag; the main loop checks it and stops dequeuing
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)
```

**Celery** — use lifecycle signals such as **`worker_shutting_down`** to run cleanup; ensure tasks honor a **soft time limit** or cooperative cancel flag so shutdown can finish within **`maxShutdownDelaySeconds`**.

## Ruby (Sidekiq)

Sidekiq **handles SIGTERM** by default: it stops fetching new work and waits for in-flight jobs up to a **configurable timeout** (`:timeout` in Sidekiq options, in seconds). Align that timeout with Render’s **`maxShutdownDelaySeconds`** (the Sidekiq timeout should be **at most** the platform delay minus a small margin).

## Node.js

```javascript
let accept = true;

process.on("SIGTERM", async () => {
  accept = false;
  await worker.close(); // BullMQ: stops accepting, waits for active jobs
  process.exit(0);
});
```

**BullMQ** — prefer **`worker.close()`** (and **`queue.close()`** where applicable) so active jobs complete per library defaults; tune **`stalledInterval`** / job locks if you need stricter bounds.

## Go

Use **`signal.NotifyContext`** (or `signal.Notify` + `context.WithCancel`) to cancel a root context passed into your consumer loop and job handlers; wait on **`sync.WaitGroup`** or channels until in-flight work finishes, then exit.

```go
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
defer stop()
// run consumer until ctx.Done(), then drain workers
```

## Anti-patterns

- **Ignoring SIGTERM** — the process survives until **`SIGKILL`**, often **mid-job**, causing **lost work** or **stuck** queue entries.
- **`maxShutdownDelaySeconds` too low** for your p95 job duration — frequent **hard kills** and retries.
- **No idempotency** — if a job is retried after an ambiguous failure at shutdown, **duplicate side effects** can occur.

## Checklist

| Item | Action |
|------|--------|
| Delay | Set **`maxShutdownDelaySeconds`** ≥ the longest graceful completion you need |
| Consumer | On SIGTERM, **stop dequeuing** first |
| Jobs | **Idempotent** handlers or explicit **checkpoints** |
| Framework | Use built-in **drain** / **close** APIs (Sidekiq, BullMQ, Celery signals) where available |