improvement(helm): helm chart updates with security, ESO, and docs overhaul #4565

Merged
waleedlatif1 merged 26 commits into staging from waleed/helm-chart-production-ready
May 12, 2026
@waleedlatif1
Collaborator

Summary

  • Apply Pod Security Standards "restricted" defaults to every pod and container, set automountServiceAccountToken: false, block cloud metadata endpoints via NetworkPolicy egress
  • Auto-partition sensitive app.env / realtime.env keys into a chart-managed Secret via envFrom (no more plaintext secrets on container specs)
  • Support three secret modes — inline, existingSecret, and ExternalSecrets Operator (ESO) — with fail-fast rendering when ESO is enabled but a sensitive key is unmapped
  • Add headless Services for both Postgres StatefulSets, HPA-aware replicas, auto PodDisruptionBudget, distinct startup/liveness/readiness probes, and ttlSecondsAfterFinished on CronJobs
  • Default image tags to Chart.AppVersion with pullPolicy: IfNotPresent; optional image.digest pin; enforce kubeVersion >=1.25.0-0
  • Rewrite README in cert-manager / Bitnami style, add NOTES.txt, annotate every example values file with usage and secret-strategy guidance
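To illustrate the metadata-endpoint egress block mentioned in the first bullet, here is a minimal sketch of a plain Kubernetes NetworkPolicy, not the chart's rendered output. The policy name and pod label are hypothetical. 169.254.169.254 is the link-local instance metadata address used by the major cloud providers.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-app-egress                 # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: app     # hypothetical label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32         # cloud instance metadata endpoint
```

The `except` list carves the metadata address out of an otherwise open IPv4 egress rule, so pods can still reach other destinations.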

Type of Change

  • Improvement

Testing

Tested manually — helm lint clean, helm template renders 2055 lines without error across all three secret modes.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…s overhaul

Comprehensive Helm chart improvements bringing the chart up to industry
standards for security, secret management, and documentation.

Security
- Pod Security Standards "restricted" defaults on every pod and container
  (runAsNonRoot, allowPrivilegeEscalation=false, capabilities.drop=[ALL],
  seccompProfile=RuntimeDefault)
- automountServiceAccountToken=false on ServiceAccount and every pod
- NetworkPolicy egress blocks cloud metadata endpoints by default
- Sensitive app/realtime env keys auto-partitioned into chart-managed Secret
  via envFrom; no more plaintext secrets on container specs
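The "restricted" settings listed above map onto standard Kubernetes pod and container fields. The fragment below is a sketch under assumed names (pod name, image, and UID are hypothetical), not the chart's actual template.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-restricted                 # hypothetical name
spec:
  automountServiceAccountToken: false      # no API token mounted into the pod
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001                        # hypothetical non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: example/app:1.0.0             # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```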

Secret management
- Three modes: inline, existingSecret, ExternalSecrets Operator (ESO)
- ESO sync supports arbitrary sensitive keys
- Fail-fast template rendering when ESO enabled but sensitive key unmapped
- AWS/Azure/GCP example files document all three modes
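For readers unfamiliar with the ESO mode, an ExternalSecret resource generally looks like the sketch below. The store name, target Secret name, and remote key paths are hypothetical; only the two secretKey names come from this PR. The chart generates its own ExternalSecret from externalSecrets.remoteRefs.app, so this is an illustration of the resource shape rather than the chart's output.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-app-secrets                # hypothetical name
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: example-store                    # hypothetical store
  target:
    name: example-app-secrets              # Kubernetes Secret that ESO creates
  data:
    - secretKey: BETTER_AUTH_SECRET
      remoteRef:
        key: example/sim/better-auth-secret    # hypothetical remote path
    - secretKey: ENCRYPTION_KEY
      remoteRef:
        key: example/sim/encryption-key        # hypothetical remote path
```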

Reliability
- Headless Services for both Postgres StatefulSets
- HPA-aware replicas (omits spec.replicas when autoscaling.enabled)
- PodDisruptionBudget auto-activates when replicaCount > 1
- Startup / liveness / readiness probes with distinct timings
- CronJob ttlSecondsAfterFinished for automatic cleanup
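One common way to make replicas HPA-aware in a Helm Deployment template is to wrap the field in a conditional, as sketched below. The value paths `app.autoscaling.enabled` and `app.replicaCount` are assumptions for illustration and may not match the chart's actual keys.

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  {{- if not .Values.app.autoscaling.enabled }}
  # Only set a static replica count when no HPA manages this Deployment.
  replicas: {{ .Values.app.replicaCount }}
  {{- end }}
```

Omitting `spec.replicas` when an HPA is active avoids each `helm upgrade` resetting the replica count that the autoscaler chose.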

Chart hygiene
- Image tags default to Chart.AppVersion; pullPolicy IfNotPresent
- Optional image.digest pin for content-addressed deploys
- kubeVersion >=1.25.0-0 enforced
- Ollama pinned to 0.23.2; mount moved to /data

Documentation
- README rewritten in cert-manager / Bitnami style
- NOTES.txt with post-install guidance
- Example values files annotated with usage and secret-strategy guidance
@vercel

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | May 12, 2026 5:40pm |

@cursor

cursor Bot commented May 12, 2026

PR Summary

High Risk
High risk because it significantly changes Helm chart rendering for secrets, env injection, network policies, and workload specs (replicas/PDB/services), which can break existing deployments or upgrades if values/secret strategies don’t match the new validations.

Overview
Modernizes the helm/sim chart into a production-oriented release: bumps chart metadata (version: 1.0.0, kubeVersion: >=1.25), adds NOTES.txt, and rewrites docs/examples (plus new .claude Helm ops skill references) to standardize install/upgrade and triage guidance.

Reworks configuration and secret handling to support three mutually exclusive modes (inline, app.secrets.existingSecret, or externalSecrets): moves app.env/realtime.env into a chart-managed Secret via envFrom, adds envDefaults inlining rules (mode-aware to avoid masking secret values), and introduces fail-fast template validation for required keys and for ESO coverage (every set env key must be mapped in externalSecrets.remoteRefs.app).

Hardens and operationalizes workloads: applies restricted PSS-aligned pod/container security contexts and automountServiceAccountToken: false, adds checksum annotations to force rollouts on secret/config changes, makes replicas HPA-aware, introduces topology spread constraints, adds cronjob TTL + Secret-sourced CRON_SECRET, expands NetworkPolicy (cron→app ingress, telemetry egress, configurable ingressFrom, metadata IP blocks, egress.extraRules), and adds headless Services plus rolling-update settings for Postgres StatefulSets. Also adds Helm test hook templates and helm-unittest suites to cover the new rendering invariants.

Reviewed by Cursor Bugbot for commit b9ceff9. Configure here.

Comment thread helm/sim/templates/cronjobs.yaml
@greptile-apps
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR is a comprehensive Helm chart overhaul that applies Pod Security Standards "restricted" defaults, introduces three secret-management modes (inline, existingSecret, ESO), restructures operational tunables into envDefaults, and adds headless Services, HPA-aware PDBs, startup probes, and a rewritten README with NOTES.txt.

  • Security hardening: All pods get automountServiceAccountToken: false, PSS-restricted securityContext defaults, and NetworkPolicy egress now excludes cloud metadata endpoints with configurable exceptCidrs. CRON_SECRET migrated from inline plaintext to secretKeyRef.
  • Secret management: secrets-app.yaml now writes the full union of app.env + realtime.env into a single secret (app-wins precedence), and sim.validateExternalSecretCoverage fails at template time when ESO is enabled but required keys are unmapped.
  • Operational improvements: image.tag defaults to Chart.AppVersion with optional image.digest pin; pullPolicy changed to IfNotPresent; distinct startup/liveness/readiness probes added; PDB activation is tri-state; ttlSecondsAfterFinished added to CronJobs.

Confidence Score: 4/5

Safe to merge for most deployments; users combining NetworkPolicy with an external database should add an egress extraRule for the DB host until the chart adds it natively.

The chart overhaul is well-structured and the most critical security, ESO-coverage, and env-precedence bugs were addressed during the PR review cycle. The remaining items are the external DB egress gap in NetworkPolicy (pre-existing open comment), a constant checksum/secret hash in ESO/existingSecret mode, and the maxUnavailable: 0 PDB falsy-int edge case — all non-blocking for the vast majority of deployments.

helm/sim/templates/networkpolicy.yaml — still missing an egress rule for externalDatabase hosts when networkPolicy.enabled=true and externalDatabase.enabled=true, which silently blocks app-to-database connectivity.

Important Files Changed

| Filename | Overview |
| --- | --- |
| helm/sim/templates/networkpolicy.yaml | Major rework: adds OTEL egress rules, configurable ingressFrom peers, cloud-metadata CIDR exclusions, and cron-pod ingress allowances. External database egress is still missing: apps on external DB with NetworkPolicy enabled cannot connect to the database host. |
| helm/sim/templates/_helpers.tpl | Significant refactor: new sim.image with digest pinning and required-tag guard, PSS-restricted security contexts, sim.topologySpreadConstraints, and ESO coverage validator. Gating of INTERNAL_API_SECRET/CRON_SECRET checks within $useExistingAppSecret guard is verified correct. |
| helm/sim/templates/secrets-app.yaml | Rewrites the secret to contain all app.env + realtime.env values with app.env winning on collision (empty-string shadowing avoided via explicit loop). Chart-computed keys correctly excluded. |
| helm/sim/templates/deployment-app.yaml | Adds checksum annotations, PSS security contexts, automountServiceAccountToken: false, HPA-aware replicas, and startup probe. The checksum/secret annotation evaluates to a constant hash in ESO/existingSecret mode, so secret rotation won't trigger automatic pod restarts. |
| helm/sim/templates/poddisruptionbudget.yaml | Adds tri-state activation, HPA-aware auto-detection, and minAvailable: 1 fallback. maxUnavailable: 0 is falsy in Go templates and silently falls through to minAvailable: 1; this is an edge case only, since the default string "25%" is truthy and unaffected. |
| helm/sim/templates/external-secret-app.yaml | Generalized to iterate externalSecrets.remoteRefs.app dynamically (string or map ref), replacing the hard-coded six-key whitelist. Default apiVersion corrected to v1beta1. |
| helm/sim/values.yaml | Large restructuring: operational tunables moved to envDefaults, image tags defaulted to Chart.AppVersion with digest support, PDB tri-state, NetworkPolicy egress restructured to egress.exceptCidrs/egress.extraRules, topology spread constraints added per component. |
| helm/sim/templates/cronjobs.yaml | Adds ttlSecondsAfterFinished, automountServiceAccountToken: false, PSS security contexts, and switches CRON_SECRET from inline plaintext to secretKeyRef. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph SecretModes["Secret Resolution"]
        ESO{{"externalSecrets.enabled?"}}
        ESO -->|yes| ESOPath["ExternalSecret CR\nESO syncs to K8s Secret\nvalidateExternalSecretCoverage\nenforces remoteRefs coverage"]
        ESO -->|no| ExistingQ{{"existingSecret.enabled?"}}
        ExistingQ -->|yes| ExistingPath["Pre-existing K8s Secret\napp.env non-empty values\ninlined as env: entries"]
        ExistingQ -->|no| InlinePath["Chart-managed Secret\napp.env + realtime.env union\napp.env wins on collision"]
    end
    InlinePath --> EnvFrom["envFrom secretRef"]
    ESOPath --> EnvFrom
    ExistingPath --> EnvFrom
    EnvFrom --> Pod["Pod environment"]

Reviews (15): Last reviewed commit: "fix(helm): allow cron pods through app N..." | Re-trigger Greptile

Comment thread helm/sim/templates/external-secret-app.yaml Outdated
Comment thread helm/sim/templates/secrets-app.yaml Outdated
Comment thread helm/sim/templates/_helpers.tpl Outdated
Comment thread helm/sim/templates/_helpers.tpl
The sim.fullname helper collapses to the release name when the release
name contains the chart name. With the documented release name 'sim',
actual resources are 'sim-app', 'sim-postgresql', etc. — not the
'sim-sim-*' form previously documented. Fixes copy-paste commands in the
pre-1.0.0 upgrade walkthrough and several troubleshooting snippets.

Also expands the cronjobs component description to reflect the full set
of 13 scheduled jobs (was understated as just Gmail/Outlook polling).
…defaults

- Add app.envDefaults / realtime.envDefaults for chart-shipped operational
  tunables (rate limits, timeouts, IVM, feature-flag defaults, localhost URL
  fallbacks). Rendered inline on the container, not into the Secret
- Remove operational defaults from app.env / realtime.env so the chart-managed
  Secret stays minimal and External Secrets Operator users only map keys they
  actually set, not every chart default
- Skip an envDefaults key when the user explicitly sets it in env (K8s `env`
  overrides `envFrom`, so an inline default would otherwise mask a Secret
  value at runtime)
- Relax values.schema.json to allow empty strings on NEXT_PUBLIC_APP_URL,
  BETTER_AUTH_URL, NEXT_PUBLIC_SUPPORT_EMAIL (defaults supplied via envDefaults)
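The precedence rule behind the skip logic can be seen in a plain container spec. In Kubernetes, when the same variable name is supplied by both `envFrom` and `env`, the `env` entry takes precedence. The fragment below is a sketch with hypothetical image, Secret name, and default URL.

```yaml
containers:
  - name: app
    image: example/app:1.0.0               # hypothetical image
    envFrom:
      - secretRef:
          name: example-app-secrets        # hypothetical Secret holding NEXT_PUBLIC_APP_URL
    env:
      # If this inline default were rendered while the Secret also defines
      # NEXT_PUBLIC_APP_URL, the container would see this value, not the Secret's.
      - name: NEXT_PUBLIC_APP_URL
        value: "http://localhost:3000"     # hypothetical default
```

This is why an envDefaults key has to be dropped from the inline `env` list whenever the user sets the same key in `env` values that end up in the Secret.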
…cret merge order, image guard

- CronJobs reference CRON_SECRET via secretKeyRef; fail-fast at template
  time when cronjobs.enabled=true and app.env.CRON_SECRET is empty so users
  get a clear error instead of a CreateContainerConfigError loop
- Default externalSecrets.apiVersion to "v1beta1" (supported by every ESO
  release since v0.7). The previous "v1" default targets only ESO v0.17+
- Swap merge order in secrets-app.yaml so app.env wins over realtime.env
  for shared keys (BETTER_AUTH_SECRET, BETTER_AUTH_URL, …) — both pods
  consume the same Secret via envFrom, so the app value must be canonical
- Add `required` guard on sim.image so an empty tag + empty digest +
  empty Chart.AppVersion surfaces as a clear template-time error instead
  of rendering an invalid `repo:` reference
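As a rough picture of the first bullet, a CronJob that reads CRON_SECRET through `secretKeyRef` might look like the sketch below. The resource name, schedule, image, command, and Secret name are hypothetical, and the TTL value is an arbitrary example.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-schedule-execution         # hypothetical name
spec:
  schedule: "*/5 * * * *"                  # hypothetical schedule
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600        # example value; finished Jobs are cleaned up after this
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: job
              image: example/cron:1.0.0    # hypothetical image
              command: ["sh", "-c", "echo placeholder job"]
              env:
                - name: CRON_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: example-app-secrets   # hypothetical Secret name
                      key: CRON_SECRET
```

If the referenced key is missing from the Secret, the container fails to start with CreateContainerConfigError, which is the loop the template-time check is meant to prevent.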
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1 changed the title from "improvement(helm): production-ready chart with security, ESO, and docs overhaul" to "improvement(helm): helm chart updates with security, ESO, and docs overhaul" on May 12, 2026
Comment thread helm/sim/templates/_helpers.tpl
Previously, enabling externalSecrets without mapping BETTER_AUTH_SECRET /
ENCRYPTION_KEY / INTERNAL_API_SECRET (and CRON_SECRET when cronjobs are
on) rendered cleanly but produced CrashLoopBackOff at runtime with
cryptic missing-env errors. Fail at template time instead.
Comment thread helm/sim/templates/_helpers.tpl
Comment thread helm/sim/templates/poddisruptionbudget.yaml Outdated
Previously the auto-enable predicate only checked the static
app.replicaCount, which defaults to 1 even when autoscaling is on
(HPA owns spec.replicas). PDB now also activates when
autoscaling.enabled=true and minReplicas > 1.
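For reference, the kind of object that would be rendered once the predicate holds is a standard `policy/v1` PodDisruptionBudget. The sketch below uses a hypothetical name and selector label; `minAvailable: 1` matches the fallback described elsewhere in this PR.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app                        # hypothetical name
spec:
  minAvailable: 1                          # keep at least one pod during voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/component: app     # hypothetical label
```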
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread helm/sim/templates/deployment-realtime.yaml
Comment thread helm/sim/templates/statefulset-postgresql.yaml
…alues; add StatefulSet upgrade NOTES

- Realtime override-skip now considers keys set in either app.env or
  realtime.env. The shared app Secret is mounted via envFrom on the
  realtime pod, so a key set in app.env (e.g. NEXT_PUBLIC_APP_URL) would
  previously be masked by the realtime envDefault (inline env overrides
  envFrom in K8s).
- NOTES.txt now prints a StatefulSet orphan-delete reminder on upgrade,
  surfacing the immutable serviceName issue documented in the README.
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

- 7 helm-unittest suites covering smoke, validators, secret modes,
  envDefaults secret-mode-aware inlining (round-9 regression net),
  chart-computed env keys (round-8 regression net), NetworkPolicy
  shape, and PDB/HPA conditional rendering (38 tests, ~265ms).
- ci/*.yaml render fixtures for default, production, existingSecret,
  ESO, and external-db install modes.
- GitHub Actions workflow runs helm lint --strict, helm unittest,
  helm template across the ci matrix, and kubeconform validation
  against Kubernetes 1.30 schemas.
- CONTRIBUTING.md documents how to run the same gates locally.
@gitguardian

gitguardian Bot commented May 12, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 32763747 | Triggered | Generic Password | 716a677 | helm/sim/tests/validators_test.yaml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


- New templates/tests/test-connection.yaml renders a Pod with
  helm.sh/hook=test that wgets the app Service (and realtime when
  enabled). Lets users run `helm test <release>` after install for
  a real in-cluster connectivity check. Restricted PSS context.
- tests.* values block (image, timeoutSeconds, resources) is the
  knob to disable or tune the probe; documented in values.schema.json.
- 3 helm-unittest tests cover the hook annotations, PSS context,
  and tests.enabled=false skip path (41 tests total).
- New CI job spins up a kind v1.30 cluster and runs
  `kubectl apply --dry-run=server` against the rendered manifests
  for the CRD-free ci fixtures (default / existing-secret /
  external-db). Catches admission and validation issues the static
  kubeconform schema check can't see.
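A Helm test hook of the kind described in the first bullet is an ordinary Pod carrying the `helm.sh/hook: test` annotation. The sketch below is not the chart's template: the pod name, image tag, and port are hypothetical, and the restricted security context is omitted for brevity. The Service name `sim-app` follows the naming noted earlier for a release called `sim`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-test-connection            # hypothetical name
  annotations:
    "helm.sh/hook": test                   # run only by `helm test <release>`
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox:1.36                  # hypothetical image tag
      # Exit code is non-zero when the Service cannot be reached.
      command: ["wget", "-q", "-O", "/dev/null", "http://sim-app:3000"]   # port is an assumption
```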
This is the 1.0.0 release of the chart — there is no pre-1.0.0
predecessor for users to upgrade from, so all of the dedicated upgrade
narration was hypothetical.

- Drop the 'Upgrading from a pre-1.0.0 build' README section and the
  matching troubleshooting entry.
- Drop the .Release.IsUpgrade block from NOTES.txt: items 5 (StatefulSet
  orphan-delete), 6 (INTERNAL_API_SECRET 'new in 1.0.0'), 7
  (networkPolicy.egress shape change). Each described a migration off a
  chart version that never shipped.
- Delete references/upgrade-pre-1.0.0.md and remove the corresponding
  pointers from SKILL.md.
- Anchor .helmignore patterns to chart root so /tests/ (unit suites)
  and /examples/ are dropped from the packaged tarball without also
  dropping templates/tests/ (the helm test hook).
The helm-unittest suites in helm/sim/tests/ and the helm test hook
in helm/sim/templates/tests/ stay — those are chart-internal quality
scaffolding, not CI. Removed:

- .github/workflows/helm-chart.yml
- helm/sim/ci/*.yaml (5 render fixtures used only by the workflow)
- helm/sim/CONTRIBUTING.md (mostly documented those gates)
- dead /ci/ and /CONTRIBUTING.md entries in .helmignore
- Add checksum/secret pod annotations on app, realtime, and copilot
  Deployments (plus checksum/config on app when branding ConfigMap is
  enabled). Closes the long-standing footgun where 'helm upgrade' with
  a changed Secret would silently leave pods running the old values
  until a manual rollout restart.
- New top-level topologySpreadConstraints value (and sim.topologySpreadConstraints
  helper) applied to app and realtime Deployments. Plumbed the same way as
  affinity and tolerations; users supply their own labelSelector, following
  Bitnami convention.
- 5 helm-unittest cases cover the checksum annotations and topology
  spread rendering (46 tests total).
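The checksum annotation follows a pattern documented in the Helm chart tips: hash the rendered Secret template and put the digest on the pod template, so that any change to the Secret changes the pod spec and triggers a rollout. The fragment below is a generic sketch of that pattern, assuming the Secret lives in `secrets-app.yaml`; the chart's exact expression may differ.

```yaml
spec:
  template:
    metadata:
      annotations:
        # Digest of the rendered Secret manifest; changes whenever its content changes.
        checksum/secret: {{ include (print $.Template.BasePath "/secrets-app.yaml") . | sha256sum }}
```

This only helps when the chart renders the Secret itself. With an externally managed Secret the template output does not change on rotation, so the hash stays constant.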
Sprig 'merge' treats "" as a real value, so a default-empty
app.env.BETTER_AUTH_URL would shadow a non-empty realtime.env override
and the URL would never reach the rendered Secret. Replace 'merge'
with an explicit two-pass overlay that filters empties before writing,
mirroring the same pattern already used in deployment-realtime.yaml's
existingSecret block.

Adds two regression tests: realtime.env-only value reaches the Secret
when app.env is empty, and app.env still wins on collision when both
are non-empty (48 tests total).
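The two-pass overlay can be sketched in Helm template code as follows. This is an illustration of the idea, not the chart's actual secrets-app.yaml. The variable name `$merged` is hypothetical, and the truthiness check also drops values such as `false` or `0`, which a real implementation may want to handle differently.

```yaml
{{- /* Pass 1: start from non-empty realtime.env values. */ -}}
{{- $merged := dict -}}
{{- range $key, $value := .Values.realtime.env -}}
  {{- if $value -}}
    {{- $_ := set $merged $key $value -}}
  {{- end -}}
{{- end -}}
{{- /* Pass 2: non-empty app.env values overwrite, so app wins on collision. */ -}}
{{- range $key, $value := .Values.app.env -}}
  {{- if $value -}}
    {{- $_ := set $merged $key $value -}}
  {{- end -}}
{{- end }}
stringData:
  {{- range $key, $value := $merged }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}
```

Because empty strings never enter `$merged`, an empty app.env default cannot shadow a non-empty realtime.env value.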
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 570e5f0. Configure here.

Comment thread helm/sim/templates/_helpers.tpl
…tring

Greptile flagged that sim.topologySpreadConstraints helper docstring promised
per-component config (.Values.app, .Values.realtime, ...) but call sites
passed .Values, so any app.topologySpreadConstraints / realtime.topologySpreadConstraints
set by the user was silently dropped. The single global key also prevented
distinct app-vs-realtime spread rules.

Pass .Values.app / .Values.realtime to the helper at each call site; move
the top-level topologySpreadConstraints key into both component sections in
values.yaml. Adds a regression test that app constraints don't leak onto
the realtime pod.
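With the key moved under each component, values for distinct app and realtime spread rules could look like the sketch below. The constraint fields are standard Kubernetes; the label keys and values in `labelSelector` are hypothetical and must match the labels the pods actually carry.

```yaml
app:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: app        # hypothetical label
realtime:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: realtime   # hypothetical label
```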
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread helm/sim/templates/networkpolicy.yaml
Cursor flagged that when networkPolicy.enabled=true and cronjobs.enabled=true
(the recommended production config), the app NetworkPolicy only allowed
ingress from realtime and the ingress controller — silently blocking every
cron pod's HTTP call to /api/schedules/execute, webhook polls, etc. All 13
default cronjobs would fail.

Tag cron pods with a stable simstudio.ai/component-group: cronjob label so
the app NetworkPolicy can allow them with a single rule (no per-job
enumeration). Rule is conditional on cronjobs.enabled. Adds positive and
negative regression tests.
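The single-rule approach can be pictured as an ingress peer that selects on the shared cron label. The sketch below is a standalone NetworkPolicy with a hypothetical name, app pod label, and port; only the `simstudio.ai/component-group: cronjob` label comes from this change.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-app-ingress                # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: app     # hypothetical label on app pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One peer covers every cron pod, since they all share this label.
        - podSelector:
            matchLabels:
              simstudio.ai/component-group: cronjob
      ports:
        - protocol: TCP
          port: 3000                       # hypothetical app port
```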
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit b9ceff9. Configure here.

@waleedlatif1 merged commit 9d2dd8f into staging on May 12, 2026
14 checks passed
@waleedlatif1 deleted the waleed/helm-chart-production-ready branch on May 12, 2026 17:44