Skip to content

Scheduler client reuse#1796

Open
SubhamSinghal wants to merge 1 commit into
apache:mainfrom
SubhamSinghal:scheduler-client-reuse
Open

Scheduler client reuse#1796
SubhamSinghal wants to merge 1 commit into
apache:mainfrom
SubhamSinghal:scheduler-client-reuse

Conversation

@SubhamSinghal

Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

Today every query against a Ballista cluster opens a brand-new TCP+TLS+HTTP/2 connection to the scheduler, runs
execute_query plus a polling loop of get_job_status, and discards the connection. Even when the same
SessionContext runs many queries against one scheduler URL, the handshake is paid on every query.

tonic::transport::Channel is Clone and HTTP/2 multiplexes concurrent RPCs over one connection by design — so
the per-query reconnect is throwing away exactly the thing meant to be shared. For interactive REPLs and JDBC
sessions, this is a constant per-query latency floor (and worse on TLS-enabled deployments).

Two TODO reuse the scheduler to avoid connecting to the Ballista scheduler again and again markers flag this in
the source today:

  • ballista/core/src/execution_plans/distributed_query.rs:354 (execute_query_pull)
  • ballista/core/src/execution_plans/distributed_query.rs:527 (execute_query_push)

What changes are included in this PR?

Adds a per-planner, lazily-initialized, idle-evicting, single-flight SchedulerChannelCache that holds one
tonic::transport::Channel per BallistaQueryPlanner. The cache is constructed once per planner (≈ once per
SessionContext) and cloned into every DistributedQueryExec it builds, so all queries on a session share one
HTTP/2 connection.

Are these changes tested?

Yes:

  • 9 unit tests on SchedulerChannelCache: lazy init, single-flight under 16 concurrent callers,
    build-failure-doesn't-poison, invalidate() swap semantics, idle eviction with and without an in-flight build,
    drop-of-cache exits the eviction task (proving the eviction task holds only a Weak), and the
    is_transport_error predicate.
  • 1 unit test in BallistaQueryPlanner asserting that two create_physical_plan calls produce exec nodes whose
    caches share the same Arc<Inner> — the invariant that makes the feature actually work end-to-end.
  • All pre-existing ballista-core tests pass unchanged (108 / 108 lib tests green).

Are there any user-facing changes?

One new config option:

  • ballista.client.scheduler_channel_idle_timeout_seconds — idle timeout in seconds for the client-side cached
    scheduler gRPC channel. After this many seconds with no query activity, the underlying TCP connection is closed
    and the next query reconnects. Default 300.

@milenkovicm

Copy link
Copy Markdown
Contributor

Thanks @SubhamSinghal for your contribution.

There is similar functionality used for executor client caching, I wonder if provided infrastructure could be reused for scheduler client caching?

Also, do you really see this as a performance bottleneck?

@SubhamSinghal

Copy link
Copy Markdown
Author

@milenkovicm

Quick clarification on what you have in mind by "reuse"

  1. Same pattern: a DashMap<scheduler_id, ...> field on the planner that mirrors SchedulerClients in executor_server.rs.
  2. Lift to shared primitive: extract the existing executor-side caching into a reusable type in ballista-core and have both ExecutorServer and BallistaQueryPlanner use it.
  3. Something else I'm missing.

Also, do you really see this as a performance bottleneck?

I haven't measured it as such. I noticed the per-query reconnect(explicit mentioned TODO) while reading the client path and the fix seemed self-contained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants