Scheduler client reuse#1796
Open
SubhamSinghal wants to merge 1 commit into
Open
Conversation
Contributor
|
Thanks @SubhamSinghal for your contribution. There is similar functionality used for executor client caching, I wonder if provided infrastructure could be reused for scheduler client caching? Also, do you really see this as a performance bottleneck? |
Author
|
Quick clarification on what you have in mind by "reuse"
I haven't measured it as such. I noticed the per-query reconnect(explicit mentioned TODO) while reading the client path and the fix seemed self-contained. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Today every query against a Ballista cluster opens a brand-new TCP+TLS+HTTP/2 connection to the scheduler, runs
execute_queryplus a polling loop ofget_job_status, and discards the connection. Even when the sameSessionContextruns many queries against one scheduler URL, the handshake is paid on every query.tonic::transport::ChannelisCloneand HTTP/2 multiplexes concurrent RPCs over one connection by design — sothe per-query reconnect is throwing away exactly the thing meant to be shared. For interactive REPLs and JDBC
sessions, this is a constant per-query latency floor (and worse on TLS-enabled deployments).
Two
TODO reuse the scheduler to avoid connecting to the Ballista scheduler again and againmarkers flag this inthe source today:
ballista/core/src/execution_plans/distributed_query.rs:354(execute_query_pull)ballista/core/src/execution_plans/distributed_query.rs:527(execute_query_push)What changes are included in this PR?
Adds a per-planner, lazily-initialized, idle-evicting, single-flight
SchedulerChannelCachethat holds onetonic::transport::ChannelperBallistaQueryPlanner. The cache is constructed once per planner (≈ once perSessionContext) and cloned into everyDistributedQueryExecit builds, so all queries on a session share oneHTTP/2 connection.
Are these changes tested?
Yes:
SchedulerChannelCache: lazy init, single-flight under 16 concurrent callers,build-failure-doesn't-poison,
invalidate()swap semantics, idle eviction with and without an in-flight build,drop-of-cache exits the eviction task (proving the eviction task holds only a
Weak), and theis_transport_errorpredicate.BallistaQueryPlannerasserting that twocreate_physical_plancalls produce exec nodes whosecaches share the same
Arc<Inner>— the invariant that makes the feature actually work end-to-end.ballista-coretests pass unchanged (108 / 108 lib tests green).Are there any user-facing changes?
One new config option:
ballista.client.scheduler_channel_idle_timeout_seconds— idle timeout in seconds for the client-side cachedscheduler gRPC channel. After this many seconds with no query activity, the underlying TCP connection is closed
and the next query reconnects. Default
300.