Stream Anthropic responses uncompressed #771
Open
xymbol wants to merge 1 commit into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #771 +/- ##
==========================================
+ Coverage 82.88% 82.90% +0.02%
==========================================
Files 142 142
Lines 6574 6576 +2
Branches 1148 1148
==========================================
+ Hits 5449 5452 +3
Misses 663 663
+ Partials 462 461 -1
Thanks @crmne for your work on RubyLLM. 🙏 On #771 an Anthropic maintainer (@ms-jpq) confirmed the same bug and applied the same one-header fix in the official SDK ( |
Net::HTTP auto-inflates the upstream gzip, which buffers SSE chunks until Cloudflare flushes its deflate state — turning ~100 events into 2 bursts and pushing first-chunk arrival from ~1s to ~15s on a 22s response. Set Accept-Encoding: identity on streaming requests. Non-streaming responses keep gzip.
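The scoping described in this commit message can be sketched with plain Net::HTTP request objects. This is a minimal illustration, not RubyLLM's actual code: the helper name `build_messages_request`, the `stream:` keyword, and the endpoint URL are assumptions made for the example. It relies on the Net::HTTP behavior that an explicitly supplied Accept-Encoding header turns off automatic response decoding for that request.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of RubyLLM): builds a POST request and
# opts out of compression only when the caller intends to stream.
def build_messages_request(uri, body, stream:)
  headers = { "Content-Type" => "application/json" }
  # Supplying Accept-Encoding ourselves stops Net::HTTP from adding its
  # default gzip/deflate header and from inflating the response body.
  headers["Accept-Encoding"] = "identity" if stream

  request = Net::HTTP::Post.new(uri, headers)
  request.body = body
  request
end

# Illustrative endpoint; no network call is made in this sketch.
uri = URI("https://api.anthropic.com/v1/messages")
streaming = build_messages_request(uri, "{}", stream: true)
buffered  = build_messages_request(uri, "{}", stream: false)

puts "streaming: #{streaming['Accept-Encoding']} (decode_content=#{streaming.decode_content.inspect})"
puts "buffered:  #{buffered['Accept-Encoding']} (decode_content=#{buffered.decode_content.inspect})"
```

Keeping the header conditional means non-streaming requests still negotiate gzip, which matches the scope stated above.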
What this does
Net::HTTP requests gzip and auto-inflates by default. Cloudflare gzips Anthropic's SSE with infrequent deflate flushes, batching chunk delivery into 2 bursts and pushing first-chunk arrival to ~15s on a 22s response.
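The batching effect can be reproduced locally with Ruby's Zlib, without any network. The sketch below is an illustration under assumptions: the event payloads and the helper name `decoded_sizes` are made up, and it uses the raw zlib format rather than gzip (both wrap the same deflate stream). It compresses a few SSE-style events and records how many bytes an inflater can hand back after each one, once with a flush only at the end and once with a flush after every event.

```ruby
require "zlib"

# Hypothetical helper: returns the number of decoded bytes that become
# readable after each event is compressed and handed to an inflater.
def decoded_sizes(events, flush_each:)
  deflater = Zlib::Deflate.new
  inflater = Zlib::Inflate.new
  last = events.size - 1

  events.each_with_index.map do |event, i|
    # Without a flush, deflate keeps small inputs in its internal buffer.
    flush = flush_each || i == last ? Zlib::SYNC_FLUSH : Zlib::NO_FLUSH
    compressed = deflater.deflate(event, flush)
    compressed.empty? ? 0 : inflater.inflate(compressed).bytesize
  end
end

# Made-up SSE events, for illustration only.
events = Array.new(5) { |i| "data: {\"chunk\": #{i}}\n\n" }

p decoded_sizes(events, flush_each: false) # everything arrives in one burst
p decoded_sizes(events, flush_each: true)  # each event is readable immediately
```

When the compressor flushes rarely, the reader sees nothing for most events and then a single burst, which is the pattern described above for the proxied SSE stream.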
Setting Accept-Encoding: identity on streaming requests bypasses Net::HTTP's inflater. Scoped to Anthropic streaming; non-streaming responses still benefit from gzip.

Measured on claude-haiku-4-5, 1500-word completion. Sparkline: each char = 1s, digit = chunks delivered, _ = zero.

9____984528454645464555__92

Type of change
Scope check
Required for new features
N/A — bug fix.
Quality check
overcommit --install and all hooks pass
bundle exec rake vcr:record[provider_name] (not needed: VCR's default matcher is [:method, :uri], and Net::HTTP still inflates a recorded gzipped response on its own. The only load-bearing diff would be the Accept-Encoding request header value.)
bundle exec rspec
(models.json, aliases.json)

New spec at spec/ruby_llm/providers/anthropic/streaming_spec.rb asserts the request header (fails without the fix, passes with).

AI-generated code
API changes
Related
anthropics/anthropic-sdk-ruby#182: the official Anthropic Ruby SDK has the same Net::HTTP auto-inflate bug and applies the same one-header fix (Accept-Encoding: identity) on its streaming endpoints.