fix(ts-sdk): correct crawl module bugs and add createCrawl wrapper#4031
Open
sushaan-k wants to merge 1 commit into
Open
fix(ts-sdk): correct crawl module bugs and add createCrawl wrapper#4031sushaan-k wants to merge 1 commit into
sushaan-k wants to merge 1 commit into
Conversation
2e5abe9 to
65dc588
Compare
The crawl module had several copy-paste bugs from the datasets module: - JSDoc on getCrawlsForDataset said "create a dataset" - Return type was Promise<Dataset> instead of Promise<Array<CrawlRequest>> - Error message referenced "create a crawl" on a GET operation Also adds a wrapper for POST /api/crawl (createCrawl), which is missing from the SDK even though it exists in the OpenAPI spec. The PUT and DELETE crawl endpoints are intentionally not wrapped here. The backend behavior of update_crawl_request and delete_crawl_request appears to be broken in ways that would mislead SDK callers; see the PR description for details. Changes: - Fix JSDoc, return type, and error message on getCrawlsForDataset - Make the props argument optional (limit/page are both optional) - Add createCrawl wrapper - Add a basic crawl.test.ts that exercises getCrawlsForDataset
65dc588 to
72dd6b9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The crawl module under
clients/ts-sdk/src/functions/crawl/was cargo-culted from the datasets module and never updated. As shipped on main,getCrawlsForDatasetreturns a value typed asPromise<Dataset>(a single dataset), the JSDoc on it claims it creates a dataset, and the error message says it needs a dataset ID "to create a crawl" even though it's a GET. There's also no SDK wrapper forPOST /api/crawl.This PR fixes the wrong types, JSDoc, and error message on
getCrawlsForDataset, makes itspropsargument optional (bothlimitandpageare already optional in the underlying type), and adds a wrapper forcreateCrawlthat mirrors the patterns used in the other function modules.It also adds
clients/ts-sdk/src/functions/crawl/crawl.test.tswith a read-only smoke + return-type test forgetCrawlsForDataset, modeled on the existingevents.test.ts/analytics.test.tspattern. Nothing in the test creates or mutates crawls, so it's safe against the shared sandbox dataset.yarn buildandyarn lintboth pass.What's intentionally not in this PR
The OpenAPI spec also exposes
PUT /api/crawlandDELETE /api/crawl/{crawl_id}. I started by wrapping all four endpoints, but on closer reading of the backend the PUT and DELETE handlers look broken in ways that would mislead SDK callers, so I dropped those wrappers from this PR. Filing them here for the maintainers in case it's useful:delete_crawl_request(server/src/handlers/crawl_handler.rs:132) doesn't extractDatasetAndOrgWithSubAndPlananddelete_crawl_queryrunsDELETE FROM crawl_requests WHERE id = crawl_idwith nodataset_idfilter (server/src/operators/crawl_operator.rs:415). Combined withAdminOnlyonly checking that the caller is admin somewhere, that means an admin for any dataset can delete crawls from any other dataset by passing theircrawl_id. The documentedTR-Datasetheader is ignored.update_crawl_request(server/src/handlers/crawl_handler.rs:91) takescrawl_idin the payload, butupdate_crawl_query(server/src/operators/crawl_operator.rs:354) loads the first crawl in the dataset (no filter oncrawl_id), then deletes byscrape_id == crawl_id(notid == crawl_id), then creates a new crawl. In the common case where the caller passes the value ofCrawlRequest.idascrawl_id(which is what the OpenAPI type implies), the delete-by-scrape_iddoesn't match anything and the "first crawl in dataset" load is unrelated to the id the caller actually wanted to update.Happy to follow up with a server-side PR for either of these if it's something the maintainers want; for now this PR sticks to changes that don't require a backend fix to be safe.