
fix(ts-sdk): correct crawl module bugs and add createCrawl wrapper #4031

Open
sushaan-k wants to merge 1 commit into devflowinc:main from sushaan-k:aarya/fix-crawl-sdk-bugs-and-add-crud

sushaan-k commented on May 18, 2026

The crawl module under clients/ts-sdk/src/functions/crawl/ was copy-pasted from the datasets module and never updated. As shipped on main, getCrawlsForDataset returns a value typed as Promise<Dataset> (a single dataset) instead of Promise<Array<CrawlRequest>>, its JSDoc claims it creates a dataset, and its error message says it needs a dataset ID "to create a crawl" even though it's a GET. There's also no SDK wrapper for POST /api/crawl.

This PR fixes the wrong types, JSDoc, and error message on getCrawlsForDataset, makes its props argument optional (both limit and page are already optional in the underlying type), and adds a wrapper for createCrawl that mirrors the patterns used in the other function modules.
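For readers who don't want to open the diff, here is a rough sketch of the shape the two wrappers end up with. It is not the code in this PR: the CrawlClient interface, the simplified CrawlRequest and payload types, and the error wording are all hypothetical stand-ins so that the snippet is self-contained.

```typescript
// Hypothetical, simplified stand-ins for the generated SDK types.
interface CrawlRequest {
  id: string;
  dataset_id: string;
  url: string;
}

interface GetCrawlRequestsReqPayload {
  limit?: number;
  page?: number;
}

interface CreateCrawlReqPayload {
  site_url: string; // assumed field, for illustration only
}

// Minimal client shape assumed for this sketch; the real SDK client differs.
interface CrawlClient {
  datasetId?: string;
  request<T>(
    method: "get" | "post",
    path: string,
    opts: { datasetId: string; query?: object; body?: unknown },
  ): Promise<T>;
}

// Lists the crawl requests for the client's dataset.
// props is optional because both limit and page are optional.
async function getCrawlsForDataset(
  client: CrawlClient,
  props: GetCrawlRequestsReqPayload = {},
): Promise<CrawlRequest[]> {
  if (!client.datasetId) {
    throw new Error("datasetId is required to get the crawls for a dataset");
  }
  return client.request<CrawlRequest[]>("get", "/api/crawl", {
    datasetId: client.datasetId,
    query: props,
  });
}

// Creates a crawl for the client's dataset via POST /api/crawl.
async function createCrawl(
  client: CrawlClient,
  props: CreateCrawlReqPayload,
): Promise<CrawlRequest> {
  if (!client.datasetId) {
    throw new Error("datasetId is required to create a crawl");
  }
  return client.request<CrawlRequest>("post", "/api/crawl", {
    datasetId: client.datasetId,
    body: props,
  });
}

// Stub client with canned data, so the sketch runs without a server.
const stored: CrawlRequest[] = [
  { id: "c1", dataset_id: "ds-1", url: "https://example.com" },
];
const stubClient: CrawlClient = {
  datasetId: "ds-1",
  async request<T>(method: "get" | "post"): Promise<T> {
    // GET returns the list, POST returns a single crawl request.
    return (method === "get" ? stored : stored[0]) as unknown as T;
  },
};
```

With the optional props argument, getCrawlsForDataset(stubClient) and getCrawlsForDataset(stubClient, { limit: 10 }) both type-check, and both resolve to an array rather than a single Dataset.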

It also adds clients/ts-sdk/src/functions/crawl/crawl.test.ts with a read-only smoke + return-type test for getCrawlsForDataset, modeled on the existing events.test.ts/analytics.test.ts pattern. Nothing in the test creates or mutates crawls, so it's safe against the shared sandbox dataset.

yarn build and yarn lint both pass.

What's intentionally not in this PR

The OpenAPI spec also exposes PUT /api/crawl and DELETE /api/crawl/{crawl_id}. I started by wrapping all four endpoints, but on closer reading of the backend the PUT and DELETE handlers look broken in ways that would mislead SDK callers, so I dropped those wrappers from this PR. Filing them here for the maintainers in case it's useful:

  • delete_crawl_request (server/src/handlers/crawl_handler.rs:132) doesn't extract DatasetAndOrgWithSubAndPlan, and delete_crawl_query runs DELETE FROM crawl_requests WHERE id = crawl_id with no dataset_id filter (server/src/operators/crawl_operator.rs:415). Combined with AdminOnly only checking that the caller is admin somewhere, that means an admin for any dataset can delete a crawl belonging to any other dataset by passing that crawl's crawl_id. The documented TR-Dataset header is ignored.
  • update_crawl_request (server/src/handlers/crawl_handler.rs:91) takes crawl_id in the payload, but update_crawl_query (server/src/operators/crawl_operator.rs:354) loads the first crawl in the dataset (no filter on crawl_id), then deletes by scrape_id == crawl_id (not id == crawl_id), then creates a new crawl. In the common case where the caller passes the value of CrawlRequest.id as crawl_id (which is what the OpenAPI type implies), the delete-by-scrape_id doesn't match anything and the "first crawl in dataset" load is unrelated to the id the caller actually wanted to update.
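To make the two issues above concrete, here is a toy in-memory model of the behavior as I read it. It is TypeScript rather than the server's Rust, it runs no SQL, and every name in it (CrawlRow, deleteCrawlUnscoped, deleteCrawlScoped, updateCrawlAsDescribed) is hypothetical. It only mirrors the filters described in the bullets and skips the step where the update handler creates a new crawl.

```typescript
// Toy row shape; the real crawl_requests table has more columns.
interface CrawlRow {
  id: string;
  scrapeId: string;
  datasetId: string;
}

// Models the delete as described: match on id only, no dataset filter.
function deleteCrawlUnscoped(rows: CrawlRow[], crawlId: string): CrawlRow[] {
  return rows.filter((r) => r.id !== crawlId);
}

// Models a scoped variant: the row must also belong to the caller's dataset.
function deleteCrawlScoped(
  rows: CrawlRow[],
  crawlId: string,
  callerDatasetId: string,
): CrawlRow[] {
  return rows.filter(
    (r) => !(r.id === crawlId && r.datasetId === callerDatasetId),
  );
}

// Models the update as described: load the first crawl in the dataset
// (crawlId is not used for the load), then delete where scrape_id == crawlId.
function updateCrawlAsDescribed(
  rows: CrawlRow[],
  datasetId: string,
  crawlId: string,
): { loaded: CrawlRow | undefined; remaining: CrawlRow[] } {
  const loaded = rows.find((r) => r.datasetId === datasetId);
  const remaining = rows.filter((r) => r.scrapeId !== crawlId);
  return { loaded, remaining };
}

const rows: CrawlRow[] = [
  { id: "crawl-a1", scrapeId: "scrape-1", datasetId: "dataset-a" },
  { id: "crawl-a2", scrapeId: "scrape-2", datasetId: "dataset-a" },
  { id: "crawl-b1", scrapeId: "scrape-3", datasetId: "dataset-b" },
];

// An admin of dataset-a passes the id of a crawl that lives in dataset-b.
const afterUnscoped = deleteCrawlUnscoped(rows, "crawl-b1");
const afterScoped = deleteCrawlScoped(rows, "crawl-b1", "dataset-a");
console.log(afterUnscoped.length, afterScoped.length); // prints "2 3"

// A caller in dataset-a tries to update crawl-a2 by passing its id.
const update = updateCrawlAsDescribed(rows, "dataset-a", "crawl-a2");
console.log(update.loaded?.id, update.remaining.length); // prints "crawl-a1 3"
```

In the model, the unscoped delete removes dataset-b's crawl while the scoped one leaves it alone, and the update loads crawl-a1 and deletes nothing even though the caller asked for crawl-a2.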

Happy to follow up with a server-side PR for either of these if it's something the maintainers want; for now this PR sticks to changes that don't require a backend fix to be safe.

Copilot AI review requested due to automatic review settings May 18, 2026 03:17
sushaan-k force-pushed the aarya/fix-crawl-sdk-bugs-and-add-crud branch from 2e5abe9 to 65dc588 on May 18, 2026 21:16
The crawl module had several copy-paste bugs from the datasets module:

- JSDoc on getCrawlsForDataset said "create a dataset"
- Return type was Promise<Dataset> instead of Promise<Array<CrawlRequest>>
- Error message referenced "create a crawl" on a GET operation

Also adds a wrapper for POST /api/crawl (createCrawl), which is missing from
the SDK even though it exists in the OpenAPI spec.

The PUT and DELETE crawl endpoints are intentionally not wrapped here. The
backend behavior of update_crawl_request and delete_crawl_request appears
to be broken in ways that would mislead SDK callers; see the PR description
for details.

Changes:

- Fix JSDoc, return type, and error message on getCrawlsForDataset
- Make the props argument optional (limit/page are both optional)
- Add createCrawl wrapper
- Add a basic crawl.test.ts that exercises getCrawlsForDataset
sushaan-k force-pushed the aarya/fix-crawl-sdk-bugs-and-add-crud branch from 65dc588 to 72dd6b9 on May 18, 2026 21:31
sushaan-k changed the title from "fix(ts-sdk): correct crawl module bugs and add missing CRUD wrappers" to "fix(ts-sdk): correct crawl module bugs and add createCrawl wrapper" on May 18, 2026