feat: refactor InfiniCore cpu runtime to InfiniRT by spike-zhu · Pull Request #8 · InfiniTensor/InfiniRT

spike-zhu · 2026-06-24T02:10:52Z

This PR changes the CPU runtime implementation in InfiniCore to reuse the CPU runtime interfaces that InfiniRT already provides. The corresponding InfiniCore changes are in InfiniTensor/InfiniCore#1342.

Screenshot of the single-operator tests:

spike-zhu · 2026-06-25T13:27:14Z

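@voltjia Jiacheng, could you check whether the overall approach for connecting InfiniCore to the InfiniRT CPU runtime is correct after these changes? I will polish the details afterwards. Thanks!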

voltjia · 2026-06-26T06:13:00Z

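I took a look and it is basically fine. One principle for this refactor: reuse the CUDA Runtime API interfaces wherever possible. In other words, some interfaces need to be checked against the CUDA Toolkit. For example, I could not find anything like GetDeviceResourceSnapshot (I may have missed it). For the interfaces the CUDA Toolkit does not have, we can put together a list and decide later whether they really need to be migrated into the new InfiniRT. Besides the interface names, the parameter lists also need to be checked. Nothing else looks problematic for now.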

spike-zhu · 2026-06-29T01:32:10Z

OK, I will also survey the CUDA Runtime API interfaces and list them.

spike-zhu · 2026-06-29T03:23:07Z

InfiniCore CPU Runtime API migration decision table

CUDA API source: cuda_runtime_api.h, CUDA Toolkit 12.9.

| API name in InfiniCore | Name in InfiniRT | Corresponding CUDA API | CUDA API prototype | Migrated to InfiniRT? |
| --- | --- | --- | --- | --- |
| `setDevice(int device_id)` | `infini::rt::SetDevice(Device device)` | `cudaSetDevice` | `cudaError_t cudaSetDevice(int device);` | Migrated |
| `getDevice(...)` / `infinirtGetDevice(...)` | `infini::rt::GetDevice(Device* device)` | `cudaGetDevice` | `cudaError_t cudaGetDevice(int *device);` | Migrated |
| `getDeviceCount(int *count)` | `infini::rt::GetDeviceCount(int* count, Device::Type type)` | `cudaGetDeviceCount` | `cudaError_t cudaGetDeviceCount(int *count);` | Migrated; InfiniRT adds `Device::Type` |
| `deviceSynchronize()` | `infini::rt::DeviceSynchronize()` | `cudaDeviceSynchronize` | `cudaError_t cudaDeviceSynchronize(void);` | Migrated |
| `mallocDevice(void **p_ptr, size_t size)` | `infini::rt::Malloc(void** ptr, std::size_t size)` | `cudaMalloc` | `cudaError_t cudaMalloc(void **devPtr, size_t size);` | Migrated |
| `freeDevice(void *ptr)` | `infini::rt::Free(void* ptr)` | `cudaFree` | `cudaError_t cudaFree(void *devPtr);` | Migrated |
| `memcpy(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind)` | `infini::rt::Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind)` | `cudaMemcpy` | `cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);` | Migrated |
| `memsetDevice(void *ptr, int value, size_t count)` | `infini::rt::Memset(void* ptr, int value, std::size_t count)` | `cudaMemset` | `cudaError_t cudaMemset(void *devPtr, int value, size_t count);` | Migrated |
| `mallocHost(void **p_ptr, size_t size)` | `infini::rt::MallocHost(void** ptr, std::size_t size)` | `cudaMallocHost` | `cudaError_t cudaMallocHost(void **ptr, size_t size);` | Migrated |
| `freeHost(void *ptr)` | `infini::rt::FreeHost(void* ptr)` | `cudaFreeHost` | `cudaError_t cudaFreeHost(void *ptr);` | Migrated |
| `memcpyAsync(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind, infinirtStream_t stream)` | `infini::rt::MemcpyAsync(void* dst, const void* src, std::size_t count, MemcpyKind kind, void* stream)` | `cudaMemcpyAsync` | `cudaError_t cudaMemcpyAsync(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream);` | Migrated |
| `memsetDeviceAsync(void *ptr, int value, size_t count, infinirtStream_t stream)` | `infini::rt::MemsetAsync(void* ptr, int value, std::size_t count, void* stream)` | `cudaMemsetAsync` | `cudaError_t cudaMemsetAsync(void *devPtr, int value, size_t count, cudaStream_t stream);` | Migrated |
| `mallocAsync(void **p_ptr, size_t size, infinirtStream_t stream)` | `infini::rt::MallocAsync(void** ptr, std::size_t size, void* stream)` | `cudaMallocAsync` | `cudaError_t cudaMallocAsync(void **devPtr, size_t size, cudaStream_t hStream);` | Migrated |
| `freeAsync(void *ptr, infinirtStream_t stream)` | `infini::rt::FreeAsync(void* ptr, void* stream)` | `cudaFreeAsync` | `cudaError_t cudaFreeAsync(void *devPtr, cudaStream_t hStream);` | Migrated |
| `streamCreate(infinirtStream_t *stream_ptr)` | `infini::rt::StreamCreate(void** stream)` | `cudaStreamCreate` | `cudaError_t cudaStreamCreate(cudaStream_t *pStream);` | Migrated |
| `streamDestroy(infinirtStream_t stream)` | `infini::rt::StreamDestroy(void* stream)` | `cudaStreamDestroy` | `cudaError_t cudaStreamDestroy(cudaStream_t stream);` | Migrated |
| `streamSynchronize(infinirtStream_t stream)` | `infini::rt::StreamSynchronize(void* stream)` | `cudaStreamSynchronize` | `cudaError_t cudaStreamSynchronize(cudaStream_t stream);` | Migrated |
| `streamWaitEvent(infinirtStream_t stream, infinirtEvent_t event)` | `infini::rt::StreamWaitEvent(void* stream, void* event)` | `cudaStreamWaitEvent` | `cudaError_t cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags);` | Migrated, but InfiniRT currently lacks the `flags` parameter |
| `eventCreate(infinirtEvent_t *event_ptr)` | `infini::rt::EventCreate(void** event)` | `cudaEventCreate` | `cudaError_t cudaEventCreate(cudaEvent_t *event);` | Migrated |
| `eventCreateWithFlags(infinirtEvent_t *event_ptr, uint32_t flags)` | `infini::rt::EventCreateWithFlags(void** event, uint32_t flags)` | `cudaEventCreateWithFlags` | `cudaError_t cudaEventCreateWithFlags(cudaEvent_t *event, unsigned int flags);` | Migrated |
| `eventRecord(infinirtEvent_t event, infinirtStream_t stream)` | `infini::rt::EventRecord(void* event, void* stream)` | `cudaEventRecord` | `cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream);` | Migrated |
| `eventQuery(infinirtEvent_t event, infinirtEventStatus_t *status_ptr)` | `infini::rt::EventQuery(void* event, int* status)` | `cudaEventQuery` | `cudaError_t cudaEventQuery(cudaEvent_t event);` | Migrated; InfiniRT reports complete/not-ready through an output parameter |
| `eventSynchronize(infinirtEvent_t event)` | `infini::rt::EventSynchronize(void* event)` | `cudaEventSynchronize` | `cudaError_t cudaEventSynchronize(cudaEvent_t event);` | Migrated |
| `eventDestroy(infinirtEvent_t event)` | `infini::rt::EventDestroy(void* event)` | `cudaEventDestroy` | `cudaError_t cudaEventDestroy(cudaEvent_t event);` | Migrated |
| `eventElapsedTime(float *ms_ptr, infinirtEvent_t start, infinirtEvent_t end)` | `infini::rt::EventElapsedTime(float* ms, void* start, void* end)` | `cudaEventElapsedTime` | `cudaError_t cudaEventElapsedTime(float *ms, cudaEvent_t start, cudaEvent_t end);` | Migrated |
| `getMemInfo(int device_id, size_t *free_bytes, size_t *total_bytes)` | `infini::rt::GetMemInfo(Device device, std::size_t* free_bytes, std::size_t* total_bytes)` | `cudaMemGetInfo` | `cudaError_t cudaMemGetInfo(size_t *free, size_t *total);` | Migrated; InfiniRT adds a `Device` parameter |
| `getDeviceResourceSnapshot(int device_id, infinirtDeviceResourceSnapshot_t *snapshot)` | None | No direct counterpart | CUDA Runtime has no `GetDeviceResourceSnapshot` or aggregated resource-snapshot interface | Not migrated; stays in InfiniCore |
| `streamBeginCapture(infinirtStream_t stream, infinirtStreamCaptureMode_t mode)` | Not migrated for CPU yet | `cudaStreamBeginCapture` | `cudaError_t cudaStreamBeginCapture(cudaStream_t stream, enum cudaStreamCaptureMode mode);` | Not migrated for CPU for now; remains unsupported |
| `streamEndCapture(infinirtStream_t stream, infinirtGraph_t *graph_ptr)` | Not migrated for CPU yet | `cudaStreamEndCapture` | `cudaError_t cudaStreamEndCapture(cudaStream_t stream, cudaGraph_t *pGraph);` | Not migrated for CPU for now; remains unsupported |
| `graphDestroy(infinirtGraph_t graph)` | Not migrated for CPU yet | `cudaGraphDestroy` | `cudaError_t cudaGraphDestroy(cudaGraph_t graph);` | Not migrated for CPU for now; remains unsupported |
| `graphInstantiate(...)` | Not migrated for CPU yet | `cudaGraphInstantiate` | `cudaError_t cudaGraphInstantiate(cudaGraphExec_t *pGraphExec, cudaGraph_t graph, unsigned long long flags);` | Not migrated for CPU for now; remains unsupported. The old InfiniCore parameters do not fully match the current CUDA prototype |
| `graphExecDestroy(infinirtGraphExec_t graph_exec)` | Not migrated for CPU yet | `cudaGraphExecDestroy` | `cudaError_t cudaGraphExecDestroy(cudaGraphExec_t graphExec);` | Not migrated for CPU for now; remains unsupported |
| `graphLuanch(infinirtGraphExec_t graph_exec, infinirtStream_t stream)` | Not migrated for CPU yet | `cudaGraphLaunch` | `cudaError_t cudaGraphLaunch(cudaGraphExec_t graphExec, cudaStream_t stream);` | Not migrated for CPU for now; remains unsupported |
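To make the memory rows of the table concrete, here is a minimal sketch of how a CPU backend might implement `Malloc`, `Free`, `Memcpy`, and `Memset` with the `int` return type used in this PR. The function names and parameter lists come from the table; the status-code constants, the `MemcpyKind` enumerators, and the function bodies are assumptions made for illustration, not the actual InfiniRT implementation. It needs C++17 for the nested namespace.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace infini::rt {

// Hypothetical status codes; the real values are not shown in this thread.
constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;
constexpr int kOutOfMemory = 2;

// Hypothetical enumerators, mirroring the shape of cudaMemcpyKind.
enum class MemcpyKind { kHostToHost, kHostToDevice, kDeviceToHost, kDeviceToDevice };

// On a CPU backend, "device" memory is ordinary host memory.
inline int Malloc(void** ptr, std::size_t size) {
  if (ptr == nullptr) return kInvalidValue;
  *ptr = std::malloc(size);
  // std::malloc(0) may legitimately return nullptr, so only treat a null
  // result as a failure when a non-zero size was requested.
  return (*ptr != nullptr || size == 0) ? kSuccess : kOutOfMemory;
}

inline int Free(void* ptr) {
  std::free(ptr);  // std::free(nullptr) is a no-op.
  return kSuccess;
}

// Every copy direction collapses to a host copy; `kind` is accepted only to
// keep the parameter list aligned with cudaMemcpy.
inline int Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind) {
  (void)kind;
  if (count == 0) return kSuccess;
  if (dst == nullptr || src == nullptr) return kInvalidValue;
  std::memcpy(dst, src, count);
  return kSuccess;
}

inline int Memset(void* ptr, int value, std::size_t count) {
  if (count == 0) return kSuccess;
  if (ptr == nullptr) return kInvalidValue;
  std::memset(ptr, value, count);
  return kSuccess;
}

}  // namespace infini::rt
```

Under these assumptions, an InfiniCore wrapper such as `mallocDevice` would only need to forward its arguments and translate the returned status.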

voltjia · 2026-06-30T07:32:38Z

-void Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind);
+int Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind);
+
+int GetMemInfo(Device device, std::size_t* free_bytes,


The corresponding CUDA API is cudaMemGetInfo, so this should be renamed to MemGetInfo. The original API also has no device parameter, so there should not be one here either.
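A sketch of what the renamed function could look like: `cudaMemGetInfo` reports memory for the current device, which is why no device argument is needed. Everything except the `MemGetInfo` name and the two output parameters is an assumption here: `current_device`, `QueryBackendMemory`, the status codes, and the placeholder byte counts are hypothetical.

```cpp
#include <cstddef>

namespace infini::rt {

// Hypothetical status codes.
constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;

// Hypothetical per-thread current device, as selected by a SetDevice call.
inline thread_local int current_device = 0;

// Hypothetical backend hook. A real CPU backend would ask the operating
// system; the numbers below are placeholders for illustration only.
inline void QueryBackendMemory(int device, std::size_t* free_bytes,
                               std::size_t* total_bytes) {
  (void)device;
  *total_bytes = std::size_t{1} << 30;
  *free_bytes = std::size_t{1} << 29;
}

// Same parameter list as cudaMemGetInfo: the device is implied by the
// current-device state instead of being passed in.
inline int MemGetInfo(std::size_t* free_bytes, std::size_t* total_bytes) {
  if (free_bytes == nullptr || total_bytes == nullptr) return kInvalidValue;
  QueryBackendMemory(current_device, free_bytes, total_bytes);
  return kSuccess;
}

}  // namespace infini::rt
```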

voltjia · 2026-06-30T07:35:07Z

+
+int StreamSynchronize(void* stream);
+
+int StreamWaitEvent(void* stream, void* event);


Ours seems to be missing the flags parameter. Add it for now; it is fine to leave it unused.
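One way to add the parameter without using it is sketched below. The body assumes a synchronous CPU backend in which all work recorded before the event has already finished by the time this call is made, so there is nothing to wait for; that assumption, the status codes, and the default argument are illustrative only.

```cpp
namespace infini::rt {

// Hypothetical status codes.
constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;

// `flags` is accepted so the parameter list matches cudaStreamWaitEvent, but
// it is ignored. The default of 0 keeps existing two-argument call sites
// compiling.
inline int StreamWaitEvent(void* stream, void* event, unsigned int flags = 0) {
  (void)stream;  // Assumed synchronous CPU stream: no queue to stall.
  (void)flags;   // Accepted for signature parity, currently unused.
  return event != nullptr ? kSuccess : kInvalidValue;
}

}  // namespace infini::rt
```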

voltjia · 2026-06-30T07:35:48Z

+
+int EventCreate(void** event);
+
+int EventCreateWithFlags(void** event, uint32_t flags);


Use std::uint32_t here.

voltjia · 2026-06-30T07:36:31Z

+
+int EventRecord(void* event, void* stream);
+
+int EventQuery(void* event, int* status);


This does not seem to match CUDA's parameter list. Why is that?
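For comparison, `cudaEventQuery` takes only the event and reports readiness through its return value (`cudaSuccess` when the event has completed, `cudaErrorNotReady` otherwise). A CUDA-aligned version could therefore drop the `status` output parameter, roughly as follows. `CpuEvent`, the status constants, and their numeric values are hypothetical.

```cpp
#include <atomic>

namespace infini::rt {

// Hypothetical status codes; kNotReady plays the role of cudaErrorNotReady.
constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;
constexpr int kNotReady = 2;

// Hypothetical CPU event: a flag that is set once the recorded work is done.
struct CpuEvent {
  std::atomic<bool> completed{false};
};

// Readiness is carried by the return value, so no output parameter is needed.
inline int EventQuery(void* event) {
  if (event == nullptr) return kInvalidValue;
  const auto* e = static_cast<const CpuEvent*>(event);
  return e->completed.load(std::memory_order_acquire) ? kSuccess : kNotReady;
}

}  // namespace infini::rt
```

A caller that previously inspected `*status` would instead compare the return value against the not-ready code.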

voltjia · 2026-06-30T07:38:12Z

+int MallocHost(void** ptr, std::size_t size);
+
+int FreeHost(void* ptr);
+
+int MemcpyAsync(void* dst, const void* src, std::size_t count, MemcpyKind kind,
+                void* stream);
+
+int MallocAsync(void** ptr, std::size_t size, void* stream);
+
+int FreeAsync(void* ptr, void* stream);
+
+int MemsetAsync(void* ptr, int value, std::size_t count, void* stream);


Move these up, right after the plain Memcpy group above.

voltjia · 2026-06-30T07:47:19Z

+int GetDevice(Device* device);

-void GetDeviceCount(int* count, Device::Type type);
+int GetDeviceCount(int* count, Device::Type type);


This function has an extra type parameter compared with CUDA's; it needs to be removed.

voltjia · 2026-06-30T08:26:11Z

+int SetDevice(Device device);

-void GetDevice(Device* device);
+int GetDevice(Device* device);


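Change the parameters of these two to int as well. From now on, the interfaces at this layer should align exactly with CUDA's; making them identical is all that is needed.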
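Taken together with the earlier comment on `GetDeviceCount`, the device-management functions would then take plain `int` ordinals, as in `cudaSetDevice`, `cudaGetDevice`, and `cudaGetDeviceCount`. The sketch below assumes a CPU backend that exposes a single device and tracks the current device per thread; the constants and `current_device` are hypothetical.

```cpp
namespace infini::rt {

// Hypothetical status codes.
constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;
constexpr int kInvalidDevice = 2;

// Assumption: the CPU backend exposes exactly one device.
constexpr int kCpuDeviceCount = 1;

// Hypothetical per-thread current device.
inline thread_local int current_device = 0;

inline int GetDeviceCount(int* count) {
  if (count == nullptr) return kInvalidValue;
  *count = kCpuDeviceCount;
  return kSuccess;
}

inline int SetDevice(int device) {
  if (device < 0 || device >= kCpuDeviceCount) return kInvalidDevice;
  current_device = device;
  return kSuccess;
}

inline int GetDevice(int* device) {
  if (device == nullptr) return kInvalidValue;
  *device = current_device;
  return kSuccess;
}

}  // namespace infini::rt
```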

voltjia · 2026-06-30T08:26:51Z

As we just discussed, let's put the CPU work on hold for now and prioritize the NVIDIA GPU side.

voltjia · 2026-06-30T08:44:20Z

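After discussion, the API at this layer needs to be fully aligned with the CUDA Runtime API going forward. "Fully aligned" means that once infinirt is replaced with cuda, the function signature (that is, the function name and parameter types) and the return type are exactly the same as the CUDA API's. For example:

In CUDA, cudaMemcpyAsync is declared as:

cudaError_t cudaMemcpyAsync(void* dst, const void* src, size_t count, cudaMemcpyKind kind, cudaStream_t stream);

So in InfiniRT it should be: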

infinirtError_t infinirtMemcpyAsync(void* dst, const void* src, size_t count, infinirtMemcpyKind kind, infinirtStream_t stream);
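One possible way to keep a declaration from drifting away from this shape is a compile-time check. The sketch below pins `infinirtMemcpyAsync` to the signature given above with a `static_assert`. The type definitions are hypothetical stand-ins so that the example compiles on its own; the real InfiniRT headers may define them differently.

```cpp
#include <stddef.h>

#include <type_traits>

// Hypothetical stand-ins for the InfiniRT types named in the declaration.
enum infinirtError_t { infinirtSuccess = 0 };
enum infinirtMemcpyKind { infinirtMemcpyHostToHost = 0 };
typedef struct infinirtStream_st* infinirtStream_t;  // Opaque handle.

// The declaration under test, copied from the example above.
infinirtError_t infinirtMemcpyAsync(void* dst, const void* src, size_t count,
                                    infinirtMemcpyKind kind,
                                    infinirtStream_t stream);

// The cudaMemcpyAsync signature with every cuda prefix renamed to infinirt.
using ExpectedMemcpyAsync = infinirtError_t (*)(void*, const void*, size_t,
                                                infinirtMemcpyKind,
                                                infinirtStream_t);

// True only if the return type and every parameter type match exactly.
constexpr bool kMemcpyAsyncAligned =
    std::is_same_v<decltype(&infinirtMemcpyAsync), ExpectedMemcpyAsync>;

static_assert(kMemcpyAsyncAligned,
              "infinirtMemcpyAsync must mirror cudaMemcpyAsync");
```

Because the function only appears inside `decltype`, no definition is required for the check to compile. Changing any parameter type, for example back to `void* stream`, would make the assertion fail at compile time.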

spike-zhu marked this pull request as draft June 24, 2026 02:11

spike-zhu force-pushed the feat/extract-infinicore-runtime branch from 2e80f6b to 866fc8d Compare June 25, 2026 13:15

spike-zhu mentioned this pull request Jun 25, 2026

issue/1311 - feat: refactor InfiniCore cpu runtime to InfiniRT InfiniTensor/InfiniCore#1342

Draft

spike-zhu self-assigned this Jun 25, 2026

spike-zhu requested a review from voltjia June 25, 2026 13:27

feat: refactor InfiniCore cpu runtime to InfiniRT

7fd37b5

spike-zhu force-pushed the feat/extract-infinicore-runtime branch from 866fc8d to 7fd37b5 Compare June 30, 2026 01:49

spike-zhu marked this pull request as ready for review June 30, 2026 01:51

voltjia requested changes Jun 30, 2026
