
feat: refactor InfiniCore cpu runtime to InfiniRT#8

Open
spike-zhu wants to merge 1 commit into master from feat/extract-infinicore-runtime

Conversation

spike-zhu commented Jun 24, 2026

Contributor

This PR changes the CPU runtime implementation in InfiniCore to reuse the CPU runtime interfaces that InfiniRT already provides. The corresponding InfiniCore changes are in InfiniTensor/InfiniCore#1342.

Single-operator test screenshots:
image
image

spike-zhu commented Jun 25, 2026

Contributor Author

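@voltjia Jiacheng, could you check whether the overall approach for connecting InfiniCore to the InfiniRT CPU runtime is correct after these changes? I will refine the details afterwards. Thanks!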

spike-zhu requested a review from voltjia June 25, 2026 13:27

voltjia commented Jun 26, 2026

Collaborator


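I took a look and it basically looks fine. One principle for this refactor: reuse the CUDA Runtime API interfaces wherever possible. In other words, for some interfaces we need to check whether they exist in the CUDA Toolkit. For example, I could not find anything like GetDeviceResourceSnapshot (though I may have missed it). For the interfaces that are not in the CUDA Toolkit, we can put together a table and then decide whether they really need to be migrated into the new InfiniRT. Besides the interface names, the parameter lists also need to be checked. Nothing else stands out for now.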

@spike-zhu

Contributor Author


OK, I will also survey the CUDA Runtime API interfaces and list them out.

spike-zhu commented Jun 29, 2026

Contributor Author

InfiniCore CPU Runtime API migration decision table

CUDA API source: cuda_runtime_api.h, CUDA Toolkit 12.9.

| InfiniCore API | InfiniRT name | Corresponding CUDA API | CUDA API prototype | Migrated to InfiniRT? |
| --- | --- | --- | --- | --- |
| `setDevice(int device_id)` | `infini::rt::SetDevice(Device device)` | `cudaSetDevice` | `cudaError_t cudaSetDevice(int device);` | Migrated |
| `getDevice(...)` / `infinirtGetDevice(...)` | `infini::rt::GetDevice(Device* device)` | `cudaGetDevice` | `cudaError_t cudaGetDevice(int *device);` | Migrated |
| `getDeviceCount(int *count)` | `infini::rt::GetDeviceCount(int* count, Device::Type type)` | `cudaGetDeviceCount` | `cudaError_t cudaGetDeviceCount(int *count);` | Migrated; InfiniRT adds a `Device::Type` parameter |
| `deviceSynchronize()` | `infini::rt::DeviceSynchronize()` | `cudaDeviceSynchronize` | `cudaError_t cudaDeviceSynchronize(void);` | Migrated |
| `mallocDevice(void **p_ptr, size_t size)` | `infini::rt::Malloc(void** ptr, std::size_t size)` | `cudaMalloc` | `cudaError_t cudaMalloc(void **devPtr, size_t size);` | Migrated |
| `freeDevice(void *ptr)` | `infini::rt::Free(void* ptr)` | `cudaFree` | `cudaError_t cudaFree(void *devPtr);` | Migrated |
| `memcpy(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind)` | `infini::rt::Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind)` | `cudaMemcpy` | `cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);` | Migrated |
| `memsetDevice(void *ptr, int value, size_t count)` | `infini::rt::Memset(void* ptr, int value, std::size_t count)` | `cudaMemset` | `cudaError_t cudaMemset(void *devPtr, int value, size_t count);` | Migrated |
| `mallocHost(void **p_ptr, size_t size)` | `infini::rt::MallocHost(void** ptr, std::size_t size)` | `cudaMallocHost` | `cudaError_t cudaMallocHost(void **ptr, size_t size);` | Migrated |
| `freeHost(void *ptr)` | `infini::rt::FreeHost(void* ptr)` | `cudaFreeHost` | `cudaError_t cudaFreeHost(void *ptr);` | Migrated |
| `memcpyAsync(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind, infinirtStream_t stream)` | `infini::rt::MemcpyAsync(void* dst, const void* src, std::size_t count, MemcpyKind kind, void* stream)` | `cudaMemcpyAsync` | `cudaError_t cudaMemcpyAsync(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream);` | Migrated |
| `memsetDeviceAsync(void *ptr, int value, size_t count, infinirtStream_t stream)` | `infini::rt::MemsetAsync(void* ptr, int value, std::size_t count, void* stream)` | `cudaMemsetAsync` | `cudaError_t cudaMemsetAsync(void *devPtr, int value, size_t count, cudaStream_t stream);` | Migrated |
| `mallocAsync(void **p_ptr, size_t size, infinirtStream_t stream)` | `infini::rt::MallocAsync(void** ptr, std::size_t size, void* stream)` | `cudaMallocAsync` | `cudaError_t cudaMallocAsync(void **devPtr, size_t size, cudaStream_t hStream);` | Migrated |
| `freeAsync(void *ptr, infinirtStream_t stream)` | `infini::rt::FreeAsync(void* ptr, void* stream)` | `cudaFreeAsync` | `cudaError_t cudaFreeAsync(void *devPtr, cudaStream_t hStream);` | Migrated |
| `streamCreate(infinirtStream_t *stream_ptr)` | `infini::rt::StreamCreate(void** stream)` | `cudaStreamCreate` | `cudaError_t cudaStreamCreate(cudaStream_t *pStream);` | Migrated |
| `streamDestroy(infinirtStream_t stream)` | `infini::rt::StreamDestroy(void* stream)` | `cudaStreamDestroy` | `cudaError_t cudaStreamDestroy(cudaStream_t stream);` | Migrated |
| `streamSynchronize(infinirtStream_t stream)` | `infini::rt::StreamSynchronize(void* stream)` | `cudaStreamSynchronize` | `cudaError_t cudaStreamSynchronize(cudaStream_t stream);` | Migrated |
| `streamWaitEvent(infinirtStream_t stream, infinirtEvent_t event)` | `infini::rt::StreamWaitEvent(void* stream, void* event)` | `cudaStreamWaitEvent` | `cudaError_t cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags);` | Migrated, but InfiniRT currently lacks the `flags` parameter |
| `eventCreate(infinirtEvent_t *event_ptr)` | `infini::rt::EventCreate(void** event)` | `cudaEventCreate` | `cudaError_t cudaEventCreate(cudaEvent_t *event);` | Migrated |
| `eventCreateWithFlags(infinirtEvent_t *event_ptr, uint32_t flags)` | `infini::rt::EventCreateWithFlags(void** event, uint32_t flags)` | `cudaEventCreateWithFlags` | `cudaError_t cudaEventCreateWithFlags(cudaEvent_t *event, unsigned int flags);` | Migrated |
| `eventRecord(infinirtEvent_t event, infinirtStream_t stream)` | `infini::rt::EventRecord(void* event, void* stream)` | `cudaEventRecord` | `cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream);` | Migrated |
| `eventQuery(infinirtEvent_t event, infinirtEventStatus_t *status_ptr)` | `infini::rt::EventQuery(void* event, int* status)` | `cudaEventQuery` | `cudaError_t cudaEventQuery(cudaEvent_t event);` | Migrated; InfiniRT reports complete/not-ready through an output parameter |
| `eventSynchronize(infinirtEvent_t event)` | `infini::rt::EventSynchronize(void* event)` | `cudaEventSynchronize` | `cudaError_t cudaEventSynchronize(cudaEvent_t event);` | Migrated |
| `eventDestroy(infinirtEvent_t event)` | `infini::rt::EventDestroy(void* event)` | `cudaEventDestroy` | `cudaError_t cudaEventDestroy(cudaEvent_t event);` | Migrated |
| `eventElapsedTime(float *ms_ptr, infinirtEvent_t start, infinirtEvent_t end)` | `infini::rt::EventElapsedTime(float* ms, void* start, void* end)` | `cudaEventElapsedTime` | `cudaError_t cudaEventElapsedTime(float *ms, cudaEvent_t start, cudaEvent_t end);` | Migrated |
| `getMemInfo(int device_id, size_t *free_bytes, size_t *total_bytes)` | `infini::rt::GetMemInfo(Device device, std::size_t* free_bytes, std::size_t* total_bytes)` | `cudaMemGetInfo` | `cudaError_t cudaMemGetInfo(size_t *free, size_t *total);` | Migrated; InfiniRT adds a `Device` parameter |
| `getDeviceResourceSnapshot(int device_id, infinirtDeviceResourceSnapshot_t *snapshot)` | | No direct counterpart | CUDA Runtime has no `GetDeviceResourceSnapshot` or aggregated resource-snapshot interface | Not migrated; stays in InfiniCore |
| `streamBeginCapture(infinirtStream_t stream, infinirtStreamCaptureMode_t mode)` | Not migrated for CPU yet | `cudaStreamBeginCapture` | `cudaError_t cudaStreamBeginCapture(cudaStream_t stream, enum cudaStreamCaptureMode mode);` | Not migrated for CPU for now; remains unsupported |
| `streamEndCapture(infinirtStream_t stream, infinirtGraph_t *graph_ptr)` | Not migrated for CPU yet | `cudaStreamEndCapture` | `cudaError_t cudaStreamEndCapture(cudaStream_t stream, cudaGraph_t *pGraph);` | Not migrated for CPU for now; remains unsupported |
| `graphDestroy(infinirtGraph_t graph)` | Not migrated for CPU yet | `cudaGraphDestroy` | `cudaError_t cudaGraphDestroy(cudaGraph_t graph);` | Not migrated for CPU for now; remains unsupported |
| `graphInstantiate(...)` | Not migrated for CPU yet | `cudaGraphInstantiate` | `cudaError_t cudaGraphInstantiate(cudaGraphExec_t *pGraphExec, cudaGraph_t graph, unsigned long long flags);` | Not migrated for CPU for now; remains unsupported. The legacy InfiniCore parameters do not fully match the newer CUDA prototype |
| `graphExecDestroy(infinirtGraphExec_t graph_exec)` | Not migrated for CPU yet | No exactly same-named CUDA Runtime API | The usual counterpart is `cudaGraphExecDestroy(cudaGraphExec_t graphExec)`; confirm against the CUDA Toolkit version | Not migrated for CPU for now; remains unsupported |
| `graphLuanch(infinirtGraphExec_t graph_exec, infinirtStream_t stream)` | Not migrated for CPU yet | `cudaGraphLaunch` | `cudaError_t cudaGraphLaunch(cudaGraphExec_t graphExec, cudaStream_t stream);` | Not migrated for CPU for now; remains unsupported |
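
As a rough illustration of what the "Migrated" rows amount to, the sketch below forwards three of the legacy InfiniCore names to InfiniRT-style functions. Everything here is a stand-in written for this example: the `infini::rt` bodies are not the real InfiniRT implementation, the `infinicore_cpu` namespace and `Demo` function are hypothetical, and the "0 means success" return convention is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the InfiniRT CPU runtime (not the real library).
// On a CPU backend, "device" memory is assumed to be ordinary host memory.
namespace infini {
namespace rt {

int Malloc(void** ptr, std::size_t size) {
  if (ptr == nullptr) return 1;  // assumed convention: non-zero means failure
  *ptr = std::malloc(size);
  return (*ptr != nullptr || size == 0) ? 0 : 1;
}

int Free(void* ptr) {
  std::free(ptr);  // std::free(nullptr) is a no-op
  return 0;
}

int Memset(void* ptr, int value, std::size_t count) {
  if (ptr == nullptr && count != 0) return 1;
  if (count != 0) std::memset(ptr, value, count);
  return 0;
}

}  // namespace rt
}  // namespace infini

// Hypothetical InfiniCore-side wrappers that keep the legacy names from the
// table and simply forward to the runtime above.
namespace infinicore_cpu {

int mallocDevice(void** p_ptr, std::size_t size) {
  return infini::rt::Malloc(p_ptr, size);
}

int freeDevice(void* ptr) { return infini::rt::Free(ptr); }

int memsetDevice(void* ptr, int value, std::size_t count) {
  return infini::rt::Memset(ptr, value, count);
}

// Usage: allocate, fill, and release a buffer through the legacy names.
void Demo() {
  void* buf = nullptr;
  int rc = mallocDevice(&buf, 8);
  assert(rc == 0);
  rc = memsetDevice(buf, 0, 8);
  assert(rc == 0);
  rc = freeDevice(buf);
  assert(rc == 0);
  (void)rc;
}

}  // namespace infinicore_cpu
```

Keeping the wrappers this thin means a later rename on the InfiniRT side only touches the forwarding layer.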

spike-zhu force-pushed the feat/extract-infinicore-runtime branch from 866fc8d to 7fd37b5 June 30, 2026 01:49
spike-zhu marked this pull request as ready for review June 30, 2026 01:51
Comment thread src/runtime.h
-void Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind);
+int Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind);

int GetMemInfo(Device device, std::size_t* free_bytes,

Collaborator

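The corresponding CUDA API here is cudaMemGetInfo, so this should be renamed to MemGetInfo. The original API also has no device parameter, so there should not be one here either.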
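
One way to read this request: without a device argument, the query has to target whichever device the calling thread last selected, which is how `cudaSetDevice` and `cudaMemGetInfo` interact. The sketch below illustrates that shape only. The `sketch` namespace, the error codes, and the per-device byte counts are all placeholders invented for the example, not InfiniRT code.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;  // hypothetical error code

// Placeholder per-device numbers; real code would query the OS or a driver.
constexpr int kDeviceCount = 2;
constexpr std::size_t kTotalBytes[kDeviceCount] = {1024, 4096};
constexpr std::size_t kFreeBytes[kDeviceCount] = {512, 1000};

// Current device of the calling thread, analogous to what cudaSetDevice sets.
thread_local int current_device = 0;

int SetDevice(int device) {
  if (device < 0 || device >= kDeviceCount) return kInvalidValue;
  current_device = device;
  return kSuccess;
}

// Same parameter list as cudaMemGetInfo: no device argument. The device is
// implied by the most recent successful SetDevice call on this thread.
int MemGetInfo(std::size_t* free_bytes, std::size_t* total_bytes) {
  if (free_bytes == nullptr || total_bytes == nullptr) return kInvalidValue;
  assert(current_device >= 0 && current_device < kDeviceCount);
  *free_bytes = kFreeBytes[current_device];
  *total_bytes = kTotalBytes[current_device];
  return kSuccess;
}

}  // namespace sketch
```

A caller that previously passed a device would call `SetDevice` first and then `MemGetInfo`.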

Contributor Author

Fixed.

Comment thread src/runtime.h

int StreamSynchronize(void* stream);

int StreamWaitEvent(void* stream, void* event);

Collaborator

Ours seems to be missing the flags parameter. Add it for now; it is fine to leave it unused.
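
A minimal sketch of "add it but do not use it", assuming a CPU backend where stream work runs inline so waiting on an event has nothing to enqueue. The `sketch` namespace, the error codes, and `Demo` are hypothetical. The `= 0` default mirrors the default that the CUDA C++ declaration of `cudaStreamWaitEvent` gives its `flags` argument, so existing two-argument callers would keep compiling.

```cpp
#include <cassert>

namespace sketch {

constexpr int kSuccess = 0;
constexpr int kInvalidValue = 1;  // hypothetical error code

// flags is accepted only for parity with cudaStreamWaitEvent and is ignored.
int StreamWaitEvent(void* stream, void* event, unsigned int flags = 0) {
  (void)stream;  // assumption: the CPU stream executes inline, nothing to queue
  (void)flags;   // deliberately unused for now
  if (event == nullptr) return kInvalidValue;
  return kSuccess;
}

// Both call shapes compile: the legacy two-argument form and the CUDA-style one.
void Demo() {
  int event_storage = 0;  // stands in for a real event object
  int rc = StreamWaitEvent(nullptr, &event_storage);
  assert(rc == kSuccess);
  rc = StreamWaitEvent(nullptr, &event_storage, 0u);
  assert(rc == kSuccess);
  (void)rc;
}

}  // namespace sketch
```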

Comment thread src/runtime.h

int EventCreate(void** event);

int EventCreateWithFlags(void** event, uint32_t flags);

Collaborator

This should be std::uint32_t.

Comment thread src/runtime.h

int EventRecord(void* event, void* stream);

int EventQuery(void* event, int* status);

Collaborator

This does not seem to match CUDA's parameter list. Why is that?
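
For context on the mismatch: `cudaEventQuery` takes only the event and reports completion through its return value (`cudaSuccess` when complete, `cudaErrorNotReady` otherwise), whereas the declaration above adds an `int* status` output. The sketch below shows the CUDA-style shape next to a thin adapter for the output-parameter style. The `Error` enum values, the `Event` struct, and `EventQueryLegacy` are hypothetical names made up for illustration.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Hypothetical error codes; the numeric values are not CUDA's.
enum Error { kSuccess = 0, kErrorNotReady = 1, kErrorInvalidValue = 2 };

// Hypothetical CPU event: a flag flipped once the recorded work has finished.
struct Event {
  std::atomic<bool> done{false};
};

// CUDA-style shape: completion is carried by the return value.
Error EventQuery(void* event) {
  if (event == nullptr) return kErrorInvalidValue;
  Event* e = static_cast<Event*>(event);
  return e->done.load(std::memory_order_acquire) ? kSuccess : kErrorNotReady;
}

// Adapter for callers that still expect an output parameter.
// Assumed encoding: *status is 0 when complete and 1 when not ready.
int EventQueryLegacy(void* event, int* status) {
  if (status == nullptr) return 1;
  const Error err = EventQuery(event);
  if (err == kErrorInvalidValue) return 1;
  assert(err == kSuccess || err == kErrorNotReady);
  *status = (err == kSuccess) ? 0 : 1;
  return 0;
}

}  // namespace sketch
```

With this split, the runtime layer can match CUDA exactly while InfiniCore keeps its status-based interface in the adapter.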

Comment thread src/runtime.h
Comment on lines +103 to +114
int MallocHost(void** ptr, std::size_t size);

int FreeHost(void* ptr);

int MemcpyAsync(void* dst, const void* src, std::size_t count, MemcpyKind kind,
                void* stream);

int MallocAsync(void** ptr, std::size_t size, void* stream);

int FreeAsync(void* ptr, void* stream);

int MemsetAsync(void* ptr, int value, std::size_t count, void* stream);

Collaborator

Move these up, right after the plain Memcpy group above.

Comment thread src/runtime.h
int GetDevice(Device* device);

-void GetDeviceCount(int* count, Device::Type type);
+int GetDeviceCount(int* count, Device::Type type);

Collaborator

This function has an extra type parameter compared with CUDA's; it needs to be removed.

Comment thread src/runtime.h
Comment on lines +62 to +64
int SetDevice(Device device);

-void GetDevice(Device* device);
+int GetDevice(Device* device);

Collaborator

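Change the parameters of these two to int as well. From now on, the interfaces at this layer should match CUDA's completely; making them identical is all that is needed.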

Comment thread src/native/cpu/runtime_.h

Collaborator

As we just discussed, let's put the CPU work on hold for now and prioritize NVIDIA GPUs.

voltjia commented Jun 30, 2026

Collaborator

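After discussion: going forward, the API at this layer needs to be fully aligned with the CUDA Runtime API. "Fully aligned" means that once infinirt is replaced with cuda, the function signature (that is, the function name and parameter types) and the return type are identical to those of the CUDA API. For example:

The declaration of cudaMemcpyAsync in CUDA is: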

cudaError_t cudaMemcpyAsync(void* dst, const void* src, size_t count, cudaMemcpyKind kind, cudaStream_t stream);

So in InfiniRT it should be:

infinirtError_t infinirtMemcpyAsync(void* dst, const void* src, size_t count, infinirtMemcpyKind kind, infinirtStream_t stream);
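
One possible way to keep a declaration from drifting away from this shape is a compile-time check next to it. The sketch below is only an illustration: the enum values, the opaque `infinirtStream_st` handle, the `ExpectedMemcpyAsync` alias, and the synchronous CPU body are assumptions made for the example, not definitions taken from InfiniRT. The enumerators mirror the names and order of `cudaMemcpyKind`.

```cpp
#include <cassert>
#include <cstring>
#include <stddef.h>
#include <type_traits>

// Hypothetical stand-ins; the real definitions would live in InfiniRT headers.
enum infinirtError_t { infinirtSuccess = 0, infinirtErrorInvalidValue = 1 };
enum infinirtMemcpyKind {
  infinirtMemcpyHostToHost = 0,
  infinirtMemcpyHostToDevice = 1,
  infinirtMemcpyDeviceToHost = 2,
  infinirtMemcpyDeviceToDevice = 3,
  infinirtMemcpyDefault = 4
};
typedef struct infinirtStream_st* infinirtStream_t;  // opaque stream handle

infinirtError_t infinirtMemcpyAsync(void* dst, const void* src, size_t count,
                                    infinirtMemcpyKind kind,
                                    infinirtStream_t stream);

// The cudaMemcpyAsync shape with cuda replaced by infinirt. If someone later
// changes a parameter or the return type, compilation fails here.
using ExpectedMemcpyAsync = infinirtError_t(void*, const void*, size_t,
                                            infinirtMemcpyKind,
                                            infinirtStream_t);
static_assert(
    std::is_same<decltype(infinirtMemcpyAsync), ExpectedMemcpyAsync>::value,
    "infinirtMemcpyAsync no longer matches the cudaMemcpyAsync shape");

// Assumed CPU behavior: the copy completes synchronously and the stream is
// ignored.
infinirtError_t infinirtMemcpyAsync(void* dst, const void* src, size_t count,
                                    infinirtMemcpyKind kind,
                                    infinirtStream_t stream) {
  assert(kind >= infinirtMemcpyHostToHost && kind <= infinirtMemcpyDefault);
  (void)kind;
  (void)stream;
  if (count != 0 && (dst == nullptr || src == nullptr)) {
    return infinirtErrorInvalidValue;
  }
  if (count != 0) std::memcpy(dst, src, count);
  return infinirtSuccess;
}
```

The check only pins the InfiniRT side; comparing against the real CUDA declaration would still need a manual or scripted review of cuda_runtime_api.h.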
