Latency/Energy Calculation #240
Replies: 1 comment
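Transformer Engine itself does not provide a built-in API to directly calculate end-to-end latency or energy consumption for Transformer models such as BERT or ViT. Its primary purpose is to accelerate training and inference using FP8/BF16 mixed precision on supported NVIDIA GPUs (such as the H100), rather than serving as a performance analysis framework.

Measuring Latency
The recommended approach is to benchmark your model using PyTorch with proper GPU synchronization to obtain accurate timings. CUDA kernels launch asynchronously, so without synchronization the timer would measure only the launch overhead.

    import time
    import torch

    # `model` is your own network, already moved to the GPU.
    model.eval()
    x = ...  # input batch on the same device as the model

    # Warm-up: lets CUDA initialization and kernel selection settle
    for _ in range(20):
        with torch.no_grad():
            _ = model(x)

    # Drain all queued GPU work before starting the clock
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        _ = model(x)
    # Wait for the forward pass to finish before stopping the clock
    torch.cuda.synchronize()
    end = time.perf_counter()

    print(f"Latency: {(end - start) * 1000:.3f} ms")

For reliable measurements, keep the warm-up iterations and both synchronization points shown in the snippet.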
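A single timed forward pass can be noisy, so one possible extension is to repeat the measurement and report summary statistics. The sketch below is only an illustration: the helper name `benchmark` and its parameters are hypothetical, not part of PyTorch or Transformer Engine, and it uses only the standard library so that the synchronization function is passed in by the caller.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], object],
              sync: Callable[[], None] = lambda: None,
              warmup: int = 20,
              iters: int = 100) -> Dict[str, float]:
    """Time `run` over `iters` iterations and return latency statistics in ms.

    `sync` should block until the device has finished all queued work
    (for CUDA, torch.cuda.synchronize); the default is a no-op for CPU code.
    """
    for _ in range(warmup):
        run()
    sync()  # drain warm-up work before the first timed iteration

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        sync()  # wait for asynchronous work before stopping the clock
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms) if iters > 1 else 0.0,
        "min_ms": min(samples_ms),
    }
```

With a GPU model this could be called as `benchmark(lambda: model(x), sync=torch.cuda.synchronize)` inside a `with torch.no_grad():` block. Reporting the median alongside the mean makes occasional slow iterations easier to spot.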

Measuring Energy Consumption
Transformer Engine does not expose energy measurements directly. Instead, read the GPU's power draw from NVIDIA's telemetry tools, such as nvidia-smi or the NVML library it is built on.
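As a rough sketch of how power telemetry could be collected from Python, the example below polls a power reader on a background thread while the workload runs. It assumes the `pynvml` bindings (the `nvidia-ml-py` package) are installed; `nvml_power_reader` and `PowerSampler` are hypothetical helper names introduced here for illustration. The sampler accepts any callable that returns watts, so it can be exercised without a GPU.

```python
import threading
import time
from typing import Callable, List, Tuple


def nvml_power_reader(gpu_index: int = 0) -> Callable[[], float]:
    """Return a function that reads GPU power in watts through NVML."""
    import pynvml  # assumption: provided by the nvidia-ml-py package

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    # nvmlDeviceGetPowerUsage reports milliwatts.
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0


class PowerSampler:
    """Record (timestamp_s, power_w) pairs while a `with` block runs."""

    def __init__(self, read_power_w: Callable[[], float],
                 interval_s: float = 0.05) -> None:
        self._read = read_power_w
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.samples: List[Tuple[float, float]] = []

    def _record(self) -> None:
        self.samples.append((time.perf_counter(), self._read()))

    def _loop(self) -> None:
        # Event.wait returns False on timeout, True once stop is requested.
        while not self._stop.wait(self._interval):
            self._record()

    def __enter__(self) -> "PowerSampler":
        self._record()  # sample at the start of the window
        self._thread.start()
        return self

    def __exit__(self, *exc) -> bool:
        self._stop.set()
        self._thread.join()
        self._record()  # sample at the end of the window
        return False
```

On a machine with an NVIDIA GPU, usage might look like `with PowerSampler(nvml_power_reader(0)) as sampler: ...` wrapped around the timed inference loop. A single forward pass is usually too short to yield more than a couple of samples, so it helps to loop the inference many times inside the block.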
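Energy is computed by integrating power over time:

\[ E = \int_{t_0}^{t_1} P(t)\,dt \]

If the power draw is approximately constant, you can estimate it as:

\[ E \approx P_{\text{avg}} \times (t_1 - t_0) \]

Typical Workflow on H100
A common evaluation pipeline follows from the two measurements above: warm up the model, time a synchronized inference run while sampling GPU power, then integrate the power samples over the run to obtain energy.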
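The integration step can be sketched in a few lines. The example below applies the trapezoidal rule to a list of (time in seconds, power in watts) samples and also shows the constant-power estimate; the function names and the sample values are made up for illustration and are not measurements.

```python
from typing import Sequence, Tuple


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal integral of power (W) over time (s), giving joules."""
    if len(samples) < 2:
        raise ValueError("need at least two (time, power) samples")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_constant_power(avg_power_w: float, duration_s: float) -> float:
    """Estimate energy as average power times elapsed time."""
    return avg_power_w * duration_s


# Illustrative samples only: three readings over one second.
samples = [(0.0, 300.0), (0.5, 320.0), (1.0, 310.0)]
print(energy_joules(samples))             # 312.5
print(energy_constant_power(300.0, 2.0))  # 600.0
```

Dividing the total by the number of inferences executed during the sampled window gives an energy-per-inference figure, which is easier to compare across batch sizes or precisions.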
This is a common methodology in work that evaluates Transformer Engine and H100 performance.

If this answer helped or pointed you in the right direction, please consider marking it as the accepted answer so that others with the same issue can find it more easily. If you found it useful, you are also welcome to look at my GitHub profile: https://github.com/Advait251206
Is there a method to calculate performance metrics (latency, energy) of a Transformer model (BERT, ViT) on the H100 Transformer Engine using PyTorch?