Introduction to GPU tracking

AdePT is designed as a Geant4 plugin that accelerates applications by offloading tracking to the GPU. To minimize changes in existing applications, AdePT invokes user code almost exactly as Geant4 does.

At the same time, GPU transport is inherently parallel, while Geant4 CPU tracking is inherently serial. This difference cannot be eliminated and has practical consequences for user actions and sensitive detectors.

This page explains the offloading model, the resulting behavior changes, and best practices for integrating user code.

Offloading of tracks to the GPU

Like G4HepEm, AdePT uses the virtual Geant4 G4VTrackingManager class to provide custom tracking for selected particle types.

This custom tracking is implemented in AdePTTrackingManager, an implementation of the virtual G4VTrackingManager. Once a G4VTrackingManager implementation is attached to a particle type, Geant4 delegates tracking of that particle type entirely to the custom manager. Currently, AdePTTrackingManager is attached to electrons, positrons, and gammas.

The flow graph is shown in Figure 1:

  1. Geant4 pops a particle from the stack.

  2. If it is an electron, positron, or gamma, the track is handed to AdePTTrackingManager; otherwise, it remains in standard Geant4 tracking.

  3. Inside AdePTTrackingManager, the track is passed either to CPU or GPU.

  4. CPU path: hand over to G4HepEmTrackingManager for efficient CPU tracking. The G4HepEmTrackingManager is itself an implementation of the G4VTrackingManager that provides an optimized transport of electrons, positrons, and gammas on CPU.

  5. GPU path: offload to AdePT GPU transport.

AdePT supports G4Region-based routing. You can run tracks on GPU in:

  • all regions,

  • all except selected regions, or

  • only selected regions.

Regions can be kept on CPU. For example, if a detector uses a parameterized shower and tracks only few particles in that region, it can be faster to keep that detector part on CPU, since that parameterized shower is not implemented on GPU yet.

The runtime controls for region-based offloading are documented in Specify the regions where the GPU is used.

Offloading of Geant4 tracks to AdePT GPU transport.

Figure 1 Offloading of electron, positron, and gamma tracks from Geant4 CPU tracking to AdePT GPU transport.

To avoid application-side code changes, AdePT aims to keep user-code callbacks consistent with Geant4 behavior (sensitive detectors, TrackingAction, SteppingAction, etc.). To do this, step information is copied back from GPU to CPU before invoking user code. The implications are described in User code in GPU-enabled simulations.

User code in GPU-enabled simulations

AdePT does not require user code to be ported to GPU: GPU step information is copied back to CPU, and CPU user code is invoked there.

However, because GPU tracking is parallel, this model introduces stricter rules.

Serial tracking in Geant4

In Geant4, tracks are processed serially, as shown in Figure 2.

The gray parent track is processed first:

  1. PreUserTrackingAction (TA) is called.

  2. The track advances step by step; SteppingAction (SA) is called at each step (including both G4UserSteppingAction and sensitive-detector calls).

  3. If secondaries are produced, they are pushed onto the stack.

  4. At end of track, PostUserTrackingAction and StackingAction (SkA) are called.

  5. SkA loops over secondaries and classifies which tracks to process next.

Serial tracking in Geant4

Figure 2 Sketch of the serial tracking sequence in Geant4.

Parallel tracking in AdePT

In AdePT, tracks are processed in parallel, as shown in Figure 3.

As soon as the gray parent track produces a secondary, that secondary is tracked immediately in the next step. In realistic jobs, many tracks can be active in parallel.

Because of that, a Geant4-style SkA stage is not suitable on GPU: the GPU should not wait for track end before deciding on secondaries. Instead, decisions must be made immediately in stepping logic on GPU.

Parallel tracking in AdePT

Figure 3 Sketch of the parallel tracking sequence in AdePT.

Consequences for CPU user code

To invoke the same user actions on CPU, AdePT copies GPU step information back to CPU. Since tracks are processed in parallel, steps can return interleaved across tracks. This has two key consequences:

  • The CPU no longer receives active G4Track objects, but reconstructed steps from GPU. These reflect the outcome of the step, and the real track may already have advanced or finished on GPU. Consequently, the track cannot be killed in UserSteppingAction or in a sensitive detector, because that would only affect the copied step on CPU, not the real track on GPU.

  • Steps arrive in order per track, but interleaved with steps from other tracks. Therefore, user actions must not cache per-track state in shared members (for example, m_initial_fourmomentum), because such state becomes invalid when tracks are interleaved. This information must be stored with the track itself, e.g. in G4VUserTrackInformation.

Summary of do’s and don’ts

Topic

Do

Don’t

Why

Access to tracking state

Use G4Step/G4Track data directly when possible.

Do not rely on G4SteppingManager or G4TrackingManager internals

With custom tracking and parallel execution, manager internals are not reliable integration points, as they are not used by the G4VTrackingManager-implementations

Per-track state in user code

Store per-track state on the track (e.g. G4VUserTrackInformation).

Do not cache mutable per-track state in shared user-action or SD members.

Steps from different tracks can be interleaved, so caches can be invalidated.

Track killing

Implement track-killing decisions on GPU.

Do not attempt to kill tracks from CPU stepping action or sensitive detector code.

CPU receives reconstructed step snapshots, not active GPU tracks.

Classification logic

Implement immediate classification in GPU stepping logic.

Do not rely on StackingAction for GPU classification.

GPU transport processes secondaries immediately; there is no GPU StackingAction stage.

Note: G4Trajectory is currently stored in G4TrackingManager, so it is not parallel-tracking compatible in this workflow. This requires a Geant4-side fix.