Is it possible to synchronize two CUDA streams without blocking the host? I know there's
In case event has been recorded but has not yet been completed when
cudaEventDestroy() is called, the function will return immediately and
the resources associated with event will be released automatically once
the device has completed event.
You're on the right track by using cudaStreamWaitEvent. Creating events does carry some cost, but you can create them once at application start-up so that the creation cost is not paid inside your GPU routines.
An event is recorded when you put it into a stream with cudaEventRecord. It is completed once all activity that was put into that stream before the event has completed. Recording the event basically puts a marker into the stream, and that marker is what enables cudaStreamWaitEvent to stop forward progress on the waiting stream until the event has completed.
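
To make the record/wait pattern concrete, here is a minimal sketch. The kernels `produce` and `consume`, the stream names, and the helper `run_two_stream_demo` are hypothetical names made up for illustration; only the `cuda*` runtime calls are real API. The event is created up front with timing disabled, recorded in one stream, and waited on by the other, so the host never blocks between the two kernels.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Bail out of the enclosing function with -1 on any CUDA error (sketch-level handling).
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Hypothetical producer kernel: fills buf with 0, 1, 2, ...
__global__ void produce(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical consumer kernel: reads the producer's output and doubles it.
__global__ void consume(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;
}

// Returns the number of mismatching elements (0 on success), or -1 on a CUDA error.
int run_two_stream_demo(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Start-up phase: create streams and the event once, outside the hot path.
    cudaStream_t producer, consumer;
    cudaEvent_t produced;
    CHECK(cudaStreamCreate(&producer));
    CHECK(cudaStreamCreate(&consumer));
    // Timing is not needed for synchronization, so disable it.
    CHECK(cudaEventCreateWithFlags(&produced, cudaEventDisableTiming));

    int *d_in = nullptr, *d_out = nullptr;
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_out, bytes));

    // Enqueue work in the producer stream, then drop a marker behind it.
    produce<<<blocks, threads, 0, producer>>>(d_in, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(produced, producer));

    // The consumer stream waits on the device for the marker; the host returns immediately.
    CHECK(cudaStreamWaitEvent(consumer, produced, 0));
    consume<<<blocks, threads, 0, consumer>>>(d_in, d_out, n);
    CHECK(cudaGetLastError());

    // Copy the result back in the consumer stream; block the host only here, to read it.
    std::vector<int> out(n);
    CHECK(cudaMemcpyAsync(out.data(), d_out, bytes, cudaMemcpyDeviceToHost, consumer));
    CHECK(cudaStreamSynchronize(consumer));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != 2 * i) ++mismatches;
    }

    // Tear-down phase.
    CHECK(cudaEventDestroy(produced));
    CHECK(cudaStreamDestroy(producer));
    CHECK(cudaStreamDestroy(consumer));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return mismatches;
}

int main() {
    int result = run_two_stream_demo(1 << 16);
    std::printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

The only host-side wait is the final cudaStreamSynchronize, which is there just so the host can check the output. In a real application the event and streams would typically live for the lifetime of the program and be reused across iterations rather than created and destroyed per call.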