I've been looking into writing applications that use OpenGL to render data on-screen, and one thing constantly comes up: copying data to the GPU is slow.
I am currently switching between the OpenGL SuperBible 7th Edition and various online tutorials, and I have not found anything that says when data is actually sent to the GPU; I only have guesses.
glCreateVertexArrays has nothing to do with buffer objects or GPU memory (of that kind), so it's irrelevant here.
As for the rest, when actual GPU memory gets allocated is up to the OpenGL implementation. It can defer the allocation for as long as it wants.
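As a rough illustration of deferred allocation, here is a toy model in C. It is not OpenGL and not how any particular driver is written; ToyLazyBuffer, lazy_create and lazy_storage are hypothetical names. Creating the buffer records only its size, and memory is obtained the first time something needs it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only, NOT real OpenGL: a hypothetical buffer whose creation
   records the requested size but postpones obtaining any memory. */
typedef struct {
    size_t size;
    unsigned char *storage;  /* NULL until something actually needs it */
} ToyLazyBuffer;

static void lazy_create(ToyLazyBuffer *b, size_t size) {
    assert(size > 0);
    b->size = size;
    b->storage = NULL;
}

/* Memory is allocated on first use (e.g. the first upload or draw). */
static unsigned char *lazy_storage(ToyLazyBuffer *b) {
    if (b->storage == NULL) b->storage = calloc(1, b->size);
    return b->storage;
}

/* Returns 1 if nothing was allocated at creation but memory exists after use. */
int demo_lazy_allocation(void) {
    ToyLazyBuffer b;
    lazy_create(&b, 1024);
    int deferred = (b.storage == NULL);
    int allocated = (lazy_storage(&b) != NULL);
    free(b.storage);
    return deferred && allocated;
}
```

From the application's side both moments look the same, which is why the allocation point is invisible to you.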
If you're asking about when your data is uploaded to OpenGL: OpenGL is always finished with any pointer you pass it by the time that function call returns. So the implementation will either copy the data to GPU-accessible memory within the call itself, or allocate some CPU memory, copy your data into that, and schedule the transfer to the actual GPU storage for later.
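The copy-then-transfer path can be sketched as a toy model in C, assuming a "driver" is just a couple of structs and functions. ToyBuffer, toy_buffer_data (standing in for glBufferData) and toy_flush are hypothetical and not part of OpenGL. What it shows is that the caller may overwrite or free its array as soon as the call returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only, NOT real OpenGL. A hypothetical "driver" that is finished
   with the caller's pointer by the time the upload call returns. */
typedef struct {
    unsigned char *gpu;      /* stands in for GPU-accessible storage */
    unsigned char *staging;  /* driver-owned CPU copy awaiting transfer */
    size_t size;
} ToyBuffer;

/* Hypothetical analogue of glBufferData: copy now, transfer later. */
static int toy_buffer_data(ToyBuffer *b, const void *data, size_t size) {
    assert(b != NULL && data != NULL && size > 0);
    free(b->gpu);
    free(b->staging);
    b->gpu = calloc(1, size);
    b->staging = malloc(size);
    b->size = size;
    if (b->gpu == NULL || b->staging == NULL) return -1;
    memcpy(b->staging, data, size);  /* `data` is never touched again */
    return 0;
}

/* Hypothetical deferred "DMA", run whenever the driver decides to. */
static void toy_flush(ToyBuffer *b) {
    if (b->staging != NULL) {
        memcpy(b->gpu, b->staging, b->size);
        free(b->staging);
        b->staging = NULL;
    }
}

/* The caller reuses its array right after the call; the buffer still ends
   up holding the values that were passed in. Returns the first float. */
float demo_copy_on_call(void) {
    ToyBuffer b = {NULL, NULL, 0};
    float verts[3] = {1.0f, 2.0f, 3.0f};
    float first = -1.0f;
    if (toy_buffer_data(&b, verts, sizeof verts) == 0) {
        verts[0] = 99.0f;  /* safe: the driver already has its own copy */
        toy_flush(&b);     /* the transfer happens later */
        memcpy(&first, b.gpu, sizeof first);
    }
    free(b.gpu);
    free(b.staging);
    return first;
}
```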
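As a matter of practicality, you should assume that the copy into the buffer's actual storage doesn't happen immediately. DMAs usually require a certain memory alignment, and the pointer you pass may not have it, so the implementation will often copy your data into suitably aligned memory of its own first and transfer it from there later.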
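A toy sketch of that alignment decision follows. TOY_DMA_ALIGN (64 bytes) is an invented value, since real DMA requirements depend on the hardware and are not exposed through OpenGL, and toy_dma_source is a hypothetical helper, not a driver API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alignment requirement for the toy DMA engine. */
#define TOY_DMA_ALIGN 64u

static int toy_is_dma_aligned(const void *p) {
    return ((uintptr_t)p % TOY_DMA_ALIGN) == 0;
}

/* Toy model, NOT real OpenGL. Picks the memory a DMA would read from.
   An aligned caller pointer could only be used directly if the transfer
   finished before the upload call returned; otherwise the driver copies
   into an aligned staging block it owns (*owned = 1) and transfers later. */
static const void *toy_dma_source(const void *data, size_t size, int *owned) {
    *owned = 0;
    if (toy_is_dma_aligned(data)) return data;
    size_t rounded = (size + TOY_DMA_ALIGN - 1) / TOY_DMA_ALIGN * TOY_DMA_ALIGN;
    void *staging = aligned_alloc(TOY_DMA_ALIGN, rounded);
    if (staging == NULL) return NULL;
    memcpy(staging, data, size);
    *owned = 1;
    return staging;
}

/* Returns 1 if an aligned pointer is used as-is and a misaligned one is staged. */
int demo_alignment(void) {
    unsigned char *buf = aligned_alloc(TOY_DMA_ALIGN, 2 * TOY_DMA_ALIGN);
    if (buf == NULL) return 0;
    for (size_t i = 0; i < 2 * TOY_DMA_ALIGN; i++) buf[i] = (unsigned char)i;
    int owned_a = 0, owned_m = 0;
    const void *a = toy_dma_source(buf, 16, &owned_a);      /* aligned */
    const void *m = toy_dma_source(buf + 1, 16, &owned_m);  /* off by one */
    int ok = a == buf && !owned_a && m != NULL && owned_m &&
             toy_is_dma_aligned(m) && memcmp(m, buf + 1, 16) == 0;
    if (owned_m) free((void *)m);
    free(buf);
    return ok;
}
```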
But usually, you shouldn't care. Let the implementation do its job.
2: As above, the implementation can do whatever it wants when you map memory. It might give you a genuine pointer to GPU-accessible memory, or it might just allocate a block of CPU memory and DMA it to the GPU when you unmap the memory.
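The CPU-block possibility looks roughly like this in a toy C model. ToyMappedBuffer, toy_map and toy_unmap are hypothetical stand-ins for glMapBuffer/glUnmapBuffer, not real OpenGL, and a real implementation might just as well return a pointer to the storage itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only, NOT real OpenGL: a hypothetical map/unmap that hands out a
   CPU shadow copy and "DMAs" it back to storage at unmap time. */
typedef struct {
    unsigned char *gpu;     /* stands in for GPU-accessible storage */
    unsigned char *shadow;  /* CPU block returned by toy_map, or NULL */
    size_t size;
} ToyMappedBuffer;

static unsigned char *toy_map(ToyMappedBuffer *b) {
    assert(b->shadow == NULL);  /* no nested maps in this toy */
    b->shadow = malloc(b->size);
    if (b->shadow != NULL) memcpy(b->shadow, b->gpu, b->size);  /* current contents */
    return b->shadow;
}

static void toy_unmap(ToyMappedBuffer *b) {
    if (b->shadow != NULL) {
        memcpy(b->gpu, b->shadow, b->size);  /* the deferred transfer */
        free(b->shadow);
        b->shadow = NULL;
    }
}

/* Returns 1 if a write through the mapped pointer reaches storage only at unmap. */
int demo_map_unmap(void) {
    unsigned char storage[4] = {0, 0, 0, 0};
    ToyMappedBuffer b = {storage, NULL, sizeof storage};
    unsigned char *p = toy_map(&b);
    if (p == NULL) return 0;
    p[0] = 42;
    int before = storage[0];  /* still 0: the write went to the shadow */
    toy_unmap(&b);
    return before == 0 && storage[0] == 42;
}
```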
The only exception to this is persistent mapping. That feature requires that OpenGL give you an actual pointer to the actual GPU-accessible memory that the buffer resides in. This is because you never actually tell the implementation when you're finished writing to/reading from the memory.
This is also (part of) why OpenGL requires you to allocate buffer storage immutably (with glBufferStorage or glNamedBufferStorage) to be able to use persistent mapping.
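In real code, persistent mapping means creating the storage with glBufferStorage with GL_MAP_PERSISTENT_BIT in its flags, then mapping it with glMapBufferRange using that same bit. The toy C model below (hypothetical names, not OpenGL) only illustrates how this differs from ordinary mapping: the pointer you get is the storage itself, so writes land there without any unmap.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT real OpenGL. "immutable" mimics storage created with a
   glBufferStorage-style call that asked for persistent mapping. */
typedef struct {
    unsigned char *gpu;  /* stands in for GPU-accessible storage */
    size_t size;
    int immutable;
} ToyStorage;

/* Hypothetical persistent map: returns the storage address itself, because
   there is no unmap at which a shadow copy could be written back. The toy
   refuses (returns NULL) for mutable storage. */
static unsigned char *toy_map_persistent(ToyStorage *s) {
    assert(s != NULL);
    return s->immutable ? s->gpu : NULL;
}

/* Returns 1 if mutable storage is refused and writes through a persistent
   mapping of immutable storage land in storage immediately. */
int demo_persistent(void) {
    unsigned char mem[4] = {0, 0, 0, 0};
    ToyStorage mutable_buf = {mem, sizeof mem, 0};
    ToyStorage immutable_buf = {mem, sizeof mem, 1};
    if (toy_map_persistent(&mutable_buf) != NULL) return 0;
    unsigned char *p = toy_map_persistent(&immutable_buf);
    p[1] = 7;  /* no unmap needed: this is the storage itself */
    return mem[1] == 7;
}
```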
3: It is copied whenever the implementation feels that it needs to be.
OpenGL implementations are a black box. What they do is more or less up to them. The only requirement the specification makes is that their behavior act "as if" it were doing things the way the specification says. As such, the data can be copied whenever the implementation feels like copying it, so long as everything still works "as if" it had copied it immediately.
Making a draw call does not require that any buffer DMAs that this draw command relies on have completed at that time. It merely requires that those DMAs will happen before the GPU actually executes that drawing command. The implementation could do that by blocking in the glDraw* call until the DMAs have completed. But it can also use internal GPU synchronization mechanisms to tie the drawing command being issued to the completion of the DMA operation(s).
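The non-blocking approach can be sketched with another toy model in C. ToyContext, toy_draw and toy_gpu_execute are invented for illustration and are not OpenGL. The draw call queues the pending transfer in front of itself on an in-order queue, so it can return before the data has moved while still guaranteeing that the draw reads it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT real OpenGL: a driver plus an in-order "GPU" queue. */
enum ToyCmdKind { TOY_CMD_DMA, TOY_CMD_DRAW };

typedef struct {
    enum ToyCmdKind kind;
    int value;  /* payload for TOY_CMD_DMA */
} ToyCmd;

typedef struct {
    ToyCmd queue[16];
    size_t count;
    int gpu_value;       /* buffer contents as the GPU sees them */
    int pending_upload;  /* staged by the driver, -1 if none */
    int drawn_with;      /* value the last draw read, -1 if none */
} ToyContext;

static void toy_upload(ToyContext *c, int v) { c->pending_upload = v; }

/* Hypothetical glDraw* analogue: rather than blocking until the upload has
   landed, it queues the DMA ahead of the draw and returns immediately. */
static void toy_draw(ToyContext *c) {
    assert(c->count + 2 <= 16);
    if (c->pending_upload >= 0) {
        c->queue[c->count++] = (ToyCmd){TOY_CMD_DMA, c->pending_upload};
        c->pending_upload = -1;
    }
    c->queue[c->count++] = (ToyCmd){TOY_CMD_DRAW, 0};
}

/* The "GPU" runs its queue strictly in order, at some later time. */
static void toy_gpu_execute(ToyContext *c) {
    for (size_t i = 0; i < c->count; i++) {
        if (c->queue[i].kind == TOY_CMD_DMA) c->gpu_value = c->queue[i].value;
        else c->drawn_with = c->gpu_value;
    }
    c->count = 0;
}

/* Returns 1 if the draw call returned before the data arrived, yet the
   draw still read the uploaded value when the GPU executed it. */
int demo_draw_ordering(void) {
    ToyContext c = {.count = 0, .gpu_value = 0, .pending_upload = -1, .drawn_with = -1};
    toy_upload(&c, 5);
    toy_draw(&c);
    int not_yet = (c.gpu_value == 0);
    toy_gpu_execute(&c);
    return not_yet && c.drawn_with == 5;
}
```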
The only thing that will guarantee that the upload has actually completed is to issue a command that causes the GPU to access the buffer, then synchronize the CPU with that command (for example, with a fence sync object). Synchronizing after only the upload doesn't guarantee anything: the upload itself is not observable behavior, so synchronizing there may have no effect.
Then again, it might. That's the point; you cannot know.
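With real OpenGL, that CPU-side synchronization is usually a fence: glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) issued after the draw, followed by glClientWaitSync on the returned GLsync. The toy C model below (hypothetical names, not OpenGL) shows one legal outcome: a fence placed right after the upload signals while the data is still only staged by the driver. An implementation that had already transferred the data would make both fences look identical, which is why that placement tells you nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT real OpenGL. Fences loosely mimic glFenceSync: each
   one signals when every command queued before it has executed. */
enum { Q_DMA, Q_DRAW, Q_FENCE };

typedef struct {
    int cmd[16];
    int arg[16];
    size_t count;
    int gpu_value;    /* buffer contents as the GPU sees them */
    int pending;      /* upload staged by the driver, -1 if none */
    int at_fence[4];  /* gpu_value when each fence signaled, -1 if not yet */
} ToyQueue;

static void q_push(ToyQueue *q, int cmd, int arg) {
    assert(q->count < 16);
    q->cmd[q->count] = cmd;
    q->arg[q->count] = arg;
    q->count++;
}

/* The upload only stages data; nothing enters the GPU queue yet. */
static void q_upload(ToyQueue *q, int v) { q->pending = v; }

/* A command that reads the buffer pulls the staged DMA in ahead of itself. */
static void q_draw(ToyQueue *q) {
    if (q->pending >= 0) {
        q_push(q, Q_DMA, q->pending);
        q->pending = -1;
    }
    q_push(q, Q_DRAW, 0);
}

static void q_fence(ToyQueue *q, int id) {
    assert(id >= 0 && id < 4);
    q_push(q, Q_FENCE, id);
}

/* The "GPU" runs the queue in order. A CPU wait (not modeled here) on a
   fence could return as soon as that fence signals. */
static void q_execute(ToyQueue *q) {
    for (size_t i = 0; i < q->count; i++) {
        if (q->cmd[i] == Q_DMA) q->gpu_value = q->arg[i];
        else if (q->cmd[i] == Q_FENCE) q->at_fence[q->arg[i]] = q->gpu_value;
    }
    q->count = 0;
}

/* Returns 1 if the fence placed right after the upload signaled before the
   data arrived, and the fence after the draw saw the uploaded data. */
int demo_fence_placement(void) {
    ToyQueue q = {.count = 0, .gpu_value = 0, .pending = -1,
                  .at_fence = {-1, -1, -1, -1}};
    q_upload(&q, 5);
    q_fence(&q, 0);  /* synchronizing after only the upload */
    q_draw(&q);      /* a command that actually reads the buffer */
    q_fence(&q, 1);  /* synchronizing after that command */
    q_execute(&q);
    return q.at_fence[0] == 0 && q.at_fence[1] == 5;
}
```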