I am currently writing an engine in OpenGL 3.3 with heavy full-screen post-processing (3 passes, for blurring shadows, lights, etc.).
For each pass, I have to render the exact same 2 triangles that cover the entire screen, and every time the exact same fragments, one per pixel of the window, have to be generated.
The engine currently runs at a satisfying 200 FPS on a GeForce 570, but it's struggling at around 15 FPS on Intel integrated graphics.
If I halve the resolution of the window and adjust the textures accordingly, it runs about 3x faster. So it's definitely GPU-bound and limited by the amount of post-processing.
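One way to read that 3x figure is with a simple Amdahl-style split: some fraction `f` of the frame time scales with the pixel count and the rest is fixed. The sketch below assumes "halve the resolution" means halving both width and height (so 1/4 of the pixels remain); `pixel_bound_fraction` is a hypothetical helper, not part of any API.

```python
def pixel_bound_fraction(speedup: float, pixel_ratio: float) -> float:
    """Solve speedup = 1 / ((1 - f) + f * pixel_ratio) for f.

    f is the fraction of frame time assumed to scale linearly with the
    number of pixels; the remaining (1 - f) is assumed to be a fixed cost.
    """
    # 1 / speedup = 1 - f + f * pixel_ratio = 1 - f * (1 - pixel_ratio)
    return (1.0 - 1.0 / speedup) / (1.0 - pixel_ratio)


# Assumption: halving both width and height leaves 1/4 of the pixels.
f = pixel_bound_fraction(speedup=3.0, pixel_ratio=0.25)
print(f"{f:.3f}")  # 0.889
```

Under these assumptions, roughly 89% of the frame time scales with the number of pixels shaded, which is consistent with the conclusion that the post-processing is the bottleneck.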
It seems wasteful to me that I keep generating the exact same fragments 600 times per second (200 FPS × 3 passes). So my question is the following: is there any feature in OpenGL 3/4 specifically designed for simplifying full-screen rendering, for example by telling OpenGL that the fragments it's trying to generate are just the rectangle of the screen and there's no guesswork to do?
No, there is not. But even if there were, it wouldn't matter.
In multi-stage chemical synthesis, there is the concept of the "rate-determining step." That is, if you have a reaction where A produces B, and a further reaction where B produces C, one of these reactions will be slower than the other (oftentimes much slower). That one is the rate-determining step: the total process will never be faster than the slowest step.
If A->B takes 0.01 seconds and B->C takes 1 second, the overall process is going to take 1 second. For 0.99 seconds, there's going to be a lot of B sitting around, waiting to get turned into C. So it really doesn't matter if you find a way to make A->B take 0.001 seconds; if you want C, it's still going to take 1 second to get it.
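The same idea can be written as a tiny throughput model. This is a sketch that assumes the stages run as a pipeline (each stage works on a different item at the same time, as GPU stages do); `pipeline_throughput` is a hypothetical name used only for illustration.

```python
def pipeline_throughput(stage_seconds: list[float]) -> float:
    """Items per second once a pipelined chain of stages reaches steady state.

    Every stage is busy with a different item at once, so the chain can only
    emit finished items as fast as its slowest stage allows.
    """
    return 1.0 / max(stage_seconds)


# A->B and B->C stage times from the example above (seconds per item).
before = pipeline_throughput([0.01, 1.0])
after = pipeline_throughput([0.001, 1.0])  # A->B made 10x faster
print(before, after)  # 1.0 1.0
```

Making the fast stage ten times faster leaves the output rate unchanged, because `max` still picks the 1-second stage.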
The same applies here. You want to do post-processing passes over your scene, and that takes a lot of memory bandwidth and fragment shader (FS) computation. The time spent processing CPU commands and generating triangles is trivial next to the bandwidth and FS time.
So even if you could make the already fast part of this process slightly faster, it would mean nothing for your overall performance. The commands would just be sitting there, waiting for bandwidth and FS resources to become available.
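To put rough numbers on "trivial", the sketch below counts per-frame vertex and fragment work for full-screen passes. The 1920x1080 resolution is a hypothetical example, and it assumes each pass draws two triangles as six unindexed vertices and shades exactly one fragment per pixel (no MSAA); `per_frame_work` is an illustrative helper, not an OpenGL function.

```python
def per_frame_work(width: int, height: int, passes: int) -> tuple[int, int]:
    """Return (vertices, fragments) processed per frame by full-screen passes.

    Assumes two triangles per pass drawn as six unindexed vertices, and one
    fragment shaded per pixel with no multisampling and no overdraw.
    """
    vertices = 6 * passes
    fragments = width * height * passes
    return vertices, fragments


# Hypothetical 1920x1080 window with the question's 3 post-processing passes.
v, f = per_frame_work(1920, 1080, 3)
print(v, f, f // v)  # 18 6220800 345600
```

Under these assumptions there are hundreds of thousands of fragment shader invocations (each doing texture reads) for every vertex processed, so even eliminating triangle setup entirely would barely move the frame time.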