houcros houcros - 3 months ago 17
Scala Question

In Flink, how to write DataStream to single file?

The

writeAsText
or
writeAsCsv
methods of a
DataStream
write as many files as worker threads. As far as I could see, the methods only let you specify the path to these files and some formatting.

For debugging and testing purposes, it would be really useful to be able to print everything to a single file, without having to change the set up to having a single worker thread.

Is there any non-overly-complicated way to achieve this? I suspect it should be possible implementing a custom
SinkFunction
, but not sure about that one (besides, it also feels like a hassle for something that seems relatively simple).

Answer

You can achieve this by setting the parallelism to 1. This way, the writing happens only on one machine.

writeAsText(path).setParallelism(1);