Philip Lee - 3 months ago
Linux Question

How to merge three TCP streams in realtime

I have three bits of networked realtime data logging equipment that output lines of ASCII text via TCP sockets. They essentially just broadcast the data that they are logging - there are no requests for data from other machines on the network. Each piece of equipment is at a different location on my network and each has a unique IP address.

I'd like to combine these three streams into one so that I can log it to a file for replay or forward it on to another device to view in real time.

At the moment I have a PHP script that loops over each IP/port combination, listening for up to 64 KB of data. As soon as that data is received, or it gets an EOL, it forwards it on to another listener that receives the combined stream.

This works reasonably well, but one of the data loggers outputs far more data than the others and tends to swamp them, so I'm fairly sure I'm missing data, presumably because the script isn't listening to all three in parallel.

I've also tried three separate PHP processes writing to a shared file in memory (on /dev/shm) which is read and written out by a fourth process. Using file locking this seems to work but introduces a delay of a few seconds which I'd rather avoid.

I did find a PHP library, called (I think) Amp, that allows true multithreading using pthreads, but I'm still not sure how to combine the output. A file in RAM doesn't seem quick enough.

I've had a good look around on Google and can't see an obvious solution. I haven't found a way to do this on Linux with command-line tools either, unless I've missed something obvious.

I'm not too familiar with other languages, but are there any that might be better suited to this problem?


You don't necessarily need to switch languages; it just sounds like you're not familiar with the concept of I/O multiplexing. Check out the PHP documentation for the select call.

Listening to multiple data inputs without knowing which one the next piece of data will come from is a common problem with standard solutions. The details vary between implementations, but the basic idea is the same: you tell the system that you're interested in receiving data from multiple sources simultaneously (TCP sockets in your case), then run a loop that waits for that data. On every iteration of the loop, the system tells you which sources are ready for reading. In your case that means you can read piecemeal from all three sources, rather than waiting for an individual one to deliver 64 KB before moving on to the next.
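As a rough illustration of that loop, here is a minimal sketch in Python using the standard `selectors` module. It is not PHP, but the structure maps directly onto PHP's `stream_select()`. It assumes each logger sends newline-terminated ASCII lines, and `merge_streams` is a hypothetical helper name. Each source gets its own buffer and only complete lines are written out, so a chatty logger can't cut another logger's line in half.

```python
import selectors
import socket
from typing import BinaryIO, Iterable


def merge_streams(socks: Iterable[socket.socket], out: BinaryIO) -> None:
    """Copy whole lines from several sockets to `out` as soon as they arrive.

    Assumes records are terminated by b"\\n". Returns once every peer has
    closed its connection.
    """
    sel = selectors.DefaultSelector()
    buffers = {}  # socket -> bytes received but not yet ending in a newline
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        buffers[s] = b""

    while buffers:  # keep going until every source has hung up
        # Block until at least one socket has data (or EOF) ready to read.
        for key, _ in sel.select():
            s = key.fileobj
            try:
                chunk = s.recv(65536)  # read whatever is available right now
            except BlockingIOError:
                continue  # spurious wakeup: nothing to read yet
            if not chunk:  # peer closed: flush any trailing partial line
                if buffers[s]:
                    out.write(buffers[s] + b"\n")
                sel.unregister(s)
                s.close()
                del buffers[s]
                continue
            buffers[s] += chunk
            # Split off complete lines; keep the unfinished tail for next time.
            *lines, buffers[s] = buffers[s].split(b"\n")
            for line in lines:
                out.write(line + b"\n")
        out.flush()  # make merged lines visible to readers promptly
    sel.close()
```

To point it at the real loggers you would connect first, e.g. `socks = [socket.create_connection(addr) for addr in LOGGERS]`, where `LOGGERS` is a hypothetical list of `(host, port)` tuples. Then pass a log file opened with `open("merged.log", "ab")` as `out`, or `viewer_sock.makefile("wb")` to forward to a downstream viewer. Because everything runs in one process with one loop, there is no shared file or locking involved.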

This can be done in lots of languages, including PHP.