Bash Question

Faster way to Base64-encode a file line by line

I have a fairly big text file (around 10GB); it fits into memory without any trouble. My goal is to convert every line to a Base64 string. Currently my method takes forever and doesn't seem to complete, because it is single-threaded.

while read line; do echo -n $line | base64 >> outputfile.txt; done < inputfile.txt


Can someone give me a hint how to do this faster? This solution produces around 100MB per hour (so finishing would take about 100 hours); CPU usage is at 5% and disk usage is also very low.

It seems I was misunderstood about the control characters...
So I've included a sample input file and what the output should look like (chepner was correct about the chomp):

Sample Input:

Банд`Эрос
testè!?£$
``
▒``▒`


Sample Output:

user@monster ~ # head -n 5 bash-script-output.txt
0JHQsNC90LRg0K3RgNC+0YE=
dGVzdMOoIT/CoyQ=
YGA=
4paSYGDilpJg

user@monster ~ # head -n 5 perl-without-chomp.txt
0JHQsNC90LRg0K3RgNC+0YEK
dGVzdMOoIT/CoyQK
YGAK
4paSYGDilpJgCg==

user@monster ~ # head -n 5 perl-chomp.txt
0JHQsNC90LRg0K3RgNC+0YE=
dGVzdMOoIT/CoyQ=
YGA=
4paSYGDilpJg
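
For reference, the trailing K or Cg== in the no-chomp output is simply the Base64-encoded newline that Perl's $_ still carries unless it is chomped; a quick shell check (purely illustrative) makes the difference visible:

user@monster ~ # printf 'test' | base64
dGVzdA==
user@monster ~ # printf 'test\n' | base64
dGVzdAo=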


So samples are always better than human descriptions ;=)

Answer

It might help a little to open the output file only once:

while IFS= read -r line; do echo -n "$line" | base64; done < inputfile.txt > outputfile.txt

bash is not a good choice here, however, for two reasons: iterating over a file in a shell loop is slow to begin with, and you are starting a new base64 process for each line. A better idea is to use a language that has a library for computing Base64 values, so that everything is handled in one process. An example using Perl:

perl -MMIME::Base64 -ne 'chomp; print encode_base64($_)' inputfile.txt > outputfile.txt
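
The chomp drops each input line's trailing newline so it isn't encoded along with the data, matching the desired output above. One caveat: encode_base64 wraps its result into 76-character lines by default, so any input line longer than 57 bytes would be spread over several output lines. To guarantee exactly one output line per input line, pass an empty end-of-line string and print the newline yourself:

perl -MMIME::Base64 -ne 'chomp; print encode_base64($_, ""), "\n"' inputfile.txt > outputfile.txt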
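
If a single core is still the bottleneck and GNU parallel happens to be available, the same one-liner can be run over chunks of the file in parallel; this is only a sketch (the 100M block size is arbitrary), with -k preserving the original line order and --pipepart splitting the file at line boundaries:

parallel -q -k --pipepart -a inputfile.txt --block 100M perl -MMIME::Base64 -ne 'chomp; print encode_base64($_, ""), "\n"' > outputfile.txt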