Qmanchoo - 4 months ago
Bash Question

How to use bash to split an unknown number of source file names into a specified number of output files

I am trying to write a Bash script that reads the names of all files in a source directory, which holds an unknown number of files, and then splits those names as evenly as possible among a specified number of output files. The user inputs are the source directory, the target directory, and the target file count.

For example, let's say we have 10 files in a source directory, and the user specifies that the names of those files should be split among 3 output files.

SOURCE FILE NAMES:


test
test2
test3
test4
test5
test6
test7
test8
test9
test10


RESULTING FILES:


FILE1
test
test2
test3
test4


(has 4)


FILE2
test5
test6
test7


(has 3)


FILE3
test8
test9
test10


(has 3)

So far, I have come up with the script below. Because I force a round-up, it always catches all the files, but it won't produce the desired number of output files in all cases, because it does not account for output files that hold different numbers of file names.

ls -l -d -1 $source/{*,.*} | tail -n +3 | awk '{printf "%s\n",$9}' > hdpLoadList
fileCount=`ls -1 $source | wc -l`
threadCount=$3

fptf=$(bc <<< "scale=1;($fileCount/$threadCount)+.9")
fptf=${fptf:0:1}

sPos=1
sLen=$fptf
for i in `seq 1 $threadCount`;
do
    sed -n ${sPos},${sLen}p hdpLoadList | sed -e ':a;N;$!ba;s/\n/ /g' > hdpLoadP_$i
    sPos=$(($sPos+$fptf))
    sLen=$(($sLen+$fptf))
done
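An even split means every output file receives either the integer quotient of names or one more than that, with the remainder deciding how many files get the extra name. The arithmetic can be sketched on its own, without touching any files. The function name chunk_sizes is hypothetical and used only for illustration; it is not part of the script above.

```shell
# chunk_sizes N K  (hypothetical helper, for illustration only)
# Prints, one per line, how many names each of K output files
# should receive when N names are split as evenly as possible.
chunk_sizes() {
    local n="$1"
    local k="$2"
    local base=$(( n / k ))    # every output file gets at least this many
    local extra=$(( n % k ))   # the first 'extra' files get one more
    local i=0
    while [ "$i" -lt "$k" ]; do
        if [ "$i" -lt "$extra" ]; then
            echo $(( base + 1 ))
        else
            echo "$base"
        fi
        i=$(( i + 1 ))
    done
}

# 10 names, 3 output files: prints 4, 3 and 3 on separate lines
chunk_sizes 10 3
```

With 10 names and 3 files this gives the 4, 3, 3 layout from the example, whereas a fixed rounded-up chunk size of 4 would give 4, 4, 2.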

Answer

Script, let's say split.sh:

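#!/bin/bash

# Expand to nothing (instead of a literal '*') if the source folder is empty
shopt -s nullglob

# Create an array with all the files
myArray=("$1"/*)

# Get array length
arrayLength=${#myArray[@]}

# Iterate through array using a counter
for ((i=0; i<arrayLength; i++)); do
    # Map index i to an output file number between 0 and ($3 - 1).
    # Plain integer division by (arrayLength / $3) would create an extra
    # file whenever the count does not divide evenly (10 files / 3 -> 4 files);
    # scaling the index keeps the group sizes within one name of each other.
    count=$(( i * $3 / arrayLength ))
    # Use basename to remove folder path
    basename "${myArray[$i]}" >> "$2/destFile$count"
done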

Usage:

bash split.sh srcFolder destFolder 3
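Because the count and both folders come straight from user input, it may be worth checking them before writing anything. The following is a minimal sketch of that idea, not the answer's script itself: split_names is a hypothetical function name, the error messages are made up for illustration, and the destFile naming is borrowed from the script above. The demo at the end builds a throwaway folder with ten empty files.

```shell
# split_names SRC DEST COUNT  (hypothetical function, for illustration only)
# Appends the base names of the entries in SRC to DEST/destFile0 ...
# DEST/destFile(COUNT-1), with group sizes differing by at most one.
split_names() {
    local src="$1"
    local dest="$2"
    local parts="$3"
    local total=0
    local i=0
    local f n

    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "split_names: source and target must be existing directories" >&2
        return 1
    fi

    # Accept digits only, and nothing starting with 0 (this also rejects 0 itself)
    case "$parts" in
        ''|0*|*[!0-9]*)
            echo "split_names: count must be a positive integer" >&2
            return 1
            ;;
    esac

    # First pass: count the entries so each index can be mapped to a file number
    for f in "$src"/*; do
        [ -e "$f" ] || continue
        total=$(( total + 1 ))
    done

    # Second pass: entry number i goes to output file i * parts / total
    for f in "$src"/*; do
        [ -e "$f" ] || continue
        n=$(( i * parts / total ))
        basename "$f" >> "$dest/destFile$n"
        i=$(( i + 1 ))
    done
}

# Demo: ten empty files split across three output files
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"
for n in 1 2 3 4 5 6 7 8 9 10; do : > "$demo/src/test$n"; done
split_names "$demo/src" "$demo/out" 3
wc -l "$demo"/out/destFile*
rm -rf "$demo"
```

For the ten demo files the three output files end up with 4, 3 and 3 names. The names follow the shell's glob sort order, so test10 lands right after test1 rather than at the end.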