Linux question from s1ddok, 2 months ago

Sort and count unique entries in an array of strings

What I need to do is get the list of users from /etc/group, sort it, and then count the unique entries.

So far I have only managed to get the usernames, and I suspect even that part is wrong:

#!/bin/bash
usernames=()

while IFS=: read -r Groups Tmp1 Tmp2 Username
do
    # Quote the variable and use -n: the unquoted test [ $Username!="" ] is always true
    if [ -n "$Username" ]
    then
        usernames+=("$Username")
    fi
done < /etc/group


I then tried to sort it, but the output is very weird:

Sorting:

IFS=$'\n' sorted=($(sort <<<"${usernames[*]}"))
unset IFS


Output:

echo ${usernames[@]}
echo ""
echo ${sorted[@]}


Result:

root root root root root root _teamsserver root root _taskgated root root,_jabber,_postfix,_cyrus,_calendar,_dovecot _calendar,_jabber,_postfix _devicemgr,_teamsserver _eppc root _teamsserver _devicemgr _softwareupdate _locationd _teamsserver _devicemgr,_calendar,_teamsserver,_xserverdocs _teamsserver,_devicemgr _warmd

_calendar,_jabber,_postfix _devicemgr _devicemgr,_calendar,_teamsserver,_xserverdocs _devicemgr,_teamsserver _eppc _locationd _softwareupdate _taskgated _teamsserver _teamsserver _teamsserver _teamsserver,_devicemgr _warmd root root root root root root root root root root root,_jabber,_postfix,_cyrus,_calendar,_dovecot


I have zero bash experience and absolutely can't get it to work.

What I need is the most basic way to get a sorted list of the unique usernames in /etc/group and print the number of times each one appears.

For example, if my /etc/group file contains this:

nobody:*:-2:
nogroup:*:-1:
wheel:*:0:root
daemon:*:1:root
kmem:*:2:root
sys:*:3:root
tty:*:4:root
operator:*:5:root
mail:*:6:_teamsserver


I want to get this:

root 6
_teamsserver 1

Answer

Each 'username' field (the fourth field) is actually a comma-separated list of user names, and it may be empty. To get the individual user names, you need to split the entries on the commas.
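To see what that split looks like, here is a minimal sketch in bash using one member field copied from your output (the variable names members and split are just for illustration): tr turns every comma into a newline, so each name ends up on its own line, which is the shape sort and uniq expect.

```shell
#!/bin/bash
# One member field from the question's output: several names joined by commas.
members='root,_jabber,_postfix,_cyrus,_calendar,_dovecot'

# tr replaces every comma with a newline, so each name lands on its own line.
split=$(printf '%s\n' "$members" | tr ',' '\n')
printf '%s\n' "$split"
```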

If I started from your loop, I'd probably use:

sorted=($(while IFS=: read -r Groups Tmp1 Tmp2 Usernames
          do
              if [ -n "$Usernames" ];
              then
                  echo "$Usernames"
              fi
          done < /etc/group |
          tr ',' '\n' |
          sort -u
       ))

 echo "${sorted[@]}"
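If you want to try that loop without touching /etc/group, one way is to wrap it in a function that reads group lines from standard input. This is a sketch under a couple of assumptions: unique_members is a hypothetical helper name, the sample lines are taken from the example file in the question, and LC_ALL=C is added so the sort order does not depend on your locale.

```shell
#!/bin/bash
# unique_members (hypothetical name): the loop above, reading group lines on stdin.
unique_members() {
  while IFS=: read -r Groups Tmp1 Tmp2 Usernames
  do
    if [ -n "$Usernames" ]; then
      echo "$Usernames"          # print only non-empty member lists
    fi
  done | tr ',' '\n' | LC_ALL=C sort -u   # split on commas, sort, keep unique names
}

# A few lines from the example /etc/group in the question.
sample='nobody:*:-2:
wheel:*:0:root
daemon:*:1:root
mail:*:6:_teamsserver'

sorted=($(unique_members <<<"$sample"))
echo "${sorted[@]}"
```

For the real file you would run unique_members < /etc/group instead.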

This bypasses the intermediate usernames array. If you really want that array, keep your original loop and pass its contents through tr before sorting:

IFS=$'\n' sorted=($(tr ',' '\n' <<<"${usernames[*]}" | sort -u))
unset IFS

This generates an array, sorted, containing the list of unique names in sorted order.
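As a rough check of the array route, the sketch below fills usernames by hand with values that appear in your output (standing in for what your loop would collect), then applies the same IFS trick. It assumes bash; LC_ALL=C is only there to make the order predictable, and unset IFS restores normal word splitting afterwards, as in your own sorting code.

```shell
#!/bin/bash
# Values as the question's loop would collect them; some hold several names.
usernames=(root root 'root,_jabber,_postfix' '_calendar,_jabber,_postfix')

# Join the elements with newlines, split on commas, then sort and deduplicate.
IFS=$'\n' sorted=($(tr ',' '\n' <<<"${usernames[*]}" | LC_ALL=C sort -u))
unset IFS   # restore default word splitting for the rest of the script

echo "${#sorted[@]} unique names: ${sorted[*]}"
```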

If all you want is a count of the unique names, I'd probably do the whole thing in awk, though. Indeed, I'd be tempted to use awk instead of the while loop.

If you want a count of the occurrences of each unique name, use sort | uniq -c instead of sort -u. The options and variants on the statistics are legion; the key point is that you need to split the last field of the /etc/group file on the commas. If that list contains spaces for some reason, you may need to remove those too; tr ', ' '\n' would do that.
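Here is one way the sort | uniq -c route could produce exactly the "name count" output you asked for. It is a sketch, not the only way: it uses cut -d: -f4 in place of the while loop to pull out the member list, count_members is a hypothetical helper name, and the sample is the example file from the question. Because uniq -c prints the count first, a final awk swaps the two columns.

```shell
#!/bin/bash
# count_members (hypothetical name): reads group lines on stdin, prints "name count".
count_members() {
  cut -d: -f4 |              # keep only the comma-separated member list
    tr ',' '\n' |            # one user name per line
    grep -v '^$' |           # drop blank lines left by groups with no members
    LC_ALL=C sort |          # uniq -c only merges adjacent duplicates
    uniq -c |                # prefix each unique name with its count
    sort -k1,1nr |           # highest count first
    awk '{ print $2, $1 }'   # reorder to "name count"
}

# The example /etc/group from the question.
sample='nobody:*:-2:
nogroup:*:-1:
wheel:*:0:root
daemon:*:1:root
kmem:*:2:root
sys:*:3:root
tty:*:4:root
operator:*:5:root
mail:*:6:_teamsserver'

count_members <<<"$sample"
```

On the real file, count_members < /etc/group should give the same kind of listing.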

Using awk, you could do:

awk -F: '{ n = split($4, a, ","); for (u = 1; u <= n; u++) count[a[u]]++ }
         END { for (u in count) print u, count[u] }' /etc/group

It splits the fourth field into the array a, then counts the occurrences of each name in the count array. At the end, it prints each name and its count from the count array, in no particular order. On my Mac, it yielded:

root 11
_warmd 1
_locationd 1
_jabber 2
_taskgated 1
_postfix 2
_devicemgr 4
_calendar 3
_cyrus 1
_teamsserver 6
_dovecot 1
_xserverdocs 1
_eppc 1
_softwareupdate 1

You can further sort that as required.
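For instance, to put the busiest names first, a sketch like the following pipes the awk output into sort, ordering by count (descending) and then by name. sorted_counts is a hypothetical helper name that reads group lines on standard input, the sample is the example file from the question, and LC_ALL=C just makes the tie-break order predictable.

```shell
#!/bin/bash
# sorted_counts (hypothetical name): the awk counter above, followed by sort.
sorted_counts() {
  awk -F: '{ n = split($4, a, ","); for (u = 1; u <= n; u++) count[a[u]]++ }
           END { for (u in count) print u, count[u] }' |
    LC_ALL=C sort -k2,2nr -k1,1   # count descending, then name ascending
}

# The example /etc/group from the question.
sample='nobody:*:-2:
nogroup:*:-1:
wheel:*:0:root
daemon:*:1:root
kmem:*:2:root
sys:*:3:root
tty:*:4:root
operator:*:5:root
mail:*:6:_teamsserver'

sorted_counts <<<"$sample"
```

Against the real file you would run sorted_counts < /etc/group.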