firepro20 - 1 year ago
Perl Question

Count the number of variable combinations in a logfile using Perl

I have this logfile

New connection: ( [session: e696835c]
2016-04-29 21:13:59+0000 [SSHService ssh-userauth on HoneyPotTransport,3,] login attempt [user1/test123] failed
2016-04-29 21:14:10+0000 [SSHService ssh-userauth on HoneyPotTransport,3,] login attempt [user1/test1234] failed
2016-04-29 21:14:13+0000 [SSHService ssh-userauth on HoneyPotTransport,3,] login attempt [user1/test123] failed

I want to output to file a result like this:


The "Occurrences" variable will represent the number of times a combination of login details (username and password) has been recorded in the file. For example, user1 with test123 is recorded two times from the same IP. How can I do this? At the moment I have two while loops and a subroutine that is called inside the first while loop, like so:


sub counter(){

    $result = 0;
    #open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";
    while(my $otherlines = <LOG2>){

        if($otherlines =~ /login attempt/){
            ($user, $password) = (split /[\s:\[\]\/]+/, $otherlines)[-3,-2];
            if($_[1] =~ /$user/ && $_[2] =~ /$password/){
            }#if ip matches i think i have to do this with split
        }
    }

    #print "TEST\n";
    #print "Combo $_[0] and $_[1]\n";

    #print "$result";
    return $result;
}

Main method

sub cowrieExtractor(){

    open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";

    open(LOG2, $path2) or die "Can't open '$path2': $!";

    $seperator = chr(42);
    #To output user and password of login attempt, set $ip variable to the contents of array at that x position of new
    #connection to match the ip of the login attempt
    print FILE2 "SourcePort"."$seperator".

    $ip = "";
    $port = "";
    $usr = "";
    $pass = "";
    $status = "";
    $frequency = 0;

    #Given this is a user/pass attempt honeypot logger, I will use a wide character to reduce the possibility of stopping
    #the WEKA CSV loader from functioning by using smileyface as seperators.

    while(my $lines = <LOG2>){

        if($lines =~ /New connection/){

            ($ip, $port) = (split /[\[\]\s:()]+/, $lines)[7,8];
        }

        if($lines =~ /login attempt/){#and the ip of the new connection
            if($lines =~ /$ip/){
                ($usr, $pass, $status) = (split /[\s:\[\]\/]+/, $lines)[-3,-2,-1];

                $frequency = counter($ip, $usr, $pass);

                #print $frequency;
                if($ip && $port && $usr && $pass && $status ne ""){
                    print FILE2 join "$seperator",($port, $status, $frequency, $end);
                    print FILE2 "\n";
                }
            }
        }
    }
}


Right now, under Occurrences in the output I am getting a 0. When I tested, it appears to be coming from the value I initialize the variable $result to in the subroutine, i.e. 0, meaning that the if statement inside the subroutine is not working properly. Any help?

Answer Source

Here is a basic way to get expected output. Questions about the context (purpose) remain.

use warnings;
use strict;

my $file = 'logfile.txt';
open my $fh_in, '<', $file or die "Can't open '$file': $!";

# Assemble results for required output in data structure:
# %rept = ( $port => { $usr => { $status => $freq } } );

my %rept;
my ($ip, $port);

while (my $line = <$fh_in>)
{
    # A "New connection" line sets the ip and port for the attempts that follow
    if ($line =~ /New connection/) {
        ($ip, $port) = $line =~ /New connection:\s+([^:]+):(\d+)/;
        next;
    }

    my ($usr, $status) =  $line =~ m/login\ attempt \s+ \[ ( [^\]]+ ) \] \s+ (\w+)/x;
    if ($usr and $status) {
        $rept{$port}{$usr}{$status}++;
    }
    else { warn "Line with an unexpected format:\n$line" }
}
close $fh_in;

# use Data::Dumper;
# print Dumper \%rept;

print "Port,Status,Occurrences\n";
foreach my $port (sort keys %rept) {
    foreach my $usr (sort keys %{$rept{$port}}) {
        foreach my $stat ( sort keys %{$rept{$port}{$usr}} ) {
            print "$port,$stat,$rept{$port}{$usr}{$stat}\n";
        }
    }
}

With your input copied into a file logfile.txt this prints


I take the whole user1/test123 (etc) to identify the user. This can be changed in the regex as needed. Note that this will not allow you to query or organize data very differently, it mostly pulls what is needed for the required output. Please let me know if explanations are needed.
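If the username and password should be kept as separate fields, the capture can be split at the slash. This is only a minimal sketch: it assumes the credentials always appear as [user/password] and that the username itself contains no slash. The sub name parse_attempt is hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull username, password and status from one log line.
# Assumes the "[user/password]" layout shown in the question's logfile.
sub parse_attempt {
    my ($line) = @_;
    my ($user, $pass, $status) = $line =~ m{
        login\ attempt \s+
        \[ ( [^/\]]+ ) / ( [^\]]* ) \]   # user, then password which may be empty
        \s+ (\w+)                        # status word such as failed
    }x;
    return ($user, $pass, $status);      # all undef when the line does not match
}

my $line = '2016-04-29 21:13:59+0000 [SSHService ssh-userauth on HoneyPotTransport,3,] '
         . 'login attempt [user1/test123] failed';
my ($user, $pass, $status) = parse_attempt($line);
print "$user,$pass,$status\n";           # prints "user1,test123,failed"
```

With the fields separated, the hash key could be built as "$user/$pass" to keep the same counts, or the two could become their own levels in the nested hash.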

An introductory explanation of the nested hash used above

First, I strongly recommend a good reading of some of the many materials available. A good start is surely the standard tutorial on Perl references (perlreftut), as well as a cookbook of sorts on Perl data structures (perldsc).

The hash used to collect data has port numbers for keys, and each key has for its value a hash reference (or, rather, an anonymous hash). Each of these hashes has keys which are users, whose values are, again, hash references. The keys of those innermost hashes are the possible values of status, so there are at most two (failed and succeeded), and their values are the frequencies. This kind of 'nesting' is a complex data structure.

There is another important thing. The first time the statement $rept{$port}{$usr}{$status}++ is executed, the whole hierarchy is created, so the key $port does not need to exist beforehand. Importantly, this autovivification of the intermediate levels happens even if the structure is merely queried for a value (unless those levels exist already).
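The autovivification behaviour described above can be seen in a small standalone sketch. The port numbers 9999 and 8888 and the user name nobody are made up for illustration only.

```perl
use strict;
use warnings;

my %rept;

# Incrementing a deep element creates every missing level on the way down.
$rept{64400}{'user1/test123'}{failed}++;
print join(',', sort keys %rept), "\n";     # prints "64400"

# Merely testing a deep element also creates the intermediate levels:
# this check adds the key 9999 to %rept, though not the innermost key.
my $seen = exists $rept{9999}{nobody}{failed};
print exists $rept{9999} ? "9999 was created\n" : "9999 is absent\n";

# Checking level by level avoids the side effect.
my $safe = exists $rept{8888} && exists $rept{8888}{nobody};
print exists $rept{8888} ? "8888 was created\n" : "8888 is absent\n";
```

This reports that 9999 was created and that 8888 is absent, which is why a query on a nested hash is usually guarded one level at a time when stray keys would matter.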

After the first iteration, the hash is

%rept = { '64400' => { 'user1/test123' => { 'failed' => 1 } } }

In the second iteration the same port is seen but a new user, so new data is added to the second-level anonymous hash. The key with the new user is created, with its value being a (new) anonymous hash, with status => count. The whole hash is:

%rept = { 
    '64400' => { 
        'user1/test123'  => { 'failed' => 1 },
        'user1/test1234' => { 'failed' => 1 },
    }
}

In the next iteration the same port is seen along with one of the already existing users, and the status (failed) also exists already. Thus the count for that status is incremented.

The whole structure can handily be seen using, for example, the Data::Dumper package. The commented-out lines in the code above would produce (with deeper indentation)

$VAR1 = {
    '64400' => {
        'user1/test123' => {
                                'failed' => 2
                           },
        'user1/test1234' => {
                                'failed' => 1
                            }
    }
};
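For a more compact and repeatable dump, Data::Dumper has a couple of configuration variables that may help. The sketch below builds the final hash by hand rather than from the logfile, purely to show the settings.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;   # fixed, shallow indentation instead of column-aligned
$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order for stable output

# The state of the hash after the three sample lines, written out by hand.
my %rept = (
    64400 => {
        'user1/test123'  => { failed => 2 },
        'user1/test1234' => { failed => 1 },
    },
);

my $dump = Dumper(\%rept);
print $dump;
```

Passing a reference (\%rept) keeps the hash in one $VAR1 rather than being flattened into a list of separate values.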

As we keep processing lines, new keys are added as needed (ports, users, status) with the full hierarchy down to the count (of 1 the first time), or, if an existing one is encountered, its count is incremented. The generated data structure can be traversed and used as seen in the code, for example. Please also see the plentiful documentation for more on that.
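As one possible way to traverse the structure, the three nested loops can be wrapped in a helper that returns rows instead of printing them. This is a sketch only: the name rept_rows is hypothetical, and the three attempts from the sample log are replayed by hand rather than parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten the three-level hash into sorted CSV rows.
sub rept_rows {
    my ($rept) = @_;
    my @rows;
    for my $port (sort keys %$rept) {
        for my $usr (sort keys %{ $rept->{$port} }) {
            for my $status (sort keys %{ $rept->{$port}{$usr} }) {
                push @rows, join ',', $port, $usr, $status,
                                      $rept->{$port}{$usr}{$status};
            }
        }
    }
    return @rows;
}

# Replay the three login attempts from the sample log by hand.
my %rept;
$rept{64400}{$_}{failed}++ for qw(user1/test123 user1/test1234 user1/test123);

print "$_\n" for rept_rows(\%rept);
# prints:
# 64400,user1/test123,failed,2
# 64400,user1/test1234,failed,1
```

Returning the rows keeps the traversal separate from the output, so the same data can go to a file, a different separator, or a further filter.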