Mahendra prabhu - 1 month ago
Groovy Question

How to merge two rows of data into a single row in a Groovy script/NiFi?

My data is unstructured: in some rows the last column is split across two lines, like below.

UID|Name|ID|Mail
1|Ester|991|sd
gmail
2|Siva|992|siva
hotmail
3|Hari|993|hi gmail


Some rows are already complete, but others are split across two lines. I want to merge those two-line rows into a single line, like below.

UID|Name|ID|Mail
1|Ester|991|sd gmail
2|Siva|992|siva hotmail
3|Hari|993|hi gmail


I don't know which NiFi processors would help with this conversion.

I have tried the following Groovy script to read the lines, but I can't find a way to combine the split rows into a single row.

import org.apache.nifi.processor.io.StreamCallback

def flowfile = session.get()
if (!flowfile) return
flowfile = session.write(flowfile, { rawIn, rawOut ->
    // wrap the raw streams in a UTF-8 reader and writer
    rawIn.withReader("UTF-8") { reader ->
        rawOut.withWriter("UTF-8") { writer ->
            reader.eachLine { line, lineNum ->
                if (!line.isEmpty()) {
                    // TODO: transform each line here (e.g. with a regular expression);
                    // for now every non-empty line is copied unchanged
                    writer << line << '\n'
                }
            }
        }
    }
} as StreamCallback)
session.transfer(flowfile, REL_SUCCESS)


Can anyone suggest how to convert my data into the required format?

Answer Source

I assume that the first line (the header) never contains a line break, so it gives the expected number of delimiters per row.

Each following line is then checked against that delimiter count to decide whether it starts a new row or continues the previous one.

Note that this algorithm only works if the line break occurs in the last column.

Code snippet:

def reader = new StringReader('''UID|Name|ID|Mail
1|Ester|991|sd
gmail
2|Siva|992|siva
hotmail
3|Hari|993|hi gmail''')

def writer = new StringWriter()

def delimCount = 0
reader.eachWithIndex{line,id->
    if(id==0){
        //let's count delims in header
        delimCount = line.count('|')
        //write header as is
        writer << line
    }else{
        if( line.count('|')==delimCount ){
            writer << '\n' //write new line
        }else{
            writer << ' ' //write space to continue previous line
        }
        writer << line
    }
}

println writer.toString()

Result:

UID|Name|ID|Mail
1|Ester|991|sd gmail
2|Siva|992|siva hotmail
3|Hari|993|hi gmail
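
To connect this back to the NiFi script in the question, the same delimiter-count check can be moved into a small helper that works on any Reader/Writer pair. This is only a sketch: the name mergeSplitRows is hypothetical (it is not part of NiFi or Groovy), and it assumes, as above, that the header has no line break and that breaks only occur in the last column. It also skips blank lines, as the question's script does.

```groovy
// Hypothetical helper: streams lines from reader to writer and joins
// continuation lines (wrong delimiter count) onto the previous row.
def mergeSplitRows(Reader reader, Writer writer) {
    int delimCount = 0
    boolean first = true
    reader.eachLine { String line ->
        if (line.trim().isEmpty()) return        // skip blank lines
        if (first) {
            delimCount = line.count('|')         // header defines the expected count
            writer << line
            first = false
        } else if (line.count('|') == delimCount) {
            writer << '\n' << line               // complete row: start a new line
        } else {
            writer << ' ' << line                // continuation: append to previous row
        }
    }
}

// Stand-alone check with the sample data from the question
def input = '''UID|Name|ID|Mail
1|Ester|991|sd
gmail
2|Siva|992|siva
hotmail
3|Hari|993|hi gmail
'''
def out = new StringWriter()
mergeSplitRows(new StringReader(input), out)
println out

// Inside ExecuteScript the helper could replace the eachLine loop, roughly:
// flowfile = session.write(flowfile, { rawIn, rawOut ->
//     rawIn.withReader('UTF-8') { r ->
//         rawOut.withWriter('UTF-8') { w -> mergeSplitRows(r, w) }
//     }
// } as StreamCallback)
```

Because the helper writes the line separator before each new row rather than after it, a continuation line can still be appended to the row that was written last. The output therefore has no trailing newline.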