AishwaryaKulkarni - 2 months ago
Bash Question

Sorting alphanumeric and numeric columns together

I have a file in which I need to sort the rows so that the E and I indexes alternate in positional order, as in the example below. The real file contains many such IDs, not just the one shown here:

**chr3 148813677 148815677 ENSG00000071794:I1 -**
chr3 148804104 148804291 ENSG00000071794:E1 -
chr3 148804291 148804292 ENSG00000071794:E1 -
chr3 148804292 148804309 ENSG00000071794:E1 -
chr3 148804309 148804317 ENSG00000071794:E1 -
chr3 148804317 148804341 ENSG00000071794:E1 -
chr3 148802469 148802676 ENSG00000071794:E2 -
chr3 148801419 148801522 ENSG00000071794:E3 -
chr3 148793668 148793834 ENSG00000071794:E4 -
chr3 148792002 148792135 ENSG00000071794:E5 -
chr3 148791012 148791109 ENSG00000071794:E6 -
chr3 148789370 148789444 ENSG00000071794:E7 -
chr3 148802677 148804103 ENSG00000071794:I1 -
chr3 148801523 148802468 ENSG00000071794:I2 -
chr3 148793835 148801418 ENSG00000071794:I3 -
chr3 148792136 148793667 ENSG00000071794:I4 -
chr3 148791110 148792001 ENSG00000071794:I5 -
chr3 148789445 148791011 ENSG00000071794:I6 -
chr3 148789231 148789369 ENSG00000071794:I7 -


becomes

chr3 148789231 148789369 ENSG00000071794:I7 -
chr3 148789370 148789444 ENSG00000071794:E7 -
chr3 148789445 148791011 ENSG00000071794:I6 -
chr3 148791012 148791109 ENSG00000071794:E6 -
chr3 148791110 148792001 ENSG00000071794:I5 -
chr3 148792002 148792135 ENSG00000071794:E5 -
chr3 148792136 148793667 ENSG00000071794:I4 -
chr3 148793668 148793834 ENSG00000071794:E4 -
chr3 148793835 148801418 ENSG00000071794:I3 -
chr3 148801419 148801522 ENSG00000071794:E3 -
chr3 148801523 148802468 ENSG00000071794:I2 -
chr3 148802469 148802676 ENSG00000071794:E2 -
chr3 148802677 148804103 ENSG00000071794:I1 -
chr3 148804104 148804291 ENSG00000071794:E1 -
chr3 148804291 148804292 ENSG00000071794:E1 -
chr3 148804292 148804309 ENSG00000071794:E1 -
chr3 148804309 148804317 ENSG00000071794:E1 -
chr3 148804317 148804341 ENSG00000071794:E1 -


That is, the rows should be sorted by the positions in columns 2 and 3, and any row with a duplicate ID (in this case the first row, shown in bold) whose positions fall outside the consecutive order should be removed. In short, I want to order my rows by the IDs, especially the part after ':', and by the positions in the 2nd and 3rd columns.

Answer
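Sort numerically on the start and end positions in columns 2 and 3. A key written as -k 4,3 ends before it starts, so sort most likely treats it as empty and falls back to comparing whole lines, which gives the right order here only by accident. The output below assumes the bold out-of-range row has already been removed from the input:

$ sort -k2,2n -k3,3n my_file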

chr3    148789231   148789369   ENSG00000071794:I7  -
chr3    148789370   148789444   ENSG00000071794:E7  -
chr3    148789445   148791011   ENSG00000071794:I6  -
chr3    148791012   148791109   ENSG00000071794:E6  -
chr3    148791110   148792001   ENSG00000071794:I5  -
chr3    148792002   148792135   ENSG00000071794:E5  -
chr3    148792136   148793667   ENSG00000071794:I4  -
chr3    148793668   148793834   ENSG00000071794:E4  -
chr3    148793835   148801418   ENSG00000071794:I3  -
chr3    148801419   148801522   ENSG00000071794:E3  -
chr3    148801523   148802468   ENSG00000071794:I2  -
chr3    148802469   148802676   ENSG00000071794:E2  -
chr3    148802677   148804103   ENSG00000071794:I1  -
chr3    148804104   148804291   ENSG00000071794:E1  -
chr3    148804291   148804292   ENSG00000071794:E1  -
chr3    148804292   148804309   ENSG00000071794:E1  -
chr3    148804309   148804317   ENSG00000071794:E1  -
chr3    148804317   148804341   ENSG00000071794:E1  -
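
The sort alone does not drop the out-of-order duplicate (the bold I1 row in the question). A minimal sketch of one way to do both steps follows. It assumes that, after sorting by position, a duplicate is "out of order" when its ID in column 4 reappears after a different ID has intervened. The function name sort_regions is hypothetical, and this rule is an interpretation of the question, not something the original answer states.

```shell
#!/usr/bin/env bash
# Sketch: sort rows by start, then end coordinate, and drop any row whose ID
# (column 4) reappears after a different ID has intervened.
# sort_regions is a hypothetical helper name, not part of the original answer.
sort_regions() {
    # LC_ALL=C keeps the ordering byte-wise and independent of the locale.
    LC_ALL=C sort -k2,2n -k3,3n "$1" |
    awk '
        # ID seen earlier but not on the previous kept row: out-of-order duplicate
        ($4 in seen) && $4 != prev { next }
        # Otherwise remember the ID and print the row
        { seen[$4] = 1; prev = $4; print }
    '
}
```

Calling sort_regions my_file prints the cleaned rows to standard output. Consecutive rows that share an ID, such as the five E1 rows, are kept because the ID matches the previous kept row. If the file mixes several chromosomes, adding -k1,1 as the first sort key would probably be needed, but that case is not covered by the example in the question.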