Jaffer Wilson - 26 days ago
Linux Question

How to extract domains from a text file using Ubuntu commands?

I have a file of URLs in the format shown below (the host name is stored with its labels reversed, e.g. com.blentwell.www for www.blentwell.com):

com.blendtuts/S
°=
com.blengineering.www/:http
±=
com.blenheimgang.www/le-porsche-museum-en-details/porsche-museum-3
²=
com.blenheimsi
³=
com.blenkov.www/page/media/18/34/376
´=
com.blentwell.www/bookmarks.php/jackroldan/sp
¸=
com.blentwell.www/tags.php/I


The file is very large: around 250 GB.

I am trying to reverse the words on each line and extract only the domains, using Ubuntu terminal commands. Here is what I have tried so far:

First, I removed everything after the first "/" using the following command:

~$ ex -sc '%s/\(\/\).*/\1/ | x' newfile.txt > ddm.txt


The result was:

com.blendtuts/
°=
com.blengineering.www/
±=
com.blenheimgang.www/
²=
com.blenheimsi
³=
com.blenkov.www/
´=
com.blentwell.www/
¸=
com.blentwell.www/
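Because the file is around 250 GB, a tool that streams the input line by line may be more practical than ex, which reads the whole file into an editing buffer and, through the `x` command, writes the edited text back into newfile.txt itself. Below is a minimal sketch of the same step with sed, assuming GNU sed; `strip_path` is a hypothetical helper name. Unlike the ex command, it also removes the "/" itself. The `LC_ALL=C` prefix is an added assumption: it makes `.*` match raw bytes such as "°" even if they are not valid UTF-8 in the input.

```shell
# strip_path (hypothetical name): delete everything from the first "/" to the
# end of the line, including the "/" itself. sed reads one line at a time,
# so the whole file never has to fit in memory.
# LC_ALL=C (assumption) treats the input as raw bytes, so non-UTF-8 bytes
# after the "/" are still matched by ".*".
strip_path() {
    LC_ALL=C sed 's|/.*||'
}

# Example usage (file names taken from the question):
#   strip_path < newfile.txt > ddm.txt
```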


Next, I reversed the words on each line using the solution from: How to reverse the word in Ubuntu?

This gave the following result:

/blendtuts.com
°= /www.blengineering.com
±= /www.blenheimgang.com
²= blenheimsi.com
³= /www.blenkov.com
µ= /www.blentwell.com
¶= /www.blentwell.com
•= /www.blentwell.com

/www.blentwell.com
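The linked "reverse the word" solution is not reproduced here. As a sketch, assuming the path has already been removed completely (including the "/"), the awk function below reverses only the dot-separated labels on each line, so no stray slash ends up in the output. `reverse_labels` is a hypothetical name, and a POSIX awk is assumed.

```shell
# reverse_labels (hypothetical name): print the "."-separated labels of each
# line in reverse order, e.g. com.blentwell.www -> www.blentwell.com.
# -F. uses a literal dot as the field separator.
reverse_labels() {
    LC_ALL=C awk -F. '{
        out = $NF                               # start with the last label
        for (i = NF - 1; i >= 1; i--) out = out "." $i
        print out
    }'
}

# Example usage (domains.txt is a placeholder name):
#   reverse_labels < ddm.txt > domains.txt
```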


But the problem is still not solved. I would like to know how to extract the domains and write them to another file using Ubuntu. As the output above shows, what I have is still not a clean domain: it still has a slash (/) in front of it.

If there is a solution using another operating system, please let me know, although I would prefer to stay with Ubuntu.

I would like to extract the domains from the file and write them, properly formatted, to a separate file.

Ideally the output would contain only unique domains. Otherwise, I remove duplicates with this command:

$ sort filename.txt | uniq > save_to_file.txt
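In the C locale, `sort -u` should give the same output as `sort | uniq` in a single command, and byte-wise comparison is usually faster than locale-aware collation on a file this size. A minimal sketch, assuming GNU sort; `unique_sorted` is a hypothetical helper name. The `-S` (buffer size) and `-T` (temporary directory) options in the usage comment are GNU sort options, but the values shown are placeholders, not tuned settings.

```shell
# unique_sorted (hypothetical name): sort the input and drop duplicate lines
# in one step. LC_ALL=C compares raw bytes instead of using locale collation.
unique_sorted() {
    LC_ALL=C sort -u
}

# Example usage for a very large file (4G and /path/with/space are placeholders):
#   LC_ALL=C sort -u -S 4G -T /path/with/space filename.txt > save_to_file.txt
```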


I hope someone can suggest a solution.

Here is a sample file: Sample File

Answer

I got this answer:

$ perl -F/ -anle 'print reverse(split("([^.]*)", $F[0])) if /\./' file_name.txt

For details, see: http://askubuntu.com/questions/847307/how-to-do-this-in-a-single-command-on-ubuntu-16-04
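How the Perl one-liner works: `-F/` with `-a` splits each line on "/" into `@F`, so `$F[0]` is the host part; `if /\./` skips lines that contain no dot, such as the "°=" lines; and `split` with a capturing pattern returns the labels together with the dots between them, so `reverse` prints the labels in the opposite order. The sketch below is an awk rendering of the same steps, not the answer's own code; `extract_domains` is a hypothetical name and a POSIX awk is assumed.

```shell
# extract_domains (hypothetical name): awk version of the Perl answer.
#   1. skip lines that contain no "." (like `if /\./` in the Perl version),
#   2. keep only the text before the first "/" (like $F[0]),
#   3. print its "."-separated labels in reverse order.
extract_domains() {
    LC_ALL=C awk '/\./ {
        host = $0
        sub(/\/.*/, "", host)                   # drop "/" and everything after it
        n = split(host, label, ".")             # a single-char separator is literal
        out = label[n]
        for (i = n - 1; i >= 1; i--) out = out "." label[i]
        print out
    }'
}

# Example usage: extract, then deduplicate, in one streaming pipeline:
#   extract_domains < file_name.txt | LC_ALL=C sort -u > save_to_file.txt
```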