I have a file of URLs in the format shown below:
The file is very large, around 250 GB in size.
I am trying to reverse the words in the file and extract only the domains from the text, using terminal commands on Ubuntu.
Let me tell you what I have tried:
First, I removed everything after the “/” using the following command:
~$ ex -sc '%s/\(\/\).*/\1/|%p|q!' newfile.txt > ddm.txt
And the result was:
Then I reversed the complete text in the file using the solution from: How to reverse the word in Ubuntu?
And got the following result:
But the problem is still not solved. I would like to know how to extract the domains and put them into another file using Ubuntu. As you can see above, what I have is still not the domain by itself; it has a trailing slash attached.
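Also, since the file is around 250 GB, I suspect ex (which, as far as I know, loads the whole file into memory) is not the right tool for this. I assume a streaming sed command like the following would strip the slash and everything after it in a single line-by-line pass, though I have not tested it on the full file:

$ sed 's#/.*##' newfile.txt > ddm.txt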
If there is a solution to this problem on another operating system, do let me know, though I would prefer to stay with Ubuntu.
In short, I would like to extract the domains from the file and write them to another file in a proper format.
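If the URLs in the file carry a scheme such as http:// (an assumption on my part), the domain would be the third /-separated field, so something like this awk command might produce the proper format in one pass:

$ awk -F/ '{print $3}' newfile.txt > domains.txt

With -F/ each line is split on /, so for http://example.com/page the third field is example.com.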
If I can get only the unique domains, that would be an excellent solution to my query. Otherwise, I am deduplicating with this command:
$ sort filename.txt | uniq > save_to_file.txt
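That said, I am not sure plain sort | uniq will cope with 250 GB. As far as I know, GNU sort spills to temporary files and merges them, so it can handle files larger than RAM, and -u removes duplicates without a separate uniq step. Something like this is what I had in mind (/mnt/bigdisk/tmp is just a placeholder for a disk with enough free space):

$ sort -u -T /mnt/bigdisk/tmp filename.txt > save_to_file.txt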
I hope someone can suggest a solution.
A sample of the file is available here: Sample File