DomainsFeatured - 2 months ago
Linux Question

How To Prepend String Only If Match Found Between Two Files

I'm working on a couple of files that contain URLs. I have tried using sed, cut, and grep, but I'm really unsure how to approach this. If you could just get me moving in the right direction, I would really appreciate it.

File 1:

https://example1.com
http://example2.com


File 2:

example1.com/example1-is-https-domain/
example1.com/need-https-in-front/
example1.com/match-me-to-https/
example1.com/example-https-not-http/
example2.com/im-an-http-domain/
example2.com/must-match-to-example2/
example2.com/path-of-http/
example2.com/http-domain-not-https/
example3.com/this-should-not-match/
example3.com/this-page-is-not-required/


Desired output:

https://example1.com/example1-is-https-domain/
https://example1.com/need-https-in-front/
https://example1.com/match-me-to-https/
https://example1.com/example-https-not-http/
http://example2.com/im-an-http-domain/
http://example2.com/must-match-to-example2/
http://example2.com/path-of-http/
http://example2.com/http-domain-not-https/


My approach:

I'm thinking I could use grep to match the part after '//' and then use another command to paste together what is found. This is where I'm struggling a bit. Any help is very much appreciated.

Summary:

I'm really trying to prepend the correct http or https to the matching domain between File 1 and 2.

Answer

Let's see:

awk 'BEGIN{OFS=FS="/"} NR==FNR{k[$3]=$0; next} $1 in k{$1=k[$1]; print}' file1 file2

I think it does the job, but I don't have awk at hand right now to test it.

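It builds a dictionary from the first file (the NR==FNR condition is true only while awk is reading the first file), keyed by domain with the full URL as the value. For each line of the second file, it looks up the domain in that dictionary; if the domain exists, it replaces the domain with the full record from file 1 and prints the line. Lines whose domain is not in file 1 are skipped.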
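
For reference, here is the same command spread over several lines with comments, run against a few sample lines taken from the question. Treat it as a sketch: the temporary file names and the prepend_scheme wrapper function are placeholders I made up, and it assumes every line in file 1 has the form scheme://domain with no trailing path.

```shell
#!/bin/sh
# Sample inputs copied from the question; the file names are assumptions.
dir=$(mktemp -d)
printf '%s\n' 'https://example1.com' 'http://example2.com' > "$dir/file1.txt"
printf '%s\n' \
  'example1.com/need-https-in-front/' \
  'example2.com/path-of-http/' \
  'example3.com/this-should-not-match/' > "$dir/file2.txt"

# prepend_scheme FILE1 FILE2 (hypothetical helper name)
prepend_scheme() {
  awk '
    BEGIN { FS = OFS = "/" }         # split and rejoin fields on "/"
    NR == FNR { k[$3] = $0; next }   # file 1: field 3 is the domain, store the full URL
    $1 in k   { $1 = k[$1]; print }  # file 2: swap the bare domain for the full URL
  ' "$1" "$2"
}

result=$(prepend_scheme "$dir/file1.txt" "$dir/file2.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

Splitting "https://example1.com" on "/" gives "https:", an empty field, and "example1.com", which is why the domain is in field 3 for file 1 but in field 1 for file 2. Assigning to $1 makes awk rebuild the line using OFS, so the trailing slash and path survive.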