w0rthyw0rks - 3 months ago
AppleScript Question

Extract a string between two strings from a text document using AppleScript

I am very new to writing code. I've been looking at every way I can find of locating a string in a text document and then returning part of the string on the following line. The end goal is to put this extracted string into an Excel file, but I'm nowhere near that step yet. I've been playing around with a lot of different options and I cannot for the life of me get it to work. I feel like I'm close, and it's killing me because I just can't figure out where I'm going wrong.

Goal: to extract the name of the person who posted the job from the text below without knowing the person's name. I know the string "Job posted by" will immediately precede the name I'm looking for, and I know " · " will immediately follow the name. Neither of these surrounding strings appears anywhere else in the text document.

I'm running OS X El Capitan.
The file name for this example is ExtractedTextOutput.txt
The file location for this example is "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt"


My attempt so far is the following. My issue is that it appears to return the entire text document instead of just the name I'm looking for.

set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
set theFileContents to read theFile

set output to {}
set od to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"
"}

set all_lines to every text item of theFileContents
repeat with the_line in all_lines
    if "Job posted by" is not in the_line then
        set output to output & the_line
    else
        set AppleScript's text item delimiters to {"Job posted by"}
        set latter_part to last text item of the_line
        set AppleScript's text item delimiters to {" "}
        set last_word to last text item of latter_part
        set output to output & ("$ " & last_word as string)
    end if
end repeat

set AppleScript's text item delimiters to {"
"}

set output to output as string
set AppleScript's text item delimiters to od
return output


Any and all help and ideas are enormously appreciated.

Sample text in the file:
9/2/2016 Application Security Engineer Job at Datadog in Greater New York City Area | LinkedIn
60
Home Profile
Job description
My Network Jobs
 Search for people, jobs, companies, and more... Interests
 Advanced
 
Business Services

Go to Lynda.c
Application Security Engineer
Datadog
Greater New York City Area
Posted 15 days ago 93 views
1 alum works here
Apply on company website
We’re on a mission to bring sanity to cloud operations and we need you to build resilient and secure applications on our platform. What you will do
Perform code and design reviews, contribute code that improves security throughout Datadog's products Educate your fellow engineers about security in code and infrastructure
Monitor production applications for anomalous activity
Prioritize and track application security issues across the company
Help improve our security policies and processes
Job posted by
Ryan Elberg · 2nd
Head of Tech Talent Acquisition at Datadog Greater New York City Area
Send Inmail

Answer

I had some difficulty determining exactly what your second separator is. Your text example shows '·', but when I checked what comes just after 'Elberg' and before '2nd...', I found 4 characters: code 32 (space), code 194 (¬), code 183 (∑), code 32 (space).

In the script below, I have used code 194. It works when I copy/paste your text example into a file. Here is the script:

set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
-- your separator seems to be code 32 (space), code 194 (¬), code 183 (∑), code 32 (space)
set Separator to ASCII character 194 -- is it correct?

set theFileContents to read theFile
set myAuthor to ""
set savedDelimiters to AppleScript's text item delimiters -- remember the current delimiters
set AppleScript's text item delimiters to {"Job posted by "}
if (count of text items of theFileContents) is 2 then
    set Part2 to text item 2 of theFileContents -- this part starts just after "Job posted by "
    set AppleScript's text item delimiters to {Separator}
    set myAuthor to text item 1 of Part2
end if
set AppleScript's text item delimiters to savedDelimiters -- restore them for the rest of the script

log "result=//" & myAuthor & "//" -- show the result in variable myAuthor

Note: if the text does not contain "Job posted by ", then myAuthor is an empty string ("").
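
A possible variation, offered as a sketch rather than a definitive fix: bytes 194 and 183 together are the UTF-8 encoding of '·', so the file is probably saved as UTF-8 and is being decoded in a legacy encoding when read with a plain read. Assuming that is the case, reading the file with as «class utf8» lets the script use " · " directly as the second separator. The handler names trimLeading, authorFromText and authorFromFile are hypothetical, and the sample text is a shortened copy of the lines from the question. The first marker is "Job posted by" without a trailing space, because in the sample the name sits on the following line; the leading line break is then trimmed.

```applescript
-- Hypothetical helper: strip leading spaces, tabs, and line breaks from t.
on trimLeading(t)
	repeat while (t is not "") and ((character 1 of t) is in {space, tab, return, linefeed})
		if (length of t) is 1 then return ""
		set t to text 2 thru -1 of t
	end repeat
	return t
end trimLeading

-- Hypothetical helper: return the text between "Job posted by" and " · ",
-- or "" if either marker is missing.
on authorFromText(sourceText)
	set savedDelimiters to AppleScript's text item delimiters
	set foundText to ""
	set AppleScript's text item delimiters to {"Job posted by"}
	if (count of text items of sourceText) > 1 then
		set afterStart to text item 2 of sourceText -- everything after the first marker
		set AppleScript's text item delimiters to {" · "}
		if (count of text items of afterStart) > 1 then
			set foundText to text item 1 of afterStart -- everything before the second marker
		end if
	end if
	set AppleScript's text item delimiters to savedDelimiters
	return trimLeading(foundText)
end authorFromText

-- Hypothetical helper: read a file as UTF-8 (an assumption about how it was saved)
-- and extract the name from its contents.
on authorFromFile(posixPath)
	set fileText to read (POSIX file posixPath) as «class utf8»
	return authorFromText(fileText)
end authorFromFile

-- Demonstration on an inline sample, so the sketch runs without the file.
set sampleText to "Help improve our security policies and processes" & linefeed & "Job posted by" & linefeed & "Ryan Elberg · 2nd" & linefeed & "Head of Tech Talent Acquisition at Datadog"
set myAuthor to authorFromText(sampleText)
log "result=//" & myAuthor & "//"
```

With the inline sample, myAuthor should be "Ryan Elberg". To run it against the real file, call authorFromFile("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt") instead. If the file turns out not to be UTF-8, the read may fail or the " · " marker may not match, in which case authorFromText returns "".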