S.Khalil - 4 months ago
AppleScript Question

Manage looping on txt file with AppleScript

I have a text file that looks like the screenshot below:
http://i.stack.imgur.com/AqKzS.png

Each item has this format:

ID<>Text

~~

ID<>Text

~~

I want to read the ID as an integer and the Text as a string, both to be used later.

I have tried looping over the file many times using the delimiters "<>" and "~~". However, each attempt fails with a different script error.


First, I ran into difficulties because the file contains many newlines throughout the "Text".
Also, the text sometimes contains an English paragraph followed by an Arabic paragraph, as shown in the screenshot.

The ID, as highlighted, should be {9031}
and the Text should be
{N/M06"El Patio.......

......

....

....

....

Arabic Text.....}


Can someone help me with the correct script to loop over this text file and fetch each ID followed by its text to be used in a DataEntry process?

Answer

For this purpose I recommend installing the Satimage osax 3.7.0 scripting addition.

Its benefit is that it can find text with regular expressions.

You can then easily filter the text with its find text command:

set theText to read file "HD:Path:to:text.txt" as «class utf8» -- replace the HFS path with the actual path
set theResult to {}
set matches to find text "\\d{1,4}<>.*" in theText with regexp and all occurrences
repeat with aMatch in matches
    tell aMatch's matchResult
        set end of theResult to {text 1 thru 4, text 7 thru -1}
    end tell
end repeat

find text returns a record:

matchLen: length of the match
matchPos: offset of the match (0 is the first character!)
matchResult: the matching string (possibly formatted according to the "using" parameter)

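The script leaves its result in the variable theResult: a list of lists, each containing the ID and the text. The text starts right after the <>, but you might want to cut more characters. Note that text 1 thru 4 assumes a four-digit ID, and that the ID is still a string at this point.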

Edit:

It seems that this regex can't parse the text (or my regex knowledge is too limited). A likely cause is that .* stops at the first line break, while the Text spans several lines.

This is a pure AppleScript version without the Scripting Addition.

set theText to read file ((path to desktop as text) & "description.txt") as «class utf8» -- replace the HFS path with the actual path
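-- split the file into records at each "~~" line, saving the old delimiters in TID
set {TID, text item delimiters} to {text item delimiters, ("~~" & linefeed)}
set theMatches to text items of theText
set text item delimiters to TID -- restore the previous delimiters
set theResult to {}
repeat with aMatch in theMatches
    if length of aMatch > 1 then -- skip empty chunks, such as the one after the last "~~"
        tell aMatch
            -- assumes a four-digit ID: characters 1-4 are the ID, 5-6 are "<>", the text starts at 7
            set end of theResult to {text 1 thru 4, text 7 thru -1}
        end tell
    end if
end repeat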
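
The question asks for the ID as an integer, and the script above relies on every ID being exactly four digits long. As a rough sketch (not tested against the original file), the same loop can split each record at the first "<>" and coerce the ID to an integer. It assumes, like the script above, that records are separated by "~~" followed by a linefeed and that the file is description.txt on the desktop. The names theRecords, recordText, splitPos, idText and bodyText are made up for this illustration.

```applescript
-- Sketch: split each record at the first "<>" instead of at fixed character positions.
-- Assumption: same file location and record separator as in the script above.
set theText to read file ((path to desktop as text) & "description.txt") as «class utf8»
set {TID, text item delimiters} to {text item delimiters, ("~~" & linefeed)}
set theRecords to text items of theText
set text item delimiters to TID

set theResult to {}
repeat with aRecord in theRecords
    set recordText to contents of aRecord
    set splitPos to offset of "<>" in recordText -- 0 when the chunk has no "<>"
    -- skip chunks that have no "<>", no ID before it, or nothing after it
    if (splitPos > 1) and ((splitPos + 1) < (length of recordText)) then
        set idText to text 1 thru (splitPos - 1) of recordText
        set bodyText to text (splitPos + 2) thru -1 of recordText
        try
            -- the coercion raises an error if idText is not a number; such chunks are skipped
            set end of theResult to {idText as integer, bodyText}
        end try
    end if
end repeat
```

Each item of theResult is then a two-item list such as {9031, "N/M06..."}, so item 1 can be used as a number and item 2 as the text. The text keeps whatever line breaks precede the "~~" separator, so it may need trimming before the data entry step.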