I'm working on a project that involves reading source code files and looking for certain tokens.
(In my case, I'm looking to determine if an Objective-C class implements a protocol.) The problem is that, while I can just scan lines for the keyword, it could show up in a comment or string.
What's the correct way of handling this? Do I need to tokenize the entire file and lex it? Is there an easier way?
If I understand correctly, your problem is that when you scan the file line by line, you may match the keyword even though it is part of a comment. For example:
/* keyword */
In this case you can set a flag to True when you encounter the start of a comment (/*) and set it back to False when you find the end of the comment (*/). If you find the keyword while the flag is False, you know that the keyword appears in the code and not inside a block comment. In a similar way, you can check whether // appears on the same line and ignore everything after it:
if '//' in currentLine:
    currentLine = currentLine.split('//', 1)[0]  # keep only the text before the line comment
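The flag idea above can be extended to string literals as well, since the question mentions those too. Here is a rough sketch in Python, not a full lexer: it walks the text character by character, tracks whether it is in code, a // comment, a /* */ comment, or a quoted literal, and keeps only the code. The function names strip_comments_and_strings and keyword_in_code are made up for this example, and it assumes C-style comments and backslash escapes; it does not handle preprocessor tricks such as a backslash continuing a // comment onto the next line.

```python
import re


def strip_comments_and_strings(source):
    """Return source with comments and quoted literals removed (newlines kept)."""
    out = []
    state = "code"  # one of: code, line_comment, block_comment, string, char
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line_comment"
                i += 2
            elif ch == "/" and nxt == "*":
                state = "block_comment"
                i += 2
            elif ch == '"':
                state = "string"
                i += 1
            elif ch == "'":
                state = "char"
                i += 1
            else:
                out.append(ch)
                i += 1
        elif state == "line_comment":
            if ch == "\n":  # a line comment ends at the newline
                out.append(ch)
                state = "code"
            i += 1
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                out.append(" ")  # keep tokens on either side separated
                state = "code"
                i += 2
            else:
                if ch == "\n":
                    out.append(ch)  # preserve line structure
                i += 1
        else:  # inside a string or character literal
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \" or \\
            elif (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                out.append(" ")
                state = "code"
                i += 1
            else:
                i += 1
    return "".join(out)


def keyword_in_code(source, keyword):
    """True if keyword appears as a whole word outside comments and literals."""
    code_only = strip_comments_and_strings(source)
    return re.search(r"\b" + re.escape(keyword) + r"\b", code_only) is not None


if __name__ == "__main__":
    sample = (
        "/* NSCoding */\n"
        "// NSCoding\n"
        'NSString *s = @"NSCoding";\n'
        "@interface Foo : NSObject <NSCopying>\n"
        "@end\n"
    )
    print(keyword_in_code(sample, "NSCoding"))
    print(keyword_in_code(sample, "NSCopying"))
```

For the sample above, NSCoding only shows up in comments and a string, so it should not be reported, while NSCopying is in the @interface line and should be. If you need more than this (macros, nested constructs, real parsing of the @interface declaration), a proper tokenizer or a compiler front end is probably the safer route.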