I can create a multi-line string using this syntax:
string = str("Some chars "
"Some more chars")
Some chars Some more chars
Read the reference manual; it's in there, specifically in the section on string literal concatenation:
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings,
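The quoted rule also lets the pieces use different quote styles and prefixes, and lets each piece carry its own comment. Here is a minimal sketch of both. The names `pattern` and `greeting` are just illustrative:

```python
# Adjacent literals may mix quoting styles and raw/non-raw prefixes,
# and each piece can carry its own comment when wrapped in parentheses.
pattern = (
    "[A-Za-z_]"        # letter or underscore
    r"[A-Za-z0-9_]*"   # letters, digits or underscores (raw string)
)
greeting = "hello" 'world'  # double and single quotes mixed

print(greeting)  # helloworld
print(pattern)   # [A-Za-z_][A-Za-z0-9_]*
```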
This is why:
string = str("Some chars " "Some more chars")
is exactly the same as:
str("Some chars Some more chars").
This happens wherever a string literal can appear: list initializations, function calls (as is the case with str above), and so on.
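A short sketch of the same rule inside a list display and a function call. It also shows a well-known pitfall: a missing comma in a list silently merges two items. The variable names are illustrative only:

```python
# Implicit concatenation works anywhere a string literal can appear.
names = ["Some chars " "Some more chars", "other"]   # inside a list display
length = len("abc" "def")                            # inside a function call

print(len(names))  # 2, not 3: the first two literals merged into one element
print(length)      # 6

# A common pitfall: forgetting a comma silently merges two list items.
colors = ["red", "green" "blue"]
print(colors)      # ['red', 'greenblue']
```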
The only caveat is when the string literals are not enclosed in one of the grouping delimiters ((), [] or {}) but instead span two separate physical lines. In that case, we can use the backslash character to join these lines and get the same result:
string = "Some chars " \
         "Some more chars"
Of course, concatenating strings on the same physical line does not require the backslash (string = "Hello " "World" is just fine).
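The three forms described above produce the same string. Here is a small sketch comparing them. The variable names are hypothetical:

```python
# Outside brackets, a newline ends the statement, so the continuation
# needs an explicit backslash as the very last character on the line.
with_backslash = "Some chars " \
                 "Some more chars"

# Wrapping the literals in parentheses enables implicit line joining instead.
with_parens = ("Some chars "
               "Some more chars")

# On a single physical line neither is required.
same_line = "Some chars " "Some more chars"

print(with_backslash == with_parens == same_line)  # True
```

PEP 8 recommends the parenthesized form over backslash continuations, since a stray space after the backslash breaks the line join.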
Is Python joining these two separate strings or is the editor/compiler treating them as a single string?
Python is. Now, exactly when Python does this is where things get interesting.
From what I could gather (take this with a pinch of salt; I'm not a parsing expert), this happens when Python transforms the parse tree (produced by the LL(1) parser) for a given expression into its corresponding AST (Abstract Syntax Tree).
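One way to confirm that the joining happens at compile time, not at run time, is to compile a snippet and inspect the resulting code object. This is a sketch using the standard `compile` builtin and `dis` module. The exact bytecode printed varies between Python versions, so only the constants are checked:

```python
import dis

source = 's = "Some chars " "Some more chars"'
code = compile(source, "<demo>", "exec")

# The code object holds one joined constant, not two separate pieces.
print("Some chars Some more chars" in code.co_consts)  # True
print("Some chars " in code.co_consts)                  # False

# Optional: the bytecode loads that single constant with one LOAD_CONST.
dis.dis(code)

# Joining only applies to literals: a name next to a literal is a syntax error.
try:
    compile('a = "x"\nb = a "y"', "<demo>", "exec")
    joined_names = True
except SyntaxError:
    joined_names = False
print(joined_names)  # False
```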
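You can get a view of the parse tree via the parser module (note that parser was deprecated in Python 3.9 and removed in 3.10, so this only runs on older versions):
import parser
expr = """ str("Hello " "World") """
pexpr = parser.expr(expr)
parser.st2list(pexpr)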
This dumps a pretty big and confusing list that represents the concrete syntax tree parsed from the expression in expr:
-- rest snipped for brevity --
[322, [323, [3, '"hello"'], [3, '"world"']]]]]]]]]]]]]]]]]],
-- rest snipped for brevity --
As you can see in the excerpt, there are two separate entries corresponding to the two str literals in the parsed expression.
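On current Python versions, where the parser module is no longer available, a rough stand-in is the standard `tokenize` module. Note this is a swapped-in technique: it shows the token stream, not the concrete syntax tree. Still, it makes the same point, namely that at this stage the two literals are still separate:

```python
import io
import tokenize

expr = 'str("Hello " "World")'

# The tokenizer still sees two separate STRING tokens; joining happens later.
tokens = tokenize.generate_tokens(io.StringIO(expr).readline)
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
print(strings)  # ['"Hello "', '"World"']
```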
Next, we can view the AST produced from the same expression (hint: via the ast module):
import ast
p = ast.parse(expr)
ast.dump(p)
# this prints out the following:
# "Module(body=[Expr(value=Call(func=Name(id='str', ctx=Load()), args=[Str(s='hello world')], keywords=[]))])"
The output is more user-friendly in this case; you can see that args for the function call is a single concatenated string.
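On Python 3.8 and later, string literals appear in the AST as `ast.Constant` nodes rather than `ast.Str`, so the dump looks slightly different. This sketch checks the structure directly instead of relying on the exact `ast.dump` text:

```python
import ast

expr = 'str("Hello " "World")'
tree = ast.parse(expr, mode="eval")

call = tree.body                      # the Call node for str(...)
print(len(call.args))                 # 1
print(type(call.args[0]).__name__)    # Constant (Python 3.8+)
print(repr(call.args[0].value))       # 'Hello World'
print(ast.dump(tree))
```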
In addition, I stumbled upon a handy module that generates a nicer tree view for ast nodes. Using it, the AST of the expression in expr is visualized like this:
As you can see, the terminal leaf node holds a single str object: the joined string for the two literals.
If you're feeling brave enough, dig into the source: the code for transforming expressions into a parse tree is located at
Parser/pgen.c while the code transforming the parse tree into an Abstract Syntax Tree is in
This information is for Python 3.5, and I'm pretty sure that unless you're using a really old version (< 2.5), the functionality and locations should be similar. Newer releases differ more: Python 3.9 replaced the LL(1) parser with a PEG-based one (PEP 617).
Additionally, if you are interested in the whole compilation process Python follows, core developer Brett Cannon gives a good, gentle introduction in the video From Source to Code: How CPython's Compiler Works.