Krazy Glew - 2 months ago
Perl Question

Are there any good uses for multiple Perl fat commas in series ( a => b => 1 )?

BRIEF: Q: are there any good uses for Perl's fat commas in series?

e.g. func_hash_as_array_arg( a=>b=>1 )

DETAIL:

I just got bitten by a bug caused by two fat commas / fat arrows in series:

$ bash $> perl -e 'use strict; use warnings; my @v = ( a=> b => 1 )'



Actually, the bug was in a function call, specifically a constructor for an object (a blessed hash), so I was thinking {} when I wrote new( a=>b=>1 ).

$ bash $> perl -e '
use strict; use warnings;
sub kwargs_func{ print "inside\n"; my %kw = @_ ;};
kwargs_func( a=> b => 1 )
'
inside
Odd number of elements in hash assignment at -e line ##.



Obviously I found the bug fairly quickly, but I would have preferred a compile-time error or warning to a run-time warning.

Q: are there any good uses for fat commas in series?

I am surprised that there was not a 'use warnings' warning for this.
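Perl won't flag this at compile time, but a constructor can at least fail loudly instead of silently building a lopsided hash. A minimal sketch, assuming a hypothetical class `Widget` whose `new` takes keyword arguments; `croak` from the core Carp module reports the error at the caller's line rather than inside `new`:

```perl
use strict;
use warnings;

package Widget;
use Carp qw(croak);

# Hypothetical constructor taking keyword arguments. Croaks (reporting the
# caller's line) instead of building a hash from an odd-length list such as
# (a=>b=>1).
sub new {
    my ( $class, @args ) = @_;
    croak "Widget->new: odd number of keyword arguments (@args)"
        if @args % 2;
    my %kw = @args;
    return bless \%kw, $class;
}

package main;

my $ok  = Widget->new( a => 1, b => 2 );        # fine
my $bad = eval { Widget->new( a => b => 1 ) };  # croaks; eval catches it
print "caught: $@" unless defined $bad;
```

The hashref style (`new({ ... })`) gets a similar check for free, since Perl warns "Odd number of elements in anonymous hash" where the hash is built.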




Here's a contrived example of a semi-legitimate use, one that I can imagine encountering in real life:

I do a lot of graph code.

Imagine entering a constant graph like K3, K4, or K3,3 (I will assume that all arcs are bidirectional).

One might enter such graphs as pairs, like

K3: (a<=>b, a<=>c, b<=>c).


But it might be nice to enter it as

K3: (a<=>b<=>c<=>a).


Less repetition, as one gets to bigger graphs.

E.g. K4 written as pairs is

K4: ( a<=>b, a<=>c, a<=>d, b<=>c, b<=>d, c<=>d )


whereas using these "chains" K4 is:

K4: (a<=>b<=>c<=>d<=>a<=>c,b<=>d)


I have written what we now call DSLs (Domain-Specific Languages) that accept such "chain" notations. Note: I used <=> above deliberately, as a syntax that is not Perl-friendly.

Of course, in Perl one would have to indicate the end of such a chain, probably by undef:

K4: (a=>b=>c=>d=>a=>c=>undef,b=>d=>undef)


although one might elide the last undef, at the cost of quoting the final node (b=>"d"), since => quotes only the word on its left.

I am too lazy to type in K3,3, so let me enter K3,2:

DSL pairs K3,2: (a<=>x, a<=>y, b<=>x, b<=>y, c<=>x, c<=>y )

DSL chains: K3,2: (y<=>a<=>x<=>b<=>y<=>c<=>x)

Perl pairs K3,2: (a=>'x', a=>'y', b=>'x', b=>'y', c=>'x', c=>'y' )

Perl chains: K3,2: (y=>a=>x=>b=>y=>c=>x=>undef)
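The chain notation loses no information relative to pairs. A minimal sketch of a hypothetical helper, `chains_to_edges`, that expands undef-terminated chains into the equivalent list of edges; it assumes node names are plain strings and that `undef` only ever marks the end of a chain:

```perl
use strict;
use warnings;

# Hypothetical helper: expand undef-terminated "chains" into edge pairs.
# (a=>b=>c=>undef) yields [a,b], [b,c]; an undef starts a new chain.
sub chains_to_edges {
    my @edges;
    my $prev;                              # previous node in the current chain
    for my $node (@_) {
        if ( !defined $node ) {            # end of chain
            undef $prev;
            next;
        }
        push @edges, [ $prev, $node ] if defined $prev;
        $prev = $node;
    }
    return @edges;
}

my @k4 = chains_to_edges( a=>b=>c=>d=>a=>c=>undef, b=>d=>undef );
print join( ' ', map { join '-', @$_ } @k4 ), "\n";
# prints: a-b b-c c-d d-a a-c b-d
```

For K3, `chains_to_edges(a=>b=>c=>a=>undef)` yields the three edges a-b, b-c, c-a.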





I like functions with keyword arguments. In Perl there are two main ways to do this:

func_hash_as_array_arg( kwarg1=>kwval1, kwarg2=>kwval2 )
func_hashref_as_scalar_arg( { kwarg1=>kwval1, kwarg2=>kwval2 } )


which can be mixed with positional in a reasonably nice way

func( posarg1, posarg2, kwarg1=>kwval1, kwarg2=>kwval2 )
func( posarg1, posarg2, { kwarg1=>kwval1, kwarg2=>kwval2 } )


and also in less nice ways

func( { kwarg1=>kwval1, kwarg2=>kwval2 }, varargs1, varargs2, ... )


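Although I prefer f(k1=>v1) to f({k1=>v1}) (less clutter), the fact that the hashref "keyword argument group" is checked earlier is interesting: the "Odd number of elements in anonymous hash" warning fires at the call site, where the hash is built, rather than inside the function. I may flip.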
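Accepting both styles in one function is mostly a matter of looking at what follows the positional arguments. A minimal sketch, assuming a hypothetical `func` with exactly two positional arguments; the keyword names are illustrative:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical function: two positional args, then keyword args given either
# as a flat list (k1=>v1, ...) or as a single hashref ({k1=>v1, ...}).
sub func {
    my ( $pos1, $pos2, @rest ) = @_;
    my %kw;
    if ( @rest == 1 && ref $rest[0] eq 'HASH' ) {
        %kw = %{ $rest[0] };    # hashref style
    }
    elsif ( @rest % 2 == 0 ) {
        %kw = @rest;            # flat-list style
    }
    else {
        croak "func: odd number of keyword arguments";
    }
    return join ',', $pos1, $pos2, map { $_ . '=' . $kw{$_} } sort keys %kw;
}

my $flat    = func( 1, 2, kwarg1 => 'v1', kwarg2 => 'v2' );
my $hashref = func( 1, 2, { kwarg1 => 'v1', kwarg2 => 'v2' } );
print "$flat\n$hashref\n";    # both lines: 1,2,kwarg1=v1,kwarg2=v2
```

The weak spot is the same one that comes up with variadic arguments: a trailing hashref is ambiguous whenever a legitimate positional argument could itself be a hashref.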

Of course, the real problem is that Perl needs a proper syntax for keyword arguments.

Perl6 does it better.




For grins, some related code examples with 2 fat commas in series.

$ bash $> perl -e 'use strict; use warnings; my %v = ( a=> b => 1 )'
Odd number of elements in hash assignment at -e line 1.


$ bash $> perl -e 'use strict; use warnings; my $e = { a=> b => 1 }'
Odd number of elements in anonymous hash at -e line 1.


$ bash $> perl -e 'use strict; use warnings; my $e = [ a=> b => 1 ]'


$ bash $> perl -e '
use strict; use warnings;
sub kwargs_func{ print "inside\n"; my %kw = @_ ;};
kwargs_func( a=> b => 1 )
'
inside
Odd number of elements in hash assignment at -e line ##.


$ bash $> perl -e '
use strict; use warnings;
sub kwargs_func{ print "inside\n"; my %kw = %{$_[0]} ;};
kwargs_func( {a=> b => 1} )
'
Odd number of elements in anonymous hash at -e line ##.
inside


Answer

---+ BRIEF

In addition to notation for graphs and paths (like Traveling Salesman, or critical path), multiple fat arrows/commas in series can be nice syntactic sugar for functions that you might call like

# Writing: creating $node->{a}->{b}->{c} if it does not already exist
assign_to_path($node=>a=>b=>c=>"value"); 

# Reading
my $cvalue = follow_path($node=>a=>b=>c=>"default value");

the latter being similar to

my $cvalue = $node->{a}->{b}->{c} // "default value";

although you can do more stuff in a pointer chasing / hashref path following function than you can with //
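One concrete difference: an rvalue chain such as `$node->{a}->{b}->{c} // "default value"` autovivifies the intermediate hashes `$node->{a}` and `$node->{a}->{b}` as a side effect, even though only `c` is being read. A small core-Perl demonstration (the CPAN `autovivification` pragma can turn this behavior off, but that is outside this sketch):

```perl
use strict;
use warnings;

my $node = {};

# Reading a missing path with // still creates the intermediate levels.
my $cvalue = $node->{a}->{b}->{c} // "default value";

print "cvalue = $cvalue\n";                                              # default value
print "a exists: ",       ( exists $node->{a}             ? 1 : 0 ), "\n";   # 1
print "a->b exists: ",    ( exists $node->{a}->{b}        ? 1 : 0 ), "\n";   # 1
print "a->b->c exists: ", ( exists $node->{a}->{b}->{c}   ? 1 : 0 ), "\n";   # 0
```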

It turned out that I already had such functions in my personal library, but I did not know that you could use a=>b=>"value" with them to make them look less ugly where used.

---+ DETAIL

I usually try not to answer my own questions on this forum, preferring to encourage others to answer; but in this case, in addition to the contrived example I posted in and shortly after the original question, I have since realized what I think is a completely legitimate use for multiple fat arrows/commas in series.

I would not complain if multiple fat arrows in series were disallowed, since they are quite often a real bug, but there are at least two places where they are appropriate.

(1) Entering Graphs as Chains

Reminder: my first, totally contrived, use case for multiple fat arrows/commas in series was to make it easier to enter certain graphs by using "chains". E.g. a classic deadlock graph would be, in pairs, { 1=>2, 2=>1 }, and as a "chain", (1=>2=>1). If you want to show a graph that is one big cycle with a "chord" or shortcut, it might look like ([1=>2=>3=>4=>5=>6=>1],[3=>6]).

Note that I used node numbers: if I wanted to use node names, I might have to write (a=>b=>c=>undef) to avoid having to quote the last node in a chain (a=>b=>"c"). This is because => implicitly quotes a bareword on its left-hand side but not on its right-hand side. Since you have to put up with undef to support node names anyway, one might just "flatten" ([1=>2=>3=>4=>5=>6=>1],[3=>6]) to (1=>2=>3=>4=>5=>6=>1=>undef,3=>6=>undef). In the former, the end of each chain is indicated by the end of its array [...]; in the latter, by undef. Using undef puts every node on the left-hand side of a =>, so the notation is syntactically uniform.

I admit that this is contrived - it was just the first thing that came to mind.

(2) Paths as a data type

Slightly less contrived: imagine that you are writing, using, or testing code that is seeking "paths" through a graph - e.g. Hamiltonians, Traveling Salesman, mapping, electronic circuit speed path analysis. For that matter, any critical path analysis, or data flow analysis.

I have worked in 4 of the 6 areas I just listed. Although I have never used Perl fat arrow/commas in such code (usually Perl is too slow for such code when I have been working on such tasks), I can certainly avow that, although it is GOOD ENOUGH to write (a,b,c,d,e) in a computer program, in my own notes I usually draw arrows (a->b->c->d->e). I think that it would be quite pleasant to be able to code it as (a=>b=>c=>d=>e=>undef), even with the ugly undefs. (a=>b=>c=>d=>e=>undef) would be preferable to qw(a b c d e) if I were trying to make the code resemble my thinking.

"Trying to make the code resemble my thinking" is often what I am doing. I want to use the notations common to the problem area. Sometimes I will use a DSL, sometimes write my own, sometimes just write some string or text parsing routines. But if a language like Perl has a syntax that looks almost familiar, that's less code to write.

By the way, in C++ I often express chains or paths as

Path p = Path().start("a").link_to("b").link_to("c").end("d");

This is unfortunately verbose, but it is almost self-explanatory.

Of course, such notations are just the programmer API: the actual data structure is usually well hidden, and is seldom the linear linked list that the above implies.

Anyway - if I need to write such "path-manipulating" code in Perl, I may use (a=>b=>c=>undef) as a notation -- particularly when passed to a constructor like Path(a=>b=>c=>undef) which creates the actual data structure.

There might even be some slightly more pleasant ways of dealing with the non-quoting of the fat arrow/comma's right-hand side: e.g. sometimes I might use a code like 0 or -1 to indicate closed loops (cycles) or paths that are not yet complete: Path(a=>b=>c=>0) is a cycle, Path(a=>b=>c=>-1) is not. 0 rather looks like a closed loop. It is unfortunate that this would mean that you could not have numeric nodes. Or one might leverage more Perl syntax: Path(a=>b=>c=>undef), Path(a=>b=>c=>[]), Path(a=>b=>c=>{}).
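A minimal sketch of such a constructor, using the 0 / -1 / undef terminator convention described above. `make_path` and the returned hash layout are hypothetical; a real Path class would hide its representation behind methods:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical constructor: every argument except the last is a node; the
# last is a terminator describing the kind of path:
#   0     : closed loop (cycle)
#   -1    : open path, not yet complete
#   undef : ordinary open path
sub make_path {
    croak "make_path: need at least one node plus a terminator" if @_ < 2;
    my @nodes = @_;
    my $end   = pop @nodes;
    my $kind
        = !defined $end ? 'open'
        : $end eq '0'   ? 'cycle'
        : $end eq '-1'  ? 'incomplete'
        :                 croak "make_path: unknown terminator '$end'";
    return { nodes => \@nodes, kind => $kind };
}

my $p = make_path( a=>b=>c=>0 );
print "$p->{kind}: @{ $p->{nodes} }\n";    # cycle: a b c
```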

All we are doing here is using the syntax of the programming language to create notations that resemble the notation of the problem domain.

(3) Finally, a use case that is more "native Perl"-ish.

Have you ever wanted to access $node->{a}->{b}->{c}, when it is not guaranteed that all of the elements of the path exist?

Sometimes one ends up writing code like

When writing:

$node = {} if not defined $node;
$node->{a} = {}  if not exists $node->{a};
$node->{a}->{b} = {}  if not exists $node->{a}->{b};
$node->{a}->{b}->{c} = 0;
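In Perl these existence checks are not needed for the write itself, because assigning through a chain of `->{...}` autovivifies every missing level, including an undefined `$node`. A minimal sketch:

```perl
use strict;
use warnings;

my $node;    # undef: not even a hashref yet

# One assignment creates $node, $node->{a}, and $node->{a}->{b}.
$node->{a}->{b}->{c} = 0;

my $type = ref $node;
print "$type\n";                     # HASH
print $node->{a}->{b}->{c}, "\n";    # 0
```

Neither version guards against an intermediate value that exists but is not a hashref (e.g. `$node->{a} = 5`); under `use strict`, dereferencing it dies with a "Can't use string ... as a HASH ref" error.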

When reading ... well, you can imagine. Before the introduction of the // operator, I would have been too lazy to enter it. With the // operator, such code might look like:

my $value = $node->{a}->{b}->{c}//"default value if the path is incomplete";

Yeah, yeah... one should never expose that much detail of the data structure. Before writing code like the above, one should refactor to a nice set of object-oriented APIs. Etc.

Nevertheless, when you have to deal with somebody else's Perl code, you may run into the above. Especially if that somebody else was an EE in a hurry, not a CS major.

Anyway: I have long had in my personal Perl library functions that encapsulate the above.

Historically, these have looked like:

assign_to_hash_path( $node, "a", "b", "c", 0 )
# sets $node->{a}->{b}->{c} = 0, creating all nodes as necessary
# can follow or create arbitrarily long chains
# the first argument is the base node,
# the last is the value
# any number of intermediate nodes are allowed.

or, more obviously an assignment:

${hash_path_lhs( $node, "a", "b", "c")} = 0
# IIRC this is how I created a left-hand-side
# by returning a ref that I then dereffed.

and for reading (now usually // for simple cases):

my $cvalue = follow_hash_path_undef_if_cannot( $node, "a", "b", "c" );
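For concreteness, a minimal sketch of what helpers with these names and signatures might look like. This is a hypothetical reimplementation, not the library described above: it assumes every intermediate value is a hashref or missing, and the reader checks `ref` and `exists` before each step so that it never autovivifies anything:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical: assign_to_hash_path($node, @keys, $value) sets
# $node->{k1}->{k2}...->{kN} = $value, creating missing levels as it goes.
sub assign_to_hash_path {
    my ( $node, @rest ) = @_;
    croak "assign_to_hash_path: need a hashref, at least one key, and a value"
        unless ref $node eq 'HASH' && @rest >= 2;
    my $value = pop @rest;
    my $last  = pop @rest;
    for my $key (@rest) {
        $node->{$key} //= {};
        croak "assign_to_hash_path: '$key' does not hold a hashref"
            unless ref $node->{$key} eq 'HASH';
        $node = $node->{$key};
    }
    $node->{$last} = $value;
    return $value;
}

# Hypothetical: returns undef if any step is missing or not a hashref,
# without creating anything along the way.
sub follow_hash_path_undef_if_cannot {
    my ( $node, @keys ) = @_;
    for my $key (@keys) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $node = {};
assign_to_hash_path( $node, "a", "b", "c", 0 );
my $found   = follow_hash_path_undef_if_cannot( $node, "a", "b", "c" );
my $missing = follow_hash_path_undef_if_cannot( $node, "a", "nope", "c" );
print "found: $found\n";                                            # found: 0
print "missing: ", ( defined $missing ? $missing : "undef" ), "\n"; # missing: undef
```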

Since the simple case of reading is now usually handled by //, it is worth mentioning less simple cases, e.g. in a simulator where a lookup may create the entry (create, zero-fill, or copy-on-read), track statistics, or modify state such as LRU or history:

my $cvalue = lookup( $bpred_top => path_history => $path_hash => undef );    
my $cvalue = lookup( $bpred_top => gshare => hash($pc,$tnt_history) => undef );    

Basically, these libraries are the // operator on steroids, with a wider selection of what to do if the full path does not exist (or even if it does exist, e.g. count stats and cache).

They are slightly more pleasant using the quote operators, e.g.

assign_to_hash_path( $node, qw{a b c}, 0);
${hash_path_lhs( $node, qw{a b c})} = 0;
my $cvalue = follow_hash_path_undef_if_cannot( $node, qw{a b c});

But now that it has sunk into my thick head after many years of using perlobj, I think that fat arrow/commas may make these look much more pleasant:

assign_to_hash_path( $node => a => b => c => 0);
my $cvalue = follow_hash_path( $node => a => b => c => undef );

Unfortunately, the LHS function doesn't improve much because of the need to quote the last element of such a path:

${hash_path_lhs( $node=>a=>b=>"c" )} = 0;
${hash_path_lhs( $node=>a=>b=>c=>undef )} = 0;

so I would be tempted to give up on LHS, or use some mandatory final argument, like

${hash_path_lhs( $node=>a=>b=>c=>Create_As_Needed() )} = 0;
${hash_path_lhs( $node=>a=>b=>c=>Die_if_Path_Incomplete() )} = 0;

The LHS code looks ugly, but the other two look pretty good, given that the final element of such a chain is either the value to be assigned or the default value.

assign_to_hash_path( $node => a => b => c => "value-to-be-assigned");
my $cvalue = follow_hash_path( $node => a => b => c => "default-value" );

Unfortunately, there is no obvious place to put keyword options: the following does not work, because you cannot distinguish optional keywords from args at either the beginning or the end:

assign_to_hash_path( $node => a => b => c => 0);
assign_to_hash_path( {warn_if_path_incomplete=>1}, $node => a => b => c => 0);
my $cvalue = follow_hash_path( $node => a => b => c => undef );
my $cvalue = follow_hash_path( $node => a => b => c => undef, {die_if_path_incomplete=>1} );

I have occasionally used a Keyword class, abbreviated KW, so that a type inquiry can tell us which is the keyword, but that is suboptimal - actually, it's not bad, but it is just that Perl has no single BKM (yeah, TMTOWTDI):

assign_to_hash_path( $node => a => b => c => 0);
assign_to_hash_path( KW(warn_if_path_incomplete=>1), $node => a => b => c => 0);
my $cvalue = follow_hash_path( $node => a => b => c => undef );
my $cvalue = follow_hash_path( KW(die_if_path_incomplete=>1), $node => a => b => c => undef );
my $value = follow_hash_path( $node => a => b => c => undef, KW(die_if_path_incomplete=>1) );
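The KW approach needs very little code. A minimal sketch, assuming `KW` is a hypothetical constructor that blesses its arguments into a `KW` package; `blessed` comes from the core Scalar::Util module. Because `$node` is an unblessed hashref, the type check tells the two apart even though both are hashes, and the KW group may appear first or last:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical keyword-group constructor: KW(k1=>v1, ...) returns a hashref
# blessed into package KW, so a callee can recognize it by type.
sub KW { return bless {@_}, 'KW' }

# Separate KW groups (wherever they appear) from the remaining arguments.
sub split_kw {
    my ( %kw, @args );
    for my $arg (@_) {
        if ( blessed $arg && $arg->isa('KW') ) {
            %kw = ( %kw, %$arg );
        }
        else {
            push @args, $arg;
        }
    }
    return ( \%kw, @args );
}

my $node = { a => { b => { c => 1 } } };
my ( $kw, @path ) = split_kw( KW( die_if_path_incomplete => 1 ), $node => a => b => c => undef );
my $count = @path;
print "$count path args, die_if_path_incomplete=$kw->{die_if_path_incomplete}\n";
# prints: 5 path args, die_if_path_incomplete=1
```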

Conclusion: Foo(a=>b=>c=>1) seems strange, but might be useful/nice syntactic sugar

So: while I do rather wish that use warnings had warned me about foo(a=>a=>1), when a keyword was duplicated by accident, I think that multiple fat arrow/commas in series might be useful in making some types of code more readable.

Although I haven't seen any real-world examples of this, usually if I can imagine something, a better and more perspicacious Perl programmer has already written it.

And I am considering reworking some of my legacy libraries to use it. In fact, I may not have to rework - the library that I designed to be called as

assign_to_hash_path( $node, "a", "b", "c", 0 )

may already work if invoked as

assign_to_hash_path( $node => a => b=> c => 0 )
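It should, because `=>` is just a comma that also quotes a bareword immediately to its left, so the callee sees the same `@_` either way. A quick check, using a hypothetical stand-in that simply returns its arguments:

```perl
use strict;
use warnings;

# Hypothetical stand-in for assign_to_hash_path: returns its arguments unchanged.
sub show_args { return @_ }

my $node        = {};
my @with_commas = show_args( $node, "a", "b", "c", 0 );
my @with_arrows = show_args( $node => a => b => c => 0 );

# "@array" joins elements with spaces; the hashref stringifies identically.
print "@with_commas" eq "@with_arrows" ? "identical\n" : "different\n";
```

The only thing that differs is the quoting: `c` above is a string only because it is followed by `=>`.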

Simple Working Example

For grins, an example of a simple path-following function that does a bit more error reporting than is convenient to do with //:

$ bash 1278 $>  cat example-Follow_Hashref_Path.pl
use strict;
use warnings;

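sub follow_path {
    my $node = shift;
    if ( ref $node ne 'HASH' ) {
        # note: undef=>a=>... passes the *string* 'undef', since => quotes its left side
        print "Error: expected \$node to be a ref HASH,"
            . " instead got "
            . ( ref $node eq '' ? "scalar $node" : "ref " . ( ref $node ) )
            . "\n";
        return;
    }
    my $path      = q{node=>};
    my $full_path = $path . join( '=>', @_ );
    foreach my $field (@_) {
        $path .= "->{$field}";
        # stop, rather than die under strict refs, if this level is not a hash
        if ( ref $node ne 'HASH' or not exists $node->{$field} ) {
            print "stopped at path element $field"
                . "\n    full_path = $full_path"
                . "\n    path so far = $path"
                . "\n";
            return;
        }
        $node = $node->{$field};
    }
    return $node;    # the value at the end of the path
}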

my $node={a=>{b=>{c=>{}}}};

follow_path($node=>a=>b=>c=>"end");
follow_path($node=>A=>b=>c=>"end");
follow_path($node=>a=>B=>c=>"end");
follow_path($node=>a=>b=>C=>"end");
follow_path({}=>a=>b=>c=>"end");
follow_path(undef=>a=>b=>c=>"end");
follow_path('string-value'=>a=>b=>c=>"end");
follow_path('42'=>a=>b=>c=>"end");
follow_path([]=>a=>b=>c=>"end");

and use:

$ perl example-Follow_Hashref_Path.pl
stopped at path element end
    full_path = node=>a=>b=>c=>end
    path so far = node=>->{a}->{b}->{c}->{end}
stopped at path element A
    full_path = node=>A=>b=>c=>end
    path so far = node=>->{A}
stopped at path element B
    full_path = node=>a=>B=>c=>end
    path so far = node=>->{a}->{B}
stopped at path element C
    full_path = node=>a=>b=>C=>end
    path so far = node=>->{a}->{b}->{C}
stopped at path element a
    full_path = node=>a=>b=>c=>end
    path so far = node=>->{a}
Error: expected $node to be a ref HASH, instead got scalar undef
Error: expected $node to be a ref HASH, instead got scalar string-value
Error: expected $node to be a ref HASH, instead got scalar 42
Error: expected $node to be a ref HASH, instead got ref ARRAY
✓
$

Another Example ($node->{a}->{B}->{c}//"premature end")

$ bash 1291 $>  perl -e 'use warnings;my $node={a=>{b=>{c=>"end"}}}; print "followed path to the ".($node->{a}->{B}->{c}//"premature end")."\n"'
followed path to the premature end
$ bash 1292 $>  perl -e 'use warnings;my $node={a=>{b=>{c=>"end"}}}; print "followed path to the ".($node->{a}->{b}->{c}//"premature end")."\n"'
followed path to the end

I admit that I have trouble keeping the binding strength of // in my head.
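For the record, `//` has the same precedence as `||`, which is lower than string concatenation `.`; that is why the parentheses in the commands above are required. A small demonstration:

```perl
use strict;
use warnings;

my $node = { a => { b => { c => "end" } } };

# With parentheses: // picks the default when the lookup is undef.
my $good = "followed path to the " . ( $node->{a}->{B}->{c} // "premature end" );

# Without them, this parses as ("..." . $node->{missing}) // "premature end";
# the concatenation is always defined, so the default is never used.
my $bad = do {
    no warnings 'uninitialized';    # silence the undef-in-concatenation warning
    "followed path to the " . $node->{missing} // "premature end";
};

print "$good\n";    # followed path to the premature end
print "$bad\n";     # followed path to the
```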

Finally

By the way, if anyone has examples of idioms using // and -> that avoid the need to create library functions, especially for writes, I'd love to hear of them.

It's good to be able to create libraries to make stuff easier or more pleasant.

It is also good not to need to do so - as in ($node->{a}->{B}->{c}//"default").
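Two inline idioms that come close, sketched with core modules only (`reduce` is from List::Util). The read folds over the keys and stops at the first missing or non-hash level without autovivifying anything; the write walks a reference to each slot, so `\$$ref->{$_}` creates missing levels as it goes:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my $node = { a => { b => { c => "end" } } };

# Read: fold over the keys, yielding undef as soon as a level is missing
# or is not a hashref. Nothing is autovivified along the way.
my $value = reduce { ref $a eq 'HASH' ? $a->{$b} : undef } $node, qw(a b c);
my $shown = $value // "premature end";
print "$shown\n";    # end

# Write: walk a reference to each slot; taking \ of a missing element
# autovivifies the levels above it, so the path is created as needed.
my $ref = \$node;
$ref = \$$ref->{$_} for qw(d e f);
$$ref = "new value";
print "$node->{d}{e}{f}\n";    # new value
```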