Zack Zack - 4 months ago 10x
Perl Question

How do I delete a [sub]hash based off of the keys/values of another hash?

Lets assume I have two hashes. One of them contains a set of data that only needs to keep things that show up in the other hash.


my %hash1 = (
test1 => { inner1 => { more => "alpha", evenmore => "beta" } },
test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }

my %hash2 = (
major=> { test2 => "inner2",
test3 => "inner3" } );

What I would like to do, is to delete the whole subhash in hash1 if it does not exist as a key/value in hash2{major}, preferably without modules. The information contained in "innerX" does not matter, it merely must be left alone (unless the subhash is to be deleted then it can go away).

In the example above after this operation is preformed hash1 would look like:

my %hash1 = (
test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },

It deletes hash1{test1} and hash1{test3} because they don't match anything in hash2.

Here's what I've currently tried, but it doesn't work. Nor is it probably the safest thing to do since I'm looping over the hash while trying to delete from it. However I'm deleting at the each which should be okay?

This was my attempt at doing this, however perl complains about:

Can't use string ("inner1") as a HASH ref while "strict refs" in use at

while(my ($test, $inner) = each %hash1)
if(exists $hash2{major}{$test}{$inner})
print "$test($inner) is in exists.\n";
print "Looks like $test($inner) does not exist, REMOVING.\n";
#not to sure if $inner is needed to remove the whole entry
delete ($hash1{$test}{$inner});


You were close. Remember that $hash2{major}{$test} is a scalar, not a hash reference.

#! /usr/bin/perl

use strict;
use warnings;

my %hash1 = ( 
  test1 => { inner1 => { more => "alpha", evenmore => "beta" } },
  test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
  test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }

my %hash2 = (
  major => { test2 => "inner2",
             test3 => "inner3" }

foreach my $k (keys %hash1) {
  my $delete = 1;
  foreach my $inner (keys %{ $hash1{$k} }) {
    $delete = 0, last if exists $hash2{major}{$k} &&
                                $hash2{major}{$k} eq $inner;
  delete $hash1{$k} if $delete;

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%hash1;

The line beginning with $delete = 0, ... is a bit cutesy. It's equivalent to $delete = 0; last; within another conditional, but it was already nested twice. Not wanting to build a matryoshka doll, I used a statement modifier, but as the name suggests, it modifies a single statement.

That's where Perl's comma operator comes in:

Binary , is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.

In this case, the left argument is the expression $delete = 0, and the right argument is last.

The conditional might seem needlessly fussy, but

... if $hash2{major}{$k} eq $inner;

produces undefined-value warnings when probing for tests not mentioned in %hash2 (test1/inner1, for example). Using

.. if $hash2{major}{$k} && $hash2{major}{$k} eq $inner;

would incorrectly delete a test mentioned in %hash2 if its "inner name" were a false value such as the string "0". Yes, using exists here may be needlessly fussy, but not knowing your actual hash keys, I chose the conservative route.


$VAR1 = {
  'test2' => {
    'inner2' => {
      'somethingelse' => 'delta',
      'more' => 'charlie'

Although you don't violate it, be aware of the following caveat related to using each:

If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't. Exception: It is always safe to delete the item most recently returned by each, which means that the following code will work:

    while (($key, $value) = each %hash) {
      print $key, "\n";
      delete $hash{$key};   # This is safe

Update: Searching hashes as though they were arrays (impress your CS nerd friends by saying “… linearly rather than logarithmically”) is a red flag, and the code above does just that. A better approach, which turns out to be similar to Penfold's answer, is

%hash1 = map +($_ => $hash1{$_}),
         grep exists $hash2{major}{$_} &&
              exists $hash1{$_}{ $hash2{major}{$_} },
         keys %hash1;

In nice declarative style, it describes the desired contents of %hash1, namely

  1. first-level keys of %hash1 should be mentioned in $hash2{major}, and
  2. the value in $hash2{major} corresponding to each first-level key should itself be a subkey of that key back in %hash1

(Wow, dizzying. We need multiple placeholder variables in English!)

The unary plus in +($_ => $hash1{$_}) disambiguates for the poor parser so it knows we want the expression treated as a “pair.” See the end of the perlfunc documentation on map for other cases when this may be necessary.