Valter Henrique - 1 month ago
Python Question

How to reset values over a dictionary in Python?

I have a YAML file which can't be committed to my repository because it contains passwords and sensitive information. It looks like this:

devops:
  branch: somebranch

password:
  provider:
    digital_ocean:
      token:
        ""
    aws:
      bob:
        access_key_id:
          "XXX"
        secret_access_key:
          "XXX"
      jim:
        access_key_id:
          "XXX"
        secret_access_key:
          "XXX"
  dev:
    bob:
      "secret"
    jim:
      "another secret"
  app:
    mom:
      zookeeper:
        "XXX"
      admin:
        "XXX"


I'm trying to develop a Python script that clears all the passwords in this file, so that I can commit it to my repository. After processing, it should look like this:

devops:
  branch: somebranch

password:
  provider:
    digital_ocean:
      token:
        ""
    aws:
      bob:
        access_key_id:
          ""
        secret_access_key:
          ""
      jim:
        access_key_id:
          ""
        secret_access_key:
          ""
  dev:
    bob:
      ""
    jim:
      ""
  app:
    mom:
      zookeeper:
        ""
      admin:
        ""


I know that it is possible to set values in dictionaries like this:

import os
import yaml

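with open(os.environ['DEVOPS_HOME'] + "/vagrant/server/settings.yml") as f:
    # safe_load avoids the Loader argument that newer PyYAML requires for load()
    settings = yaml.safe_load(f)

# this only visits the top-level keys
for key in settings.keys():
    settings[key] = 0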


However, this is my output:

{'password': 0, 'devops': 0}


Is there a way to iterate over my dictionary and set the values for my passwords only? Or should I change the structure of my YAML file?

Answer

The keys whose values need "wiping" don't follow a regular naming pattern, but they do have one thing in common: they are all leaf values under the key password. That makes recursion an option to wipe them all without specifying each full key path ¹:

import sys
import ruamel.yaml

yaml_str = """\
devops:
  branch: somebranch

password:
  provider:
    digital_ocean:
      token:
        ""
    aws:
      bob:
        access_key_id:
          "XXX"
        secret_access_key:
          "XXX"
      jim:
        access_key_id:
          "XXX"
        secret_access_key:
          "XXX"
  dev:
    bob:
      "secret"
    jim:
      "another secret"
  app:
    mom:
      zookeeper:
        "XXX"
      admin:
        "XXX"
"""

def wipe_pass(data, key):
    """wipe the value if it is a string instance"""
    if isinstance(data[key], str):
        # keep the explicit "" quoting in the dumped YAML
        data[key] = ruamel.yaml.scalarstring.DoubleQuotedScalarString("")
        return
    if isinstance(data[key], dict):
        # recurse into nested mappings
        for k in data[key]:
            wipe_pass(data[key], k)
        return
    raise NotImplementedError   # e.g. a YAML sequence

# YAML() defaults to the round-trip loader/dumper
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
wipe_pass(data, 'password')
yaml.dump(data, sys.stdout)

which gives:

devops:
  branch: somebranch

password:
  provider:
    digital_ocean:
      token: ""
    aws:
      bob:
        access_key_id: ""
        secret_access_key: ""
      jim:
        access_key_id: ""
        secret_access_key: ""
  dev:
    bob: ""
    jim: ""
  app:
    mom:
      zookeeper: ""
      admin: ""

Please note that your original YAML formats key-value pairs inconsistently when the value is not a mapping. Here the output is consistent with your original branch: somebranch pair.

The ruamel.yaml.scalarstring.DoubleQuotedScalarString("") is necessary to get "" as output. If you just assign "", you'll get the default '' single quotes for the empty string in your YAML file.

You can try to do the above with PyYAML, but you will lose any comments, have non-guaranteed key ordering, lose the empty line before password: and probably more. That makes it largely unusable for round-tripping data (load, modify, dump) that has to have minimal changes between commits.
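
The recursion itself does not depend on the YAML library. As a minimal sketch, assuming the settings have already been loaded into plain Python dicts and lists (for example with PyYAML), something like the following would blank every string leaf under a given node. The name wipe_leaves is hypothetical, and walking into lists is an addition here; the code above raises NotImplementedError for sequences instead.

```python
def wipe_leaves(node, blank=""):
    """Recursively replace every string leaf under `node` with `blank`, in place.

    `wipe_leaves` is a hypothetical helper for illustration only.
    """
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))
    else:
        raise TypeError("expected a dict or list, got %r" % type(node))

    for key, value in items:
        if isinstance(value, (dict, list)):
            # descend into nested mappings and sequences
            wipe_leaves(value, blank)
        elif isinstance(value, str):
            node[key] = blank
        # other scalars (numbers, booleans, None) are left untouched


# A shortened stand-in for the loaded settings file
settings = {
    "devops": {"branch": "somebranch"},
    "password": {
        "provider": {
            "digital_ocean": {"token": ""},
            "aws": {"bob": {"access_key_id": "XXX", "secret_access_key": "XXX"}},
        },
        "dev": {"bob": "secret", "jim": "another secret"},
    },
}

# Only the subtree under 'password' is wiped; 'devops' stays as it is
wipe_leaves(settings["password"])
```

Non-string scalars are left untouched in this sketch; whether that is what you want depends on what the file holds.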


¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author.