user2284570 - 6 months ago
Python Question

How to modify a compressed iTXt record of an existing file in Python?

I know this looks too simple, but I couldn’t find a straightforward solution.

Once saved, the iTXt should be compressed again.

Answer

It's not as simple as you eyeballed it. If it were, you would already have found a straightforward solution.

Let's start with the basics.

Can PyPNG read all chunks?

An important question, because modifying an existing PNG file is a large task. PyPNG's documentation doesn't start out well:

PNG: Chunk by Chunk

Ancillary Chunks

.. iTXt
Ignored when reading. Not generated.

(https://pythonhosted.org/pypng/chunk.html)

But lower on that page, salvation!

Non-standard Chunks
Generally it is not possible to generate PNG images with any other chunk types. When reading a PNG image, processing it using the chunk interface, png.Reader.chunks, will allow any chunk to be processed (by user code).

So all I have to do is write this 'user code', and PyPNG can do the rest. (Oof.)

What about the iTXt chunk?

Let's take a peek at what you are interested in.

4.2.3.3. iTXt International textual data

.. the textual data is in the UTF-8 encoding of the Unicode character set instead of Latin-1. This chunk contains:

Keyword:             1-79 bytes (character string)
Null separator:      1 byte
Compression flag:    1 byte
Compression method:  1 byte
Language tag:        0 or more bytes (character string)
Null separator:      1 byte
Translated keyword:  0 or more bytes
Null separator:      1 byte
Text:                0 or more bytes

(http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.iTXt)
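To make the layout concrete, here is a minimal sketch that assembles an uncompressed iTXt payload field by field, in the order listed above. The helper name `build_itxt` and its parameters are my own invention for illustration (they are not part of PyPNG or the PNG spec), and I am assuming Python 3 bytes here.

```python
def build_itxt(keyword, text, language_tag=b"", translated_keyword=b"",
               compression_flag=0, compression_method=0):
    """Assemble an iTXt payload from its fields (hypothetical helper).

    The text is stored as given; this sketch does not compress it.
    """
    if not 1 <= len(keyword) <= 79:
        raise ValueError("keyword must be 1-79 bytes")
    return b"".join([
        keyword, b"\x00",              # keyword + null separator
        bytes([compression_flag]),     # 1 byte
        bytes([compression_method]),   # 1 byte
        language_tag, b"\x00",         # may be empty
        translated_keyword, b"\x00",   # may be empty
        text,                          # not null-terminated
    ])

payload = build_itxt(b"Author", b"Rad Lexus", language_tag=b"en")
print(repr(payload))
```

Note that the text field is the only one without a terminator: it simply runs to the end of the chunk.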

Looks clear to me. The optional compression ought not to be a problem, since

.. [t]he only value presently defined for the compression method byte is 0, meaning zlib ..

and I am pretty confident Python already has something that can do this for me (the standard zlib module, as it turns out).
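As a quick sanity check that Python's standard `zlib` module covers compression method 0, here is a small round trip. The sample text is made up, and I am assuming Python 3, where the data must be bytes.

```python
import zlib

# Made-up sample text, repeated so that compression has something to gain.
text = "Dat was leuk. " * 10
raw = text.encode("utf-8")

packed = zlib.compress(raw)         # what goes into a compressed iTXt text field
restored = zlib.decompress(packed)  # what comes back out when reading

print(len(raw), len(packed), restored == raw)
```

For very short strings the compressed form can be longer than the original, so compression only pays off for larger texts.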

Back to PyPNG's chunk handling then.

Can we see the chunk data?

PyPNG offers an iterator, so indeed checking if a PNG contains an iTXt chunk is easy:

chunks()
Return an iterator that will yield each chunk as a (chunktype, content) pair.

(https://pythonhosted.org/pypng/png.html?#png.Reader.chunks)

So let's write some code in interactive mode and check. I got a sample image from http://pmt.sourceforge.net/itxt/, repeated here for convenience. (If the iTXt data is not preserved here, download and use the original.)

[image: itxt sample image]

>>> import png
>>> imageFile = png.Reader("itxt.png")
>>> print imageFile
<png.Reader instance at 0x10ae1cfc8>
>>> for c in imageFile.chunks():
...   print c[0],len(c[1])
... 
IHDR 13
gAMA 4
sBIT 4
pCAL 44
tIME 7
bKGD 6
pHYs 9
tEXt 9
iTXt 39
IDAT 4000
IDAT 831
zTXt 202
iTXt 111
IEND 0

Success!

What about writing back? Well, PyPNG is usually used to create complete images, but fortunately it also offers a method to explicitly create one from custom chunks:

png.write_chunks(out, chunks)
Create a PNG file by writing out the chunks.

So we can iterate over the chunks, change the one(s) you want, and write back the modified PNG.
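If you are curious what reading and writing chunks involves underneath, here is a standard-library-only sketch of the PNG chunk framing: a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type plus data. This is not PyPNG's implementation; the names `frame_chunk`, `iter_chunks`, and `rewrite` are hypothetical, and the sketch assumes Python 3 and does not verify CRCs on reading.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def frame_chunk(chunk_type, data):
    """Serialize one chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", crc))

def iter_chunks(png_bytes):
    """Yield (type, data) pairs from a complete PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        yield chunk_type, data

def rewrite(png_bytes, edit):
    """Rebuild a PNG, passing every chunk through edit(type, data) -> data."""
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png_bytes):
        out.append(frame_chunk(chunk_type, edit(chunk_type, data)))
    return b"".join(out)

# Tiny synthetic chunk stream (not a displayable image) to exercise the code.
sample = PNG_SIGNATURE + frame_chunk(b"tEXt", b"k\x00v") + frame_chunk(b"IEND", b"")
edited = rewrite(sample, lambda t, d: d.upper() if t == b"tEXt" else d)
print(list(iter_chunks(edited)))
```

The point is that the CRC has to be recalculated whenever a chunk's data changes, which is why patching bytes in place does not work and the whole chunk must be written again.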

Unpacking and packing iTXt data

This is a task in itself. The data format is well described, but not suitable for Python's native unpack and pack functions (in the struct module), which expect fixed-size fields; the iTXt strings are variable-length. So we have to invent something ourselves.

The text strings are stored in ASCIIZ format: a string ending with a zero byte. We need a small function to split on the first 0:

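def cutASCIIZ(data):
   # split at the first zero byte; the zero itself is dropped
   end = data.find(chr(0))
   if end >= 0:
      return [data[:end], data[end+1:]]
   # no zero byte found: empty 'before', everything in 'after'
   return ['', data]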

This quick-and-dirty function returns a [before, after] pair as a list, and discards the zero itself. If there is no zero at all, 'before' is empty and 'after' is the entire input.
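The code in this answer is Python 2, where chunk data is a plain `str`. Under Python 3 the data would be `bytes`, and the same split can be sketched with `bytes.partition`. The name `cut_asciiz` is my own, and the no-terminator fallback copies the behavior of the function above.

```python
def cut_asciiz(data):
    """Split bytes at the first zero byte into [before, after]."""
    before, sep, after = data.partition(b"\x00")
    if not sep:
        # No zero byte found: empty 'before', everything in 'after'.
        return [b"", data]
    return [before, after]

print(cut_asciiz(b"Author\x00rest\x00more"))
print(cut_asciiz(b"no terminator"))
```

Only the first zero byte is consumed, so the remainder can be fed back into the function to peel off the next field.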

To handle the iTXt data as transparently as possible, I make it a class:

class Chunk_iTXt:
  def __init__(self, chunk_data):
    tmp = cutASCIIZ(chunk_data)
    self.keyword = tmp[0]
    if len(tmp[1]):
      self.compressed = ord(tmp[1][0])
    else:
      self.compressed = 0
    if len(tmp[1]) > 1:
      self.compressionMethod = ord(tmp[1][1])
    else:
      self.compressionMethod = 0
    tmp = tmp[1][2:]
    tmp = cutASCIIZ(tmp)
    self.languageTag = tmp[0]
    tmp = tmp[1]
    tmp = cutASCIIZ(tmp)
    self.languageTagTrans = tmp[0]
    if self.compressed:
      if self.compressionMethod != 0:
        raise TypeError("Unknown compression method")
      self.text = zlib.decompress(tmp[1])
    else:
      self.text = tmp[1]

  def pack (self):
    result = self.keyword+chr(0)
    result += chr(self.compressed)
    result += chr(self.compressionMethod)
    result += self.languageTag+chr(0)
    result += self.languageTagTrans+chr(0)
    if self.compressed:
      if self.compressionMethod != 0:
        raise TypeError("Unknown compression method")
      result += zlib.compress(self.text)
    else:
      result += self.text
    return result

  def show (self):
    print 'iTXt chunk contents:'
    print '  keyword: "'+self.keyword+'"'
    print '  compressed: '+str(self.compressed)
    print '  compression method: '+str(self.compressionMethod)
    print '  language: "'+self.languageTag+'"'
    print '  tag translation: "'+self.languageTagTrans+'"'
    print '  text: "'+self.text+'"'

Since this uses zlib, it requires an import zlib at the top of your program.

The class constructor accepts 'too short' strings, in which case it will use defaults for everything undefined.

The show method lists the data for debugging purposes.
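For readers on Python 3, the parsing half of the class can be sketched on `bytes` roughly as follows. This is an assumption-laden port, not the class above: `parse_itxt` is a hypothetical name, it returns a plain dict, and unlike my constructor it rejects truncated data instead of filling in defaults.

```python
import zlib

def parse_itxt(chunk_data):
    """Split an iTXt payload into its fields (hypothetical helper)."""
    keyword, _, rest = chunk_data.partition(b"\x00")
    if len(rest) < 2:
        raise ValueError("truncated iTXt chunk")
    flag, method = rest[0], rest[1]  # indexing bytes yields ints in Python 3
    language_tag, _, rest = rest[2:].partition(b"\x00")
    translated_keyword, _, text = rest.partition(b"\x00")
    if flag:
        if method != 0:
            raise ValueError("unknown compression method")
        text = zlib.decompress(text)
    return {
        "keyword": keyword,
        "compressed": flag,
        "compression_method": method,
        "language_tag": language_tag,
        "translated_keyword": translated_keyword,
        "text": text,
    }

# Hand-built compressed sample payload (made up for this example).
sample = (b"Custom\x00" + b"\x01\x00" + b"nl\x00" + b"Aangepast\x00"
          + zlib.compress("Dat was leuk.".encode("utf-8")))
fields = parse_itxt(sample)
print(fields["keyword"], fields["text"].decode("utf-8"))
```

Whether you prefer strict errors or forgiving defaults is a design choice; for creating new chunks from scratch, as done below with an empty string, the forgiving constructor is more convenient.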

Using my custom class

With all of this in place, examining, modifying, and adding iTXt chunks is finally straightforward:

import png
import zlib

# insert helper and class here

sourceImage = png.Reader("itxt.png")
chunkList = []
for chunk in sourceImage.chunks():
  if chunk[0] == 'iTXt':
    itxt = Chunk_iTXt(chunk[1])
    itxt.show()
    # modify existing data
    if itxt.keyword == 'Author':
      itxt.text = 'Rad Lexus'
      itxt.compressed = 1
    chunk = [chunk[0], itxt.pack()]
  chunkList.append (chunk)

# append new data
newData = Chunk_iTXt('')
newData.keyword = 'Custom'
newData.languageTag = 'nl'
newData.languageTagTrans = 'Aangepast'
newData.text = 'Dat was leuk.'
chunkList.insert (-1, ['iTXt', newData.pack()])

# avoid the name 'file', which shadows a Python 2 builtin
with open("foo.png", "wb") as outFile:
  png.write_chunks(outFile, chunkList)

When adding a totally new chunk, be careful not to append it to the end of the list, because then it will appear after the required last IEND chunk, which is an error. I did not try it, but you should probably also not insert it before the required first IHDR chunk or (as commented by Glenn Randers-Pehrson) in between consecutive IDAT chunks.
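A slightly more defensive way to place a new chunk is to check that the list really ends with IEND before inserting. The helper below is a hypothetical sketch, not part of PyPNG; it accepts the chunk type as either `str` or `bytes`, since which one you get may depend on your Python and PyPNG versions (that is an assumption on my part).

```python
def insert_before_iend(chunks, new_chunk):
    """Return a copy of chunks with new_chunk placed just before IEND."""
    result = list(chunks)
    if not result or result[-1][0] not in ("IEND", b"IEND"):
        raise ValueError("chunk list does not end with IEND")
    result.insert(len(result) - 1, new_chunk)
    return result

# Placeholder chunk list; the data fields are dummies, not real image data.
chunks = [("IHDR", b""), ("IDAT", b""), ("IDAT", b""), ("IEND", b"")]
updated = insert_before_iend(chunks, ("iTXt", b"payload"))
print([c[0] for c in updated])
```

Placing the new chunk directly before IEND also keeps it out of the run of IDAT chunks, which have to stay consecutive.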

Note that according to the specification, the text and the translated keyword in iTXt should be UTF-8 encoded; the keyword itself is Latin-1, and the language tag is plain ASCII.
