Gregory Nisbet - 4 months ago
Python Question

Mypy for Python 2: insisting on a unicode value, not a string value

Python 2 will implicitly convert str to unicode in some circumstances. This conversion will sometimes throw a UnicodeError depending on what you try to do with the resulting value. I don't know the exact semantics, but it's something I'd like to avoid.
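
For illustration, here is a minimal sketch of the kind of failure meant here. It assumes the implicit conversion decodes with the default ASCII codec, and it is written so that it also runs on Python 3, where the decode has to be spelled out because the interpreter never performs it on its own. The function name implicit_concat is hypothetical.

```python
def implicit_concat(text, raw):
    """Mimic Python 2's `unicode + str`: decode `raw` as ASCII, then join."""
    # Python 2 performs this decode silently; here it is explicit.
    return text + raw.decode("ascii")


# Pure-ASCII bytes convert without complaint.
print(implicit_concat(u"abc", b"def"))

# Non-ASCII bytes make the same conversion raise UnicodeDecodeError,
# which is a subclass of UnicodeError.
try:
    implicit_concat(u"abc", b"caf\xc3\xa9")
except UnicodeError as exc:
    print(type(exc).__name__)
```

Whether the error appears therefore depends on the data rather than on the code, which is why a static check would be preferable to finding it at runtime.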

Is it possible to use another type besides unicode, or a command-line argument similar to --strict-optional (http://mypy-lang.blogspot.co.uk/2016/07/mypy-043-released.html), to cause programs using this implicit conversion to fail to type check?

def returns_string_not_unicode():
    # type: () -> str
    return u"a"

def returns_unicode_not_string():
    # type: () -> unicode
    return "a"


In this example, only the function returns_string_not_unicode fails to type check.

$ mypy --py2 unicode.py
unicode.py: note: In function "returns_string_not_unicode":
unicode.py:3: error: Incompatible return value type (got "unicode", expected "str")


I would like both of them to fail to typecheck.

EDIT:

# type: () -> bytes seems to be treated the same way as str.


def returns_string_not_unicode():
    # type: () -> bytes
    return u"a"

Answer

This is, unfortunately, an ongoing and currently unresolved issue -- see https://github.com/python/mypy/issues/1141 and https://github.com/python/typing/issues/208.

A partial fix is to use typing.Text, which is (unfortunately) currently undocumented, though I'll work on fixing that. It's aliased to str in Python 3 and to unicode in Python 2. It won't resolve your actual issue or cause the second function to fail to typecheck, but it does make it a bit easier to write types compatible with both Python 2 and Python 3.
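
As a minimal sketch of what Text gives you, the function below uses the comment-style annotations from the question so that the same file is valid under both Python 2 and Python 3. The function name greet is hypothetical.

```python
from typing import Text


def greet(name):
    # type: (Text) -> Text
    # Text means unicode under Python 2 and str under Python 3.
    return u"Hello, " + name


print(greet(u"world"))
```

Under mypy --py2, a call such as greet("world") with a plain str would still be accepted, so this only helps with portability, not with rejecting the implicit conversion.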

In the meantime, you can hack together a partial workaround using the recently implemented NewType feature. It lets you define a pseudo-subclass with minimal runtime cost, which you can use to approximate the functionality you're looking for:

from typing import NewType, Text

# Tell mypy to treat 'Unicode' as a subtype of `Text`, which is
# aliased to 'unicode' in Python 2 and 'str' (aka unicode) in Python 3
Unicode = NewType('Unicode', Text)

# Comment-style annotation so the file also parses under Python 2
def unicode_not_str(a):
    # type: (Unicode) -> Unicode
    return a

# my_unicode is still the original string at runtime, but Mypy
# treats it as having a distinct type from `str` and `unicode`.
my_unicode = Unicode(u"some string")

unicode_not_str(my_unicode)      # typechecks
unicode_not_str("foo")           # fails
unicode_not_str(u"foo")          # fails, unfortunately
unicode_not_str(Unicode("bar"))  # works, unfortunately

It's not perfect, but if you're principled about when you promote a string to your custom Unicode type, you can approximate the type safety you're looking for with minimal runtime cost until the bytes/str/unicode issue is settled.
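
One way to stay principled, sketched here as an assumption rather than an established convention, is to route every promotion through a single helper that decodes byte strings explicitly. The helper name to_unicode and the UTF-8 default are hypothetical choices.

```python
from typing import NewType, Text, Union

Unicode = NewType('Unicode', Text)


def to_unicode(value, encoding="utf-8"):
    # type: (Union[bytes, Text], str) -> Unicode
    """Return `value` as Unicode, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        # In Python 2, bytes is str, so plain string literals land here
        # and are decoded on purpose instead of implicitly.
        return Unicode(value.decode(encoding))
    return Unicode(value)


print(to_unicode(b"caf\xc3\xa9") == u"caf\xe9")
```

If the rest of the code base only calls Unicode(...) inside this helper, a search for other uses of Unicode( is enough to audit where strings cross the boundary.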

Note that you'll need to install mypy from the master branch on GitHub to use NewType.