Gregory Nisbet - 6 months ago
Python Question

Mypy Python 2 insist on unicode value not string value

Python 2 will implicitly convert str to unicode in some circumstances. This conversion will sometimes throw a UnicodeDecodeError, depending on what you try to do with the resulting value. I don't know the exact semantics, but it's something I'd like to avoid.
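To make the failure mode concrete, here is a minimal sketch that mimics the implicit conversion. It assumes the default Python 2 encoding of ASCII, and it is written so it also runs on Python 3, where the decode has to be spelled out. The helper names coerce_like_python2 and concat are made up for illustration.

```python
from typing import Text


def coerce_like_python2(raw, default_encoding="ascii"):
    # type: (bytes, str) -> Text
    # Hypothetical helper: Python 2 performs an equivalent decode
    # behind the scenes when a str meets a unicode value.
    return raw.decode(default_encoding)


def concat(raw, text):
    # type: (bytes, Text) -> Text
    # Roughly what `raw + text` does in Python 2.
    return coerce_like_python2(raw) + text


# Pure-ASCII bytes decode without trouble.
ascii_result = concat(b"abc", u"def")

# Non-ASCII bytes make the hidden decode blow up.
try:
    concat(b"caf\xc3\xa9", u"!")
    failed = False
except UnicodeDecodeError:
    failed = True
```

The second call is the case I'd like mypy to catch before the program ever runs.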

Is it possible to use another type besides unicode, or a command-line argument, to cause programs using this implicit conversion to fail to type check?

def returns_string_not_unicode():
    # type: () -> str
    return u"a"

def returns_unicode_not_string():
    # type: () -> unicode
    return "a"

In this example, only the function returns_string_not_unicode fails to type check.

$ mypy --py2
note: In function "returns_string_not_unicode":
error: Incompatible return value type (got "unicode", expected "str")

I would like both of them to fail to typecheck.


Annotating the return type as bytes seems to be treated the same way as str:

def returns_string_not_unicode():
    # type: () -> bytes
    return u"a"


This is, unfortunately, an ongoing and currently unresolved issue.

A partial fix is to use typing.Text, which is (unfortunately) currently undocumented (I'll work on fixing that though). It's aliased to str in Python 3 and to unicode in Python 2. It won't resolve your actual issue or cause the second function to fail to typecheck, but it does make it a bit easier to write types compatible with both Python 2 and Python 3.
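As a rough sketch of what that looks like in practice (the function greet is a made-up example, and the type comment style is used so the same file parses under both Python 2 and Python 3):

```python
from typing import Text


def greet(name):
    # type: (Text) -> Text
    # Text means unicode under Python 2 and str under Python 3,
    # so this signature reads the same way on both.
    return u"Hello, " + name


greeting = greet(u"world")
```

Note that under Python 2 mypy will still accept greet("world") with a plain str, which is exactly the gap described above.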

In the meantime, you can hack together a partial workaround by using the recently-implemented NewType feature -- it lets you define a pseudo-subclass with minimal runtime cost, which you can use to approximate the functionality you're looking for:

from typing import NewType, Text

# Tell mypy to treat 'Unicode' as a subtype of `Text`, which is
# aliased to 'unicode' in Python 2 and 'str' (aka unicode) in Python 3
Unicode = NewType('Unicode', Text)

# A type comment (rather than inline annotations) keeps this valid in Python 2
def unicode_not_str(a):
    # type: (Unicode) -> Unicode
    return a

# my_unicode is still the original string at runtime, but Mypy
# treats it as having a distinct type from `str` and `unicode`.
my_unicode = Unicode(u"some string")

unicode_not_str(my_unicode)      # typechecks
unicode_not_str("foo")           # fails
unicode_not_str(u"foo")          # fails, unfortunately
unicode_not_str(Unicode("bar"))  # works, unfortunately

It's not perfect, but if you're principled about when you elevate a string into being treated as being of your custom Unicode type, you can get something approximating the type safety you're looking for with minimal runtime cost until the bytes/str/unicode issue is settled.
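One way to stay principled is to funnel every conversion through a single helper, so that the Unicode constructor is only ever called in one place. The following is a minimal sketch under a few assumptions: incoming byte strings are UTF-8, and to_unicode is a hypothetical name, not something provided by typing or mypy.

```python
from typing import NewType, Text, Union

Unicode = NewType('Unicode', Text)


def to_unicode(value, encoding="utf-8"):
    # type: (Union[bytes, Text], str) -> Unicode
    # Decode byte strings explicitly instead of relying on the
    # implicit ASCII coercion; text values are wrapped as-is.
    if isinstance(value, bytes):
        return Unicode(value.decode(encoding))
    return Unicode(value)


decoded = to_unicode(b"caf\xc3\xa9")   # bytes are decoded first
wrapped = to_unicode(u"caf\xe9")       # text passes straight through
```

If the rest of the code base only accepts Unicode in its signatures, any value that skipped to_unicode shows up as a type error rather than as a surprise at runtime.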

Note that you'll need to install mypy from the master branch on GitHub to use NewType.