Teysz - 2 months ago
Python Question

How to input Polish characters into CMD as a Python parameter?

I just started learning to code in Python, and I have a simple Python program that returns "Cześć <input>", where <input> is the name that a user passes in CMD as a parameter to the program. If no input is given, it returns "Cześć Świat". It works fine, but when I input the name Łukasz, for instance, it strips the stroke from the Ł and the program returns "Cześć Lukasz" instead of the correct "Cześć Łukasz".

In Windows CMD I used the cd command to go to the folder containing the Python program, and there I run the program with this command:
hello.py Łukasz

My script looks like this (it is originally from Google's Python exercises (source); I edited it to handle Unicode characters under Python 2.7 and, for instance, replaced 'hello' with 'cześć'):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

# Define a main() function that prints a little greeting.
def main():
    # Get the name from the command line, using 'Świat' as a fallback.
    if len(sys.argv) >= 2:
        name = sys.argv[1].decode('cp1252')
    else:
        name = u'Świat'
    str = u'Cześć '+name
    print str.encode('utf-8')

# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
    main()


Originally I decoded sys.argv[1] with utf-8, but when I used the letter Óó it threw an ugly exception (see this SO answer). Using either utf-8 or cp1252 results in the Polish letters (e.g. ĄĆĘŁŃŚŻŹ) being stripped of their accents. The one exception is the letter Óó, which keeps its accent when I use cp1252; with utf-8, that letter caused the previously mentioned exception.
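
The different behaviour of Óó can be reproduced outside CMD. The sketch below assumes the Windows ANSI code page is cp1252 (suggested by the decode call in the script, but not confirmed) and is written for Python 3 rather than the 2.7 used above; fits_in_codepage and utf8_decode_fails are helper names made up for this illustration. It checks which Polish capitals exist in cp1252 and what happens when the single cp1252 byte for Ó is decoded as utf-8.

```python
# -*- coding: utf-8 -*-
# Illustration only: assumes the ANSI code page is cp1252 (an assumption).

POLISH_LETTERS = 'ĄĆĘŁŃÓŚŹŻ'


def fits_in_codepage(ch, codepage='cp1252'):
    """Return True if the character has a byte in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def utf8_decode_fails(raw):
    """Return True if the byte string is not valid UTF-8."""
    try:
        raw.decode('utf-8')
        return False
    except UnicodeDecodeError:
        return True


# Which Polish capitals can be stored in cp1252 at all?
representable = [ch for ch in POLISH_LETTERS if fits_in_codepage(ch)]
print(representable)                      # ['Ó']

# The cp1252 byte for Ó is a lone lead byte when read as UTF-8.
o_acute_bytes = 'Ó'.encode('cp1252')
print(repr(o_acute_bytes))                # b'\xd3'
print(utf8_decode_fails(o_acute_bytes))   # True
```

Only Ó has a cp1252 byte, which would explain why it alone survives and why decoding it as utf-8 raises an exception. The other letters have no slot in that code page, so they presumably reach the script already replaced by their unaccented look-alikes.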

So my question is, how do I retrieve the string intact with the accents from CMD to use in my Python program?

I won't accept answers that suggest removing or ignoring the accents!

vz0
Answer

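This is a known limitation of Python 2 on Windows. sys.argv is not Unicode there: the arguments arrive as byte strings in the ANSI code page, so any character outside that code page is replaced with its closest match (Ł becomes L) before your script ever sees it. Upgrading to Python 3 will solve your issue.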
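
For reference, here is a minimal sketch of what the script could look like under Python 3, where sys.argv already holds Unicode strings and no decode or encode step is needed. It assumes hello.py is run by a Python 3 interpreter (for example with "python hello.py Łukasz"); build_greeting is a helper name introduced here for illustration and is not part of the original exercise.

```python
#!/usr/bin/env python3
import sys


def build_greeting(argv):
    """Build the greeting from an argv-style list, with 'Świat' as fallback."""
    # In Python 3 every element of sys.argv is already a str (Unicode),
    # so the name can be used as-is.
    name = argv[1] if len(argv) >= 2 else 'Świat'
    return 'Cześć ' + name


def main():
    # print() takes the str directly; no manual encode('utf-8') is needed.
    print(build_greeting(sys.argv))


# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
    main()
```

Whether the console then displays the letters correctly also depends on the console font and, on Python versions before 3.6, on the active console code page.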
