44

Is there a way to globally suppress the unicode string indicator in python? I'm working exclusively with unicode in an application, and do a lot of interactive stuff. Having the u'prefix' show up in all of my debug output is unnecessary and obnoxious. Can it be turned off?

11 Answers


40

You could use Python 3.0. The default string type is unicode, so the u'' prefix is no longer required.

In short, no. You cannot turn this off.

The u comes from the unicode.__repr__ method, which the REPL uses to display values:

>>> print repr(unicode('a'))
u'a'
>>> unicode('a')
u'a'

If I'm not mistaken, you cannot override this without recompiling Python.

The simplest way around this is to print the string:

>>> print unicode('a')
a

If you use the unicode() builtin to construct all your strings, you could do something like..

>>> class unicode(unicode):
...     def __repr__(self):
...             return __builtins__.unicode.__repr__(self).lstrip("u")
... 
>>> unicode('a')
'a'

...but don't do that, it's horrible.


  • This is as good a solution as any. The real answer is to suck it up! - Ryan
  • +1 for me learning python 3 strings are all unicode by default - notbad.jpeg

26

I had a case where I needed to drop the u prefix because I was setting up some JavaScript with Python as part of an HTML template. A simple output left the u prefix in for the dict keys, e.g.

var turns = [{u'armies':2...];

which breaks javascript.

In order to get the output javascript needed, I used the json python module to encode the string for me:

turns = json.dumps(turns)

This does the trick in my particular case and as the keys are all ascii there is no worry about the encoding. You could probably use this trick for your debug output.
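As a minimal sketch of that approach (the "owner" key and the js_line variable are made up for illustration, not taken from the original template), json.dumps() produces text that JavaScript can parse, with no u prefix:

```python
import json

# Hypothetical data; in Python 2 the keys and values would be unicode objects.
turns = [{u"armies": 2, u"owner": u"red"}]

# json.dumps() emits double-quoted strings and never adds a u prefix.
# sort_keys=True only makes the key order predictable.
js_line = "var turns = " + json.dumps(turns, sort_keys=True) + ";"
print(js_line)
```

This should print a line that is valid JavaScript. By default non-ASCII characters are written as \uXXXX escapes, so the result stays plain ASCII.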


  • Brilliant, json.dumps() is recursive like repr(). One caveat, dictionary keys are converted from int to str. assert '{"3": 5}' == json.dumps({3:5}) (Because JavaScript object property identifiers are all strings.) - Bob Stein

7

using str( text ) is a somewhat bad idea in fact whenever you cannot be 100% sure about both your python's default encoding and the exact content of the string---the latter would be typical for a text fetched from the internet. also, depending on what you want to do, using print text.encode( 'utf-8' ) or print repr( text.encode( 'utf-8' ) ) may yield disappointing results, as you might get a rendering full of unreadable codepoints like \x3a.
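A small sketch of why this goes wrong, assuming the default encoding is ascii: in Python 2, str(text) on a unicode object implicitly encodes it as ASCII, which is roughly what the explicit encode call below does. The sample string is made up for illustration.

```python
# Hypothetical sample text containing one non-ASCII character (e with acute accent).
text = u"caf\u00e9"

try:
    # Roughly what Python 2's str(text) does when the default encoding is ascii.
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the encoding explicitly works, but the result is bytes, not text.
utf8_bytes = text.encode("utf-8")

print(ascii_failed)
print(repr(utf8_bytes))
```

The ASCII attempt fails as soon as the text contains anything outside the 7-bit range, which is why str() is only safe on data you fully control.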

i think the optimum is really to avail yourself of a unicode-capable command line (difficult under windows, easy under linux) and switch from python 2.x to python 3.x. the ease and clarity of text vs bytes handling afforded by the new python 3 series is really one of the big gains you can expect. it does mean you'll have to spend a little time learning the distinction between 'bytes' and 'text' and grasp the concept of character encodings, but then that time is much better spent in a python 3 environment as python's new approach to these vexing problems is much clearer and much less error-prone than what python 2 had to offer. i'd go so far as to call python 2's approach to unicode problematic in retrospect, although i used to think of it as superior---when i compared it to the way this issue is handled in php.

edit i just stopped by a related discussion here on SO and found this comment on the way that php these days appears to tackle unicode / encoding issues:

It's like a mouse trying to eat an elephant. By framing Unicode as an extension of ASCII (we have normal strings and we have mb_strings) it gets things the wrong way around, and gets hung up on what special cases are required to deal with characters with funny squiggles that need more than one byte. If you treat Unicode as providing an abstract space for any character you need, ASCII is accommodated in that without any need to treat it as a special case.

i quote this here because in my experience 90% of all SO python+unicode topics seem to come from people who used to be fine with ascii or maybe latin-1, got bitten by the occasional character that was not supported in their usual settings, and then basically just want to get rid of it. what you do when switching to python 3 is exactly what the commenter above suggests to do: instead of viewing unicode as a vexing extension of ascii, you start to view ascii (and almost any other encoding you'll ever meet) as subset(s) of unicode.

to be true, unicode v6 is certainly not the last word in encodings, but it is as close to being universal as you can get in 2011. get used to it.


7

from __future__ import unicode_literals

is available since Python 2.6 (released on October 1, 2008). It is default in Python 3.

It lets you omit the u'' prefix in source code, though it does not change repr(unicode_string); changing that would be misleading.

You could override sys.displayhook() in a Python REPL to display objects however you like. You could also override __repr__ for your own custom objects.
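A rough sketch of such a hook, under the assumption that showing results as JSON text is acceptable for debugging. The name plain_displayhook is made up here, and the JSON rendering is one possible choice, not the only one:

```python
import json
import sys

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

def plain_displayhook(value):
    """Hypothetical hook: show REPL results as JSON text, else fall back to repr()."""
    if value is None:
        return  # the default hook prints nothing for None either
    builtins._ = value  # keep the usual "_" shortcut working
    try:
        text = json.dumps(value)  # double-quoted strings, no u prefix
    except (TypeError, ValueError):
        text = repr(value)  # objects that JSON cannot represent
    sys.stdout.write(text + "\n")

# Install the hook for the current interactive session.
sys.displayhook = plain_displayhook
```

Note the trade-offs: strings appear in double quotes, tuples are shown as lists, and non-string dictionary keys are shown as strings, so this is only suitable for eyeballing debug output.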


  • from __future__ import unicode_literals doesn't work in the Python 2.7 console - Alex Luya
  • @AlexLuya it does work (yes, I did try it, to make sure). Try type("") in a fresh Python REPL, you should see <type 'str'>. Then run: from __future__ import unicode_literals and repeat type(""). Now you should see <type 'unicode'>. What is your environment (OS, python version)? - jfs
  • Ubuntu+ipython+python 2.7 - Alex Luya
  • type("") gives "unicode", but after "from nltk.corpus import stopwords" and then "print stopwords.words("english")", the u'' prefix is still there - Alex Luya
  • @AlexLuya it works for me and the docs say explicitly that it should work: "A future statement typed at an interactive interpreter prompt will take effect for the rest of the interpreter session." - jfs

4

I know this isn't a global option, but you can also suppress the Unicode u by placing the string in a str() function.

So a Unicode derived list that would look like:

>>> myList=[unicode('a'),unicode('b'),unicode('c')]
>>> myList
[u'a', u'b', u'c']

would become this:

>>> myList=[str(unicode('a')),str(unicode('b')),str(unicode('c'))]
>>> myList
['a', 'b', 'c']

It's a bit cumbersome, but might be useful to someone.


  • Oh Man! Thanks very much for this 'in-array' string creation. - itsricky
  • I am trying to handball this data over to PHP for parsing, and it was soooo confusing trying to handle this conversion in PHP. I lost 2 years of my life im sure. I had been using 'stories.append(word)', but I changed this to your magical 'stories.append(str(unicode(word)))' and its all sorted. Brilliant @electrice! - itsricky
  • uh, absolutely **do not do this**: it will crash with non-ASCII data and defeats the entire purpose of using unicode in the first place. if you're relying on not having the u somewhere, you are doing something terribly wrong. @itsricky, you probably want to be encoding to JSON, not trying to parse Python reprs in PHP! - Eevee
  • I have reached my UPVOTE LIMIT but I had to thank you Electrice!!! - Mona Jalal
  • Works, great!!! - gsamaras

4

Just in case you are getting something like [u'hello'], you must be printing a list. Use print str(arr[0]) and you are good to go.


3

Not sure about unicode, but in general you can convert between forms with encode() and decode(). For instance, subprocess output in Python 3.0+ is captured as a byte stream (prefix b), and calling decode() on it gives a regular string.
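For bytes in Python 3, the call that produces a regular string is decode(). A minimal sketch, using a literal in place of real subprocess output and assuming the bytes are UTF-8:

```python
# Stand-in for bytes captured from a subprocess; the content is made up.
raw = b"hello\n"

# decode() turns bytes into text; the encoding must match what the program wrote.
text = raw.decode("utf-8").strip()

print(repr(raw))   # shows the b prefix
print(repr(text))  # plain string, no prefix
```

If the encoding is wrong for the data, decode() raises UnicodeDecodeError rather than guessing.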


3

What seems to be working for me:

import ast
import json
j = json.loads('{"one" : "two"}')
j
dd = {u'one': u'two'}
dd
# to get double quotes
json.dumps(j,  encoding='ascii')
json.dumps(dd, encoding='ascii')
# to get single quotes
str(ast.literal_eval(json.dumps(j,  encoding='ascii')))
str(ast.literal_eval(json.dumps(dd, encoding='ascii')))

Output:

{u'one': u'two'}
{u'one': u'two'}
'{"one": "two"}'
'{"one": "two"}'
"{'one': 'two'}"
"{'one': 'two'}"

The above works for dictionaries and JSON objects.

For just a string, wrapping in str() seems to work for me.

s=u'test string'
s
str(s)

Output:

>>> u'test string'
>>> 'test string'

Python version: 2.7.12


1

Try the following

print str(result.url)

It could be that your default encoding has been changed.

You can check your default encoding with the following:-

>>> import sys
>>> print sys.getdefaultencoding()
ascii

The default should be ascii which means u'string' should be printed as 'string' but yours may have been modified.


  • perfect! Thanks. - Nik
  • @Nik: If this answered your question, please mark it as the right answer. - XORcist

1

You have to use print str(your_Variable)


1

If you do not want to upgrade to Python 3, you could make use of substrings. For example, say the original output was (u'mystring',), and assume the variable row holds that one-element tuple. Then you could do something like this:

# str() yields "(u'mystring',)", which is what gets sliced below
temp = str(row)
temp = temp[:-3]  # drop the trailing "',)"
print temp[3:]    # drop the leading "(u'"
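A less fragile alternative, sketched under the assumption that row really is a one-element tuple (as database rows often are): index into it and print the element itself, so no slicing of the repr is needed.

```python
# Hypothetical one-element row, as in the example above.
row = (u"mystring",)

# Printing the element itself never shows the u prefix or the tuple syntax.
value = row[0]
print(value)
```

Unlike slicing the repr, this keeps working when the string contains quotes or non-ASCII characters.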
