[Product-Developers] Re: Re: UnicodeDecodeError
lists at zopyx.com
Tue Mar 25 05:38:59 UTC 2008
--On 24 March 2008 17:19:48 -0300 Derek Broughton <news at pointerstop.ca>
>> What are you talking about? Python has nothing like a 'unicode' default.
>> Likely you're referring to sys.getdefaultencoding() which is ascii by
> Why do you always have to be so confrontational?
Because part of the content of your postings is wrong, and the implicit
hint to change the default encoding of Python is the wrong way to go and will
lead to other problems (see below). I am so confrontational because I want
to write and design clean code when it comes to unicode-awareness.
Unicode-awareness is a big problem within Python-based applications...but
you can get around it by doing it the right way.
> That is, of course, what I'm talking about - and I know perfectly well
> that it's NOT a unicode default. If you read the posts, that would be
You're mixing up unicode encodings with the Python 'unicode' type.
There is no way to make strings unicode strings by default.
>> General rule #1: don't touch that. Rule #2: if you have the need
>> to touch the default encoding as a workaround: better fix your code
> Funny, but it's not _my_ code that runs into problems - in fact it's some
> of yours. SQLAlchemyDA won't read non-ascii data off my UTF-8 postgres or
> Oracle databases.
SQLAlchemy has options to deal with different encodings. The Oracle client
libraries also provide environment variables for controlling the client
encoding (e.g. for an implicit conversion of the server side database
encoding into some client side encoding).
> I _have_ non-ascii strings in my data, I'm not going to change that.
We all have that. There is no problem building a clean application dealing
with various encodings at the same time in a sane way.
> seems to me that the only way to make Plone work with it is to set my
> default encoding to unicode.
Wrong again. You set an *encoding*. An *encoding* like 'utf-8' is NOT
unicode. Don't mix up the 'unicode' type of Python with the _various_
available encodings for Unicode like UTF-8, UTF-16, UTF-32 or the internal
encodings UCS-2 or UCS-4. All these encodings can represent a Python unicode
string as a sequence of bytes, but an encoded byte string is *not* a
unicode string. So please be precise about the things you're talking about.
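
To make the distinction concrete, here is a minimal sketch. It is written in
Python 3 syntax so that it runs today, which is an assumption on top of this
thread: Python 3's `str` corresponds to the 'unicode' type discussed here,
and `bytes` corresponds to the encoded byte string. The sample word is only
an illustration.

```python
# Python 3 spelling: `str` is what this thread calls the 'unicode' type,
# `bytes` is an encoded byte string.
text = "Z\u00fcrich"  # 'Zürich', six code points

# One text string, three encodings, three different byte sequences.
encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "utf-32-le": text.encode("utf-32-le"),
}

# The byte lengths differ even though the text is the same.
lengths = {name: len(data) for name, data in encoded.items()}

# Each byte sequence decodes back to the identical text string.
round_trips = all(data.decode(name) == text for name, data in encoded.items())

# The encoded result is a bytes object, not a text string.
is_text = isinstance(encoded["utf-8"], str)
```

The text exists once; the encodings are just different ways of writing it
down as bytes.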
> It's worked so far, and it _may_ be
Yes, it is hackish. And if you write code that depends on a particular
default encoding of Python then this code is broken by design. It will
likely break on an installation with a different default encoding (a while
ago it took me a day or two to figure out such an error condition where a
co-worker used to set the default encoding to latin1 or utf-8...causing
several failures in my own sandbox).
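
A rough sketch of why such code breaks: the same bytes give three different
outcomes depending on which encoding is assumed. In Python 2 that assumption
is made silently through the default encoding; here it is passed explicitly
(Python 3 syntax, which is my assumption for the sake of a runnable example).
The helper name `decode_with` is hypothetical.

```python
# Bytes as they might arrive from a UTF-8 database: 'café' encoded as UTF-8.
raw = "caf\u00e9".encode("utf-8")

def decode_with(encoding):
    """Hypothetical helper: decode `raw` under an assumed encoding.

    Returns None when the bytes are not valid in that encoding.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

as_ascii = decode_with("ascii")     # fails: byte 0xc3 is outside ASCII
as_latin1 = decode_with("latin-1")  # no error, but the text is garbled
as_utf8 = decode_with("utf-8")      # the correct reading
```

The latin-1 case is the nasty one: nothing fails, the data is just silently
wrong on one installation and right on another.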
> As I said, it seems logical that there's a reason why it's not
> Unicode by default, but you're not helping any by just saying we should
> "fix our code".
Well, the code is yours :-)
And because the approaches for building clean applications in such a case
are well known:
- represent your data *internally* as unicode strings
(*NOT* as utf-X encoded byte strings)
- do all the processing internally on top of Python unicode strings
- convert all incoming string data from your input encoding to unicode
- convert all outgoing data from unicode to your output encoding
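
The four points above can be sketched roughly as follows. This is only an
illustration under assumptions, not code from any of the products mentioned
in this thread: it uses Python 3 syntax (`str` for the unicode type, `bytes`
for encoded strings), and the function names and the sample encodings are
hypothetical.

```python
def read_input(raw, input_encoding):
    """Boundary in (hypothetical): bytes -> text, using the source's encoding."""
    return raw.decode(input_encoding)

def process(text):
    """Internal processing (hypothetical): accepts text only, never bytes."""
    if not isinstance(text, str):
        raise TypeError("expected text, got %s" % type(text).__name__)
    return text.upper()

def write_output(text, output_encoding):
    """Boundary out (hypothetical): text -> bytes, in the consumer's encoding."""
    return text.encode(output_encoding)

incoming = b"Z\xfcrich"                   # 'Zürich' as Latin-1 bytes
text = read_input(incoming, "latin-1")    # decode once, at the edge
result = process(text)                    # everything inside works on text
outgoing = write_output(result, "utf-8")  # encode once, on the way out
```

The input and output encodings are free to differ because the code in the
middle never sees bytes at all.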
If you want to look at such a clean application, look at TXNG3. TXNG3 uses
unicode internally and enforces that all data is converted to unicode.
It never deals internally with standard Python strings but only with Python
unicode strings. Once again: never mix up some encoded byte-strings with