[Product-Developers] Re: Re: Re: UnicodeDecodeError

Andreas Jung lists at zopyx.com
Tue Mar 25 13:51:47 UTC 2008

--On 25 March 2008 10:06:18 -0300 Derek Broughton <news at pointerstop.ca> wrote:

> Andreas Jung wrote:
>> --On 24 March 2008 17:19:48 -0300 Derek Broughton
>> <news at pointerstop.ca> wrote:
>>>> What are you talking about? Python has nothing like a 'unicode'
>>>> default. Likely you're referring to sys.getdefaultencoding() which is
>>>> ascii by default.
>>> Why do you always have to be so confrontational?
>> Because part of the content of your postings is wrong
> Which part?  There's no need to be insulting - it's OBVIOUS that none of
> us know the right way to do this, and your best response is to tell us
> that.  Well thanks for nothing.

You should read my reply carefully. I told you what is considered best
practice when dealing with unicode in any kind of Python application -
not only Zope, not only Plone.

>> and the implicit
>> hint for changing the default encoding of Python is the wrong way and
>> will
> What do you mean "implicit"! I SAID I've changed the default BUT THERE
> clearer can I be?

How much clearer can I be: don't touch the default encoding - never ever.
And it is perfectly OK for Plone not to touch the default encoding. Implicit
charset conversions are evil, as I have written several times.

>> lead to other problems (see below). I am so confrontational because I
>> want people
>> to write and design clean code when it comes to unicode-awareness.
>> Unicode-awareness is a big problem within Python-based applications...but
>> you can get around it by doing it the right way.
>>> That is, of course, what I'm talking about - and I know perfectly well
>>> that it's NOT a unicode default.  If you read the posts, that would be
>>> clear.
>> You're mixing unicode encodings with the Python 'unicode' type.
> No, I'm not.

You are.
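
The distinction between the unicode type and an encoding can be sketched as
follows. This is only an illustration, written in Python 3 terms, where `str`
is the unicode type and `bytes` is an encoded byte string (in the Python 2 of
this thread the corresponding types are `unicode` and `str`). The sample value
and the variable names are made up.

```python
# Hypothetical sample value, used for illustration only.
text = "M\u00e4rz"                    # unicode text: 4 code points

# The same text under two different encodings gives two different
# byte strings. Neither byte string "is" unicode.
as_utf8 = text.encode("utf-8")        # 'ä' takes two bytes here
as_latin1 = text.encode("latin-1")    # 'ä' takes one byte here

# Decoding each byte string with its own encoding gives back the same text.
same_text = as_utf8.decode("utf-8") == as_latin1.decode("latin-1") == text

# Lengths of the text and of each encoded form.
lengths = (len(text), len(as_utf8), len(as_latin1))
```

Setting a default encoding only chooses which of these byte representations
gets assumed when nothing is stated explicitly; it does not turn byte strings
into unicode text.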

>>>> General rule #1: don't touch that. Rule #2: if you have the need
>>>> to touch the default encoding as a workaround: better fix your code
>>>> first.
>>> Funny, but it's not _my_ code that runs into problems - in fact it's
>>> some of yours.  SQLAlchemyDA won't read non-ascii data off my UTF-8
>>> postgres or Oracle databases.
>> SQLAlchemy has options to deal with different encodings. The Oracle client
>> libraries also provide environment variables for controlling the client
>> encoding (e.g. for an implicit conversion of the server side database
>> encoding into some client side encoding).
>>> I _have_ non-ascii strings in my data, I'm not going to change that.
>> We all have that. There is no problem building a clean application
>> dealing with various encodings at a time in a sane way
> Then why do I get an error any time I read data from a UTF-8 database
> (either Oracle or Postgres).  Try to be at least a little bit helpful.
>>> It
>>> seems to me that the only way to make Plone work with it is to set my
>>> default encoding to unicode.
>> Wrong again. You set an *encoding*. An *encoding* like 'utf-8' is NOT
>> unicode.
> There you go again.  From wiki:utf-8, "UTF-8 (8-bit UCS/Unicode
> Transformation Format) is a variable-length character encoding for
> Unicode."  I'm NOT talking about unicode strings.  I'm talking about ANY
> attempt to get data out of a Unicode-encoded database with SqlAlchemy.

create_engine() has options for dealing with unicode. If necessary, you
convert your data in the application. If you added another database
using a different encoding to your Plone application, setting the default
encoding would not help you, because you would suddenly have to deal with
utf-8 encoded strings and, e.g., latin-1 encoded strings -> disaster.
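
A minimal sketch of that disaster, assuming two hypothetical data sources that
deliver the same text as byte strings in different encodings. It is written in
Python 3 terms (`bytes` for encoded data, `str` for unicode text) and does not
use SQLAlchemy; the helper `decode_with_default` and the sample value are
invented for illustration.

```python
# Hypothetical sample data: the same text as delivered by two databases
# whose client encodings differ.
text = "Z\u00fcrich"
from_utf8_db = text.encode("utf-8")
from_latin1_db = text.encode("latin-1")


def decode_with_default(raw, default="utf-8"):
    """Decode raw bytes with one single, process-wide assumed encoding."""
    return raw.decode(default)


# The assumed default matches the first source, so this works.
ok = decode_with_default(from_utf8_db) == text

# The same default applied to the second source fails outright.
try:
    decode_with_default(from_latin1_db)
    failed = False
except UnicodeDecodeError:
    failed = True

# Assuming latin-1 instead never raises, but silently corrupts the
# UTF-8 data: every non-ASCII character turns into two wrong ones.
mojibake = decode_with_default(from_utf8_db, default="latin-1")
```

Whichever default is chosen, one of the two sources ends up either raising an
error or producing garbage, which is why the encoding has to be stated per
source rather than set globally.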

>>> It's worked so far, and it _may_ be
>>> hackish,
>> Yes, it is hackish. And if you write code that depends on a particular

Even if it is not your code, it's your application and you have the
code, so it's basically your code and you have the chance to fix it.

>>   As I said, it seems logical that there's a reason why it's not
>>> Unicode by default, but you're not helping any by just saying we should
>>> "fix our code".
>> Well, the code is yours :-)
> IT'S STILL NOT MY CODE. And don't think that putting a smiley on a
> statement makes it less insulting.

Calm down. If you still don't get the point that fiddling around with the
default encoding is the wrong way... not my problem. I hope you at least got
the point.

>> And because the approaches for building clean applications in such a case
>> are well known:
>>  - represent your data *internally* as unicode strings
>>    (*NOT* as utf-X encoded byte strings)
> That's not an option.  I don't control the SQL databases.

You control the client side and have enough options.

>>  - do all the processing internally on top of Python unicode strings
>>  - convert all incoming string data from your input encoding to unicode
>>  - convert all outgoing data from unicode to your output encoding
> Also not an option - it's not my code.

You have the application and you can adjust the layers if necessary. If
changing the encoding solves your issue, then change it to whatever you want.
But it is not a general solution for any kind of clean Plone application or
Plone deployment.
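
The approach listed in the quoted points (decode on input, work on unicode
internally, encode on output) can be sketched roughly like this. It is a
sketch in Python 3 terms, not code from Plone or SQLAlchemyDA; the function
names, the sample row and the chosen encodings are all assumptions made for
the example.

```python
def read_input(raw: bytes, encoding: str) -> str:
    """Input boundary: decode bytes using the encoding the source declares."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """Internal processing operates on unicode text only."""
    return text.upper()


def write_output(text: str, encoding: str) -> bytes:
    """Output boundary: encode to whatever the consumer expects."""
    return text.encode(encoding)


# Hypothetical row from a latin-1 source, delivered to a UTF-8 consumer.
raw_row = "caf\u00e9".encode("latin-1")
internal = process(read_input(raw_row, "latin-1"))
result = write_output(internal, "utf-8")
```

Because each boundary names its encoding explicitly, a second source with a
different encoding only needs a different argument to `read_input`, and the
Python default encoding never comes into play.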

