Comments (24)
Automation removed owner |
---|
from twisted.
@glyph commented |
---|
#!html
<pre>
+0. UTF-8 is an 8-bit encoding, which means it's not just ASCII. You
can encode NULs and so forth. Some backends expect a different
encoding and a different translation to unicode (pre-existing
authentication databases going against Oracle with JIS japanese
encoding, off the top of my head).
It's doable, and for 99% of the cases out there it won't make a
difference to client code, but there are still other considerations.
What if the Avatar ID is some encoding of an integer, and not a string?
I'd like to avoid doing this until somebody who really knows unicode
can tell us how.
</pre>
@radix commented |
---|
#!html
<pre>
How about the policy "it's up to the cred-checker [or
whichever bit is relevant] to accept unicode or not"? That's
basically how it is now, afaict?
</pre>
@radix commented |
---|
#!html
<pre>
well? this is "urgent", has the fix been decided? are we
just going to document that realms should accept both
unicode and regular strings?
</pre>
@glyph commented |
---|
#!html
<pre>
> My attempt is to firm up the interface *now* and *perhaps* loosen
> it *later* if and when we have a use case.
My goal is the same. My rationale for suggesting the encoding strategy
is to say to potential users, "This is the interface: str=>str. We are
not supporting unicode, and we're doing that on purpose. Encode it
UTF-8 if you must, because that at least looks like ASCII some of the
time. If you have a better idea for how this should work, let us know,
but in the meanwhile DON'T decide to return random junk like
EncodedUsername("HELLO", "latin-1") from your requestAvatarId in order
to support internationalization: return a string or your code cannot
possibly work with other peoples' realms."
> For example, do you think
> we should support unicode avatar ids in files? [in which case a utf-8
> thingy might be sane]
Yes.
> Do you think we should support unicode avatar ids
> from databases? [in which case it's better to work with opaque objects
> and do no encoding/decoding at all]
Yes, but the *way* we should support unicode from databases with our
current interface would be to encode to utf-8 on one side of the
interface and decode on the other. We don't have a clear idea of what a
good opaque object would be.
> What happens when a unicode conversion
> error happens when trying to see if an avatar id belongs to a checker?
> Do we treat it as user-not-found or as
> catastrophical-bug-argh-shut-down-connection?
Well, that's up to the checker, to some extent. If implemented
properly, "user not found". If not implemented properly,
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0:
ordinal not in range(128)" or some variation on that: this will still be
handled just fine by the checker and will not yield any particularly
sensitive information to the client.
> I have no answers for any of those. In my sole experience with a
> Hebrew/Arabic/English site, all the usernames [avatar IDs in cred-speak]
> are in plain ASCII [and I'm *pretty sure* it's not a database problem
> but a decision non-technical users made] So I'm pretty sure
> unicode-in-usernames is a) not an important issue b) a world of hurt.
> Hence, I would tend to discourage writing support for it until someone
> comes with a clear use-case ["I need Japanese username support. My
> users hate UTF-8 because it was invented by white people. I have a
> colon-separated file with usernames in SHIFT-JIS and passwords in
> ASCII. How do I use cred?" is a somewhat tongue-in-cheek but not
> *entirely* unrealistic example, and I'd hate to tell this guy "well, we made
> some decisions about unicode incompatible with your needs. Nobody really
> uses unicode though, so let's try breaking unicode compatibility and
> see what happens."]
SHIFT-JIS encoded text by itself can easily be brought back and forth to
unicode, no? Doesn't the problem with "white people invented it" only
arise when you *mix* encodings? e.g. some BIG5 and some JIS on the same
webpage?
In other words, just have him have his checker know that the data
storage format is in JIS, but pass the username around encoded UTF-8
when it goes to the realm. Any display software he's writing along with
this will have to store the region-encoding hint along with their avatar
so that when he adds korean support, it will know which usernames came
from korean-encoded chinese vs. japanese-encoded kanji, but the extra
encode/decode/encode step will still go through unicode on its way
through the avatar and not cause problems or lose information.
Also, if I'm wrong (having never been *directly* involved with this kind
of asian-language insanity, I'm sure my understanding is at least
partially flawed) let's say he has to have special knowledge in his
realm of his checker. It's not the end of the world. With such an
unusual use-case, it is unlikely he will require integration with other
peoples' cred software, but *even if he does*, if he just has a wacky
encoding scheme, the sysadmin can just do a little work in the realm's
storage layer to make sure that it matches up with the peculiarities of
his encoding, manually running some scripts to go from SHIFT-JIS to
UTF-8 if necessary. This is, after all, what sysadmins do :).
*BUT*, this only works if we encourage some sanity, in that we do not say
"we don't know how unicode should work at all, just do whatever you
want". Saying that encourages anyone with a unicode-ish use case, even
someone who knows considerably *less* about the potential problems that
entails (think newbie ex-java programmer here), to come
up with a whole secondary framework for username encodings, along with
self-hashing subclasses of string or other insanity, rather than just
adhering to this simple convention, because it is "cleaner" not to have
to call .encode or .decode in their application logic.
Hopefully this is clear. This suggestion is intended to preserve the
existing interface in all instances where it can be preserved, and to
give some boundaries for people who *THINK* their use-case is not supported.
Another thing that is probably going to make this discussion moot is the
emergence of UID/GIDs in just about every system I've been writing that
uses cred. It seems that a very likely pattern is that every user has a
numerical ID (RDBMS primary key, UNIX uid, storq storage ID, ZODB/cog
oid) and you should not even use usernames as avatar IDs at all if you
can avoid it.
</pre>
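A minimal sketch of the convention described in the comment above, written for Python 3, where the 8-bit `str` of this discussion corresponds to `bytes`. The names `UserNotFound`, `USER_FILE_LINES`, and `request_avatar_id` are hypothetical stand-ins and not Twisted APIs. The assumption is that the checker alone knows its storage is Shift-JIS, hands a UTF-8 avatar ID to the realm, and treats undecodable input as "user not found".

```python
class UserNotFound(Exception):
    """Hypothetical error; a real checker would signal a failed login its own way."""


# Hypothetical colon-separated storage: Shift-JIS usernames, ASCII passwords.
USER_FILE_LINES = ["山田:secret".encode("shift_jis")]


def request_avatar_id(username: bytes, password: bytes) -> bytes:
    """Return a UTF-8 avatar ID for Shift-JIS credentials, or raise UserNotFound."""
    try:
        # Going through unicode validates the bytes against the storage encoding.
        name = username.decode("shift_jis")
    except UnicodeDecodeError:
        # A conversion error is reported as an unknown user, not as a crash.
        raise UserNotFound(repr(username)) from None
    for line in USER_FILE_LINES:
        stored_name, _, stored_password = line.partition(b":")
        if stored_name == username and stored_password == password:
            # Only the checker knows about Shift-JIS; the realm sees UTF-8.
            return name.encode("utf-8")
    raise UserNotFound(repr(username))
```

The realm can then call `.decode("utf-8")` on the avatar ID without knowing which encoding the checker's storage used.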
@glyph commented |
---|
#!html
<pre>
It's a doc bug, at least.
</pre>
@glyph commented |
---|
#!html
<pre>
Okay, it's clear nobody is totally sure how to do this correctly, so I'm
removing the release1.1 tag. It's still a doc bug, and we need to find someone
who has real unicode-login-name use cases and be sure that the solution I've
outlined below works. But I can't see why it wouldn't.
</pre>
@radix commented |
---|
#!html
<pre>
+1 for allowing unicode. What's so hard about encoding an
avatar ID into utf-8 for backends?
What reasons are there to not allow it?
</pre>
@itamarst commented |
---|
#!html
<pre>
To clarify - credentialcheckers generate an avatar id, which
is passed to realm. We need to make sure realms work with
*all* credential checkers, and if most checkers generate
strings and the realm assumes this, and then admin changes
to a checker that produces unicode, the realm will *break*.
So, at the minimum we need to require realms to support
unicode, which we do not do at the moment.
</pre>
@radix commented |
---|
#!html
<pre>
So if I understand correctly, if we decide to say "avatar
IDs are unicode", then only the credentials-checker has to
care about encoding or decoding the unicode (iff the
backend/storage mechanism it uses doesn't natively support
unicode). When is this a problem?
</pre>
@itamarst commented |
---|
#!html
<pre>
If we document that avatar ids can be both unicode and
8-bit, we should be ok, since all realms will be able to
deal with both by downgrading or upgrading, depending on how
their storage works. So if we ever decide to restrict it to
unicode only it will still work.
</pre>
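One way a realm could accept both kinds of avatar ID, sketched for Python 3 (where the thread's 8-bit `str` is `bytes` and `unicode` is `str`). `AVATARS`, `storage_key`, and `lookup` are hypothetical names; the sketch assumes that 8-bit IDs are UTF-8 encoded, which is only one possible convention.

```python
# Hypothetical realm storage keyed by text usernames.
AVATARS = {"bob": "Bob's avatar", "山田": "Yamada's avatar"}


def storage_key(avatar_id):
    """Upgrade an 8-bit avatar ID to text; pass text IDs through unchanged."""
    if isinstance(avatar_id, bytes):
        # Assumes 8-bit IDs are UTF-8; raises UnicodeDecodeError otherwise.
        return avatar_id.decode("utf-8")
    return avatar_id


def lookup(avatar_id):
    """Find the avatar for an ID of either type."""
    return AVATARS[storage_key(avatar_id)]
```

Without the normalization step, `AVATARS[b"bob"]` raises `KeyError` in Python 3, which is the kind of breakage that swapping in a checker with a different ID type would cause.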
@radix commented |
---|
#!html
<pre>
this is obviously not urgent
</pre>
@exarkun commented |
---|
#!html
<pre>
http://www.ietf.org/internet-drafts/draft-ietf-sasl-saslprep-03.txt
has something to say about this.
</pre>
@glyph commented |
---|
#!html
<pre>
Ahem.
"It is not intended to be used for to prepare identities
which are not simple user names (e.g., distinguished names
and domain names). Nor is the profile intended to be used
for simple user names which require different handling.
Protocols (or applications of those protocols) which have
application-specific identity forms and/or comparison
algorithms should use mechanisms specifically designed for
these forms and algorithms."
I don't understand what that spec is trying to say.
How about this - for comparison and such, we will always
call 'credStringValue.decode("utf-8")'. This will disallow
non-ASCII characters in non-unicode strings, but will still
allow unicode strings.
</pre>
@glyph commented |
---|
#!html
<pre>
Plus, we should get this in for release 1.1.
</pre>
@glyph commented |
---|
#!html
<pre>
Plus I meant ".encode", not ".decode"
</pre>
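A sketch of the comparison rule proposed above, with `.encode` as corrected. In Python 2, calling `.encode("utf-8")` on an 8-bit `str` implicitly decoded it as ASCII first, which is what rejected non-ASCII bytes; the Python 3 version below makes that step explicit. `comparison_key` is a hypothetical helper name.

```python
def comparison_key(cred_value):
    """Return UTF-8 bytes suitable for comparing avatar IDs of either type."""
    if isinstance(cred_value, bytes):
        # Explicit form of Python 2's implicit ASCII decode: 8-bit strings
        # containing non-ASCII bytes raise UnicodeDecodeError here.
        cred_value = cred_value.decode("ascii")
    # Text (unicode) values of any content are allowed.
    return cred_value.encode("utf-8")
```

With this rule, `b"bob"` and `"bob"` compare equal through their keys, while an 8-bit value such as `b"\xff"` is refused rather than silently compared.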
@moshez commented |
---|
#!html
<pre>
Is this really a bug?
I'm not sure...
</pre>
@moshez commented |
---|
#!html
<pre>
Should we just document that "currently, unicode IDs are not supported -- if
you have a use case, please explain it in a bug report" and close this? I'm
loath to add any more code, or *EVEN DOCUMENT AN APPROACH* if we have nfi
what we are talking about. I'd feel much safer supporting stuff with a use
case in mind [even if the support comes to documenting stuff].
</pre>
@moshez commented |
---|
#!html
<pre>
Here's a patch to document the non-supportingness of unicode strings
Index: doc/howto/cred.xhtml
===================================================================
RCS file: /cvs/Twisted/doc/howto/cred.xhtml,v
retrieving revision 1.5
diff -u -r1.5 cred.xhtml
--- doc/howto/cred.xhtml 17 Oct 2003 04:46:19 -0000 1.5
+++ doc/howto/cred.xhtml 19 Oct 2003 12:21:43 -0000
@@ -128,6 +128,12 @@
<p>This method will typically be called from 'Portal.login'. The avatarId
is the one returned by a CredentialChecker.</p>
+<div class="note">
+Avatars, currently, can only be strings. Passing unicode strings around,
+in particular, is <em>not</em> supported by the infrastructure. If you
+find a need for unicode usernames, please file a bug with your specific
+use-case.</div>
+
<p>The important thing to realize about this method is that if it is being
called, <em>the user has already authenticated</em>. Therefore, if possible,
the Realm should create a new user if one does not already exist
</pre>
@glyph commented |
---|
#!html
<pre>
It's not that I have no idea whatsoever; I just don't know that this is a
panacea. Python's encoding support is very well done, so it's not like
we're designing from scratch either.
Considering that this approach will continue to work even if we firm up
the spec so that it's no longer really necessary, I'd still like to
suggest it, rather than having folks who *really* have NFI what they're
talking about come up with some cockeyed idea where they just have
magical realms that emit some other random instance object from
requestAvatarId rather than actually using this "workaround" for
conforming to the interface.
</pre>
@moshez commented |
---|
#!html
<pre>
I didn't understand a word you said.
Please attempt to be clearer.
My attempt is to firm up the interface *now* and *perhaps* loosen
it *later* if and when we have a use case. For example, do you think
we should support unicode avatar ids in files? [in which case a utf-8
thingy might be sane] Do you think we should support unicode avatar ids
from databases? [in which case it's better to work with opaque objects
and do no encoding/decoding at all] What happens when a unicode conversion
error happens when trying to see if an avatar id belongs to a checker?
Do we treat it as user-not-found or as
catastrophical-bug-argh-shut-down-connection?
I have no answers for any of those. In my sole experience with a
Hebrew/Arabic/English site, all the usernames [avatar IDs in cred-speak]
are in plain ASCII [and I'm *pretty sure* it's not a database problem
but a decision non-technical users made] So I'm pretty sure
unicode-in-usernames is a) not an important issue b) a world of hurt.
Hence, I would tend to discourage writing support for it until someone
comes with a clear use-case ["I need Japanese username support. My
users hate UTF-8 because it was invented by white people. I have a
colon-separated file with usernames in SHIFT-JIS and passwords in
ASCII. How do I use cred?" is a somewhat tongue-in-cheek but not
*entirely* unrealistic example, and I'd hate to tell this guy "well, we made
some decisions about unicode incompatible with your needs. Nobody really
uses unicode though, so let's try breaking unicode compatibility and
see what happens."]
</pre>
@moshez commented |
---|
#!html
<pre>
<moshez> glyph: transporting SHIFT_JIS correctly across unicode is a fairly
non-trivial task
<glyph> moshez: craptastic
<glyph> moshez: python's JIS encodings won't do it for you?
<moshez> that's why I chose SHIFT-JIS
<moshez> glyph: my understanding is that JIS->Unicode is a political issue
rife with difficulties centering around the difference between lots
of subtle concepts I've no idea about like the difference between a
character and a code point
<glyph> moshez: well, my point is, there is *SOME* way to encode what you want
as a string
<glyph> moshez: so the *convention* should be UTF-8
<glyph> if you can't do UTF-8, well, that sucks, but it's just a convention
anyway
<glyph> moshez: okay, but are we in agreement?
<moshez> glyph: well, I still dislike recommending a work-around [use utf-8]
without a clear view of the implication
<glyph> moshez: I think we've demonstrated that we have a clear view of 90%
of the implications
<moshez> glyph: I prefer "bug us with a use case"
<glyph> moshez: they won't
<moshez> glyph: so do you want to formulate a new note, and check it in?
<glyph> moshez: OK. I'll add something to the documentation tonight.
</pre>
@moshez commented |
---|
#!html
<pre>
Adding proposed formulation and marking patch.
Index: doc/howto/cred.xhtml
===================================================================
RCS file: /cvs/Twisted/doc/howto/cred.xhtml,v
retrieving revision 1.5
diff -u -r1.5 cred.xhtml
--- doc/howto/cred.xhtml 17 Oct 2003 04:46:19 -0000 1.5
+++ doc/howto/cred.xhtml 20 Oct 2003 16:42:39 -0000
@@ -128,6 +128,12 @@
<p>This method will typically be called from 'Portal.login'. The avatarId
is the one returned by a CredentialChecker.</p>
+<div class="note">
+Note that <code>avatarId</code> must always be a string. In particular,
+do not use unicode strings. If internationalized support is needed,
+it is recommended to use UTF-8, and take care of decoding in the realm.
+</div>
+
<p>The important thing to realize about this method is that if it is being
called, <em>the user has already authenticated</em>. Therefore, if possible,
the Realm should create a new user if one does not already exist
</pre>
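What "take care of decoding in the realm" might look like, as a Python 3 sketch in which the note's 8-bit string is `bytes`. `display_name` is a hypothetical helper, not part of Twisted; it assumes the checker followed the UTF-8 recommendation.

```python
def display_name(avatar_id: bytes) -> str:
    """Decode a UTF-8 avatar ID for display, rejecting non-8-bit IDs."""
    if not isinstance(avatar_id, bytes):
        # The note requires 8-bit strings; unicode IDs are refused outright.
        raise TypeError(
            "avatarId must be an 8-bit string, got %s" % type(avatar_id).__name__
        )
    return avatar_id.decode("utf-8")
```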
@moshez commented |
---|
#!html
<pre>
Modified files:
Twisted/doc/howto/cred.xhtml 1.6 1.7
Log message:
document internationalization suckage
Fixed.
</pre>