NLS: update handling of Unicode

This patch (as1239) updates the kernel's treatment of Unicode.  The
character-set conversion routines are well behind the current state of
the Unicode specification: They don't recognize the existence of code
points beyond plane 0 or of surrogate pairs in the UTF-16 encoding.

The old wchar_t 16-bit type is retained because it's still used in
lots of places.  This shouldn't cause any new problems; if a
conversion now results in an invalid 16-bit code then before it must
have yielded an undefined code.

Difficult-to-read names like "utf_mbstowcs" are replaced with more
transparent names like "utf8s_to_utf16s" and the ordering of the
parameters is rationalized (buffer lengths come immediate after the
pointers they refer to, and the inputs precede the outputs).
Fortunately the low-level conversion routines are used in only a few
places; the interfaces to the higher-level uni2char and char2uni
methods have been left unchanged.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This commit is contained in:
Alan Stern
2009-04-30 10:08:18 -04:00
committed by Greg Kroah-Hartman
父節點 a853a3d4eb
當前提交 74675a5850
共有 9 個文件被更改,包括 183 次插入138 次删除

查看文件

@@ -15,7 +15,11 @@ static int uni2char(wchar_t uni, unsigned char *out, int boundlen)
{
int n;
if ( (n = utf8_wctomb(out, uni, boundlen)) == -1) {
if (boundlen <= 0)
return -ENAMETOOLONG;
n = utf32_to_utf8(uni, out, boundlen);
if (n < 0) {
*out = '?';
return -EINVAL;
}
@@ -25,11 +29,14 @@ static int uni2char(wchar_t uni, unsigned char *out, int boundlen)
static int char2uni(const unsigned char *rawstring, int boundlen, wchar_t *uni)
{
int n;
unicode_t u;
if ( (n = utf8_mbtowc(uni, rawstring, boundlen)) == -1) {
n = utf8_to_utf32(rawstring, boundlen, &u);
if (n < 0 || u > MAX_WCHAR_T) {
*uni = 0x003f; /* ? */
n = -EINVAL;
return -EINVAL;
}
*uni = (wchar_t) u;
return n;
}