932154 – x11-terms/xterm-390 breaks display of german "ß"

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it

Reported:	2024-05-18 13:02 UTC by Klaus Ethgen
Modified:	2024-05-23 01:39 UTC
CC List:	1 user

Description Klaus Ethgen 2024-05-18 13:02:41 UTC

When updating x11-terms/xterm from 388 to 390, the German letter "ß" is displayed as "_" (underscore). Going back to 388 solves the problem.

Reproducible: Always

Steps to Reproduce:
1. Install x11-terms/xterm-390
2. Try to type or display ß
Actual Results:  
_

Expected Results:  
ß

I use the locale de_DE (so Latin-1, not UTF-8).
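
For anyone without a German keyboard, the display half of the report can be exercised by writing the raw Latin-1 bytes to the terminal. The following is a minimal sketch, assuming xterm was started under a Latin-1 locale such as de_DE with UTF-8 mode off; the helper name `latin1_test_line` is hypothetical and not part of any tool mentioned in this bug.

```python
import sys

# Characters discussed in this report: three umlauts plus the sharp s.
TEST_CHARS = "äöüß"


def latin1_test_line(chars: str = TEST_CHARS) -> bytes:
    """Return the given characters as single-byte ISO-8859-1, plus a newline."""
    return chars.encode("latin-1") + b"\n"


if __name__ == "__main__":
    # Write bytes directly so Python's own locale handling cannot re-encode them.
    sys.stdout.buffer.write(latin1_test_line())
    sys.stdout.buffer.flush()
```

Run inside the affected terminal, the line should read äöüß; per the report above, xterm-390 would show the last character as an underscore.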

Comment 1 Sam James archtester 2024-05-18 13:05:52 UTC

Could you try 391?

Comment 2 Klaus Ethgen 2024-05-18 13:14:42 UTC

Sure, I just tried it, and the bug is still there.

However, I noticed that the copy & paste bug is fixed. (I didn't mention it here, but umlauts were double-encoded with 390.)
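
As a general illustration of what "double encoded" usually means (not a diagnosis of what xterm-390 actually did internally), the sketch below encodes a character as UTF-8, misreads those bytes as Latin-1, and encodes the result as UTF-8 again. The function name `double_encode` is hypothetical.

```python
def double_encode(text: str) -> bytes:
    """Simulate a typical double encoding: UTF-8 bytes misread as Latin-1, then re-encoded."""
    utf8_bytes = text.encode("utf-8")         # correct UTF-8, e.g. b"\xc3\xa4" for "ä"
    misread = utf8_bytes.decode("latin-1")    # each byte wrongly treated as one character
    return misread.encode("utf-8")            # encoded a second time


if __name__ == "__main__":
    for ch in "äöüß":
        print(ch, double_encode(ch).hex(" "))
```

A single umlaut ends up as four bytes instead of two, which a UTF-8 reader shows as two unrelated characters.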

Comment 3 Sam James archtester 2024-05-19 13:41:33 UTC

CCing upstream maintainer for advice.

Comment 4 Thomas Dickey 2024-05-19 14:36:12 UTC

sounds like this fix:

https://github.com/ThomasDickey/xterm-snapshots/commit/565644dd4a77ecbac2a29f2b5d00fbc1d51bc34a

(391 is current, and 392 is "soon", to fix a different regression)

Comment 5 Thomas Dickey 2024-05-19 14:39:12 UTC

...though I see now the followup comment about copy/paste.

I'll take a look, though I've no German keyboard
(and will have to imitate the events somehow).

Comment 6 Klaus Ethgen 2024-05-19 16:17:24 UTC

I already tested 391. The patch that is in 391 does fix something else I had seen with 390 (but did not report here).

But the "ß" is still gone.

Comment 7 Klaus Ethgen 2024-05-19 16:34:57 UTC

I bisected xterm using that repository. The commit that introduces the bug is 36d1901c96261a257cca22c75cbcbbd089372c1e (xterm-389h).

Comment 8 Thomas Dickey 2024-05-19 19:36:41 UTC

The relevant change in 389h appears to be this:

    amend UPSS change from patch #389, fixing a regression in VT100/VT220 character sets.

UPSS is basically a choice between Latin-1 or DEC's multinational character set. The latter conflicts with Unicode, and in 389-391, I've been ironing out problems in this area.  Last week, in reviewing this stuff (with the test-script used for #389) I did notice what might be _this_ bug and that it predated #391.  Fixing the test discrepancy (without reintroducing the other regression) was on my to-do list...

Comment 9 Klaus Ethgen 2024-05-19 20:57:51 UTC

I got lost in the implementation of the charset selection.

As I see it, there is a menu switch that lets you select (or deselect) UTF-8. (On my side, it is not switched on.)

There is also the locale, which selects a definite charset.

With Latin-1, all characters from a1 to ff are in use. Most of them overlap with DEC.

Can you tell me what the difference is between ä, ö, or ü and ß? (I also did not check other characters like ½, © or », as I seldom use them.)

Comment 10 Thomas Dickey 2024-05-19 21:27:32 UTC

Yes, it's complicated.  "ß" is 223 (0xdf) in both Unicode and ISO-8859-1.
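
The shared code point can be checked directly. A small sketch (the helper `describe` is hypothetical) that also shows where the two worlds diverge: the code point is the same, but the UTF-8 byte sequence is not the single byte that Latin-1 uses.

```python
from typing import Tuple


def describe(ch: str) -> Tuple[int, bytes, bytes]:
    """Return (Unicode code point, ISO-8859-1 bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("latin-1"), ch.encode("utf-8")


if __name__ == "__main__":
    for ch in "äöüß":
        code_point, latin1, utf8 = describe(ch)
        print(f"{ch}: U+{code_point:04X}  latin-1={latin1.hex()}  utf-8={utf8.hex(' ')}")
```

All four characters sit in the upper half of Latin-1 (0xA0 to 0xFF), so nothing in the encoding itself singles out "ß".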

However, the ISO-8859-1 encoding lets one assign that or other character
sets (such as ASCII) to 4 slots (G0, G1, G2, G3) and assign those to the
actual display (GL and GR).

The character sets include DEC-specific as well as several ISO-8859-x's
(about 20 total).
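
The slot mechanism described above can be modelled in a few lines. This is a deliberately simplified sketch and not xterm's code: the class `CharsetModel`, the two charset functions, and the initial state (G0/G1 = ASCII, G2/G3 = Latin-1 upper half, GL = G0, GR = G2) are all assumptions chosen for illustration.

```python
from typing import Callable, List

# A charset maps a 7-bit position (0x20-0x7F) to a displayable character.
Charset = Callable[[int], str]


def ascii_set(code: int) -> str:
    return chr(code)


def latin1_supplement(code: int) -> str:
    # Upper half of ISO-8859-1: position 0x5F corresponds to byte 0xDF.
    return chr(code | 0x80)


class CharsetModel:
    """Toy model: four slots G0-G3, two of which are mapped to GL and GR."""

    def __init__(self) -> None:
        # Assumed initial state, for illustration only.
        self.g: List[Charset] = [ascii_set, ascii_set,
                                 latin1_supplement, latin1_supplement]
        self.gl = 0  # slot shown for bytes 0x20-0x7F
        self.gr = 2  # slot shown for bytes 0xA0-0xFF

    def designate(self, slot: int, charset: Charset) -> None:
        self.g[slot] = charset

    def decode_byte(self, byte: int) -> str:
        if 0x20 <= byte <= 0x7F:
            return self.g[self.gl](byte)
        if 0xA0 <= byte <= 0xFF:
            return self.g[self.gr](byte & 0x7F)  # drop the high bit
        raise ValueError("control bytes are outside this sketch")
```

With the assumed defaults, `CharsetModel().decode_byte(0xDF)` gives "ß". If the slot behind GR held ASCII instead, the same byte would come out as "_", because 0xDF with the high bit cleared is 0x5F. Whether that arithmetic is related to the underscore reported here is not established in this thread, and it would not by itself explain why ä, ö and ü were unaffected.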

In 389, I got almost all of those working - but there was the UPSS
(user-preferred supplemental set), which I'd overlooked (originally because
I didn't know enough...).  Fixing that's not straightforward, since there's
a lot of overlap in the program logic between Unicode versus the combination
of ASCII + Latin-1.

Looking at the diff between xterm-389g and xterm-389h, the problem area
is mostly in charproc.c (resetCharsets) and the change to charsets.c
(which probably looks simple - but the pitfall is that overlap - the
simple change there requires some other tweaks to make it all work).

Anyway, I'm looking into the UPSS tests, to see where to adjust.

Comment 11 Thomas Dickey 2024-05-22 23:37:05 UTC

I believe this is fixed in xterm patch #392.

Comment 12 Larry the Git Cow gentoo-dev 2024-05-23 01:36:41 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5379ee90f68182a24e598fe1765874c93a52be35

commit 5379ee90f68182a24e598fe1765874c93a52be35
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-05-23 00:39:37 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-05-23 01:35:17 +0000

    x11-terms/xterm: add 392
    
    Closes: https://bugs.gentoo.org/932154
    Signed-off-by: Sam James <sam@gentoo.org>

 x11-terms/xterm/Manifest         |   2 +
 x11-terms/xterm/xterm-392.ebuild | 110 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

Comment 13 Sam James archtester 2024-05-23 01:39:10 UTC

Thank you, Thomas! Klaus, let us know if there are further issues.