There is lesspipe.sh set up which filters html through links to present the formatted page rather than the source. This by itself would be a nice feature if it worked. However, it fails in some cases.

First, links does not understand utf-8, so in a utf-8 locale the best approximation you can use is 7bit ascii. This is quite limited and cannot display much of the content that would otherwise be visible in a utf-8 locale. Note that you may have to manually set up links to display 7bit ascii only; the default is probably something like latin1, which would produce garbage and display even fewer characters than plain 7bit ascii.

Also, the page may be in the encoding of the current locale, which is often the case for locally created pages. However, links would by default interpret it as latin1 again unless there is the correct meta element. This again hides content that would otherwise be visible without the filter. Note that meta elements are not needed to specify the encoding if the server is set up to send the correct encoding header.

Reproducible: Always

Steps to Reproduce:
In a utf-8 locale:
1. install less
2. install links
3. view a web page with 8bit characters in less

Actual Results:
page often garbled

Expected Results:
correct web page display when at least one of these conditions is met:
- the page specifies the encoding and the characters can be represented in the current locale
- the page is in the current locale

links 2.1_pre28-r1
less 394
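Concretely, this is a minimal sketch of what I am doing, assuming LESSOPEN points at lesspipe.sh the way the less ebuild sets it up here (the URL is just a placeholder for any page containing 8bit characters):

  # in a utf-8 locale, save any page that contains non-ASCII text
  wget -O page.html http://www.example.org/utf8-page.html
  # opening the file by name goes through the LESSOPEN filter,
  # i.e. lesspipe.sh -> links, and the non-ASCII text comes out garbled
  less page.html
  # reading from stdin bypasses the input filter, so the raw html is shown,
  # which at least displays the utf-8 characters themselves correctly
  cat page.html | less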
don't really know what you expect from less here ... get links or lynx to work with unicode
I would expect it not to use broken software unless I explicitly set it up to.
(In reply to comment #2)
> I would expect it not to use broken software unless I explicitly set it up to.

So install lynx instead; it handles unicode just fine.
The script uses links before lynx. I have lynx 2.8.6-r2 installed, and it is not much good at handling unicode either.

If I go to http://www.ruby-lang.org/ja and save the page, I can at least tell it is Japanese by looking at it with "cat page.html | less". "lynx page.html" yields very little text; it does not even hint that there is content that could be seen if things worked correctly, so it looks as if the page were broken rather than the viewer. This is much worse than links: links displays lots of garbage instead of the Japanese text, so it is not possible to tell what it says, but at least something is there.

When I save http://seznam.cz and view it with lynx, it again drops some characters. I do not see how that is better handling of unicode, or any handling of unicode at all. With links I see the page properly; apparently it has a conversion table for Latin characters with diacritics and converts them to ascii. Not ideal, since with the raw html I could see all the characters, but at least links picks them out of the html for me.

Generally, picking the text out of the html would be nice, but I do not see it working, so I would rather have the text with the tags than (almost) nothing.
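For anyone who wants to compare directly, something along these lines shows the difference (a rough sketch; page.html is a saved copy of one of the pages above, and I am assuming the -dump mode of both browsers behaves like their interactive rendering, which is presumably also what the lesspipe.sh filter relies on):

  # links: garbage where the Japanese text should be, but something is visibly there
  links -dump page.html | less
  # lynx: most of the non-ASCII text simply disappears
  lynx -dump page.html | less
  # no filter: raw html, but the utf-8 characters themselves display fine
  cat page.html | less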
get links/lynx fixed
oh, so you install a script that relies on functionality that has never been implemented in any of the three packages it uses, and it is not a bug in the script but rather in the packages it misuses?