Bug 184192 - sys-apps/less-394 cannot display some html
Summary: sys-apps/less-394 cannot display some html
Status: RESOLVED WONTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-04 14:03 UTC by Michal Suchanek
Modified: 2007-07-09 10:34 UTC (History)


Description Michal Suchanek 2007-07-04 14:03:29 UTC
lesspipe.sh is set up to filter html through links, so that less presents the formatted page rather than the source.
This by itself would be a nice feature if it worked. However, it fails in some cases.

First, links does not understand utf-8. Thus in a utf-8 locale the best approximation you can use is 7bit ascii. This is quite limited and cannot display much of the content that would otherwise be visible in a utf-8 locale.

Note that you may have to manually set up links to display 7bit ascii only. The default is probably something like latin1, which would produce garbage and display even fewer characters than plain 7bit ascii.
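
To illustrate the failure mode described above, here is a minimal Python sketch. It is not taken from links or lesspipe.sh; the sample word and variable names are chosen only for illustration. It shows what happens when utf-8 bytes are interpreted as latin1, compared with restricting the output to 7bit ascii.

```python
# Illustration only: utf-8 bytes misread as latin1 versus 7bit ascii output.
text = "\u010desky"  # the Czech word "česky", written with an escape for the first letter
utf8_bytes = text.encode("utf-8")

# Decoding as latin1 never raises an error, but each non-ascii character
# becomes two unrelated characters, which is the "garbage" on screen.
garbled = utf8_bytes.decode("latin-1")

# Restricting output to 7bit ascii loses the accented letter but keeps
# the rest of the word readable.
ascii_only = text.encode("ascii", errors="replace").decode("ascii")

print(repr(garbled))
print(ascii_only)
```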

Also, the page may be in the encoding of the current locale. This is often the case for locally created pages. However, links would by default interpret it as latin1 again unless the correct meta element is present. This would again hide content that would otherwise be visible without the filter.

Note that meta elements are not needed to specify the encoding if the server is set up to send the correct encoding header.
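
The order of precedence implied above (server header, then meta element, then the current locale) could be sketched roughly as follows. This is a hypothetical helper, not what links or lesspipe.sh actually does; the name `pick_encoding`, its parameters, and the 4096-byte scan limit are assumptions made for the example.

```python
import locale
import re

# Matches <meta charset="..."> as well as the "charset=..." part of the
# older http-equiv form. A simplification, not a full HTML parser.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def pick_encoding(html_bytes, header_charset=None, locale_encoding=None):
    """Return the name of the encoding to decode html_bytes with."""
    # 1. An encoding sent by the server takes precedence.
    if header_charset:
        return header_charset
    # 2. Otherwise look for a meta element near the start of the page.
    match = _META_CHARSET.search(html_bytes[:4096])
    if match:
        return match.group(1).decode("ascii")
    # 3. Otherwise assume the page is in the encoding of the current locale.
    return locale_encoding or locale.getpreferredencoding(False)
```

For a locally saved page there is no server header, so only steps 2 and 3 apply; falling back to the locale instead of latin1 is the behaviour the report asks for.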

Reproducible: Always

Steps to Reproduce:
in utf-8 locale
1. install less
2. install links
3. view a web page with 8bit characters in less

Actual Results:  
page often garbled

Expected Results:  
correct web page display when at least one of these conditions is met:

- the page specifies the encoding and the characters can be represented in the current locale

- the page is in the current locale
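
The two conditions above can be expressed as a single check. The following is a sketch under assumptions: `representable_in_locale` is a hypothetical name, and the caller is assumed to already know the page encoding (or to pass the locale encoding for it when the page declares none).

```python
def representable_in_locale(html_bytes, page_encoding, locale_encoding):
    """Return True if the page decodes cleanly and fits the locale encoding."""
    try:
        text = html_bytes.decode(page_encoding)
        text.encode(locale_encoding)
    except (UnicodeError, LookupError):
        # UnicodeError: undecodable bytes or characters missing in the locale.
        # LookupError: an encoding name Python does not know.
        return False
    return True
```

A filter could use a check like this to decide whether formatting the page is safe, and otherwise pass the unmodified source through to less.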

links  2.1_pre28-r1
less 394
Comment 1 SpanKY gentoo-dev 2007-07-04 18:13:28 UTC
don't really know what you expect from less here ... get links or lynx to work with unicode
Comment 2 Michal Suchanek 2007-07-04 19:40:52 UTC
I would expect it not to use broken software unless I explicitly set it up to.
Comment 3 Jakub Moc (RETIRED) gentoo-dev 2007-07-04 20:07:59 UTC
(In reply to comment #2)
> I would expect it not to use broken software unless I explicitly set it up to.

So install lynx instead; it handles unicode just fine.
Comment 4 Michal Suchanek 2007-07-04 22:09:32 UTC
The script uses links before lynx. I have lynx 2.8.6-r2 installed.

Not that it is much good at handling unicode either.

If I go to http://www.ruby-lang.org/ja and save the page, I can at least tell it is Japanese by looking at it by "cat page.html | less".

"lynx page.html" yields very little text. It does not even hint that there is content that could be seen if things worked correctly, so it looks as if the page is broken rather than the viewer. This is much worse than links.

links displays lots of garbage instead of the Japanese text. It is not possible to tell what it is but at least something is there.

When I save http://seznam.cz and view it with lynx it again removes some characters. I do not see how this is better handling of unicode, or any handling of unicode at all.

With links I see the page properly, apparently it has some conversion table for Latin characters with diacritics, and converts them to ascii.
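
The conversion table links uses is not shown here. As a rough illustration of the same idea, the Python sketch below strips diacritics using Unicode decomposition, which is a different technique from a hand-written table; `strip_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(text):
    """Approximate accented Latin letters with their unaccented base letters."""
    # NFKD splits a letter such as U+010D into "c" plus a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters that have no decomposition, and non-Latin scripts such as Japanese, pass through unchanged, so this only helps for pages like the Czech one.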

Not ideal as I could see all the characters but it picks them from the html for me.

Generally picking the text from html would be nice but I do not see it working so I would rather have text with the tags than (almost) nothing.
Comment 5 SpanKY gentoo-dev 2007-07-06 20:29:01 UTC
get links/lynx fixed
Comment 6 Michal Suchanek 2007-07-09 10:34:50 UTC
oh, so you install a script that relies on functionality that has never been implemented in any of the three packages it uses, and it is not a bug in the script but rather in the packages it misuses?