There is lesspipe.sh set up which filters html through links to present the formatted page rather than the source. This by itself would be a nice feature if it worked. However, it fails in some cases.

First, links does not understand utf-8, so in a utf-8 locale the best approximation you can use is 7bit ascii. This is quite limited and cannot display much of the content that would otherwise be visible in a utf-8 locale. Note that you may have to manually set up links to display 7bit ascii only; the default is probably something like latin1, which would produce garbage and display even fewer characters than plain 7bit ascii.

Also, the page may be in the encoding of the current locale, which is often the case for locally created pages. However, links would by default interpret it as latin1 again unless there is the correct meta element. This again hides content that would otherwise be visible without the filter. Note that meta elements are not needed to specify the encoding if the server is set up to send the correct encoding header.

Reproducible: Always

Steps to Reproduce:
In a utf-8 locale:
1. install less
2. install links
3. view a web page with 8bit characters in less

Actual Results:
page often garbled

Expected Results:
correct web page display when at least one of these conditions is met:
- the page specifies the encoding and the characters can be represented in the current locale
- the page is in the current locale

links 2.1_pre28-r1
less 394
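Concretely, this is a minimal sketch of what I am doing, assuming LESSOPEN points at lesspipe.sh the way the less ebuild sets it up here (the URL is just a placeholder for any page containing 8bit characters):

  # in a utf-8 locale, save any page that contains non-ASCII text
  wget -O page.html http://www.example.org/utf8-page.html
  # opening the file by name goes through the LESSOPEN filter,
  # i.e. lesspipe.sh -> links, and the non-ASCII text comes out garbled
  less page.html
  # reading from stdin bypasses the input filter, so the raw html is shown,
  # which at least displays the utf-8 characters themselves correctly
  cat page.html | less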
don't really know what you expect from less here ... get links or lynx to work with unicode
I would expect it not to use broken software unless I explicitly set it up to.
(In reply to comment #2)
> I would expect it not to use broken software unless I explicitly set it up to.

So install lynx instead; it handles unicode just fine.
The script uses links before lynx. I have lynx 2.8.6-r2 installed, and it is not much good at handling unicode either.

If I go to http://www.ruby-lang.org/ja and save the page, I can at least tell it is Japanese by looking at it with "cat page.html | less". "lynx page.html" yields very little text; it does not even hint that there is content that could be seen if things worked correctly, so it looks as if the page were broken rather than the viewer. This is much worse than links: links displays lots of garbage instead of the Japanese text, so it is not possible to tell what it says, but at least something is there.

When I save http://seznam.cz and view it with lynx, it again drops some characters. I do not see how that is better handling of unicode, or any handling of unicode at all. With links I see the page properly; apparently it has a conversion table for Latin characters with diacritics and converts them to ascii. Not ideal, since with the raw html I could see all the characters, but at least links picks them out of the html for me.

Generally, picking the text out of the html would be nice, but I do not see it working, so I would rather have the text with the tags than (almost) nothing.
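For anyone who wants to compare directly, something along these lines shows the difference (a rough sketch; page.html is a saved copy of one of the pages above, and I am assuming the -dump mode of both browsers behaves like their interactive rendering, which is presumably also what the lesspipe.sh filter relies on):

  # links: garbage where the Japanese text should be, but something is visibly there
  links -dump page.html | less
  # lynx: most of the non-ASCII text simply disappears
  lynx -dump page.html | less
  # no filter: raw html, but the utf-8 characters themselves display fine
  cat page.html | less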
get links/lynx fixed
oh, so you install a script that relies on functionality that has never been implemented in any of the three packages it uses, and it is not a bug in the script but rather in the packages it misuses?