Bug 29063 - apache 2 can't handle charset settings from HTML files
Summary: apache 2 can't handle charset settings from HTML files
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High blocker
Assignee: Gentoo Linux bug wranglers
Reported: 2003-09-18 20:21 UTC by Clemens Schwaighofer
Modified: 2004-02-21 09:27 UTC (History)

Description Clemens Schwaighofer 2003-09-18 20:21:34 UTC
I have an HTML (or PHP) file with this content:

<html>
<head>
<title>TEST</title>
<meta http-equiv="Content-Type" content="text/html; charset=SHIFT_JIS">
</head>

<body>
&#12371;&#12428;&#12399;&#12471;&#12501;&#12488;JIS&#12391;&#12377;
</body>
</html>

If I view this file through Apache 1.3.28 I get correct Shift_JIS. The same file
in the same environment with Apache 2.0.47 is sent with the default ISO-8859-1
encoding and not the correct Shift_JIS.

I tried the same thing on Red Hat 8 with Apache 2.0.40, on Red Hat 9 with Apache
2.0.40, and on Red Hat 9 with Apache 2.0.47 from Rawhide. All give the same result.

I will also report this bug to Apache if nobody else has done so yet.

I think this shows why it was WRONG to put Apache 2 into stable (besides the fact
that the php apxs2 is still EXPERIMENTAL!!!!)

Reproducible: Always
Comment 1 Clemens Schwaighofer 2003-09-18 21:03:15 UTC
mea culpa

This is not a bug, this is a feature. After checking with lynx I saw that Apache itself sends a Content-Type header with charset ISO-8859-1. Why? Because it is set that way in Apache's conf files through the AddDefaultCharset line (a "security" feature, that sucks ... like the whole Apache 2 anyway ...)

Well, comment out that line if you use plain HTML, or comment it out anyway as it sucks.

But if you cannot, and you use PHP, you have to send the content type with the charset yourself using the header() function, e.g. header("Content-Type: text/html; charset=Shift_JIS");
Comment 2 SpanKY gentoo-dev 2003-09-19 00:16:31 UTC
so it's a matter of config file management
Comment 3 Clemens Schwaighofer 2003-09-19 16:59:45 UTC
Yes it is. Sadly it's not well documented and it is turned on by default, which will give big headaches (like it did me) to most people who think upgrading is more or less "seamless" ...
Comment 4 Felix Buenemann 2004-02-21 09:27:57 UTC
In the commonapache2.conf it says:
---
    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset ISO-8859-1
---
This is nonsense, as it implies that you are able to override the charset sent by the server with the one specified in the HTML document. But it is the other way around (the server's charset overrides the document's), so this is a really bad thing to do (TM).
IMHO the ebuild should patch commonapache2.conf to disable this by default.
And setting this just because of probably buggy clients IMHO doesn't make much sense, as it's merely a border case, while HTML files with different charsets are rather common.
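
The precedence described above can be sketched in Python. This is a simplified model, not how any particular browser is implemented: the function names are hypothetical, the meta tag detection is a rough regular expression, and other inputs that real browsers consider (such as a byte order mark) are ignored.

```python
import re
from email.message import Message

# Rough pattern for a charset declared inside a <meta> tag (assumption: this
# is good enough for simple pages like the test case in this bug).
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased; None if not present


def meta_charset(html_bytes):
    """Return the charset named in a <meta> tag near the top, or None."""
    match = META_RE.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes, fallback="iso-8859-1"):
    """HTTP header first, then the document's meta tag, then the fallback."""
    return header_charset(content_type) or meta_charset(html_bytes) or fallback


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=SHIFT_JIS"></head></html>')

# With AddDefaultCharset ISO-8859-1 the header carries a charset and wins.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# Without a charset in the header, the document's own declaration is used.
print(effective_charset("text/html", page))
```

The meta tag is only consulted when the header carries no charset, which is why a server-wide default silently breaks pages that declare a different charset themselves.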