Guillaume - 3 months ago
Java Question

How to set request encoding in Tomcat?

I have a problem in my Java webapp.

Here is the code in index.jsp:

<%@page contentType="text/html" pageEncoding="UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<% request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
%>

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>JSP Page</title>
</head>
<body>
<h1>Hello World!</h1>

<form action="index.jsp" method="get">
<input type="text" name="q"/>
</form>

Res: <%= request.getParameter("q") %>
</body>
</html>


When I capture a request with Wireshark, my browser sends this header:

GET /kjd/index.jsp?q=%C3%A9 HTTP/1.1\r\n
...
Accept-Charset: UTF-8,*\r\n


And the Tomcat server returns this:

Content-Type: text/html;charset=UTF-8\r\n


But if I send "é" (%C3%A9 in UTF-8) in my form, "Ã©" is displayed instead.

What I understand is that the browser sends an "é" encoded with UTF-8 (the %C3%A9).

But the server interprets this as ISO-8859-1, so %C3 is decoded as Ã and %A9 as ©, and the server then sends the response back encoded in UTF-8.

The code asks for requests to be decoded as UTF-8:

request.setCharacterEncoding("UTF-8");


But, if I send this url:

http://localhost:8080/kjd/index.jsp?q=%E9


the "%E9" is decoded with ISO-8859-1 and an "é" is displayed.

Why isn't this working? Why are requests decoded with ISO-8859-1?

I've tried it on Tomcat 6 and 7, and on Windows and Ubuntu.

Answer

The request.setCharacterEncoding("UTF-8"); call only sets the encoding of the request body (which is used by POST requests), not the encoding of the request URI (which is used by GET requests).

You need to set the URIEncoding attribute to UTF-8 in the <Connector> element of Tomcat's /conf/server.xml to get Tomcat to parse the request URI (and the query string) as UTF-8. On Tomcat 6 and 7, as used in the question, this indeed defaults to ISO-8859-1. See also the Tomcat HTTP Connector Documentation.

<Connector ... URIEncoding="UTF-8">

or, to ensure that the URI is parsed using the same encoding as the body:

<Connector ... useBodyEncodingForURI="true">
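
The symptom from the question can be reproduced outside Tomcat. The following is only a minimal sketch: it assumes that java.net.URLDecoder treats these particular inputs the way the connector's query-string decoding does, and the class name QueryDecodingDemo and its helper methods are made up for illustration. Results are printed as code points so that the console encoding does not blur the picture.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it mimics percent-decoding under a chosen charset
// and is not Tomcat code.
class QueryDecodingDemo {

    // Decodes a percent-encoded value using the named charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Renders each char of the string as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The browser sends "é" as the two UTF-8 bytes C3 A9.
        System.out.println(codePoints(decode("%C3%A9", "ISO-8859-1"))); // U+00C3 U+00A9, shown as "Ã©"
        System.out.println(codePoints(decode("%C3%A9", "UTF-8")));      // U+00E9, shown as "é"
        // The hand-typed URL carries the single ISO-8859-1 byte E9.
        System.out.println(codePoints(decode("%E9", "ISO-8859-1")));    // U+00E9, shown as "é"
    }
}
```

The first line corresponds to the ISO-8859-1 default, the second to what URIEncoding="UTF-8" is meant to achieve.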

Please get rid of those scriptlets in your JSP. The request.setCharacterEncoding("UTF-8"); call is made at the wrong moment: it has to happen before the first request parameter is read, so it would come too late once you properly use a servlet to process the request before forwarding to the JSP. A filter is a better place for it. The response.setCharacterEncoding("UTF-8"); part is already implicitly done by pageEncoding="UTF-8" at the top of the JSP.

I also strongly recommend replacing the old-fashioned <%= request.getParameter("q") %> scriptlet with EL ${param.q}, or with JSTL XML escaping ${fn:escapeXml(param.q)} to prevent XSS attacks.
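
To make the XSS point concrete, here is a rough sketch of what XML escaping does to a submitted value before it is written into the page. The EscapeDemo class and its escapeXml method are hypothetical stand-ins written for illustration; they are not the JSTL implementation, and the choice to return an empty string for null is an assumption.

```java
// Hypothetical helper illustrating XML escaping of user input;
// in a real JSP, ${fn:escapeXml(param.q)} does this job.
class EscapeDemo {

    // Replaces the five characters that are special in HTML/XML markup.
    static String escapeXml(String input) {
        if (input == null) {
            return ""; // assumption: treat a missing parameter as empty
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A value an attacker might pass as ?q=...
        String q = "<script>alert(1)</script>";
        // Prints: &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(escapeXml(q));
    }
}
```

With the raw scriptlet, the same value would be echoed as live markup and executed by the browser; once escaped, it is rendered as plain text.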