In the Javadoc, the difference is pretty clear:

void setCharacterEncoding(String charset) Sets the character encoding (MIME charset) of the response being sent to the client, for example UTF-8. If the character encoding has already been set by setContentType(java.lang.String) or setLocale(java.util.Locale), this method overrides it. Calling setContentType(java.lang.String) with the String text/html and calling this method with the String UTF-8 is equivalent to calling setContentType with the String text/html; charset=UTF-8.


void setContentType(String type) Sets the content type of the response being sent to the client, if the response has not been committed yet. The given content type may include a character encoding specification, for example text/html;charset=UTF-8.
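
The equivalence described in the Javadoc can be illustrated with a toy model. This is only a sketch: ResponseHeaderModel and contentTypeHeader are hypothetical names invented for this example, the class is not part of the Servlet API, and a real container handles many more cases (committed responses, setLocale, quoting).

```java
// Toy model (hypothetical, NOT the Servlet API) of how the content type
// and the character encoding combine into one Content-Type header value.
class ResponseHeaderModel {
    private String mimeType; // e.g. "text/html"
    private String charset;  // e.g. "UTF-8", or null when not set

    // Mirrors setContentType: the value may carry a charset parameter.
    void setContentType(String type) {
        int semi = type.indexOf(';');
        if (semi < 0) {
            mimeType = type.trim();
            return;
        }
        mimeType = type.substring(0, semi).trim();
        for (String param : type.substring(semi + 1).split(";")) {
            String p = param.trim();
            // Case-insensitive match on the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                charset = p.substring(8).trim();
            }
        }
    }

    // Mirrors setCharacterEncoding: overrides any charset set earlier.
    void setCharacterEncoding(String encoding) {
        charset = encoding;
    }

    // The header value this model would send.
    String contentTypeHeader() {
        if (mimeType == null) {
            return null;
        }
        return charset == null ? mimeType : mimeType + ";charset=" + charset;
    }
}
```

In this model, calling setContentType("text/html") followed by setCharacterEncoding("UTF-8") yields the same header value as a single setContentType("text/html; charset=UTF-8").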

Why

What is the standard character encoding for the request or response body?

If no character encoding is specified, the Servlet specification requires that ISO-8859-1 be used. The character encoding of the body of an HTTP message (request or response) is specified in the Content-Type header field. An example of such a header is Content-Type: text/html; charset=ISO-8859-1, which explicitly states that the default value (ISO-8859-1) is being used.
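
As an illustration of this rule, the following minimal sketch picks the charset out of a Content-Type value and falls back to ISO-8859-1 when none is given. The names ContentTypeCharset and charsetOf are made up for this example and are not part of the Servlet API; a real container parses the header more thoroughly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or ISO-8859-1 (the servlet default) when no charset parameter is present.
    // Charset.forName throws if the name is illegal or unsupported.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional double quotes around the value.
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // explicit charset
        System.out.println(charsetOf("text/html"));                // falls back to the default
    }
}
```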

Links: HTTP 1.1 Specification Section 3.7.1

The general rules above apply to servlets. The behavior of JSP pages is further defined by the JSP specification. Request character encoding is handled the same way, but response character encoding differs slightly. See chapter JSP.4.2, Response Character Encoding. For JSP pages in standard syntax, the default response charset is the usual ISO-8859-1; for pages in XML syntax, it is UTF-8.

Why does it have to be this way?

Everything described on this page comes down to the practical interpretation of a number of specifications. When working with Java servlets, the Java Servlet specification is the primary reference, but the Servlet specification itself relies on older specifications such as HTTP. Here are a few references before pointing out exactly where these items are located. A more detailed and complete list can be found on the Specifications page.

  1. The Java Servlet 4.0 Specification
  2. HTTP 1.1 Protocol: Message Syntax and Routing, HTTP 1.1 Protocol: Semantics and Content …
  3. URI Syntax
  4. ARPA Internet Text Messages
  5. HTML 4, HTML 5
Standard encoding for request and response bodies

See “Standard Encoding for POST” below.

Standard encoding for GET

The character set for HTTP query strings (the technical term for “GET parameters”) is defined in sections 2 and 2.1 of the URI Syntax specification. The character set is US-ASCII. Any character that does not map to US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax specification says that characters outside US-ASCII must be encoded using % escape sequences: each byte is encoded as a literal % followed by the two hexadecimal digits of its value. So a (US-ASCII 97 = 0x61) is equivalent to %61. Although the URI specification does not mandate a default encoding for percent-encoded bytes, it recommends UTF-8, especially for new URI schemes, and most modern user agents have settled on UTF-8 for percent-encoding URI characters.
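
The percent-encoding rule can be sketched in a few lines of plain Java. This is a simplified illustration, assuming UTF-8 as the byte encoding and treating only the RFC 3986 “unreserved” characters as safe; the class name PercentEncode is invented for this example.

```java
import java.nio.charset.StandardCharsets;

class PercentEncode {
    // Percent-encodes every byte of the UTF-8 form of s, except the
    // unreserved characters (letters, digits, '-', '.', '_', '~').
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean unreserved = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                // A literal % followed by two hexadecimal digits.
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b"));       // a%20b
        System.out.println(encode("caf\u00e9")); // caf%C3%A9 (U+00E9 is two bytes in UTF-8)
    }
}
```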

Some notes on URI character encoding:

  1. ISO-8859-1 and ASCII are compatible for character codes 0x20 to 0x7E, so they are often used interchangeably.
  2. Modern browsers encode URIs using UTF-8. Some browsers appear to use the encoding of the current page to encode URIs for links.
  3. HTML 4.0 recommends using UTF-8 to encode the query string.
  4. When in doubt, use POST for any data that you think might not survive a trip through the query string.
Standard encoding for POST

Older versions of the HTTP/1.1 specification (such as RFC 2616) stated that ISO-8859-1 is the default charset for text-based HTTP request and response bodies if no charset is indicated. Although RFC 7231 removed this default, the Servlet specification continues to apply it. Hence, the Servlet specification states that a POST request that does not indicate an encoding must be processed as ISO-8859-1, except for application/x-www-form-urlencoded, which by default should be interpreted as US-ASCII (because by definition it should contain only characters within the ASCII range).

Some notes on character encoding in POST request:

  1. Section 3.4.1 of RFC 2616 stated that recipients of an HTTP message must respect the character encoding specified by the sender in the Content-Type header if the encoding is supported. A missing charset allows the recipient to “guess” which encoding is appropriate.
  2. Most web browsers today do not specify the character set of a request, even when it is something other than ISO-8859-1. This seems to be in violation of the HTTP specification. Most web browsers appear to send a request body using the encoding of the page used to generate the POST (for example, the form element came from a page with a specific encoding … it is this encoding that is used to submit the POST data for this form).
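
The effect of a missing charset on a POST body can be sketched as follows. This is a simplified model, not container code: BodyDecoding and decodeBody are hypothetical names, and the fallback to ISO-8859-1 simply mirrors the servlet default described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BodyDecoding {
    // Decodes raw body bytes with the declared charset, or with
    // ISO-8859-1 when the request declared none (declared == null).
    static String decodeBody(byte[] body, Charset declared) {
        Charset cs = (declared != null) ? declared : StandardCharsets.ISO_8859_1;
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // A browser on a UTF-8 page sends U+00FC as the two bytes C3 BC.
        byte[] body = "\u00fc".getBytes(StandardCharsets.UTF_8);

        // Without a declared charset, each byte becomes its own character.
        String guessed = decodeBody(body, null);
        // With the charset declared, the original character is recovered.
        String correct = decodeBody(body, StandardCharsets.UTF_8);

        System.out.println(guessed.length()); // 2
        System.out.println(correct.length()); // 1
    }
}
```
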
Percent encoding for application/x-www-form-urlencoded

The HTML 4.01 specification states that percent-encoding of any non-alphanumeric characters in application/x-www-form-urlencoded (the default content type for HTML form submissions) should be done using US-ASCII byte sequences. However, HTML 5 changed this to use UTF-8 byte sequences, matching modern percent-encoding for URLs. Therefore, modern browsers percent-encode UTF-8 sequences when submitting forms using application/x-www-form-urlencoded.

The Servlet specification, however, requires servlet containers to interpret percent-encoded sequences in application/x-www-form-urlencoded as ISO-8859-1 by default. With the default configuration, the content is therefore corrupted by the charset mismatch. Tomcat can be reconfigured to avoid this.
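
The mismatch is easy to reproduce with the JDK's own java.net.URLDecoder, without any servlet container. The sketch below decodes the same percent-encoded form value twice; the class name FormDecodeMismatch and the sample value are assumptions made for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodeMismatch {
    // Thin wrapper that turns the checked exception into an unchecked one.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // What a modern browser sends for "caf" + U+00E9: UTF-8 bytes, percent-encoded.
        String sent = "caf%C3%A9";

        String asUtf8 = decode(sent, "UTF-8");        // 4 characters, as typed by the user
        String asLatin1 = decode(sent, "ISO-8859-1"); // 5 characters, U+00E9 became U+00C3 U+00A9

        System.out.println(asUtf8.length());   // 4
        System.out.println(asLatin1.length()); // 5
    }
}
```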

Section 3.1 of the ARPA Internet Text Messages specification states that headers are always in US-ASCII encoding. Anything outside of that must be encoded. For more information about query strings in URIs, see the section above.

Popular ServletRequest Methods

  • getAttribute

    Returns the value of the named attribute as an Object, or null if no attribute of the given name exists.

  • setAttribute

    Stores an attribute in this request. Attributes are reset between requests. This method is mostly …

  • getParameter

    Returns the value of a request parameter as a String, or null if the parameter does not exist. Request …

  • removeAttribute

    Removes an attribute from this request. This method is usually unnecessary because attributes only …

  • getRequestDispatcher

    Returns a RequestDispatcher object that acts as a wrapper for the resource located at the specified path.

  • getParameterNames

    Returns an Enumeration of String objects containing the names of the parameters contained in this request.

  • getCharacterEncoding

    Returns the name of the character encoding used in the body of this request. This method returns null …

  • getParameterMap

    Returns a java.util.Map of the parameters of this request. Request parameters are additional information …

  • getParameterValues

    Returns an array of String objects containing all of the values the given request parameter has, or …

  • getAttributeNames

    Returns an Enumeration containing the names of the attributes available to this request. This method …

  • getRemoteAddr

    Returns the Internet Protocol (IP) address of the client or last proxy that sent the request. For HT…

  • getInputStream

    Retrieves the body of the request as binary data using a ServletInputStream. Either this method or #…