Unicode and Character Sets in WebStack
--------------------------------------

Unicode text should be converted to the chosen character set (encoding) when
written to the response stream.

Classic Python strings are written directly to the response stream without
encoding.
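The distinction above can be sketched as follows. This is a minimal
illustration, assuming Python 3 (where str holds Unicode text and bytes plays
the role of the classic string). The helper name write_response, the
io.BytesIO object standing in for the response stream, and the UTF-8 default
are all assumptions made for the example; none of them is part of the WebStack
API.

```python
import io

def write_response(stream, data, encoding="utf-8"):
    """Write data to a binary stream, encoding only Unicode text.

    Hypothetical helper for illustration; not a WebStack function.
    """
    if isinstance(data, str):
        # Unicode text: convert to the chosen character set first.
        stream.write(data.encode(encoding))
    else:
        # Byte string: already encoded, so write it unchanged.
        stream.write(data)

stream = io.BytesIO()  # stands in for the response stream
write_response(stream, "caf\u00e9 ", encoding="iso-8859-1")
write_response(stream, b"caf\xc3\xa9")  # bytes pass through as-is
body = stream.getvalue()
```

Here the Unicode text is written as ISO-8859-1 bytes, while the byte string
keeps whatever encoding it already had, so mixing the two kinds of data can
produce inconsistently encoded output.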

Character Set Semantics in WebStack
-----------------------------------

Character sets (or encodings) are relevant in two areas:

* The encoding of output data.
* The processing of input data.

When producing HTML pages containing form fields and interpreting the values of
such fields from a request body, it is necessary to know...

* The character set used to encode the values sent by the browser. This is
  typically determined by...

* The character set used to encode the HTML page from which the field values
  originated.

It is therefore also necessary to remain consistent in the usage of character
sets when specifying content types. WebStack enforces the following rules:

* Where the request content type specifies a character set, this is used to
  decode the request body parameters unless explicitly overridden.

* Where the request content type does not specify a character set, a default
  character set is used to decode the request body parameters unless
  explicitly overridden.

* Where the response content type specifies a character set, this is used to
  encode Unicode response data (e.g. HTML pages).

* Where the response content type does not specify a character set, a default
  character set is used to encode Unicode response data (e.g. HTML pages).
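The four rules above can be sketched in a few lines. This is an illustration
only, written for Python 3 with the standard library; it is not WebStack's
implementation. The function names and the ISO-8859-1 fallback are assumptions,
since the text above does not say which default character set WebStack uses.

```python
from email.message import Message

# Assumed fallback; the default WebStack actually uses is not stated above.
DEFAULT_CHARSET = "iso-8859-1"

def charset_of(content_type, default=DEFAULT_CHARSET):
    """Return the charset parameter of a Content-Type value, or the default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default

def decode_parameter(raw, request_content_type, override=None):
    """Decode a raw request body parameter value (bytes) to Unicode text."""
    # An explicit override wins over whatever the content type declares.
    return raw.decode(override or charset_of(request_content_type))

def encode_response(text, response_content_type):
    """Encode Unicode response data according to the response content type."""
    return text.encode(charset_of(response_content_type))

value = decode_parameter(
    b"caf\xc3\xa9", "application/x-www-form-urlencoded; charset=UTF-8")
page = encode_response("caf\u00e9", "text/html")  # falls back to the default
```

Keeping the lookup in one place (charset_of) means that the request and
response sides apply the same fallback, which is the consistency the rules are
meant to guarantee.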

Restrictions in and Omissions from Standards
--------------------------------------------

The encoding of character sets such as UTF-16 in HTTP POST request body
messages of content/media type application/x-www-form-urlencoded is not
properly standardised. Therefore, it is highly recommended that UTF-8 be used
as an encoding should the various single-byte encodings (e.g. ISO-8859-1) not
cover the range of characters to be displayed and received.
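The recommendation above can be illustrated with a short sketch that uses
Python 3's urllib.parse module rather than WebStack itself. The field names
and values are invented for the example. It shows a form body encoded as UTF-8
surviving a round trip, and a single-byte encoding failing on a character
outside its range.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields: "Zoe" with a diaeresis, and a snowman character.
fields = {"name": "Zo\u00eb", "symbol": "\u2603"}

# UTF-8 can represent every field value as percent-encoded bytes.
body = urlencode(fields, encoding="utf-8")

# Decoding with the same character set recovers the original text.
decoded = parse_qs(body, encoding="utf-8", errors="strict")

# A single-byte character set cannot represent the snowman at all.
try:
    urlencode(fields, encoding="iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

After this runs, body holds only ASCII characters (percent-encoded UTF-8
bytes), decoded maps each field name to a list containing the original value,
and latin1_ok is False because ISO-8859-1 has no code for the snowman.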

Framework Behaviour
-------------------

The Java Servlet API imposes restrictions on decoding request body parameters
by stating that the character encoding (ServletRequest.setCharacterEncoding)
must be set before any reading of the request body is attempted.