1.1 --- a/docs/paths.html Thu Aug 25 21:44:47 2005 +0000
1.2 +++ b/docs/paths.html Thu Aug 25 21:45:47 2005 +0000
1.3 @@ -3,7 +3,6 @@
1.4
1.5 <title>URLs and Paths</title><meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
1.6 <link href="styles.css" rel="stylesheet" type="text/css" /></head>
1.7 -
1.8 <body>
1.9 <h1>URLs and Paths</h1>
1.10 <p>The URL at which your application shall appear is arguably the first
1.11 @@ -54,13 +53,31 @@
1.12 <dd>This gets the entire path of a resource including parameter
1.13 information (as described in <a href="parameters.html">"Request
1.14 Parameters and Uploads"</a>).<br />
1.15 -An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see <a href="encodings.html">"Character Encodings"</a> for more information.</dd>
1.16 +An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd>
1.17 <dt><code>get_path_without_query</code></dt>
1.18 <dd>This gets the entire path of a resource but without any parameter
1.19 -information.</dd>
1.20 +information.<br />
1.21 +
1.22 +An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd>
1.23 </dl>
1.24 </div>
1.25 +<p>To obtain the above path using the WebStack API, we can write the following code:</p>
1.26 +<pre>path = trans.get_path()</pre>
1.27 +<p>In practice, however, we should state the character encoding of the path explicitly. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>,
1.28 +some guesswork is required, but if we have decided to use UTF-8 as the
1.29 +encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
1.30 +<pre>path = trans.get_path("utf-8")<br />path = trans.get_path(self.encoding) # assuming a class/instance attribute defining such things centrally</pre>
1.31 +<p>In many applications such nuances are not particularly important, but consider the following URL:</p>
1.32 +<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>
1.33 +<p>Here, the URL includes non-ASCII characters which must be
1.34 +interpreted somehow. In this case, the "URL encoded" character values
1.35 +refer to ISO-8859-1 values and can be safely inspected as follows:</p>
1.36 +<pre>path = trans.get_path("iso-8859-1")</pre>
1.37 +<p>Specifying UTF-8, as shown earlier, will also work in this case, but only
1.38 +because the byte values E6, F8 and E5 do not form a valid UTF-8 sequence, and WebStack will use ISO-8859-1 as a "safe" default for character
1.39 +values it does not understand.</p>
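<p>The fallback behaviour described above can be illustrated outside WebStack. The following is only a minimal sketch using the Python 3 standard library; the function <code>decode_path</code> and its parameters are hypothetical and are not part of the WebStack API, and WebStack's own implementation may differ:</p>

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, encoding="utf-8", fallback="iso-8859-1"):
    """Decode a "URL encoded" path, falling back to a single-byte encoding.

    Hypothetical helper for illustration only; not a WebStack function.
    """
    # Turn each %xx sequence back into the byte value it stands for.
    raw_bytes = unquote_to_bytes(raw_path)
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this cannot fail.
        return raw_bytes.decode(fallback)


# The bytes E6 F8 E5 are not valid UTF-8, so the fallback is used here.
path = decode_path("/python/WebStack-%E6%F8%E5.html")
print(path)  # /python/WebStack-æøå.html
```

<p>A path whose "URL encoded" values really are UTF-8 is decoded without ever reaching the fallback, which is why an explicit UTF-8 request remains reasonable in this sketch.</p>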
1.40 <h2>Query Strings</h2>
1.41 +
1.42 <p>Sometimes, a "query string" will be provided as part of a URL; for
1.43 example:</p>
1.44 <pre>http://www.boddie.org.uk/application?param1=value1</pre>
1.45 @@ -70,6 +87,40 @@
1.46 inspection
1.47 is discussed in <a href="parameters.html">"Request Parameters and
1.48 Uploads"</a>.</p>
1.49 +<div class="WebStack">
1.50 +<h3>WebStack API - Getting Query Strings</h3>
1.51 +<p>WebStack provides a method to get only the query string from the URL:</p>
1.52 +<dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter
1.53 +information. Such information will be "URL encoded", meaning that
1.54 +certain characters will have the form <code>%xx</code> where <code>xx</code>
1.55 +is a two digit hexadecimal number referring to the byte value of the
1.56 +unencoded character - see below for discussion of this. </dd></dl>
1.57 +</div>
1.58 +<p>Note that unlike the path access methods, <code>get_query_string</code>
1.59 +does not accept an encoding as a parameter. Moreover, when retrieving a
1.60 +path including a query string, the encoding is not used to interpret
1.61 +"URL encoded" character values in the query string itself. Consider
1.62 +this example URL:</p>
1.63 +<pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre>
1.64 +<p>Upon requesting the path and the query string, certain differences should be noticeable:</p>
1.65 +<pre>trans.get_path("iso-8859-1") # returns /application-æ?var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-æ<br />trans.get_query_string() # returns var%F8=value%E5</pre>
1.66 +<p>One reason for this seemingly arbitrary distinction in treatment is
1.67 +the way certain servers present path information to WebStack - often
1.68 +the "URL encoded" information has been replaced by raw character values
1.69 +which must then be converted to Unicode characters. In contrast, most
1.70 +servers do not perform the same automatic conversion on the query
1.71 +string.</p>
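<p>The distinction between path and query string can be reproduced with the Python 3 standard library. This is a sketch under stated assumptions: <code>split_and_decode</code> is a hypothetical helper, not a WebStack method, and it merely imitates the results listed above by decoding the path portion while leaving the query string untouched:</p>

```python
from urllib.parse import unquote


def split_and_decode(raw_url_path, encoding):
    """Decode only the path portion; return the query string still encoded.

    Hypothetical helper for illustration only; not a WebStack function.
    """
    # Everything after the first "?" is treated as the query string.
    raw_path, _, raw_query = raw_url_path.partition("?")
    return unquote(raw_path, encoding=encoding), raw_query


path, query = split_and_decode("/application-%E6?var%F8=value%E5", "iso-8859-1")
print(path)   # /application-æ
print(query)  # var%F8=value%E5
```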
1.72 +<p>In fact, it may become impossible to properly interpret the query
1.73 +string if it is decoded prematurely; consider this example URL:</p>
1.74 +<pre>http://www.boddie.org.uk/application?a=%26b</pre>
1.75 +<p>If we were to just decode the query string and then extract the
1.76 +parameters/fields, the result would be two empty parameters with the
1.77 +names <code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter <code>a</code> with the value <code>&amp;b</code>.</p>
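<p>The effect of premature decoding can be demonstrated with the Python 3 standard library rather than WebStack itself. In this sketch, <code>parse_qs</code> stands in for whatever parameter extraction an application might perform, and the variable names are chosen purely for illustration:</p>

```python
from urllib.parse import parse_qs, unquote

raw_query = "a=%26b"

# Correct order: split into fields first, then decode each name and value.
correct = parse_qs(raw_query, keep_blank_values=True)

# Premature decoding: %26 becomes "&" and is then mistaken for a separator.
premature = parse_qs(unquote(raw_query), keep_blank_values=True)

print(correct)    # {'a': ['&b']}
print(premature)  # {'a': [''], 'b': ['']}
```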
1.78 +<h3>Conclusion</h3>
1.79 +<p>Regardless of these details, request parameters should always be inspected using the appropriate methods (see <a href="parameters.html">"Request Parameters and
1.80 +Uploads"</a>);
1.81 +direct access to the query string should be reserved for specialised
1.82 +situations such as the building of URLs for output.</p>
1.83 <h2>More About Paths</h2>
1.84 <ul>
1.85 <li><a href="path-info.html">Paths To and Within Applications</a></li>