2007-09-27 paulb [project @ 2007-09-27 17:48:43 by paulb] Tidied up the javadoc strings.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
  <title>URLs and Paths</title>
  <link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>URLs and Paths</h1>
<p>The URL at which your application appears is arguably the first
part of the application's user interface that any user will see. Remember
that a user of your application does not have to be a real person; in
fact, a user can be any of the following:</p>
<ul>
  <li>A real person entering the URL into a browser's address bar.</li>
  <li>A real person linking to your application by writing the URL in a
separate Web page.</li>
  <li>A program which has the URL defined within it and which may
manipulate the URL to perform certain kinds of operations.</li>
</ul>
<p>Some application developers have a fairly rigid view of what kind of
information a URL should contain and how it should be structured. In
this guide, we shall look at a number of different approaches.</p>
<h2>Interpreting Path Information</h2>
<p>A URL says where (on the Internet or on an intranet) your application
resides and which resource or service is being accessed. A typical URL
looks like this:</p>
<pre>http://www.boddie.org.uk/python/WebStack.html</pre>
<p>In an application the full URL, containing the address of the
machine on which it is running, is not always interesting. In the
WebStack API (and in other Web programming frameworks), we also talk
about "paths": a path is just the part of the
URL which refers to the resource or service, ignoring the actual
Internet address. The above example therefore has a path which looks like
this:</p>
<pre>/python/WebStack.html</pre>
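<p>As a rough illustration of the relationship between a URL and its path, the following sketch uses the Python 3 standard library rather than the WebStack API. The helper name <code>path_of</code> is hypothetical and exists only for this example:</p>

```python
from urllib.parse import urlsplit


def path_of(url):
    """Return only the path component of a URL, dropping scheme and host."""
    return urlsplit(url).path


# The example URL from the text above.
print(path_of("http://www.boddie.org.uk/python/WebStack.html"))
```

<p>Within a WebStack application the path is obtained from the transaction object instead, as described in the rest of this section.</p>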
<p>When writing a Web application, most of the time you just need to
concentrate on the path because the address doesn't usually tell you
anything you don't already know. What you need to do is to interpret the path
specified in the request in order to work out which resource or service
the user is trying to access.</p>
<div class="WebStack">
<h3>WebStack API - Path Methods in Transaction Objects</h3>
<p>WebStack provides the following transaction methods for inspecting
path information:</p>
<dl>
  <dt><code>get_path</code></dt>
  <dd>This gets the entire path of a resource including parameter
information (as described in <a href="parameters.html">"Request
Parameters and Uploads"</a>).<br />
An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd>
  <dt><code>get_path_without_query</code></dt>
  <dd>This gets the entire path of a resource but without any parameter
information.<br />
An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd>
  <dt><code>get_path_without_info</code></dt>
  <dd>This gets the entire path of a resource but without any parameter
information or any special "path info" (as described in <a href="path-info.html">"Paths To and Within Applications"</a>).
The result is more or less equivalent to the location where an
application has been "published", that is, the location of an application
in a server environment.<br />
An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd>
</dl>
</div>
<p>To obtain the above path using the WebStack API, we can write the following code:</p>
<pre>path = trans.get_path()</pre>
<p>Really, however, we should explicitly state the character encoding of the path. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>,
some guesswork is required, but if we have decided to use UTF-8 as the
encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
<pre>path = trans.get_path("utf-8")<br />path = trans.get_path(self.encoding) # assuming a class/instance attribute defining such things centrally</pre>
<p>In many applications such nuances are not particularly important, but consider the following URL:</p>
<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>
<p>Here, the URL includes non-ASCII characters which must be
interpreted somehow. In this case, the "URL encoded" character values
refer to ISO-8859-1 values and can be safely inspected as follows:</p>
<pre>path = trans.get_path("iso-8859-1")</pre>
<p>Specifying UTF-8, as shown earlier, will also work in this case, but only
because WebStack will use ISO-8859-1 as a "safe" default for character
values it does not understand.</p>
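<p>The fallback behaviour described above can be imitated with the Python 3 standard library. This is only a sketch of the idea, not WebStack's actual implementation: the function <code>decode_path</code> and its <code>fallback</code> parameter are hypothetical, and the sketch assumes the fallback applies to the whole path rather than to individual characters:</p>

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, encoding="utf-8", fallback="iso-8859-1"):
    """Decode a "URL encoded" path, falling back to a "safe" encoding.

    Hypothetical helper: the %xx sequences are first turned into raw
    bytes, then those bytes are decoded using the requested encoding.
    If that fails, the fallback encoding is used instead.
    """
    data = unquote_to_bytes(raw_path)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # second attempt cannot fail.
        return data.decode(fallback)


# The bytes E6 F8 E5 are not a valid UTF-8 sequence, so both calls
# end up interpreting them as ISO-8859-1.
print(decode_path("/python/WebStack-%E6%F8%E5.html", "utf-8"))
print(decode_path("/python/WebStack-%E6%F8%E5.html", "iso-8859-1"))
```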
<h2>Query Strings</h2>
<p>Sometimes, a "query string" will be provided as part of a URL; for
example:</p>
<pre>http://www.boddie.org.uk/application?param1=value1</pre>
<p>The question mark character marks the beginning of the query string,
which contains encoded parameter information; such information and its
inspection is discussed in <a href="parameters.html">"Request Parameters and
Uploads"</a>.</p>
<div class="WebStack">
<h3>WebStack API - Getting Query Strings</h3>
<p>WebStack provides a method to get only the query string from the URL:</p>
<dl>
  <dt><code>get_query_string</code></dt>
  <dd>This method returns the part of the URL which contains parameter
information. Such information will be "URL encoded", meaning that
certain characters will have the form <code>%xx</code> where <code>xx</code>
is a two digit hexadecimal number referring to the byte value of the
unencoded character - see below for discussion of this.</dd>
</dl>
</div>
<p>Note that unlike the path access methods, <code>get_query_string</code>
does not accept an encoding as a parameter. Moreover, when retrieving a
path including a query string, the encoding is not used to interpret
"URL encoded" character values in the query string itself. Consider
this example URL:</p>
<pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre>
<p>Upon requesting the path and the query string, certain differences should be noticeable:</p>
<pre>trans.get_path("iso-8859-1")               # returns /application-&#230;?var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-&#230;<br />trans.get_query_string()                   # returns var%F8=value%E5</pre>
<p>One reason for this seemingly arbitrary distinction in treatment is
the way certain servers present path information to WebStack - often
the "URL encoded" information has been replaced by raw character values
which must then be converted to Unicode characters. In contrast, most
servers do not perform the same automatic conversion on the query
string.</p>
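<p>To make the difference in treatment concrete, the following sketch models the three methods on top of the Python 3 standard library. The class <code>SketchTransaction</code> is a hypothetical stand-in written for this example; it is not part of WebStack and only imitates the behaviour described in this section:</p>

```python
from urllib.parse import urlsplit, unquote


class SketchTransaction:
    """Hypothetical stand-in for a transaction object (not WebStack code)."""

    def __init__(self, url):
        parts = urlsplit(url)
        self._raw_path = parts.path    # still "URL encoded"
        self._raw_query = parts.query  # still "URL encoded"

    def get_query_string(self):
        # No encoding parameter: the query string is returned untouched.
        return self._raw_query

    def get_path_without_query(self, encoding="iso-8859-1"):
        # Only the path has its %xx sequences converted to characters.
        return unquote(self._raw_path, encoding=encoding)

    def get_path(self, encoding="iso-8859-1"):
        # The decoded path is joined to the still-encoded query string.
        path = self.get_path_without_query(encoding)
        query = self.get_query_string()
        return path + "?" + query if query else path


trans = SketchTransaction(
    "http://www.boddie.org.uk/application-%E6?var%F8=value%E5")
print(trans.get_path("iso-8859-1"))
print(trans.get_path_without_query("iso-8859-1"))
print(trans.get_query_string())
```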
<p>In fact, it may become impossible to properly interpret the query
string if it is decoded prematurely; consider this example URL:</p>
<pre>http://www.boddie.org.uk/application?a=%26b</pre>
<p>If we were to just decode the query string and then extract the
parameters/fields, the result would be two empty parameters with the
names <code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter <code>a</code> with the value <code>&amp;b</code>.</p>
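<p>The effect of premature decoding can be reproduced with the Python 3 standard library. This sketch does not use the WebStack API; it merely contrasts the two orders of operation on the query string from the example URL:</p>

```python
from urllib.parse import parse_qsl, unquote

query = "a=%26b"

# Correct order: split into fields first, then decode each name and value.
correct = parse_qsl(query, keep_blank_values=True)

# Premature decoding: "%26" becomes "&" before splitting, so the single
# value is mistaken for a field separator.
premature = parse_qsl(unquote(query), keep_blank_values=True)

print(correct)
print(premature)
```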
<h3>Final Note</h3>
<p>Regardless of the above, parameters should always be inspected using the appropriate methods (see <a href="parameters.html">"Request Parameters and
Uploads"</a>),
and direct access to the query string should be reserved for situations
of a specialised nature such as the building of URLs for output.</p>
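<p>As an example of such a specialised situation, a URL for output might be assembled as in the following sketch, which uses the Python 3 standard library. The helper <code>build_url</code> is hypothetical and is not a WebStack function:</p>

```python
from urllib.parse import urlencode


def build_url(path, params):
    """Append "URL encoded" parameters to a path (hypothetical helper)."""
    query = urlencode(params)
    return path + "?" + query if query else path


# The ampersand in the value is encoded, so it cannot be mistaken for
# a separator between parameters.
print(build_url("/application", {"a": "&b"}))
```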
<h2>More About Paths</h2>
<ul>
  <li><a href="path-info.html">Paths To and Within Applications</a></li>
  <li><a href="path-design.html">Path Design and Interpretation</a></li>
  <li><a href="path-value-encoding.html">Encoding and Decoding Path Values</a></li>
  <li><a href="path-manipulation.html">Manipulating Paths</a></li>
  <li><a href="path-info-support.html">Path Info Support in Server
Environments</a></li>
</ul>
</body></html>