An Introduction to Structured Text
http://www.zope.org/Documentation/Articles/STX
By Paul Everitt
Engineers spend a lot of time communicating, primarily by email but also in
documentation. However, writing by engineers is complicated by a simple
fact: the world consumes writing largely in presentation formats such as
HTML and PDF.
In theory this should be no problem, as we would all march happily off and
write in "DocBook" (or perhaps LaTeX), the supposed lingua franca of
documentation. However, most tools don't support DocBook (or LaTeX) very
well, and even if tools were mature, most engineers would reject them.
Why? Engineers spend most of their time communicating in plain text. Their
tools (vi and Emacs) are oriented toward text. The vast majority of words
they communicate are in email. Finally, what little documentation you can
squeeze out of engineers is in the form of "docstrings" in source code.
Wouldn't it be nice if there were a non-tag, text-oriented system for
engineers to express semantic meaning? This is the problem Structured Text
tackles. With Structured Text, format-independent writing becomes extremely
convenient and natural, once a few rules are learned. Furthermore,
Structured Text can be extended to cover advanced and custom uses.
To get a quick idea of what Structured Text does, the following words in
Structured Text:
Sometimes the *best* approach to complexity is simplicity. A good
structured text system is:
o Convenient
o Rich
is rendered into the following HTML:
<p>
Sometimes the <em>best</em> approach to complexity is
simplicity. A good structured text system is:
</p>
<ul>
<li><p>Convenient</p></li>
<li><p>Rich</p></li>
</ul>
and the following DocBook XML:
<para>
Sometimes the <emphasis>best</emphasis> approach to
complexity is simplicity. A good structured text system is:
</para>
<itemizedlist>
<listitem><para>Convenient</para></listitem>
<listitem><para>Rich</para></listitem>
</itemizedlist>
In fact, the text of this article is written in Structured Text. In this
article, we'll look at the basics of Structured Text, organizing large text
into sections, advanced formatting, and metadata issues.
Structured Text Basics
Let's plunge into structured text and look at the basics by correlating it
to ideas in HTML.
The most basic idea in Structured Text is the paragraph. The following
snippet:
This is the first paragraph.

This is the second paragraph.
....is converted to the following in HTML:
<p>This is the first paragraph.</p>
<p>This is the second paragraph.</p>
That is, white space matters in Structured Text. This is a very intuitive
idea. For instance, in email paragraphs are separated by white space.
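As a rough illustration of the blank-line rule, the following Python sketch
splits text on blank lines and wraps each piece in a p tag. The function name
paragraphs_to_html is made up for this example and is not part of any
Structured Text implementation; a real parser does considerably more.

```python
import html
import re


def paragraphs_to_html(text):
    """Split text on blank lines and wrap each chunk in <p> tags."""
    # One or more blank (whitespace-only) lines separate paragraphs.
    chunks = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks inside a paragraph and escape &, <, and >.
    return "\n".join(
        "<p>%s</p>" % html.escape(" ".join(chunk.split()), quote=False)
        for chunk in chunks
    )


sample = "This is the first paragraph.\n\nThis is the second paragraph."
print(paragraphs_to_html(sample))
```

Running the sketch on the two-paragraph sample prints one p element per
paragraph, matching the HTML shown above.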
To introduce emphasis, Structured Text uses another text convention:
asterisks. Note the following snippet:
This is the *first* paragraph.

This is the **second** paragraph.
In HTML, this snippet introduces the em tag and the strong tag:
<p>This is the <em>first</em> paragraph.</p>
<p>This is the <strong>second</strong> paragraph.</p>
Again, this is a common pattern in email. Several other common patterns are
supported, such as referring to a piece of jargon:
When you see 'STX', you know this is shorthand for 'Structured
Text'.
The HTML output is as follows:
<p>When you see <code>STX</code>, you know this is shorthand for
<code>Structured Text</code>.</p>
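The three inline conventions can be approximated with a few regular
expressions. This is only a sketch under simplifying assumptions: the
function name inline_to_html is hypothetical, nesting is ignored, and the
quote rule is a guess at reasonable behavior (a quote only opens a span when
it does not follow a letter, so that contractions are left alone).

```python
import re


def inline_to_html(text):
    """Translate *emphasis*, **strong**, and 'code' spans to HTML tags."""
    # Handle ** before * so a strong span is not read as two emphasis marks.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    # A quoted span must not touch a word character on either outer side,
    # which keeps apostrophes in words such as "don't" untouched.
    text = re.sub(r"(?<!\w)'([^']+)'(?!\w)", r"<code>\1</code>", text)
    return text


print(inline_to_html("This is the *first* paragraph."))
print(inline_to_html("This is the **second** paragraph."))
print(inline_to_html("When you see 'STX', you don't need to ask."))
```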
Using Indentation
The preceding section focused on text conventions that convey a semantic
meaning. This semantic meaning, when processed by Structured Text, produces
certain HTML tags.
In Structured Text, indentation is also very important in conveying semantic
meaning. The most basic case is the HTML idea of headings.
In the following snippet, indentation is used to convey an outline-like
structure:
Using Indentation

  The preceding section focused on text conventions that convey a
  semantic meaning. This semantic meaning, when processed by
  Structured Text, produces certain HTML tags.
This produces the following HTML:
<h1>Using Indentation</h1>
<p>The preceding section focused on text conventions that convey
a semantic meaning. This semantic meaning, when processed by
Structured Text, produces certain HTML tags.</p>
That is, the indentation conveyed a semantic meaning. The paragraph was
subordinate to the heading, and the relationship is thus expressed in HTML.
In fact, the outline relationship can be continued:
Using Indentation

  The preceding section focused on text conventions that convey a
  semantic meaning. This semantic meaning, when processed by
  Structured Text, produces certain HTML tags.

  Basics of Indentation

    In this section we will investigate the basics of
    indentation...

  Hyperlinks
This produces the following HTML:
<h1>Using Indentation</h1>
<p>The preceding section focused on text conventions that convey
a semantic meaning. This semantic meaning, when processed by
Structured Text, produces certain HTML tags.</p>
<h2>Basics of Indentation</h2>
<p>In this section we will investigate the basics of
indentation...</p>
<h2>Hyperlinks</h2>
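One way to picture the outline rule is as a stack of indentation levels. The
Python sketch below is an assumption about how such a rule could be coded,
not the actual Structured Text algorithm: a paragraph counts as a heading
when the next paragraph is indented more deeply, and the heading level is
the number of enclosing headings. The name outline_to_html is hypothetical,
and in this simplified version a final line with nothing indented beneath it
comes out as a plain paragraph.

```python
import re


def outline_to_html(text):
    """Render blank-line-separated, indented paragraphs as headings and text."""
    # Record each paragraph's indentation and its whitespace-normalized body.
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip("\n")):
        if not chunk.strip():
            continue
        first = chunk.splitlines()[0]
        indent = len(first) - len(first.lstrip(" "))
        blocks.append((indent, " ".join(chunk.split())))

    out = []
    stack = []  # indentation of each heading that encloses the current block
    for i, (indent, body) in enumerate(blocks):
        # Close every section that sits at this depth or deeper.
        while stack and stack[-1] >= indent:
            stack.pop()
        next_indent = blocks[i + 1][0] if i + 1 < len(blocks) else -1
        if next_indent > indent:
            # Something is indented beneath this paragraph: it is a heading.
            stack.append(indent)
            out.append("<h%d>%s</h%d>" % (len(stack), body, len(stack)))
        else:
            out.append("<p>%s</p>" % body)
    return "\n".join(out)


sample = (
    "Using Indentation\n"
    "\n"
    "  The preceding section focused on text conventions.\n"
    "\n"
    "  Basics of Indentation\n"
    "\n"
    "    In this section we will investigate the basics.\n"
)
print(outline_to_html(sample))
```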
Lists and Items
Lists are also supported in Structured Text, including unordered, ordered,
and descriptive lists. The convention for unordered lists is a common pattern in
text-based communication:
HTML has three kinds of lists:
o Unordered lists
o Ordered lists
o Descriptive lists
Structured Text allows you to use the symbols *, o, and - to connote list
items. The above example produces this HTML:
<p>HTML has three kinds of lists:</p>
<ul>
<li><p>Unordered lists</p></li>
<li><p>Ordered lists</p></li>
<li><p>Descriptive lists</p></li>
</ul>
The Structured Text convention for ordered lists is shown below:
HTML has three kinds of lists:
1. Unordered lists
2. Ordered lists
3. Descriptive lists
This produces:
<p>HTML has three kinds of lists:</p>
<ol>
<li><p>Unordered lists</p></li>
<li><p>Ordered lists</p></li>
<li><p>Descriptive lists</p></li>
</ol>
Descriptive lists are also easily accommodated using double dashes:
Unordered Lists -- Generally includes a series of bullets when
viewed in HTML.
Ordered Lists -- HTML viewers convert the list items into a
numbered series.
Descriptive Lists -- Usually used for definitional lists such as
glossaries.
This becomes the following HTML:
<dl><dt>Unordered Lists</dt><dd><p>Generally includes a series of
bullets when viewed in HTML.</p>
</dd>
<dt> Ordered Lists</dt><dd><p>HTML viewers convert the list
items into a numbered series.</p>
</dd>
<dt> Descriptive Lists</dt><dd><p>Usually used for definitional
lists such as glossaries.</p>
</dd>
</dl>
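The three list conventions differ only in how each item begins, so telling
them apart can be sketched with one pattern per kind. The patterns and the
helper names classify_item and render_list below are illustrative
assumptions, not the parser's real rules; render_list also assumes every
line it receives is a single-line item of the same kind.

```python
import re

# Bullets: *, o, or - followed by whitespace.
BULLET = re.compile(r"^\s*[*o-]\s+(.*)$")
# Ordered items: a number, a period, then whitespace.
NUMBERED = re.compile(r"^\s*\d+\.\s+(.*)$")
# Descriptive items: a term, a double dash, then the description.
DESCRIPTIVE = re.compile(r"^\s*(.+?)\s+--\s+(.*)$")


def classify_item(line):
    """Return a tuple whose first element names the kind of list item."""
    m = BULLET.match(line)
    if m:
        return ("ul", m.group(1))
    m = NUMBERED.match(line)
    if m:
        return ("ol", m.group(1))
    m = DESCRIPTIVE.match(line)
    if m:
        return ("dl", m.group(1), m.group(2))
    return ("p", line.strip())


def render_list(lines):
    """Render single-line items, all of one list kind, as an HTML list."""
    items = [classify_item(line) for line in lines]
    kind = items[0][0]
    if kind == "dl":
        body = "".join(
            "<dt>%s</dt><dd><p>%s</p></dd>\n" % (item[1], item[2])
            for item in items
        )
    else:
        body = "".join("<li><p>%s</p></li>\n" % item[1] for item in items)
    return "<%s>\n%s</%s>" % (kind, body, kind)


print(render_list(["o Unordered lists", "o Ordered lists"]))
print(render_list(["1. Unordered lists", "2. Ordered lists"]))
```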
Example Code
As mentioned above, Structured Text authors can use an easy convention to
get the monotype semantics of the CODE tag from HTML. For instance:
When you see the dialog box, hit the 'Ok' button.
....is rendered into the following HTML:
<p>When you see the dialog box, hit the <code>Ok</code> button.</p>
However, sometimes you want long passages of code. For instance, what if you
wanted to document a Python function in the middle of an article discussing
Python? You can indicate a code block by ending a paragraph with ::, and
indenting the following paragraph(s). For instance, this Structured Text
snippet:
In our next Python example, we convert human years to dog years::
  def dog_years(age):
      """Convert an age to dog years"""
      return age*7
....would be converted to the following HTML:
<p>In our next Python example, we convert human years to dog
years:</p>
<pre>
def dog_years(age):
    """Convert an age to dog years"""
    return age*7
</pre>
The convention of combining :: at the end of a paragraph-ending sentence and
indenting a block does more than apply CODE semantics. It also escapes the
indented block. That is how the Structured Text and HTML snippets in this
article are left alone, rather than being rendered.
For example, the less than, greater than, and ampersand symbols in this code
block are escaped:
Here's an HTML example::
<html>
<p>This is a page about dogs & cats.</p>
</html>
....to produce this HTML:
<p>Here's an HTML example:</p>
<pre>
&lt;html&gt;
&lt;p&gt;This is a page about dogs &amp; cats.&lt;/p&gt;
&lt;/html&gt;
</pre>
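The escaping step itself is small. As a sketch, Python's standard html.escape
performs the same replacement of the less than, greater than, and ampersand
symbols; the wrapper name code_block_to_html is hypothetical and only stands
in for whatever the Structured Text implementation does internally.

```python
import html


def code_block_to_html(source):
    """Escape a literal block and wrap it in <pre> tags."""
    # quote=False leaves quotation marks alone; only &, <, and > are replaced.
    return "<pre>\n%s\n</pre>" % html.escape(source, quote=False)


snippet = "<html>\n<p>This is a page about dogs & cats.</p>\n</html>"
print(code_block_to_html(snippet))
```

Because the markup characters are replaced with entities, a browser displays
the snippet as text instead of interpreting it as tags.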
Hyperlinks
In the previous sections we focused on ways to get certain presentation
semantics in HTML by using common text conventions.
But the web isn't just HTML. Linking words and phrases to other information
and including images are equally important. Fortunately Structured Text
supports conventions for hyperlinks and image tags.
Let's start with a simple hyperlink. If we have a Structured Text paragraph
discussing Python:
For more information on Python, please visit the "Python
website" :http://www.python.org/.
This becomes:
<p>For more information on Python, please visit the <a
href="http://www.python.org/">Python website</a>.</p>
The convention is fairly simple:
The text of the reference is enclosed in quotes.
The second quotation mark is followed by a colon and a URL.
The URL can be followed by punctuation.
This basic convention has a number of variations. For instance, relative
URLs are possible, as are mailto URLs.
(Note: there should not be a space between the closing quotation mark and
the colon. The space appears in the example above only because of a bug in
the version of Structured Text currently running on Zope.org. This bug has
been fixed in more recent versions of Zope.)
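The three rules of the link convention map naturally onto a single regular
expression. The pattern below is a sketch written for this article, not the
one Structured Text uses, and the names LINK and links_to_html are
hypothetical. It expects no space between the closing quotation mark and the
colon, and it leaves one trailing punctuation mark outside the link.

```python
import re

# "link text":URL, where the URL runs up to whitespace or the end of the text,
# not counting a single trailing punctuation character.
LINK = re.compile(r'"([^"]+)":(\S+?)(?=[.,;:!?]?(?:\s|$))')


def links_to_html(text):
    """Replace "text":url references with HTML anchors."""
    return LINK.sub(r'<a href="\2">\1</a>', text)


print(links_to_html('Please visit the "Python website":http://www.python.org/.'))
print(links_to_html('Write to "the author":mailto:author@example.com today.'))
```

The mailto address in the second call is a placeholder chosen for the
example.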
Advanced Usage
There are more obscure extensions to Structured Text to handle cross
references, tables, images, and more.
One of the great things about structured text is that if you don't like its
rules it's fairly easy to extend. This is made possible by the recent
rewriting of Structured Text sometimes referred to as "Structured Text NG".
For example, you could create a LaTeX outputter, or you could change
structured text to recognize a different syntax for hyperlinks.
Structured Text is available in Zope (and is integrated into the Zope
Content Management Framework), but you can also use it outside of Zope. To
use Structured Text in Zope, just create a document or file containing
structured text, then call it like so:
<dtml-var my_document fmt=structured-text>
This will give you the HTML representation of my_document.
The Zope Book is an example of a project that uses Structured Text outside
of Zope. The book was written in Structured Text with some modifications to
support figure handling, and the publisher's in-house markup format. Python
scripts parse the input and create output in HTML and PDF.
Structured Text is also used in Python docstrings. A number of Python
documentation extraction tools support Structured Text. Currently, work is
under way on the Python doc-sig to develop docstring conventions and a
docstring processing system.
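To give a feel for what this looks like, here is a docstring written with the
conventions covered in this article. It is only an illustration: the function
reuses the dog_years example from earlier, the wording of the docstring is
invented, and the standard inspect.getdoc call merely retrieves the text that
an extraction tool would then process.

```python
import inspect


def dog_years(age):
    """Convert a human age to dog years.

    The conversion uses the *traditional* rule of thumb:

      o One human year counts as seven dog years.

    Example::

      dog_years(3)
    """
    return age * 7


# getdoc returns the docstring with its common leading indentation removed,
# which is the plain text a documentation tool would hand to a parser.
print(inspect.getdoc(dog_years))
```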
Conclusion
Structured Text gives you an easy way to express yourself in plain text. The
Structured Text implementation allows you to tailor the syntax and output.
Structured Text is integrated into Zope and is also usable outside Zope.
Resources
Structured Text Wiki - discusses structured text and STXNG.
reStructuredText - A Structured Text alternative being developed as a Python
docstring standard.