<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PHP vs .Net &#187; XML</title>
	<atom:link href="http://www.phpvs.net/category/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.phpvs.net</link>
	<description>ASP.Net and PHP go head to head</description>
	<lastBuildDate>Sat, 24 Dec 2011 18:20:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>HTML manipulation with System.Xml.XmlDocument</title>
		<link>http://www.phpvs.net/2008/02/17/html-manipulation-with-systemxmlxmldocument/</link>
		<comments>http://www.phpvs.net/2008/02/17/html-manipulation-with-systemxmlxmldocument/#comments</comments>
		<pubDate>Mon, 18 Feb 2008 06:13:10 +0000</pubDate>
		<dc:creator>morgan</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.phpvs.net/2008/02/17/html-manipulation-with-systemxmlxmldocument/</guid>
		<description><![CDATA[HTML Table of Contents Generator Example Sometimes it's easy to forget that HTML is just one type of XML, and hence you can utilize the System.Xml library for fun and profit with your HTML. System.Xml is full of powerful tools to manipulate well-formed documents, and you really don't need to know much about XML to [...]]]></description>
			<content:encoded><![CDATA[<h2>HTML Table of Contents Generator Example</h2>
<p>Sometimes it's easy to forget that well-formed HTML (XHTML, for example) is just one type of XML, and hence you can use the <code>System.Xml</code> library for fun and profit with your HTML.  <code>System.Xml</code> is full of powerful tools for manipulating well-formed documents, and you really don't need to know much about XML to take advantage of it.  With a couple of lines of code you can load a document into a data structure whose manipulation methods let you do complex tasks, such as generating a table of contents.</p>
<p>Blake phoned me last night, very frustrated after having spent a couple of hours scouring the 'tubes for some kind of tool that would take his marked-up HTML document and generate a table of contents from its heading tags.  He started asking my advice about a C# program he had downloaded.  It included three forms and over 1000 lines of code, and purported to do what he needed.  Except it didn't: it kept crashing and couldn't handle certain nestings of tags.  One look at the code made it pretty clear why: some kind of home-brewed tree structure peppered with variables like "treeUp, treeDown, treeRight, itemBegin, itemEnd"... bleeargh.  <code>XmlDocument</code> to the rescue!</p>
<p>In 45 minutes I had a program whipped up into a console app that did exactly what he needed, and it was essentially only 60 lines of code (plus some jazz for error handling/argument passing).   Let's take a look:</p>
<div class="syntax_hilite"><span class="langName">C#:</span>
<div id="csharp-4">
<div>
<ol>
<li>
<div><span style="color: #0600FF;">private</span> <span style="color: #0600FF;">void</span> GenerateTOC<span style="color: #000000;">&#40;</span>XmlNodeList nodelist, StringBuilder sb<span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div><span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; <span style="color: #0600FF;">foreach</span> <span style="color: #000000;">&#40;</span>XmlNode node <span style="color: #0600FF;">in</span> nodelist<span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>Regex.<span style="color: #0000FF;">IsMatch</span><span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">Name</span>, <span style="color: #808080;">"^h[1-6]$"</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//We've found an &quot;h&quot; tag.&nbsp; Update our TOC stringbuilder,</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//and our original XmlDocument to add anchor tags.</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span><span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">isVerbose</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span> Console.<span style="color: #0000FF;">WriteLine</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">"Found "</span> + node.<span style="color: #0000FF;">Name</span><span style="color: #000000;">&#41;</span>; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #FF0000;">String</span> tabs = <span style="color: #808080;">""</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #FF0000;">int</span> hLevel = <span style="color: #FF0000;">int</span>.<span style="color: #0000FF;">Parse</span><span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">Name</span>.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span><span style="color: #FF0000;color:#800000;">1</span>, <span style="color: #FF0000;color:#800000;">1</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>hLevel != <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span><span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>hLevel &lt; <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span><span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Retreat to a less indented block level</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span><span style="color: #FF0000;">int</span> i = <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span> - <span style="color: #FF0000;color:#800000;">1</span>; i &gt; hLevel - <span style="color: #FF0000;color:#800000;">1</span>; i--<span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tabs = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> <span style="color: #FF0000;">String</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">'<span style="color: #008080; font-weight: bold;">\t</span>'</span>, i<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sb.<span style="color: #0000FF;">Append</span><span style="color: #000000;">&#40;</span>tabs + <span style="color: #808080;">"&lt;/ul&gt;<span style="color: #008080; font-weight: bold;">\n</span>"</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">else</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Indent some more - Add the level difference in indents</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span><span style="color: #FF0000;">int</span> i = <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span>; i &lt; hLevel; i++<span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tabs = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> <span style="color: #FF0000;">String</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">'<span style="color: #008080; font-weight: bold;">\t</span>'</span>, i<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sb.<span style="color: #0000FF;">Append</span><span style="color: #000000;">&#40;</span>tabs + <span style="color: #808080;">"&lt;ul&gt;<span style="color: #008080; font-weight: bold;">\n</span>"</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Set lastHLevel to the current HLevel</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span> = hLevel;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Generate the TOC entry for this node, with a link to its anchor.</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tabs = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> <span style="color: #FF0000;">String</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">'<span style="color: #008080; font-weight: bold;">\t</span>'</span>, <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sb.<span style="color: #0000FF;">Append</span><span style="color: #000000;">&#40;</span>tabs + <span style="color: #808080;">"&lt;li&gt;&lt;a href=<span style="color: #008080; font-weight: bold;">\"</span>#toc"</span> + <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">tocCount</span> + <span style="color: #808080;">"<span style="color: #008080; font-weight: bold;">\"</span>&gt;"</span> + node.<span style="color: #0000FF;">InnerXml</span> + <span style="color: #808080;">"&lt;/a&gt;&lt;/li&gt;<span style="color: #008080; font-weight: bold;">\n</span>"</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Add an anchor tag to the node in the original document</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node.<span style="color: #0000FF;">InnerXml</span> = <span style="color: #808080;">"&lt;a name=<span style="color: #008080; font-weight: bold;">\"</span>toc"</span> + <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">tocCount</span>.<span style="color: #0000FF;">ToString</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span> + <span style="color: #808080;">"<span style="color: #008080; font-weight: bold;">\"</span>&gt;"</span> + node.<span style="color: #0000FF;">InnerXml</span> + <span style="color: #808080;">"&lt;/a&gt;"</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">tocCount</span>++;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Now recurse over child nodes</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">ChildNodes</span>.<span style="color: #0000FF;">Count</span> &gt; <span style="color: #FF0000;color:#800000;">0</span><span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; GenerateTOC<span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">ChildNodes</span>, sb<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008080; font-style: italic;">//Finish whatever &lt;ul&gt; level we have open if we're the last child of the root.</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>node.<span style="color: #0000FF;">NextSibling</span> == <span style="color: #0600FF;">null</span> &amp;&amp; node.<span style="color: #0000FF;">ParentNode</span>.<span style="color: #0000FF;">ParentNode</span> == <span style="color: #0600FF;">null</span><span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span><span style="color: #FF0000;">int</span> i = <span style="color: #FF0000;color:#800000;">0</span>; i &lt; <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span>; i++<span style="color: #000000;">&#41;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #FF0000;">String</span> tabs = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> <span style="color: #FF0000;">String</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">'<span style="color: #008080; font-weight: bold;">\t</span>'</span>, <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">lastHLevel</span> - i - <span style="color: #FF0000;color:#800000;">1</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sb.<span style="color: #0000FF;">Append</span><span style="color: #000000;">&#40;</span>tabs + <span style="color: #808080;">"&lt;/ul&gt;<span style="color: #008080; font-weight: bold;">\n</span>"</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div>&nbsp; &nbsp; <span style="color: #000000;">&#125;</span></div>
</li>
<li>
<div><span style="color: #000000;">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
In line 3, we start looping over every node (i.e. HTML element) at the current level of the document.  Line 5 uses a simple regular expression to check whether the current node is a heading tag.  Lines 11-35 control the indent level of the TOC's HTML output: we use one &lt;ul&gt; level for each heading level, so an h5 tag is nested in five &lt;ul&gt; tags.  Lines 37-39 add a TOC list item for the current node to the output.  Lines 41-43 then modify the original <code>XmlDocument</code> object by wrapping the heading's content in an anchor tag.  After that, we recursively call the function with the current node's children.  The last bit of code closes any &lt;ul&gt; levels still open once the recursion reaches the end of the document.</p>
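<p>If the indent bookkeeping is hard to follow in the middle of all the XML handling, here is a minimal sketch of just that part, stripped of tabs and anchors.  <code>TocSketch</code> and <code>Build</code> are hypothetical names that are not in the download, and the sketch assumes the level counter starts at 0 (no list open yet):</p>
<pre><code>using System;
using System.Collections.Generic;
using System.Text;

static class TocSketch
{
    // Hypothetical helper: turns (heading level, title) pairs into nested
    // &lt;ul&gt; markup, opening or closing one &lt;ul&gt; per level of difference.
    public static string Build(IEnumerable&lt;KeyValuePair&lt;int, string&gt;&gt; headings)
    {
        StringBuilder sb = new StringBuilder();
        int lastLevel = 0; // assumption: nothing is open before the first heading

        foreach (KeyValuePair&lt;int, string&gt; h in headings)
        {
            // Deeper heading: open one &lt;ul&gt; for each level we descend.
            for (; lastLevel &lt; h.Key; lastLevel++) sb.Append("&lt;ul&gt;");
            // Shallower heading: close one &lt;ul&gt; for each level we climb back.
            for (; lastLevel &gt; h.Key; lastLevel--) sb.Append("&lt;/ul&gt;");
            sb.Append("&lt;li&gt;" + h.Value + "&lt;/li&gt;");
        }

        // Close whatever is still open at the end of the document.
        for (; lastLevel &gt; 0; lastLevel--) sb.Append("&lt;/ul&gt;");
        return sb.ToString();
    }
}</code></pre>
<p>Feeding it an h1 "A", an h2 "B" and another h1 "C" gives <code>&lt;ul&gt;&lt;li&gt;A&lt;/li&gt;&lt;ul&gt;&lt;li&gt;B&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;C&lt;/li&gt;&lt;/ul&gt;</code>.  Like the real function, it nests each &lt;ul&gt; directly inside its parent &lt;ul&gt;, which browsers render fine even though strict validators prefer the inner list to sit inside an &lt;li&gt;.</p>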
<p>(At this point, real purists might interject that 10 lines of code and an XSLT stylesheet could do the same thing.  I'd agree, except that in practice I find simple loop-driven tasks quite cumbersome to write in XSLT, and I doubt I could do anything with XSLT in 45 minutes.)</p>
<p>So to use the function above, simply harness the raw power of the <code>System.Xml.XmlDocument</code> object, like so:</p>
<div class="syntax_hilite"><span class="langName">C#:</span>
<div id="csharp-5">
<div>
<ol>
<li>
<div>XmlDocument htmldoc = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> XmlDocument<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>htmldoc.<span style="color: #0000FF;">PreserveWhitespace</span> = <span style="color: #0600FF;">true</span>;</div>
</li>
<li>
<div>htmldoc.<span style="color: #0000FF;">Load</span><span style="color: #000000;">&#40;</span><span style="color: #808080;">"myfile.html"</span><span style="color: #000000;">&#41;</span>; </div>
</li>
</ol>
</div>
</div>
</div>
<p>Assuming your HTML is well-formed, you can now pass <code>htmldoc.ChildNodes</code> and a <code>StringBuilder</code> into the recursive function above, and your <code>StringBuilder</code> will come back full of HTML table of contents goodness.  Additionally, your <code>XmlDocument</code> variable will have the corresponding anchors added to the heading tags.  Simply write your <code>StringBuilder</code> and <code>XmlDocument</code> out to files, and voila!  Instant HTML table of contents!  (It might look something like this:)</p>
<div class="syntax_hilite"><span class="langName">C#:</span>
<div id="csharp-6">
<div>
<ol>
<li>
<div>StringBuilder sb = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> StringBuilder<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div><span style="color: #008080; font-style: italic;">//Assume that the root node is not an &lt;h&gt; tag and build our TOC from the children.</span></div>
</li>
<li>
<div>thisApp.<span style="color: #0000FF;">GenerateTOC</span><span style="color: #000000;">&#40;</span>htmldoc.<span style="color: #0000FF;">ChildNodes</span>, sb<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div><span style="color: #008080; font-style: italic;">//Output TOC</span></div>
</li>
<li>
<div>FileStream fs = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> FileStream<span style="color: #000000;">&#40;</span><span style="color: #808080;">"TOC.html"</span>, FileMode.<span style="color: #0000FF;">Create</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>StreamWriter sw = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> StreamWriter<span style="color: #000000;">&#40;</span>fs<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>sw.<span style="color: #0000FF;">Write</span><span style="color: #000000;">&#40;</span>sb.<span style="color: #0000FF;">ToString</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>sw.<span style="color: #0000FF;">Close</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div>&nbsp;</div>
</li>
<li>
<div><span style="color: #008080; font-style: italic;">//Output original document with new &lt;a&gt; tags</span></div>
</li>
<li>
<div>XmlWriter xw = <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> XmlTextWriter<span style="color: #000000;">&#40;</span><span style="color: #808080;">"OriginalWithAnchors.html"</span>, Encoding.<span style="color: #0000FF;">UTF8</span><span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
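<div>htmldoc.<span style="color: #0000FF;">WriteTo</span><span style="color: #000000;">&#40;</span>xw<span style="color: #000000;">&#41;</span>;</div>
</li>
<li>
<div><span style="color: #008080; font-style: italic;">//Close the writer so the output is flushed to disk</span></div>
</li>
<li>
<div>xw.<span style="color: #0000FF;">Close</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;</div>
</li>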
</ol>
</div>
</div>
</div>
<p>All that in fewer than 80 lines of code, 45 minutes, and no XSDs, XSLT, or really, any XML at all.  <code>XmlDocument.Load()</code> is simply one of the greatest functions in the .Net framework.  Instant document object with an implicit tree structure.</p>
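<p>The one catch is the "well-formed" part: the XML parser throws an <code>XmlException</code> on things browsers happily tolerate, such as an unclosed &lt;br&gt; or an undeclared entity like &amp;nbsp;.  Here is a minimal sketch of a friendlier load step, assuming you'd rather report the position of the problem than crash.  <code>LoadSketch</code> and <code>TryLoad</code> are hypothetical names, not part of the download, and the sketch uses <code>LoadXml</code> on a string instead of <code>Load</code> on a file so it stands alone:</p>
<pre><code>using System;
using System.Xml;

static class LoadSketch
{
    // Hypothetical helper: returns the parsed document, or null after
    // reporting where the markup stopped being well-formed.
    public static XmlDocument TryLoad(string markup)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        try
        {
            doc.LoadXml(markup);
            return doc;
        }
        catch (XmlException ex)
        {
            // XmlException carries the position of the first parse error.
            Console.Error.WriteLine("Not well-formed at line " + ex.LineNumber
                + ", position " + ex.LinePosition + ": " + ex.Message);
            return null;
        }
    }
}</code></pre>
<p>With this, <code>&lt;html&gt;&lt;body&gt;&lt;h1&gt;Title&lt;/h1&gt;&lt;/body&gt;&lt;/html&gt;</code> loads normally, while <code>&lt;p&gt;unclosed&lt;br&gt;&lt;/p&gt;</code> comes back as null with the offending line and position written to standard error.</p>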
<p>Download the code here:  <a href='http://www.phpvs.net/wp-content/uploads/2008/02/htmltoc.zip' title='HTML Table of Contents Generator'>HTML Table of Contents Generator</a>.  It includes a binary .exe file in the "bin\Release" directory, so you don't need Visual Studio if you just want to run the above program <img src='http://www.phpvs.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Simply call <code>htmltoc.exe infile.html</code>, and TOC.html and OriginalWithAnchors.html will be written out.  TOC.html contains your nicely formatted table of contents, with links to all the anchors in OriginalWithAnchors.html.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.phpvs.net/2008/02/17/html-manipulation-with-systemxmlxmldocument/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

