项目一:万能的XML

最近感觉自己的工作中可能用到python,这语言实在是太强大了所以再次开始python的学习

通过联系小项目的形式练练手,下面是第一个项目,万能的XML

要使用python解析xml文件,我们需要用到SAX语法分分析器,这样我们就可以把xml解析的工作交给分析器取工作了。

本项目的主要工作:

  • 整个网站用一个XML文件描述,其中包括独立的网页和文件目录的描述
  • 程序能根据XML信息创建对应的html和目录文件
  • 能轻松地改变整个网站的设计,并以新的设计为基础重新生成新的网页和目录结构

自定一的xml文件如下:

<website>
	<page name="index" title="Home Page">
		<h1>Welcome to my Home Page</h1>

		<p>Hi. there. My name is Zeus. And this is my home page .Here are some of my interests:</p>

		<ul>
			<li><a href="interests/shouting.html">Shouting</a></li>
			<li><a href="interests/sleeping.html">Sleeping</a></li>
			<li><a href="interests/eating.html">Eating</a></li>
		</ul>
	</page>
	<directory name="interests">
		<page name="shouting" title="shouting">
			<h1>Mr. Zeus's Shouting Page</h1>

			<p>...</p>
		</page>
		<page name="sleeping" title="sleeping">
			<h1>Mr. Hello's Sleeping Page</h1>

			<p>...</p>
		</page>
		<page name="eating" title="eating">
			<h1>Mr. World's Eating Page</h1>

			<p>...</p>
		</page>
	</directory>
</website>

对应的python解析xml代码如下:

#coding=utf-8
#!/usr/bin/python
from xml.sax.handler import ContentHandler
from xml.sax import parse
import os

class Dispatcher:
	def dispatch(self, prefix, name, attrs=None):
		mname = prefix + name.capitalize()
		dname = 'default' + prefix.capitalize()
		method = getattr(self, mname, None)
		if callable(method): args = ()
		else:
			method = getattr(self, dname, None)
			args = name,
		if prefix == 'start': args += attrs,
		if callable(method): method(*args)
	
	def startElement(self, name, attrs):
		self.dispatch('start', name, attrs)

	def endElement(self, name):
		self.dispatch('end', name)

class WebsiteConstructor(Dispatcher, ContentHandler):
	passthrough = False

	def __init__(self, directory):
		self.directory = [directory]
		self.ensureDirectory()

	def ensureDirectory(self):
		path = os.path.join(*self.directory)
		if not os.path.isdir(path): os.makedirs(path)

	def characters(self, chars):
		if self.passthrough: self.out.write(chars)
	
	def defaultStart(self, name, attrs):
		if self.passthrough:
			self.out.write('<' + name)
			for key, val in attrs.items():
				self.out.write(' %s="%s"' % (key, val))
			self.out.write('>')
	
	def defaultEnd(self, name):
		if self.passthrough:
			self.out.write('</%s>' % name)
	
	def startDirectory(self, attrs):
		self.directory.append(attrs['name'])
		self.ensureDirectory()
	
	def endDirectory(self):
		self.directory.pop()
	
	def startPage(self, attrs):
		filename = os.path.join(*self.directory+[attrs['name']+'.html'])
		self.out = open(filename, 'w')
		self.writeHeader(attrs['title'])
		self.passthrough = True

	def endPage(self):
		self.passthrough = False
		self.writeFooter()
		self.out.close()

	def writeHeader(self, title):
		self.out.write('<html>\n<head>\n	<title>')
		self.out.write(title)
		self.out.write('</title>\n </head>\n	<body>\n')

	def writeFooter(self):
		self.out.write('\n </body>\n</html>\n')

parse('website.xml', WebsiteConstructor('public_html'))

猜你喜欢

转载自zjuerlemon.iteye.com/blog/2317869