最近感觉自己的工作中可能用到python,这语言实在是太强大了所以再次开始python的学习
通过联系小项目的形式练练手,下面是第一个项目,万能的XML
要使用python解析xml文件,我们需要用到SAX语法分分析器,这样我们就可以把xml解析的工作交给分析器取工作了。
本项目的主要工作:
- 整个网站用一个XML文件描述,其中包括独立的网页和文件目录的描述
- 程序能根据XML信息创建对应的html和目录文件
- 能轻松地改变整个网站的设计,并以新的设计为基础重新生成新的网页和目录结构
自定一的xml文件如下:
<website> <page name="index" title="Home Page"> <h1>Welcome to my Home Page</h1> <p>Hi. there. My name is Zeus. And this is my home page .Here are some of my interests:</p> <ul> <li><a href="interests/shouting.html">Shouting</a></li> <li><a href="interests/sleeping.html">Sleeping</a></li> <li><a href="interests/eating.html">Eating</a></li> </ul> </page> <directory name="interests"> <page name="shouting" title="shouting"> <h1>Mr. Zeus's Shouting Page</h1> <p>...</p> </page> <page name="sleeping" title="sleeping"> <h1>Mr. Hello's Sleeping Page</h1> <p>...</p> </page> <page name="eating" title="eating"> <h1>Mr. World's Eating Page</h1> <p>...</p> </page> </directory> </website>
对应的python解析xml代码如下:
#coding=utf-8 #!/usr/bin/python from xml.sax.handler import ContentHandler from xml.sax import parse import os class Dispatcher: def dispatch(self, prefix, name, attrs=None): mname = prefix + name.capitalize() dname = 'default' + prefix.capitalize() method = getattr(self, mname, None) if callable(method): args = () else: method = getattr(self, dname, None) args = name, if prefix == 'start': args += attrs, if callable(method): method(*args) def startElement(self, name, attrs): self.dispatch('start', name, attrs) def endElement(self, name): self.dispatch('end', name) class WebsiteConstructor(Dispatcher, ContentHandler): passthrough = False def __init__(self, directory): self.directory = [directory] self.ensureDirectory() def ensureDirectory(self): path = os.path.join(*self.directory) if not os.path.isdir(path): os.makedirs(path) def characters(self, chars): if self.passthrough: self.out.write(chars) def defaultStart(self, name, attrs): if self.passthrough: self.out.write('<' + name) for key, val in attrs.items(): self.out.write(' %s="%s"' % (key, val)) self.out.write('>') def defaultEnd(self, name): if self.passthrough: self.out.write('</%s>' % name) def startDirectory(self, attrs): self.directory.append(attrs['name']) self.ensureDirectory() def endDirectory(self): self.directory.pop() def startPage(self, attrs): filename = os.path.join(*self.directory+[attrs['name']+'.html']) self.out = open(filename, 'w') self.writeHeader(attrs['title']) self.passthrough = True def endPage(self): self.passthrough = False self.writeFooter() self.out.close() def writeHeader(self, title): self.out.write('<html>\n<head>\n <title>') self.out.write(title) self.out.write('</title>\n </head>\n <body>\n') def writeFooter(self): self.out.write('\n </body>\n</html>\n') parse('website.xml', WebsiteConstructor('public_html'))