Example: parsing XML into the corresponding HTML with Python


Posted in Python on April 02, 2014

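This example uses SAX to parse ddt.xml into HTML. Of course, if you have the XSL stylesheet that goes with the XML, you can convert it to HTML directly with libxml2 and its companion XSLT library, libxslt.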
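Before the full listing, a minimal sketch of the SAX event model it relies on may help. The parser walks the document and calls `startElement`, `characters`, and `endElement` on a handler object. The `EventLogger` class and the `SAMPLE` document here are hypothetical illustrations, not part of the original program or of ddt.xml.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Hypothetical handler that records every SAX event in order."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))

    def characters(self, content):
        # Whitespace-only chunks between tags are skipped.
        if content.strip():
            self.events.append(('text', content))


# Hypothetical in-memory sample; the real program reads ddt.xml from disk.
SAMPLE = b"<channel><title>Demo feed</title></channel>"

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

The handler never sees the document as a tree. It only receives a stream of callbacks, which is why the program below keeps its own flags and text buffer.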

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
#---------------------------------------
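#   Program: XML parser
#   Version: 01.0
#   Author: mupeng
#   Date: 2013-12-18
#   Language: Python 2.7
#   Purpose: parse XML into the corresponding HTML
#   Notes: uses the parse function of the xml.sax module to parse the XML and generate events;
#   subclasses ContentHandler and overrides its event handlers;
#   Dispatcher routes the start and end events of each tag to the matching method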
#---------------------------------------
from xml.sax.handler import ContentHandler
from xml.sax import parse
class Dispatcher:
    def dispatch(self, prefix, name, attrs=None):
        # Build the handler name, e.g. 'start' + 'title' -> 'startTitle'
        mname = prefix + name.capitalize()
        # Fallback handler name: 'defaultStart' or 'defaultEnd'
        dname = 'default' + prefix.capitalize()
        method = getattr(self, mname, None)
        if not callable(method):
            method = getattr(self, dname, None)
        # Handlers take no arguments; attrs is accepted but not forwarded
        if callable(method):
            method()
    def startElement(self, name, attrs):
        self.dispatch('start', name, attrs)
    def endElement(self, name):
        self.dispatch('end', name)
class Website(Dispatcher, ContentHandler):
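    def __init__(self):
        ContentHandler.__init__(self)
        # Output file; the handlers write GB2312-encoded byte strings to it
        self.fout = open('ddt_SAX.html', 'w')
        self.imagein = False   # True while inside <image>
        self.item = False      # True while inside <item>
        self.title = ''
        self.link = ''
        self.guid = ''
        self.url = ''
        self.pubdate = ''
        self.description = ''
        self.temp = ''         # text buffered for the current element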
    def startChannel(self):
        self.fout.write('''<html>\n<head>\n<title> RSS-''')
    def endChannel(self):
        self.fout.write('''
                    <tr><td height="20"></td></tr>
                    </table>
                    </center>
                    <script>
    function  GetTimeDiff(str)
    {
     if(str == '')
     {
      return '';
     }
     var pubDate = new Date(str);
     var nowDate = new Date();
     var diffMilSeconds = nowDate.valueOf()-pubDate.valueOf();
     var days = diffMilSeconds/86400000;
     days = parseInt(days);
     diffMilSeconds = diffMilSeconds-(days*86400000);
     var hours = diffMilSeconds/3600000;
     hours = parseInt(hours);
     diffMilSeconds = diffMilSeconds-(hours*3600000);
     var minutes = diffMilSeconds/60000;
     minutes = parseInt(minutes);
     diffMilSeconds = diffMilSeconds-(minutes*60000);
     var seconds = diffMilSeconds/1000;
     seconds = parseInt(seconds);
     var returnStr = "Published (Beijing time): " + pubDate.toLocaleString();
     if(days > 0)
     {
      returnStr = returnStr + " (" + days + " days " + hours + " hours " + minutes + " minutes ago)";
     }
     else if (hours > 0)
     {
      returnStr = returnStr + " (" + hours + " hours " + minutes + " minutes ago)";
     }
     else if (minutes > 0)
     {
      returnStr = returnStr + " (" + minutes + " minutes ago)";
     }
     return returnStr;
    }
    function GetSpanText()
    {
     var pubDate;
     var pubDateArray;
     var spanArray = document.getElementsByTagName("span");
     for(var i = 0; i < spanArray.length; i++)
     {
      pubDate = spanArray[i].innerHTML;
      spanArray[i].innerHTML = GetTimeDiff(pubDate);
     }
    }
    GetSpanText();
   </script>
                </body>
                </html>
                ''')
        self.fout.close()
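    def characters(self, chars):
        # Accumulate text; the parser may deliver one text node in several chunks.
        # Whitespace-only chunks (indentation between tags) are ignored.
        if chars.strip():
            self.temp += chars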
    def startTitle(self):
        if self.item:
            self.fout.write('''
                        <tr bgcolor="#eeeeee">\n<td style="padding-top:5px;padding-left:5px;" height="30">\n<B>
                    ''')
    def endTitle(self):
        if not self.imagein and not self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            #self.title = self.temp
            self.fout.write('''
                </title>\n</head>\n<body>\n<center>\n
                <script>\n
                        function copyLink()
                        {
                                clipboardData.setData("Text",window.location.href);
                                alert("The RSS link has been copied to the clipboard");
                        }
                        function subscibeLink()
                        {
                                var str = window.location.pathname;
                                while(str.match(/^\//))
                                {
                                        str = str.replace(/^\//,"");
                                }
                                window.open("http://rss.sina.com.cn/my_sina_web_rss_news.html?url=" + str,"_self");
                        }
                        </script>\n
                <table width="750" cellpadding="0" cellspacing="0">\n
                <tr>\n
                <td align="right" style="padding-right:15px;" valign="bottom">\n
            ''')
        if self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write('''
                        </B>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-left:5px;">
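                        ''')
        # Also discard text buffered for <image><title>, which neither branch
        # above consumes; otherwise it would be prepended to the image link
        self.temp = ''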
    def startImage(self):
        self.imagein = True
    def endImage(self):
        self.imagein = False
    def startLink(self):
        if self.imagein:
            self.fout.write('''<A href=" ''')
            
    def endLink(self):
        self.link = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.link.encode('gb2312'))
            self.fout.write('''" target="_blank">\n ''')
        elif self.item:
            # Item links are written later, in endItem
            pass
        else:
            self.fout.write(self.link.encode('gb2312'))
            self.fout.write('''" target="_blank"> ''')
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write(''' </A></B></td>
                            </tr>
                            <tr><td colspan="2" align="center">
                            ''')
            self.fout.write(self.description.encode('gb2312'))
            self.fout.write('''
                        </td></tr>
                        <tr style="font-size:12px;" bgcolor="#eeeeff"><td colspan="2" style="font-size:14px;padding-top:5px;padding-bottom:5px;"><b><a href="javascript:copyLink();">Copy the link to this page</a> <a href="javascript:subscibeLink();">Embed this news list in my page (simple, fast, real-time, free)</a></b></td></tr>
                        </table>
                        <table width="750" cellpadding="0" cellspacing="0">
                            ''')
    def startUrl(self):
        if self.imagein:
            self.fout.write('''<IMG src=" ''')
    def endUrl(self):
        self.url = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.url.encode('gb2312'))
            self.fout.write('''" border="0">\n
                            </A>
                            </td>
                            <td align="left" valign="bottom" style="padding-bottom:8px;"><B><A href="
                            ''')
        if self.item:
            #self.url = self.temp
            pass
    def defaultStart(self):
        pass
    def defaultEnd(self):
        self.temp = ''
    def startDescription(self):
        pass
    def endDescription(self):
        self.description = self.temp
        self.temp = ''
        if self.item:
            # Optionally write two full-width spaces here to indent the paragraph
            self.fout.write(self.description.encode('gb2312'))
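    def endGuid(self):
        self.guid = self.temp
        # Reset the buffer so the guid does not leak into the next element
        self.temp = ''
    def endPubdate(self):
        # Ignore a value that looks like a URL rather than a date
        if not self.temp.startswith('http'):
            self.pubdate = self.temp
        else:
            self.pubdate = ''
        self.temp = ''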
    def startItem(self):
        self.item = True
    def endItem(self):
        self.item = False
        self.fout.write('''
                            </td>
                            </tr>
                            <tr bgcolor="#eeeeee">
                            <td style="padding-top:5px;padding-left:5px;">
                            <A href="''')
        self.fout.write(self.link.encode('gb2312'))
        self.fout.write('''" target="_blank"> ''')
        self.fout.write(self.guid.encode('gb2312'))
        self.fout.write('''
                        </A>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-top:5px;padding-left:5px;padding-bottom:5px;"><span>''')
        self.fout.write(self.pubdate.encode('gb2312'))
        self.fout.write('''</span></td>
                        </tr>
                        <tr height="10"><td></td></tr>''')
# Program entry point
if __name__ == '__main__':
    parse('ddt.xml', Website())
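The dispatch idea in the listing can be tried in isolation. The following is a reduced sketch, not the author's program: `TitleLister` and `SAMPLE` are hypothetical names, and the handler only collects the text of every `<title>` element instead of writing a full page. It shows how a tag name such as `pubDate` maps to a method name (`endPubdate`) and how `defaultEnd` discards text from tags that have no dedicated handler.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Dispatcher(object):
    """Route SAX events to startXxx/endXxx methods when they exist."""

    def dispatch(self, prefix, name):
        # 'end' + 'pubDate'.capitalize() -> 'endPubdate'
        method = getattr(self, prefix + name.capitalize(), None)
        if not callable(method):
            # Fall back to defaultStart / defaultEnd if defined
            method = getattr(self, 'default' + prefix.capitalize(), None)
        if callable(method):
            method()

    def startElement(self, name, attrs):
        self.dispatch('start', name)

    def endElement(self, name):
        self.dispatch('end', name)


class TitleLister(Dispatcher, ContentHandler):
    """Hypothetical handler: collect every <title> as an <li> line."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.temp = u''
        self.lines = []

    def characters(self, chars):
        # Text may arrive in several chunks, so it is accumulated
        self.temp += chars

    def endTitle(self):
        self.lines.append(u'<li>%s</li>' % self.temp.strip())
        self.temp = u''

    def defaultEnd(self):
        # Discard text of elements that have no dedicated handler
        self.temp = u''


# Hypothetical in-memory feed, standing in for ddt.xml
SAMPLE = (b"<rss><channel><title>Feed</title>"
          b"<item><title>First</title><pubDate>x</pubDate></item>"
          b"<item><title>Second</title></item></channel></rss>")

handler = TitleLister()
xml.sax.parseString(SAMPLE, handler)
print(handler.lines)
```

Listing `Dispatcher` before `ContentHandler` in the base classes matters: it makes the dispatching `startElement` and `endElement` take precedence over the no-op versions that `ContentHandler` provides.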