Example: parsing XML into the corresponding HTML with Python


Posted in Python on April 02, 2014

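This example uses SAX to parse ddt.xml into HTML. Of course, if you have the XSL stylesheet that goes with the XML, you can convert it to HTML directly with libxml2 (together with libxslt).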
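To make the event model concrete, here is a minimal sketch (written for Python 3, not part of the original program) that records the events SAX emits for a small in-memory document. The class name `EventRecorder`, the function `record_events`, and the sample XML are made up for illustration.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Hypothetical helper: records SAX events in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(('text', content.strip()))


def record_events(xml_bytes):
    """Parse the given XML bytes and return the list of recorded events."""
    handler = EventRecorder()
    parseString(xml_bytes, handler)
    return handler.events


if __name__ == '__main__':
    sample = b'<channel><title>News</title><item><title>Hello</title></item></channel>'
    for event in record_events(sample):
        print(event)
```

Each opening tag, closing tag, and run of text arrives as a separate callback, which is why the full program below keeps state flags and a text buffer between calls.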

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
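#---------------------------------------
#   Program:  XML parser
#   Version:  01.0
#   Author:   mupeng
#   Date:     2013-12-18
#   Language: Python 2.7
#   Purpose:  Parse XML into the corresponding HTML
#   Notes:    Uses the parse function of the xml.sax module to parse the XML and generate events.
#             Website inherits from ContentHandler and overrides its event handlers.
#             Dispatcher dispatches the start and end events of each tag to the matching handler.
#---------------------------------------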
from xml.sax.handler import ContentHandler
from xml.sax import parse
class Dispatcher:
    def dispatch(self, prefix, name, attrs=None):
        # Tag-specific handler, e.g. startTitle or endItem
        mname = prefix + name.capitalize()
        # Fallback handler: defaultStart or defaultEnd
        dname = 'default' + prefix.capitalize()
        method = getattr(self, mname, None)
        if not callable(method):
            method = getattr(self, dname, None)
        # Handlers take no arguments; attrs is accepted but not forwarded
        if callable(method):
            method()
    def startElement(self, name, attrs):
        self.dispatch('start', name, attrs)
    def endElement(self, name):
        self.dispatch('end', name)
class Website(Dispatcher, ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # Output file; text is written to it GB2312-encoded
        self.fout = open('ddt_SAX.html', 'w')
        self.imagein = False
        self.desflag = False
        self.item = False
        self.title = ''
        self.link = ''
        self.guid = ''
        self.url = ''
        self.pubdate = ''
        self.description = ''
        self.temp = ''
        self.prx = ''
    def startChannel(self):
        self.fout.write('''<html>\n<head>\n<title> RSS-''')
    def endChannel(self):
       self.fout.write('''
                    <tr><td height="20"></td></tr>
                    </table>
                    </center>
                    <script>
    function  GetTimeDiff(str)
    {
     if(str == '')
     {
      return '';
     }
     var pubDate = new Date(str);
     var nowDate = new Date();
     var diffMilSeconds = nowDate.valueOf()-pubDate.valueOf();
     var days = diffMilSeconds/86400000;
     days = parseInt(days);
     diffMilSeconds = diffMilSeconds-(days*86400000);
     var hours = diffMilSeconds/3600000;
     hours = parseInt(hours);
     diffMilSeconds = diffMilSeconds-(hours*3600000);
     var minutes = diffMilSeconds/60000;
     minutes = parseInt(minutes);
     diffMilSeconds = diffMilSeconds-(minutes*60000);
     var seconds = diffMilSeconds/1000;
     seconds = parseInt(seconds);
     var returnStr = "Published (Beijing time): " + pubDate.toLocaleString();
     if(days > 0)
     {
      returnStr = returnStr + " (" + days + " days " + hours + " hours " + minutes + " minutes ago)";
     }
     else if (hours > 0)
     {
      returnStr = returnStr + " (" + hours + " hours " + minutes + " minutes ago)";
     }
     else if (minutes > 0)
     {
      returnStr = returnStr + " (" + minutes + " minutes ago)";
     }
     }
     return returnStr;
    }
    function GetSpanText()
    {
     var pubDate;
     var pubDateArray;
     var spanArray = document.getElementsByTagName("span");
     for(var i = 0; i < spanArray.length; i++)
     {
      pubDate = spanArray[i].innerHTML;
      document.getElementsByTagName("span")[i].innerHTML = GetTimeDiff(pubDate);   
     }
    }
    GetSpanText();
   </script>
                </body>
                </html>
                ''')
       self.fout.close()
    def characters(self, chars):
        # Accumulate element text; the parser may deliver it in several chunks
        if chars.strip():
            self.temp += chars
    def startTitle(self):
        if self.item:
            self.fout.write('''
                        <tr bgcolor="#eeeeee">\n<td style="padding-top:5px;padding-left:5px;" height="30">\n<B>
                    ''')
    def endTitle(self):
        if not self.imagein and not self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            #self.title = self.temp
            self.fout.write('''
                </title>\n</head>\n<body>\n<center>\n
                <script>\n
                        function copyLink()
                        {
                                clipboardData.setData("Text",window.location.href);
                                alert("The RSS link has been copied to the clipboard");
                        }
                        function subscibeLink()
                        {
                                var str = window.location.pathname;
                                while(str.match(/^\//))
                                {
                                        str = str.replace(/^\//,"");
                                }
                                window.open("http://rss.sina.com.cn/my_sina_web_rss_news.html?url=" + str,"_self");
                        }
                        </script>\n
                <table width="750" cellpadding="0" cellspacing="0">\n
                <tr>\n
                <td align="right" style="padding-right:15px;" valign="bottom">\n
            ''')
        if self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write('''
                        </B>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-left:5px;">
                        ''')
    def startImage(self):
        self.imagein = True
    def endImage(self):
        self.imagein = False
    def startLink(self):
        if self.imagein:
            self.fout.write('''<A href=" ''')
            
    def endLink(self):
        self.link = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.link.encode('gb2312'))
            self.fout.write('''" target="_blank">\n ''')
        elif self.item:
            #self.link = self.temp
            pass
        else:
            self.fout.write(self.link.encode('gb2312'))
            self.fout.write('''" target="_blank"> ''')
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write(''' </A></B></td>
                            </tr>
                            <tr><td colspan="2" align="center">
                            ''')
            self.fout.write(self.description.encode('gb2312'))
            self.fout.write('''
                        </td></tr>
                        <tr style="font-size:12px;" bgcolor="#eeeeff"><td colspan="2" style="font-size:14px;padding-top:5px;padding-bottom:5px;"><b><a href="javascript:copyLink();">Copy this page's link</a>                <a href="javascript:subscibeLink();">Embed this news list in my page (simple, fast, real-time, free)</a></b></td></tr>
                        </table>
                        <table width="750" cellpadding="0" cellspacing="0">
                            ''')
    def startUrl(self):
        if self.imagein:
            self.fout.write('''<IMG src=" ''')
    def endUrl(self):
        self.url = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.url.encode('gb2312'))
            self.fout.write('''" border="0">\n
                            </A>
                            </td>
                            <td align="left" valign="bottom" style="padding-bottom:8px;"><B><A href="
                            ''')
        if self.item:
            #self.url = self.temp
            pass
    def defaultStart(self):
        pass
    def defaultEnd(self):
        self.temp = ''
    def startDescription(self):
        pass
    def endDescription(self):
        self.description = self.temp
        self.temp = ''
        if self.item:
            #self.fout.write(u'\u3000\u3000'.encode('gb2312'))  # two full-width spaces as a paragraph indent
            self.fout.write(self.description.encode('gb2312'))
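    def endGuid(self):
        self.guid = self.temp
        # Reset the buffer so the GUID text does not leak into the next element
        self.temp = ''
    def endPubdate(self):
        # Guard kept from the original: ignore buffered text that looks like a URL
        if not self.temp.startswith('http'):
            self.pubdate = self.temp
        else:
            self.pubdate = ''
        self.temp = ''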
    def startItem(self):
        self.item = True
    def endItem(self):
        self.item = False
        self.fout.write('''
                            </td>
                            </tr>
                            <tr bgcolor="#eeeeee">
                            <td style="padding-top:5px;padding-left:5px;">
                            <A href="''')
        self.fout.write(self.link.encode('gb2312'))
        self.fout.write('''" target="_blank"> ''')
        self.fout.write(self.guid.encode('gb2312'))
        self.fout.write('''
                        </A>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-top:5px;padding-left:5px;padding-bottom:5px;"><span>''')
        self.fout.write(self.pubdate.encode('gb2312'))
        self.fout.write('''</span></td>
                        </tr>
                        <tr height="10"><td></td></tr>''')
# Program entry point
if __name__ == '__main__':
    parse('ddt.xml', Website())
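
The listing above targets Python 2.7. As a rough sketch of how the same dispatch idea might look on Python 3, the example below routes tag events to `start<Tag>` / `end<Tag>` methods and renders only the item titles as an HTML list. The names `TitleList` and `render`, the in-memory input, and the simplified HTML output are assumptions for illustration and do not reproduce the original page layout. Unlike the original, it escapes text with `html.escape` before writing it.

```python
import io
from html import escape
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class Dispatcher:
    """Routes start/end events to methods named start<Tag> / end<Tag>."""

    def dispatch(self, prefix, name):
        method = getattr(self, prefix + name.capitalize(), None)
        if not callable(method):
            # Fall back to defaultStart / defaultEnd when they exist
            method = getattr(self, 'default' + prefix.capitalize(), None)
        if callable(method):
            method()

    def startElement(self, name, attrs):
        self.dispatch('start', name)

    def endElement(self, name):
        self.dispatch('end', name)


class TitleList(Dispatcher, ContentHandler):
    """Hypothetical handler: renders every <item><title> as an <li>."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.in_item = False
        self.buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so collect and join later
        self.buffer.append(content)

    def startChannel(self):
        self.out.write('<ul>\n')

    def endChannel(self):
        self.out.write('</ul>\n')

    def startItem(self):
        self.in_item = True

    def endItem(self):
        self.in_item = False

    def startTitle(self):
        self.buffer = []

    def endTitle(self):
        # Only titles inside <item> are rendered; the channel title is skipped
        if self.in_item:
            text = ''.join(self.buffer).strip()
            self.out.write('<li>%s</li>\n' % escape(text))

    def defaultEnd(self):
        self.buffer = []


def render(xml_bytes):
    """Parse RSS-like XML bytes and return the generated HTML fragment."""
    out = io.StringIO()
    parseString(xml_bytes, TitleList(out))
    return out.getvalue()
```

Writing to an `io.StringIO` keeps the sketch free of file and encoding concerns; a real port would open the output file with an explicit `encoding` argument instead of calling `.encode('gb2312')` on every write.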