libreoffice python 操作word及excel文档的方法


Posted in Python onJuly 04, 2019

1、开始、关闭libreoffice服务;

开始之前同步字体文件时间,是因为创建soffice服务时,服务会检查所需加载的文件的时间,如果其认为时间不符,则其可能会重新加载,耗时较长,因此需事先统一时间。

使用时如果需要多次调用,最后每次调用均开启后关闭,否则libreoffice会创建一个缓存文档并越用越大,处理时间会增加。

class OfficeProcess(object):
  def __init__(self):
    self.p = 0
    subprocess.Popen('find /usr/share/fonts | xargs touch -m -t 201801010000.00', shell=True)

  def start_office(self):
    self.p = subprocess.Popen('soffice --pidfile=sof.pid --invisible --accept="socket,host=localhost,port=2002;urp;"', shell=True)
    while True:
      try:
        local_context = uno.getComponentContext()
        resolver = local_context.getServiceManager().createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
        resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
        return
      except:
        print(ts(), "wait for connecting soffice...")
        time.sleep(1)
        continue

  def stop_office(self):
    with open("sof.pid", "rb") as f:
      try:
        os.kill(int(f.read()), signal.SIGTERM)
        self.p.wait()
      except:
        pass

2、init service manager

local_context = uno.getComponentContext()
    service_manager = local_context.getServiceManager()
    resolver = service_manager.createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
    self.ctx = resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
    self.smgr = self.ctx.ServiceManager
    self.desktop = self.smgr.createInstanceWithContext('com.sun.star.frame.Desktop', self.ctx)

3、从二进制数据中读取doc文档

def ImportFromMemory(self, data):
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(data), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream
    self.doc = {'doc': []}
    try:
      self.document = self.desktop.loadComponentFromURL('private:stream/swriter', '_blank', 0, (pv, ))
      self.text = self.document.getText()
    except:
      self.text = None

4、读取doc文档中的数据

def ExportToJson(self):
    try:
      l = self.__ParseText(self.text, self.__Callback(self.doc['doc']))
      self.doc['length'] = l
    except:
      self.doc = {'doc': [], 'length': 0}
    return json.dumps(self.doc)

@staticmethod
  def __Callback(alist):
    def Append(sth):
      alist.append(sth)
    return Append
def __ParseText(self, text, func):
    l = 0
    text_it = text.createEnumeration()
    while text_it.hasMoreElements():
      element = text_it.nextElement()
      if element.supportsService('com.sun.star.text.Paragraph'):
        l += self.__ParseParagraph(element, func)
      elif element.supportsService('com.sun.star.text.TextTable'):
        l += self.__ParseTable(element, func)
      else:
        pass
    return l
def __ParseParagraph(self, paragraph, func):
    p = {'paragraph': []}
    l = 0
    paragraph_it = paragraph.createEnumeration()
    while paragraph_it.hasMoreElements():
      portion = paragraph_it.nextElement()
      if portion.TextPortionType == 'Text':
        l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
      elif portion.TextPortionType == 'SoftPageBreak':
        pass
      elif portion.TextPortionType == 'TextField':
        l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
      else:
        l += self.__ParseTextContent(portion, self.__Callback(p['paragraph']))
    if hasattr(paragraph, 'createContentEnumeration'):
      l += self.__ParseTextContent(paragraph, self.__Callback(p['paragraph']))
    p['length'] = l
    func(p)
    return l

  def __ParseTextContent(self, textcontent, func):
    l = 0
    content_it = textcontent.createContentEnumeration('com.sun.star.text.TextContent')
    while content_it.hasMoreElements():
      element = content_it.nextElement()
      if element.supportsService('com.sun.star.text.TextGraphicObject'):
        l += self.__ParsePortionGraphic(element, func)
      elif element.supportsService('com.sun.star.text.TextEmbeddedObject'):
        pass
      elif element.supportsService('com.sun.star.text.TextFrame'):
        l += self.__ParseFrame(element, func)
      elif element.supportsService('com.sun.star.drawing.GroupShape'):
        l += self.__ParseGroup(element, func)
      else:
        pass
    return l

  def __ParseFrame(self, frame, func):
    f = {'frame': []}
    l = self.__ParseText(frame.getText(), self.__Callback(f['frame']))
    f['length'] = l
    func(f)
    return l

  def __ParseGroup(self, group, func):
    l = 0
    for i in range(group.getCount()):
      it = group.getByIndex(i)
      if it.supportsService('com.sun.star.drawing.Text'):
        l += self.__ParseFrame(it, func)
      else:
        pass
    return l

  def __ParsePortionText(self, portion_text, func):
    func({'portion': portion_text.String, 'length': len(portion_text.String)})
    return len(portion_text.String)

  def __ParsePortionGraphic(self, portion_graphic, func):
    gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
    stream = self.smgr.createInstanceWithContext('com.sun.star.io.TempFile', self.ctx)
    pv1 = PropertyValue()
    pv1.Name = 'OutputStream'
    pv1.Value = stream
    pv2 = PropertyValue()
    pv2.Name = 'MimeType'
    pv2.Value = 'image/png'
    gp.storeGraphic(portion_graphic.Graphic, (pv1, pv2))
    stream.getOutputStream().flush()
    stream.seek(0)
    l = stream.getInputStream().available()
    b = uno.ByteSequence(b'')
    stream.seek(0)
    l, b = stream.getInputStream().readBytes(b, l)
    img = {'image': base64.b64encode(b.value).decode('ascii')}
    img['height'] = portion_graphic.Height
    img['width'] = portion_graphic.Width
    img['actualheight'] = portion_graphic.ActualSize.Height
    img['actualwidth'] = portion_graphic.ActualSize.Width
    img['croptop'] = portion_graphic.GraphicCrop.Top
    img['cropbottom'] = portion_graphic.GraphicCrop.Bottom
    img['cropleft'] = portion_graphic.GraphicCrop.Left
    img['cropright'] = portion_graphic.GraphicCrop.Right
    img['length'] = 0
    func(img)
    return 0

  def __ParseTable(self, table, func):
    l = 0
    try:
      matrix = self.__GetTableMatrix(table)
      seps = self.__GetTableSeparators(table)
      t = {}
      count = 0
      for ri in matrix.keys():
        t[ri] = {}
        for ci in matrix[ri].keys():
          t[ri][ci] = dict(matrix[ri][ci])
          del t[ri][ci]['cell']
          t[ri][ci]['content'] = []
          l += self.__ParseText(matrix[ri][ci]['cell'], self.__Callback(t[ri][ci]['content']))
          count += t[ri][ci]['rowspan'] * t[ri][ci]['colspan']
      if count != len(t) * len(seps):
        raise ValueError('count of cells error')
      func({'table': t, 'row': len(t), 'column': len(seps), 'length': l, 'tableid': self.table_id})
      self.table_id += 1
    except:
      l = 0
      print('discard wrong table')
    return l

  @staticmethod
  def __GetTableSeparators(table):
    result = [table.TableColumnRelativeSum]
    for ri in range(table.getRows().getCount()):
      result += [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators]
    result = sorted(set(result))
    for i in range(len(result) - 1):
      result[i] += 1 if result[i] + 1 == result[i + 1] else 0
    return sorted(set(result))

  @staticmethod
  def __NameToRC(name):
    r = int(re.sub('[A-Za-z]', '', name)) - 1
    cstr = re.sub('[0-9]', '', name)
    c = 0
    for i in range(len(cstr)):
      if cstr[i] >= 'A' and cstr[i] <= 'Z':
        c = c * 52 + ord(cstr[i]) - ord('A')
      else:
        c = c * 52 + 26 + ord(cstr[i]) - ord('a')
    return r, c

  @staticmethod
  def __GetTableMatrix(table):
    result = {}
    for name in table.getCellNames():
      ri, ci = WordToJson.__NameToRC(name)
      cell = table.getCellByName(name)
      if ri not in result:
        result[ri] = {}
      result[ri][ci] = {'cell': cell, 'rowspan': cell.RowSpan, 'name': name}

    seps = WordToJson.__GetTableSeparators(table)
    for ri in result.keys():
      sep = [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators] + [table.TableColumnRelativeSum]
      sep = sorted(set(sep))
      for ci in result[ri].keys():
        right = seps.index(sep[ci]) if sep[ci] in seps else seps.index(sep[ci] + 1)
        left = -1 if ci == 0 else seps.index(sep[ci - 1]) if sep[ci - 1] in seps else seps.index(sep[ci - 1] + 1)
        result[ri][ci]['colspan'] = right - left
    return result

5、写doc文档

self.doco = self.desktop.loadComponentFromURL('private:factory/swriter', '_blank', 0, ())
    self.texto = self.doco.getText()
    self.cursoro = self.texto.createTextCursor()
    self.cursoro.ParaBottomMargin = 500
def __WriteText(self, text, texto, cursoro):
    for it in text:
      if 'paragraph' in it:
        self.__WriteParagraph(it, texto, cursoro)
      elif 'image' in it:
        self.__WritePortionGraphic(it, texto, cursoro)
      elif 'table' in it:
        self.__WriteTable(it, texto, cursoro)

  def __WriteParagraph(self, paragraph, texto, cursoro):
    if paragraph['length'] > 0:
      if 'result' in paragraph:
        for it in paragraph['result']:
          texto.insertString(cursoro, it['trans_sen'], False)
      else:
        texto.insertString(cursoro, paragraph['paragraph'], False)
      texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

  def __WritePortionGraphic(self, portion_graphic, texto, cursoro):
    png_base64 = portion_graphic['image']
    png = base64.b64decode(png_base64)
    gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(png), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream

    actualsize = uno.createUnoStruct('com.sun.star.awt.Size')
    actualsize.Height = portion_graphic['actualheight'] if 'actualheight' in portion_graphic else portion_graphic['height']
    actualsize.Width = portion_graphic['actualwidth'] if 'actualwidth' in portion_graphic else portion_graphic['width']
    graphiccrop = uno.createUnoStruct('com.sun.star.text.GraphicCrop')
    graphiccrop.Top = portion_graphic['croptop'] if 'croptop' in portion_graphic else 0
    graphiccrop.Bottom = portion_graphic['cropbottom'] if 'cropbottom' in portion_graphic else 0
    graphiccrop.Left = portion_graphic['cropleft'] if 'cropleft' in portion_graphic else 0
    graphiccrop.Right = portion_graphic['cropright'] if 'cropright' in portion_graphic else 0

    image = self.doco.createInstance('com.sun.star.text.TextGraphicObject')
    image.Surround = NONE
    image.Graphic = gp.queryGraphic((pv, ))
    image.Height = portion_graphic['height']
    image.Width = portion_graphic['width']
    image.setPropertyValue('ActualSize', actualsize)
    image.setPropertyValue('GraphicCrop', graphiccrop)
    texto.insertTextContent(cursoro, image, False)
    texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

  def __WriteTable(self, table, texto, cursoro):
    tableo = self.doco.createInstance('com.sun.star.text.TextTable')
    tableo.initialize(table['row'], table['column'])
    texto.insertTextContent(cursoro, tableo, False)
#    texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)
    tcursoro = tableo.createCursorByCellName("A1")

    hitbug = False
    if table['row'] > 1:
      tcursoro.goDown(1, True)
      hitbug = tcursoro.getRangeName() == 'A1'

    for ri in sorted([int(r) for r in table['table'].keys()]):
      rs = table['table'][str(ri)]
      for ci in sorted([int(c) for c in rs.keys()]):
        cell = rs[str(ci)]
        if hitbug == False and (cell['rowspan'] > 1 or cell['colspan'] > 1):
          tcursoro.gotoCellByName(cell['name'], False)
          if cell['rowspan'] > 1:
            tcursoro.goDown(cell['rowspan'] - 1, True)
          if cell['colspan'] > 1:
            tcursoro.goRight(cell['colspan'] - 1, True)
          tcursoro.mergeRange()
        ctexto = tableo.getCellByName(cell['name'])
        if ctexto == None:
          continue
        ccursoro = ctexto.createTextCursor()
        ccursoro.CharWeight = FontWeight.NORMAL
        ccursoro.CharWeightAsian = FontWeight.NORMAL
        ccursoro.ParaAdjust = LEFT
        self.__WriteText(cell['content'], ctexto, ccursoro)

6、生成二进制的doc文档数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'MS Word 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

7、从doc文档数据生成pdf的二进制数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'writer_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())

8、读取excel二进制数据

def ImportFromMemory(self, data):
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(data), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream
    self.doc = {'doc': []}
    try:
      print("before loadComponentFromURL")
      self.document = self.desktop.loadComponentFromURL('private:stream/scalc', '_blank', 0, (pv, ))
      self.sheets = self.document.getSheets()
      print("ImportFromMemory done")
    except:
      print("ImportFromMemory failed")
      self.sheets = None

9、读取excel的文本数据

def ExportToJson(self):
    try:
      l = self.__ParseText(self.sheets, self.__Callback(self.doc['doc']))
      self.doc['length'] = l
    except:
      self.doc = {'doc': [], 'length': 0}
    return json.dumps(self.doc)
def __ParseText(self, sheets, func):
    l = 0
    sheets_it = sheets.createEnumeration()
    while sheets_it.hasMoreElements():
      element = sheets_it.nextElement()
      if element.supportsService('com.sun.star.sheet.Spreadsheet'):
        l += self.__ParseSpreadsheet(element, func)
    return l

  def __ParseSpreadsheet(self, spreadsheet, func):
    l = 0
    p = {'spreadsheet': []}
    visible_cells_it = spreadsheet.queryVisibleCells().getCells().createEnumeration()
    while visible_cells_it.hasMoreElements():
      cell = visible_cells_it.nextElement()
      type = cell.getType()
      if type == self.EMPTY:
        print("cell.type==empty")
      elif type == self.VALUE:
        print("cell.type==VALUE", "value=", cell.getValue(), cell.getCellAddress ())
      elif type == self.TEXT:
        print("cell.type==TEXT","content=", cell.getString().encode("UTF-8"), cell.getCellAddress ())
        l += self.__ParseCellText(spreadsheet, cell, self.__Callback(p['spreadsheet']))
        print("__ParseCellText=", p)
      elif type == self.FORMULA:
        print("cell.type==FORMULA", "formula=", cell.getValue())
    p['length'] = l
    func(p)
    return l

  def __ParseCellText(self, sheet, cell, func):
    try:
      x = cell.getCellAddress().Column
      y = cell.getCellAddress().Row
      sheetname = sheet.getName()
    except:
      x = -1
      y = -1
      sheetname = None
    func({'celltext': cell.getString(), 'x': x, 'y': y, 'sheetname': sheetname, 'length': len(cell.getString())})
    return len(cell.getString())
 self.EMPTY = uno.Enum("com.sun.star.table.CellContentType", "EMPTY")
    self.TEXT = uno.Enum("com.sun.star.table.CellContentType", "TEXT")
    self.FORMULA = uno.Enum("com.sun.star.table.CellContentType", "FORMULA")
    self.VALUE = uno.Enum("com.sun.star.table.CellContentType", "VALUE")

10、替换excel的文本信息

def ImportFromJson(self, data):
    doc = json.loads(data)
    try:
      self.__WriteText(doc['doc'])
    except:
      pass
def __WriteText(self, text):
    print("__WriteText begin:", text)
    sheet = None
    for it in text:
      if 'paragraph' in it and 'sheetname' in it:
        if sheet == None or sheet.getName() != it['sheetname']:
          try:
            sheet = self.sheets.getByName(it['sheetname'])
            print("getsheet:", it['sheetname'], "=", sheet.getName())
          except:
            sheet = None
            continue
        self.__WriteParagraph(it, sheet)

  def __WriteParagraph(self, paragraph, sheet):
    print("__WriteParagraph")
    if paragraph['length'] > 0:
      try:
        x = paragraph['x']
        y = paragraph['y']
        print("getcell:", x, y)
        cell = sheet.getCellByPosition(x, y)
        print("getcell done")
      except:
        return
      if 'result' in paragraph:
        for it in paragraph['result']:
          print("cell=", cell.getString())
          cell.setString(it['trans_sen'])
          print("cell,", cell.getString(), ",done")

11、生成excel文档二进制数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'Calc MS Excel 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

12、生成excel的pdf文档

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'calc_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python类型强制转换long to int的代码
Feb 10 Python
Python 异常处理实例详解
Mar 12 Python
python八大排序算法速度实例对比
Dec 06 Python
用Python编写一个简单的CS架构后门的方法
Nov 20 Python
python 执行文件时额外参数获取的实例
Dec 18 Python
python机器学习包mlxtend的安装和配置详解
Aug 21 Python
python3.x 生成3维随机数组实例
Nov 28 Python
Python终端输出彩色字符方法详解
Feb 11 Python
python实现简单俄罗斯方块
Mar 13 Python
详解基于Jupyter notebooks采用sklearn库实现多元回归方程编程
Mar 25 Python
Python替换NumPy数组中大于某个值的所有元素实例
Jun 08 Python
Python使用tkinter制作在线翻译软件
Feb 22 Python
Python实现12306火车票抢票系统
Jul 04 #Python
如何利用Pyecharts可视化微信好友
Jul 04 #Python
python 获取等间隔的数组实例
Jul 04 #Python
python 中pyqt5 树节点点击实现多窗口切换问题
Jul 04 #Python
Python机器学习算法库scikit-learn学习之决策树实现方法详解
Jul 04 #Python
Python 中PyQt5 点击主窗口弹出另一个窗口的实现方法
Jul 04 #Python
Python+opencv 实现图片文字的分割的方法示例
Jul 04 #Python
You might like
多重?l件?合查?(一)
2006/10/09 PHP
PHP字符编码问题之GB2312 VS UTF-8解决方法
2011/06/23 PHP
php字符串截取的简单方法
2013/07/04 PHP
PHP 5.5 创建和验证哈希最简单的方法详解
2013/11/07 PHP
php 获取页面中指定内容的实现类
2014/01/23 PHP
PHP 9 大缓存技术总结
2015/09/17 PHP
Yii2第三方类库插件Imagine的安装和使用
2017/07/06 PHP
PHP性能分析工具xhprof的安装使用与注意事项
2017/12/19 PHP
基于jquery的回到页面顶部按钮
2011/06/27 Javascript
jQuery 的全选(全非选)即取得被选中的值使用介绍
2013/11/12 Javascript
jquery动态添加元素事件失效问题解决方法
2014/05/23 Javascript
JavaScript使用replace函数替换字符串的方法
2015/04/06 Javascript
javascript学习笔记之函数定义
2015/06/25 Javascript
jQuery+HTML5实现图片上传前预览效果
2015/08/20 Javascript
js创建jsonArray传输至后台及后台全面解析
2016/04/11 Javascript
利用JavaScript判断浏览器类型及版本
2016/08/23 Javascript
jQuery扩展+xml实现表单验证功能的方法
2016/12/25 Javascript
jQuery 改变P标签文本值方法
2018/02/24 jQuery
Element-UI Table组件上添加列拖拽效果实现方法
2018/04/14 Javascript
webpack4 入门最简单的例子介绍
2018/09/05 Javascript
刷新页面后让控制台的js代码继续执行
2019/09/20 Javascript
vue实现井字棋游戏
2020/09/29 Javascript
JS操作JSON常用方法(10w阅读)
2020/12/06 Javascript
[00:33]DOTA2上海特级锦标赛 CDEC战队宣传片
2016/03/04 DOTA
[08:40]Navi Vs Newbee
2018/06/07 DOTA
Python的numpy库中将矩阵转换为列表等函数的方法
2018/04/04 Python
python模块smtplib学习
2018/05/22 Python
Python3中exp()函数用法分析
2019/02/19 Python
解决Python图形界面中设置尺寸的问题
2020/03/05 Python
Python内置方法和属性应用:反射和单例(推荐)
2020/06/19 Python
Django-celery-beat动态添加周期性任务实现过程解析
2020/11/26 Python
Otel.com:折扣酒店预订
2017/08/24 全球购物
一套SQL笔试题
2016/08/14 面试题
小组名称和口号
2014/06/09 职场文书
学校党的群众路线教育实践活动总结报告
2014/07/03 职场文书
2014年国庆节演讲稿
2014/09/02 职场文书