2007-07-02

 

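Python Cookbook 1.25: Converting HTML to Text on a Unix Terminal

Problem:

You need to display HTML as text on a Unix terminal, with support for effects such as bold and underline.

Discussion:

The simplest approach is a filter script that takes HTML as input, processes the text, and writes the result to the terminal. Since this recipe targets Unix terminals, we can use the tput command to obtain the terminal's control sequences, and call it through the popen function of the os module in the Python standard library: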

#!/usr/bin/env python
# Python 2 only: htmllib was removed in Python 3, formatter in Python 3.10
import sys, os, htmllib, formatter
# use Unix tput to get the escape sequences for bold, underline, reset
set_bold = os.popen('tput bold').read()
set_underline = os.popen('tput smul').read()
perform_reset = os.popen('tput sgr0').read()
class TtyFormatter(formatter.AbstractFormatter):
    ''' a formatter that keeps track of bold and italic font states, and
        emits terminal control sequences accordingly.
    '''
    def __init__(self, writer):
        # first, as usual, initialize the superclass
        formatter.AbstractFormatter.__init__(self, writer)
        # start with neither bold nor italic, and no saved font state
        self.fontState = False, False
        self.fontStack = []
    def push_font(self, font):
        # the `font' tuple has four items, we only track the two flags
        # about whether italic and bold are active or not
        size, is_italic, is_bold, is_tt = font
        self.fontStack.append((is_italic, is_bold))
        self._updateFontState()
    def pop_font(self, *args):
        # go back to previous font state
        try:
            self.fontStack.pop()
        except IndexError:
            pass
        self._updateFontState()
    def _updateFontState(self):
        # emit appropriate terminal control sequences if the state of
        # bold and/or italic(==underline) has just changed
        try:
            newState = self.fontStack[-1]
        except IndexError:
            newState = False, False
        if self.fontState != newState:
            # relevant state change: reset terminal
            print perform_reset,
            # set underline and/or bold if needed
            if newState[0]:
                print set_underline,
            if newState[1]:
                print set_bold,
            # remember the two flags as our current font-state
            self.fontState = newState
# make writer, formatter and parser objects, connecting them as needed
myWriter = formatter.DumbWriter()
if sys.stdout.isatty():
    # real terminal: emit control sequences
    myFormatter = TtyFormatter(myWriter)
else:
    # output is redirected: plain text only
    myFormatter = formatter.AbstractFormatter(myWriter)
myParser = htmllib.HTMLParser(myFormatter)
# feed all of standard input to the parser, then terminate operations
myParser.feed(sys.stdin.read())
myParser.close()

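The base class formatter.AbstractFormatter, provided by the Python standard library, works in any setting. The subclass TtyFormatter, on the other hand, is tailored to this recipe's problem: it relies on the output of the Unix tput command to obtain the terminal's control sequences, and uses them to render bold and underline on the terminal.
Many systems that are not certified Unix, such as Linux and Mac OS X, also provide a working tput command, so the TtyFormatter class can be used there as well. In other words, the code in this recipe should work on practically any *ix system.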
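If your terminal gets its control sequences some other way, you can modify the TtyFormatter class. For example, on Windows the cmd.exe console is reported to support ANSI control sequences, so with suitable changes to TtyFormatter (hard-coding those sequences in place of the tput calls) the script could run on Windows too.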
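As a rough illustration of hard-coding the sequences instead of asking tput, here is a minimal sketch for Python 3, where htmllib is no longer available. It is not the recipe's code: it uses html.parser from the standard library, and the names AnsiTextParser, html_to_ansi, BOLD_TAGS and UNDERLINE_TAGS are made up for this example. It assumes the terminal understands the standard ANSI sequences for bold, underline and reset, and it does no line wrapping or block layout the way DumbWriter does.

```python
from html.parser import HTMLParser

# Standard ANSI sequences, hard-coded (an assumption about the terminal)
# instead of being read from tput.
ANSI_BOLD = "\x1b[1m"
ANSI_UNDERLINE = "\x1b[4m"
ANSI_RESET = "\x1b[0m"

# Hypothetical tag sets chosen for this sketch; italic is rendered as
# underline, as in the recipe.
BOLD_TAGS = {"b", "strong"}
UNDERLINE_TAGS = {"i", "em", "u"}


class AnsiTextParser(HTMLParser):
    """Collect the text of an HTML document, marking bold/italic runs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []              # output fragments, joined at the end
        self.bold_depth = 0          # number of currently open bold tags
        self.underline_depth = 0     # number of currently open italic tags
        self.state = (False, False)  # (underline, bold) last emitted

    def _update_state(self):
        # Emit control sequences only when the effective state changes.
        new_state = (self.underline_depth > 0, self.bold_depth > 0)
        if new_state != self.state:
            self.parts.append(ANSI_RESET)
            if new_state[0]:
                self.parts.append(ANSI_UNDERLINE)
            if new_state[1]:
                self.parts.append(ANSI_BOLD)
            self.state = new_state

    def handle_starttag(self, tag, attrs):
        if tag in BOLD_TAGS:
            self.bold_depth += 1
        elif tag in UNDERLINE_TAGS:
            self.underline_depth += 1
        self._update_state()

    def handle_endtag(self, tag):
        # Tolerate stray closing tags by never going below zero.
        if tag in BOLD_TAGS:
            self.bold_depth = max(0, self.bold_depth - 1)
        elif tag in UNDERLINE_TAGS:
            self.underline_depth = max(0, self.underline_depth - 1)
        self._update_state()

    def handle_data(self, data):
        self.parts.append(data)


def html_to_ansi(html):
    """Return the text of `html` with ANSI bold/underline sequences."""
    parser = AnsiTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Used as a filter, this would amount to something like sys.stdout.write(html_to_ansi(sys.stdin.read())). As in the recipe, it is worth checking sys.stdout.isatty() first and skipping the control sequences when the output is redirected.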
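In other cases you may prefer an existing command such as lynx -dump, which gives richer formatting than this recipe, and you can change the script to call it instead. Of course, the lynx command must then be installed on your machine.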

Related notes:

os.popen(...)
    popen(command [, mode='r' [, bufsize]]) -> pipe
   
    Open a pipe to/from a command returning a file object.
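
To show how this call behaves in practice, here is a small sketch (not part of the recipe) that reads a command's output through os.popen and falls back to a default when the command fails. The helper name read_command_output and its fallback parameter are hypothetical. It relies on the documented behaviour that close() on the returned file object gives None when the command exits with status 0.

```python
import os


def read_command_output(command, fallback=""):
    """Run `command` in a shell and return what it writes to stdout.

    If the command exits with a non-zero status, return `fallback`.
    """
    pipe = os.popen(command)   # mode defaults to 'r': read the output
    output = pipe.read()
    status = pipe.close()      # None means the exit status was 0
    if status is not None:
        return fallback
    return output
```

With such a helper, the recipe's set_bold could be obtained as read_command_output('tput bold', '\x1b[1m'), so that a missing tput yields the hard-coded ANSI sequence (an assumption about the terminal) rather than an empty string.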
