2007-07-02
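Python Cookbook 1.25: Converting HTML to Text on a Unix Terminal
Problem:
You need to render HTML as readable text on a Unix terminal, with support for bold and underline.
Discussion:
The simplest approach is a filter script that takes HTML as input, processes the text, and writes the result to the terminal. Because this recipe targets Unix terminals, we can use the tput command to obtain the control sequences, reading its output through the popen function in the os module of the Python standard library: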
#!/usr/bin/env python
# Note: written for Python 2; the htmllib module does not exist in Python 3.
import sys, os, htmllib, formatter

# use Unix tput to get the escape sequences for bold, underline, reset
set_bold = os.popen('tput bold').read()
set_underline = os.popen('tput smul').read()
perform_reset = os.popen('tput sgr0').read()

class TtyFormatter(formatter.AbstractFormatter):
    ''' a formatter that keeps track of bold and italic font states, and
        emits terminal control sequences accordingly.
    '''
    def __init__(self, writer):
        # first, as usual, initialize the superclass
        formatter.AbstractFormatter.__init__(self, writer)
        # start with neither bold nor italic, and no saved font state
        self.fontState = False, False
        self.fontStack = []

    def push_font(self, font):
        # the `font' tuple has four items, we only track the two flags
        # about whether italic and bold are active or not
        size, is_italic, is_bold, is_tt = font
        self.fontStack.append((is_italic, is_bold))
        self._updateFontState()

    def pop_font(self, *args):
        # go back to previous font state
        try:
            self.fontStack.pop()
        except IndexError:
            pass
        self._updateFontState()

    def _updateFontState(self):
        # emit appropriate terminal control sequences if the state of
        # bold and/or italic(==underline) has just changed
        try:
            newState = self.fontStack[-1]
        except IndexError:
            newState = False, False
        if self.fontState != newState:
            # relevant state change: reset terminal
            # (sys.stdout.write avoids the stray spaces that `print x,' adds)
            sys.stdout.write(perform_reset)
            # set underline and/or bold if needed
            if newState[0]:
                sys.stdout.write(set_underline)
            if newState[1]:
                sys.stdout.write(set_bold)
            # remember the two flags as our current font-state
            self.fontState = newState

# make writer, formatter and parser objects, connecting them as needed
myWriter = formatter.DumbWriter()
if sys.stdout.isatty():
    myFormatter = TtyFormatter(myWriter)
else:
    myFormatter = formatter.AbstractFormatter(myWriter)
myParser = htmllib.HTMLParser(myFormatter)
# feed all of standard input to the parser, then terminate operations
myParser.feed(sys.stdin.read())
myParser.close()
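The base class formatter.AbstractFormatter, provided by the Python standard library, works just about anywhere. The TtyFormatter subclass, on the other hand, is specific to this recipe: it depends on the output of the Unix tput command to obtain the control sequences for bold, underline, and reset, and uses them to show bold and underlined text on the terminal.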
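Many systems that are not certified Unix, such as Linux and Mac OS, still provide a perfectly usable tput command, so the TtyFormatter class works on them too. In other words, the code in this recipe applies to any *ix system.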
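Since the script relies on tput being present, it can help to degrade gracefully when it is not. The following is a minimal sketch, not part of the original recipe, written for Python 3 and using subprocess instead of os.popen. The helper name `tput` is hypothetical; it returns an empty string when the command is missing or fails (for example, when TERM is not set), so the output simply stays unstyled.

```python
import shutil
import subprocess

def tput(capability):
    """Return the escape sequence for a terminfo capability, or '' if unavailable."""
    # Hypothetical helper: no tput on PATH means no styling at all.
    if shutil.which("tput") is None:
        return ""
    try:
        result = subprocess.run(
            ["tput", capability],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return ""
    # A nonzero exit status means tput could not answer (unknown capability,
    # unset TERM, ...), so fall back to an empty sequence.
    if result.returncode != 0:
        return ""
    return result.stdout.decode("ascii", "replace")

set_bold = tput("bold")
set_underline = tput("smul")
perform_reset = tput("sgr0")
```

With these three strings in place, the rest of the formatter logic can stay unchanged: writing an empty string to the terminal has no effect.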
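If your terminal supports other control sequences, you can adapt the TtyFormatter class accordingly. For example, on Windows, if the cmd.exe console supports ANSI control sequences, you can hard-code those sequences in TtyFormatter to run the script on that platform.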
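As an illustration of hard-coding the sequences, here is a small sketch using the standard ANSI SGR codes for bold, underline, and reset. The constant names and the `style` helper are hypothetical, and whether a particular console actually interprets these codes depends on its version and configuration.

```python
# Standard ANSI SGR sequences, hard-coded instead of queried via tput.
ANSI_BOLD = "\x1b[1m"
ANSI_UNDERLINE = "\x1b[4m"
ANSI_RESET = "\x1b[0m"

def style(text, bold=False, underline=False):
    """Wrap text in ANSI sequences; return it unchanged when no style is requested."""
    prefix = (ANSI_BOLD if bold else "") + (ANSI_UNDERLINE if underline else "")
    if not prefix:
        return text
    # Always reset afterwards so the style does not leak into later output.
    return prefix + text + ANSI_RESET
```

In the script, these constants could stand in for set_bold, set_underline, and perform_reset when tput is not an option.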
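In other cases, you may prefer an existing command such as lynx -dump to get richer formatting than this script provides. That, of course, requires lynx to be installed on your machine.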
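The script above targets Python 2. In Python 3 the htmllib module is gone, and formatter was removed in Python 3.10. The following is a rough sketch of the same font-stack idea on top of html.parser. It is a swapped-in technique, not the recipe's own code: the class name `TtyHTMLParser`, the function `html_to_tty`, and the hard-coded ANSI sequences are all assumptions, and it does no line wrapping or block layout the way DumbWriter does.

```python
import io
from html.parser import HTMLParser

# Assumption: hard-coded ANSI sequences stand in for the tput output.
BOLD, UNDERLINE, RESET = "\x1b[1m", "\x1b[4m", "\x1b[0m"
BOLD_TAGS = {"b", "strong"}
ITALIC_TAGS = {"i", "em"}   # shown as underline, as in the recipe

class TtyHTMLParser(HTMLParser):
    """Copy text to `out`, emitting escapes when the bold/italic state changes."""

    def __init__(self, out, use_escapes=True):
        super().__init__(convert_charrefs=True)
        self.out = out
        self.use_escapes = use_escapes
        self.stack = []               # saved (italic, bold) states
        self.state = (False, False)   # state last sent to the terminal

    def _current(self):
        return self.stack[-1] if self.stack else (False, False)

    def _update(self):
        new_state = self._current()
        if self.use_escapes and new_state != self.state:
            # state change: reset, then re-enable whatever is still active
            self.out.write(RESET)
            if new_state[0]:
                self.out.write(UNDERLINE)
            if new_state[1]:
                self.out.write(BOLD)
        self.state = new_state

    def handle_starttag(self, tag, attrs):
        italic, bold = self._current()
        if tag in BOLD_TAGS:
            self.stack.append((italic, True))
        elif tag in ITALIC_TAGS:
            self.stack.append((True, bold))
        else:
            return
        self._update()

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS or tag in ITALIC_TAGS:
            if self.stack:
                self.stack.pop()
            self._update()

    def handle_data(self, data):
        self.out.write(data)

def html_to_tty(html, use_escapes=True):
    """Return the text of `html`, with escapes around bold and italic runs."""
    buf = io.StringIO()
    parser = TtyHTMLParser(buf, use_escapes)
    parser.feed(html)
    parser.close()   # flushes any text still buffered by the parser
    return buf.getvalue()
```

To use it as a filter in the spirit of the original, one could call `sys.stdout.write(html_to_tty(sys.stdin.read(), sys.stdout.isatty()))`, so that escapes are only emitted when the output is a terminal.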
Related notes:
os.popen(...)
popen(command [, mode='r' [, bufsize]]) -> pipe
Open a pipe to/from a command returning a file object.
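As a small usage sketch of the call described above (written for Python 3, where os.popen is still available), the example below reads the output of a shell command through the returned file object. The `echo hello` command is only a placeholder chosen for illustration.

```python
import os

# Open a read pipe to the command; mode defaults to 'r'.
pipe = os.popen("echo hello")
output = pipe.read()
# close() returns None when the command exited with status 0.
status = pipe.close()
```

The recipe uses the same pattern with `tput bold`, `tput smul`, and `tput sgr0`, keeping whatever bytes the command prints as the control sequence.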
Tags: Python