拒绝懒惰的家: Python Cookbook 3.6 Looking Up Holidays Automatically

2007-08-13

Python Cookbook 3.6 Looking Up Holidays Automatically

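Problem:

Holidays vary by country, by region, by ethnic group, and even from one company to another. You need an automatic way to count the holidays that fall between two given dates.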

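Discussion:

Between two dates there may be holidays whose date moves from year to year, such as Easter and Labor Day in the United States; holidays computed from Easter, such as the day after Easter (this recipe calls it Boxing Day, although that name usually refers to December 26); holidays with a fixed date, such as Christmas; and even holidays specific to your own company (the boss's birthday, say). You can handle all of these with datetime and the third-party package dateutil.
A convenient approach is to handle each kind of holiday separately, wrapping each one in its own function: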

import datetime
# dateutil is a third-party package (distributed as python-dateutil)
from dateutil import rrule, easter
def all_easter(start, end):
    # return the list of Easter dates within start..end
    easters = [easter.easter(y)
               for y in range(start.year, end.year + 1)]
    return [d for d in easters if start <= d <= end]
def all_boxing(start, end):
    # return the list of "Boxing Day" dates within start..end
    # (in this recipe that means the day after Easter)
    one_day = datetime.timedelta(days=1)
    boxings = [easter.easter(y) + one_day
               for y in range(start.year, end.year + 1)]
    return [d for d in boxings if start <= d <= end]
def all_christmas(start, end):
    # return the list of Christmas Day dates within start..end
    christmases = [datetime.date(y, 12, 25)
                   for y in range(start.year, end.year + 1)]
    return [d for d in christmases if start <= d <= end]
def all_labor(start, end):
    # return the list of Labor Day dates within start..end
    # (first Monday of September; rrule treats both bounds as inclusive)
    labors = rrule.rrule(rrule.YEARLY, bymonth=9, byweekday=rrule.MO(1),
                         dtstart=start, until=end)
    return [d.date() for d in labors]   # no need to test for in-between here
def read_holidays(start, end, holidays_file='holidays.txt'):
    # return the list of dates from holidays_file within start..end
    try:
        f = open(holidays_file)
    except IOError as err:
        print('cannot read holidays (%r): %s' % (holidays_file, err))
        return []
    holidays = []
    # the with statement closes the file even if parsing fails unexpectedly
    with f:
        for line in f:
            # skip blank lines and comments
            if line.isspace() or line.startswith('#'):
                continue
            # try to parse the format: YYYY, M, D
            try:
                y, m, d = [int(x.strip()) for x in line.split(',')]
                date = datetime.date(y, m, d)
            except ValueError:
                # diagnose invalid line and just go on
                print('Invalid line %r in holidays file %r' % (
                    line, holidays_file))
                continue
            if start <= date <= end:
                holidays.append(date)
    return holidays
holidays_by_country = {
    # map each country code to a sequence of functions
    'US': (all_easter, all_christmas, all_labor),
    'IT': (all_easter, all_boxing, all_christmas),
}
def holidays(cc, start, end, holidays_file='holidays.txt'):
    # read applicable holidays from the file
    all_holidays = read_holidays(start, end, holidays_file)
    # add all holidays computed by applicable functions
    functions = holidays_by_country.get(cc, ())
    for function in functions:
        all_holidays += function(start, end)
    # eliminate duplicates
    all_holidays = list(set(all_holidays))
    # uncomment the following 2 lines to return a sorted list:
    # all_holidays.sort()
    # return all_holidays
    return len(all_holidays)    # comment this out if returning list
if __name__ == '__main__':
    test_file = open('test_holidays.txt', 'w')
    test_file.write('2004, 9, 6\n')
    test_file.close()
    testdates = [ (datetime.date(2004, 8, 1), datetime.date(2004, 11, 14)),
                  (datetime.date(2003, 2, 28), datetime.date(2003, 5, 30)),
                  (datetime.date(2004, 2, 28), datetime.date(2004, 5, 30)),
                ]
    def test(cc, testdates, expected):
        for (s, e), expect in zip(testdates, expected):
            # the trailing newline leaves a blank line after each result
            print('total holidays in %s from %s to %s is %d (exp %d)\n' % (
                    cc, s, e, holidays(cc, s, e, test_file.name), expect))
    test('US', testdates, (1, 1, 1))
    test('IT', testdates, (1, 2, 2))
    import os
    os.remove(test_file.name)

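At the company where I work there are several unions, and the company's holidays are set by negotiation among them. On top of that, I need to treat snow days and product release dates as "official" holidays. To cope with all these cases, it is convenient to put each type of holiday in its own function, as the code above does with all_easter and all_labor. The listing includes one example of each kind of calculation, so it is easy to write your own.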
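
As a minimal sketch of writing your own function, the helper below builds a function with the same (start, end) signature as all_christmas for any fixed month and day. The name make_fixed_holiday and the March 15 "boss's birthday" are assumptions made up for illustration; they are not part of the recipe.

```python
import datetime

def make_fixed_holiday(month, day):
    """Return a function(start, end) listing month/day dates within start..end.

    Hypothetical helper, not part of the original recipe.
    """
    def all_fixed(start, end):
        dates = []
        for y in range(start.year, end.year + 1):
            try:
                dates.append(datetime.date(y, month, day))
            except ValueError:
                # e.g. February 29 in a non-leap year: no such date that year
                continue
        # closed interval, like the other functions in the recipe
        return [d for d in dates if start <= d <= end]
    return all_fixed

# assumed example: the boss's birthday falls on March 15
all_boss_birthday = make_fixed_holiday(3, 15)

if __name__ == '__main__':
    print(all_boss_birthday(datetime.date(2003, 1, 1),
                            datetime.date(2004, 12, 31)))
```

A function built this way could be added to a country's tuple in holidays_by_country alongside the existing ones.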
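Although half-open intervals (lower bound included, upper bound excluded) are the norm in Python, because they are more flexible and help avoid some common mistakes, this recipe works with closed intervals, in which both bounds are included. Unfortunately, that is how date ranges are usually specified, and dateutil works that way too, so the choice is an obvious one.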
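
The difference between the two conventions shows up when a holiday falls exactly on the end date. The sketch below uses only the standard library; the helper names in_closed and in_half_open are assumptions introduced for this illustration.

```python
import datetime

def in_closed(d, start, end):
    # closed interval: both bounds count, as in this recipe
    return start <= d <= end

def in_half_open(d, start, end):
    # half-open interval: end itself is excluded, as with range() and slices
    return start <= d < end

christmas = datetime.date(2004, 12, 25)
start = datetime.date(2004, 12, 1)

# a holiday that falls exactly on the end date passes only the closed test
closed_hit = in_closed(christmas, start, christmas)
half_open_hit = in_half_open(christmas, start, christmas)

# a caller holding a half-open end can convert it by stepping back one day
one_day = datetime.timedelta(days=1)
half_open_end = datetime.date(2004, 12, 26)
converted_hit = in_closed(christmas, start, half_open_end - one_day)

if __name__ == '__main__':
    print(closed_hit, half_open_hit, converted_hit)
```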
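Each function is responsible for returning only results that meet the requirements: a list of datetime.date instances, all of which lie within the date range passed to the function. In all_labor, for example, we convert the datetime.datetime instances produced by dateutil's rrule into datetime.date instances before returning them.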
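
The conversion matters because the results are later merged and de-duplicated with a set. In Python 3 a datetime.datetime never compares equal to a plain datetime.date, so mixing the two types would let the same day be counted twice. A small sketch, where as_date is a hypothetical helper name:

```python
import datetime

def as_date(d):
    # datetime.datetime is a subclass of datetime.date, so test for it first
    if isinstance(d, datetime.datetime):
        return d.date()
    return d

# the same calendar day, once as a datetime and once as a date
mixed = [datetime.datetime(2004, 9, 6, 0, 0), datetime.date(2004, 9, 6)]

# without normalizing, the set keeps both objects
unnormalized_count = len(set(mixed))
# after normalizing, the duplicates collapse as intended
normalized_count = len(set(as_date(d) for d in mixed))

if __name__ == '__main__':
    print(unnormalized_count, normalized_count)
```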
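A company may also declare certain days a holiday on a one-off basis (a snow day, for example), and a text file is a convenient place to record such dates. In our example, the read_holidays function reads and parses this file; you could replace its parsing step with a "fuzzy" date parser if you prefer.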
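
As a rough stand-in for a more tolerant parser, the sketch below tries several layouts in turn using only the standard library. It is not a true fuzzy parser (dateutil's parser module offers a fuzzy mode for that); the function name parse_holiday_line and the list of accepted formats are assumptions for illustration.

```python
import datetime

# Assumed set of accepted layouts; extend as needed.
FORMATS = (
    '%Y, %m, %d',   # the recipe's own format: 2004, 9, 6
    '%Y,%m,%d',     # same, without spaces
    '%Y-%m-%d',     # ISO style: 2004-09-06
    '%d %B %Y',     # 6 September 2004 (month names follow the current locale)
)

def parse_holiday_line(line):
    """Return a datetime.date for line, or None if no known format matches."""
    text = line.strip()
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            # wrong layout or impossible date: try the next format
            continue
    return None

if __name__ == '__main__':
    for sample in ('2004, 9, 6\n', '2004-09-06', 'not a date'):
        print(repr(sample), parse_holiday_line(sample))
```

Returning None instead of raising lets the caller report the bad line and carry on, as read_holidays does.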
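If your program is likely to make many queries in a single run, you may be tempted to read and parse the file just once and keep the results in a list for later use. Remember Knuth's words, though: "premature optimization is the root of all evil in programming." By avoiding even this "obvious" optimization, we keep the program clear and flexible. If the program runs in an interactive environment, rereading the file on every call spares us the trouble of checking whether the file has been modified.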
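
To show what that extra trouble looks like, here is a hedged sketch of a cache that rereads the file only when its modification time changes. The names _cache and cached_lines are hypothetical, and the recipe itself deliberately does not do this.

```python
import os

# maps file path -> (mtime, parsed lines); hypothetical module-level cache
_cache = {}

def cached_lines(path):
    """Return the non-blank, non-comment lines of path, rereading only if it changed."""
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        # file unchanged since the last read: reuse the stored list
        return hit[1]
    with open(path) as f:
        lines = [ln.strip() for ln in f
                 if not ln.isspace() and not ln.startswith('#')]
    _cache[path] = (mtime, lines)
    return lines
```

Even this small version adds shared state and relies on the file system reporting modification times with enough resolution, which is the kind of complication the paragraph above warns about.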
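Because holidays differ from country to country, the recipe also provides the holidays_by_country dictionary, which you can update to suit your needs. Note that this dictionary determines which generating functions are called, according to the country code you pass in. If your country has many states, you can easily build a dictionary keyed by state instead and pass a state code in place of the country code. The holidays function calls the appropriate functions, combines their results, removes duplicates, and returns the number of holidays found. You can return the list of dates instead: just uncomment the two lines marked in the code above.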
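
A state-level variant might look like the sketch below. Everything in it is an assumption for illustration: the state codes 'S1' and 'S2' are placeholders, all_founders_day is an invented holiday, and count_holidays is a simplified stand-in for the recipe's holidays function that skips the holidays file.

```python
import datetime

def all_christmas(start, end):
    # same logic as the recipe's all_christmas
    days = [datetime.date(y, 12, 25) for y in range(start.year, end.year + 1)]
    return [d for d in days if start <= d <= end]

def all_founders_day(start, end):
    # hypothetical state-level holiday on June 1, invented for this sketch
    days = [datetime.date(y, 6, 1) for y in range(start.year, end.year + 1)]
    return [d for d in days if start <= d <= end]

holidays_by_state = {
    # map each (country code, state code) pair to a sequence of functions;
    # 'S1' and 'S2' are placeholder state codes
    ('US', 'S1'): (all_christmas,),
    ('US', 'S2'): (all_christmas, all_founders_day),
}

def count_holidays(key, start, end):
    # combine the results of all applicable functions, dropping duplicates
    found = set()
    for function in holidays_by_state.get(key, ()):
        found.update(function(start, end))
    return len(found)

if __name__ == '__main__':
    year = (datetime.date(2004, 1, 1), datetime.date(2004, 12, 31))
    print(count_holidays(('US', 'S1'), *year))
    print(count_holidays(('US', 'S2'), *year))
```

Using a tuple as the key keeps the lookup a single dictionary access while still letting an unknown state fall back to an empty sequence of functions.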

Tags: Python

# posted by tinylee @ 11:42 AM
