读书人


Published: 2012-09-02 21:00:34    Author: rapoo

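[D] Filter all idioms in a dictionary that match a given "pattern"
For example, the "AABC" type: 格格不入, 高高在上; the "ABCA" type: 豆萁燃豆, 年复一年; the "ABBC" type: 不了了之.
Dictionary download: http://ishare.iask.sina.com.cn/f/5053238.html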

[Solution]
Not sure this is all that useful, but on Windows something like this seems to work:

Perl code
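#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode);

# Each pattern captures the whole four-character idiom in $1;
# \2 is a backreference to the repeated character.
my $aabc = qr/((\p{Han})\2\p{Han}{2})/;     # e.g. 格格不入
my $abca = qr/((\p{Han})\p{Han}{2}\2)/;     # e.g. 年复一年
my $abbc = qr/(\p{Han}(\p{Han})\2\p{Han})/; # e.g. 不了了之

sub do_dict {
    my ($pat) = @_;
    my $num = 0;
    open my $fd, '<', '中国成语大辞典.txt' or die $!;
    print '-' x 80, "\n";
    while (<$fd>) {
        $num++;
        # Decode so that \p{Han} matches characters rather than raw bytes.
        my $line = decode('GB2312', $_);
        my $found = 0;
        # Lookarounds require whitespace on both sides without consuming it,
        # so two idioms separated by a single space are both found.
        while ($line =~ /(?<=\s)$pat(?=\s)/g) {
            print "$num:" unless $found++;   # line number, printed once per line
            print ' ', encode('GB2312', $1);
        }
        print "\n" if $found;
    }
    close $fd or die $!;
    print '-' x 80, "\n";
}

do_dict($aabc);
do_dict($abca);
do_dict($abbc);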
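The same backreference regexes carry over to Python's `re` module. This is a minimal sketch assuming Python 3: `re` has no `\p{Han}` class, so the hypothetical `HAN` range below approximates it with the CJK Unified Ideographs block U+4E00 to U+9FFF, and `find_idioms` is an illustrative helper, not part of the original answer. As in the Perl version, the lookarounds require whitespace around each idiom without consuming it.

```python
import re

# Assumption: approximate Perl's \p{Han} with the basic CJK Unified Ideographs block.
HAN = "[\u4e00-\u9fff]"

# Group 1 captures the whole idiom; \2 refers to the repeated character.
PATTERNS = {
    "AABC": re.compile(rf"(?<=\s)(({HAN})\2{HAN}{{2}})(?=\s)"),
    "ABCA": re.compile(rf"(?<=\s)(({HAN}){HAN}{{2}}\2)(?=\s)"),
    "ABBC": re.compile(rf"(?<=\s)({HAN}({HAN})\2{HAN})(?=\s)"),
}


def find_idioms(line, kind):
    """Return every whitespace-delimited idiom in `line` matching `kind`."""
    # finditer + group(1): findall would return (idiom, char) tuples here.
    return [m.group(1) for m in PATTERNS[kind].finditer(line)]
```

For a real dictionary file you would open it with `open(path, encoding="gb2312")` (assuming the file really is GB2312) and call `find_idioms` on each line.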
[Solution]
One way is to build an index of the idioms: for example, "格格不入" has pattern 'aabc', "豆萁燃豆" has pattern 'abca', and so on.

Here is code that computes the pattern code of a given idiom:

Python code
>>> def pattern_code(x):
...     index = {}
...     nextletter = ord('a')
...     for c in x:
...         if c not in index:
...             index[c] = chr(nextletter)
...             nextletter += 1
...     return ''.join(index[c] for c in x)
...
>>> pattern_code('xxir')
'aabc'
>>> pattern_code('格格不入')
'aabc'
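The indexing idea from this answer can be sketched as follows, reusing `pattern_code` (repeated so the block runs on its own, assuming Python 3 strings). `build_index` and the sample word list are hypothetical illustrations; in practice the idioms would be read from the dictionary file. Because `pattern_code` gives each distinct character its own letter, 'aabc' only matches idioms whose last two characters differ from the first and from each other, which is stricter than the regexes in the Perl answer.

```python
from collections import defaultdict


def pattern_code(x):
    """Map each distinct character to a letter in order of first appearance."""
    index = {}
    nextletter = ord('a')
    for c in x:
        if c not in index:
            index[c] = chr(nextletter)
            nextletter += 1
    return ''.join(index[c] for c in x)


def build_index(idioms):
    """Group idioms by pattern code, e.g. 'aabc' -> ['格格不入', '高高在上']."""
    by_pattern = defaultdict(list)
    for idiom in idioms:
        by_pattern[pattern_code(idiom)].append(idiom)
    return by_pattern


# Hypothetical sample; a real run would load idioms from the dictionary file.
idioms = ["格格不入", "高高在上", "豆萁燃豆", "年复一年", "不了了之", "一心一意"]
index = build_index(idioms)
```

Looking up `index['abca']` then returns `['豆萁燃豆', '年复一年']`, and any other shape (say `'abac'` for 一心一意) can be queried the same way without writing a new regex.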
