Published: 2012-04-10 21:03:56  Author: rapoo

How do I extract the words I need from this special string?
I have a file that contains a string like this:
{"sn":1,"ls":true,"bg":0,"ed":0,"ws":[
{"bg":0,"cw":[{"w":"3","sc":0}]},
{"bg":0,"cw":[{"w":"月","sc":0}]},
{"bg":0,"cw":[{"w":"20","sc":0}]},
{"bg":0,"cw":[{"w":"号","sc":0}]},
{"bg":0,"cw":[{"w":"合肥","sc":0}]},
{"bg":0,"cw":[{"w":"到","sc":0}]},
{"bg":0,"cw":[{"w":"北京","sc":0}]},
{"bg":0,"cw":[{"w":"的","sc":0}]},
{"bg":0,"cw":[{"w":"火车","sc":0}]},
{"bg":0,"cw":[{"w":"。","sc":0}]}]}
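To make it easier to read, I put each word to be extracted on its own line; the original string has no line breaks.

My question: how do I extract the date (3月20日, March 20) and the two places, 合肥 (Hefei) and 北京 (Beijing)?

My idea: first extract all the words (3月20号合肥到北京的火车, "the train from Hefei to Beijing on March 20") and store each one in a vector<String>. For the places,
I would build a large array holding every city name and look each word up in it one by one.
For the date, I would extract by keyword: take the 3 in front of 月 (month) and the 20 in front of 号 (day). This probably does not work in every case, because this string is just one particular example
and the input may contain other words.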
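
The idea described above could be sketched roughly like this. It is only a sketch under assumptions: the words are already sitting in a std::vector<std::string>, the names TripInfo, isNumber and extractTripInfo are made up for illustration, a std::set stands in for the large city array so that a lookup does not scan every entry, and the source file uses the same encoding as the input words.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: the date parts plus any city names that were found.
struct TripInfo {
    std::string month;                // number in front of "月", empty if absent
    std::string day;                  // number in front of "号" or "日", empty if absent
    std::vector<std::string> places;  // words found in the city dictionary, in order
};

// True when the word is non-empty and consists only of ASCII digits.
static bool isNumber(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Hypothetical helper: walks the word list once, using the keyword rule for
// the date and a dictionary lookup for the places.
TripInfo extractTripInfo(const std::vector<std::string>& words,
                         const std::set<std::string>& cities) {
    TripInfo info;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& w = words[i];
        // Keyword rule: a number directly followed by 月 / 号 / 日.
        if (i > 0 && isNumber(words[i - 1])) {
            if (w == "月") {
                info.month = words[i - 1];
            } else if (w == "号" || w == "日") {
                info.day = words[i - 1];
            }
        }
        // Dictionary rule: any word that is a known city name.
        if (cities.count(w) != 0) {
            info.places.push_back(w);
        }
    }
    return info;
}
```

Calling extractTripInfo with the ten words from the sample and a city set containing 合肥 and 北京 fills in month, day and both places. As the question already suspects, the keyword rule only covers dates written as number + 月 + number + 号/日; other phrasings would need their own rules.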

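[Solution]
Is the
{"bg":0,"cw":[{"w":
part fixed?
If so, just test the string for it directly.
"This is just one particular example and the input may contain other words": in that case you have to list those other cases before anyone can tell how to handle them...
[Solution]
If the format is fixed, you can use a regular expression.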
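[Solution]
{"w":"3","sc":0}

What you want to extract is the value between {"w":" and ","sc":0}. That is exactly what regular expressions are good at.

Here is a regex written from memory:

\{\"w\":\"(.*?)\",\"sc\":0\}

Boost has several regex libraries, such as regex and xpressive.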
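
A minimal sketch of this approach, capturing the value between {"w":" and ","sc":0}. It uses std::regex from C++11 rather than the Boost libraries mentioned above, which is a swapped-in choice and not something the poster wrote. extractWords is a hypothetical name, and [^"]* takes the place of .*? on the assumption that a word never contains a double quote.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every value found between {"w":" and ","sc":0}.
std::vector<std::string> extractWords(const std::string& json) {
    // The custom raw-string delimiter "re" is needed because the pattern
    // itself contains the sequence )" .
    static const std::regex pattern(R"re(\{"w":"([^"]*)","sc":0\})re");

    std::vector<std::string> words;
    // Iterate over all non-overlapping matches; capture group 1 is the word.
    for (std::sregex_iterator it(json.begin(), json.end(), pattern), end;
         it != end; ++it) {
        words.push_back((*it)[1].str());
    }
    return words;
}
```

The match works byte by byte, so multi-byte Chinese words come back unchanged whatever encoding the input uses.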
[Solution]
Use boost::tokenizer.
[Solution]
The leading part "bg":0,"cw":[{"w": has a fixed format, so just extract the quoted content that immediately follows it!
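
If the prefix really is fixed, the extraction needs nothing beyond std::string::find. A rough sketch, assuming the prefix is exactly {"bg":0,"cw":[{"w":" and that a word never contains an escaped double quote; extractAfterPrefix is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the quoted value that follows each occurrence
// of the fixed prefix {"bg":0,"cw":[{"w":" in the text.
std::vector<std::string> extractAfterPrefix(const std::string& text) {
    static const std::string prefix = "{\"bg\":0,\"cw\":[{\"w\":\"";

    std::vector<std::string> words;
    std::size_t pos = 0;
    while ((pos = text.find(prefix, pos)) != std::string::npos) {
        std::size_t start = pos + prefix.size();  // first byte of the word
        std::size_t end = text.find('"', start);  // closing quote of the word
        if (end == std::string::npos) {
            break;                                // truncated input: stop
        }
        words.push_back(text.substr(start, end - start));
        pos = end + 1;                            // continue after this word
    }
    return words;
}
```

The top-level "bg":0 in the sample is not picked up, because it is followed by "ed" rather than "cw".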
[Solution]
The following is from the regtest tool:
String: {"sn":1,"ls":true,"bg":0,"ed":0,"ws":[{"bg":0,"cw":[{"w":"3","sc":0}]},{"bg":0,"cw":[{"w":"月","sc":0}]},{"bg":0,"cw":[{"w":"20","sc":0}]},{"bg":0,"cw":[{"w":"号","sc":0}]},{"bg":0,"cw":[{"w":"合肥","sc":0}]},{"bg":0,"cw":[{"w":"到","sc":0}]},{"bg":0,"cw":[{"w":"北京","sc":0}]},{"bg":0,"cw":[{"w":"的","sc":0}]},{"bg":0,"cw":[{"w":"火车","sc":0}]},{"bg":0,"cw":[{"w":"。","sc":0}]}]}
Regular expression: \{\"bg\":0,\"cw\":\[\{\"w\":\"(.*?)\",\"sc\":0}]}
Matches:
1.{"bg":0,"cw":[{"w":"3","sc":0}]}
(1).3
2.{"bg":0,"cw":[{"w":"月","sc":0}]}
(1).月
3.{"bg":0,"cw":[{"w":"20","sc":0}]}
(1).20
4.{"bg":0,"cw":[{"w":"号","sc":0}]}
(1).号
5.{"bg":0,"cw":[{"w":"合肥","sc":0}]}
(1).合肥
6.{"bg":0,"cw":[{"w":"到","sc":0}]}
(1).到
7.{"bg":0,"cw":[{"w":"北京","sc":0}]}
(1).北京
8.{"bg":0,"cw":[{"w":"的","sc":0}]}
(1).的
9.{"bg":0,"cw":[{"w":"火车","sc":0}]}
(1).火车
10.{"bg":0,"cw":[{"w":"。","sc":0}]}
(1).。


[Solution]

C/C++ code
#include <stdio.h>

char s[]="{\"sn\":1,\"ls\":true,\"bg\":0,\"ed\":0,\"ws\":[{\"bg\":0,\"cw\":[{\"w\":\"3\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"月\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"20\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"号\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"合肥\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"到\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"北京\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"的\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"火车\",\"sc\":0}]},{\"bg\":0,\"cw\":[{\"w\":\"。\",\"sc\":0}]}]}";
char *p;
char t[80];
int n,k;

int main(void) {
    p=s;
    while (1) {
        // Try to match one word object at p. %79[^\"] reads the word into t
        // (at most 79 bytes), and %n stores the number of bytes consumed.
        // When the match fails, t and n keep their previous values.
        k=sscanf(p,"{\"bg\":0,\"cw\":[{\"w\":\"%79[^\"]\",\"sc\":0}]}%n",t,&n);
        printf("k,t,n=%d,%s,%d\n",k,t,n);
        if (1==k) {
            p+=n;       // matched: skip the whole object
        } else if (0==k) {
            p++;        // no match here: move on one byte and retry
        } else {        // EOF==k: end of the string
            break;
        }
    }
    printf("End.\n");
    return 0;
}
// Output:
// k,t,n=0,,0     (repeated once for each byte of the leading {"sn":1,...,"ws":[ prefix)
// k,t,n=1,3,32
// k,t,n=0,3,32
// k,t,n=1,月,33
// k,t,n=0,月,33
// k,t,n=1,20,33
// k,t,n=0,20,33
// k,t,n=1,号,33
// k,t,n=0,号,33
// k,t,n=1,合肥,35
// k,t,n=0,合肥,35
// k,t,n=1,到,33
// k,t,n=0,到,33
// k,t,n=1,北京,35
// k,t,n=0,北京,35
// k,t,n=1,的,33
// k,t,n=0,的,33
// k,t,n=1,火车,35
// k,t,n=0,火车,35
// k,t,n=1,。,33
// k,t,n=0,。,33
// k,t,n=0,。,33
// k,t,n=-1,。,33
// End.
