Question about extracting a string
I want to extract some strings from an HTML file. The HTML is structured as follows:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD> <BODY>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Robst">
<param name="Name" value="CopyRight">
<param name="Local" value="Introduction\CopyRight.htm">
<param name="Name" value="FAQ">
<param name="Local" value="FAQ\FAQ.htm">
<param name="Name" value="Welcome to Robst">
<param name="Local" value="Welcome\welcome.htm">
<param name="URL" value="Introduction\KnowRobst.htm">
<param name="Name" value="Known BlueSoleil">
</OBJECT>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Dial-Up Networking">
<param name="Name" value="Mobile">
<param name="Local" value="Connection\Mobile\Mobile.htm">
</OBJECT>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Environment">
<param name="Local" value="Welcome\welcome.htm">
</OBJECT>
.................
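For each <LI> <OBJECT type="text/sitemap">, I want the value on the first line that follows it, i.e. the string inside the quotes. For example, from <param name="Name" value="Robst"> I want to extract Robst and write it to a new file. The remaining lines should be ignored until the next <LI> <OBJECT type="text/sitemap">.
How should I do this? I'm not very familiar with string manipulation, so any help would be appreciated. Thanks!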
[Solution]
Learn regular expressions.
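For the regular-expression route, here is a minimal sketch using std::regex (C++11). The helper name firstValues and the exact pattern are assumptions for illustration, not something given in the thread. It assumes the whole file has already been read into one string, and that the first <param> tag directly follows the <OBJECT type="text/sitemap"> tag, separated only by whitespace.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the first value="..." that follows each
// <OBJECT type="text/sitemap"> tag in the given text.
std::vector<std::string> firstValues(const std::string& html)
{
    // [^>]* stays inside the first <param ...> tag; ([^"]*) captures the value.
    static const std::regex pattern(
        R"re(<OBJECT\s+type\s*=\s*"text/sitemap\s*"\s*>\s*<param[^>]*value\s*=\s*"([^"]*)")re",
        std::regex::icase);

    std::vector<std::string> result;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end; it != end; ++it)
        result.push_back((*it)[1].str()); // capture group 1 is the quoted value
    return result;
}
```

Because the pattern consumes the opening tag and only the first <param> after it, later <param> lines in the same <OBJECT> block are skipped until the next opening tag is found.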
[Solution]
Then use the find member function of the string class; have a look at the documentation for the string class.
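As a rough illustration of the find approach, the sketch below pulls the quoted value out of a single line. The helper name extractValue is hypothetical, and it assumes the attribute is written as value followed by a double-quoted string, as in the sample file.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first pair of double
// quotes that follows "value" on a line, or an empty string if absent.
std::string extractValue(const std::string& line)
{
    std::string::size_type key = line.find("value");
    if (key == std::string::npos)
        return "";
    std::string::size_type open = line.find('"', key);       // opening quote
    if (open == std::string::npos)
        return "";
    std::string::size_type close = line.find('"', open + 1); // closing quote
    if (close == std::string::npos)
        return "";
    return line.substr(open + 1, close - open - 1);
}
```

Searching for the quotes instead of using fixed offsets keeps the helper working even if there is extra whitespace around the equals sign.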
[Solution]
"text/sitemap">\s.*value="(.*)[^"]" // this seems to work
The boss is really being silly.
[Solution]
I was being silly too. The pattern above is wrong; it should be this one, with the value captured by the group:
"text/sitemap">\s*.*value="([^"]*)"
[Solution]
If you can't use regular expressions, then parse it by hand.
Contents of test.txt:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD> <BODY>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Robst">
<param name="Name" value="CopyRight">
<param name="Local" value="Introduction\CopyRight.htm">
</OBJECT>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Dial-Up Networking">
<param name="Name" value="Mobile">
<param name="Local" value="Connection\Mobile\Mobile.htm">
</OBJECT>
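#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream infile("test.txt");
    if (!infile)
    {
        cerr << "cannot open test.txt" << endl;
        return 1;
    }

    string line, value;
    bool wantValue = false; // true right after an <LI> <OBJECT type="text/sitemap"> line
    while (getline(infile, line)) // test getline itself; while(!infile.eof()) processes a stale last line
    {
        if (wantValue)
        {
            wantValue = false; // only the first line after the opening tag counts
            string::size_type pos = line.find("value");
            if (pos != string::npos)
                pos = line.find('"', pos); // opening quote of the value
            if (pos != string::npos)
            {
                string::size_type end = line.find('"', pos + 1); // closing quote
                if (end != string::npos)
                {
                    value = line.substr(pos + 1, end - pos - 1);
                    value.erase(value.find_last_not_of(' ') + 1); // drop trailing spaces
                    cout << value << endl; // print the result; it could also be written to another file
                }
            }
        }
        // search for substrings so leading/trailing whitespace on the line does not matter
        if (line.find("<OBJECT") != string::npos && line.find("text/sitemap") != string::npos)
            wantValue = true;
    }
    return 0;
}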