How do I read the text from a Word document?
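I'm working on a small project that reads the text out of Word files and stores it in a database. My C# is pretty bad, though, so the text-reading step ends up very roundabout. Here is my code: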
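using System;
using System.IO;
using Word = Microsoft.Office.Interop.Word; // requires a reference to the Word interop assembly

string filepath = @"C:\WEBNEW\Word\FORMAL\test.doc";
string filepath2 = @"C:\temp\temp.txt";
string MyData = string.Empty;
if (File.Exists(filepath))
{
    Word.Application newApp = new Word.Application();
    // Source (.doc) and target (.txt) files
    object Source = filepath;
    object Target = filepath2;
    object Unknown = Type.Missing;
    object noSave = Word.WdSaveOptions.wdDoNotSaveChanges;
    try
    {
        // Open the Word file to convert
        newApp.Documents.Open(ref Source, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);
        // Output format: plain text
        object format = Word.WdSaveFormat.wdFormatText;
        // Save a copy of the document as .txt
        newApp.ActiveDocument.SaveAs(ref Target, ref format,
            ref Unknown, ref Unknown, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown,
            ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);
        // Close the document
        newApp.ActiveDocument.Close(ref noSave, ref Unknown, ref Unknown);
    }
    finally
    {
        // Always shut down the Word instance, even if Open/SaveAs throws,
        // and never prompt to save (an invisible prompt would hang the process)
        newApp.Quit(ref noSave, ref Unknown, ref Unknown);
    }
    // By default wdFormatText writes the system ANSI code page, hence Encoding.Default
    using (StreamReader ts = new StreamReader(File.Open(filepath2, FileMode.Open, FileAccess.Read, FileShare.ReadWrite), System.Text.Encoding.Default))
    {
        MyData = ts.ReadToEnd();
    }
    // Turn line breaks into <br> (CRLF first, so each break yields a single <br>)
    MyData = MyData.Replace("\r\n", "<br>");
    MyData = MyData.Replace("\r", "<br>");
    MyData = MyData.Replace("\n", "<br>");
    // Replace straight single quotes with ’ before storing
    MyData = MyData.Replace("'", "’");
    File.Delete(filepath2);
}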
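My approach is to convert the Word document to a .txt file first and then read the text from that file. Is there a simpler way to extract the plain text directly from the Word document?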
[Solution]
Take a look at this link, it should help: http://www.codeproject.com/aspnet/wordapplication.asp
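If the files must stay in .doc format, the interop objects from the question already expose the text directly through `Document.Content.Text` (paragraphs separated by `\r`), which skips the temporary .txt file but still needs Word installed. For .docx files (Word 2007 and later) Word isn't needed at all: a .docx is a zip package whose body is `word/document.xml`, and the visible text sits in `<w:t>` elements inside `<w:p>` paragraphs. Below is a minimal sketch of that idea, assuming .NET Framework 4.5+ (with a reference to System.IO.Compression) or .NET Core. It does not work on the old binary .doc format, and `DocxText`/`Extract` are hypothetical names. It deliberately ignores tabs, manual line breaks, headers/footers and text boxes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each <w:p> paragraph, one paragraph per line (CRLF-terminated).
    // Simplification: tabs, <w:br/>, headers/footers and text boxes are not handled.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; not a .docx file?");

            XDocument xml;
            using (Stream s = entry.Open())
                xml = XDocument.Load(s);

            var sb = new StringBuilder();
            foreach (XElement p in xml.Descendants(W + "p"))
            {
                // The text of each run lives in <w:t> elements
                foreach (XElement t in p.Descendants(W + "t"))
                    sb.Append(t.Value);
                sb.Append("\r\n");
            }
            return sb.ToString();
        }
    }

    // Convenience overload for a file on disk
    public static string Extract(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return Extract(fs);
    }
}
```

Usage would be something like `string MyData = DocxText.Extract(@"C:\WEBNEW\Word\FORMAL\test.docx");` (hypothetical path), after which the same `<br>` replacements as in the question can be applied before writing to the database.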