
Reading PDF Text in .NET (1)

Published: 2012-10-09 10:21:45 Author: rapoo


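The hardest part is converting PDFs! At first I used XPDF, but with so many languages and such a mess of encodings, where was I supposed to find an approach that handles them all? On top of that, it requires invoking an .EXE file at runtime, which I expect would produce a pile of exceptions.

So I went looking for PDFBox instead, and the worrying part is that it reportedly does not support Chinese! It is an open-source Java project, so what it builds is naturally Java. How do I call that from .NET?

Just as I was getting frustrated and restless, I came across an article on a foreign blog: studentclub.ro/lucians_weblog/archive/2007/03/22/read-from-a-pdf-file-using-c.aspx

The article reads as follows: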

I know, this may seem like a simple task, and you will probably find references on the web about how to do this. But I’ll also write a blog post on this topic, as I came across this problem today.

So, if you have a PDF file and don’t know how to read data from it, here is what you could do.


First of all, you’ll need some DLLs that will help you manipulate the PDF files. I came across the PDFBox. What is PDFBox? I’ll cite from their website: PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities.


Oh, nice, you’ll say, but I need a .NET solution. Don’t worry. Even though PDFBox is written in Java, there is also a .NET version that is available. It utilizes IKVM (also, a very interesting project: an implementation of the Java language for .NET Framework and Mono) to create a fully functioning PDF library for the .NET framework. The released version contains a bin directory with all of the required DLL files.


So you’ll have to download the PDFBox package. In this package you’ll find a bin directory. To read your PDF file, you’ll need the following files:
