Reading a PDF file as a stream in Java

Published: 2012-09-05 15:19:35 Author: rapoo

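Reading a PDF file as a stream in Java
Only the first page's content is printed to the console; none of the remaining pages are read. The console shows: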

2012-7-31 11:22:26 org.apache.pdfbox.util.PDFStreamEngine processOperator
信息: unsupported/disabled operation: EI



How can I fix this?

[Solution]
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Extracts the text of a PDF by running the external pdftotext tool,
// which must be installed and on the PATH.
public String getPdfContent(String filePath) {
    // "-" as the output file makes pdftotext write to standard output
    String[] cmd = new String[]{"pdftotext", "-enc", "UTF-8", "-q", filePath, "-"};
    StringBuilder sb = new StringBuilder();

    try {
        Process p = Runtime.getRuntime().exec(cmd);
        BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), "UTF-8"));
        try {
            // Join all output lines with single spaces
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(" ");
            }
        } finally {
            br.close();
        }
    } catch (IOException e) {
        // Covers a failed launch (for example, pdftotext not installed),
        // an unsupported encoding, and read errors
        e.printStackTrace();
    }

    return sb.toString();
}
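
For anyone adapting the answer above, here is a minimal sketch of the same idea using ProcessBuilder, split into small pieces that can be checked without a PDF at hand. The class and method names (PdfTextSketch, buildCommand, joinLines, extractText) are made up for illustration, and the exit-code check is my own addition rather than part of the original answer. It assumes Java 7 or later and that pdftotext is on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class PdfTextSketch {

    // Builds the same argument list as the answer above.
    // The trailing "-" makes pdftotext write to standard output.
    static List<String> buildCommand(String filePath) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", "-q", filePath, "-");
    }

    // Reads the stream as UTF-8 and joins its lines with single spaces,
    // matching the behaviour of the original loop.
    static String joinLines(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(' ');
            }
        }
        return sb.toString();
    }

    // Runs pdftotext on the given file and returns the extracted text.
    static String extractText(String filePath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(filePath));
        // Send the tool's stderr to our own stderr so an unread pipe cannot block it
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        String text = joinLines(p.getInputStream());
        // Assumption: a non-zero exit code is treated as a failure
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with code " + exit);
        }
        return text;
    }
}
```

Calling PdfTextSketch.extractText with the path to a PDF should return the text of all pages as one string. Unlike the original, a failed launch or a non-zero exit is reported to the caller as an exception instead of being printed and swallowed.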
