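Importing Word Documents into FCKeditor (Java Implementation)

Our project has recently reached what could be called a milestone, so I am using this article to summarize the techniques used so far.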

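Our project is being developed for a government agency, and the back end is essentially a CMS. The client insisted on one feature: importing a Word document directly into the web editor. We use FCKeditor 2.5. This feature gave me a real headache, and for several days I had no idea how to approach it, until I came across the following post online: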
http://topic.csdn.net/u/20091020/21/b77f825b-4a18-4a86-b642-8d38ffef9e12.html
The poster of reply #3 pasted the code there, and the approach is a good one.

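First, a COM component is used to convert the Word document to HTML; then the important parts of the generated source are extracted; finally that code is loaded into the FCK editor. I ran into quite a few technical details along the way. My implementation follows.

1. Convert Word to HTML with jacob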
Java code:

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

/**
 * Converts a Word file to an HTML file.
 *
 * @param src path of the source Word file
 * @param out path of the target HTML file
 */
public static synchronized void word2Html(String src, String out) {
    ActiveXComponent app = null;
    try {
        // Start Word through COM and keep its window hidden
        app = new ActiveXComponent("Word.Application");
        app.setProperty("Visible", new Variant(false));
        Dispatch docs = app.getProperty("Documents").toDispatch();
        // Open the Word file (ConfirmConversions = false, ReadOnly = true)
        Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                new Object[] { src, new Variant(false), new Variant(true) }, new int[1]).toDispatch();
        // Save format 8 converts to HTML, 9 converts to MHT
        Dispatch.invoke(doc, "SaveAs", Dispatch.Method, new Object[] { out, new Variant(8) }, new int[1]);
        // Close the document without saving changes
        Dispatch.call(doc, "Close", new Variant(false));
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Word must always be quit here, otherwise many winword.exe processes pile up on the server
        if (app != null) {
            app.invoke("Quit", new Variant[] {});
        }
    }
}

The code above uses a COM component to start Word, hides its window, and saves the opened Word file as HTML.

2. Read the file with Apache Commons IO

Java code:

/**
 * Reads the HTML code from the file with the given name.
 *
 * @param fileName path of the HTML file
 * @return the HTML source, or null if reading failed
 */
public static synchronized String getHtmlCode(String fileName) {
    InputStream in = null;
    String result = null;
    try {
        in = new FileInputStream(fileName);
        result = IOUtils.toString(in, "gb2312");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(in);
    }
    return result;
}

By default the generated HTML file is gb2312-encoded. Note that the string you read must keep all of its whitespace: if you copy the string into a text file, its layout must be exactly the same as the HTML source.
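As a small illustration of the whitespace point, here is a sketch that uses only the JDK instead of Commons IO. The class name `WordHtmlReader` and both method names are made up for this example and are not part of the original project. Decoding the raw bytes in one step keeps every line break, which matters for the offset checks in step 4.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original WordUtils class.
final class WordHtmlReader {

    /** Decodes raw bytes with the named charset; no whitespace or line break is dropped. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Reads the whole file into memory and decodes it in one step. */
    static String read(Path file, String charsetName) throws IOException {
        return decode(Files.readAllBytes(file), charsetName);
    }

    public static void main(String[] args) {
        byte[] raw = "<style>\r\n<!--".getBytes(Charset.forName("UTF-8"));
        String html = decode(raw, "UTF-8");
        // Distance between "<style>" and "<!--": 7 characters plus the CRLF
        System.out.println(html.indexOf("<!--") - html.indexOf("<style>"));
    }
}
```

For a file produced by Word as described above, the call would presumably look like `WordHtmlReader.read(Paths.get(htmlPath), "gb2312")`.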
3. Extract the body code

Java code:

/**
 * Extracts the body content.
 *
 * @param htmlCode the complete HTML source
 * @return everything from the opening body tag up to the closing html tag
 */
public static synchronized String performBodyCode(String htmlCode) {
    String bodyCode = "";
    // Locate the body
    int bodyIndex = htmlCode.indexOf("<body");
    int bodyEndIndex = htmlCode.indexOf("</html>");
    if (bodyIndex != -1 && bodyEndIndex != -1) {
        bodyCode = htmlCode.substring(bodyIndex, bodyEndIndex);
        // bodyCode = StringUtils.replace(bodyCode, "v:imagedata", "img");
        // bodyCode = StringUtils.replace(bodyCode, "</v:imagedata>", "");
    }
    return bodyCode;
}

A large part of the generated HTML is useless, so we need to slim it down and replace some of the tags.
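The slicing in step 3 can also be tried out in isolation. The sketch below is a variation rather than the code used in the project: it stops at the closing body tag instead of the closing html tag and matches tag names case-insensitively. The class name `BodySlicer` and its methods are hypothetical.

```java
// Hypothetical variation on performBodyCode, using only the JDK.
final class BodySlicer {

    /** Case-insensitive indexOf that never changes the string length. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the span from the opening body tag through the closing body tag, or "" if absent. */
    static String sliceBody(String html) {
        int start = indexOfIgnoreCase(html, "<body", 0);
        if (start == -1) {
            return "";
        }
        int end = indexOfIgnoreCase(html, "</body>", start);
        if (end == -1) {
            return "";
        }
        return html.substring(start, end + "</body>".length());
    }

    public static void main(String[] args) {
        System.out.println(sliceBody("<html><head></head><body lang=ZH-CN><p>hi</p></body></html>"));
    }
}
```

Stopping at the closing body tag avoids carrying trailing whitespace into the editor. Whether that difference matters depends on how the content is inserted afterwards.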
4. Process the style tags in the HTML code

Java code:

/**
 * Extracts the useful style block from the HTML code.
 *
 * @param htmlCode the complete HTML source
 * @return the style block that wraps its content in an HTML comment, or "" if none is found
 */
public static synchronized String performStyleCode(String htmlCode) {
    int index = 0;
    int styleStartIndex = -1;
    int styleEndIndex = -1;
    // Find the position of the <style> tag that is directly followed by "<!--"
    while (index < htmlCode.length()) {
        int styleIndexStartTemp = htmlCode.indexOf("<style>", index);
        if (styleIndexStartTemp == -1) {
            break;
        }
        int styleContentStartIndex = htmlCode.indexOf("<!--", styleIndexStartTemp);
        // "<style>" is 7 characters (the extra 2 presumably being the CRLF line break)
        if (styleContentStartIndex - styleIndexStartTemp == 9) {
            styleStartIndex = styleIndexStartTemp;
            break;
        }
        index = styleIndexStartTemp + 7;
    }
    // Find the position of the </style> tag that directly follows "-->"
    index = 0;
    while (index < htmlCode.length()) {
        int styleContentEndIndex = htmlCode.indexOf("-->", index);
        if (styleContentEndIndex == -1) {
            break;
        }
        int styleEndIndexTemp = htmlCode.indexOf("</style>", styleContentEndIndex);
        // "-->" is 3 characters (the extra 2 presumably being the CRLF line break)
        if (styleEndIndexTemp - styleContentEndIndex == 5) {
            styleEndIndex = styleEndIndexTemp;
            break;
        }
        index = styleContentEndIndex + 4;
    }
    // No matching block found: return an empty string instead of a meaningless slice
    if (styleStartIndex == -1 || styleEndIndex < styleStartIndex) {
        return "";
    }
    // "</style>" is 8 characters long
    return htmlCode.substring(styleStartIndex, styleEndIndex + 8);
}

After Word converts the document to HTML, the result contains many style tags. Among them,

<style>
<!--
content omitted
-->
</style>

only a style tag like the one above, whose content is wrapped in an HTML comment, is useful; all the others are useless. The code above cuts out this useful part. If the file was read with the correct format in step 2, the code extracted here will be correct.

5. Handle the images in the Word file

Java code:

/**
 * Processes the images in the body.
 *
 * @param bodyContent the body HTML
 * @return the body HTML with rewritten image addresses
 */
public static synchronized String performBodyImg(String bodyContent) {
    // Address of the action that previews an image by file name
    String newImgSrc = "tumbnail.action?fileName=";
    // Physical location where the Word files are stored
    String filePath = ResourceBundle.getBundle("sysConfig").getString("userFilePath.word");
    // Physical location where the images are stored
    String imgPath = ResourceBundle.getBundle("sysConfig").getString("userFilePath.image");
    Parser parser = Parser.createParser(bodyContent, "gb2312");
    ImgTagVisitor imgTag = new ImgTagVisitor();
    try {
        parser.visitAllNodesWith(imgTag);
        // Get the addresses of all images
        List<String> imgUrls = imgTag.getSrcStringList();
        for (String url : imgUrls) {
            String uuid = UUID.randomUUID().toString();
            String extName = url.substring(url.lastIndexOf("."));
            String newImgFileName = newImgSrc + uuid + extName;
            bodyContent = StringUtils.replace(bodyContent, url, newImgFileName);
            ImageUtils.copy(filePath + url, imgPath + uuid + extName);
        }
    } catch (ParserException e) {
        e.printStackTrace();
    }
    // Remove superfluous code
    String result = StringUtils.replace(bodyContent, "<![endif]>", "");
    result = StringUtils.replace(result, "<![if !vml]>", "");
    return result;
}

The code above uses the open-source HTML parsing tool htmlparser to find the links of all images, and then uses StringUtils from the Apache Commons Lang package to replace each link with the address of the image-preview action that I modified in FCK.

Below is my own implementation of ImgTagVisitor.

Java code:

package com.bettem.cms.web.utils.htmlparser;

import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.visitors.NodeVisitor;

/**
 * Description: class used with htmlparser to handle img tags
 * *******************
 * Date / Author
 * 2010-2-3 Liqiang
 */
public class ImgTagVisitor extends NodeVisitor {
    private List<String> srcList;
    private StringBuffer textAccumulator;

    public ImgTagVisitor() {
        srcList = new ArrayList<String>();
        textAccumulator = new StringBuffer();
    }

    public void visitTag(Tag tag) {
        // Record the src attribute of every img tag, skipping tags without one
        if (tag.getTagName().equalsIgnoreCase("img") && tag.getAttribute("src") != null) {
            srcList.add(tag.getAttribute("src"));
        }
    }

    public List<String> getSrcStringList() {
        return srcList;
    }

    public void visitStringNode(Text stringNode) {
        String text = stringNode.getText();
        textAccumulator.append(text);
    }

    public String getText() {
        return textAccumulator.toString();
    }
}

6. Remove the redundant v:imagedata tags

Java code:

/**
 * Removes the redundant v:imagedata tags.
 *
 * @param content the HTML content
 * @return the filtered HTML, or the unmodified content if parsing fails
 */
public static synchronized String removeImagedataTag(String content) {
    try {
        Parser parser = new Parser(content, Parser.STDOUT);
        // Keep every node that is not a v:imagedata tag
        NodeFilter filter = new NotFilter(new TagNameFilter("v:imagedata"));
        NodeList nl = parser.extractAllNodesThatMatch(filter);
        return nl.toHtml();
    } catch (ParserException e) {
        e.printStackTrace();
        return content;
    }
}

When Word converts to HTML, large images are automatically compressed into small ones, but the original large images still remain in the code. The code above filters out the redundant tags.

Finally, here is the code in my action.

Java code:

/**
 * Imports a Word file.
 *
 * @return the action result
 */
public synchronized String exportWord() {
    String content = null;
    String path = ResourceBundle.getBundle("sysConfig").getString("userFilePath.word");
    InputStream ins = null;
    OutputStream wordFile = null;
    String htmlPath = null;
    String wordPath = null;
    // Handle the uploaded Word file
    try {
        String uuid = UUID.randomUUID().toString();
        // Keep the original extension
        String fileName = uuid + filedataFileName.substring(filedataFileName.lastIndexOf("."));
        // Generate the HTML file name
        String wordHtmlFileName = uuid + ".html";
        ins = new FileInputStream(filedata);
        wordPath = path + fileName;
        wordFile = new FileOutputStream(wordPath);
        IOUtils.copy(ins, wordFile);
        // Word to HTML
        htmlPath = path + wordHtmlFileName;
        WordUtils.word2Html(wordPath, htmlPath);
        String wordHtmlContent = WordUtils.getHtmlCode(htmlPath);
        // Process the styles
        String styleCode = WordUtils.performStyleCode(wordHtmlContent);
        String bodyCode = WordUtils.performBodyCode(wordHtmlContent);
        // Process the images in the article
        bodyCode = WordUtils.performBodyImg(bodyCode);
        content = styleCode + bodyCode;
        // Note: the return value is not assigned, so content is passed on unchanged
        WordUtils.removeImagedataTag(content);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(wordFile);
        IOUtils.closeQuietly(ins);
        try {
            // Clean up the temporary files; the paths are null if the upload failed early
            if (wordPath != null && htmlPath != null) {
                File word = new File(wordPath);
                File file = new File(htmlPath);
                if (file.exists()) {
                    file.delete();
                    word.delete();
                    FileUtils.deleteDirectory(new File(htmlPath.substring(0, htmlPath.lastIndexOf(".")) + ".files"));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // Put the content read from the Word file into the request
    ServletActionContext.getRequest().setAttribute("content", content);
    ServletActionContext.getRequest().setAttribute("add", true);
    return SUCCESS;
}
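For comparison, the v:imagedata cleanup from step 6 can be sketched with a plain regular expression from the JDK. This is an alternative technique, not the htmlparser-based method used in the project, and the class name `VmlCleaner` is invented for the example. It assumes that no attribute value inside the tag contains a `>` character.

```java
import java.util.regex.Pattern;

// Hypothetical regex-based alternative to removeImagedataTag.
final class VmlCleaner {

    // Matches <v:imagedata ...>, <v:imagedata .../> and </v:imagedata>, ignoring case
    private static final Pattern IMAGEDATA_TAG =
            Pattern.compile("</?v:imagedata\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every v:imagedata tag removed; everything else is left untouched. */
    static String removeImagedataTags(String html) {
        return IMAGEDATA_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<v:shape><v:imagedata src=\"a.files/image001.png\"/></v:shape>"
                + "<img src=\"a.files/image002.jpg\">";
        System.out.println(removeImagedataTags(html));
    }
}
```

Because the method returns a new string, the caller has to assign the result, for example `content = VmlCleaner.removeImagedataTags(content);`.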