Fixing Garbled Chinese Text Caused by ftp4j
Source: http://www.vktone.com/articles/ftp4j_chaos.html
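ftp4j is a Java FTP client library that implements most of the features an FTP client is expected to have. It can transfer files (upload and download), browse directories and files on a remote FTP server, and create, delete, rename, and move remote directories and files. ftp4j can connect to a remote FTP server in several ways: directly over TCP/IP, through an FTP proxy, an HTTP proxy, a SOCKS4/4a proxy, or a SOCKS5 proxy, or over an SSL-secured connection.
The back-end admin program of this site's blog system uses ftp4j as the component that uploads the blog to the hosting space. When I first started using it, I found that the static pages, once uploaded, showed up as garbage in the browser (painful to look at). They only displayed correctly after I manually switched the browser encoding to UTF-8, even though all the pages are actually encoded in GBK.
The beginning of the home page's HTML looks like this: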
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" lang="zh-CN">
<head>
    <title>移动互联网技术及应用 - V客小站 - V客小站</title>
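Viewed with the Linux hexdump tool, the local file looks like this. Note the bytes of the title text, which start at offset 0xd3 (highlighted in yellow in the original post):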
[root@localhost blog_wwwroot]# hexdump -C index.html | head -100
00000000  3c 21 44 4f 43 54 59 50  45 20 68 74 6d 6c 20 50  |<!DOCTYPE html P|
00000010  55 42 4c 49 43 20 22 2d  2f 2f 57 33 43 2f 2f 44  |UBLIC "-//W3C//D|
00000020  54 44 20 58 48 54 4d 4c  20 31 2e 30 20 54 72 61  |TD XHTML 1.0 Tra|
00000030  6e 73 69 74 69 6f 6e 61  6c 2f 2f 45 4e 22 20 22  |nsitional//EN" "|
00000040  68 74 74 70 3a 2f 2f 77  77 77 2e 77 33 2e 6f 72  |http://www.w3.or|
00000050  67 2f 54 52 2f 78 68 74  6d 6c 31 2f 44 54 44 2f  |g/TR/xhtml1/DTD/|
00000060  78 68 74 6d 6c 31 2d 74  72 61 6e 73 69 74 69 6f  |xhtml1-transitio|
00000070  6e 61 6c 2e 64 74 64 22  3e 0a 3c 68 74 6d 6c 20  |nal.dtd">.<html |
00000080  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 77 77  |xmlns="http://ww|
00000090  77 2e 77 33 2e 6f 72 67  2f 31 39 39 39 2f 78 68  |w.w3.org/1999/xh|
000000a0  74 6d 6c 22 20 78 6d 6c  3a 6c 61 6e 67 3d 22 7a  |tml" xml:lang="z|
000000b0  68 2d 43 4e 22 20 6c 61  6e 67 3d 22 7a 68 2d 43  |h-CN" lang="zh-C|
000000c0  4e 22 3e 0a 3c 68 65 61  64 3e 0a 09 3c 74 69 74  |N">.<head>..<tit|
000000d0  6c 65 3e d2 c6 b6 af bb  a5 c1 aa cd f8 bc bc ca  |le>.............|
000000e0  f5 bc b0 d3 a6 d3 c3 20  2d 20 56 bf cd d0 a1 d5  |....... - V.....|
000000f0  be 20 2d 20 56 bf cd d0  a1 d5 be 3c 2f 74 69 74  |. - V......</tit|
00000100  6c 65 3e 0a 09 3c 6d 65  74 61 20 68 74 74 70 2d  |le>..<meta http-|

Below is the content of the same file as downloaded from the hosting space. The title bytes, again starting at offset 0xd3, correspond to the ones above and are clearly different:

[root@localhost ~]# curl -s http://www.vktone.com/ | hexdump -C | head -100
00000000  3c 21 44 4f 43 54 59 50  45 20 68 74 6d 6c 20 50  |<!DOCTYPE html P|
00000010  55 42 4c 49 43 20 22 2d  2f 2f 57 33 43 2f 2f 44  |UBLIC "-//W3C//D|
00000020  54 44 20 58 48 54 4d 4c  20 31 2e 30 20 54 72 61  |TD XHTML 1.0 Tra|
00000030  6e 73 69 74 69 6f 6e 61  6c 2f 2f 45 4e 22 20 22  |nsitional//EN" "|
00000040  68 74 74 70 3a 2f 2f 77  77 77 2e 77 33 2e 6f 72  |http://www.w3.or|
00000050  67 2f 54 52 2f 78 68 74  6d 6c 31 2f 44 54 44 2f  |g/TR/xhtml1/DTD/|
00000060  78 68 74 6d 6c 31 2d 74  72 61 6e 73 69 74 69 6f  |xhtml1-transitio|
00000070  6e 61 6c 2e 64 74 64 22  3e 0a 3c 68 74 6d 6c 20  |nal.dtd">.<html |
00000080  78 6d 6c 6e 73 3d 22 68  74 74 70 3a 2f 2f 77 77  |xmlns="http://ww|
00000090  77 2e 77 33 2e 6f 72 67  2f 31 39 39 39 2f 78 68  |w.w3.org/1999/xh|
000000a0  74 6d 6c 22 20 78 6d 6c  3a 6c 61 6e 67 3d 22 7a  |tml" xml:lang="z|
000000b0  68 2d 43 4e 22 20 6c 61  6e 67 3d 22 7a 68 2d 43  |h-CN" lang="zh-C|
000000c0  4e 22 3e 0a 3c 68 65 61  64 3e 0a 09 3c 74 69 74  |N">.<head>..<tit|
000000d0  6c 65 3e e7 bb 89 e8 af  b2 e5 a7 a9 e6 b5 9c e6  |le>.............|
000000e0  8e 95 e4 bb 88 e7 bc 83  e6 88 9e e5 a6 a7 e9 8f  |................|
000000f0  88 ee 88 9a e5 bc b7 e6  90 b4 e6 97 82 e6 95 a4  |................|
00000100  20 2d 20 56 e7 80 b9 e3  88 a0 e7 9a ac e7 bb 94  | - V............|
00000110  ef bf bd 20 2d 20 56 e7  80 b9 e3 88 a0 e7 9a ac  |... - V.........|
00000120  e7 bb 94 ef bf bd 2f 74  69 74 6c 65 3e 0a 09 3c  |....../title>..<|
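The difference between the two dumps above can be reproduced offline. The following Java sketch is only an illustration, not ftp4j code: the class name CharsetBytesDemo and its helpers are made up for this example, and it assumes the JVM ships the GBK charset. It encodes the first two characters of the title (移动) as GBK, then applies a "decode as GBK, encode as UTF-8" step once and twice. That the uploaded file went through two such conversions is an assumption, made here only because it reproduces the bytes seen at offset 0xd3 in the second dump.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytesDemo {

    // Render bytes as space-separated lowercase hex, similar to hexdump -C.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode the bytes with one charset and re-encode them with another.
    // This is the kind of step a textual transfer may perform.
    public static byte[] transcode(byte[] in, Charset from, Charset to) {
        return new String(in, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JVM.
        Charset gbk = Charset.forName("GBK");

        // First two characters of the page title, written as Unicode
        // escapes so the source file encoding does not matter.
        String text = "\u79fb\u52a8";

        byte[] local = text.getBytes(gbk);
        byte[] once = transcode(local, gbk, StandardCharsets.UTF_8);
        // Hypothetical second conversion that wrongly treats the
        // UTF-8 bytes as GBK again.
        byte[] twice = transcode(once, gbk, StandardCharsets.UTF_8);

        System.out.println("local GBK bytes : " + hex(local)); // d2 c6 b6 af
        System.out.println("converted once  : " + hex(once));  // e7 a7 bb e5 8a a8
        System.out.println("converted twice : " + hex(twice)); // e7 bb 89 e8 af b2 e5 a7 a9
    }
}
```

A binary transfer performs no decode/encode step at all, so the GBK bytes of the local file would arrive on the server unchanged.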
Thinking it over: my local files are encoded in GBK, yet the files on the hosting space only display correctly when the browser is set to UTF-8. So ftp4j must be converting the character encoding during upload (or it assumes by default that the file is UTF-8). To keep ftp4j from converting anything, the file should be sent in binary mode, which only takes one setting before the upload:

client.setType(FTPClient.TYPE_BINARY);
client.upload(new java.io.File(filename)); // upload() expects a java.io.File

The following text is excerpted from the official ftp4j documentation and explains how ftp4j handles textual and binary transfers:
http://www.sauronsoftware.it/projects/ftp4j/manual.php?PHPSESSID=l7o2bb276feu51v4p8cih0sq81#16

Another data transfer key concept concerns the binary and the textual types. When a transfer is binary the file is treated as a binary stream, and it is stored by the target machine as it is received from the source. A textual data transfer, instead, treats the transferred file as a character stream, performing charset transformation. Suppose your client is running on a Windows platform, while the server runs on UNIX, whose default charsets are usually different. The client send a file to the server selecting textual type. The client assumes that the file is encoded with the machine standard charset, so it decodes every character and encodes it in an intermediate charset before sending. The server receives the stream, decode the intermediate charset and encodes the file with its machine default charset before storing. Bytes has been changed, but contents are the same.

You can choose your transfer type calling:

client.setType(FTPClient.TYPE_TEXTUAL);
client.setType(FTPClient.TYPE_BINARY);
client.setType(FTPClient.TYPE_AUTO);

The TYPE_AUTO constant, which is also the default one, let the client pick the type automatically: a textual transfer will be performed if the extension of the file is between the ones the client recognizes as textual type markers. File extensions are sniffed through a FTPTextualExtensionRecognizer (it.sauronsoftware.ftp4j.FTPTextualExtensionRecognizer) instance. The default extension recognizer, which is an instance of it.sauronsoftware.ftp4j.recognizers.DefaultTextualExtensionRecognizer, recognizes these extensions as textual ones:
abc acgi aip asm asp c c cc cc com conf cpp
csh css cxx def el etx f f f77 f90 f90 flx
for for g h h hh hh hlb htc htm html htmls
htt htx idc jav jav java java js ksh list
log lsp lst lsx m m mar mcf p pas php pl pl
pm py rexx rt rt rtf rtx s scm scm sdml sgm
sgm sgml sgml sh shtml shtml spc ssi talk
tcl tcsh text tsv txt uil uni unis uri uris
uu uue vcs wml wmls wsc xml zsh

You can build your own recognizer implementing the FTPTextualExtensionRecognizer interface, but maybe you'll like more to instance the convenience class ParametricTextualExtensionRecognizer (it.sauronsoftware.ftp4j.recognizers.ParametricTextualExtensionRecognizer). Anyway, don't forget to plug your recognizer in the client:
client.setTextualExtensionRecognizer(myRecognizer);
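To make the TYPE_AUTO rule described in the excerpt more concrete, here is a minimal Java sketch of an extension-based choice between a textual and a binary transfer. It is not ftp4j's implementation: TransferTypeSketch, TransferType and pickType are hypothetical names, the extension set is only a small subset of the list above, and matching extensions case-insensitively is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TransferTypeSketch {

    public enum TransferType { TEXTUAL, BINARY }

    // Illustrative subset of the textual extensions quoted above.
    private static final Set<String> TEXTUAL_EXTENSIONS =
            new HashSet<>(Arrays.asList("htm", "html", "css", "js", "txt", "xml"));

    // Pick a transfer type from the file name alone.
    // Files without an extension fall back to BINARY.
    public static TransferType pickType(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return TransferType.BINARY;
        }
        // Assumption: extensions are compared case-insensitively.
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXTUAL_EXTENSIONS.contains(ext)
                ? TransferType.TEXTUAL
                : TransferType.BINARY;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"index.html", "logo.png", "README"}) {
            System.out.println(name + " -> " + pickType(name));
        }
    }
}
```

Under a rule like this, index.html is classified as textual, which would explain why the page was converted during upload until TYPE_BINARY was set explicitly.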