
Garbled output when scraping Douban, several days and still not solved

Posted: 2014-04-19 16:19:00 Author: rapoo

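Scraping Douban gives garbled output, still unsolved after several days, looking for expert help
My environment is Windows 7 32-bit and my IDE is wingide. I have already set wingide to utf8, but the output is still garbled every time I run the script, which is very strange: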

# coding: utf-8

import urllib
import urllib2
import re
import sys

default_encoding = 'utf-8'
if sys.getdefaultencoding() != default_encoding:
    reload(sys)
    sys.setdefaultencoding(default_encoding)

# Douban movie categories
#doubanlist = ["剧情","喜剧","动作","爱情","科幻","动画","悬疑","惊悚","恐怖","纪录片","短片","情色","同性","音乐","歌舞","家庭","儿童","传记","历史","战争","犯罪","西部","奇幻","冒险","灾难","武侠","古装","鬼怪","运动","戏曲"]
urls = "http://movie.douban.com/category/q"
headers = {
    "Host":"movie.douban.com",
    "Connection":"keep-alive",
    "X-Requested-With":"XMLHttpRequest",
    "Accept-Encoding":"gzip,deflate,sdch",
    "Accept-Language":"zh-CN,zh;q=0.8,en;q=0.6",
    "Content-Type":"application/x-www-form-urlencoded",
    "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36",
    "Referer":"http://movie.douban.com/category/"
}
postdata = urllib.urlencode({
    "types[]":"",
    "district":"",
    "era":"",
    "category":"all",
    "unwatched":"false",
    "available":"false",
    "sortBy":"score",
    "page":"1",
    "ck":"null",
    "source":"paginator",
    # Duplicate key: this later value replaces the empty "types[]" above.
    "types[]":"剧情"
})
req = urllib2.Request(
    url = urls,
    data = postdata,
    headers = headers
)
content = urllib2.urlopen(req).read()
print content




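[Solution]
"Accept-Encoding":"gzip,deflate,sdch"
With this header you get back a compressed stream, so printing it directly will naturally look garbled. Try decompressing it with the gzip module...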
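
The suggestion above can be sketched roughly as follows. This is only a minimal example under some assumptions: decode_body and its parameters are hypothetical names, not part of the original script, the page is assumed to be UTF-8, and zlib is used instead of the gzip module so that the same code runs on both Python 2 and Python 3. It does not handle sdch, which the standard library has no decoder for.

```python
# -*- coding: utf-8 -*-
import gzip
import io
import zlib


def decode_body(raw, content_encoding, charset="utf-8"):
    """Return the response body as text, undoing gzip/deflate if needed.

    decode_body is a hypothetical helper written for this sketch.
    """
    encoding = (content_encoding or "").lower()
    if encoding == "gzip":
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        raw = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    elif encoding == "deflate":
        # Servers may send zlib-wrapped or raw deflate data; try both.
        try:
            raw = zlib.decompress(raw)
        except zlib.error:
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw.decode(charset)


# Offline check: gzip a UTF-8 string the way a server would, then decode it.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(u"豆瓣电影".encode("utf-8"))
compressed = buf.getvalue()
restored = decode_body(compressed, "gzip")
```

In the script from the question, the bytes returned by read() and the response's Content-Encoding header (available through the response's info() object) would be passed to this helper before printing. A simpler alternative is to drop the "Accept-Encoding" entry from headers, so that the request no longer asks for a compressed body.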
