Garbled text when reading a web page's source code, looking for a solution!
I have tried three ways of fetching the page, and with UTF-8 pages all the Chinese comes back garbled. I have built similar things in Java and C# before: in Java, setting the encoding on the conversion stream solved it, and in C#, changing the HTTP header did. Neither trick seems to work in C++, so I am looking for a suitable solution.
Ideally the HTTP request could be set up so that the response comes back correct directly, which would save the conversion overhead.
If that is not possible, then the fallback is to convert the garbled bytes into correct Chinese by converting the encoding.
Below are the three methods I have used, for reference. I also tried raw sockets, but the success rate was poor, so all of these go through HTTP requests.
1 Using the WinINet Internet* family of functions
- C/C++ code
#include <iostream>
#include <stdio.h>
#include <string>
#include <windows.h>
#include <wininet.h>
#include <tchar.h>
#include <stdlib.h>
#pragma comment(lib, "Wininet.lib")
using namespace std;
……
#define MAXSIZE 1024                  // read-buffer size

string* html = new string;
HINTERNET hSession = InternetOpen(_T("UrlTest"), INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
if (hSession != NULL)
{
    HINTERNET hHttp = InternetOpenUrl(hSession, _T("http://www.moko.cc"), NULL, 0, INTERNET_FLAG_DONT_CACHE, 0);
    if (hHttp != NULL)
    {
        //wprintf_s(_T("%s\n"), url);
        char Temp[MAXSIZE];           // read raw bytes, not TCHARs
        ULONG Number = 1;
        while (Number > 0)
        {
            InternetReadFile(hHttp, Temp, MAXSIZE - 1, &Number);
            Temp[Number] = '\0';
            *html = *html + Temp;
        }
        InternetCloseHandle(hHttp);
        hHttp = NULL;
    }
    InternetCloseHandle(hSession);
    hSession = NULL;
}
cout << *html;
delete html;
This method gives no way to change the HTTP headers, but it should be doable by converting the encoding afterwards.
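For the transcoding route, here is a minimal sketch of a helper (my own function, not part of any library) that converts the UTF-8 bytes collected above into GBK with the Win32 pair MultiByteToWideChar / WideCharToMultiByte; 936 is the GBK code page, and it assumes the console uses the Simplified Chinese ANSI code page.
- C/C++ code
#include <string>
#include <windows.h>

// Hypothetical helper: convert a UTF-8 byte string to GBK (code page 936).
// Returns an empty string on failure.
std::string Utf8ToGbk(const std::string& utf8)
{
    // Step 1: UTF-8 -> UTF-16
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), NULL, 0);
    if (wlen <= 0) return "";
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), &wide[0], wlen);

    // Step 2: UTF-16 -> GBK (code page 936)
    int glen = WideCharToMultiByte(936, 0, wide.c_str(), wlen, NULL, 0, NULL, NULL);
    if (glen <= 0) return "";
    std::string gbk(glen, '\0');
    WideCharToMultiByte(936, 0, wide.c_str(), wlen, &gbk[0], glen, NULL, NULL);
    return gbk;
}
Windows has no single call that goes from UTF-8 to GBK directly, which is why the sketch passes through UTF-16 as an intermediate step.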
2 Using the WinINet Request family of functions
- C/C++ code
#include <iostream>
#include <stdio.h>
#include <string>
#include <windows.h>
#include <wininet.h>
#include <tchar.h>
#include <stdlib.h>
#pragma comment(lib, "Wininet.lib")
using namespace std;
……
HINTERNET hSession = InternetOpen("MSDN SurfBear", PRE_CONFIG_INTERNET_ACCESS, NULL, INTERNET_INVALID_PORT_NUMBER, 0);
HINTERNET hConnect = InternetConnect(hSession, "www.moko.cc", INTERNET_INVALID_PORT_NUMBER, "", "", INTERNET_SERVICE_HTTP, 0, 0);
HINTERNET hHttpFile = HttpOpenRequest(hConnect, "GET", "/", HTTP_VERSION, NULL, 0, INTERNET_FLAG_DONT_CACHE, 0);

const char* g_Accept_Encoding = "Accept-Encoding:utf-8\r\n";
HttpAddRequestHeaders(hHttpFile, g_Accept_Encoding, (DWORD)strlen(g_Accept_Encoding),
                      HTTP_ADDREQ_FLAG_ADD | HTTP_ADDREQ_FLAG_REPLACE);   // combine flags with |, not &
BOOL bSendRequest = HttpSendRequest(hHttpFile, NULL, 0, 0, 0);

// Ask for the Content-Length so we can size the read buffer
char bufQuery[320];
DWORD dwLengthBufQuery = sizeof(bufQuery);
BOOL bQuery = HttpQueryInfo(hHttpFile, HTTP_QUERY_CONTENT_LENGTH, bufQuery, &dwLengthBufQuery, 0);
DWORD dwFileSize = (DWORD)atol(bufQuery);

char* buffer = new char[dwFileSize + 1];
DWORD dwBytesRead = 0;
BOOL bRead = InternetReadFile(hHttpFile, buffer, dwFileSize + 1, &dwBytesRead);
while (dwBytesRead > 0)
{
    buffer[dwBytesRead] = '\0';       // terminate and print the chunk just read
    cout << buffer;
    InternetReadFile(hHttpFile, buffer, dwFileSize + 1, &dwBytesRead);
}
delete[] buffer;
InternetCloseHandle(hHttpFile);
InternetCloseHandle(hConnect);
InternetCloseHandle(hSession);
This method can change the HTTP headers, and transcoding also seems possible.
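Whether the header actually changes what comes back is up to the server, so it may help to check the response's Content-Type before deciding to transcode. A rough sketch, continuing from the hHttpFile handle in the code above (how the server writes the charset varies, so the string check here is only an assumption):
- C/C++ code
// After HttpSendRequest: read the response Content-Type, e.g. "text/html; charset=utf-8",
// to decide whether the body needs a UTF-8 -> GBK conversion. Needs <string.h> for strstr.
char contentType[256] = {0};
DWORD ctLen = sizeof(contentType);
BOOL utf8Body = FALSE;
if (HttpQueryInfo(hHttpFile, HTTP_QUERY_CONTENT_TYPE, contentType, &ctLen, NULL))
{
    if (strstr(contentType, "utf-8") != NULL || strstr(contentType, "UTF-8") != NULL)
        utf8Body = TRUE;   // transcode before printing to a GBK console
}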
3 Using libcurl
- C/C++ code
#include <iostream>
#include <stdio.h>
#include <string>
#include <windows.h>
#include <tchar.h>
#include "include/curl/curl.h"
#include <stdlib.h>
#pragma comment(lib, "libcurl_imp.lib")
……
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if (curl)
{
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, "Accept-Encoding:utf-8");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "FireFox");
    curl_easy_setopt(curl, CURLOPT_URL, "www.moko.cc");
    res = curl_easy_perform(curl);   // default write callback dumps the body to stdout
    curl_slist_free_all(headers);    // free the custom header list
    curl_easy_cleanup(curl);
}
With libcurl the HTTP headers can also be changed; I do not know whether the received content can be transcoded. And as far as I currently understand it, libcurl can only print the received content to the console and cannot assign it to a variable.
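On that last point, as far as I know libcurl can hand the body to a callback instead of printing it, so it can be collected into a std::string through CURLOPT_WRITEFUNCTION / CURLOPT_WRITEDATA. A minimal sketch (the name WriteToString is my own); after curl_easy_perform the string holds the raw bytes, which can then be transcoded as in method 1:
- C/C++ code
#include <string>
#include "include/curl/curl.h"

// libcurl calls this for each chunk it receives; append the chunk to the
// std::string passed in via CURLOPT_WRITEDATA and report how many bytes we took.
static size_t WriteToString(char* ptr, size_t size, size_t nmemb, void* userdata)
{
    std::string* body = static_cast<std::string*>(userdata);
    body->append(ptr, size * nmemb);
    return size * nmemb;
}

// Usage, added before curl_easy_perform in the code above:
//   std::string body;
//   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteToString);
//   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
//   res = curl_easy_perform(curl);
//   // body now holds the raw page source instead of it going to stdout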
[Solution]
I did not know you could change the HTTP header to get the page source in GBK directly; I have always just downloaded it as-is and then converted it afterwards.
Bumping!
[Solution]
Giving this a bump for you.
[Solution]
There are plenty of UTF-8-to-GBK conversion routines online.
Usually it is your first method, then convert, as in the sketch below.
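To make that concrete, a tiny usage sketch under the assumptions above: fetch the body with method 1 into the html string (raw UTF-8 bytes), then run it through the Utf8ToGbk helper sketched earlier before printing.
- C/C++ code
// Hypothetical glue: *html holds the raw UTF-8 bytes fetched by method 1,
// Utf8ToGbk is the conversion helper sketched above.
std::string gbkHtml = Utf8ToGbk(*html);
std::cout << gbkHtml;   // readable Chinese on a GBK (code page 936) console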