I am using the WinHttp component to fetch the HTML text of a page, but the Chinese characters come out garbled.
The code is very simple:
- C# code
using System;
using System.Collections.Generic;
using System.Text;
using System.Net;

namespace winhttptest
{
    class Program
    {
        static void Main(string[] args)
        {
            WinHttp.WinHttpRequest whr = new WinHttp.WinHttpRequest();
            string url = Console.ReadLine();
            while (url != string.Empty)
            {
                whr.Open("GET", url, false);
                whr.Send("");
                string html = whr.ResponseText;
                Console.WriteLine(html);
                url = Console.ReadLine();
            }
        }
    }
}
Enter a complete URI (for example: http://news.sina.com.cn/c/2008-10-12/095416439207.shtml) and press Enter; all the Chinese in the displayed text is garbled. How can I fix this?
[Solution]
- C# code
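// Decode the raw response bytes as GB2312. Re-encoding ResponseText is unreliable,
// because bytes may already have been lost when ResponseText was decoded.
byte[] body = (byte[])whr.ResponseBody;
string html = Encoding.GetEncoding("GB2312").GetString(body);
[Solution]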
- C# code
using System;
using System.IO;
using System.Net;
using System.Text;

namespace winhttptest
{
    class Program
    {
        private static string GetResponse(string url)
        {
            url = url.Trim(); // Trim() returns a new string
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.AllowAutoRedirect = true;
            req.MaximumAutomaticRedirections = 3;
            req.UserAgent = "Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.2;.NET CLR 1.1.4322)";
            req.Referer = req.RequestUri.ToString();
            req.KeepAlive = true;
            req.Timeout = -1; // -1 means wait indefinitely
            try
            {
                // Read the body with an explicit GB2312 decoder and dispose both objects afterwards.
                using (HttpWebResponse webresponse = (HttpWebResponse)req.GetResponse())
                using (StreamReader reader = new StreamReader(webresponse.GetResponseStream(), Encoding.GetEncoding("gb2312")))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (WebException ex)
            {
                return ex.Message;
            }
        }

        static void Main(string[] args)
        {
            string html = GetResponse("http://news.sina.com.cn/c/2008-10-12/095416439207.shtml");
            Console.WriteLine(html);
            Console.ReadLine();
        }
    }
}
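[Solution]
This is mainly an encoding problem. Some pages are UTF-8 and some are GB2312, so change the encoding to match the page you are actually fetching.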
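One way to avoid hard-coding the encoding is to read the charset from the response's Content-Type header and fall back to a default when none is declared. The following is only a rough sketch: the `CharsetSniffer` class and its `ExtractCharset` and `Decode` methods are made-up names for illustration, and it assumes the server actually sends a charset in the header (pages that declare it only in a `<meta>` tag are not handled).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of WinHttp or System.Net.
public static class CharsetSniffer
{
    // Returns the charset parameter of a Content-Type value such as
    // "text/html; charset=gb2312", or the fallback if none is present.
    public static string ExtractCharset(string contentType, string fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim().Trim('"', '\'');
                if (name.Length > 0) return name;
            }
        }
        return fallback;
    }

    // Decodes the raw body with the declared charset, or with the fallback
    // if the declared name is not a supported encoding.
    // Note: on .NET Core / .NET 5+ "gb2312" is only available after
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    public static string Decode(byte[] body, string contentType, string fallback)
    {
        Encoding enc;
        try { enc = Encoding.GetEncoding(ExtractCharset(contentType, fallback)); }
        catch (ArgumentException) { enc = Encoding.GetEncoding(fallback); }
        return enc.GetString(body);
    }
}
```

With WinHttpRequest this could be called as `CharsetSniffer.Decode((byte[])whr.ResponseBody, whr.GetResponseHeader("Content-Type"), "gb2312")`; with HttpWebResponse, the `ContentType` property supplies the header value.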