读书人


Published: 2012-09-23 10:28:11  Author: rapoo

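[How to get a web page's source code] Solving everyone's problem while raising a question of my own
Many of the methods for getting a page's source code that you find online require the character encoding to be known in advance. What if it isn't?
My code solves this. It first downloads the page and decodes it as UTF-8 only to read the charset= declaration, then downloads the page again and decodes it with the declared encoding (falling back to UTF-8 when none is found).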

VB.NET code
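Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports Microsoft.VisualBasic

Module HtmlFetch
    ' Gets the text enclosed by two delimiters [done, untested].
    ' Used only to pull out the encoding declaration. Returns "" if either delimiter is missing.
    Function GetByDiv2(ByVal code As String, ByVal divBegin As String, ByVal divEnd As String) As String
        ' Case-insensitive, so "Charset=" and "CHARSET=" also match
        Dim lgStart As Integer = InStr(1, code, divBegin, CompareMethod.Text)
        If lgStart = 0 Then Return ""
        lgStart += Len(divBegin)
        Dim lgEnd As Integer = InStr(lgStart, code, divEnd, CompareMethod.Text)
        If lgEnd = 0 Then Return ""
        Return Mid(code, lgStart, lgEnd - lgStart)
    End Function

    ' Gets the page source. 2012-08-14: deal with gb2312 and utf-8.
    Public Function getHtmlStr(ByVal strURL As String) As String
        ' First pass: decode as UTF-8 only to read the declared charset
        Dim codeStr As String = PreGetHtml(strURL, "UTF-8")
        ' Text after "charset=" up to the end of the tag, e.g.  gb2312" /  or  "utf-8"
        Dim codeSet As String = GetByDiv2(codeStr, "charset=", ">")
        ' Strip quotes, then keep only the name (stop at ";", a space or "/")
        codeSet = codeSet.Replace("""", "").Replace("'", "").Trim()
        Dim cut As Integer = codeSet.IndexOfAny(New Char() {";"c, " "c, "/"c})
        If cut >= 0 Then codeSet = codeSet.Substring(0, cut)
        If codeSet = "" Then codeSet = "UTF-8"
        ' Second pass: download again and decode with the detected charset
        Return PreGetHtml(strURL, codeSet.ToUpperInvariant())
    End Function

    ' Downloads strURL and decodes the body with the charset named by codeType.
    ' 2012-08-14: deal with gb2312 and utf-8. Returns "" on any error.
    Function PreGetHtml(ByVal strURL As String, Optional ByVal codeType As String = "UTF-8") As String
        Try
            Dim enc As Encoding = Encoding.UTF8 ' used if codeType is not a known charset name
            Try
                enc = Encoding.GetEncoding(codeType)
            Catch encEx As ArgumentException
                ' Unknown or unsupported name: keep UTF-8
            End Try

            Dim httpReq As HttpWebRequest = CType(WebRequest.Create(New Uri(strURL)), HttpWebRequest)
            httpReq.Method = "GET"
            ' Old If-Modified-Since date (1990-09-21), so the server should not answer 304 Not Modified
            httpReq.IfModifiedSince = New Date(1990, 9, 21)

            ' The response stream is forward-only: once read to the end it is exhausted
            Using httpResp As HttpWebResponse = CType(httpReq.GetResponse(), HttpWebResponse),
                  reader As New StreamReader(httpResp.GetResponseStream(), enc)
                Return reader.ReadToEnd()
            End Using
        Catch ex As Exception
            Return ""
        End Try
    End Function
End Module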

Give it a try: do pages in every encoding come back with the correct source and no garbled characters?

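My question: why can the stream returned by httpResp.GetResponseStream be read only once here, after which it can no longer be read? Help from the experts would be much appreciated!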
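A likely explanation: the stream from GetResponseStream reads straight from the network connection, so it is forward-only (CanSeek is False). Once ReadToEnd has consumed it there is nothing left to read, and closing the StreamReader also closes the underlying stream. One workaround, sketched here under the assumption that the whole page fits in memory, is to copy the body into a byte array once and decode that array as many times as needed, which also removes the second HTTP request. DownloadBytes and DecodeHtml are hypothetical names, and the charset scan is a simplified stand-in for GetByDiv2, not the original code.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module BufferedFetch
    ' Hypothetical helper: reads the response body exactly once into memory.
    ' The network stream cannot be rewound, so it is copied instead of re-read.
    Function DownloadBytes(ByVal url As String) As Byte()
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Using resp As WebResponse = req.GetResponse(),
              body As Stream = resp.GetResponseStream(),
              buffer As New MemoryStream()
            body.CopyTo(buffer)
            Return buffer.ToArray()
        End Using
    End Function

    ' Hypothetical helper: decodes the same bytes twice, first as UTF-8 to find
    ' the declared charset, then with that charset. No second request is made.
    Function DecodeHtml(ByVal bytes As Byte()) As String
        Const marker As String = "charset="
        Dim probe As String = Encoding.UTF8.GetString(bytes)
        Dim enc As Encoding = Encoding.UTF8
        Dim pos As Integer = probe.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
        If pos >= 0 Then
            Dim start As Integer = pos + marker.Length
            ' Skip an opening quote, as in <meta charset="utf-8">
            If start < probe.Length AndAlso (probe(start) = """"c OrElse probe(start) = "'"c) Then start += 1
            ' The charset name is a run of letters, digits, "-" and "_"
            Dim finish As Integer = start
            While finish < probe.Length AndAlso (Char.IsLetterOrDigit(probe(finish)) OrElse probe(finish) = "-"c OrElse probe(finish) = "_"c)
                finish += 1
            End While
            Dim name As String = probe.Substring(start, finish - start)
            Try
                If name <> "" Then enc = Encoding.GetEncoding(name)
            Catch ex As ArgumentException
                ' Unknown or unsupported charset name: keep UTF-8
            End Try
        End If
        Return enc.GetString(bytes)
    End Function
End Module
```

Usage would look like Dim html As String = DecodeHtml(DownloadBytes("http://example.com/")). On .NET Core and .NET 5+, names such as gb2312 or gbk only resolve after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); without it, GetEncoding throws and the sketch falls back to UTF-8.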

[Solution]
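Thanks for the effort. Two years ago I wrote a blog post that used a similar method, except with regular expressions.
Your implementation is correct. Congratulations.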
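The regular-expression version mentioned in this reply is not shown in the thread. As a rough illustration only (not the replier's code), a single pattern can cover both the HTML 4 http-equiv form and the HTML5 meta charset form, assuming the declaration sits inside a meta tag. FindCharset and MetaCharset are hypothetical names.

```vbnet
Imports System.Text.RegularExpressions

Module CharsetRegex
    ' Matches both <meta charset="utf-8"> and
    ' <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    ' Group 1 captures the charset name; quotes and spaces around it are optional.
    Private ReadOnly MetaCharset As New Regex(
        "<meta[^>]+charset\s*=\s*[""']?\s*([A-Za-z0-9._:-]+)",
        RegexOptions.IgnoreCase)

    ' Returns the declared charset name, or "" if the page declares none.
    Function FindCharset(ByVal html As String) As String
        Dim m As Match = MetaCharset.Match(html)
        If m.Success Then Return m.Groups(1).Value
        Return ""
    End Function
End Module
```

Compared with the InStr/Mid approach, the pattern absorbs the quote and spacing variants in one place, at the cost of being harder to read at a glance.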
[Solution]
Quoting:

VB.NET code
……
charSet = Replace(GetByDiv(tCode, "charset=", """"), """", "") ' identify the encoding type
' The above obtains the encoding type……

[Solution]
You're all experts! I learned a lot.
[Solution]
I don't work in this area at the moment!
