Help: fetching page source, a baffling URL I can't scrape
The URL whose source I can't fetch:
http://www1.macys.com/catalog/product/index.ognc?ID=596761
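No matter what I try, HttpWebRequest can't fetch the source of this page; it fails with a "too many redirections" error. I watched the cookies the browser sends and added a bunch of them to the request, but that didn't solve it. Can anyone manage to fetch it?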
-------------------------
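PS: A product page of the same kind, for example http://www1.macys.com/catalog/product/index.ognc?ID=603770, fetches without any problem. The same code fails on the URL above. I tried every code sample I could find online one by one, and none of them could fetch the source of that URL.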
-------------------------
Test method code:
- C# code
// Requires: using System; using System.IO; using System.Net;
private static string getContent(string Url)
{
    string content = "";
    try
    {
        HttpWebRequest wreq = (HttpWebRequest)WebRequest.Create(Url);
        wreq.MaximumAutomaticRedirections = 4;   // gives up after 4 redirect hops (the default is 50)
        wreq.MaximumResponseHeadersLength = 4;   // in kilobytes
        //wreq.Credentials = System.Net.CredentialCache.DefaultCredentials;
        //wreq.Referer = "http://www.macys.com";
        //wreq.Headers.Add(HttpRequestHeader.Cookie, "macys_online=4416704358; shippingCountry=US; currency=USD;");
        wreq.Method = "GET";
        wreq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30";

        // The container keeps cookies set during redirects and resends them on later hops.
        CookieContainer cookieCon = new CookieContainer();
        //CookieCollection cc = new CookieCollection();
        //cc.Add(new System.Net.Cookie("currency", "USD", "/", "macys.com"));
        //cc.Add(new System.Net.Cookie("PPP", "24", "/", "macys.com"));
        //cc.Add(new System.Net.Cookie("SignedIn", "0", "/", "macys.com"));
        //cc.Add(new System.Net.Cookie("shippingCountry", "US", "/", "macys.com"));
        //cookieCon.Add(cc);
        wreq.CookieContainer = cookieCon;

        // Dispose the response and reader once the body has been read.
        using (HttpWebResponse wresp = (HttpWebResponse)wreq.GetResponse())
        using (StreamReader sr = new StreamReader(wresp.GetResponseStream()))
        {
            content = sr.ReadToEnd();
        }
    }
    catch (Exception ex)
    {
        // On failure the error message is returned in place of the page source.
        content = ex.Message;
    }
    return content;
}
[Solution]
Can't you just fetch the URL it redirects to directly?
http://www1.macys.com/shop/product/treasured-hearts-diamond-ring-sterling-silver-black-white-diamond-heart-ring-1-4-ct.-t.w.?ID=596761&intnl=true&intnl=true
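To find out where the redirect chain actually goes, and whether it loops, one option is to switch off automatic redirects and follow each hop by hand while sharing one cookie container. The following is only a sketch under assumptions: the names `RedirectTracer`, `Trace` and `ResolveLocation` are made up for illustration, the User-Agent string is the one from the question, and nothing here has been run against the macys.com URLs.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class RedirectTracer
{
    // Resolve a Location header (absolute or relative) against the current URL.
    public static Uri ResolveLocation(Uri current, string location)
    {
        return new Uri(current, location);
    }

    // Follow redirects manually and return every URL visited, in order.
    // Stops on a non-redirect response, on a repeated URL (a loop), or after maxHops.
    public static List<Uri> Trace(string startUrl, int maxHops)
    {
        var cookies = new CookieContainer();   // shared so cookies set on one hop are sent on the next
        var visited = new List<Uri>();
        var current = new Uri(startUrl);

        for (int hop = 0; hop < maxHops; hop++)
        {
            bool seenBefore = visited.Contains(current);
            visited.Add(current);
            if (seenBefore)
            {
                Console.WriteLine("Redirect loop detected at {0}", current);
                break;
            }

            var req = (HttpWebRequest)WebRequest.Create(current);
            req.AllowAutoRedirect = false;      // we want to see each 3xx response ourselves
            req.CookieContainer = cookies;
            req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30";

            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                int status = (int)resp.StatusCode;
                string location = resp.Headers["Location"];
                Console.WriteLine("{0} {1} -> {2}", status, current, location);

                // Anything that is not a 3xx with a Location header ends the chain.
                if (status < 300 || status >= 400 || string.IsNullOrEmpty(location))
                    break;

                current = ResolveLocation(current, location);
            }
        }
        return visited;
    }
}
```

Calling `RedirectTracer.Trace("http://www1.macys.com/catalog/product/index.ognc?ID=596761", 10)` would print one line per hop. If the final URL turns out to be stable, it can be passed to `getContent` directly; if the same URL shows up twice, the server is probably waiting for a cookie or header the request does not send.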
[Solution]
If this page redirects, it probably expects a Referer URL. Set that URL as the Referer on the request and try again!
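A minimal sketch of the Referer suggestion, assuming the referring page is `http://www.macys.com` (the value that is commented out in the question's code; the real entry page is not known from this thread). `RefererExample` and `BuildRequest` are illustrative names only.

```csharp
using System;
using System.Net;

static class RefererExample
{
    // Build a GET request that announces which page it supposedly came from.
    public static HttpWebRequest BuildRequest(string url, string referer)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.Referer = referer;                        // sent as the "Referer" request header
        req.CookieContainer = new CookieContainer();  // keep cookies set during redirects
        return req;
    }
}
```

The request is only built here, not sent; calling `GetResponse()` on the result and reading the stream works the same way as in `getContent`. Whether the server actually checks the Referer is a guess until it is tried.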
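[Solution]
First of all, if you type
http://www1.macys.com/catalog/product/index.ognc?ID=596761
directly into the browser's address bar, do you get the page source? Is it the result you expect?
Note: type it in directly.
Some pages can only be reached by clicking through from a parent page; otherwise they won't load.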
[Solution]
It seems a site can block viewing of its source!
[Solution]
Try this class; initialize it with the URL you want to fetch as the constructor argument:
http://blog.csdn.net/yysyangyangyangshan/article/details/6661886
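[Solution]
That URL probably has an entry page. You can't go to it directly, since it just redirects. If you know which page it is reached from, post it here...
Or try it yourself: add that entry page's URL as the Referer in your code and see whether it works.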
[Solution]
If the site renders the page with Flash and you insist on scraping HTML, it's a lost cause.