Spider Engine > Understanding patterns
me again
I'm very new to programming. Your script looks very interesting. I'm playing with example5_google_results.php, but I would like to return, for example, only the links of the websites that Google indexes: not the cached, not the related, nothing other than a list of links, something like
link 1 = http://www.link1.com
link 2 = http://www.link2.com
but I really don't understand how to do it. Thank you in advance for your help.
Steffy
it's very simple ... instead of using "title","content","cache","similar", you should replace them with "dummy" in $obj->pattern. then every result has just two variables, "dummy" and "link", and you can use the "link" variable as you need.
enjoy.
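For anyone trying to picture what the {p[name]} placeholders do, here is a rough mental model only. It is not Spider Engine's actual code, and the real matcher may well behave differently. The function name matchPlaceholderPattern is made up for this sketch: it treats everything outside a placeholder as literal text and every placeholder as "shortest possible run of characters", then returns one array per match, keyed by placeholder name.

```php
<?php
// Hypothetical illustration, NOT part of Spider Engine.
// Turns a pattern with {p[name]} placeholders into a regular expression
// and returns one associative array per match.
function matchPlaceholderPattern($pattern, $html)
{
    // Odd indexes of $parts hold placeholder names, even indexes hold literal text.
    $parts = preg_split('/\{p\[(\w+)\]\}/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $names = array();
    $regex = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $regex .= preg_quote($part, '#'); // literal text must match exactly
        } else {
            $names[] = $part;
            $regex .= '(.*?)';                // placeholder: shortest match
        }
    }

    preg_match_all('#' . $regex . '#s', $html, $found, PREG_SET_ORDER);

    $rows = array();
    foreach ($found as $set) {
        $row = array();
        foreach ($names as $n => $name) {
            // A repeated name such as "dummy" keeps only its last captured value.
            $row[$name] = $set[$n + 1];
        }
        $rows[] = $row;
    }
    return $rows;
}
```

Under this model, using "dummy" several times in one pattern is harmless: each occurrence still has to match something in the page, but only one throwaway value ends up in the result next to "link".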
thank you for your time and tips, but I'm blonde !!
here is how I configured your class
=============================================================================
$obj=new MySpider();
$obj->url="http://www.google.com/search?q=kitesurf&start={range[0]}";
$obj->range=array(0=>array("start"=>0,"end"=>1,"step"=>1));
//$obj->pattern_definition=array("dummy","link","title","content","cache","similar"); //dummy is used for content that changes between pages and we are not interested in it
$obj->pattern_definition=array("dummy","link"); //dummy is used for content that changes between pages and we are not interested in it
$obj->start='<div>';
$obj->end=array("to_process"=>array('</body>'),"not_to_process"=>array());
//$obj->pattern='<div class=g{p[dummy]}><h2 class=r><a{p[dummy]}href="{p[link]}"{p[dummy]}class=l{p[dummy]}>{p[title]}</a></h2><table border=0 cellpadding=0 cellspacing=0><tr><td class=j><font size=-1>{p[content]}<br><span{p[dummy]}>{p[dummy]}</span><nobr><a class=fl href="{p[cache]}">{p[dummy]}</a> - <a class=fl href="{p[similar]}">{p[dummy]}</a></nobr></font></td></tr></table></div>';
$obj->pattern='<h2 class=r><a{p[dummy]}href="{p[link]}"{p[dummy]}class=l{p[dummy]}><br />';
$obj->fetchData();
================================================================
and here is what it outputs
===============================================================
Url: http://www.google.com/search?q=kitesurf&start=0 processing!
Array (
[0] => Array ( [dummy] => [link] => http://www.kite-surf.com/ )
[1] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:7zy9asTytgAJ:www.kite-surf.com/+kitesurf&hl=en&ct=clnk&cd=1&ie=UTF-8 )
[2] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kite-surf.com/ )
[3] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:siM2eG3yXtAJ:www.prokitesurf.com/+kitesurf&hl=en&ct=clnk&cd=2&ie=UTF-8 )
[4] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.prokitesurf.com/ )
[5] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:tKJCI76umJAJ:www.kitesurf.com/+kitesurf&hl=en&ct=clnk&cd=3&ie=UTF-8 )
[6] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurf.com/ )
[7] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:Ws_E5o4bE70J:kitesurfingschool.org/howto.htm+kitesurf&hl=en&ct=clnk&cd=4&ie=UTF-8 )
[8] => Array ( [dummy] => [link] => http://kitesurfingschool.org/ )
[9] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:ClpdkWWJKKIJ:kitesurfingschool.org/+kitesurf&hl=en&ct=clnk&cd=5&ie=UTF-8 )
[10] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:kitesurfingschool.org/ )
[11] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:1dhkGBb_ufYJ:en.wikipedia.org/wiki/Kitesurfing+kitesurf&hl=en&ct=clnk&cd=6&ie=UTF-8 )
[12] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:en.wikipedia.org/wiki/Kitesurfing )
[13] => Array ( [dummy] => [link] => http://www.planetkitesurf.com/ )
[14] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:3Ie_N0I2siMJ:www.planetkitesurf.com/+kitesurf&hl=en&ct=clnk&cd=7&ie=UTF-8 )
[15] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.planetkitesurf.com/ )
[16] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:iiJiIYaUVU4J:www.kitesurf.ie/+kitesurf&hl=en&ct=clnk&cd=8&ie=UTF-8 )
[17] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurf.ie/ )
[18] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:74vvf5HhPDAJ:www.kitesurfusa.com/+kitesurf&hl=en&ct=clnk&cd=9&ie=UTF-8 )
[19] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurfusa.com/ )
)
Url: http://www.google.com/search?q=kitesurf&start=0 has been processed in 1.46 sec !
Url: http://www.google.com/search?q=kitesurf&start=1 processing!
Array (
[0] => Array ( [dummy] => [link] => http://www.prokitesurf.com/ )
[1] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:siM2eG3yXtAJ:www.prokitesurf.com/+kitesurf&hl=en&ct=clnk&cd=2&ie=UTF-8 )
[2] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.prokitesurf.com/ )
[3] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:tKJCI76umJAJ:www.kitesurf.com/+kitesurf&hl=en&ct=clnk&cd=3&ie=UTF-8 )
[4] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurf.com/ )
[5] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:Ws_E5o4bE70J:kitesurfingschool.org/howto.htm+kitesurf&hl=en&ct=clnk&cd=4&ie=UTF-8 )
[6] => Array ( [dummy] => [link] => http://kitesurfingschool.org/ )
[7] => Array ( [dummy] => [link] => http://kitesurfingschool.org/ )
[8] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:ClpdkWWJKKIJ:kitesurfingschool.org/+kitesurf&hl=en&ct=clnk&cd=5&ie=UTF-8 )
[9] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:kitesurfingschool.org/ )
[10] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:1dhkGBb_ufYJ:en.wikipedia.org/wiki/Kitesurfing+kitesurf&hl=en&ct=clnk&cd=6&ie=UTF-8 )
[11] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:en.wikipedia.org/wiki/Kitesurfing )
[12] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:3Ie_N0I2siMJ:www.planetkitesurf.com/+kitesurf&hl=en&ct=clnk&cd=7&ie=UTF-8 )
[13] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.planetkitesurf.com/ )
[14] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:iiJiIYaUVU4J:www.kitesurf.ie/+kitesurf&hl=en&ct=clnk&cd=8&ie=UTF-8 )
[15] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurf.ie/ )
[16] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:74vvf5HhPDAJ:www.kitesurfusa.com/+kitesurf&hl=en&ct=clnk&cd=9&ie=UTF-8 )
[17] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kitesurfusa.com/ )
[18] => Array ( [dummy] => [link] => http://66.102.9.104/search?q=cache:q2t4h4LRp5oJ:www.kiteboardingholidays.com/+kitesurf&hl=en&ct=clnk&cd=10&ie=UTF-8 )
[19] => Array ( [dummy] => [link] => /search?hl=en&ie=UTF-8&q=related:www.kiteboardingholidays.com/ )
)
Url: http://www.google.com/search?q=kitesurf&start=1 has been processed in 1.47 sec !
====================================================================
but what I want to achieve would be simply
[1] => http://.....
[2] => http://.....
[3] => http://.....
[4] => http://.....
[5] => http://.....
and so on. Is that possible, and if so, how should I proceed? Thank you in advance.
Steffy
use this pattern instead:
$obj->pattern='<div class=g{p[dummy]}><h2 class=r><a{p[dummy]}href="{p[link]}"{p[dummy]}class=l{p[dummy]}>{p[dummy]}</a></h2><table border=0 cellpadding=0 cellspacing=0><tr><td class=j><font size=-1>{p[dummy]}<br><span{p[dummy]}>{p[dummy]}</span><nobr><a class=fl href="{p[dummy]}">{p[dummy]}</a> - <a class=fl href="{p[dummy]}">{p[dummy]}</a></nobr></font></td></tr></table></div>';
you can then delete all the dummies with unset(). good luck.
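If some cache or related links still slip through, a possible fallback is to filter the matches after the fact instead of relying on the pattern alone. The sketch below is only an illustration: keepResultLinks is a made-up helper name, not something that ships with Spider Engine, and the two tests it applies (relative /search?... URLs and the "q=cache:" marker) are assumptions taken from the output posted earlier in this thread.

```php
<?php
// Hypothetical helper, NOT part of Spider Engine.
// Takes the array of pattern matches and returns a flat, renumbered
// list of direct result links.
function keepResultLinks($patternMatches)
{
    $links = array();
    foreach ($patternMatches as $match) {
        $link = isset($match['link']) ? trim($match['link']) : '';

        // Drop empty values and relative URLs such as /search?...q=related:...
        if ($link === '' || !preg_match('#^https?://#i', $link)) {
            continue;
        }
        // Drop Google cache links, which contain "q=cache:".
        if (strpos($link, 'q=cache:') !== false) {
            continue;
        }
        $links[] = $link;
    }
    // Remove duplicates and renumber the keys from 0.
    return array_values(array_unique($links));
}
```

Calling print_r(keepResultLinks($pattern_matches)) would then print a plain numbered list of URLs rather than an array of arrays.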
YES..
I'm getting somewhere. How would you use unset to remove the dummy? Steffy
Yes, yes, I know the unset function, but where would you put it in that script?
Also, let's say that I want the links on my results page to be clickable: what should I modify in the pattern? Thank you for your time, and sorry to be dumb. Steffy
in my google example you already have a good unset example:
function processData($pattern_matches) //you can do whatever you want here with the pattern matches, insert in a database etc.
{
    foreach ($pattern_matches as $k=>$v)
    {
        if($v['dummy']) //true only when "dummy" holds a non-empty value
        {
            unset($pattern_matches[$k]['dummy']);
        }
    }
    print_r($pattern_matches);
}
you shouldn't modify the pattern!! you can add <a href="" to each link after processing, in the processData function. have a nice day.
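To make the "add <a href after processing" advice concrete, here is a minimal sketch. Both function names (linksFromMatches and renderLinks) are hypothetical and not part of Spider Engine. It assumes each match is an array with a "link" key, as in the output shown earlier in the thread, and it escapes each URL with htmlspecialchars so that characters like & are safe inside the HTML.

```php
<?php
// Hypothetical helpers, NOT part of Spider Engine.

// Collects the non-empty "link" values into a flat list,
// which also leaves every "dummy" value behind.
function linksFromMatches($patternMatches)
{
    $links = array();
    foreach ($patternMatches as $match) {
        if (!empty($match['link'])) {
            $links[] = $match['link'];
        }
    }
    return $links;
}

// Turns a flat list of URLs into numbered, clickable HTML links.
function renderLinks($links)
{
    $html = '';
    foreach ($links as $i => $url) {
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= 'link ' . ($i + 1) . ' = <a href="' . $safe . '">' . $safe . "</a><br />\n";
    }
    return $html;
}
```

Inside processData, something like echo renderLinks(linksFromMatches($pattern_matches)); could replace the print_r call, which keeps the pattern itself untouched.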
