Python: Selenium Crawler
Onnowpurbo (talk | contribs)
Latest revision as of 04:41, 31 January 2017
WARNING: This is not working yet; I have not been able to install or use it correctly.
Quickstart
pip install -e git+https://github.com/cmwslw/selenium-crawler.git#egg=selenium-crawler
from seleniumcrawler import handle_url
print(handle_url('https://news.ycombinator.com/item?id=5626377'))
This will open up a browser window, 'click' on the main link, and load the article. It will print the following:
{
    'url': 'http://googleblog.blogspot.com/2013/04/google-now-on-your-iphone-and-ipad-with.html',
    'source': {{HTMLSOURCE}},
    'handler': 'hnews'
}
Where {{HTMLSOURCE}} is the actual HTML of the article.
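The returned dictionary can be processed like any other Python dict. The following is only a sketch, assuming the result has exactly the three keys shown above ('url', 'source', 'handler'). The names TitleParser and summarize_result are hypothetical helpers and are not part of selenium-crawler, and the sample dictionary stands in for a real handle_url() call, so the snippet runs without a browser. It uses Python 3 and only the standard library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Start collecting only for the first <title> encountered
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_result(result):
    """Hypothetical helper: reduce a crawler result dict to a short summary.

    Assumes `result` has the keys 'url', 'source' and 'handler'.
    """
    parser = TitleParser()
    parser.feed(result["source"])
    return {
        "url": result["url"],
        "handler": result["handler"],
        "title": parser.title.strip(),
        "html_length": len(result["source"]),
    }


# Stand-in for the dictionary that handle_url() is described as returning;
# the HTML here is made up for illustration.
sample = {
    "url": "http://googleblog.blogspot.com/2013/04/google-now-on-your-iphone-and-ipad-with.html",
    "source": "<html><head><title>Example article</title></head>"
              "<body><p>Hello</p></body></html>",
    "handler": "hnews",
}

summary = summarize_result(sample)
print(summary["title"])
```

With a working installation, sample could be replaced by the value returned from handle_url(...).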