forked from sachingupta006/github-crawler
This project contains a simple Python crawler. For now it is kept simple: the crawler visits all the links on a page up to a certain depth. It may be extended later.
To crawl a particular URL, pass it as a command-line argument.
For example, to crawl mycareerstack.com, run the Python script as:
python crawler.py http://mycareerstack.com
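As a rough illustration of reading that argument, here is a minimal sketch. The function name root_url_from_args and the usage message are hypothetical and are not taken from crawler.py, which may handle its arguments differently.

```python
import sys


def root_url_from_args(argv):
    """Return the root URL given on the command line.

    Hypothetical helper: expects argv to look like
    ["crawler.py", "<url>"] and exits with a usage message otherwise.
    """
    if len(argv) != 2:
        raise SystemExit("usage: python crawler.py <url>")
    return argv[1]


# In a script this would typically be called as:
#     root_url = root_url_from_args(sys.argv)
```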
The crawler follows links up to depth 5, meaning that it does a breadth-first search going down 5 levels from the root URL. Because the search is breadth-first, all the links on the root page are collected first, then each of those links is visited, and so on.
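The depth-limited breadth-first traversal described above can be sketched as follows. This is an illustration under stated assumptions, not the code in crawler.py: the names crawl_bfs, get_links and TOY_SITE are hypothetical, link extraction is passed in as a function so the example runs without network access, and it assumes that pages found at the depth limit are recorded but their own links are not followed.

```python
from collections import deque


def crawl_bfs(root_url, get_links, max_depth=5):
    """Breadth-first traversal from root_url, at most max_depth levels down.

    get_links(url) is a caller-supplied function returning the links
    found on that page (hypothetical; a real crawler would fetch and
    parse the page here). Returns a dict mapping each discovered URL
    to the depth at which it was first seen.
    """
    depths = {root_url: 0}      # also serves as the "already seen" set
    queue = deque([root_url])   # FIFO queue gives breadth-first order
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth >= max_depth:
            continue            # assumption: do not expand pages at the limit
        for link in get_links(url):
            if link not in depths:
                depths[link] = depth + 1
                queue.append(link)
    return depths


# Toy link graph standing in for real pages (made-up data).
TOY_SITE = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": ["e"],
}

result = crawl_bfs("a", lambda url: TOY_SITE.get(url, []), max_depth=2)
# result == {"a": 0, "b": 1, "c": 1, "d": 2}
```

Recording each URL's depth when it is first seen keeps the crawler from revisiting pages that link back to one another, such as "c" linking back to "a" in the toy graph.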