I think this post will be useful for anyone who wants to download a web page and extract its URLs using Python code.
Here I am using "sgmllib", a built-in Python 2 module (it was removed in Python 3), to find the URLs.
Use the code below and run it with any URL.
__author__ = "Ashish jain (example@gmail.com)"
__version__ = "$Revision: 1.0 $"
__date__ = "$Date: 2014/10/01 21:57:19 $"
__license__ = "Python"

from sgmllib import SGMLParser


class URLLister(SGMLParser):
    def reset(self):
        # Called by SGMLParser.__init__, so the URL list starts empty.
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # Called for each <a> tag; attrs is a list of (name, value) pairs.
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)


if __name__ == "__main__":
    import urllib
    # Download the page, feed it to the parser, then print every link found.
    usock = urllib.urlopen("http://diveintopython.net/")
    parser = URLLister()
    parser.feed(usock.read())
    parser.close()
    usock.close()
    for url in parser.urls:
        print url
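Since "sgmllib" is not available in Python 3, here is a rough sketch of how the same idea might look there, using the standard "html.parser" and "urllib.request" modules instead. This is not the code from the post above, only one possible port. The helper names list_urls and fetch_and_list are made up for illustration, and the UTF-8 decoding is an assumption about the page being downloaded.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        # Unlike the sgmllib version, empty or missing href values are skipped.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def list_urls(html_text):
    """Return all link targets found in an HTML string (hypothetical helper)."""
    parser = URLLister()
    parser.feed(html_text)
    parser.close()
    return parser.urls


def fetch_and_list(page_url):
    """Download page_url and return its link targets (hypothetical helper)."""
    with urlopen(page_url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html_text = response.read().decode("utf-8", errors="replace")
    return list_urls(html_text)


if __name__ == "__main__":
    # Offline demo on a small inline page, so no network access is needed.
    sample = '<p><a href="/toc/index.html">Contents</a> <a name="top">Top</a></p>'
    for url in list_urls(sample):
        print(url)  # prints /toc/index.html
```

To get the same behaviour as the Python 2 script, call fetch_and_list("http://diveintopython.net/") and print each item of the returned list.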
Thank you guys for your support.