General template for *PUG articles with a section index ::-- ["hoxide"] [DateTime(2006-04-29T09:12:35Z)]
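1. Exporting image URLs from NetEase blog photo albums
Summary: NetEase blog albums are much more complicated than the old NetEase photo albums. Picking apart layer after layer of JavaScript gave me a headache, so I used pywin32 to drive Internet Explorer and let the browser do the work instead.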
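Independently of IE, the idea reduces to a paging loop: read the photo links the browser has rendered on the current page, go to the next page, and stop when a page has no photos or there is no next page. A minimal sketch (Python 3), where `get_links` and `go_next` are hypothetical callables standing in for the script's `imgs_href()` and `next()` methods, with `go_next` assumed to return False on the last page:

```python
def harvest(get_links, go_next):
    """Collect links page by page until a page is empty or there is no next page.

    get_links() -> list of URLs on the current page  (hypothetical stand-in for imgs_href)
    go_next()   -> True if it moved to another page (hypothetical stand-in for next)
    """
    collected = []
    while True:
        links = get_links()
        if not links:        # no photo links on this page: stop
            break
        collected.extend(links)
        if not go_next():    # no "next page" link: this was the last page
            break
    return collected


# Usage with three fake "pages" instead of a real browser.
pages = [["a1", "a2"], ["b1"], ["c1", "c2"]]
state = {"page": 0}


def fake_get_links():
    return pages[state["page"]]


def fake_go_next():
    if state["page"] + 1 < len(pages):
        state["page"] += 1
        return True
    return False


result = harvest(fake_get_links, fake_go_next)
print(result)  # ['a1', 'a2', 'b1', 'c1', 'c2']
```

Keeping the loop separate from the browser makes it easy to exercise with fake pages; the IE-specific code only has to supply the two callables.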
1.1. Code
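# coding:cp936
# Requires Windows, Internet Explorer and pywin32.
import re
import sys
import time

import win32com.client

READYSTATE_COMPLETE = 4


class Album163blog(object):
    def __init__(self, name, nextword=u'下一页'):
        self.name = name          # blog name, i.e. the "name" in name.blog.163.com
        self.nextword = nextword  # text of the "next page" link on the album page
        self.ie = win32com.client.Dispatch('InternetExplorer.Application')

    def index_url(self):
        return 'http://%s.blog.163.com/album/' % self.name

    def index_loaded(self):
        # The album is rendered once the JavaScript has produced photo links ('prev...').
        return any(u'prev' in (x.href or u'') for x in self.ie.Document.links)

    def visible(self):
        self.ie.Visible = True
        return self

    def connect(self):
        self.ie.Navigate2(self.index_url())
        time.sleep(1)
        # Wait until IE has finished loading the document ...
        while self.ie.Busy or self.ie.ReadyState != READYSTATE_COMPLETE:
            time.sleep(1)
        # ... and until the page scripts have generated the photo links.
        while not self.index_loaded():
            time.sleep(1)
        return self

    def next(self):
        # Click the "next page" link; return False if there is none (last page).
        # innerText can be None for image-only links, hence the "or u''".
        links = [x for x in self.ie.Document.links
                 if self.nextword in (x.innerText or u'')]
        if not links:
            return False
        links[0].click()
        time.sleep(2)  # increase this on a slow connection
        return True

    def imgs_href(self):
        def urlconv(url):
            # Rewrite the photo page link into the direct download link.
            return re.sub(r'prevPhoto\.do\?',
                          'prevPhDownload.do?host=%s&' % self.name, url)
        return [urlconv(x.href) for x in self.ie.Document.links
                if u'prevPhoto' in (x.href or u'')]


if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit('usage: %s BLOG_NAME' % sys.argv[0])
    ab = Album163blog(sys.argv[1], u'下一页')
    ab.visible().connect()
    while True:
        links = ab.imgs_href()
        if not links:
            break
        print('\n'.join(links))
        if not ab.next():
            break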
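`imgs_href()` downloads nothing itself; it only rewrites each photo page link into a download link by replacing `prevPhoto.do?` with `prevPhDownload.do?host=<name>&`. The same rewrite as a standalone function, so it can be checked without a browser. The function name `to_download_url` and the sample URL (including its `photoId` parameter) are hypothetical; real links on the site may look different.

```python
import re


def to_download_url(url, name):
    # Same substitution as urlconv() in imgs_href(): point the link at the
    # download handler and pass the blog name as the "host" parameter.
    # Links that do not contain "prevPhoto.do?" are returned unchanged.
    return re.sub(r'prevPhoto\.do\?', 'prevPhDownload.do?host=%s&' % name, url)


# Hypothetical photo page link, for illustration only.
sample = 'http://someone.blog.163.com/prevPhoto.do?photoId=123'
print(to_download_url(sample, 'someone'))
# http://someone.blog.163.com/prevPhDownload.do?host=someone&photoId=123
```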
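1.2. Tip
If the exported address list is incomplete, the cause is a slow network connection: the links are read before the next page has finished loading. Increase the sleep time in the page-turning method next() to fix it.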
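Instead of guessing a longer fixed delay, one option is to poll until the page is ready and give up after a timeout. A minimal sketch (Python 3) of such a helper; the name `wait_until` and its parameters are hypothetical. In the script, the predicate might (as an untested assumption) compare the result of `imgs_href()` with the links recorded before clicking the next-page link, and return True once they differ.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() every `interval` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a fake "page ready" check that succeeds on the third call.
calls = {"n": 0}


def page_ready():
    calls["n"] += 1
    return calls["n"] >= 3


ok = wait_until(page_ready, timeout=5, interval=0.01)
print(ok, calls["n"])  # True 3
```

Polling waits only as long as needed on a fast connection and still tolerates a slow one, up to the timeout.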
