Revision 2 as of 2007-10-14 06:10:09, edited by jigloo

A generic *PUG article template with a section index ::-- ["hoxide"] [DateTime(2006-04-29T09:12:35Z)]


1. Exporting the image URLs of a NetEase (163.com) blog album

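Summary: NetEase blog albums are far more complex than the old NetEase photo albums. Untangling layer upon layer of JavaScript was a real headache, so I used pywin32 to drive Internet Explorer and let it do the work instead.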
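Driving IE from Python with pywin32 mostly means calling Navigate2 and then polling the browser until the page has finished loading. Below is a minimal sketch of that polling step with a timeout added. The helper name wait_until_ready, the timeout values and the FakeIE stand-in are illustrative assumptions. In real use you would pass the object returned by win32com.client.Dispatch('InternetExplorer.Application').

```python
import time

READYSTATE_COMPLETE = 4  # IE's ReadyState value once a page has fully loaded


def wait_until_ready(ie, timeout=30.0, interval=0.5):
    """Poll an IE-like object until it is idle and fully loaded (hypothetical helper).

    `ie` only needs `Busy` and `ReadyState` attributes, as exposed by the
    InternetExplorer.Application COM object. Returns True once loaded, or
    False if `timeout` seconds pass first.
    """
    deadline = time.time() + timeout
    # Keep waiting while IE is busy OR the document is not yet complete
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True


class FakeIE(object):
    """Hypothetical stand-in for the COM object: ready after `steps` polls of Busy."""

    def __init__(self, steps):
        self._steps = steps

    @property
    def Busy(self):
        self._steps -= 1
        return self._steps > 0

    @property
    def ReadyState(self):
        return READYSTATE_COMPLETE if self._steps <= 0 else 3


if __name__ == '__main__':
    print(wait_until_ready(FakeIE(3), timeout=5, interval=0.01))
```

Adding a timeout means a page that never finishes loading makes the script give up rather than hang forever.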

1.1. Code

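# coding:cp936
# Requires Windows, Internet Explorer and pywin32.
import re
import sys
import time

import win32com.client

READYSTATE_COMPLETE = 4  # IE's ReadyState value once a page has fully loaded


class Album163blog:
    def __init__(self, name, nextword=u'下一页'):
        # name: the blog user name, i.e. <name>.blog.163.com
        # nextword: text of the "next page" link (u'下一页' means "next page")
        self.name = name
        self.nextword = nextword
        self.ie = win32com.client.Dispatch('InternetExplorer.Application')

    def index_url(self):
        # Album index page (renamed from __index__, a reserved Python special method)
        return 'http://%s.blog.163.com/album/' % self.name

    def index_loaded(self):
        # The album is built by JavaScript: treat it as loaded once any link contains 'prev'
        return any(x.href.find('prev') >= 0 for x in self.ie.Document.links)

    def visible(self):
        self.ie.Visible = True
        return self

    def connect(self):
        self.ie.Navigate2(self.index_url())
        time.sleep(1)
        # Wait while IE is still busy OR the document is not yet complete
        while self.ie.Busy or self.ie.ReadyState != READYSTATE_COMPLETE:
            time.sleep(1)
        # Then wait for the script-generated photo links to appear
        while not self.index_loaded():
            time.sleep(1)
        return self

    def next(self):
        # Click the "next page" link; return False when there is none (last page)
        links = [x for x in self.ie.Document.links
                 if (x.innerText or u'').find(self.nextword) >= 0]
        if not links:
            return False
        links[0].click()
        time.sleep(2)  # increase this on a slow connection
        return True

    def imgs_href(self):
        def urlconv(url):
            # Rewrite the photo-preview URL into the download URL for this blog
            return re.sub(r'prevPhoto\.do\?', 'prevPhDownload.do?host=%s&' % self.name, url)
        return [urlconv(x.href) for x in self.ie.Document.links if x.href.find(u'prevPhoto') >= 0]


if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit('usage: %s <blog name>' % sys.argv[0])
    ab = Album163blog(sys.argv[1], u'下一页')
    ab.visible().connect()
    while True:
        links = ab.imgs_href()
        if not links:
            break
        print('\n'.join(links))
        if not ab.next():  # no "next page" link: this was the last page
            break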
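The core of imgs_href is a URL rewrite. Each link containing prevPhoto.do? becomes a prevPhDownload.do? link, and host=<blog name> is added to the front of the query string. The sketch below isolates that step as plain functions so it can be checked without IE. The function names to_download_url and download_urls and the example.com sample URL are hypothetical. They illustrate only the shape of the substitution and are not real 163.com addresses.

```python
import re


def to_download_url(url, name):
    """Rewrite a photo-preview link into a download link (hypothetical helper).

    Mirrors urlconv() in imgs_href: 'prevPhoto.do?' becomes
    'prevPhDownload.do?host=<name>&', and the rest of the query string is kept.
    """
    # '.' and '?' are escaped so they match literally. A function is used as
    # the replacement so `name` is inserted verbatim (a plain replacement
    # string would interpret backslashes).
    return re.sub(r'prevPhoto\.do\?',
                  lambda m: 'prevPhDownload.do?host=%s&' % name,
                  url)


def download_urls(hrefs, name):
    """Keep only photo-preview links and convert each one."""
    return [to_download_url(h, name) for h in hrefs if 'prevPhoto' in h]
```

For example, download_urls(['http://example.com/prevPhoto.do?photoId=1', 'http://example.com/about'], 'alice') keeps only the first link and returns it as 'http://example.com/prevPhDownload.do?host=alice&photoId=1'.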

1.2. Note

If the exported URL list is incomplete, the cause is a slow connection. Increase the sleep time in the page-turning function (next).
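Rather than guessing a longer fixed sleep, you could poll after each click until the list of photo links differs from the previous page's, giving up after a timeout. This is a swapped-in technique, not what the original script does. The helper name wait_for_change and its parameters are hypothetical. In the script above, read_links would be ab.imgs_href.

```python
import time


def wait_for_change(read_links, old_links, timeout=20.0, interval=0.5):
    """Poll read_links() until it returns a non-empty list different from old_links.

    Hypothetical helper. Returns the new list, or None if nothing new
    appeared within `timeout` seconds (for example, after the last page).
    """
    deadline = time.time() + timeout
    while True:
        links = read_links()
        if links and links != old_links:
            return links
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

In the main loop you would call ab.next() and then links = wait_for_change(ab.imgs_href, links), stopping when it returns None. A slow page then costs only as long as it actually takes to load, up to the timeout.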

1.3. Feedback

MicroProj/2007-10-14 (last edited 2009-12-25 07:14:13 by localhost)