1. Exporting image URLs from a NetEase blog album

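Summary: NetEase's blog albums are much more complex than the old NetEase albums. Untangling layer upon layer of JavaScript was giving me a headache, so I used pywin32 to drive IE and let it do the work. I have since revised the script, and it is now a bit faster.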
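
The script waits for pages by checking a condition once per second instead of sleeping for a fixed time. Below is a minimal sketch of that polling pattern with a timeout added. The names `wait_until`, `timeout` and `interval` are hypothetical and not part of the original script, which loops without any time limit.

```python
import time


def wait_until(condition, timeout=30.0, interval=1.0):
    """Poll condition() until it returns a true value or timeout seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.time() + timeout
    while True:
        if condition():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

Under these assumptions, a call such as `wait_until(self.__pageloaded__)` could stand in for a bare `while not self.__pageloaded__(): time.sleep(1)` loop, so that a page that never finishes loading cannot hang the run forever.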

1.1. Code

# coding:cp936
import win32com.client
import time
import re
import sys
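class Album163blog:
    """Drives IE through pywin32 to list photo links in a NetEase blog album.

    nextword is the text of the "next page" link (u'下一页');
    percount is the number of photo links expected on a full page.
    """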
    def __init__(self, name, nextword=u'下一页', percount=8):
        self.name = name
        self.nextword = nextword
        self.percount = percount
        self.curpage = 1
        self.pages = 100  # placeholder; connect() replaces it with the real page count
        self.ie = win32com.client.Dispatch('InternetExplorer.Application')
        
    def __home__(self):
        return 'http://%s.blog.163.com/album/' % self.name
    
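    def __pageloaded__(self):
        # A page counts as loaded once it shows percount photo links;
        # the last page may hold fewer, so it is never waited on.
        return self.curpage >= self.pages or len(self.imgs_href()) >= self.percount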
    
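    def __pages__(self):
        # Pager links contain '.blog.163.com/album/#p<N>'; the second-to-last
        # one is used, and its <N> is taken as the total page count.
        url = [x.href for x in self.ie.Document.links if x.href.find('.blog.163.com/album/#p') >= 0][-2:][0]
        return int(url[url.find('#p')+2:])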
    
    def visible(self, v=True):
        self.ie.Visible = v
        return self
    
    def connect(self):
        self.ie.Navigate2(self.__home__())
        time.sleep(1)
        while self.ie.Busy or self.ie.ReadyState != 4:  # 4 == READYSTATE_COMPLETE
            time.sleep(1)
        while not self.__pageloaded__():
            time.sleep(1)
        self.pages = self.__pages__()
        return self
    
    def next(self):
        self.curpage += 1
        if self.curpage > self.pages:
            return False
        link = [x for x in self.ie.Document.links if x.innerText.find(self.nextword) >= 0][0]
        link.click()
        while not self.__pageloaded__():
            time.sleep(1)
        return True
    
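    def imgs_href(self):
        def urlconv(url):
            # Turn a preview link into a download link for this user's album.
            return re.sub(r'prevPhoto\.do\?', 'prevPhDownload.do?host=%s&' % self.name, url)
        # set() drops duplicate links to the same photo.
        return [urlconv(u) for u in set([x.href for x in self.ie.Document.links if x.href.find(u'prevPhoto') >= 0])]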

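if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit('usage: %s <blog-name>' % sys.argv[0])
    name = sys.argv[1]  # the <name> part of http://<name>.blog.163.com/album/
    #url = 'http://dwl2981332.blog.163.com/album/#p1'
    ab = Album163blog(name)
    ab.visible().connect()

    # Print the links on each page, then move on until next() reports the end.
    while True:
        links = ab.imgs_href()
        print('\n'.join(links))
        if not ab.next():
            break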
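
The two pieces of string handling in the class can be tried without IE. The sketch below restates them as plain functions. The function names and the sample URLs used with them are made up for illustration, and unlike the `set()` call in the script, `download_urls` keeps the links in their original order.

```python
import re


def to_download_url(url, name):
    """Rewrite a prevPhoto.do preview link into a prevPhDownload.do link."""
    return re.sub(r'prevPhoto\.do\?', 'prevPhDownload.do?host=%s&' % name, url)


def page_number(url):
    """Return the integer after '#p' in a pager link such as '.../album/#p3'."""
    return int(url[url.find('#p') + 2:])


def download_urls(hrefs, name):
    """Keep only preview links, drop duplicates in order, and rewrite them."""
    seen = set()
    result = []
    for href in hrefs:
        if 'prevPhoto' in href and href not in seen:
            seen.add(href)
            result.append(to_download_url(href, name))
    return result
```

Feeding `download_urls` a list of `href` strings collected from `ie.Document.links` should give the same set of links as `imgs_href()`, assuming the album pages still use the `prevPhoto.do?` link form.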

Revision 1 as of 2007-10-14 06:00:56 (size: 2237, editor: jigloo)
Revision 4 as of 2007-10-20 03:43:22 (size: 2630, editor: jigloo)

Diff for "MicroProj/2007-10-14"
