Cross-Comparing Virus Popularity

Goal

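We need a general-purpose script that can:

  1. Automatically process the text logs of antivirus products that output detection records, such as Jiangmin (KV) and Rising (RX)
  2. Automatically compare them against the hot viruses published by the chosen outbreak-report sites, to confirm which virus samples on our own machine are hot samples
  3. Accumulate the comparisons across every antivirus log and every outbreak ranking, to work out which of our own virus samples are generally recognized as hot samples

Assumptions:

  1. All antivirus logs have been converted to plain text, encoded as UTF-8
  2. The popularity value for an outbreak ranking is simply assigned in reverse order of rank:
      • For example, for a list of the top 300 viruses:
      • The virus ranked first has a popularity value of 300
      • The virus ranked second has a popularity value of 299
      • And so on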
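The reverse-rank scoring assumed above can be sketched as follows. This is a minimal Python 3 illustration, not a piece of virusampls-ratio.py; the function name rank_scores and the sample virus names are made up for the example.

```python
def rank_scores(ranked_names):
    """Map each name in a ranking (most popular first) to its popularity value.

    With N entries, rank 1 gets N points, rank 2 gets N - 1, and so on.
    """
    total = len(ranked_names)
    return {name: total - index for index, name in enumerate(ranked_names)}


# Hypothetical three-entry ranking, most popular first.
scores = rank_scores(["Worm.Example.a", "Trojan.Example.b", "Backdoor.Example.c"])
print(scores["Worm.Example.a"])      # 3
print(scores["Backdoor.Example.c"])  # 1
```

With a 300-entry list the same function gives 300 to the first name and 299 to the second, matching the example in the assumptions.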

Preparation

  1. Collect the virus rankings:
  2. Local virus sample logs:

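Processing

Process the data step by step with the virusampls-ratio.py script.

The features of virusampls-ratio.py are as described in its built-in help:

Install a Python environment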

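Cross-comparison

For this case, the commands that were run and the key screenshots are listed in order:

Process the logs
  • Note! For generality, save every log text file as UTF-8; otherwise the comparison cannot work ;-)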

  • >python virusampls-ratio.py --lkv KV-RESULT.txt

    • Output similar to: spx-100412-vsr-0-kv.png

  • >python virusampls-ratio.py --lrx RX_result.csv

    • Output similar to: spx-100412-vsr-0-rx.png
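For the UTF-8 note at the top of this step, one possible way to re-save a log is sketched below in Python 3. The source encoding is an assumption: GBK is used here because it is common for text files written on Chinese-language Windows, but the actual encoding of a given antivirus log has to be checked. The function name to_utf8 is hypothetical.

```python
def to_utf8(src_path, dst_path, src_encoding="gbk"):
    """Re-save a text file as UTF-8.

    src_encoding is an assumption (GBK by default); pass the real
    encoding of the log if it differs.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

A call such as to_utf8("KV-RESULT-original.txt", "KV-RESULT.txt") (the first file name is made up) would produce the UTF-8 file that --lkv expects. If the assumed encoding is wrong, Python raises UnicodeDecodeError or produces unreadable text, which is a useful early warning.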

Compare against each outbreak ranking in turn:
  • > python virusampls-ratio.py -r KV-RESULT.txt.dump 毒霸疫情中的前500个病毒名.TXT
    > python virusampls-ratio.py -r KV-RESULT.txt.dump 江民疫情中的前300个病毒名.TXT
    > python virusampls-ratio.py -r KV-RESULT.txt.dump 瑞星疫情中的前500个病毒名.TXT
    • Output similar to: spx-100412-vsr-1-yq.png

  • > python virusampls-ratio.py -r RX_result.csv.dump 毒霸疫情中的前500个病毒名.TXT
    > python virusampls-ratio.py -r RX_result.csv.dump 江民疫情中的前300个病毒名.TXT
    > python virusampls-ratio.py -r RX_result.csv.dump 瑞星疫情中的前500个病毒名.TXT
    • Output similar to: spx-100412-vsr-2-yq.png
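The name matching behind -r can be sketched in isolation. As in the script, names are upper-cased and stripped of "." and "/" before a substring test, so vendor spellings such as Trojan/Example.a and Trojan.Example.A compare equal. The sketch below is Python 3 and scores by reverse rank as stated in the assumptions; the helper names and the sample data are hypothetical.

```python
def normalize(name):
    """Upper-case a virus name and drop '.' and '/' so vendor spellings line up."""
    return name.upper().replace(".", "").replace("/", "").strip()


def score_against_ranking(detected_names, ranking_lines):
    """Give each detected name the reverse-rank score of every ranking line
    whose normalized text contains the normalized name."""
    total = len(ranking_lines)
    scores = {name: 0 for name in detected_names}
    for index, line in enumerate(ranking_lines):
        hot = normalize(line)
        for name in detected_names:
            if normalize(name) in hot:
                scores[name] += total - index
    return scores


# Hypothetical data: two detections and a two-line ranking file.
detected = ["Trojan/Example.a", "Worm.Sample.b"]
ranking = ["1 Trojan.Example.A\n", "2 Backdoor.Other.c\n"]
print(score_against_ranking(detected, ranking))
# {'Trojan/Example.a': 2, 'Worm.Sample.b': 0}
```

Substring matching is deliberately loose: a short detected name can match several ranking lines and collect points from each, which is worth keeping in mind when reading the scores.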

Export the final result:
  • >python virusampls-ratio.py -e virus-hotest-in-my-pc.tx tmp

    • Output similar to: spx-100412-vsr-3-export.png

  • List of the 186 high-popularity samples obtained from this combined comparison:

Code

About 145 lines, comments and blank lines included, do all of the processing!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import pickle
import sys
from optparse import OptionParser
from operator import itemgetter

VERSION = "virusampls-ratio v10.4.12"
# Cumulative scores, summed over every outbreak ranking compared so far
UNIT_RATIO_DICT_FILE = "virusampls-ratio.dict.dump"


def dumpLog(fname, dumpd):
    "Pickle the parsed log to <fname>.dump and report how many viruses it holds"
    dumpf = "%s.dump" % fname
    with open(dumpf, "wb") as f:
        pickle.dump(dumpd, f)
    print("\n\n%s\n\tdumped %d distinct viruses from the scan log as a Python dict obj."
          % (VERSION, len(dumpd)))


def processLogKV(fname):
    "Parse a KV scan log into a dict {virus name: [score, quarantine path]}"
    print(fname)
    dumpd = {}
    with open(fname, encoding="utf-8") as f:
        for line in f:
            # Detection lines contain the marker " 中发现 " ("found in")
            if " 中发现 " in line:
                tags = line.split()
                viru, path = tags[3], tags[1]
                print(viru, path)
                # Keep only the first record of each virus name
                if viru not in dumpd:
                    dumpd[viru] = [0, path]
    dumpLog(fname, dumpd)


def processLogRX(fname):
    "Parse an RX scan log into a dict {virus name: [score, quarantine path]}"
    print(fname)
    dumpd = {}
    with open(fname, encoding="utf-8") as f:
        for line in f:
            # Only manual-scan records ("手动查杀") are used
            if "手动查杀" in line:
                tags = line.split(";")
                # [1:-1] drops the first and last character wrapping each field
                viru, path = tags[2][1:-1], tags[6][1:-1]
                print(viru, path)
                if viru not in dumpd:
                    dumpd[viru] = [0, path]
    dumpLog(fname, dumpd)


def ratioYQ(args):
    """Compare a processed scan log with one vendor's outbreak ranking,
    accumulate scores by rank, and merge them into the combined popularity:
        in a top-500 ranking, first place is worth 500 points.
    MiscItems/2008-07-01 - Woodpecker Wiki for CPUG
    http://wiki.woodpecker.org.cn/moin/MiscItems/2008-07-01
    Python中最快的字典排序方法 | Windstorm
    http://www.kunli.info/2009/05/07/sorting-dictionaries-by-value-in-python/
    """
    dumpf, yqfile = args
    tophotf = "%s-hot4-%s.txt" % (dumpf.split(".")[0], yqfile.split(".")[0])
    with open(dumpf, "rb") as f:
        ratio = pickle.load(f)
    with open(yqfile, encoding="utf-8") as f:
        flines = f.readlines()
    total = len(flines)
    for i, line in enumerate(flines):
        # Normalize names so that different vendor spellings can match
        hvname = line.upper().replace(".", "").replace("/", "")
        for v in ratio:
            vname = v.upper().replace(".", "").replace("/", "")
            if vname in hvname:
                # Reverse-rank score: line 1 earns `total` points, the last line 1
                ratio[v][0] += total - i
    try:
        with open(UNIT_RATIO_DICT_FILE, "rb") as f:
            unitRD = pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        unitRD = {}     # first run: no cumulative scores yet
    td = sorted(ratio.items(), key=itemgetter(1), reverse=True)
    hotvli = ""
    found = 0           # number of viruses that appear in this ranking
    for name, (score, path) in td:
        if score != 0:
            print(name, score, path)
            hotvli += "%s\t\t%s\n" % (name, path)
            found += 1
            if name in unitRD:
                unitRD[name][0] += score
            else:
                unitRD[name] = [score, path]

    with open(tophotf, "w", encoding="utf-8") as f:
        f.write(hotvli)
    print("\n\n%s\n\tfound top %d hottest viruses based on %s...\n\t\t exported as: %s"
          % (VERSION, found, yqfile, tophotf))
    with open(UNIT_RATIO_DICT_FILE, "wb") as f:
        pickle.dump(unitRD, f)


def finalExport(args):
    """Export the final cross-compared list to the given directory/file
    (copying the quarantined files there as well is left commented out)"""
    expdir = args[1]
    expfn = os.path.join(expdir, args[0])
    with open(UNIT_RATIO_DICT_FILE, "rb") as f:
        unitRD = pickle.load(f)
    td = sorted(unitRD.items(), key=itemgetter(1), reverse=True)
    hotvli = ""
    for name, (score, path) in td:
        print(name, score, path)
        hotvli += "%s\t\t%s\n" % (name, path)
        # shutil.copy(path, expdir)
    with open(expfn, "w", encoding="utf-8") as f:
        f.write(hotvli)
    print("\n\n%s\n\tfound top %d hottest viruses in my pc \n\t\t exported list file: %s"
          % (VERSION, len(td), expfn))


if __name__ == '__main__':
    usage = "usage: %prog [option0] arg0 [arg1]"
    parser = OptionParser(usage, version=VERSION)
    parser.add_option("-r", "--ratio", dest="ratio", nargs=2,
                      type="string", metavar="*.dump some-hot-virus-list.txt",
                      help="YiQing comparison; usage: a .dump file processed by "
                           "--lrx|--lkv, e.g. KV-RESULT.txt.dump some-hot-virus-list.txt")
    parser.add_option("--lrx", dest="lrx",
                      type="string",
                      help="process RX checking log")
    parser.add_option("--lkv", dest="lkv",
                      type="string",
                      help="process KV checking log")
    parser.add_option("-e", "--export", dest="export", nargs=2,
                      type="string", metavar="*.txt path/2/export ",
                      help="export total hottest virus list into dir, "
                           "e.g. -e virus-hotest-in-my-pc.tx tmp")
    (options, args) = parser.parse_args()
    if 1 == len(sys.argv):
        parser.print_help()
    # Reject conflicting options before any processing starts;
    # parser.error() prints the usage line and the message, then exits
    if options.ratio and (options.lrx or options.lkv):
        parser.error("-r and --lrx|--lkv can not be used at the same time!\n"
                     "\ne.g.\n \tpython %s --lrx something.txt "
                     "\nor\n\tpython %s -r something.dump top500-from-RX.TXT"
                     % (parser.get_prog_name(), parser.get_prog_name()))
    if options.ratio:
        print("comparison checked list and YiQing list\n\n")
        ratioYQ(options.ratio)
    else:
        if options.lrx:
            print("\n\nreFormat RX checking list\n\n")
            processLogRX(options.lrx)
        if options.lkv:
            print("\n\nreFormat KV checking list\n\n")
            processLogKV(options.lkv)
        if options.export:
            print("\n\nexport total hottest virus list into\n\n")
            finalExport(options.export)
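
The accumulation that ties the -r runs to the final -e export can also be looked at on its own. The script keeps one combined dictionary of the form {virus name: [score, quarantine path]}, adds each ranking's non-zero scores to it, and sorts by value for the export. A minimal Python 3 sketch under those assumptions follows; merge_scores, the names and the paths are hypothetical sample data.

```python
from operator import itemgetter


def merge_scores(combined, per_ranking):
    """Add one ranking's non-zero scores into the running totals, in place."""
    for name, (score, path) in per_ranking.items():
        if score == 0:
            continue  # the sample did not appear in this ranking
        if name in combined:
            combined[name][0] += score
        else:
            combined[name] = [score, path]
    return combined


combined = {}
merge_scores(combined, {"Trojan/Example.a": [500, "C:/q/1"],
                        "Worm.Sample.b": [0, "C:/q/2"]})
merge_scores(combined, {"Trojan/Example.a": [298, "C:/q/1"],
                        "Worm.Sample.b": [12, "C:/q/2"]})

# Highest combined score first, as in the exported list.
ranked = sorted(combined.items(), key=itemgetter(1), reverse=True)
for name, (score, path) in ranked:
    print(name, score, path)
# Trojan/Example.a 798 C:/q/1
# Worm.Sample.b 12 C:/q/2
```

Because the totals persist between runs, comparing the same log against the same ranking twice counts that ranking twice; deleting virusampls-ratio.dict.dump before a fresh series of comparisons avoids this.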


Feedback

Created by ZoomQuiet [2010-04-12 14:47:17]

ZoomQuiet/2010-04-12 (last edited 2010-04-16 09:46:30 by ZoomQuiet)