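I have a process that loops through a list of IP addresses and returns some information about each one. A simple for loop works fine; my problem is running it at scale, because of Python's Global Interpreter Lock (GIL).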

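My goal is to run this function in parallel and make full use of my 4 cores, so that when I run 100K addresses through it, it doesn't take 24 hours as it would with a normal for loop.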
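A whois lookup spends almost all of its time waiting on a remote server, so this workload is I/O-bound rather than CPU-bound. CPython releases the GIL while a thread is blocked on socket I/O (or in `time.sleep`), so a thread pool can keep many lookups in flight at once regardless of the core count. The minimal sketch below rests on that assumption; `fake_lookup` is a hypothetical stand-in that only sleeps instead of calling `IPWhois(ip).lookup_whois()`, and `max_workers=16` is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(ip):
    """Hypothetical stand-in for IPWhois(ip).lookup_whois(): it only sleeps
    to mimic waiting on the network, then returns a small dict."""
    time.sleep(0.05)
    return {"ip": ip, "asn": None}


def lookup_all(ips, max_workers=16):
    # Threads are enough here: a blocked network call (like time.sleep)
    # releases the GIL, so many lookups can wait concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `ips`
        return list(pool.map(fake_lookup, ips))


if __name__ == "__main__":
    ips = ["10.0.0.%d" % n for n in range(20)]
    start = time.perf_counter()
    results = lookup_all(ips)
    print(len(results), "lookups in %.2fs" % (time.perf_counter() - start))
```

With joblib, the same idea can be expressed by passing `prefer="threads"` to `Parallel`. Whois servers may rate-limit heavy parallel querying, so the worker count probably needs tuning against the real service.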

After reading other people's answers, especially this one, How do I parallelize a simple Python loop?, I decided to use joblib. When I ran 10 records through it (the sample list below), it took 10 minutes to run. That doesn't seem right. I know there is something I'm doing wrong or not understanding. Any help is much appreciated!

```python
import pandas as pd
import numpy as np
import os
from ipwhois import IPWhois
from joblib import Parallel, delayed
import multiprocessing

num_core = multiprocessing.cpu_count()

iplookup = ['174.192.22.197',
            '70.197.71.201',
            '174.195.146.248',
            '70.197.15.130',
            '174.208.14.133',
            '174.238.132.139',
            '174.204.16.10',
            '104.132.11.82',
            '24.1.202.86',
            '216.4.58.18']
```

The normal for loop works fine!


The function passed to joblib to run on all cores:

```python
def run_ip_process(iplookuparray):
    # Collect one dict per IP so every column always has the same length
    # (appending to eleven parallel lists can leave them misaligned when a
    # lookup fails partway through, and pd.DataFrame then raises ValueError).
    rows = []
    totstolookup = len(iplookuparray)

    for b, i in enumerate(iplookuparray, start=1):
        i = str(i)
        print("Running #{} out of {}".format(b, totstolookup))

        try:
            result = IPWhois(i, timeout=15).lookup_whois()
            row = {'ipaddress': i,
                   'asn': result['asn'],
                   'asnid': result['asn_cidr'],
                   'asncountry': result['asn_country_code'],
                   'asndesc': result['asn_description']}
        except Exception:
            # Lookup failed: skip this IP entirely (same effect as the old `pass`)
            continue

        try:
            net = result['nets'][0]
            row.update({'emailcontact': net['emails'],
                        'address': net['address'],
                        'city': net['city'],
                        'state': net['state'],
                        'zip': net['postal_code'],
                        'ipdescrip': net['description']})
        except (KeyError, IndexError, TypeError):
            # No usable 'nets' entry: fill the contact columns with 0
            row.update(dict.fromkeys(['emailcontact', 'address', 'city',
                                      'state', 'zip', 'ipdescrip'], 0))
        rows.append(row)

    columns = ['ipaddress', 'asn', 'asnid', 'asncountry', 'asndesc',
               'emailcontact', 'address', 'city', 'state', 'zip', 'ipdescrip']
    return pd.DataFrame(rows, columns=columns)
```
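Since `run_ip_process` already accepts a list, one option is to give each worker a single slice of the input instead of one task per address, which also reduces per-task overhead. The helper `split_into_chunks` below is hypothetical: a minimal sketch that splits a list into at most `n_chunks` contiguous slices whose sizes differ by at most one.

```python
def split_into_chunks(items, n_chunks):
    """Split `items` into at most n_chunks non-empty contiguous slices
    whose sizes differ by at most one (hypothetical helper)."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        # the first `extra` chunks get one additional element
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


print(split_into_chunks(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Under that assumption, the joblib call would look like `frames = Parallel(n_jobs=num_core)(delayed(run_ip_process)(chunk) for chunk in split_into_chunks(iplookup, num_core))`, with the partial results combined by `pd.concat(frames, ignore_index=True)`.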

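Running the process on all cores with joblib:

```python
# One task is created per address in iplookup, but every task receives
# the whole list (the loop variable i is never used).
results = Parallel(n_jobs=num_core)(delayed(run_ip_process)(iplookup) for i in iplookup)
```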
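In the call above, the generator expression creates one task per element of the list, but each task is handed the entire array rather than the single address `i`, so N addresses trigger N × N lookups: 10 IPs become 100 whois queries instead of 10. The sketch below counts lookups for both dispatch patterns without touching the network; `count_lookups` and its inner `run_ip_process` stand-in are hypothetical.

```python
def count_lookups(ips, per_item):
    """Count the lookups a dispatch pattern would trigger (no network calls)."""
    lookups = 0

    def run_ip_process(batch):
        # Hypothetical stand-in: one "lookup" per address in the batch
        nonlocal lookups
        lookups += len(batch)

    if per_item:
        tasks = [[ip] for ip in ips]   # one single-address task per IP
    else:
        tasks = [ips for _ in ips]     # the pattern above: full list per task

    for batch in tasks:                # run sequentially; only the count matters
        run_ip_process(batch)
    return lookups


ten_ips = ["10.0.0.%d" % n for n in range(10)]
print(count_lookups(ten_ips, per_item=False))  # 100
print(count_lookups(ten_ips, per_item=True))   # 10
```

Passing a one-element list per task, e.g. `delayed(run_ip_process)([i]) for i in iplookup`, brings the count back to one lookup per address.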
