Performance - Python: speed up the code for reconstructing lists
What I have been trying to do is reconstruct lists such as:
```
[42351, 4253, 1264, 5311, 3651]  # the first number in each list is the id
[42352, 4254, 1244, 1246, 5311, 1264, 3651]
[42353, 1254, 1264]
```
into this format:
```
# id \t 1 \t the_second_number_in_a_list \t id \t 2 \t the_third_number_in_a_list \t id \t 3 \t the_fourth_number_in_a_list ...
42352 1 4254 42352 2 1244 42352 3 1246 42352 4 5311 42352 5 1264 42352 6 3651
42353 1 1254 42353 2 1264
42351 1 4253 42351 2 1264 42351 3 5311 42351 4 3651
```
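To make the target layout concrete: each input list becomes one output row of (id, position, value) triples, where the position counts from 1 over the items that follow the id. A minimal sketch of that mapping for a single list, using a hypothetical helper name `reconstruct`:

```python
def reconstruct(seq):
    """Flatten one input list into [id, 1, x1, id, 2, x2, ...] (hypothetical helper)."""
    seq_id = seq[0]  # the first number in each list is the id
    row = []
    # enumerate(..., start=1) yields the 1-based position of each remaining item
    for position, value in enumerate(seq[1:], start=1):
        row.extend((seq_id, position, value))
    return row


print(reconstruct([42353, 1254, 1264]))  # [42353, 1, 1254, 42353, 2, 1264]
```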
My idea is to create an intermediate dictionary in the desired format:
```python
list_dic = {42352: [42352, 1, 4254, 42352, 2, 1244, 42352, 3, 1246, 42352, 4, 5311, 42352, 5, 1264, 42352, 6, 3651],
            42353: [42353, 1, 1254, 42353, 2, 1264],
            42351: [42351, 1, 4253, 42351, 2, 1264, 42351, 3, 5311, 42351, 4, 3651]}
```
and then save it to a txt file, separated by tabs.
However, I realized that in reality I may have hundreds of thousands of lists, so this approach is way too slow and computationally expensive. I'm looking for advice on how to speed up the code and reduce the memory needed for the whole procedure. Thanks.
Attached is my code:
```python
seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]

# first, group the information into a single list
seq_list = [seq1, seq2, seq3]

# second, construct a dictionary to store the information
list_dic = {}
for each_seq in seq_list:
    j = 1
    list_dic[each_seq[0]] = []
    for each_item in each_seq[1:]:
        list_dic[each_seq[0]].append(each_seq[0])
        list_dic[each_seq[0]].append(j)
        list_dic[each_seq[0]].append(each_item)
        j += 1

# third, save the information to a txt file
text_file = open("output.txt", "w")
for each_id in list_dic:
    line = '\t'.join(str(each_num) for each_num in list_dic[each_id])
    text_file.write(line + '\n')
text_file.close()
```
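```python
from itertools import chain, count, cycle

# alllists is the list of input lists, e.g. [seq1, seq2, seq3]
with open("out.txt", "w") as f:
    for eachlist in alllists:
        # pair each remaining item with the id (eachlist[0]) and a running index 1, 2, 3, ...
        merged = zip(cycle([eachlist[0],]), count(1), eachlist[1:])
        f.write("\t".join(map(str, chain.from_iterable(merged))))
        f.write("\n")
```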
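An equivalent way to write the same rows is to let the standard-library `csv` module do the tab joining and string conversion. This is an alternative sketch, not the answer's code; `rows` and `write_rows` are illustrative names, and `lineterminator="\n"` is set explicitly because `csv.writer` defaults to `"\r\n"`.

```python
import csv
import io


def rows(alllists):
    """Yield one flattened row [id, 1, x1, id, 2, x2, ...] per input list."""
    for lst in alllists:
        seq_id = lst[0]
        yield [x for i, value in enumerate(lst[1:], start=1) for x in (seq_id, i, value)]


def write_rows(alllists, f):
    """Write one tab-separated line per input list to the text file object f."""
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    # writerows pulls rows from the generator one at a time; csv converts ints with str()
    writer.writerows(rows(alllists))


buf = io.StringIO()
write_rows([[42353, 1254, 1264]], buf)
print(repr(buf.getvalue()))  # '42353\t1\t1254\t42353\t2\t1264\n'
```

When writing to a real file instead of a `StringIO`, the `csv` documentation recommends opening it with `newline=""`.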
As far as I can tell, there isn't any reason to create an intermediate dictionary: each list can be turned into its output line and written straight to the file.
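Without the dictionary, memory can also stay flat if the lists are read lazily instead of all being held in `alllists` at once. The sketch below assumes, purely for illustration, that the input file stores one list per line as whitespace-separated integers (the question does not say how the lists are stored); `convert_lines` and `convert_file` are hypothetical names.

```python
def convert_lines(lines):
    """Yield one tab-separated output line per input line, holding only one list at a time.

    Assumes (hypothetically) each input line looks like "42353 1254 1264".
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seq_id = fields[0]
        parts = []
        for position, value in enumerate(fields[1:], start=1):
            parts.extend((seq_id, str(position), value))
        yield "\t".join(parts) + "\n"


def convert_file(in_path, out_path):
    # both files are processed line by line, so memory does not grow with the number of lists
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(convert_lines(src))
```

Keeping the numbers as strings skips an int/str round trip, since the values are only copied through to the output.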
(That said, your existing solution seems pretty viable, although it is likely a little slower.)
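Whether the dictionary version is actually slower is easy to check on your own data. A rough sketch with `timeit` that first confirms both approaches produce the same lines, then times them; the synthetic data and its sizes are made up for illustration, and no timings are claimed here.

```python
import random
import timeit
from itertools import chain, count, cycle


def via_dict(seq_list):
    """Mirrors the question's approach: build the intermediate dictionary, then join."""
    list_dic = {}
    for each_seq in seq_list:
        row = []
        list_dic[each_seq[0]] = row
        for j, each_item in enumerate(each_seq[1:], start=1):
            row.extend((each_seq[0], j, each_item))
    return ["\t".join(map(str, row)) for row in list_dic.values()]


def via_stream(seq_list):
    """Mirrors the answer's approach: zip/cycle/count per list, no dictionary."""
    return ["\t".join(map(str, chain.from_iterable(zip(cycle([s[0]]), count(1), s[1:]))))
            for s in seq_list]


# synthetic data (made-up sizes): 10,000 lists with unique ids and 1-20 items each
random.seed(0)
data = [[100000 + k] + [random.randint(1, 9999) for _ in range(random.randint(1, 20))]
        for k in range(10000)]

# same lines in the same order (dicts keep insertion order in Python 3.7+)
assert via_dict(data) == via_stream(data)
for fn in (via_dict, via_stream):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=3))
```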
For @sirparselot, here is a run on the example lists:

```python
>>> from itertools import chain, count, cycle
>>> seq1 = [42351, 4253, 1264, 5311, 3651]
>>> seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
>>> seq3 = [42353, 1254, 1264]
>>> alllists = [seq1, seq2, seq3]
>>> for eachlist in alllists:
...     merged = zip(cycle([eachlist[0],]), count(1), eachlist[1:])
...     print("\t".join(map(str, chain.from_iterable(merged))))
...
42351 1 4253 42351 2 1264 42351 3 5311 42351 4 3651
42352 1 4254 42352 2 1244 42352 3 1246 42352 4 5311 42352 5 1264 42352 6 3651
42353 1 1254 42353 2 1264
```