performance - Python: speed up the code for reconstructing lists

June 15, 2015

What I have been trying to do is reconstruct lists such as:

    [42351, 4253, 1264, 5311, 3651]  # the first number in each list is the id
    [42352, 4254, 1244, 1246, 5311, 1264, 3651]
    [42353, 1254, 1264]

into this format:

    # id \t 1 \t the_second_number_in_a_list \t id \t 2 \t the_third_number_in_a_list \t id \t 3 \t the_fourth_number_in_a_list ...
    42352   1   4254    42352   2   1244    42352   3   1246    42352   4   5311    42352   5   1264    42352   6   3651
    42353   1   1254    42353   2   1264
    42351   1   4253    42351   2   1264    42351   3   5311    42351   4   3651

My idea was to create an intermediate dictionary in the desired format:

    list_dic = {42352: [42352, 1, 4254, 42352, 2, 1244, 42352, 3, 1246, 42352, 4, 5311, 42352, 5, 1264, 42352, 6, 3651],
                42353: [42353, 1, 1254, 42353, 2, 1264],
                42351: [42351, 1, 4253, 42351, 2, 1264, 42351, 3, 5311, 42351, 4, 3651]}

and then save it to a tab-separated txt file.

However, I realized that in reality I may have hundreds of thousands of lists, so this approach would be far too slow and computationally expensive. I'm looking for advice on how to speed up the code and reduce the memory needed for the whole procedure. Thanks.
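Since memory is as much of a concern as speed, one option is to never hold the whole output at once: format and write each list as soon as it is read. A minimal sketch, assuming the input lists arrive from any iterable (for instance a generator reading them from disk); `write_rows` is a hypothetical helper name, and the standard-library `csv` module with `delimiter="\t"` is swapped in for the manual `'\t'.join`.

```python
import csv
from typing import Iterable, Sequence, TextIO


def write_rows(lists: Iterable[Sequence[int]], out: TextIO) -> None:
    """Write one tab-separated line per input list (hypothetical helper).

    Each row is written as soon as it is built, so only the current list is
    held in memory; `lists` can be a lazy generator over any number of inputs.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for seq in lists:
        ident = seq[0]  # the first number in each list is the id
        # Flatten the (id, position, value) triples into a single row.
        row = [v for pos, item in enumerate(seq[1:], 1) for v in (ident, pos, item)]
        writer.writerow(row)
```

Usage would look like `with open("output.txt", "w", newline="") as f: write_rows(all_lists, f)`, where `all_lists` is a placeholder for your own list or generator; `newline=""` is what the `csv` docs recommend for files passed to a writer.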

Here is my code:

    seq1 = [42351, 4253, 1264, 5311, 3651]
    seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
    seq3 = [42353, 1254, 1264]

    # first, group the information into a single list
    seq_list = [seq1, seq2, seq3]

    # second, construct a dictionary to store the information
    list_dic = {}
    for each_seq in seq_list:
        j = 1
        list_dic[each_seq[0]] = []
        for each_item in each_seq[1:]:
            list_dic[each_seq[0]].append(each_seq[0])
            list_dic[each_seq[0]].append(j)
            list_dic[each_seq[0]].append(each_item)
            j += 1

    # third, save the information to a txt file
    text_file = open("output.txt", "w")
    for each_id in list_dic:
        line = '\t'.join(str(each_num) for each_num in list_dic[each_id])
        text_file.write(line + '\n')
    text_file.close()
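If the intermediate dictionary is still wanted, the nested loop in step two can be condensed into one dictionary comprehension that uses `enumerate(..., 1)` in place of the manual counter `j`. This is a sketch that re-declares the three sample lists so it runs on its own; it builds the same `list_dic` shown earlier, but it does not address the memory concern, since the whole dictionary is still held at once.

```python
seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]
seq_list = [seq1, seq2, seq3]

# Key each entry on the list's id (its first element) and flatten the
# (id, position, value) triples; enumerate(..., 1) replaces the counter j.
list_dic = {
    seq[0]: [v for pos, item in enumerate(seq[1:], 1) for v in (seq[0], pos, item)]
    for seq in seq_list
}
```

Note that with either version a repeated id silently overwrites the earlier list, which may or may not be what you want.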

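    from itertools import chain, count, cycle

    # alllists is the list of input lists, e.g. [seq1, seq2, seq3]
    with open("out.txt", "w") as f:  # text mode: "wb" would reject str in Python 3
        for eachlist in alllists:
            merged = zip(cycle([eachlist[0],]), count(1), eachlist[1:])
            f.write("\t".join(map(str, chain.from_iterable(merged))))
            f.write("\n")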
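`cycle([x])` works, but `itertools.repeat(x)` is the more direct way to get an endless stream of one value; because `zip` stops at its shortest input, the infinite stream is cut off after the last item either way. Below is a sketch of the same approach with only that substitution, wrapped in a hypothetical `format_line` helper so the formatting can be tested apart from the file I/O.

```python
from itertools import chain, count, repeat
from typing import Sequence


def format_line(eachlist: Sequence[int]) -> str:
    """Turn [id, a, b, ...] into the tab-separated 'id 1 a id 2 b ...' line."""
    # zip stops when eachlist[1:] is exhausted, so the infinite
    # repeat() and count() iterators are safe to use here.
    merged = zip(repeat(eachlist[0]), count(1), eachlist[1:])
    return "\t".join(map(str, chain.from_iterable(merged)))
```

Writing then becomes `f.writelines(format_line(x) + "\n" for x in alllists)`, which still streams one line at a time instead of building the whole file in memory.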

As far as I can tell, there isn't any reason to create the intermediate dictionary.

(That said, your existing solution seems pretty viable, although it is likely a little slower.)

For @sirparselot:

    >>> from itertools import chain, count, cycle
    >>> seq1 = [42351, 4253, 1264, 5311, 3651]
    >>> seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
    >>> seq3 = [42353, 1254, 1264]
    >>> alllists = [seq1, seq2, seq3]
    >>> for eachlist in alllists:
    ...     merged = zip(cycle([eachlist[0],]), count(1), eachlist[1:])
    ...     print("\t".join(map(str, chain.from_iterable(merged))))
    ...
    42351   1       4253    42351   2       1264    42351   3       5311    42351   4       3651
    42352   1       4254    42352   2       1244    42352   3       1246    42352   4       5311    42352   5       1264    42352   6       3651
    42353   1       1254    42353   2       1264
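The answer only says the dictionary version is "likely a little slower". One way to check on your own data is `timeit`. The sketch below wraps both approaches as hypothetical functions (`via_dict`, `direct`) that return the output text instead of writing a file, and feeds them synthetic input whose sizes are arbitrary assumptions. No timings are shown here because they depend on the machine and the Python version.

```python
import random
import timeit
from itertools import chain, count, cycle


def via_dict(alllists):
    """Question's approach: build the intermediate dict, then join it."""
    list_dic = {}
    for each_seq in alllists:
        j = 1
        list_dic[each_seq[0]] = []
        for each_item in each_seq[1:]:
            list_dic[each_seq[0]].append(each_seq[0])
            list_dic[each_seq[0]].append(j)
            list_dic[each_seq[0]].append(each_item)
            j += 1
    return "".join("\t".join(map(str, row)) + "\n" for row in list_dic.values())


def direct(alllists):
    """Answer's approach: format each list straight away, no dict."""
    out = []
    for eachlist in alllists:
        merged = zip(cycle([eachlist[0]]), count(1), eachlist[1:])
        out.append("\t".join(map(str, chain.from_iterable(merged))) + "\n")
    return "".join(out)


def make_data(n_lists=10_000, max_len=20, seed=0):
    """Synthetic input (assumed shape): unique ids followed by random values."""
    rng = random.Random(seed)
    return [[i] + [rng.randrange(10_000) for _ in range(rng.randrange(1, max_len))]
            for i in range(n_lists)]


if __name__ == "__main__":
    data = make_data()
    # With unique ids and insertion-ordered dicts (Python 3.7+) both agree.
    assert via_dict(data) == direct(data)
    for fn in (via_dict, direct):
        print(fn.__name__, timeit.timeit(lambda: fn(data), number=5))
```

The ids are kept unique on purpose: with duplicate ids the dictionary version keeps only the last list per id, so the two outputs would legitimately differ.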
