csv - Python: reading CSV files recursively in a directory tree and appending two of their columns to a data frame


I have a root directory containing hundreds of sub-folders. I want to read the CSV file in each sub-folder; they all have the same name, study.csv.

After reading the CSV files, I want to create a data frame that stores part of their data. The new data frame should contain 3 columns: 1 newly created column that marks the CSV file ID, and 2 columns taken from the CSV file's own columns.

For example, the structure of the original CSV file is:

```
row1: ....
row2: ....
row3: ....
row4: column1 column2 column3 column4 column5
row5:    1       2      3       4      5
row6:    2       4      2       1      10
row7:    3       8      9      11      23
...
```

The data frame I expect is:

```
new column   column3   column4
1            3         4
1            2         1
1            9         11
```

So each CSV file is read starting from row 4 (the header row). The new column has the same value for all rows that come from the same CSV file, so it can be regarded as the CSV file ID.

I found that os.walk can help me traverse the directory tree, but how can I read just 2 specific columns from each CSV file while creating the new ID column accordingly?

To visit every CSV file under the root directory (including sub-folders), iterate over os.walk(), check whether each file has a .csv extension, and pass its full path to process_file():

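```python
import os

for root, dirs, files in os.walk(root_dir):
    for fi in files:
        if fi.split(".")[-1] == "csv":
            # os.path.join inserts the path separator that root + fi would miss
            process_file(os.path.join(root, fi))
```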
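Because every file in the question is named study.csv, it may be simpler to match that exact name and collect the paths first, then process them. The sketch below does that; `find_study_files` is a hypothetical helper name, and sorting the paths is an assumption made so that file order (and any IDs derived from it) is repeatable between runs.

```python
import os

def find_study_files(root_dir, name="study.csv"):
    """Return the paths of every file called `name` under root_dir, sorted."""
    paths = []
    for root, dirs, files in os.walk(root_dir):
        for fi in files:
            if fi == name:  # exact match; use fi.endswith(".csv") to take every CSV
                paths.append(os.path.join(root, fi))
    return sorted(paths)
```

Each returned path can then be handed to process_file() in a plain for loop.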

Load each line of the CSV file into a list; you can then separate the values in each line with str.split(",").

Each value can then be referenced by row number and column number as csv_file[row_num][col_num].
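As an alternative to str.split(","), the standard-library csv module builds the same list-of-lists structure but also handles quoted fields that contain commas. A minimal sketch, assuming the files are comma-separated (`load_rows` is a hypothetical name):

```python
import csv

def load_rows(filename):
    """Read a CSV file into a list of rows; each row is a list of strings."""
    with open(filename, newline="") as f:
        return list(csv.reader(f))
```

With the sample file from the question, `load_rows(path)[3][2]` would be the header `column3` (row 4, third column, both zero-indexed). Note that all values come back as strings.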

To process a single file, iterate over the row_num and col_num values you want:

```python
import os

def process_file(filename, out_path="out.csv"):
    title_line = 3  # indexing starts at 0, so row 4 is index 3
    cols_to_keep = [0, 2, 3]

    # load the entire CSV file into a list (fine for files that aren't massive)
    with open(filename) as f:
        f_lines = [line.strip().split(",") for line in f]  # split each line into values

    # open in append mode so results from earlier files are not overwritten
    with open(out_path, "a") as out_file:
        if os.path.getsize(out_path) == 0:  # output still empty: write the title line first
            out_file.write(",".join(f_lines[title_line][i] for i in cols_to_keep) + "\n")
        for line in f_lines[title_line + 1:]:  # each line after the title line
            if line == [""]:  # skip blank lines
                continue
            new_line = [line[col_index] for col_index in cols_to_keep]
            out_file.write(",".join(new_line) + "\n")
```
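The answer writes the selected columns to out.csv but does not create the per-file ID column the question asks for. If pandas is available, a data frame with that ID can be built directly. This is a sketch under these assumptions: the files are comma-separated, the header is on row 4 (so `skiprows=3`), the wanted columns are literally named `column3` and `column4`, and IDs are assigned 1, 2, 3, ... in sorted path order. `build_frame` and the `file_id` column name are hypothetical.

```python
import os
import pandas as pd

def build_frame(root_dir, name="study.csv", columns=("column3", "column4")):
    """Stack two columns from every `name` file under root_dir into one DataFrame,
    with a file_id column that is constant within each file."""
    paths = sorted(
        os.path.join(root, fi)
        for root, dirs, files in os.walk(root_dir)
        for fi in files
        if fi == name
    )
    frames = []
    for file_id, path in enumerate(paths, start=1):
        # skiprows=3 drops rows 1-3, so row 4 becomes the header
        df = pd.read_csv(path, skiprows=3, usecols=list(columns))
        df = df[list(columns)]  # usecols keeps file order; enforce the requested order
        df.insert(0, "file_id", file_id)  # same ID for every row of this file
        frames.append(df)
    if not frames:
        return pd.DataFrame(columns=["file_id", *columns])
    return pd.concat(frames, ignore_index=True)
```

Reading only the needed columns with `usecols` avoids loading the rest of each file, and concatenating once at the end is cheaper than appending to a data frame inside the loop.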
