How to awk every nth line, starting from a different line each iteration

I want awk to print every nth line of a file, starting at line 0. Then, after awk has gone through the whole file, print every nth line starting at line 1, then every nth line starting at line 2, and so on, finishing with every nth line starting at line n-1. My sad attempt so far:

    #!/bin/bash
    rm *.sad *.sadd *.out

    #create loop index
    for i in $(seq 20 1 36); do
            listm+=($i)
    done

    #create input file
    for j in "${listm[@]}"; do
            if [ $j -eq 20 ]; then
                    awk 'NR % 20 == 0' vel_vmdout > atomvel.dat
                    awk '{print $2,$3,$4}' atomvel.dat > velocity.dat
            else
                    awk 'NR % 20 == 1' vel_vmdout  > $j.sad
                    egrep -v "^[[:space:]]*$|^#" $j.sad > $j.sadd
                    awk '{print $2, $3, $4}' $j.sadd > $j.out
                    paste velocity.dat $j.out  > taste
            fi
    done

Let me try to clarify by providing the input and what the output should look like. The input is an XYZ file from an MD simulation, consisting of frames of the atoms' xyz coordinates.

Input:

input http://i61.tinypic.com/2l8hz79.jpg

This image shows the first snapshot and part of the second snapshot. Because these are snapshots, the ordering of the atoms does not change. Thus, I am trying to print the xyz coordinates from each snapshot for each specific atom in its own columns, as shown below. This would make a file consisting of 3N columns, where N is the number of atoms.

Output:

output http://i60.tinypic.com/i3v1ax.png

As you can see, each atom's coordinates are in their own columns, and the total file is an n x 3N array. The bash script is me trying to do this, but only for the first 2 atoms. I wanted to print every nth line (the coordinates of the nth atom) to the output. I appreciate your patience with all of this.

Generating sample data

This step should not be necessary; the question should have included usable sample data and the required output for that sample data.

At one level, this won't help you because you don't have my random number generator program, but the script below shows how I generated the data that follows, and illustrates the lengths it might be necessary to go to when the question doesn't supply readable data. The generated data looks similar to the data in the question (at least superficially):

    18
     generated vmd in absentia
     c     0.979485   -6.665347   0.575383
     c     1.191999   -3.002386   2.859484
     c     3.151517   -5.610077   0.429413
     c     3.439828   -6.454984   1.319724
     c     3.726201   -0.123038   2.096854
     c     1.363325   -3.031238   0.016019
     c     6.090283   -3.915340   2.396358
     c     0.407755   -7.957784   -0.846842
     c     0.203074   -0.796428   2.659573
     o     2.600610   -2.259674   -0.260378
     o     4.773839   -6.765097   0.588508
     h     2.743424   -2.890016   2.906452
     h     2.810233   -6.641054   -0.797672
     h     6.854169   -3.191721   -0.925670
     o     2.914233   -1.060001   0.776983
     h     3.803923   -1.497032   2.908799
     h     5.669443   -7.227666   -0.647552
     h     0.092455   -5.850637   2.959987
    18
     generated vmd in absentia
     c     6.042840   -7.254720   2.093573
     c     2.551942   -6.044322   2.061072
     c     3.523150   -6.167163   2.451689
     c     5.197316   -3.429866   -0.412062
     c     2.548777   -6.422851   1.282846
     c     3.775197   -2.012031   1.377440
     c     3.405112   -3.206415   -0.879886
     c     1.448359   -5.419629   0.467291
     c     3.661964   -2.789234   2.644294
     o     4.214854   -2.439574   -0.951704
     o     5.297609   -2.320418   2.709898
     h     2.653940   -4.431080   -0.511743
     h     5.040635   -0.676199   -0.590970
     h     1.546725   -1.294582   2.562937
     o     4.231461   -7.180908   1.629901
     h     3.297836   -1.557133   -0.133280
     h     3.442481   -4.489962   2.111930
     h     1.423611   -7.982655   0.715618
    18
     generated vmd in absentia
     c     1.432495   -7.686243   2.525734
     c     5.038409   -4.976270   2.826846
     c     6.184137   -7.303094   2.711561
     c     3.208125   -0.606556   1.978725
     c     2.171859   -6.792060   0.678988
     c     6.521124   -5.622797   -0.773797
     c     1.725619   -5.768633   -0.223397
     c     3.602427   -2.325680   1.762008
     c     1.937521   -1.686895   1.743159
     o     0.745526   -0.114246   -0.949490
     o     4.754360   -6.531145   1.998913
     h     1.114732   -1.158810   1.486939
     h     6.410490   -5.411647   0.062737
     h     4.164330   -6.743763   1.802804
     o     2.587841   -3.979700   2.609748
     h     2.192073   -2.815376   -0.809569
     h     5.501795   -2.326438   1.325829
     h     3.285032   -1.212541   1.284453
    18
     generated vmd in absentia
     c     3.564424   -3.117406   -0.032879
     c     2.894745   -0.632591   0.532311
     c     3.384916   -5.383135   1.179585
     c     0.793488   -0.894539   -0.886891
     c     1.348785   -6.501867   1.648604
     c     2.189941   -2.438067   0.616090
     c     2.043378   -4.966472   0.691603
     c     3.124161   -5.792896   0.545362
     c     5.741472   -0.640590   2.825374
     o     0.300550   -7.149663   0.942726
     o     1.344387   -0.121382   2.169401
     h     4.963296   -0.964665   -0.230523
     h     6.651423   -4.905053   2.509626
     h     5.059694   -6.166516   0.102255
     o     5.046864   -3.288883   0.853948
     h     2.389007   -3.057664   1.806301
     h     2.365876   -0.956860   1.458959
     h     2.892502   -0.097422   -0.531714

The script I used was:

    random -n $((4 * 18)) -t '%8:6[0:7]f   %8:6[-8:0]f   %8:6[-1:3]f' |
    awk 'BEGIN { n = split("cccccccccoohhhohhh", atoms, ""); atoms[0] = atoms[n] }
         NR % n == 1 { print n; print " generated vmd in absentia" }
         { print "", atoms[NR%18], "   ", $0 }'

The -n option to random says how many rows to generate; I chose 72. The -t option is a template, and the notation %8:6[0:7]f means use the %8.6f format to print uniformly distributed random numbers between 0 and 7. The awk script takes the data generated and interpolates the noise (the number of atoms and a variant on the 'generated vmd' line), as well as tagging the lines with the appropriate atomic symbol.
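Since the random program used above is not a standard tool, here is a minimal sketch that produces data of the same shape with nothing but awk's built-in rand(). The function name gen_xyz, its two arguments (frame count and seed) and the space-separated atom list are assumptions made for this illustration, not part of the original script, and the numbers it produces will differ from the sample shown above.

```shell
# Sketch only: gen_xyz is a hypothetical helper, not the author's `random` tool.
# Usage: gen_xyz [frames] [seed]   (defaults: 4 frames, seed 1)
gen_xyz() {
    awk -v frames="${1:-4}" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        # Same atom order as the sample: 9 carbons, then o o h h h o h h h.
        n = split("c c c c c c c c c o o h h h o h h h", atoms, " ")
        for (f = 1; f <= frames; f++) {
            print n                                # atom-count line
            print " generated vmd in absentia"     # comment line
            for (a = 1; a <= n; a++)
                # x in [0,7), y in (-8,0], z in [-1,3), mirroring the template
                printf " %s     %8.6f   %8.6f   %8.6f\n", atoms[a],
                       7 * rand(), -8 * rand(), -1 + 4 * rand()
        }
    }'
}
```

Running `gen_xyz 4 42 > data` should yield 80 lines (4 frames of 18 atoms plus 2 header lines each). The exact values depend on the awk implementation's random number generator.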

Processing sample data

Given the data, we need to munge it into the required output. This script more or less does the job. There are endless ways it should be improved, of course, such as taking file names from command line arguments, using temporary file names instead of fixed names, cleaning up intermediate files, handling different compounds and different atoms (nitrogen, phosphorus, etc.), and so on. However, it should be reasonably easy to adapt.

    input="data"
    output="output"
    n=$(sed 1q "$input")
    n2=$(($n+2))

    for ((i = 3; i <= n2; i++))
    do
        colno=$(printf "%.2d" $(($i-2)))
        awk -v n=$n2 -v r=$i \
            '   BEGIN { name["c"] = "carbon"; name["h"] = "hydrogen"; name["o"] = "oxygen";
                        r0 = r % n }
                NR > 2 && NR <= r { count[$1]++; }
                NR == r { printf "%-32.32s\n", name[$1] " " count[$1]; }
                NR % n == r0 { xyz = sprintf("%s %s %s", $2, $3, $4); printf "%-32.32s\n", xyz }
            ' "$input" > "column.$colno"
    done

    paste -d ' ' column.* > "$output"

The first four lines set the control parameters, collecting the number of lines per unit of data from the input file and adjusting things accordingly. The for loop iterates over the offsets 3 to $n2 inclusive (skipping the 2 header lines), and runs the awk script. This encodes the atom types (BEGIN), determines which atom it is processing this time (NR > 2 && NR <= r and NR == r), and arranges to print the triplets of data for the relevant atom. The formatting is organized so that the column headings and the actual xyz-triplets are uniformly spaced. These are written to the file column.$colno. When all's done, the column.* files are pasted together to generate a single output file, which looks like this:

    carbon 1                         carbon 2                         carbon 3                         carbon 4                         carbon 5                         carbon 6                         carbon 7                         carbon 8                         carbon 9                         oxygen 1                         oxygen 2                         hydrogen 1                       hydrogen 2                       hydrogen 3                       oxygen 3                         hydrogen 4                       hydrogen 5                       hydrogen 6
    0.979485 -6.665347 0.575383      1.191999 -3.002386 2.859484      3.151517 -5.610077 0.429413      3.439828 -6.454984 1.319724      3.726201 -0.123038 2.096854      1.363325 -3.031238 0.016019      6.090283 -3.915340 2.396358      0.407755 -7.957784 -0.846842     0.203074 -0.796428 2.659573      2.600610 -2.259674 -0.260378     4.773839 -6.765097 0.588508      2.743424 -2.890016 2.906452      2.810233 -6.641054 -0.797672     6.854169 -3.191721 -0.925670     2.914233 -1.060001 0.776983      3.803923 -1.497032 2.908799      5.669443 -7.227666 -0.647552     0.092455 -5.850637 2.959987
    6.042840 -7.254720 2.093573      2.551942 -6.044322 2.061072      3.523150 -6.167163 2.451689      5.197316 -3.429866 -0.412062     2.548777 -6.422851 1.282846      3.775197 -2.012031 1.377440      3.405112 -3.206415 -0.879886     1.448359 -5.419629 0.467291      3.661964 -2.789234 2.644294      4.214854 -2.439574 -0.951704     5.297609 -2.320418 2.709898      2.653940 -4.431080 -0.511743     5.040635 -0.676199 -0.590970     1.546725 -1.294582 2.562937      4.231461 -7.180908 1.629901      3.297836 -1.557133 -0.133280     3.442481 -4.489962 2.111930      1.423611 -7.982655 0.715618
    1.432495 -7.686243 2.525734      5.038409 -4.976270 2.826846      6.184137 -7.303094 2.711561      3.208125 -0.606556 1.978725      2.171859 -6.792060 0.678988      6.521124 -5.622797 -0.773797     1.725619 -5.768633 -0.223397     3.602427 -2.325680 1.762008      1.937521 -1.686895 1.743159      0.745526 -0.114246 -0.949490     4.754360 -6.531145 1.998913      1.114732 -1.158810 1.486939      6.410490 -5.411647 0.062737      4.164330 -6.743763 1.802804      2.587841 -3.979700 2.609748      2.192073 -2.815376 -0.809569     5.501795 -2.326438 1.325829      3.285032 -1.212541 1.284453
    3.564424 -3.117406 -0.032879     2.894745 -0.632591 0.532311      3.384916 -5.383135 1.179585      0.793488 -0.894539 -0.886891     1.348785 -6.501867 1.648604      2.189941 -2.438067 0.616090      2.043378 -4.966472 0.691603      3.124161 -5.792896 0.545362      5.741472 -0.640590 2.825374      0.300550 -7.149663 0.942726      1.344387 -0.121382 2.169401      4.963296 -0.964665 -0.230523     6.651423 -4.905053 2.509626      5.059694 -6.166516 0.102255      5.046864 -3.288883 0.853948      2.389007 -3.057664 1.806301      2.365876 -0.956860 1.458959      2.892502 -0.097422 -0.531714
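The line-selection idea from the question's title (every nth line, starting from a different line on each iteration) can be tried in isolation on a tiny input. This is only a sketch: every_nth is a hypothetical helper name, and it assumes a 1-based starting line s with 1 <= s <= n.

```shell
# every_nth is a hypothetical helper: print every n-th line of stdin,
# starting at 1-based line s (assumes 1 <= s <= n).
every_nth() {
    # NR % n only takes the values 0 .. n-1, so the start is reduced modulo n too.
    awk -v n="$1" -v s="$2" 'NR % n == s % n'
}

# Nine input lines, period 3, one pass per starting line.
for s in 1 2 3; do
    printf '%s\n' 1 2 3 4 5 6 7 8 9 | every_nth 3 "$s" | paste -s -d ' ' -
done
# prints:
# 1 4 7
# 2 5 8
# 3 6 9
```

Each pass rereads the whole input, which is simple but costs one scan of the file per starting line.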

Your task is to understand why the various bits of the awk script are present. For example, why is r0 needed? (Hint: experiment without the r0 calculation, and use r in its place.)
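As an alternative to running awk once per atom and pasting the column.* files together, the same table could probably be built in a single pass by holding the triplets in an awk array. This is a sketch of a different technique, not the script above: xyz_to_columns is a hypothetical name, and it assumes that every frame has the same number of atoms as the first and that the whole file fits in memory.

```shell
# Sketch only: xyz_to_columns is a hypothetical helper that reads XYZ frames
# on stdin and writes one header row plus one row per frame.
xyz_to_columns() {
    awk '
        BEGIN { name["c"] = "carbon"; name["h"] = "hydrogen"; name["o"] = "oxygen" }
        NR == 1  { n = $1 }                       # atoms per frame, from the first line
                 { pos = (NR - 1) % (n + 2) }     # 0 = count line, 1 = comment, 2.. = atoms
        pos == 0 { frame++; next }
        pos == 1 { next }
        {
            atom = pos - 1
            if (frame == 1) {                     # headings come from the first frame only
                label = ($1 in name) ? name[$1] : $1
                head[atom] = label " " (++count[$1])
            }
            cell[frame, atom] = $2 " " $3 " " $4
        }
        END {
            for (a = 1; a <= n; a++)
                printf "%-32.32s%s", head[a], (a < n ? " " : "\n")
            for (f = 1; f <= frame; f++)
                for (a = 1; a <= n; a++)
                    printf "%-32.32s%s", cell[f, a], (a < n ? " " : "\n")
        }'
}
```

With the sample file, `xyz_to_columns < data > output` should give the same layout as the paste-based version. The trade-off is memory: every triplet is held until the END block, whereas the loop version never stores more than one line at a time.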
