C# - Speed up through Parallel.For
I need to download a lot of data (daily) from Oracle, delete the existing data on SQL Server (in case of reruns), and then write the data to SQL Server with SqlBulkCopy. The non-parallel version:
    for (int i = 0; i < curves.Rows.Count; i++)
    {
        // download data from Oracle
        var data = getcurve(connectstring, impdate, curves.Rows[i]);
        // delete old data in SQL Server
        deletesql(DateTime.Now, curves.Rows[i]);
        // write to SQL Server with SqlBulkCopy
        writecurve(data, "dbo.t_curves");
    }
I wanted to speed this up using Parallel.For, but these steps are dependent on each other. I came up with this (I have never used the task factory before):
    Parallel.For(0, curves.Rows.Count, i =>
    {
        var taskload = Task.Factory.StartNew(() => getcurve(connectstring, impdate, curves.Rows[i]));
        var taskdelete = Task.Factory.StartNew(() => deletesql(impdate, curves.Rows[i]));
        taskdelete.Wait();
        taskload.Wait();
        var taskwrite = Task.Factory.StartNew(() => writecurve(taskload.Result, "dbo.t_curves"));
    });
This halves the time. Is halving the time all I can expect? Is this the correct use of threading? Is there anything else I can do? Can I include further speed-ups?
Update

Just for info: curves is a DataTable of curve definitions, so I'm not writing or reading line by line.
Or should I write this:
    Parallel.For(0, curves.Rows.Count, i =>
    {
        var taskload = Task.Factory.StartNew(() => getcurve(connectstring, impdate, curves.Rows[i]));
        var taskdelete = Task.Factory.StartNew(() => deletesql(impdate, curves.Rows[i]));
        var res1 = await taskdelete;
        var res2 = await taskload;
        writecurve(res2, "dbo.t_curves");
    });
Q: Is this the correct use of threading?
A: Yes, you are doing the expensive work on 2 separate tasks, and then waiting for them to complete before carrying on. +1 to @jugarr for pointing out that you don't need a task for the final bit of work.
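A minimal sketch of that pattern, under stated assumptions: `CurveStep.Process` is a hypothetical helper, and `load`, `delete` and `write` are stand-ins for the question's getcurve, deletesql and writecurve, whose real signatures are not shown. The two independent steps run as tasks, and the write runs directly on the calling thread once both have finished.

```csharp
using System;
using System.Threading.Tasks;

public static class CurveStep
{
    // Runs load and delete concurrently, then writes on the calling thread.
    // load, delete and write are hypothetical stand-ins for the question's
    // getcurve, deletesql and writecurve.
    public static void Process<T>(Func<T> load, Action delete, Action<T> write)
    {
        Task<T> taskLoad = Task.Factory.StartNew(load);
        Task taskDelete = Task.Factory.StartNew(delete);

        // Block until both independent steps are done. If either one
        // failed, the exception surfaces here as an AggregateException.
        Task.WaitAll(taskLoad, taskDelete);

        // No third task: nothing is left to overlap with the write.
        write(taskLoad.Result);
    }
}
```

Inside the Parallel.For body this could be called as `CurveStep.Process(() => getcurve(connectstring, impdate, curves.Rows[i]), () => deletesql(impdate, curves.Rows[i]), data => writecurve(data, "dbo.t_curves"));`. Because the write is not wrapped in a task, the loop iteration only finishes once the data has actually been written.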
Q: Is there anything else I can do?
A: Probably not, by the sound of what you are doing:
- Getting a single row (from Oracle)
- Deleting a single row (in case it exists) in MSSQL
- Adding a single row to MSSQL
Some recommendations:
- Batch it up; don't work on a single row at a time.
- Profile which part is the slowest: retrieving the data, deleting it, or inserting it. It could be that:
  - You are working with single rows, so the overhead of starting a new connection is massive.
  - You are missing indexes when trying to retrieve the data.
  - You are missing indexes when trying to delete the data, because of referential integrity.
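One way to act on the batching recommendation, sketched under assumptions: group the curve definitions into batches so that each batch can share one connection, one delete statement and one bulk copy. `Batching.InBatches` and the batch size are hypothetical names introduced for illustration; on .NET 6 or later the built-in `Enumerable.Chunk` does the same job.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive batches of at most batchSize items.
    // This is an iterator, so nothing runs until the result is enumerated.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch would then be processed as a unit (one download, one delete, one bulk write), and Parallel.For or Parallel.ForEach can iterate over the batches instead of over single rows.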
What throughput rate are you getting? How many MB/s? Is each curve 1 MB or 1 GB?
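To answer those questions with numbers, each step can be timed separately and the result converted to a throughput figure. The sketch below is only an illustration: `Profiling.Time` and `Profiling.MegabytesPerSecond` are hypothetical helpers, and 1 MB is taken as 1,000,000 bytes.

```csharp
using System;
using System.Diagnostics;

public static class Profiling
{
    // Runs one step and returns how long it took.
    public static TimeSpan Time(Action step)
    {
        var sw = Stopwatch.StartNew();
        step();
        sw.Stop();
        return sw.Elapsed;
    }

    // Converts a byte count and a duration into megabytes per second
    // (1 MB taken as 1,000,000 bytes).
    public static double MegabytesPerSecond(long bytes, TimeSpan elapsed)
    {
        return bytes / 1000000.0 / elapsed.TotalSeconds;
    }
}
```

For example, `Profiling.Time(() => deletesql(impdate, curves.Rows[0]))` measures the delete step for one curve. Doing the same for the download and the write shows which of the three dominates, and dividing the size of a downloaded curve by its download time gives the MB/s figure asked about above.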