Hi community,

I am new to parallel coding and have some trouble understanding what I have read about it.

Let's assume the following scenario:

I have five tables in a SQL Server database. I receive data from source X that needs to be filtered, sorted and validated; the resulting string[][] then needs to be transformed into a DataRow[] and uploaded to the SQL Server tables.


My sequential solution works, but to be honest, I have an eight-core processor...

To parallelise my problem, I thought I might use the TPL with a parallel for loop (iterating 0 to 4 = 1 task per table). Based on the iteration index, I would perform a specific LINQ query on my array, then simply take a new DataRow with the respective table schema and populate its fields. I have set each table's ID column to increment automatically, so my sequential solution does not provide a value for the ID column of my DataRow (the SQL server will do that anyway).

Problem:
The string[][] exists outside of the TPL for loop and therefore needs synchronisation. Is that correct?

Is it also correct that all variables I create within the TPL for loop are thread-safe, so that I do not need to synchronise them? In other words, should creating the DataRow within the for loop be fine in terms of exceptions?

Last question regarding the TPL for loop: will it automatically wait for all tasks before the main thread continues, or do I have to call Task.WaitAll()? In that case, wouldn't it be better to create individual tasks, add them to an array and call Task.WaitAll(arrayOfTasks)?

I am using ADO.NET DataTables, so I want to wait for all the updates/changes to the local tables and then simply update the entire database at once.


I am happy with pseudo-solutions, but I am more interested in understanding the concept correctly. Am I approaching this problem correctly, or should I create a normal for loop and create Tasks within it?

As always - thanks for your help and time.

-DK

1 solution

You have a string[][]:
C#
string[][] myArray = new string[][] {};

You can either use one method for all tables, or define a separate method for each table.

Using one method for all tables, you define a list of tables:
C#
List<string> tables = new List<string> { "table1", "table2", "table3", "table4", "table5" };

Next, you use PLINQ to execute your method for each table in parallel:
C#
tables.AsParallel().ForAll(table => DoSomethingWithTheArray(myArray, table));

...

private static void DoSomethingWithTheArray(string[][] myArray, string table)
{
    switch (table)
    {
        case "table1":
            // Perform transformation of array to table 1
            break;
        // etc.
    }
}
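
On the synchronisation question: as long as no thread modifies the string[][] while the query runs, concurrent reads need no lock, and anything declared inside the lambda is private to that invocation. Here is a minimal sketch of the one-method approach. It assumes, purely for illustration, that column 0 of each row names the target table and column 1 holds a value; SharedArrayDemo, Run and that row layout are made-up names, not part of your code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class SharedArrayDemo
{
    // Hypothetical layout: row[0] = target table name, row[1] = value.
    public static ConcurrentDictionary<string, string[]> Run(
        string[][] source, List<string> tables)
    {
        // Shared between threads, so a thread-safe collection is used for writes.
        var results = new ConcurrentDictionary<string, string[]>();

        // ForAll returns only after every table has been processed.
        tables.AsParallel().ForAll(table =>
        {
            // 'values' is a local of this invocation; no other thread sees it.
            // 'source' is only read here, never written.
            string[] values = source
                .Where(row => row[0] == table)
                .Select(row => row[1])
                .OrderBy(v => v, StringComparer.Ordinal)
                .ToArray();

            results[table] = values;
        });

        return results;
    }
}
```

The only shared state that is written is the result dictionary, which is why it is a ConcurrentDictionary; the source array itself is left untouched.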

If you want to structure your code a bit better, and define a separate method for each table, you change the list of tables to be a list of methods:
C#
List<Action<string[][]>> transformations = new List<Action<string[][]>>
{
    TransformArrayForTableOne,
    // etc...
};

Then, change your PLINQ query to run all the methods in parallel:
C#
transformations.AsParallel().ForAll(transformation => transformation(myArray));

...

private static void TransformArrayForTableOne(string[][] obj)
{
    // Do something with the array specific for this table
}

private static void TransformArrayForTableTwo(string[][] obj)
{
    // Do something with the array specific for this table
}

// etc.
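
If you would rather stay with the TPL for loop from the question, the same idea can be sketched with Parallel.For. The call returns only after every iteration has finished, so no extra Task.WaitAll() is needed. In this sketch, ParallelForDemo, CountMatches and the filter delegates are hypothetical names chosen for illustration; each iteration writes only to its own slot of the result array, so no lock is required.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForDemo
{
    // One filter per table; filters[i] decides which rows belong to table i.
    public static int[] CountMatches(string[][] source, Func<string[], bool>[] filters)
    {
        var counts = new int[filters.Length];

        // Blocks the calling thread until all iterations have completed.
        Parallel.For(0, filters.Length, i =>
        {
            // Each iteration reads the shared array and writes only counts[i].
            counts[i] = source.Count(filters[i]);
        });

        return counts;
    }
}
```

Creating individual Task objects and calling Task.WaitAll on them would work as well; Parallel.For simply does the waiting for you.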

In each of the methods, you would perform the transformation to the specific DataRows. It's up to you how you want to commit these to the database.
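
As a rough sketch of that last step: DataTable is safe for concurrent reads but writes must be synchronised, so the simplest arrangement is to let each parallel iteration fill only its own DataTable and commit everything afterwards on the calling thread. The schema (an auto-increment Id plus a Value column), the row layout (column 0 = table name, column 1 = value) and the names DataTableDemo, CreateTable and Build are assumptions made for this example only.

```csharp
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class DataTableDemo
{
    // Hypothetical schema: auto-increment Id plus one string column.
    private static DataTable CreateTable(string name)
    {
        var table = new DataTable(name);
        table.Columns.Add(new DataColumn("Id", typeof(int)) { AutoIncrement = true });
        table.Columns.Add("Value", typeof(string));
        return table;
    }

    public static DataTable[] Build(string[][] source, string[] tableNames)
    {
        // Tables are created up front, one per name, before any parallel work starts.
        DataTable[] tables = tableNames.Select(name => CreateTable(name)).ToArray();

        // Iteration i touches only tables[i]; the source array is only read.
        Parallel.For(0, tables.Length, i =>
        {
            foreach (string[] row in source.Where(r => r[0] == tableNames[i]))
            {
                DataRow dataRow = tables[i].NewRow(); // Id is left to auto-increment
                dataRow["Value"] = row[1];
                tables[i].Rows.Add(dataRow);
            }
        });

        // All tables are complete here; commit them to the database afterwards.
        return tables;
    }
}
```

Once Build returns, all local tables are fully populated, so the database update can run as a single sequential step.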
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


