Code performance optimization

Question

I need to make a clean copy of a very large list of objects, about 400,000 records. For this I am doing the following:

C#

public static T DeepClone<T>(this T a)
{
    if (ReferenceEquals(a, null)) return default;
    return JsonConvert.DeserializeObject<T>(JsonConvert.SerializeObject(a));
}

This code works very well for small lists, but for this one of 400,000 records it takes hours and I usually run out of memory. How can I optimize this, or how can I do it differently?

What I have tried:

I looked for a third-party NuGet package, but the ones I found seem complicated to use: they ask for attributes (decorations) on the properties of the objects, and I didn't get any result.

Now I'm trying batch processing: I pass the list to this method in parts. That improved the memory problem, but it still takes a very long time.

Posted 21-Dec-23 2:58am

Fercap89

Comments

Rob Philpott 21-Dec-23 9:50am

Serialising to JSON and back out again is a pretty nasty way of cloning objects. Interesting you eventually run out of memory too, that suggests some sort of memory fragmentation otherwise the garbage collector should keep you running. What type goes into generic parameter T? Is it always the same, often different, could it be anything or do you have control over the types that go in there?

Fercap89 21-Dec-23 10:08am

Thanks for the answer, I have no control over the types of data that go there, they may eventually be different.

What do you suggest instead of using json?

Rob Philpott 21-Dec-23 10:16am

Yeah, I don't know actually - I don't think I've ever attempted to create a deep clone that can cope with any type, normally you have some insight about what you're cloning, members might be decorated for serialisation etc. Maybe some reflective approach might work, or some solution using expression trees for maximum performance. I found this - might be worth a go: https://stackoverflow.com/questions/129389/how-do-you-do-a-deep-copy-of-an-object-in-net
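
For reference, a minimal sketch of what a "reflective approach" could look like. Everything here (the ReflectionCloner class, DeepCopy, IsLeaf) is a hypothetical name for illustration, not taken from the linked answer. It assumes .NET 5 or later (for ReferenceEqualityComparer) and plain data objects: it copies fields blindly, so types that wrap handles, delegates, or hash tables keyed by reference identity may not survive the copy, and very deep object graphs could overflow the stack because it recurses.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ReflectionCloner
{
    // object.MemberwiseClone is protected, so it is reached through reflection.
    private static readonly MethodInfo MemberwiseCloneMethod =
        typeof(object).GetMethod("MemberwiseClone",
            BindingFlags.Instance | BindingFlags.NonPublic)!;

    private const BindingFlags AllInstanceFields =
        BindingFlags.Instance | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    public static T DeepCopy<T>(T source)
    {
        // Reference-keyed map: keeps shared references shared and survives cycles.
        var visited = new Dictionary<object, object>(ReferenceEqualityComparer.Instance);
        return (T)Copy(source, visited)!;
    }

    // Values that can safely be shared between the original and the copy.
    private static bool IsLeaf(Type t) =>
        t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal);

    private static object? Copy(object? obj, Dictionary<object, object> visited)
    {
        if (obj is null) return null;
        Type type = obj.GetType();
        if (IsLeaf(type)) return obj;
        if (visited.TryGetValue(obj, out object? done)) return done;

        if (obj is Array array)
        {
            var copy = (Array)array.Clone(); // shallow copy of the elements
            visited[obj] = copy;
            // Sketch limitation: only one-dimensional, zero-based arrays are deep-copied.
            if (array.Rank == 1 && !IsLeaf(type.GetElementType()!))
                for (int i = 0; i < copy.Length; i++)
                    copy.SetValue(Copy(array.GetValue(i), visited), i);
            return copy;
        }

        // Shallow-copy the object, then replace every non-leaf field with a deep copy.
        object clone = MemberwiseCloneMethod.Invoke(obj, null)!;
        visited[obj] = clone;
        for (Type? t = type; t != null; t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(AllInstanceFields))
            {
                if (IsLeaf(field.FieldType)) continue;
                field.SetValue(clone, Copy(field.GetValue(clone), visited));
            }
        }
        return clone;
    }
}
```

Called as ReflectionCloner.DeepCopy(myList), this never builds an intermediate string. The expression-tree variant mentioned above follows the same idea but compiles the per-type field copying once instead of going through FieldInfo on every object.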

[no name] 21-Dec-23 13:35pm

You chunk it. There's no "logical" reason to do it all in one batch.

Fercap89 21-Dec-23 13:53pm

I'm doing it in batches, that seems to improve the memory usage, but not the speed, it takes many hours to process, I need to improve the execution times.

[no name] 21-Dec-23 14:34pm

Then you look at the hardware and the number of threads. And I "zip" my serialized documents (in memory) before writing to disk ... where XML (which is more verbose than JSON) reduces to 10%; which translates to fewer disk writes (the usual bottleneck).
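
A rough sketch of the "number of threads" idea, assuming each record can be cloned independently of the others. ParallelCloner, CloneItem and CloneAll are hypothetical names, and System.Text.Json is used here instead of Newtonsoft only so the example needs no extra package (the two libraries differ in defaults, for example System.Text.Json ignores fields unless configured otherwise). Each item is round-tripped on its own through a small UTF-8 buffer, so no giant string is ever built.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class ParallelCloner
{
    // Round-trips ONE item through UTF-8 JSON bytes (half the size of a UTF-16 string).
    public static T CloneItem<T>(T item)
    {
        byte[] utf8 = JsonSerializer.SerializeToUtf8Bytes(item);
        return JsonSerializer.Deserialize<T>(utf8)!;
    }

    // Clones every item on the thread pool via PLINQ; AsOrdered keeps the original order.
    public static List<T> CloneAll<T>(IEnumerable<T> items) =>
        items.AsParallel()
             .AsOrdered()
             .Select(item => CloneItem(item))
             .ToList();
}
```

Whether this helps depends on the core count and on how expensive each record is to serialize, so it is worth timing on a slice of the real data first. Like the original JSON round trip, it does not preserve references shared between records.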

Kenneth Haugland 22-Dec-23 7:06am

Any reason not to try binary serializing? Otherwise, can you demand that T implements ICloneable?
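
To illustrate the second suggestion: a small sketch of hand-written cloning behind an interface. It only applies if you control the types, which the poster says is not always the case. IDeepCloneable<T>, Address, Customer and DeepCloneAll are all made-up names for the example; a generic interface is used instead of ICloneable because ICloneable.Clone returns object and does not say whether the copy is deep or shallow.

```csharp
using System.Collections.Generic;

// Hypothetical contract: each type knows how to deep-copy itself.
public interface IDeepCloneable<T>
{
    T DeepClone();
}

public sealed class Address : IDeepCloneable<Address>
{
    public string City { get; set; } = "";

    public Address DeepClone() => new Address { City = City };
}

public sealed class Customer : IDeepCloneable<Customer>
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public Address Home { get; set; } = new Address();
    public List<string> Tags { get; set; } = new List<string>();

    // Value types and strings are copied directly; mutable members get their own copy.
    public Customer DeepClone() => new Customer
    {
        Id = Id,
        Name = Name,
        Home = Home.DeepClone(),
        Tags = new List<string>(Tags)
    };
}

public static class CloneExtensions
{
    // Copies a whole collection, one item at a time, with no serialization step.
    public static List<T> DeepCloneAll<T>(this IReadOnlyCollection<T> source)
        where T : IDeepCloneable<T>
    {
        var result = new List<T>(source.Count);
        foreach (T item in source)
            result.Add(item.DeepClone());
        return result;
    }
}
```

With this in place, customers.DeepCloneAll() makes one pass over the list and allocates only the new objects themselves.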

PIEBALDconsult 22-Dec-23 11:06am

I doubt there is any actual reason to have all of those objects in memory at once. Rethink.

0x01AA 24-Dec-23 5:50am

I found this code, no idea whether it works:

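using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Every type in the object graph must be marked [Serializable].
// Note: BinaryFormatter is obsolete from .NET 5 onwards and is considered
// insecure, so this only suits legacy code that copies trusted data.
static T DeepCopy<T>(T instance)
{
    var formatter = new BinaryFormatter();

    using (var stream = new MemoryStream())
    {
        formatter.Serialize(stream, instance);
        stream.Position = 0; // rewind before reading the graph back

        return (T)formatter.Deserialize(stream);
    }
}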

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2023-12-23T22:51:00

Stop and think about what you are doing: JSON is a human readable text based transfer format, not an efficient storage format.

Suppose your object to deep clone was an array of integers, containing five values:

C#

int[] data = {int.MaxValue, int.MaxValue, int.MaxValue, int.MaxValue, int.MaxValue };
string json = Newtonsoft.Json.JsonConvert.SerializeObject(data);

In memory, the array takes 4 bytes per entry (plus a little overhead, but we'll ignore that for this exercise): 5 * 4 = 20 bytes.
The JSON string is this:

JSON

[2147483647,2147483647,2147483647,2147483647,2147483647]

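Which is 56 characters: 56 bytes as UTF-8 text, or 112 bytes as a .NET string, because .NET strings are UTF-16. If you had 400,000 of these arrays, the raw data would be about 8 MB, but the JSON string would be more than 22 million characters. As your objects get more complex, more JSON "management" data (property names, quotes, braces) is added to support deserialization, and the serialized size grows further.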
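
If you want to check the arithmetic yourself, here is a small sketch. JsonSizeDemo and its methods are names I made up, and it uses System.Text.Json rather than Newtonsoft purely so it runs without a package reference; for an int array both produce the same compact text.

```csharp
using System.Text;
using System.Text.Json;

public static class JsonSizeDemo
{
    // Raw payload of an int[]: 4 bytes per element, ignoring the array header.
    public static int RawBytes(int[] data) => data.Length * sizeof(int);

    // Compact JSON text for the array, e.g. "[1,2,3]".
    public static string ToJson(int[] data) => JsonSerializer.Serialize(data);

    // Size of the JSON if it were written out as UTF-8.
    public static int Utf8Bytes(string json) => Encoding.UTF8.GetByteCount(json);

    // Size of the characters of the JSON held as a .NET string (UTF-16),
    // ignoring the string object header.
    public static int InMemoryBytes(string json) => json.Length * sizeof(char);
}
```

For the five-element array above, RawBytes gives 20, the JSON is 56 characters long, Utf8Bytes gives 56 and InMemoryBytes gives 112.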

Now think how big your "real world" data is going to be when converted to JSON: absolutely massive.
That's probably why you run out of memory: the whole serialized string has to exist in memory at the same time as the original list and the deserialized copy. Json.NET is ordinary managed C# code, so that string lives on the same .NET heap as everything else, and a string that size needs one contiguous block of memory.

Try it: take your 400,000 object collection and prune it to two items. Generate the JSON for that and see how big the result is. Multiply that by 200,000 and you'll be close to the final string size.
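
That experiment could be sketched like this. JsonSizeEstimator and EstimateChars are hypothetical names, System.Text.Json stands in for Newtonsoft to keep the example dependency-free, and the result is only a rough estimate that assumes the first few items are typical of the rest.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class JsonSizeEstimator
{
    // Serializes only the first few items and scales the length up to the
    // whole collection, so the full JSON string is never actually built.
    public static long EstimateChars<T>(IReadOnlyList<T> items, int sampleSize = 2)
    {
        if (items.Count == 0) return 0;

        int n = Math.Clamp(sampleSize, 1, items.Count);
        var sample = new List<T>(n);
        for (int i = 0; i < n; i++)
            sample.Add(items[i]);

        string json = JsonSerializer.Serialize(sample);
        return (long)json.Length * items.Count / n;
    }
}
```

Double the returned character count to get the approximate size in bytes of the string in memory, since .NET strings use two bytes per character.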

I'd suggest you go back to the generic deep clone packages you have found already and re-read their documentation. JSON is not the way to go (and XML will have exactly the same problem, only much, much worse!)