I need to make a clean copy of a very large object list, about 400,000 records. To do this I am using the following:

C#
public static T DeepClone<T>(this T a)
{
    if (ReferenceEquals(a, null)) return default;
    return JsonConvert.DeserializeObject<T>(JsonConvert.SerializeObject(a));
}


This code works very well on small object lists, but on this one of 400,000 records it takes hours and I usually run out of memory. How can I optimize this, or how can I do it differently?

What I have tried:

I looked at some third-party NuGet packages, but they seem complicated to use: they ask for attributes on the properties of the objects, and I didn't get any results.

Now I'm trying batch processing: I pass the list to this method in parts. That improved the memory problem, but it still takes a very long time.
Comments
Rob Philpott 21-Dec-23 9:50am    
Serialising to JSON and back out again is a pretty nasty way of cloning objects. Interesting you eventually run out of memory too, that suggests some sort of memory fragmentation otherwise the garbage collector should keep you running. What type goes into generic parameter T? Is it always the same, often different, could it be anything or do you have control over the types that go in there?
Fercap89 21-Dec-23 10:08am    
Thanks for the answer, I have no control over the types of data that go there, they may eventually be different.

What do you suggest instead of using json?
Rob Philpott 21-Dec-23 10:16am    
Yeah, I don't know actually - I don't think I've ever attempted to create a deep clone that can cope with any type, normally you have some insight about what you're cloning, members might be decorated for serialisation etc. Maybe some reflective approach might work, or some solution using expression trees for maximum performance. I found this - might be worth a go: https://stackoverflow.com/questions/129389/how-do-you-do-a-deep-copy-of-an-object-in-net
[no name] 21-Dec-23 13:35pm    
You chunk it. There's no "logical" reason to do it all in one batch.
Fercap89 21-Dec-23 13:53pm    
I'm doing it in batches. That seems to improve the memory usage but not the speed: it still takes many hours to process, and I need to improve the execution times.

1 solution

Stop and think about what you are doing: JSON is a human readable text based transfer format, not an efficient storage format.

Suppose your object to deep clone was an array of integers, containing five values:
C#
int[] data = {int.MaxValue, int.MaxValue, int.MaxValue, int.MaxValue, int.MaxValue };
string json = Newtonsoft.Json.JsonConvert.SerializeObject(data);
In memory, the array takes 4 bytes per entry (plus a little overhead, but we'll ignore that for this exercise): 5 * 4 = 20 bytes.
The JSON string is this:
JSON
[2147483647,2147483647,2147483647,2147483647,2147483647]
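Which is 56 characters, or 56 bytes when encoded as UTF-8. If you had 400,000 of these arrays the JSON string would be about 22.4 million characters, against 8 million bytes (400,000 * 20) for the raw integers, and because .NET strings are UTF-16 it would occupy roughly twice that many bytes in memory. As your objects get more complex, more JSON "management" data (property names, quotes, brackets) is added to support deserialization, and the size of the text grows.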
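The arithmetic above can be checked with a few lines of code. This is only a sketch: the names `JsonSizeDemo` and `Measure` are made up for the illustration, and it uses `System.Text.Json` rather than Newtonsoft so that it needs no NuGet package (for a plain `int[]` both produce the same compact text).

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of any library.
public static class JsonSizeDemo
{
    // Compares the raw payload of an int[] with the size of its JSON text.
    public static (int RawBytes, int JsonChars, int Utf8Bytes, int Utf16Bytes) Measure(int[] data)
    {
        string json = JsonSerializer.Serialize(data);      // e.g. "[2147483647,...]"
        return (
            data.Length * sizeof(int),                     // 4 bytes per element, array overhead ignored
            json.Length,                                   // characters in the JSON text
            Encoding.UTF8.GetByteCount(json),              // size if written out as UTF-8
            json.Length * sizeof(char));                   // size as an in-memory .NET string (UTF-16)
    }
}
```

For the five-element array, `JsonSizeDemo.Measure(data)` reports 20 raw bytes against 56 JSON characters, which is 112 bytes once held as a .NET string.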

Now think how big your "real world" data is going to be when converted to JSON: absolutely massive.
That's probably why you run out of memory: SerializeObject has to build the whole result as one string before DeserializeObject can start, so the original objects, the JSON text, and the new copies all exist at the same time. Newtonsoft's serializer is seriously optimised for speed, but it can't avoid that intermediate string when it is used this way.

Try it: take your 400,000 object collection, and prune it to two items. Generate the JSON for that, and see how big the result is. Multiply that by 200,000 and you'll be close to the final string size.
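The experiment above could be wrapped in a small helper along these lines. It is a rough sketch under assumptions: `JsonSizeEstimator` and `EstimateJsonChars` are hypothetical names, `System.Text.Json` stands in for Newtonsoft (it serializes public properties, not fields, by default), and the estimate is only as good as the sample is representative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, not part of any library.
public static class JsonSizeEstimator
{
    // Serializes only the first few items and scales the length up to the full count.
    public static long EstimateJsonChars<T>(IReadOnlyList<T> items, int sampleSize = 2)
    {
        if (sampleSize < 1) throw new ArgumentOutOfRangeException(nameof(sampleSize));
        if (items.Count == 0) return 2;                    // an empty list serializes to "[]"

        int n = Math.Min(sampleSize, items.Count);
        string sampleJson = JsonSerializer.Serialize(items.Take(n).ToList());

        // Characters per sampled item, multiplied by the real number of items.
        return (long)sampleJson.Length * items.Count / n;
    }
}
```

Calling `JsonSizeEstimator.EstimateJsonChars(bigList)` gives a character count; double it for the approximate size in bytes of the in-memory string.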

I'd suggest you go back to the generic deep clone packages you have found already, and re-read their documentation: JSON is not the way to go (and XML will have exactly the same problem, only much, much worse!)
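For comparison, one of the comments on the question mentions a reflective approach. The following is a minimal sketch of that idea, not a replacement for a tested package: `ReflectionCloner` and `DeepCloneByReflection` are hypothetical names, it assumes .NET 5 or later (for `ReferenceEqualityComparer`), and it deliberately does not handle delegates, multi-dimensional arrays, very deep object graphs (it recurses), or types that wrap runtime or OS resources. Hash-based collections keyed by objects that rely on reference identity will also not survive it. Whether it is fast enough for 400,000 records would need measuring.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: copies object graphs field by field, with no text in between.
public static class ReflectionCloner
{
    private const BindingFlags InstanceFields = BindingFlags.Instance | BindingFlags.Public |
                                                BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    private static readonly MethodInfo MemberwiseCloneMethod =
        typeof(object).GetMethod("MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);

    public static T DeepCloneByReflection<T>(this T source)
    {
        // Maps each original reference to its copy so shared and cyclic references are preserved.
        var visited = new Dictionary<object, object>(ReferenceEqualityComparer.Instance);
        return (T)Copy(source, visited);
    }

    // Values that are safe to share between the original and the copy.
    private static bool IsLeaf(Type type) =>
        type.IsPrimitive || type.IsEnum || type == typeof(string);

    private static object Copy(object original, Dictionary<object, object> visited)
    {
        if (original == null) return null;
        Type type = original.GetType();
        if (IsLeaf(type)) return original;
        if (typeof(Delegate).IsAssignableFrom(type))
            throw new NotSupportedException("Delegates are not handled by this sketch.");
        if (visited.TryGetValue(original, out object seen)) return seen;

        if (original is Array array)
        {
            if (array.Rank != 1)
                throw new NotSupportedException("Only one-dimensional arrays are handled.");
            var arrayCopy = (Array)array.Clone();          // shallow copy with the same length
            visited[original] = arrayCopy;
            if (!IsLeaf(type.GetElementType()))
                for (int i = 0; i < array.Length; i++)
                    arrayCopy.SetValue(Copy(array.GetValue(i), visited), i);
            return arrayCopy;
        }

        object copy = MemberwiseCloneMethod.Invoke(original, null);   // shallow copy first
        if (!type.IsValueType) visited[original] = copy;              // boxed structs have no identity

        // Walk the class hierarchy so private fields of base classes are copied too.
        for (Type t = type; t != null; t = t.BaseType)
            foreach (FieldInfo field in t.GetFields(InstanceFields))
                if (!IsLeaf(field.FieldType))
                    field.SetValue(copy, Copy(field.GetValue(original), visited));
        return copy;
    }
}
```

Usage would look like `var copy = list.DeepCloneByReflection();`. The design choice is to start from a shallow `MemberwiseClone` and then replace only the fields that can hold references, so plain numeric and string data is never touched a second time.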
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


