andyJ - 2 months ago
C# Question

Removing elements from JSON based on a condition in C#

I have a JSON string that I want to be able to amend in C#. I want to delete a set of data when one of its child values matches a certain value.

Take the following:

{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"explainOther":"",
"fl":"*,score",
"indent":"on",
"start":"0",
"q":"*:*",
"hl.fl":"",
"qt":"",
"wt":"json",
"fq":"",
"version":"2.2",
"rows":"2"}
},
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":
[{
"id":"438500feb7714fbd9504a028883d2860",
"name":"John",
"dateTimeCreated":"2012-02-07T15:00:42Z",
"dateTimeUploaded":"2012-08-09T15:30:57Z",
"score":1.0
},
{
"id":"2f7661ae3c7a42dd9f2eb1946262cd24",
"name":"David",
"dateTimeCreated":"2012-02-07T15:02:37Z",
"dateTimeUploaded":"2012-08-09T15:45:06Z",
"score":1.0
}]
}}


There are two response results shown above. I want to be able to remove the whole parent result group when its child "id" value is matched. For example, if my id was "2f7661ae3c7a42dd9f2eb1946262cd24", I would want the second group to be deleted, and my result would look as follows.

{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"explainOther":"",
"fl":"*,score",
"indent":"on",
"start":"0",
"q":"*:*",
"hl.fl":"",
"qt":"",
"wt":"json",
"fq":"",
"version":"2.2",
"rows":"2"}},
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
{
"id":"438500feb7714fbd9504a028883d2860",
"name":"John",
"dateTimeCreated":"2012-02-07T15:00:42Z",
"dateTimeUploaded":"2012-08-09T15:30:57Z",
"score":1.0
}]
}}


I will need to perform multiple delete operations on the JSON file. The file could contain thousands of results, so I really need the most performant approach possible.

Any help greatly appreciated.

Answer

I've been trying to compress this into a nicer LINQ statement for the last 10 minutes or so, but the set of known IDs changes as the loop runs, and that changes how each later element is evaluated, so I'm probably not going to get that to happen.

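        // Requires: using System.Collections.Generic; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
        // knownIds was not declared in the original snippet. Seed it with the IDs to remove;
        // IDs seen during the loop are added, so later duplicates are removed as well.
        var knownIds = new HashSet<string> { "2f7661ae3c7a42dd9f2eb1946262cd24" };

        var jObj = (JObject)JsonConvert.DeserializeObject(json);
        var docsToRemove = new List<JToken>();
        foreach (var doc in jObj["response"]["docs"])
        {
            var id = (string)doc["id"];
            if (knownIds.Contains(id))
            {
                // Already known: mark the whole doc for removal.
                docsToRemove.Add(doc);
            }
            else
            {
                knownIds.Add(id);
            }
        }

        // Remove after the loop; changing the array while enumerating it is not safe.
        foreach (var doc in docsToRemove)
            doc.Remove();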

This seems to work well in the quick little console app I spun up to test it, but my testing was limited to the sample data above, so if there are any problems go ahead and leave a comment so I can fix them.
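If all you need is what the question literally asks for (drop every doc whose "id" is in a fixed set, with no duplicate tracking), something like the following might be enough. It's only a sketch: `DocFilter` and `RemoveDocsById` are names I made up for illustration, and it assumes Json.NET (Newtonsoft.Json) and the `response.docs` layout from the sample above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class DocFilter
{
    // Hypothetical helper: returns the JSON with every doc whose "id"
    // appears in idsToRemove deleted from response.docs.
    public static string RemoveDocsById(string json, ISet<string> idsToRemove)
    {
        var root = JObject.Parse(json);
        var docs = (JArray)root["response"]["docs"];

        // Collect the matches first, then remove them, so the array
        // is never modified while it is being enumerated.
        var matches = docs
            .Where(doc => idsToRemove.Contains((string)doc["id"]))
            .ToList();

        foreach (var doc in matches)
            doc.Remove();

        // Formatting.None writes compact JSON without indentation.
        return root.ToString(Formatting.None);
    }
}
```

Calling `DocFilter.RemoveDocsById(json, new HashSet<string> { "2f7661ae3c7a42dd9f2eb1946262cd24" })` on the sample should leave only the "John" doc. Passing all the IDs in one set means the file is parsed and walked once, rather than once per delete.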

For what it's worth, this runs in roughly linear time in the number of docs you feed it, provided knownIds is a hash-based set such as HashSet<string> (with a List<string>, each Contains call is itself a linear scan). That is about as much algorithmic performance as you're going to get without getting silly with this problem. One option that comes to mind is spinning each page of ~100 records off into its own task with the Task Parallel Library, where each worker handles its own little page and returns the cleaned JSON string. That would likely make this faster on a multi-core machine, and I'd be happy to provide some code to get you started on that, but it's also heavy overengineering for the scope of the problem as it's presented.
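For anyone curious, the per-page idea could look roughly like this. Treat it as a sketch under assumptions, not something I've benchmarked: `ParallelDocFilter`, `CleanPage` and `CleanPagesAsync` are hypothetical names, each page is assumed to be its own JSON string with the `response.docs` layout, and the set of IDs is assumed to be fixed and only read by the workers. The de-duplicating loop above mutates knownIds, so it would need a concurrent collection before it could be run this way.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class ParallelDocFilter
{
    // Hypothetical worker: cleans a single page and returns its JSON.
    private static string CleanPage(string pageJson, HashSet<string> idsToRemove)
    {
        var root = JObject.Parse(pageJson);
        var docs = (JArray)root["response"]["docs"];

        // Collect matches before removing so the array is not
        // modified during enumeration.
        var matches = docs
            .Where(doc => idsToRemove.Contains((string)doc["id"]))
            .ToList();

        foreach (var doc in matches)
            doc.Remove();

        return root.ToString(Formatting.None);
    }

    // Runs one task per page; idsToRemove is only read, never written,
    // while the tasks are running. Results keep the order of the pages.
    public static Task<string[]> CleanPagesAsync(
        IEnumerable<string> pages, HashSet<string> idsToRemove)
    {
        var tasks = pages.Select(page => Task.Run(() => CleanPage(page, idsToRemove)));
        return Task.WhenAll(tasks);
    }
}
```

Each task parses and edits its own JObject, so no JSON tree is shared between threads. Whether this beats the single loop depends on page size and core count, so measure before adopting it.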
