Hi,

Since last week, I have been trying to split a large file logically. I have a large transaction log file (3 GB) in JSON format. The log contains XML content (SoapUI requests and responses) captured with the ReadyAPI tool. I want to split this file into smaller files without breaking up any XML request or response, that is, to split it on request/response boundaries. The code below splits the file into smaller files by size in bytes, but it cuts the XML requests and responses in half. Is it possible to split the file by request and response instead of by byte count or line count?

Please help me with this.

Note: My log file looks like the sample below (the XML requests and responses are embedded in JSON).


"request":{"method":"POST","url":"http://10.174.104.40:8001/ws/de-partner-ppil_gw-v4-vsSTD/retrieveContact","httpVersion":"HTTP/1.1","cookies":[],"headers":[{"name":"SOAPAction","value":"\"\""},{"name":"Authorization","value":"Basic R1dQQzpHUF9QQ19pbnRlcm5fVGVzdA=="},{"name":"Accept","value":"text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2"},{"name":"Cache-Control","value":"max-age=259200"},{"name":"User-Agent","value":"Java/11.0.10"},{"name":"Connection","value":"keep-alive"},{"name":"X-Forwarded-For","value":"10.1.115.1"},{"name":"Host","value":"10.174.104.40:8001"},{"name":"Content-Length","value":"25916"},{"name":"Content-Type","value":"text/xml;charset=UTF-8"},{"name":"Via","value":"1.1 egress-https-proxy-1-vjrls (squid/3.5.20)"}],"queryString":[],"postData":{"mimeType":"text/xml;charset=UTF-8","params":[],"text":"\n<tns:envelope xmlns:tns="\"http://schemas.xmlsoap.org/soap/envelope/\"">\n <tns:header>\n <axahdr:contextheader xmlns:axahdr="\"http://aems.corp.intraxa/eip/2012/03/schemas/envelope\"">\n <axahdr:audittimestamp>2021-05-21T12:37:42.911+02:00\n <axahdr:addressing>\n <axahdr:messageid>5063a91c-1a80-4a2d-ae8d-bf48fd6e7463\n <axahdr:action>\n <axahdr:sourceendpoints>\n <axahdr:sourceendpoint>\n <wsa:address xmlns:wsa="\"http://www.w3.org/2005/08/addressing\"/">\n \n \n <axahdr:conversationid>\n <axahdr:taskid>GWPC-35907a43-583d-4479-9070-7f317707f14f\n \n <axahdr:messagemetadata>\n <axahdr:serviceid>de-partner-ppil_gw-v4-vsSTD\n <axahdr:serviceoperation>retrieveContact\n <axahdr:stage>DEV\n <axahdr:substage>DEV\n <axahdr:interaction>sync\n \n <axahdr:requesters>\n <axahdr:requester>\n <axahdr:opco>AXA.de\n <axahdr:businessdomain>\n <axahdr:applicationsystem>\n <axahdr:application>GWPC\n <axahdr:applicationinstanceid>timetravel-policycenter-05\n \n \n <axahdr:security>\n

What I have tried:

The code below splits the file into equal parts by size in bytes. It runs fine, but it does not respect request/response boundaries.

package soapui_Sample;
import java.io.BufferedOutputStream;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;


public class Split2
{
    public static void main(String[] args) throws IOException
    {   	
            RandomAccessFile raf = new RandomAccessFile("C:\\Users\\X921763\\Desktop\\Logs\\GW_TT_Functional_Virt-20210521_084950_250.har", "r");
            long numSplits = 5; //from user input, extract it from args
            long sourceSize = raf.length();
            System.out.println(sourceSize);
            long bytesPerSplit = sourceSize/numSplits ;
           System.out.println(bytesPerSplit);
            long remainingBytes = sourceSize % numSplits;

            int maxReadBufferSize = 8 * 1024; //8KB
            for(int destIx=1; destIx <= numSplits; destIx++) {
                BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("C:\\Users\\X921763\\Desktop\\Logs\\Outputlogs\\GW_TT_Functional_Virt-20210521_084950."+destIx));
                if(bytesPerSplit > maxReadBufferSize) {                	
                    long numReads = bytesPerSplit/maxReadBufferSize;
                   // System.out.println("numreads: "+numReads);
                    long numRemainingRead = bytesPerSplit % maxReadBufferSize;
                    //System.out.println("numRemainingRead: "+numRemainingRead);
                    for(int i=0; i<numReads; i++) {
                        readWrite(raf, bw, maxReadBufferSize);
                    }
                    if(numRemainingRead > 0) {
                        readWrite(raf, bw, numRemainingRead);
                    }
                }else {
                    readWrite(raf, bw, bytesPerSplit);
                }
                bw.close();
            }
            if(remainingBytes > 0) {
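                // Write the leftover bytes next to the other parts instead of to "split.N" in the working directory
                BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("C:\\Users\\X921763\\Desktop\\Logs\\Outputlogs\\GW_TT_Functional_Virt-20210521_084950."+(numSplits+1)));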
                readWrite(raf, bw, remainingBytes);
                bw.close();
            }
                raf.close();
        }

        // Copies exactly numBytes bytes from the current position of raf to bw.
        static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
            byte[] buf = new byte[(int) numBytes];
            // readFully fills the whole buffer or throws EOFException, so a short read
            // can never cause stale or zero bytes to be written to the output
            raf.readFully(buf);
            bw.write(buf);
        }
    }
Updated 25-May-21 20:32pm
Comments
Richard MacCutchan 26-May-21 3:28am    
You cannot split such a file by some arbitrary number of characters. You need to parse the content so you keep the request and response sections together in each small file.
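
The parsing approach described in the comment above could be sketched roughly as follows. This is only a minimal illustration under assumptions: it assumes the .har file follows the usual HAR layout, in which each request/response pair is one object inside an "entries" array, and that the JSON is well formed. The class name HarEntrySplitter, the split method, its parameters, and the part.N.json output names are all made up for this sketch. It uses only the JDK and scans the input one character at a time, so at most one entry is held in memory; a streaming JSON parser library would probably be the more robust choice.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of ReadyAPI or the code in the question.
class HarEntrySplitter {

    // Streams JSON from in and copies each object of the first "entries" array into
    // files named prefix.N.json under outDir, entriesPerFile objects per file.
    // Returns the number of files written. Assumes well-formed JSON.
    static int split(Reader in, Path outDir, String prefix, int entriesPerFile) throws IOException {
        if (entriesPerFile < 1) throw new IllegalArgumentException("entriesPerFile must be >= 1");
        StringBuilder token = new StringBuilder(); // current string literal (tracked only until the array is found)
        StringBuilder entry = null;                // non-null while copying one entry object
        String lastString = "";                    // last completed string literal since the previous comma
        boolean inString = false, escaped = false, done = false;
        int depth = 0;                             // nesting of { and [ outside string literals
        int entriesDepth = -1;                     // depth just inside the "entries" array, -1 until found
        int files = 0, inFile = 0;
        BufferedWriter out = null;
        try {
            int c;
            while (!done && (c = in.read()) != -1) {
                char ch = (char) c;
                if (entry != null) entry.append(ch);
                if (inString) {
                    // Braces and brackets inside strings (the embedded XML) must not change depth.
                    if (escaped) escaped = false;
                    else if (ch == '\\') escaped = true;
                    else if (ch == '"') { inString = false; lastString = token.toString(); }
                    else if (entriesDepth < 0) token.append(ch);
                    continue;
                }
                switch (ch) {
                    case '"':
                        inString = true;
                        token.setLength(0);
                        break;
                    case ',':
                        lastString = "";
                        break;
                    case '{':
                    case '[':
                        if (ch == '[' && entriesDepth < 0 && lastString.equals("entries")) {
                            entriesDepth = depth + 1;        // elements of this array are the entries
                        } else if (ch == '{' && depth == entriesDepth) {
                            entry = new StringBuilder("{");  // start of one request/response entry
                        }
                        depth++;
                        break;
                    case '}':
                    case ']':
                        depth--;
                        if (entry != null && depth == entriesDepth) {
                            if (out == null) {               // open the next part file (UTF-8 by default)
                                files++;
                                out = Files.newBufferedWriter(outDir.resolve(prefix + "." + files + ".json"));
                                out.write("[");
                            } else {
                                out.write(",\n");
                            }
                            out.append(entry);
                            entry = null;
                            if (++inFile == entriesPerFile) { // part is full: close it as a JSON array
                                out.write("]");
                                out.close();
                                out = null;
                                inFile = 0;
                            }
                        } else if (ch == ']' && depth == entriesDepth - 1) {
                            done = true;                     // end of the entries array
                        }
                        break;
                }
            }
            if (out != null) out.write("]");
        } finally {
            if (out != null) out.close();
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: HarEntrySplitter <input.har> <outputDir> <entriesPerFile>");
            return;
        }
        Path outDir = Files.createDirectories(Paths.get(args[1]));
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println("wrote " + split(in, outDir, "part", Integer.parseInt(args[2])) + " files");
        }
    }
}
```

Each output file is a plain JSON array of complete entries, not a full HAR document, so the "log" wrapper would need to be added back if the parts must be opened in a HAR viewer. Splitting by a fixed number of entries is just one choice; the same loop could rotate files once a byte budget is exceeded, as long as it only rotates between entries.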

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


