Markus Markus - 5 months ago 7
Java Question

Split string after approx. 200 chars at the next delimiter (Java)

I would like to split a string after approx. 200 chars, at the next delimiter character (here "|"):

The string is formatted like <data>|...|<data>|, where one <data> block is between 30 and 70 chars.

My desired result would be a String array like

<data>|<data>|
<data>|
<data>|<data>|<data>|


where every line is approx 200 chars long.

My code looks like

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

public class RegexpTest {

    @Test
    public void testRegexp() throws Exception {
        String data = "Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|";
        String pat = ".{1,200}(\\d|\\s|\\w|\\.|\\:{1,70})\\|";
        String ans = data.replaceAll(pat, "X");
        //Pattern regex = Pattern.compile(pat);
        //Matcher regexMatcher = regex.matcher(str);

        System.out.println(data.length()); //prints 528
        System.out.println(ans.length()); //prints 3
    }
}


This produces the correct number of replacements (3), but the overall result should be a String array containing the matched chunks.

Is there a regexp (similar to SO Q&A) that could handle this problem? A solution with for loops is also acceptable.

Scratch Pad:

Feel free to test on regex101.com (includes my attempt and the test data)

Answer

Without regex: split the data at the "|", then check whether adding the next part to the current line would exceed 200 characters. If it would, start a new line. Quick and dirty:

edit: added comments and formatting

import java.util.ArrayList;
import java.util.List;

public class Test {

    public static void main(String[] args) {
        // your data
        String data = "Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|Symbol Ticker:1466654463000:157.71:TRADE:42|";
        // do the split
        List<String> out = new Test().splitToApproxAt(data, 200);
        // print the split lines
        for (String o : out) {
            System.out.println(o);
        }
    }

    public List<String> splitToApproxAt(String data, int len) {
        // this will store the finished lines of up to len chars
        List<String> out = new ArrayList<String>();
        if (data.isEmpty()) {
            // nothing to split
            return out;
        }

        // split at the pipe symbol "|" (split drops trailing empty strings)
        String[] parts = data.split("\\|");

        // this will be our current line in progress
        StringBuilder line = new StringBuilder();

        // for every data part
        for (String part : parts) {
            // re-attach the pipe so every block ends with "|",
            // as in the desired output
            String block = part + "|";

            // would this block push the current line past len chars?
            if (line.length() > 0 && line.length() + block.length() > len) {
                // yes! add the current line to the output
                // and start a new one
                out.add(line.toString());
                line.setLength(0);
            }
            // a single block longer than len still gets a line of its own
            line.append(block);
        }
        // add the last line to the output
        if (line.length() > 0) {
            out.add(line.toString());
        }
        return out;
    }
}
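
Since the question also asked for a regexp, here is a minimal regex-based sketch of the same greedy idea. It is not part of the original answer; the class name RegexChunker and the method splitWithRegex are made up for illustration. It assumes that len counts the trailing pipe, that a block longer than len should get a chunk of its own, and that any text after the last pipe is kept as a final chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexChunker {

    // Hypothetical helper: cuts data into chunks of at most len chars,
    // each ending at a pipe where possible.
    public static List<String> splitWithRegex(String data, int len) {
        if (len < 1) {
            throw new IllegalArgumentException("len must be positive");
        }
        // First alternative: greedily take up to len-1 chars followed by a pipe,
        // i.e. the longest chunk of at most len chars that ends at a pipe.
        // Second alternative: fallback for a single block longer than len.
        Pattern pattern = Pattern.compile(
                ".{0," + (len - 1) + "}\\||[^|]*\\|", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(data);

        List<String> out = new ArrayList<>();
        int lastEnd = 0;
        while (matcher.find()) {
            out.add(matcher.group());
            lastEnd = matcher.end();
        }
        // Keep any leftover text that is not terminated by a pipe.
        if (lastEnd < data.length()) {
            out.add(data.substring(lastEnd));
        }
        return out;
    }

    public static void main(String[] args) {
        // Rebuild the test data from the question: 12 identical blocks.
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 12; i++) {
            data.append("Symbol Ticker:1466654463000:157.71:TRADE:42|");
        }
        for (String chunk : splitWithRegex(data.toString(), 200)) {
            System.out.println(chunk.length() + ": " + chunk);
        }
    }
}
```

With the question's data and len = 200, this yields three chunks of 176 chars (four blocks each), matching the three replacements the original pattern produced. Call toArray(new String[0]) on the returned list if a String array is required.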