Umuril Lyerood Umuril Lyerood - 4 months ago 17
Java Question

From file of path to unique data structure

I'm reading from a file a list of paths. I want to save them in a built-in java structure that can erase the duplicates automatically. By duplicates I mean if I have

/usr/bin
and then I add
/usr
the
bin
folder has to be erased because is "contained" inside the
usr
folder. I read the file sequentially so I'd prefer to not have to check all data twice, if possible.

Example code:

UnknownType<Path> database;
BufferedReader reader = new BufferedReader(new FileReader(new File("db.txt")));

String line;
while ((line = reader.readLine()) != null) {
Path path = Paths.get(line).toRealPath();
database.add(path);
}


Example file:

/usr/bin
/usr
/dev
/dev/sda1
/dev/sda2
/home/user/Desktop/file.txt
/home/user/Documents/file2.txt
/home/user/Documents/file3.txt


Expected output:

data structure containing paths:
/usr
/dev
/home/user/Desktop/file.txt
/home/user/Documents/file2.txt
/home/user/Documents/file3.txt

qxz qxz
Answer

A tree-based solution (probably more efficient):

class Database {

  public void add(String p) {
    root.add(Arrays.asList(p.split("\\\\|/")), 0);
  }

  public void addAll(Collection<? extends String> list) {
    for (String p : list)
    add(p);
  }

  public List<String> getPathsList() {
    ArrayList<String> list = new ArrayList<>();
    root.listPaths(list, "");
    return list;
  }

  PathNode root = new PathNode("");

  static class PathNode {

    public final String name;
    public Map<String, PathNode> children = new HashMap<>();

    public PathNode(String name) {
      this.name = name;
    }

    public boolean isLeaf() {
      return children.size()==0;
    }

    public boolean isRoot() {
      return name.isEmpty();
    }

    public void add(List<String> path, int i) {
      String childName = path.get(i);
      PathNode child = children.get(childName);

      if (child != null) {
        if (path.size()-i <= 1) child.children.clear();
        else child.add(path, i+1);
      } else if (!isLeaf() || isRoot()) {
        PathNode node = this;
        for (; i < path.size(); i++) {
          String key = path.get(i);
          node.children.put(key, node = new PathNode(key));
        }
      }
    }

    public void listPaths(ArrayList<String> list, String prefix) {
      for (PathNode child : children.values()) {
        if (child.isLeaf()) list.add(prefix+child.name);
        else child.listPaths(list, prefix+child.name+File.separator);
      }
    }

  }

}

Test to verify correctness: http://ideone.com/cvqEVT

This implementation will accept both Windows and Unix paths when running on any platform. The paths returned by Database.getPathsList() will still use the OS's file separator; you could change that by changing File.separator in Database.PathNode.listPaths (the last line of real code).