Alaa Nassef - 12 days ago
Git Question

Listing or archiving non-binary files from a git repo

I'm currently working on a Java project using JGit. I haven't used JGit yet, but I'm assuming that its functionality is much the same as what comes with normal git.

What I'm trying to do is to fetch all non-binary files, and files below a certain size, from a branch of a bare git repo and archive them in a zip file. This task might be simple for a repo with a working directory, since I can simply use

git grep -Ic ''

to list all non-binary files, and then pass those files to

git archive

However, this is not doable for bare repositories.

Would be grateful for your help.

Answer

You can use JGit's ArchiveCommand to produce an archive. Its setPaths() method allows you to select only certain paths to be included.

In order to assemble the list of paths you would want to analyze the tree of the commit to be archived. For example:

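// Uses org.eclipse.jgit.treewalk.TreeWalk and org.eclipse.jgit.lib.ObjectId
List<String> filesToArchive = new ArrayList<>();
try( TreeWalk treeWalk = new TreeWalk( repository ) ) {
  treeWalk.setRecursive( true ); // descend into sub-trees so that only files are visited
  treeWalk.addTree( commit.getTree() );
  while( treeWalk.next() ) {
    // id of the current blob, available if isBinary() needs to load the content
    ObjectId blobId = treeWalk.getObjectId( 0 );
    if( !isBinary( treeWalk ) ) {
      filesToArchive.add( treeWalk.getPathString() );
    }
  }
}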

The example code walks the entire tree of the commit to be archived, obtains the object id of each file in the tree and calls the fictional isBinary() method to determine whether the file is text or binary. All non-binary files are added to the filesToArchive collection, whose paths can then be passed to the ArchiveCommand.
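As an alternative to the fictional isBinary() above, here is a minimal content-based sketch that uses only the Java standard library. It assumes the blob content has already been loaded into a byte array (with JGit this would typically go through the repository's ObjectLoader). It treats content as binary if a NUL byte appears near the start, which is similar in spirit to the check git itself performs. The class name BinaryCheck, the shouldArchive() helper and the maxSizeBytes parameter are hypothetical names introduced for illustration, and combining the size and binary conditions with AND is one possible reading of the question.

```java
import java.nio.charset.StandardCharsets;

final class BinaryCheck {
    // Number of leading bytes to inspect; git's own heuristic samples
    // roughly the first 8000 bytes of a file.
    static final int FIRST_FEW_BYTES = 8000;

    // Heuristic: content counts as binary if a NUL byte occurs
    // within the inspected window.
    static boolean isBinary(byte[] content) {
        int limit = Math.min(content.length, FIRST_FEW_BYTES);
        for (int i = 0; i < limit; i++) {
            if (content[i] == 0) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical filter: keep a file only if it is small enough
    // and does not look binary.
    static boolean shouldArchive(byte[] content, long maxSizeBytes) {
        return content.length <= maxSizeBytes && !isBinary(content);
    }

    public static void main(String[] args) {
        byte[] text = "hello\nworld\n".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', 0, 1, 2};
        System.out.println(isBinary(text));         // false
        System.out.println(isBinary(binary));       // true
        System.out.println(shouldArchive(text, 5)); // false: 12 bytes exceed the limit
    }
}
```

Inside the tree walk, a path would be added to filesToArchive only when shouldArchive() returns true for the blob's content. Because this is a heuristic, text stored in an encoding that contains NUL bytes, such as UTF-16, would be classified as binary.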

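For the isBinary() implementation, you may be able to use JGit's attribute support, which reports whether the binary attribute is set for the path (for example through a .gitattributes entry) rather than inspecting the file content: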

Attributes attributes = new AttributesHandler( treeWalk ).getAttributes();
boolean binary = attributes.isSet( AttributesHandler.BINARY_RULE_KEY );

AttributesHandler::getAttributes() returns the merged attributes for the current path represented by the treeWalk.
