manf manf - 25 days ago 9
HTTP Question

Parsing a smart HTTP git-upload-pack response with an early(?) end marker

I'm trying to download and read a simple Git repository over the smart HTTP transfer protocol, but the resulting file contains an end sequence before any object is referenced. I started by downloading a copy of a random repository from GitHub (Bitbucket produced a similar file) using a URL of the form https://github.com/user/repo.git/info/refs?service=git-upload-pack (user and repo are placeholders). This resulted in a file such as the following, taken and shortened from the Git repository on GitHub:

001e# service=git-upload-pack
000000fabe5a750939c212bc0781ffa04fabcfd2b2bd744e HEAD multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2:2.6.5~peff-attributes-nofollow-1622-gbbc42c6
003eac84098b7e32406a982ac01cc76a663d5605224b refs/heads/maint
003fbe5a750939c212bc0781ffa04fabcfd2b2bd744e refs/heads/master
003db27dc33dac678b815097aa6e3a4b5db354285f57 refs/heads/next
003b0962616cb70317a1ca3e4b03a22b51a0095e2326 refs/heads/pu
003d0b3e657f530ecba0206e6c07437d492592e43210 refs/heads/todo
003ff0d0fd3a5985d5e588da1e1d11c85fba0ae132f8 refs/pull/10/head
00403fed6331a38d9bb19f3ab72c91d651388026e98c refs/pull/10/merge
...
004549fa3dc76179e04b0833542fa52d0f287a4955ac refs/tags/v2.9.0-rc2^{}
003e47e8b7c56a5504d463ee624f6ffeeef1b6d007c1 refs/tags/v2.9.1
00415c9159de87e41cf14ec5f2132afb5a06f35c26b3 refs/tags/v2.9.1^{}
003ee6eeb1a62fdd0ac7b66951c45803e35f17b2b980 refs/tags/v2.9.2
0041e634160bf457f8b3a91125307681c9493f11afb2 refs/tags/v2.9.2^{}
003ef883596e997fe5bcbc5e89bee01b869721326109 refs/tags/v2.9.3
0041e0c1ceafc5bece92d35773a75fff59497e1d9bd5 refs/tags/v2.9.3^{}
0000
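
For reference, the four hex digits at the start of each line in the listing are the length of the whole pkt-line, counting the four digits themselves and the trailing LF. A minimal sketch that reproduces those prefixes (the class and method names are hypothetical and not taken from Git or JGit, and the size check only guards the four-digit field):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Git or JGit: frames a payload as a pkt-line. */
final class PktLineLength {
    /** Prefixes the payload with its total length (payload bytes + 4 header bytes) as 4 hex digits. */
    static String encode(String payload) {
        int total = payload.getBytes(StandardCharsets.UTF_8).length + 4;
        if (total > 0xffff) {
            // Assumption: only guard the 4-digit field; real servers use a lower limit.
            throw new IllegalArgumentException("payload too long for a 4-digit hex length");
        }
        return String.format("%04x", total) + payload;
    }

    public static void main(String[] args) {
        // Both payloads are copied from the listing above; LF is the line terminator.
        System.out.print(encode("# service=git-upload-pack\n"));
        System.out.print(encode("ac84098b7e32406a982ac01cc76a663d5605224b refs/heads/maint\n"));
    }
}
```

Read this way, the start of the second line of the listing, "000000fa...", is two packets back to back: a "0000" packet with no payload, followed by a packet whose length is 0x00fa.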


I used the following sources for information regarding parsing:

Protocol Doc

Git Book

JGit Reference Implementation

I did not find anything in the sources above that documents the sequence "0000" at this position, yet Git clients can clone nonetheless.

A brief inspection of the original Git source code ("pkt-line.(c|h)") did not produce any new findings. The following Java program illustrates the problem: it prints "true" in the second println statement. So apparently "0000" is parsed and treated as the end, and the next statement then evaluates "00fa" and prints the following line. From this observation I can only assume that I am missing some detail, that popular Git clients and servers have an implementation flaw, or that the protocol documentation is unclear. Any help is appreciated!

PS: I am aware that "0000" can mean flush, but that is not specified in this service request.

import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Read Git style pkt-line formatting from an input stream.
 * <p>
 * This class is not thread safe and may issue multiple reads to the underlying
 * stream for each method call made.
 * <p>
 * This class performs no buffering on its own. This makes it suitable to
 * interleave reads performed by this class with reads performed directly
 * against the underlying InputStream.
 */
public class PacketLineIn {
    /**
     * Magic return from {@link #readString()} when a flush packet is found.
     */
    public static final String END = new StringBuilder(0).toString(); /* must not string pool */

    private final InputStream in;

    private final byte[] lineBuffer;

    /**
     * Create a new packet line reader.
     *
     * @param i the input stream to consume.
     */
    public PacketLineIn(final InputStream i) {
        in = i;
        lineBuffer = new byte[1000];
    }

    /**
     * Read a single UTF-8 encoded string packet from the input stream.
     * <p>
     * If the string ends with an LF, it will be removed before returning the
     * value to the caller. If this automatic trimming behavior is not desired,
     * use {@link #readStringRaw()} instead.
     *
     * @return the string. {@link #END} if the string was the magic flush
     * packet.
     * @throws IOException the stream cannot be read.
     */
    public String readString() throws IOException {
        int len = readLength();
        if (len == 0) {
            return END;
        }

        len -= 4; // length header (4 bytes)
        if (len == 0) {
            return ""; //$NON-NLS-1$
        }
        byte[] raw;
        if (len <= lineBuffer.length) {
            raw = lineBuffer;
        } else {
            raw = new byte[len];
        }

        readFully(in, raw, 0, len);
        if (raw[len - 1] == '\n') {
            len--;
        }
        return decodeNoFallback(StandardCharsets.UTF_8, raw, 0, len);
    }

    /**
     * Read a single UTF-8 encoded string packet from the input stream.
     * <p>
     * Unlike {@link #readString()} a trailing LF will be retained.
     *
     * @return the string. {@link #END} if the string was the magic flush
     * packet.
     * @throws IOException the stream cannot be read.
     */
    public String readStringRaw() throws IOException {
        int len = readLength();
        if (len == 0) {
            return END;
        }

        len -= 4; // length header (4 bytes)

        byte[] raw;
        if (len <= lineBuffer.length) {
            raw = lineBuffer;
        } else {
            raw = new byte[len];
        }

        readFully(in, raw, 0, len);
        return decodeNoFallback(StandardCharsets.UTF_8, raw, 0, len);
    }

    /**
     * Read the 4 hex digit length header of the next packet.
     *
     * @return 0 for a flush packet, otherwise the total packet length
     * (header included).
     * @throws IOException the header is malformed or the stream cannot be read.
     */
    int readLength() throws IOException {
        readFully(in, lineBuffer, 0, 4);
        int len;
        try {
            len = parseInt16(lineBuffer, 0);
        } catch (NumberFormatException err) {
            throw new IOException("Invalid pkt-line length header: "
                    + new String(lineBuffer, 0, 4, StandardCharsets.US_ASCII), err);
        }
        if (len != 0 && len < 4) {
            throw new IOException("Invalid pkt-line length: " + len);
        }
        return len;
    }

    private static void readFully(InputStream in, byte[] buffer, int off, int length) throws IOException {
        // InputStream.read may return fewer bytes than requested, so loop
        // until the requested amount has arrived or the stream ends.
        while (length > 0) {
            final int n = in.read(buffer, off, length);
            if (n < 0) {
                throw new EOFException("Unexpected end of stream, " + length + " more bytes expected");
            }
            off += n;
            length -= n;
        }
    }

    private static int parseInt16(final byte[] args, int start) {
        // The length header is always 4 ASCII hex digits.
        return Integer.parseInt(new String(args, start, 4, StandardCharsets.US_ASCII), 16);
    }

    public static String decodeNoFallback(final Charset cs,
            final byte[] buffer, final int start, final int end)
            throws CharacterCodingException {
        ByteBuffer b = ByteBuffer.wrap(buffer, start, end - start);
        b.mark();

        // Try our built-in favorite. The assumption here is that
        // decoding will fail if the data is not actually encoded
        // using that encoder.
        try {
            return decode(b, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            b.reset();
        }

        if (!cs.equals(StandardCharsets.UTF_8)) {
            // Try the suggested encoding, it might be right since it was
            // provided by the caller.
            try {
                return decode(b, cs);
            } catch (CharacterCodingException e) {
                b.reset();
            }
        }

        // Try the default character set. A small group of people
        // might actually use the same (or very similar) locale.
        Charset defcs = Charset.defaultCharset();
        if (!defcs.equals(cs) && !defcs.equals(StandardCharsets.UTF_8)) {
            try {
                return decode(b, defcs);
            } catch (CharacterCodingException e) {
                b.reset();
            }
        }

        throw new CharacterCodingException();
    }

    private static String decode(final ByteBuffer b, final Charset charset)
            throws CharacterCodingException {
        final CharsetDecoder d = charset.newDecoder();
        d.onMalformedInput(CodingErrorAction.REPORT);
        d.onUnmappableCharacter(CodingErrorAction.REPORT);
        return d.decode(b).toString();
    }

    public static void main(String[] args) throws IOException {
        final Path input = Paths.get("refs");
        try (InputStream stream = new FileInputStream(input.toFile())) {
            final PacketLineIn line = new PacketLineIn(stream);
            System.out.println(line.readString()); // the "001e" service line
            System.out.println(line.readString() == PacketLineIn.END); // the "0000" packet
            System.out.println(line.readString()); // the "00fa" HEAD line
        }
    }
}

Answer

This is a flush-pkt. It is defined in "Documentation Common to Pack and Http Protocols":

A pkt-line with a length field of 0 ("0000"), called a flush-pkt, is a special case and MUST be handled differently than an empty pkt-line ("0004").
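
To make the distinction in that quote concrete, here is a small sketch that classifies a length header. The names are hypothetical and not taken from Git or JGit, and lengths 1 to 3 are simply rejected, as in the question's readLength():

```java
/** Hypothetical sketch: tells a flush-pkt apart from an empty or regular pkt-line. */
final class PktHeader {
    enum Kind { FLUSH, EMPTY, DATA }

    /** Classifies the 4 hex digit length field of a pkt-line. */
    static Kind classify(String header) {
        if (header.length() != 4) {
            throw new IllegalArgumentException("header must be 4 hex digits");
        }
        int len = Integer.parseInt(header, 16);
        if (len == 0) {
            return Kind.FLUSH; // "0000": no payload follows, not even an empty one
        }
        if (len < 4) {
            throw new IllegalArgumentException("invalid pkt-line length: " + len);
        }
        // The length counts the header itself, so 4 means a zero-byte payload.
        return len == 4 ? Kind.EMPTY : Kind.DATA;
    }

    public static void main(String[] args) {
        for (String h : new String[] {"0000", "0004", "00fa"}) {
            System.out.println(h + " -> " + classify(h));
        }
    }
}
```

A reader built on this would treat FLUSH as a separator between parts of the stream and keep reading, rather than as the end of the input.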

It is used throughout "Packfile transfer protocols". It is also mentioned, though not by name, in "HTTP transfer protocols":

smart_reply     =  PKT-LINE("# service=$servicename" LF)
                   ref_list
                   "0000"

and

compute_request   =  want_list
                     have_list
                     request_end
request_end       =  "0000" / "done"
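
As a rough illustration of reading a smart_reply, the sketch below walks an in-memory reply and collects the advertised refs. It rests on a few assumptions: the class and method names are hypothetical, the input is ASCII, a flush-pkt may directly follow the service line (as in the response quoted in the question), and the capabilities on the first ref line are separated from the ref name by a NUL byte as described in the pack protocol documentation (the listing in the question does not show that byte).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: extracts ref name -> object id from a smart_reply string. */
final class SmartReplyRefs {
    static Map<String, String> parse(String reply) {
        Map<String, String> refs = new LinkedHashMap<>();
        boolean sawService = false;
        int pos = 0;
        while (pos + 4 <= reply.length()) {
            int len = Integer.parseInt(reply.substring(pos, pos + 4), 16);
            if (len == 0) {              // flush-pkt
                pos += 4;
                if (!refs.isEmpty()) {
                    break;               // flush after ref_list: end of the reply
                }
                continue;                // flush after the service line: keep reading
            }
            if (len < 4 || pos + len > reply.length()) {
                throw new IllegalArgumentException("bad pkt-line length at offset " + pos);
            }
            String line = reply.substring(pos + 4, pos + len);
            pos += len;
            if (line.endsWith("\n")) {
                line = line.substring(0, line.length() - 1);
            }
            if (!sawService) {
                if (!line.startsWith("# service=")) {
                    throw new IllegalArgumentException("missing service line");
                }
                sawService = true;
                continue;
            }
            int nul = line.indexOf('\0'); // drop the capability list, if present
            if (nul >= 0) {
                line = line.substring(0, nul);
            }
            int sp = line.indexOf(' ');
            if (sp < 0) {
                throw new IllegalArgumentException("malformed ref line: " + line);
            }
            refs.put(line.substring(sp + 1), line.substring(0, sp));
        }
        return refs;
    }

    public static void main(String[] args) {
        // Object ids and ref names are copied from the question; the capability list is shortened.
        String head = "be5a750939c212bc0781ffa04fabcfd2b2bd744e HEAD\0multi_ack\n";
        String reply = "001e# service=git-upload-pack\n"
                + "0000"
                + String.format("%04x", head.length() + 4) + head
                + "003eac84098b7e32406a982ac01cc76a663d5605224b refs/heads/maint\n"
                + "0000";
        parse(reply).forEach((name, id) -> System.out.println(name + " " + id));
    }
}
```

The point of the sketch is that the first "0000" does not stop the loop; only the flush-pkt that follows the ref list does.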