Tribe - 1 month ago
Git Question

Get files from each git commit

I have about 160k commits, each updating 3 files (I've been using GitHub as a website), and I'm looking for a way to get the files so I can then put their contents into a real DB.

My question is: how can I get (download?) the updated files from each commit and save them to a folder, with a timestamp or commit SHA appended to each name to avoid naming conflicts?

Is this possible with git? I know I can use the GitHub site to see the files and what has changed, but the problem is that there are over 160k commits.


This is not the most elegant solution but it should work.

First you have to get a local copy of the repository using:

git clone <repo-url>

You get the <repo-url> from the GitHub page of your project (check the "Clone or download" button).

Then you cd into the local repo and run something along these lines:

for rev in $(git log --format=%H); do
    git checkout "$rev" -- file1
    cp file1 "../history/file1-$rev"
done

Make sure you create the history directory in advance; the loop expects it one level above the repository (../history). Duplicate the two lines inside the loop for each file you need to get.

Run git reset --hard at the end to restore the repository to its original state.
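As a variation on the loop above, the following is a minimal sketch that copies only the files each commit actually added or modified, and reads them with git show so the working tree is never touched. The function name export_changed_files and its two arguments are hypothetical, chosen for illustration. It assumes plain file names (no newlines or unusual characters) and skips merge commits, because git diff-tree prints no file list for them by default.

```shell
# export_changed_files REPO_DIR OUT_DIR   (hypothetical helper name)
# For every commit, write each file that the commit added or modified
# to OUT_DIR as <path-with-slashes-replaced>-<sha>.
export_changed_files() {
    repo=$1
    out=$2
    mkdir -p "$out"
    git -C "$repo" log --format=%H | while read -r rev; do
        # --root makes the very first commit list its files too;
        # --diff-filter=AM keeps added and modified files only.
        git -C "$repo" diff-tree --root --no-commit-id --name-only -r \
            --diff-filter=AM "$rev" |
        while read -r path; do
            # Flatten sub-directory paths so names cannot collide.
            name=$(printf '%s' "$path" | tr '/' '_')
            git -C "$repo" show "$rev:$path" > "$out/$name-$rev"
        done
    done
}
```

It could be called as export_changed_files . ../history from inside the cloned repository. Since nothing is checked out, no git reset is needed afterwards.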

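If you also need the timestamp of the file, you can get it using git log -1 --format=%ct $rev -- file1, which prints the committer date (as a Unix timestamp) of the last commit up to $rev that changed file1. Without -1 the command prints one timestamp per matching commit, which would break the file name. Replace the cp command with:

ts=$(git log -1 --format=%ct "$rev" -- file1)
cp file1 "../history/file1-$rev-$ts"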
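With 160k commits, running a separate git log per commit gets slow. A possible sketch, under the assumption that one file is exported at a time, asks git log once for both the SHA and the timestamp of every commit that added or modified the file. The function name save_file_versions and its arguments are hypothetical; the timestamp is placed before the SHA so that a plain directory listing sorts chronologically.

```shell
# save_file_versions REPO_DIR FILE OUT_DIR   (hypothetical helper name)
# Write every committed version of FILE to OUT_DIR as
# <basename>-<committer-timestamp>-<sha>.
save_file_versions() {
    repo=$1
    file=$2
    out=$3
    mkdir -p "$out"
    # One log call prints "<sha> <timestamp>" per commit that added or
    # modified FILE; commits that deleted it are filtered out.
    git -C "$repo" log --diff-filter=AM --format='%H %ct' -- "$file" |
    while read -r rev ts; do
        git -C "$repo" show "$rev:$file" > "$out/$(basename "$file")-$ts-$rev"
    done
}
```

For example, save_file_versions . file1 ../history run from inside the repository; repeat the call for each of the three files.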

Check the documentation for other file or commit properties you can get using git log.
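As an illustration of those other properties, here is a small sketch that writes one tab-separated line per commit (SHA, committer timestamp, author name, subject), which may be convenient when loading the history into a database. The function name write_manifest and the output path are hypothetical; %H, %ct, %an, %s and %x09 (a tab byte) are standard git log format placeholders.

```shell
# write_manifest REPO_DIR OUT_FILE   (hypothetical helper name)
# Write one line per commit: sha <TAB> timestamp <TAB> author <TAB> subject.
write_manifest() {
    repo=$1
    out_file=$2
    git -C "$repo" log --format='%H%x09%ct%x09%an%x09%s' > "$out_file"
}
```

For example, write_manifest . ../history/manifest.tsv. A subject line that itself contains a tab would add an extra column, so treat the last field as "rest of line" when parsing.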