I have about 160k commits, each updating 3 files (I've been using GitHub as a website), and I'm looking for a way to get the files so I can put their contents into a real DB.
My question is: how can I get (download?) the updated files from each commit and save them to a folder, with a timestamp or commit SHA appended to each name to avoid naming conflicts?
Is this possible with git? I know I can use the GitHub site to see the files and what has changed, but the problem is that there are over 160k commits.
This is not the most elegant solution but it should work.
First you have to get a local copy of the repository using:
git clone <repo-url>
You get the <repo-url> from the GitHub page of your project (check the "Clone or download" button).
cd into the local repo and run something along these lines:
for rev in $(git log --format=%H); do
    git checkout "$rev" -- file1
    cp file1 "../history/file1-$rev"
done
Make sure you create the history directory in advance (the loop writes to ../history, so it sits next to the repository). Duplicate the two lines inside the loop for each file you need to get.
Run git reset --hard at the end to leave the repository in its original state.
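As a variation on the loop above, here is a minimal sketch that reads each version straight from the object database with git show, so the working tree is never touched and no git reset --hard is needed afterwards. The function name export_versions and its arguments are made up for illustration. It assumes you run it from the repository root, and it only visits commits that changed the given file, skipping any commit in which the file was deleted.

```shell
# export_versions OUT_DIR FILE...   (hypothetical helper, not part of git)
# Writes every committed version of each FILE to OUT_DIR/<basename>-<sha>.
# Assumes the current directory is the repository root.
export_versions() {
    local out=$1
    shift
    mkdir -p "$out"
    local f rev
    for f in "$@"; do
        # Limit the walk to commits that actually changed this file.
        git log --format=%H -- "$f" | while read -r rev; do
            # Skip commits where the file does not exist (e.g. it was deleted).
            git cat-file -e "$rev:$f" 2>/dev/null || continue
            # "<rev>:<path>" names the file's content as of that commit.
            git show "$rev:$f" > "$out/$(basename "$f")-$rev"
        done
    done
}
```

From the repository root you would call it as export_versions ../history file1 file2 file3. With 160k commits this still starts a couple of git processes per version, so expect it to take a while.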
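If you also need the timestamp of the file, you can get it using git log -1 --format=%ct $rev -- file1 (the -1 limits the output to the most recent commit that touched file1 as of $rev; without it you get one timestamp per commit). Replace the cp command with:
ts=$(git log -1 --format=%ct "$rev" -- file1)
cp file1 "../history/file1-$rev-$ts"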
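If calling git log once more per commit turns out to be slow, the SHA and the timestamp can be read together in a single pass. This is only a sketch under the same assumptions as before: the function name export_with_timestamps is hypothetical, it is run from the repository root, and it reads file contents with git show rather than git checkout plus cp.

```shell
# export_with_timestamps OUT_DIR FILE   (hypothetical helper, not part of git)
# Writes every committed version of FILE to OUT_DIR/<basename>-<sha>-<timestamp>.
# Assumes the current directory is the repository root.
export_with_timestamps() {
    local out=$1 f=$2 rev ts
    mkdir -p "$out"
    # %H is the commit SHA, %ct the committer date as a Unix timestamp.
    git log --format='%H %ct' -- "$f" | while read -r rev ts; do
        # Skip commits where the file does not exist (e.g. it was deleted).
        git cat-file -e "$rev:$f" 2>/dev/null || continue
        git show "$rev:$f" > "$out/$(basename "$f")-$rev-$ts"
    done
}
```

Call it once per file, for example export_with_timestamps ../history file1. Swapping $rev and $ts in the output name would make a plain directory listing sort roughly chronologically, if that is more convenient for loading into the DB.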