Note: The operations presented in this blog post rewrite git history. Be careful, and back up your repo first if you're not sure what you're doing.
A few weeks ago I froze gems on my blog and ended up with a very big repository. So I wanted to clean up the mess and permanently remove the gems folder from the repository.
git rm wasn't doing the job: it only removes the folder from the working tree and the index, and the repository still contains the objects of this folder in its history. After a quick search, I found that git-filter-branch was the command I was looking for.
So, you can permanently remove a folder from a git repository with:
git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD
This goes through the whole commit history reachable from HEAD, one commit at a time: it checks out each commit, runs the command, and rewrites the commit with the resulting tree.
We pass -r (recursive) to rm to remove the folder together with its contents, and -f (force) to ignore nonexistent files, since the folder may not exist yet in some of the commits in the range we filter. Without -f, rm would fail on those commits and the filter would abort.
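To try the command without risking a real repository, here is a minimal sketch that builds a throwaway repo in a temporary directory, runs the tree filter, and counts how many commits still touch vendor/gems. The file names, commit messages and user identity are made up for the demo. FILTER_BRANCH_SQUELCH_WARNING is set only because newer git versions print a warning and pause before filtering.

```shell
# Sandbox demo: every path, message and identity below is illustrative only.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer git versions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

mkdir -p vendor/gems
echo "frozen gem" > vendor/gems/big.gem
git add vendor
git commit -q -m "Freeze gems"

echo "more" >> README
git commit -q -am "Update README"

# Rewrite every commit reachable from HEAD, deleting the folder from each tree.
git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD >/dev/null 2>&1

# Count commits on the rewritten branch that still touch the folder.
remaining=$(git log --oneline HEAD -- vendor/gems | wc -l | tr -d ' ')
total=$(git rev-list --count HEAD)
echo "commits still touching vendor/gems: $remaining (of $total)"
```

Note that the commit which only added the gems is kept as an empty commit, so the total number of commits does not change.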
You can also restrict the filter to a range of commits:
git filter-branch --tree-filter 'rm -rf vendor/gems' 7b3072c..HEAD
The starting commit of the range (7b3072c here) is itself not filtered; only the commits after it are rewritten.
If you run filter-branch again later, you have to pass the -f option to overwrite the backup in refs/original/, where git stores the original refs from the previous run:
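Before rewriting a range, it can help to preview which commits it covers. A small sketch, again in a throwaway repo with made-up commits: git rev-list A..HEAD lists the commits reachable from HEAD but not from A, so A itself never shows up, which is why the starting commit is left alone. The variable start stands in for 7b3072c.

```shell
# Sandbox demo: commits and file names are illustrative only.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

for n in 1 2 3 4; do
  echo "$n" > "file$n"
  git add "file$n"
  git commit -q -m "Commit $n"
done

start=$(git rev-parse HEAD~2)   # plays the role of 7b3072c
in_range=$(git rev-list --count "$start"..HEAD)
echo "commits a filter over $start..HEAD would rewrite: $in_range"

# The starting commit is not part of its own range.
if git rev-list "$start"..HEAD | grep -q "$start"; then
  start_state=included
else
  start_state=excluded
fi
echo "start commit is $start_state"
```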
git filter-branch -f --tree-filter 'rm -rf vendor/gems' HEAD
You can also remove the original refs by hand, or back them up to another location first.
rm -rf .git/refs/original/
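For the backup option, one possibility (not part of the steps in this post, just a sketch) is git bundle, which packs all refs, including anything under refs/original, into a single file you can keep outside the repo. The paths and the demo repo below are assumptions for illustration.

```shell
# Sandbox demo: paths and repo contents are illustrative only.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Pack every ref into one file that lives outside the repository.
git bundle create "$work/backup.bundle" --all >/dev/null 2>&1

# Check that the bundle is valid, then restore it with a plain clone.
git bundle verify "$work/backup.bundle" >/dev/null 2>&1
git clone -q "$work/backup.bundle" "$work/restored"

original_count=$(git rev-list --all --count)
restored_count=$(git -C "$work/restored" rev-list --all --count)
echo "commits in original: $original_count, in restored copy: $restored_count"
```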
Permanently removing a file from the repository works the same way as for a folder:
git filter-branch --tree-filter 'rm -f filename' HEAD
There are several filter types (you can check the documentation); the one we use here, --tree-filter, rewrites the tree and its contents. You can also use --index-filter, which is similar to --tree-filter but does not check out the tree, which makes it much faster.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
The --ignore-unmatch parameter makes git rm exit successfully even when the file does not exist in a commit.
At the end, don't forget to push the changes to the remote repository with --force: this is not a fast-forward push, since the whole history within the commit range we filtered has been rewritten.
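The index filter works for a folder too if git rm is given -r. A minimal sketch in a throwaway repo (names and identity are made up, and -q is added only to keep the output quiet):

```shell
# Sandbox demo: every path, message and identity below is illustrative only.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer git versions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

mkdir -p vendor/gems
echo "frozen gem" > vendor/gems/big.gem
git add vendor
git commit -q -m "Freeze gems"

# -r lets git rm descend into the folder; --cached touches only the index,
# so no files have to be checked out for each commit.
git filter-branch --index-filter \
  'git rm -r -q --cached --ignore-unmatch vendor/gems' HEAD >/dev/null 2>&1

# Files under vendor/gems in the rewritten HEAD, and commits still touching it.
leftover=$(git ls-tree -r --name-only HEAD | grep -c '^vendor/gems/' || true)
touching=$(git log --oneline HEAD -- vendor/gems | wc -l | tr -d ' ')
echo "files left under vendor/gems: $leftover, commits touching it: $touching"
```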
git push origin master --force
filter-branch rewrites the history for you, but the old objects remain in your local repository until they are dereferenced and garbage collected.
To check what is still pointing to the nuked objects, use the following command. If you have tags and branches in the repo pointing to those objects, you'll most likely see them.
git for-each-ref --format='delete %(refname)' refs/original
To dereference them, expire the reflog (entries are kept for 90 days by default) and force garbage collection, you can do:
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
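To see that the cleanup really drops the data, you can remember the ID of one removed blob and ask git whether it still exists before and after. A sketch in a throwaway repo, with made-up file names and identity; git cat-file -e exits with success only if the object is present.

```shell
# Sandbox demo: every path, message and identity below is illustrative only.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer git versions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

mkdir -p vendor/gems
echo "frozen gem" > vendor/gems/big.gem
git add vendor
git commit -q -m "Freeze gems"

# Remember the object ID of the file we are about to purge.
blob=$(git rev-parse HEAD:vendor/gems/big.gem)

git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD >/dev/null 2>&1

# Right after the rewrite the blob is still held by refs/original and the reflog.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the backup refs, expire the reflog, and prune unreachable objects.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "blob before cleanup: $before, after cleanup: $after"
```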
You'll need to make sure all branches and tags are pushed to the remote (unless you're pushing to a new repo).
git push origin --force --all
git push origin --force --tags