Permanently remove files and folders from a Git repo

Note: The operations presented in this post rewrite Git history. Be careful, and back up your repository first if you are not sure what you are doing.

A few weeks ago I froze gems on my blog and ended up with a very big repository, so I wanted to clean up the mess and permanently remove the gems folder from the repository. git rm did not do the job: it only removes the folder from the working tree and the index, while the repository still contains the folder's objects in its history. After a quick search, I found that git-filter-branch was the command I was looking for.

So, you can permanently remove a folder from a git repository with:

git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD

This goes through the whole commit history reachable from HEAD, commit by commit: it checks out each commit, runs the given command on its tree, and records the result as a new commit object.

We use the -r (recursive) option to remove the folder together with its contents, and -f (force) so that rm ignores nonexistent files, since the folder may not exist yet in some of the commits we filter.
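To see the effect without touching a real project, the command can be tried in a throwaway repository. The following is only a sketch: the temporary directory, the file names (app.rb, big.gem) and the commit messages are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the deprecation notice that newer Git versions print for filter-branch.

```shell
# Throwaway demo repository; every path and name below is hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: the application plus a frozen gem.
mkdir -p vendor/gems
echo "app" > app.rb
echo "big gem" > vendor/gems/big.gem
git add .
git commit -q -m "Add app and frozen gems"

# Commit 2: an unrelated change.
echo "more" >> app.rb
git commit -q -am "Update app"

# Rewrite every commit reachable from HEAD without vendor/gems.
git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD >/dev/null

# Number of commits left, and number of commits that still touch vendor/gems.
commits=$(git rev-list --count HEAD)
touching=$(git rev-list --count HEAD -- vendor/gems)
echo "commits=$commits touching=$touching"
```

Both commits survive the rewrite, but none of them should contain anything under vendor/gems any more.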

You can also limit the filter to a range of commits:

git filter-branch --tree-filter 'rm -rf vendor/gems' 7b3072c..HEAD

The commit at the start of the range (7b3072c here) is not filtered; only the commits after it, up to HEAD, are rewritten.
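The range behaviour can be checked in a scratch repository. This is a minimal sketch with made-up file names and commit messages; the variable start plays the role of 7b3072c. It compares commit hashes before and after the filter to show which commits were rewritten.

```shell
# Throwaway demo repository; every path and name below is hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: start of the range, it will not be filtered.
echo "app" > app.rb
git add app.rb
git commit -q -m "Initial commit"
start=$(git rev-parse HEAD)

# Commit 2: the gems get frozen after the start of the range.
mkdir -p vendor/gems
echo "big gem" > vendor/gems/big.gem
git add vendor
git commit -q -m "Freeze gems"

# Commit 3: an unrelated change.
echo "more" >> app.rb
git commit -q -am "Update app"
old_head=$(git rev-parse HEAD)

# Filter only the commits after $start.
git filter-branch --tree-filter 'rm -rf vendor/gems' "$start"..HEAD >/dev/null

new_root=$(git rev-list --max-parents=0 HEAD)   # root commit after the rewrite
new_head=$(git rev-parse HEAD)
echo "start kept:     $([ "$new_root" = "$start" ] && echo yes || echo no)"
echo "head rewritten: $([ "$new_head" != "$old_head" ] && echo yes || echo no)"
```

If the folder had already been committed at or before the start of the range, it would stay in those older commits, so the range needs to begin before the folder was introduced.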

If you run another branch filter afterwards, you have to pass the -f option to filter-branch to overwrite the backup in refs/original/, where Git stores the original refs from the previous branch filter.

git filter-branch -f --tree-filter 'rm -rf vendor/gems' HEAD

You can also remove the original refs by hand, or back them up to another location first.

rm -rf .git/refs/original/
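Removing the backup refs alone does not shrink the repository right away, because the old objects stay on disk until Git garbage-collects them. The sketch below, again in a throwaway repository with made-up names, deletes the backup refs through git update-ref instead of rm -rf, expires the reflog and runs git gc, then checks that the gem's blob is really gone. Treat it as one possible cleanup sequence, and only run the last three steps on a real repository once you are sure the backup is no longer needed.

```shell
# Throwaway demo repository; every path and name below is hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir -p vendor/gems
echo "app" > app.rb
echo "big gem" > vendor/gems/big.gem
git add .
git commit -q -m "Add app and frozen gems"
blob=$(git rev-parse HEAD:vendor/gems/big.gem)   # object id of the unwanted file

git filter-branch --tree-filter 'rm -rf vendor/gems' HEAD >/dev/null

# 1. Delete the backup refs that filter-branch left under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git update-ref -d "$ref"
done

# 2. Expire reflog entries that still point at the pre-rewrite commits.
git reflog expire --expire=now --all

# 3. Drop the now unreachable objects.
git gc --prune=now --quiet

# The blob should no longer exist in the object database.
if git cat-file -e "$blob" 2>/dev/null; then still_there=yes; else still_there=no; fi
echo "blob still present: $still_there"
```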

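Permanently removing a file from the repository works the same way as for a folder. Use rm -f here as well, so that the filter does not abort on commits where the file does not exist:

git filter-branch --tree-filter 'rm -f filename' HEAD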

There are several filter types (see the git-filter-branch documentation). The one we use here, --tree-filter, rewrites the tree and its contents. You can also use --index-filter, which is similar to --tree-filter but does not check out the tree, so it runs much faster.

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

The --ignore-unmatch option makes git rm exit successfully even if the file does not exist in a given commit.
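The same approach should work for a whole folder by adding -r to git rm. Here is a sketch in a throwaway repository (file names and commit messages are made up for illustration):

```shell
# Throwaway demo repository; every path and name below is hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: no gems yet, so --ignore-unmatch matters for this commit.
echo "app" > app.rb
git add app.rb
git commit -q -m "Initial commit"

# Commit 2: the gems get frozen.
mkdir -p vendor/gems
echo "big gem" > vendor/gems/big.gem
git add vendor
git commit -q -m "Freeze gems"

# Remove the folder from the index of every commit, without checking out any tree.
git filter-branch --index-filter \
  'git rm -r -q --cached --ignore-unmatch vendor/gems' HEAD >/dev/null

touching=$(git rev-list --count HEAD -- vendor/gems)
echo "commits still touching vendor/gems: $touching"   # expected: 0
```

Without --ignore-unmatch, git rm would fail on the first commit, where vendor/gems does not exist yet, and filter-branch would abort.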

Finally, don't forget to push the changes to the remote repository with --force: the push is not a fast-forward, because the whole history within the filtered commit range has been rewritten.

git push origin master --force
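To illustrate why --force is needed, the sketch below pushes to a local bare repository that stands in for the real remote. All paths and names are made up for the demo, and the branch name is read from HEAD instead of assuming master. A plain push after the rewrite should be rejected, while the forced one goes through.

```shell
# Throwaway repositories; every path and name below is hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git
work=$(mktemp -d)
remote=$(mktemp -d)
git init -q --bare "$remote"             # stands in for the real remote

cd "$work"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p vendor/gems
echo "app" > app.rb
echo "big gem" > vendor/gems/big.gem
git add .
git commit -q -m "Add app and frozen gems"

branch=$(git symbolic-ref --short HEAD)  # master or main, depending on the Git setup
git remote add origin "$remote"
git push -q origin "$branch"             # publish the original history

git filter-branch --index-filter \
  'git rm -r -q --cached --ignore-unmatch vendor/gems' HEAD >/dev/null

# A plain push is refused, because the new history does not descend from the old one.
if git push -q origin "$branch" 2>/dev/null; then plain=accepted; else plain=rejected; fi

# The forced push replaces the remote branch with the rewritten history.
git push -q --force origin "$branch"
remote_head=$(git --git-dir="$remote" rev-parse "refs/heads/$branch")
local_head=$(git rev-parse HEAD)
echo "plain push: $plain"
```

Everyone else who has cloned the repository will still have the old history, so they need to re-clone or reset their branches onto the rewritten one.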

Posted by Dalibor Nasevic.