How do I keep files in a branch if they are removed from master?

Rija Saleem

I have to checkout to master and pull changes, but in those changes, a few of the files I was working on have been removed and new files made. I want to keep those files in my branch and push them to master again.

How do I do this?

torek

There's a flaw in this question, which we can demonstrate:

$ cd ~/tmp && mkdir x && cd x && git init
Initialized empty Git repository in ...
$ echo demonstrate that files are not in branches > README
$ git add README && git commit -m initial
[master (root-commit) 7490a25] initial
 1 file changed, 1 insertion(+)
 create mode 100644 README
$ echo file 1 > file1
$ echo file 2 > file2
$ git add file1 file2
$ git commit -m 'add some files'
[master 5b25569] add some files
 2 files changed, 2 insertions(+)
 create mode 100644 file1
 create mode 100644 file2
$ git rm file1 
rm 'file1'
$ git commit -m 'remove file1'
[master 15f7d52] remove file1
 1 file changed, 1 deletion(-)
 delete mode 100644 file1
$ ls
file2   README
$ git branch -a
* master

Exercises:

How many branches are there?
Which branch has file1 in it? (This is a trick question)
What files are in branch master? (This is also a trick question)

Before you read any other answers here, see what your own answers are.

I would use the spoiler hide/reveal trick from some of the other StackExchange sites if it worked here, but it doesn't, so just try not to peek too far down to the actual answers, where I discuss the fact that questions two and three are trick questions. For that matter, question one is a bit of a trick too.

Some wrong answers, but hey, they're trick questions...

OK, so, first: the obvious answers are:

1
master
file2 and README.

And yet if file1 is in branch master, where is it? If it's not in master, where is it?

The trick

The trick here is that Git is not about files. It's true that Git stores files, but it stores them in commits. And Git is not even about branches, although we use branch names to find commits.

The best answer to Q1 is that there's one branch name. How many branches there are depends on how we define the word branch. Some definitions will still get us 1 as the answer; another would get us 3 because there are three commits, hence three things that are at least potential branches; and another would get us "infinity" because once there's at least one commit, there are an infinite number of possible branch names. (Well, not quite infinite, because our computers are finite, but more than we want to bother trying to count.)

The best answers to Q2 and Q3, though, are just mu: files are not in branches, in Git. Files are in commits.

In the output above—if you repeat the commands, you'll get different commit hashes, so yours will be slightly different—the file file1 is in commit 5b25569. That commit is contained in branch master, but it is just one of three commits in master. The file file2 is in commits 5b25569 and 15f7d52, and both of these commits are in master; and README is in all three commits, including the initial root commit 7490a25.
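To see this for yourself, here is a small sketch that rebuilds the same three commits and lists the files stored in each one. The temporary directory and the demo identity are assumptions made so the script can run anywhere; your hash IDs will differ from mine.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo demonstrate that files are not in branches > README
git add README && git commit -q -m initial
echo file 1 > file1
echo file 2 > file2
git add file1 file2 && git commit -q -m 'add some files'
git rm -q file1 && git commit -q -m 'remove file1'

# Print the files stored in each commit, oldest commit first.
for c in $(git rev-list --reverse master); do
    git log -1 --format='%h %s:' "$c"
    git ls-tree -r --name-only "$c" | sed 's/^/    /'
done

# Keep two of the listings in variables so they are easy to compare.
files_in_second=$(git ls-tree -r --name-only master~1 | tr '\n' ' ')
files_in_third=$(git ls-tree -r --name-only master | tr '\n' ' ')
```

The listing for the middle commit includes file1; the listing for the last commit does not, even though both commits are on master.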

So now we know the trick

The trick is that files are not in branches, but rather in commits. The commits themselves are contained in branches—meaning found by using those branch names—but the files are in the commits. "Later" commits may have more files, or fewer files, or just different files, than "earlier" commits. So, let's go back to the original problem statement now:

I have to checkout to master and pull changes,

Technically, you don't have to do any of this, you just want to. But let's assume that this is a good plan—and for the most part, it is. The next thing you should know, though, is that the phrase pull changes is as misleading as the question about which branch some file is in. You are talking about running git pull, and the pull command is just shorthand for running two other Git commands:

git fetch: this obtains any commits that are new to you;
some second Git command to incorporate those new commits into the current branch.

but in those changes, a few of the files I was working on have been removed and new files made.

Now that you know that files are in commits, the next key realization is that git fetch merely adds commits to your collection. All your existing commits remain intact. The second command you'll run could change this, depending on which second command you choose; that's where all the complications come in.

I want to keep those files in my branch and push them to master again.

As long as you keep all your own commits, and all the earlier commits, you will be able to get to those files. The reason for that is simple: any commit, once made, is completely read-only. So the files saved inside a commit are saved forever—or at least, as long as that commit continues to exist.

How commits work

Now that you know that Git is all about commits, it's important to understand how you will use them and work with them. The next two things you should probably do at this point are run git log, and maybe git log --all --decorate --oneline --graph. See also Pretty Git branch graphs. Here, though, let's discuss what a commit is and does for you.

Each commit is numbered. These numbers are not simple counting numbers: we don't have commit #1, followed by commit #2, then #3 and so on. Instead, the numbers are huge and ugly, and impossible for humans to remember or deal with. The actual number of any one commit is simply a cryptographic hash of the commit's contents (as seen as an internal Git object), expressed in hexadecimal. It looks random, though it is not. Git guarantees that every commit everywhere gets a totally unique number.
Each commit stores two things:
- A commit stores a full snapshot of every file. The files are saved inside—or in a sense, outside—the commit in a special, read-only, Git-only, compressed and de-duplicated form. This takes care of the fact that commits tend to re-use a lot of files. In the example above, the README file was the same in all three commits. In fact, there was really only one README file-content ever saved; all three commits just referred to this same file (as a hash ID: f17a15bb99297d320e9a001a39052d43b17b2ce2).
- Each commit stores some metadata, or information about the commit itself. The metadata inside a commit contains things like the name and email address of the person who made the commit. It also contains some date-and-time-stamps, which help make sure the commit gets a unique hash ID. (Otherwise, if you used an empty commit as the first commit of every repository, all your initial root commits would have the same hash ID!)
Importantly for Git's internal operation, each commit stores, in its metadata, the hash ID of some set of earlier commits. (This set is empty for the very first commit, since it is the first commit: there is no earlier commit.) Most commits store just one previous, or parent, hash ID; we call these ordinary commits.
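If you want to look inside a commit, the following sketch may help. It prints one raw commit object and checks that an unchanged README is stored only once. The file names, the temporary directory, and the demo identity are assumptions for the example.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo demonstrate that files are not in branches > README
git add README && git commit -q -m initial
echo file 1 > file1
git add file1 && git commit -q -m 'add a file'

# The raw commit object: a tree (the snapshot), a parent line, author, committer, message.
git cat-file -p master

# README is stored once; both commits refer to the same blob hash ID.
readme_in_root=$(git rev-parse master~1:README)
readme_in_tip=$(git rev-parse master:README)

# %P prints parent hash IDs: none for the root commit, one for an ordinary commit.
root_parents=$(git log -1 --format=%P master~1)
tip_parents=$(git log -1 --format=%P master)
```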

So, if we have a simple chain of ordinary commits, each one has the raw hash ID of the previous ordinary commit. That is, if the most recent commit has hash ID H, commit H has inside it the hash ID of some earlier commit. We say that H points to this earlier commit, and we can draw that, like this:

<-H

If we call that earlier commit G, we can add that in to our drawing:

        <-G <-H

Since commit G points to an earlier commit, if we call that commit F, we draw that in:

... <-F <-G <-H

and so on, all the way back to the very first commit (a root commit—that's why Git said [master (root-commit) 7490a25] initial way back when).
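The backwards walk can be done by hand, which is roughly what git log does for you. This is only a sketch: it assumes a simple chain of ordinary commits (one parent each), and the empty commits, temporary directory, and demo identity are there just to keep it short.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for msg in initial 'add some files' 'remove file1'; do
    git commit -q --allow-empty -m "$msg"
done

# Start at the commit that master names and follow parent links by hand.
c=$(git rev-parse master)
steps=0
while [ -n "$c" ]; do
    git log -1 --format='%h %s' "$c"
    steps=$((steps + 1))
    c=$(git log -1 --format=%P "$c")   # empty once we reach the root commit
done
```

The loop stops at the root commit because that commit has no parent to move to.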

Branch names let us find commits

When we first start out, we have no commits at all:

(kind of an uninteresting graphic here)

We make our first, or root, commit, maybe with a README file like I did:

A

There's only the one commit, which doesn't point anywhere, so obviously we're using just that one commit. But we'll change that soon, so let's draw in the name master and make master point to A:

A   <-- master

Now let's make our second commit, B ([master 5b25569] add some files), which points back to A. When we do that, Git changes our name master to point to B instead of A, with B pointing back to existing commit A:

A <-B   <-- master

We'll repeat with another commit ([master 15f7d52] remove file1):

A--B--C   <-- master

We're getting a little bit tired so we stop bothering drawing in the arrows from commits to commits as arrows. They can't change after we make the commits: all commits are completely read-only, forever, so we don't really need to bother.

Note how the branch name advances as we go

We started with the name master pointing to A, when we made our root commit. Then we made commit B and Git updated our name master to point to B instead of A. Once we made commit C, C pointed to B and master pointed to C.

If we now decide that we want to get rid of commit C, we can use git reset --hard, which normally we wouldn't do because it throws some work away. But in this case that work is just "remove a file", so we might not mind. To make this git reset work, we have to find the hash ID of commit B. We tell Git to make the name master point directly to B, instead of C, and we get:

     C   [abandoned]
    /
A--B   <-- master

Commit C is not gone. It's still there, in the repository. It's just that if we ask Git to find commits, Git is going to start with the name master, then work backwards. We'll see commit B, then we'll see commit A, and then Git will stop because there is no earlier commit.

If we remember the hash ID of commit C, we can put it back with another git reset --hard, making the name master point to C again:

     C   <-- master
    /
A--B

and now we can draw them as a straight line again.
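Here is a sketch of that reset experiment, in case you want to try it somewhere safe. The commit messages A, B, and C stand for the commits in the drawings; the temporary directory and demo identity are assumptions.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo readme > README && git add README && git commit -q -m A
echo file 1 > file1 && git add file1 && git commit -q -m B
git rm -q file1 && git commit -q -m C

c_hash=$(git rev-parse master)        # remember C before moving the name
git reset -q --hard master~1          # master now points to B
count_after_reset=$(git rev-list --count master)
c_type=$(git cat-file -t "$c_hash")   # C is still in the repository
file1_back=$( [ -f file1 ] && echo yes || echo no )

git reset -q --hard "$c_hash"         # put C back
count_after_restore=$(git rev-list --count master)
```

While master points to B, only two commits are reachable and file1 is back in the working tree, yet commit C can still be looked up by its hash ID.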

Commits are therefore the snapshots and the history

So that shows us the point of commits: they hold the snapshots of all the files, and because they connect to each other—though backwards only—they are the history. It also shows us the point of branch names: the name finds the last commit that is part of the branch. From there, Git works backwards.

Commits can be on many branches

If we have this:

A--B--C   <-- master

as our history so far, we can now make a new branch name, also pointing to commit C. Let's use the name develop:

A--B--C   <-- develop, master

To make it easy to tell which name we're using, let's attach a special name, HEAD, to exactly one branch name, like this:

A--B--C   <-- develop, master (HEAD)

This means we're using commit C through name master. If we run:

git checkout develop    # or git switch develop

now, we get:

A--B--C   <-- develop (HEAD), master

Nothing else changes: we are still using commit C. But now we're using commit C through the name develop.

This matters when we make our next commit. We modify some file, run git add, and run git commit and get a new commit D. New commit D points back to existing commit C, and Git stores the new commit hash ID into the current branch name, so now we have:

A--B--C   <-- master
       \
        D   <-- develop (HEAD)

Commits A-B-C were on both branches before. They're still on both branches now, but now new commit D is only on develop.
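You can ask Git which branch names contain a given commit. The sketch below builds the A-B-C-D picture with empty commits (an assumption to keep it short, along with the temporary directory and demo identity) and then uses git branch --contains.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git branch develop
git checkout -q develop            # or: git switch develop
c_hash=$(git rev-parse HEAD)
git commit -q --allow-empty -m D   # the name develop now points to D
d_hash=$(git rev-parse HEAD)

# Which branch names can reach each commit?
branches_with_c=$(git branch --contains "$c_hash" --format='%(refname:short)' | tr '\n' ' ')
branches_with_d=$(git branch --contains "$d_hash" --format='%(refname:short)' | tr '\n' ' ')
echo "C is on: $branches_with_c"
echo "D is on: $branches_with_d"
```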

If we go back to master, create yet another new branch, and make yet more commits, we get:

        E--F--G   <-- feature (HEAD)
       /
A--B--C   <-- master
       \
        D   <-- develop

and so on. Checking out (or switching to) a branch name means: Remove all the files from the current commit, and instead put in the files from the commit pointed-to by the branch name, and make that the current branch. Making a new branch means: Make a new name, pointing to some commit. We pick which commit before we make the name, usually by checking it out first (so that we can see if it's the right one).

We're now ready for `git merge`

Let's suppose we have this going on:

          I--J   <-- branch1
         /
...--G--H   <-- master (HEAD)
         \
          K--L   <-- branch2

That is, we have three branch names, each of which points to a different "last" commit. The last commit on branch1 is J; the last commit on branch2 is L; and the last commit on master is H, which—along with all earlier commits—is on all three branches.

If we now run:¹

git merge --ff-only branch1

our Git will locate, in this repository, three commits. These three commits are:

The current commit. This one is easy to find, because Git just uses HEAD as usual. Since HEAD is attached to master, and master finds commit H, this finds commit H.
The other commit that we name on the command line. Since we said branch1, Git reads branch1 to see that it points to commit J.
Last—or in some sense first, because of the way Git does merges, but Git has to find the other two before it can find this most important commit on its own—Git uses these two commits to find the best shared commit. The best shared commit is a commit that is on both branches, and is better than any other commit also on both branches.

Git calls this best shared commit the merge base. Technically, the merge base is the commit that comes out of the Lowest Common Ancestor algorithm, but a good way to see it is that it's the commit "closest" to both branch tip commits. Here, that's just commit H itself, which is as close as you can get to commit H, and as close as any shared commit is to commit J.

When the merge base of a git merge command is the current commit, Git can do what it calls a fast-forward merge. This isn't really a merge at all: it's more just a git checkout, but it's a git checkout that drags the branch name with it. That is, the result of any real merge would just match commit J, as we'll see in a moment. So instead of bothering with a real merge, if you let it, Git will do this fake non-merge fast-forward thing instead.

The result is this:

          I--J   <-- branch1, master (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Note that commit H is not abandoned! We still find commit H, but now we find it whether we start from J because of the name branch1, or whether we start from J because of the name master. Either way, Git steps back once to commit I, then again to commit H. So now, commits up through J are on both branches.

You should repeat that to yourself because it's big and important: Commit J used to be only on branch1. Now it's on both branches. The branches that contain some commit change over time.
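A small sketch of the fast-forward case, using empty commits named after the letters in the drawing (those, the temporary directory, and the demo identity are assumptions). It checks that the merge base is H and that no new commit gets made.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m H
git checkout -q -b branch1
git commit -q --allow-empty -m I
git commit -q --allow-empty -m J
git checkout -q master

h_hash=$(git rev-parse master)
base=$(git merge-base master branch1)   # the best shared commit
git merge -q --ff-only branch1          # no new commit; the name master just moves
merges_on_master=$(git rev-list --count --merges master)
```

Afterwards master and branch1 name the same commit, and there is still no merge commit anywhere in the history.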

Now we'll run a second git merge, this time using:

git merge branch2

When Git goes to do this one, it again locates the three commits: the current commit J, the other commit L, and the merge base. The merge base is the best shared commit, so, which commits are shared? It should be obvious from looking at the diagram: as before, that's commits up through and including H. The best one is commit H again.

This time, Git cannot take a short-cut. To do a merge, Git has to do a real merge.

¹The --ff-only is usually unnecessary. It tells Git that we want only the fast-forward, which Git already does by default when it can. If you've configured certain Git settings (merge.ff, for instance), this --ff-only will override that configuration, which is why I included it here.

Merge is about combining work

We already noted that each Git commit contains a full snapshot of every file. There aren't any changes in any commit. There is only a snapshot, plus the metadata. To find changes, then, we have to have Git compare two snapshots.

We can, for instance, have Git compare the snapshot in commit H to that in commit I. Any files that are exactly the same are unchanged. Any files that are different have some changes, and Git can compare each file to see what changed. That's how Git shows us what changed in commit I: by comparing its snapshot to the snapshot in H.

The same goes for commit J, though we'll have to have Git compare its snapshot against the one in commit I. Put those two sets of changes together, and you get the overall change from H to J.

Git can take one small shortcut here: instead of comparing H vs I, then I vs J, Git can just compare H vs J directly. Whatever files are the same, those didn't change. Whatever files are different, Git can figure out what's different in those files. This is faster and easier, especially if commit J un-does any changes made in commit I. But in any case, that's what Git does. In effect, it runs git diff:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

This tells Git what we changed in our master branch.

Then, Git does the same kind of diff, starting from the same merge base commit H, but to the other commit:

git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

Git now has to combine these two sets of changes.

That's what git merge is really about. We run two diffs—from the shared merge base to each branch tip—to see what we changed, and what they changed, and then we combine those two diffs. Or rather, we have Git do all that, and then we have Git apply the combined changes to the snapshot from H, the merge base.

By taking our changes and adding theirs (or equivalently, taking their changes and adding ours), we get both changes. We keep ours, and add theirs. When Git does this combining, it uses some very simple text-oriented rules: each change is just "delete these lines and add these other lines". As long as some file that we and they both touched has these changes done far enough apart, Git will be able to combine them on its own.

If Git can't combine these changes on its own, Git will give up, stop in the middle of the merge, and leave us a mess to fix. But in many cases, Git can combine them on its own, and does. That's true even though Git has no idea what it's doing: it does not understand our files in any way. It's just following simple text rules. And yet, that actually works, quite often.

Having combined both changes and applied them to the snapshot from H, Git now makes a new commit, mostly as usual:

The snapshot for the new commit is the result of applying the combined changes.
The parent of the new commit is where things are a little odd.

Instead of just the one usual parent, Git adds a second parent to this new commit:

          I--J   <-- branch1
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- branch2

New merge commit M points back to commit J as usual, but also points back to commit L. The result is that all the commits are now on master. Commits up through J are on branch1; commits up through H, plus K-L, are on branch2; and all commits up through and including merge commit M are on master.
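The whole sequence can be tried in a scratch repository. In this sketch, a.txt and b.txt are made-up files so that "we" and "they" touch different things; the temporary directory and demo identity are assumptions too.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > a.txt && echo two > b.txt
git add a.txt b.txt && git commit -q -m H
git checkout -q -b branch1 && echo ours >> a.txt && git commit -q -am J
git checkout -q -b branch2 master && echo theirs >> b.txt && git commit -q -am L
git checkout -q master
git merge -q --ff-only branch1     # master now points to J

base=$(git merge-base master branch2)
we_changed=$(git diff --find-renames --name-only "$base" master)
they_changed=$(git diff --find-renames --name-only "$base" branch2)

git merge -q --no-edit branch2     # a real merge: new commit M
m_parents=$(git log -1 --format=%P master)
echo "we changed: $we_changed; they changed: $they_changed"
echo "parents of M: $m_parents"
```

The merge commit lists two parent hash IDs, J first and L second, and its snapshot has both the change to a.txt and the change to b.txt.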

Merging therefore makes commits "more reachable"

Merge commits, then, have this interesting side effect of making commits that were on just one branch be on more than one branch. Even the non-merge fast-forward we did earlier had the same effect: it added commits I-J to master.

Note that if we could do a fast-forward merge:

...--D--E   <-- mainline (HEAD)
         \
          F--G--H   <-- feature

we can tell Git: do a real merge, using git merge --no-ff feature. In this case, Git will find the merge base commit E, the current commit E, and the other commit H, and say to itself: I could do a fast-forward, but the user commanded that I do a real merge. So Git will dutifully run:

git diff --find-renames <hash-of-E> <hash-of-E>   # what we changed

to figure out what we changed. The output of this diff is ... nothing!

Git then does the other diff, to find out what they changed, and combines the two diffs. The combination of "nothing" and "anything" is the anything, so the result of the combining is a set of changes that will turn commit E's snapshot into commit H's. Git will then make a new merge commit with the snapshot from H:

...--D--E---------M   <-- mainline (HEAD)
         \       /
          F--G--H   <-- feature

and make new merge commit M—I used that letter to stand for merge again, this time—such that it has E as its first parent and H as its second parent. The snapshot in new commit M is exactly the same as the snapshot in existing commit H, but since we told Git do not use the fast-forward short-cut, it made a real merge commit.

(You should do this, that is, make a real merge commit, if you want the hash ID of commit M to be different from the hash ID of existing commit H, and to have two parents like this, so that you can see that there was a merge. If you don't care whether the hash IDs match and whether there's a real merge, you can let Git do the fast-forward thing instead, which is easier on Git. The difference will be that some future historian, looking at your repository, won't know that you got there by using a fast-forward merge: if you make a real merge, that records the real merge, for that future historian to see.)
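A sketch of the forced real merge follows. It collapses the drawing to just commits E and H, and f.txt, the temporary directory, and the demo identity are assumptions. It compares the snapshot of M with the snapshot of H by looking at their tree hash IDs.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the first branch mainline, as in the drawing
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > f.txt && git add f.txt && git commit -q -m E
git checkout -q -b feature && echo more >> f.txt && git commit -q -am H
git checkout -q mainline

git merge -q --no-ff --no-edit feature      # force a real merge commit M
m_tree=$(git rev-parse 'mainline^{tree}')   # snapshot of M
h_tree=$(git rev-parse 'feature^{tree}')    # snapshot of H
m_first_parent=$(git rev-parse 'mainline^1')
m_second_parent=$(git rev-parse 'mainline^2')
```

The two tree hash IDs match, while the commit hash IDs of M and H differ, and M's second parent is H.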

That was a lot of background; now we can get back to your question

I have to checkout to master and pull changes, but in those changes, a few of the files I was working on have been removed and new files made.

So, this means that you have, at the moment, your own branch name and your own commits:

          I--J   <-- somebranch (HEAD)
         /
...--G--H   <-- master

You're going to run git checkout master, which will do this:

          I--J   <-- somebranch
         /
...--G--H   <-- master (HEAD)

Then you're going to run git pull. This will:

Run git fetch, which will obtain new commits from some other Git repository:
```
          I--J   <-- somebranch
         /
...--G--H   <-- master (HEAD)
         \
          K--L   <-- origin/master
```
We haven't covered remote-tracking names like origin/master, but it will remember the hash ID of the newest commit you got from their master branch.
Run git merge origin/master, probably, depending on how you configured your Git defaults. Assuming it does, and that this permits and thus does a fast-forward merge, you will end up with:
```
          I--J   <-- somebranch
         /
...--G--H--K--L   <-- master (HEAD), origin/master
```
in your repository.

Note how none of your commits has changed. They literally can't. You will, however, want to do something with them.
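To convince yourself of that, here is a sketch with a stand-in "upstream" repository on the local disk. The directory names upstream and mine, the files old.txt and new.txt, and the demo identity are all assumptions. It runs the two halves of git pull separately and checks that somebranch never moves.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo v1 > old.txt && git add old.txt && git commit -q -m H
cd ..

git clone -q upstream mine
cd mine
git checkout -q -b somebranch
echo mine >> old.txt && git commit -q -am J
j_hash=$(git rev-parse somebranch)

# Meanwhile, "they" remove old.txt and add new.txt upstream.
(cd ../upstream && git rm -q old.txt && echo v2 > new.txt && git add new.txt && git commit -q -m L)

git checkout -q master
git fetch -q origin                      # first half of git pull: just add their commits
behind=$(git rev-list --count master..origin/master)
git merge -q --ff-only origin/master     # second half, when a fast-forward is possible
```

After this, old.txt is gone from the working tree on master, but it is still stored in your commit J on somebranch.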

The tricky part is now what to do with your commits, I-J here. You can merge or rebase. Merging is fundamentally simpler and is what we have already covered, so let's look at what this does. If you run:

git checkout somebranch && git merge master

you'll start with:

          I--J   <-- somebranch (HEAD)
         /
...--G--H--K--L   <-- master, origin/master

Your Git now has to combine the changes. You made some changes to some file(s), between your best shared commit H and your current commit J. They made some changes to some file(s) between H and L, and their changes include removing entire files, so that commit L lacks some files as compared to commit H.

Git will do its best to combine your changes with theirs. If you've changed the file that they removed, Git's attempt to combine changes will fail. Git does not know how to combine "add this line, remove that one" with "remove the entire file".

If you have not changed the file that they removed, Git will combine "do nothing" with "remove entire file" and will remove the file. That's fine!

If the merge fails, Git will stop with a merge conflict. You will now have to fix up the result yourself. Whatever files you supply as the "correct result", Git will believe you.
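Here is a sketch of the failing case and one possible fix-up: keeping your version of the file. The file old.txt, the temporary directory, and the demo identity are assumptions; whether keeping the file is the right resolution depends on your project.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > old.txt && echo readme > README
git add old.txt README && git commit -q -m H
git checkout -q -b somebranch && echo mine >> old.txt && git commit -q -am J
git checkout -q master && git rm -q old.txt && git commit -q -m L
git checkout -q somebranch

# You changed old.txt, they deleted it: Git stops and asks for help.
if git merge --no-edit master >/dev/null 2>&1; then result=merged; else result=conflict; fi
unmerged=$(git diff --name-only --diff-filter=U)
echo "merge result: $result; unmerged paths: $unmerged"

# Your version is still in the working tree; adding it tells Git "keep the file".
git add old.txt
git commit -q --no-edit            # finishes the merge with the default message
```

Had you wanted to accept their deletion instead, git rm old.txt in place of git add old.txt would record that as the result.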

If the merge succeeds, Git will go ahead and make a new merge commit:

          I--J--M   <-- somebranch (HEAD)
         /     /
...--G--H--K--L   <-- master, origin/master

In this case, if you want some file back, simply extract that file from an earlier commit such as commit J. Find the commit's hash ID and run, e.g.:²

git restore --source <hash-of-J> -SW -- path/to/file.ext

and Git has now copied the file from commit J to both its index / staging-area, and to your working tree. If your Git is old and does not have git restore, use the equivalent git checkout:

git checkout <hash-of-J> -- path/to/file.ext

which does the same thing (in old and new Git versions), and is shorter.³
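Putting the successful-merge case together, as a sketch: the files old.txt and other.txt, the temporary directory, and the demo identity are assumptions. The merge removes a file you never touched, and then the file is copied back out of commit J.

```shell
# Run as a script (for example: sh demo.sh); it only touches a fresh temporary directory.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'keep me' > old.txt && echo x > other.txt
git add old.txt other.txt && git commit -q -m H
git checkout -q -b somebranch && echo mine >> other.txt && git commit -q -am J
j_hash=$(git rev-parse HEAD)
git checkout -q master && git rm -q old.txt && git commit -q -m L
git checkout -q somebranch

git merge -q --no-edit master      # succeeds: old.txt was untouched here, so it is removed
gone=$( [ -e old.txt ] && echo no || echo yes )

git checkout -q "$j_hash" -- old.txt   # or: git restore --source "$j_hash" -SW -- old.txt
staged=$(git diff --cached --name-only)
git commit -q -m 'bring back old.txt'
```

The final commit is an ordinary new commit on somebranch that has the file in it again.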

If the merge does fail with a merge conflict, and Git stops and needs your help to finish the merge, you get to learn all about resolving Git merge conflicts.

²There are ways to do this without having to cut and paste hash IDs, but as a Git beginner, rest assured that cutting and pasting hash IDs is fine. I still do it myself sometimes; sometimes it's the easiest thing! Just run git log to find the right commit, then cut and paste.

³The git checkout method is just something of a bad habit, because git checkout has a "safe mode" and an "unsafe mode". The new-in-Git-2.23 git switch command always works in "safe mode"; the unsafe parts from git checkout are now in the new git restore command. That way you always know whether you're running a "safe" command (git switch) or an "unsafe" one (git restore). With the old git checkout, it's hard to know whether you're using it in safe mode or unsafe mode.

Edited at 2021-09-7
