This post is in Draft Mode - it will not appear on the site or in search results

Unwrapping git to treat SQL files as text

I got this error message when trying to compare two plain text sql files:

"Binary files a/person.sql and b/person.sql differ"

Hold Up! These aren't binary files. To prove it to myself I opened them up in notepad++. "See!," I said to my rubber duck - they're just plain text characters.

Where I went next took me on a 3 hour journey of unicode encoding and git internals.

These aren't the meat and potatoes of a developer's life, but they compose the building blocks of literally almost every ounce of output delivered, so they're worth understanding, at least nominally.

The Problem #

First let's look at git to see where the problem begins.

Every time git stores a file, it has to undergo a process of binary detection to determine whether the file is worth diffing. It doesn't get to make this determination merely because of the file extension because multiple vendors can use the same extension and it doesn't have a registry of every possible filetype anyway.

[Instead, git looks for] a NULL (0) byte somewhere within the first 8000 characters of the file. Typically, that happens because the file is being saved as something other than UTF-8. So, it's likely being saved as UCS-2, UCS-4, UTF-16, or UTF-32. All of those have embedded NULL characters when using ASCII characters
- John Szakmeister

Sure enough, if we look at the "binary" sql file, we'll see it's saved as UCS-2 w/ BOM. And if we look open it up with a HEX editor, we can see multi-column null characters.

UCS-2 w/ BOM

Compare to what happens when we convert it back to UTF-8:

UTF-8

Possible Solutions #

Landscape

Show each solution with across different git tools:

# get all files
$baseDirectory = "H:\stash\db_sphinx\SQLCode"
$files = Get-ChildItem -Path $baseDirectory -Recurse *.sql

# using WriteAllLines
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$files | ForEach {
  $FileContents = Get-Content $_.FullName;
  [System.IO.File]::WriteAllLines($_.FullName, $FileContents, $Utf8NoBomEncoding)
}

References #

Git Binary/Text Files

Update Encoding

SQL Files Binary in Git

Git attributes

Github