Archive for January, 2009

Hashing Windows Files

Wednesday, January 7th, 2009

Last month, a student wrote to us at Syncplicity to ask a bunch of high-level questions about our service in preparation for a class presentation.  I wrote a lengthy response that was hopefully interesting without divulging company secrets.  I want to share some of those details here, and I’ll start with the basics: figuring out which files need to be synced.

There’s a lot of proprietary code that determines exactly which files need to be uploaded and downloaded to sync a client.  Then, there’s more code to determine the best ordering for these operations.  At the heart is the SHA-256 hashing algorithm.  When Syncplicity notices a new or updated file, it first calculates a 256-bit hash of the file’s contents.  This is effectively a unique fingerprint that we can use to identify the file’s contents no matter where the file resides (two different files could in theory share a hash, but the odds are astronomically small).  If I compute the hash of a file, change one byte, and compute the hash again, I’ll get a different answer.  That’s a critical detail.  Now the client has a way to identify every single file by its contents.
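To make the fingerprint idea concrete, here is a minimal sketch in Python using the standard hashlib module.  It is not Syncplicity’s code; the function name sha256_of_file and the 1 MB chunk size are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file's contents.

    Hypothetical helper: the name and the chunk size are illustrative choices.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so a large file never sits fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest depends only on the bytes in the file, not on its name or location, so two copies of the same photo in different folders produce the same 64-character hex string.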

So, the client computes a bunch of SHA-256 hashes over all your files (which is why your CPU may kick up when you start Syncplicity for the first time) and then submits those hashes to the server.  The server then tells the client which of those files are already uploaded, so the client just needs to upload the unique files that the server doesn’t have yet.  This is great because if you have the same photo on two computers, only one computer needs to upload that photo.  We can save space (and keep your quota lower) by storing each unique file only once.  Very cool.
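The exchange described above can be sketched roughly as follows.  This is an illustration only: the real client and server protocol is not described in this post, so the function plan_uploads, the dictionary of paths to hashes, and the in-memory set standing in for the server’s knowledge are all assumptions.

```python
def plan_uploads(files_by_path, server_known_hashes):
    """Pick one path to upload for each content hash the server lacks.

    files_by_path: dict mapping a local path to its content hash.
    server_known_hashes: set of hashes the server already stores
    (a stand-in for the server's reply; the real protocol is not public).
    """
    to_upload = {}  # hash -> first path seen with that hash
    # Walk paths in sorted order so the result is deterministic.
    for path, content_hash in sorted(files_by_path.items()):
        if content_hash in server_known_hashes:
            continue  # already stored, nothing to send
        if content_hash not in to_upload:
            to_upload[content_hash] = path  # duplicates upload only once
    return list(to_upload.values())
```

With two copies of the same photo and one file the server already knows about, only one copy of the photo and any genuinely new files end up in the upload list.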

With these hashes, and a bit of magic, the client will perform the initial uploads and downloads.  Then how do we tell if a file has a new version?  For that, we work off a combination of the file’s last modified date and the SHA-256 hash.  If Syncplicity notices that the last modified date has changed and the hash is different, it may be time to sync the new version.  But if the date changes and the hash stays the same (that’s called a touch in the Linux world), then it’s a no-op for Syncplicity.
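Here is one way the date-plus-hash check could look, as a sketch under stated assumptions.  FileRecord and classify_change are hypothetical names, and skipping the hash entirely when the date is unchanged is my assumption for the example, not something this post confirms about the real client.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """What the client remembered about a file at the last sync (hypothetical)."""
    mtime: float
    sha256: str


def classify_change(previous, current_mtime, compute_hash):
    """Return "unchanged", "touched", or "modified".

    compute_hash is a zero-argument callable so that the expensive hash is
    only computed when the last modified date actually differs (an assumed
    optimization for this sketch).
    """
    if current_mtime == previous.mtime:
        return "unchanged"
    if compute_hash() == previous.sha256:
        return "touched"   # date moved, contents identical: a no-op
    return "modified"      # date and contents both changed: candidate for sync
```

Only the "modified" result would lead to a new version being uploaded; a "touched" file just needs its remembered date updated.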

The student who wrote to us noticed that we can detect changed files pretty quickly.  On Windows, Microsoft exposes the System.IO.FileSystemWatcher class in the .NET Framework.  Basically, Syncplicity tells Windows that it’s interested in knowing about file system changes in certain directories.  When Windows sees a file change, it tells Syncplicity about it immediately.  That saves us from trawling your file system looking for changes.  And yes, Macs have something very similar.

Well, that’s it for now.  The next post will likely deal more with our architecture and how we leverage the magical “cloud.”

Happy New Year

Thursday, January 1st, 2009

It’s 2009, time for a new website and new ideas for the new year.  That got me thinking about the next year as a sprint (like part of an iteration in agile programming).  I’ve been listening to Ondrej (another co-founder @ Syncplicity) too much because the next question to pop in my head was, what exit criteria do I use to determine if 2009 was a successful year?

I’ve got the gym thing covered (thanks LaLanne Fitness) and my diet’s good too (I’m a pescetarian).  I’d like to reduce my dependency on caffeine sometimes, but living in San Francisco makes me think that’s heresy.  I’d also like to take more photos.  I have a fantabulous Canon 30D and I always enjoy snapping some photos.

Work-wise, 2009 should be a great year.  Since Fall 2007, Syncplicity has been a very exciting project for me.  We have a great product and we’re now well funded.  It’s been a challenge to get to this point and there’s a long queue of really interesting work ahead.  I’m especially excited because not everything is Windows focused.  If you know me, you know that Windows is now relegated to VMs on my computers.

Happy New Year everyone!