It should be easy enough to de-duplicate instruments; I think there's a pretty well-defined equality condition between them.
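To make the de-duplication idea concrete, here's a rough sketch, not a real implementation. The `Instrument` fields are made up for illustration (the actual instrument parameters in a .sav will differ), and `dedupe` is just a name I picked. The point is only that if two instruments compare equal field by field, we keep the first one and remember where each old slot should point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    # Hypothetical fields for illustration only; the real instrument
    # parameters would come from the .sav format.
    kind: str
    envelope: int
    wave: int


def dedupe(instruments):
    """Return (unique, remap).

    unique: the distinct instruments, in first-seen order.
    remap:  for each original slot, the index of its instrument in unique.
    """
    unique = []
    seen = {}  # instrument -> index in unique
    remap = []
    for inst in instruments:
        if inst not in seen:
            seen[inst] = len(unique)
            unique.append(inst)
        remap.append(seen[inst])
    return unique, remap
```

The `remap` list is there because anything that refers to an instrument by slot number would presumably need to be rewritten to point at the surviving copy.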
lsdj2xml looks legit, but it hasn't been updated in a long time (personally, I wouldn't have chosen to write it in C, but that's me). If it works though, that's great.
I think the idea of going from .sav to MML and back again, while interesting, is a bit heavyweight for what we want to accomplish here (unless someone, please oh please, has an MML grammar, tokenizer, etc. just lying around). I think lsdj2xml has the right idea: compile the data to and from some intermediate representation and do the funging there.
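By "intermediate representation" I mean something like the sketch below: decode the bytes into plain records, change the records, encode them back. The three-byte record layout is a toy I invented for the example and is NOT the real .sav layout, and `decode`/`encode` are hypothetical names.

```python
import struct

# Toy layout, not the real .sav format: each record is 3 unsigned bytes.
RECORD = struct.Struct("3B")


def decode(blob):
    """Turn raw bytes into a list of editable dicts (the intermediate form)."""
    return [
        {"kind": kind, "envelope": envelope, "wave": wave}
        for kind, envelope, wave in RECORD.iter_unpack(blob)
    ]


def encode(records):
    """Turn the intermediate form back into raw bytes."""
    return b"".join(
        RECORD.pack(r["kind"], r["envelope"], r["wave"]) for r in records
    )
```

As long as `encode(decode(blob))` gives back the same bytes for an untouched file, any edits made to the records in between should be the only thing that changes in the output.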
@gizmo, if you haven't already started working on this, would you object to my starting a project in Google Code and writing the whole thing in Python? Python makes me less swear-y than C does any day, esp. for things like this that aren't exactly performance critical.