Supporting Multiple Versions of a File Format

Posted on Saturday 14 April 2007

When writing software, you will often want to support multiple versions of a file format. Here is a quick guide that highlights some key concepts…

Generic File Header

  • Define a file header (meta information at the beginning of the file) format that does not change, such that you can always read the file’s type and version, for any version of the file. For example:
    • If your file is an INI file, create a section that looks like the following:
      [Header]
      File Type = "MyFormat"
      File Version = "1.0"
    • If your file is binary, use fixed width strings for file type and version at the very beginning of the file.
    • If your file is a CSV file, then make the first two rows store file type and version, such as the following:
      File Type, MyFormat
      Version, 1.0
  • Create a VI that can read the header (file type and version) from any version of your file — call it “MyFileType_ReadHeader.vi” (or similar).
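
Since a VI can't be shown as text, here is a rough Python stand-in for what a "read header" routine might look like for the three cases above. The function names and the binary field widths (16 and 8 bytes, NUL-padded ASCII) are assumptions made for illustration only; the article does not specify them.

```python
import configparser
import csv
import io

TYPE_WIDTH = 16     # assumed width in bytes of the file-type field
VERSION_WIDTH = 8   # assumed width in bytes of the version field


def read_header_ini(text):
    """Return (file_type, version) from the [Header] section of INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    header = parser["Header"]
    # configparser keeps the surrounding quotes, so strip them here.
    return header["File Type"].strip('"'), header["File Version"].strip('"')


def read_header_csv(text):
    """Return (file_type, version) from the first two rows of CSV text."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    type_row = next(rows)
    version_row = next(rows)
    return type_row[1].strip(), version_row[1].strip()


def read_header_binary(data):
    """Return (file_type, version) from two fixed-width fields at offset 0."""
    type_field = data[:TYPE_WIDTH]
    version_field = data[TYPE_WIDTH:TYPE_WIDTH + VERSION_WIDTH]

    def decode(raw):
        # Drop the padding (NUL bytes or spaces) that fills the field.
        return raw.rstrip(b"\x00 ").decode("ascii")

    return decode(type_field), decode(version_field)
```

None of these functions looks at anything past the header, which is why the same routine can serve every version of the file.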

Data Structure Type Definition

  • Create a Type Definition of your Data Structure – You should have a type definition of the data structure that you are saving to file. Your reader/writer should output/input this type definition.
  • Create a Snapshot of your Type Definition for Each Official Version – For each official version that your reader/writer will support, create a copy of the data structure and disconnect it (recursively) from any type definitions. This will be a “snapshot” of your type definition at the moment in time that you declared it to be a specific version. I’ll repeat myself: the snapshots should not be connected to any type definitions, since a snapshot should never be changed once it is declared to be an official version.
    Note: The tricky part of this step is disconnecting a control from all type definitions, recursively. However, tools that automate this are easy to make (download this example).
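
For readers who think in text languages, a loose Python analogy of the live type definition versus its snapshots might look like the sketch below. The class and field names are hypothetical, and frozen dataclasses are only an approximation of a control that has been disconnected from its type definitions.

```python
from dataclasses import dataclass, fields


@dataclass
class Settings:
    """Live data structure used by the application; free to evolve."""
    name: str
    sample_rate_hz: float
    channels: int


@dataclass(frozen=True)
class SettingsV1_0:
    """Snapshot for file version 1.0; never edited after release."""
    name: str
    sample_rate_hz: float


@dataclass(frozen=True)
class SettingsV1_1:
    """Snapshot for file version 1.1; never edited after release."""
    name: str
    sample_rate_hz: float
    channels: int


def layout(cls):
    """Return the (field name, field type) pairs that define a structure."""
    return [(f.name, f.type) for f in fields(cls)]


def matches_latest_snapshot(live_cls, snapshot_cls):
    """True when the live structure still has the snapshot's exact layout."""
    return layout(live_cls) == layout(snapshot_cls)
```

The snapshots use only built-in field types on purpose: if they referred to a shared class, editing that class would silently change every snapshot, which is the problem the recursive disconnect avoids.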

File Reader and Writer VIs

  • Your file reader and writer VIs should output the type definition – However, inside your reader/writer you should coerce the type definition to the snapshot of the latest version. Now, if you modify your type definition, your reader/writer VI will be broken. This is an indication, at edit time, that you need to create a new version of your file format (and another snapshot).
  • Create a file reader and writer subVI for each version – You will need one subVI for each file version that handles converting the file into the data structure snapshot for that file version.
    Note: If you aren’t going to support a “save for previous” feature, you don’t need to be able to write every version.
  • Create a version upgrade subVI and a version downgrade subVI for each file version – For all versions, except the first version, you will need a VI that handles converting data between neighboring versions. Additionally, you will need to implement some sort of upgrade path (maybe as a state machine) that handles calling the upgrade VIs to successively convert the data from older versions to the latest version.
    Note: If you aren’t going to support a “save for previous” feature, you don’t need to be able to downgrade to every version.
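
The upgrade path described above can be sketched in Python as a chain of neighbor-to-neighbor converters. This is only an illustration: the version numbers, field names, and the changes made between versions are all invented, and plain dictionaries stand in for the per-version snapshots.

```python
VERSIONS = ["1.0", "1.1", "2.0"]  # hypothetical versions, oldest to newest


def upgrade_1_0_to_1_1(data):
    # Assume 1.1 added a channel count; old files get a default of 1.
    return {**data, "channels": 1}


def upgrade_1_1_to_2_0(data):
    # Assume 2.0 renamed "rate" to "sample_rate_hz".
    upgraded = dict(data)
    upgraded["sample_rate_hz"] = upgraded.pop("rate")
    return upgraded


def downgrade_2_0_to_1_1(data):
    downgraded = dict(data)
    downgraded["rate"] = downgraded.pop("sample_rate_hz")
    return downgraded


def downgrade_1_1_to_1_0(data):
    # Version 1.0 has no channel count, so that field is dropped.
    return {key: value for key, value in data.items() if key != "channels"}


# Each table is keyed by the version the data is currently in.
UPGRADES = {"1.0": upgrade_1_0_to_1_1, "1.1": upgrade_1_1_to_2_0}
DOWNGRADES = {"2.0": downgrade_2_0_to_1_1, "1.1": downgrade_1_1_to_1_0}


def convert(data, from_version, to_version):
    """Step through neighboring versions until the target is reached."""
    position = VERSIONS.index(from_version)  # ValueError if unknown
    target = VERSIONS.index(to_version)
    while position != target:
        if position < target:
            data = UPGRADES[VERSIONS[position]](data)
            position += 1
        else:
            data = DOWNGRADES[VERSIONS[position]](data)
            position -= 1
    return data
```

Because each converter only knows about its immediate neighbor, adding a new file version means writing one new upgrade (and, if "save for previous" is supported, one new downgrade) rather than touching the existing ones.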

[UPDATE 2007-04-17] Example Download: “Disconnect Control from Typedef Recursive.vi”


5 Comments for 'Supporting Multiple Versions of a File Format'

  1.  
    April 17, 2007 | 9:12 am
     

    Jim,
    Thanks for the great article. There are some good ideas there for maintaining a well-documented, robust file-format chain.
    I’ve used .ini files to create less-rigorously maintained, flexible file formats. New fields can be added without breaking the reader/writer, and default values, if properly managed, can make old files work fairly well with the new format. Fields can be removed from the format, with no harm done other than the unused fields cluttering up the .ini files. Like I said, it’s a less rigorous approach, but it makes it really easy to change file formats on quickly-evolving R&D projects.
    Now I’ll start thinking about combining your suggestions with my current practices to get some of the best features of each.

    Cheers,
    Dave T.

  2.  
    April 17, 2007 | 10:48 am
     

    Dave,

    You’re welcome; I’m glad you liked the article. I agree with you that INI files are extremely flexible, and they are a perfect complement to the technique described in the article. Quickly evolving projects don’t necessarily benefit from a rigorous file versioning scheme, but with the right tools and templates for implementing such a scheme, the small cost to implement it can be well worth the reward.

    Thanks,

    -Jim

  3.  
    Aristos Queue
    April 17, 2007 | 4:31 pm
     

    I also recommend defining some syntax that means “comment” in your file which can be nested somehow (so some symbol that starts the comment and another that ends it such as /* and */ in C++ code). This allows for you to have different versions of your file reader/writer that can embed new features and still produce an older style document that older readers can load. For example, you might use /** **/ to encode some actual data that only a more recent reader can read, but an older reader will skip over it as a comment. Otherwise every time you want to add a feature or increase the data recorded in a file, you must bump your file definition version and then you’re stuck being unable to load the file in an older version.

  4.  
    April 18, 2007 | 8:44 am
     

    Aristos Queue,

    Thanks for the tip. Yes, backwards compatibility of newer file versions with older readers is a very useful feature. I can think of a few ways to achieve this. For INI files, don’t remove or rename sections or keys. For CSV files, access columns by name (header), rather than by column index, and don’t remove or rename columns. For XML files, don’t remove or rename entities.
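
    As a rough illustration of the CSV suggestion (in Python rather than LabVIEW, with made-up column names, and ignoring the two header rows for brevity), a reader that looks columns up by name keeps working when a newer writer inserts a column:

```python
import csv
import io


def read_columns(text, wanted):
    """Return one dict per data row, holding only the wanted columns."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    # Columns are looked up by header name, so their position in the
    # file and any extra columns are irrelevant to this reader.
    return [{name: row[name] for name in wanted} for row in reader]


# Hypothetical data: the newer file adds a "current" column in the middle.
older_file = "time,voltage\n0,1.5\n1,1.6\n"
newer_file = "time,current,voltage\n0,0.2,1.5\n1,0.3,1.6\n"

same = (read_columns(older_file, ["time", "voltage"])
        == read_columns(newer_file, ["time", "voltage"]))
```

    A reader that indexed by position would have picked up "current" in place of "voltage" from the newer file.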

    Thanks,

    -Jim

  5.  
    April 26, 2007 | 11:08 am
     

    Good advice Jim. You can see over the years, we’ve added some file formats to LabVIEW (http://ideasinwiring.blogspot.com/2005/11/storing-and-retrieving-data.html) that make it hard for customers to follow your rules and that can cause problems.

    I’ve seen folks hang themselves with the Datalog file (http://www.kip.uni-heidelberg.de/fp-computer/LabVIEW/FP_LabView/html/Writing_Datalog_Files.html) by writing a file with one particular type definition (i.e. cluster) and then finding that they can’t read the file any more after they change the cluster, without understanding that the change affects backward compatibility.

    The newer file formats like .TDM (http://zone.ni.com/devzone/cda/tut/p/id/3542) are much more maintainable because they store the type definition (typically called a ‘schema’) in the data file so you can’t lose it and so it can be browsed by an external application.
