Binary file localization can be very challenging because the internal format of a binary file can be almost anything. Most binary files contain a header followed by one or more records that contain the data. The actual format and size of the header and record blocks depend on the file format. Usually they contain numbers, strings, and binary data. There are dozens of ways to write a number into a file. There are several number types, such as integers, floating-point numbers, and fixed-point decimal numbers. Integers can be stored in 1, 2, or 4 bytes, signed or unsigned. Some formats use little-endian byte order; others use big-endian byte order. There are several ways to encode the other number types as well. Strings can also be stored in dozens of different ways. The length of the string might be written before the characters, or the characters might be followed by a terminating null character. There are also several string encodings, such as code pages, UTF-8, and UTF-16. Altogether, there is an almost unlimited number of possible binary formats. To cope with this, Sisulizer lets you define the binary format by specifying the structure of the file.
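To make the point concrete, the following minimal sketch uses Python's standard `struct` module to write the same number, 1000, in several of the layouts mentioned above. Python is used only for illustration; the layout names are descriptive labels chosen for this example.

```python
import struct

# One value, 1000, written in several layouts. In struct format strings,
# "<" means little-endian and ">" means big-endian; "H" is a 2-byte
# unsigned integer, "i" a 4-byte signed integer and "f" a 4-byte float.
value = 1000
layouts = {
    "2-byte unsigned, little-endian": "<H",
    "2-byte unsigned, big-endian": ">H",
    "4-byte signed, little-endian": "<i",
    "4-byte signed, big-endian": ">i",
    "4-byte float, little-endian": "<f",
}
encoded = {name: struct.pack(fmt, value) for name, fmt in layouts.items()}
for name, data in encoded.items():
    print(f"{name:32} {data.hex(' ')}")

# A 1-byte unsigned integer cannot hold 1000 at all.
try:
    struct.pack("<B", value)
except struct.error as exc:
    print("1-byte unsigned:", exc)
```

Five different byte sequences represent the same value, which is why a tool cannot guess the layout and has to be told.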
The structure of a binary file is:
There is an optional header followed by zero or more records. The header and each record contain one or more fields.
Each field contains a single piece of data, such as an integer or string value, or just an array of bytes.
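The header/record/field structure can be pictured as a small data model. The sketch below is a hypothetical illustration, not Sisulizer's internal representation: the `Field` and `BinaryDefinition` classes, the field kinds `"int16"`, `"int32"` and `"bytes"`, and the rule "read records until the end of the data" are all assumptions made for this example, and string fields are left out for brevity.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Field:
    name: str
    kind: str       # hypothetical kinds: "int16", "int32" or "bytes"
    size: int = 0   # only used for "bytes" fields


@dataclass
class BinaryDefinition:
    header: list[Field] = field(default_factory=list)  # empty = no header
    record: list[Field] = field(default_factory=list)
    byte_order: str = "<"                               # "<" little, ">" big

    def _read_fields(self, fields, buf, offset):
        # Read each field in order and return the values plus the new offset.
        values = {}
        for f in fields:
            if f.kind == "bytes":
                values[f.name] = buf[offset:offset + f.size]
                offset += f.size
            else:
                fmt = self.byte_order + {"int16": "h", "int32": "i"}[f.kind]
                (values[f.name],) = struct.unpack_from(fmt, buf, offset)
                offset += struct.calcsize(fmt)
        return values, offset

    def parse(self, buf: bytes):
        header, offset = self._read_fields(self.header, buf, 0)
        records = []
        # Zero or more records, read until the data runs out.
        while self.record and offset < len(buf):
            rec, offset = self._read_fields(self.record, buf, offset)
            records.append(rec)
        return header, records


definition = BinaryDefinition(
    header=[Field("count", "int16")],
    record=[Field("id", "int16"), Field("tag", "bytes", 2)],
)
data = struct.pack("<h", 2) + struct.pack("<h", 1) + b"AB" + struct.pack("<h", 2) + b"CD"
header, records = definition.parse(data)
print(header)   # {'count': 2}
print(records)  # [{'id': 1, 'tag': b'AB'}, {'id': 2, 'tag': b'CD'}]
```

In this sketch the header's `count` is only read, not used to limit the loop; a real format might instead use such a header value to decide how many records follow.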
In addition to the header and records, a binary definition specifies some options that apply to the whole file.
Byte order (endianness) specifies the order in which data containing multiple bytes is written. Possible values are:
|Little-endian||The least significant byte is stored at the memory location with the lowest address, the next most significant byte at the following location, and so on.|
|Big-endian||The most significant byte is stored at the memory location with the lowest address, followed by the remaining bytes in decreasing order of significance.|
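The difference between the two byte orders can be seen by packing one 4-byte value both ways. This minimal sketch uses Python's `struct` module purely for illustration; the value `0x12345678` is an arbitrary example chosen because each byte is easy to spot.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # little-endian, 4-byte unsigned
big = struct.pack(">I", value)     # big-endian, 4-byte unsigned
print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# Reading big-endian data as if it were little-endian silently yields
# a different number instead of an error.
(wrong,) = struct.unpack("<I", big)
print(hex(wrong))       # 0x78563412
```

Because a wrong byte order produces plausible-looking but incorrect numbers rather than an error, it has to be specified correctly in the definition.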
String encoding and string length specify the default string encoding and the default length method. When you add a string field to the header or a record, you can specify either a generic string field or a string field with a specific encoding and length. The difference is that a generic string always uses the default settings, while a specific string field uses a hard-coded encoding. If your binary file encodes all string values in the same way, it is better to use generic strings. If the file contains two or more different string encodings, you have to use specific strings, or you can use a generic string for the most common encoding and specific strings for the other fields.
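The rule "generic uses the default, specific uses its own" can be expressed as a simple fallback. In this hypothetical sketch, the `StringField` class and the `effective_encoding` helper are illustrative names, not part of Sisulizer, and Python codec names such as `"utf-8"` and `"cp1252"` stand in for the encoding options.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model: a generic string field has no encoding of its own
# (None) and uses the file's default; a specific field hard-codes one.
@dataclass
class StringField:
    name: str
    encoding: Optional[str] = None  # None means "generic"


def effective_encoding(field: StringField, default_encoding: str) -> str:
    """Return the encoding actually used to read or write this field."""
    return field.encoding if field.encoding is not None else default_encoding


default_encoding = "utf-8"
fields = [
    StringField("title"),                  # generic: follows the default
    StringField("description"),            # generic: follows the default
    StringField("legacy_name", "cp1252"),  # specific: always Windows-1252
]
resolved = {f.name: effective_encoding(f, default_encoding) for f in fields}
print(resolved)  # {'title': 'utf-8', 'description': 'utf-8', 'legacy_name': 'cp1252'}
```

Changing `default_encoding` affects only the generic fields, which is why generic strings are the convenient choice when one encoding dominates the file.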
Possible encoding methods are:
|UTF-8||UTF-8 strings are used.|
|UTF-16||UTF-16 strings are used.|
|Windows code page||Strings are encoded with Windows (ANSI) code pages.|
|ISO code page||Strings are encoded with ISO code pages.|
|Mac code page||Strings are encoded with Mac code pages.|
|OEM code page||Strings are encoded with OEM (DOS) code pages.|
|EBCDIC code page||Strings are encoded with EBCDIC code pages.|
|Other code page||Strings are encoded with other code pages.|
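To show that the encoding method changes both the bytes and the byte count, this sketch encodes one word with a Python codec chosen to represent each method. The mapping is an assumption: a real file may use a different code page from the same family, UTF-16 is assumed to be little-endian without a byte order mark, and "Other code page" is omitted because it has no single representative.

```python
# Assumed mapping from encoding method to one representative Python codec.
codecs_by_method = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-le",          # assumption: little-endian, no BOM
    "Windows code page": "cp1252",  # Western European Windows (ANSI)
    "ISO code page": "iso-8859-1",  # ISO Latin-1
    "Mac code page": "mac_roman",
    "OEM code page": "cp437",       # original IBM PC / DOS code page
    "EBCDIC code page": "cp037",    # EBCDIC US/Canada
}

text = "café"
encoded = {method: text.encode(codec) for method, codec in codecs_by_method.items()}
for method, data in encoded.items():
    print(f"{method:18} {len(data)} bytes  {data.hex(' ')}")
```

The same four characters take 5 bytes in UTF-8, 8 bytes in UTF-16, and 4 bytes in each single-byte code page, while the byte used for "é" differs between the code pages.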
Possible length methods are:
|Null terminated||A null character is written after the last string character.|
|Preceding length byte||The length of the string in characters is written in the byte preceding the first character. The maximum string length is 255 characters.|
|Preceding length word||The length of the string in characters is written in the word (2 bytes) preceding the first character. The maximum string length is 65535 characters.|
|Fixed length||The length of the string is fixed. Some files that use a fixed string length store the actual length in the header. If this is the case, add a string length or string size field to the header definition. If the header does not contain the string length, specify the size value.|
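The four length methods can be sketched as small writer functions. All function names here are hypothetical, and the sketch makes three assumptions: a single-byte code page (`cp1252`) so that characters and bytes coincide, little-endian byte order for the length word, and null-byte padding for fixed-length strings. A reader for the length-byte layout is included to show how the prefix tells the parser where the string ends.

```python
import struct

# Assumption: single-byte code page, so one character == one byte.
ENCODING = "cp1252"


def write_null_terminated(text: str) -> bytes:
    # Characters followed by a terminating null byte.
    return text.encode(ENCODING) + b"\x00"


def write_length_byte(text: str) -> bytes:
    data = text.encode(ENCODING)
    if len(data) > 255:
        raise ValueError("a length byte allows at most 255 characters")
    return struct.pack("<B", len(data)) + data


def write_length_word(text: str) -> bytes:
    data = text.encode(ENCODING)
    if len(data) > 65535:
        raise ValueError("a length word allows at most 65535 characters")
    return struct.pack("<H", len(data)) + data  # assumption: little-endian


def write_fixed(text: str, size: int) -> bytes:
    # Assumption: shorter strings are padded with null bytes.
    data = text.encode(ENCODING)
    if len(data) > size:
        raise ValueError(f"string does not fit in {size} bytes")
    return data.ljust(size, b"\x00")


def read_length_byte(buf: bytes, offset: int = 0) -> tuple[str, int]:
    # Returns the decoded string and the offset just past it.
    (length,) = struct.unpack_from("<B", buf, offset)
    start = offset + 1
    return buf[start:start + length].decode(ENCODING), start + length


print(write_null_terminated("Hello"))  # b'Hello\x00'
print(write_length_byte("Hello"))      # b'\x05Hello'
print(write_length_word("Hello"))      # b'\x05\x00Hello'
print(write_fixed("Hello", 8))         # b'Hello\x00\x00\x00'
```

The length checks matter for localization: a translation longer than the original may exceed the 255-character limit of a length byte or simply not fit into a fixed-size field.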