The tool takes a description text file and generates code to interact with files of the format described therein. The generated code is C, and porting it to another language wouldn't be easy, since the associated structs are also generated by the tool.
Files created using the generated code grow and shrink as needed. You can look at the allocation algorithm for more details.
Each file also contains an index, which is loaded into memory in its entirety when the file is opened. This makes lookups for reading very fast.
The tool generates some C code to be used in your program. The necessary types are also generated by the tool, but you can modify them using some directives.
An object is a top level data structure which can be directly read or written. All data you read or write must be part of an object. For example, you can't simply read an integer value with a given identifier. The integer must reside inside an object.
There is no top-level 'main' object. All objects are equal in standing and are handled independently of each other. The object index is not visible to the user. If you want to maintain a list of things, you should do it yourself by storing some pointers in an object.
When you run the generated code, almost all functions return an error indication: the return value is non-zero if the function fails. Errors within the generated code are usually non-recoverable. An error normally occurs only when memory allocation or disk I/O fails. Another source of errors is a corrupt binary file. None of these can be recovered from gracefully. Therefore, if a failure occurs within the generated code, the best option is to create a new file and store as much as you can in it before abandoning the current one.
The main input file describes the object types used by the file format. It can also contain other things, such as directives. A single main input file is all you need to generate the code for a given format.
If present, auxiliary input files modify the description given by the main input file. These files cannot contain new object descriptions, but they can modify the existing ones in certain ways. For example, an auxiliary input file can modify the 'raw' fields in an object. In fact, when an auxiliary input file is given, all raw fields in all object definitions are removed from the main description.
Another use for auxiliary input files is to modify the compilation environment. In an auxiliary input file, you may define new code to be inserted before and after the source output, change the library prefix, etc.
The main usage pattern for this is as follows. The main input file is meant to be a 'pure' declaration of the file format, much like a grammar description. Alongside this main input file, you can have an auxiliary input file which adds raw fields for use in a specific program, along with compilation specifics such as header information, output file names, etc.
This system makes it easier to share one file format description among several programs which operate on the same file format.
Name | C Type | Name | C Type |
int8 | int8_t | unt8 | uint8_t |
int16 | int16_t | unt16 | uint16_t |
int32 | int32_t | unt32 | uint32_t |
int64 | int64_t | unt64 | uint64_t |
flo32 | float | flo64 | double |
str8 | uint8_t* | str16 | uint16_t* |
str32 | uint32_t* | | |
You can also define arrays and dynamic arrays. Arrays are multi-dimensional C arrays with fixed dimensions. Dynamic arrays are one-dimensional arrays with variable size. There are some restrictions on what you can store inside an array element. An array or a dynamic array can have the following as its elements:
Speaking of pointers, you can also have pointer fields in objects or structs. These can only point to objects. No double pointers, no pointers to anything else.
Objects and structs are containers in which you can store fields of the above mentioned types. These two types are identical except for one fact: Objects are top level structures which can be read and written independently. They have their own object identifier and read/write functions are generated for each object.
On the other hand, structs can only be handled as part of another object or struct. No read/write functions are generated for structs and they are not stored in the object index.
Both types are represented by C structs. C structs generated for objects have an extra field in them, called "_objid".
In an input file, you can specify object and struct declarations, directives and C code.
An object declaration has the following form:

    object <name>
        <field declaration>
        <field declaration>
        ..
        <field declaration>
        <raw field>
        <raw field>
        ..
        <raw field>
    end

A 'raw field' is just a regular C struct member declaration. Raw declarations aren't processed by the tool; they are passed through to the output unchanged. The syntax for a raw field is:
    raw <C declaration>

Normal fields are declared using the following syntax:
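    field <name> <type>

If the type is a basic type, <type> is the corresponding name from the first table (for example, int32 or str8). The other forms of <type> are:

    array(*) <array element type>
    array(<dim1>,<dim2>, ... <dimN>) <array element type>
    ptr <object name>
    <object or struct name>

The first form declares a dynamic array, the second a fixed-size array with the given dimensions, the third a pointer to an object, and the last a field holding an object or struct of the named type. Array element types are explained in the Types Section.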
    src_top
    #include "myincludes.h"
    end_src_top

In the case of src_top, the given C code replaces the #include directives of the generated code. This is a good place to add your platform-specific #includes.
    int  PFXopen(const char *fn, int mode, bff_t **R);
    int  PFXcreate(const char *fn, bff_t **R);
    void PFXclose(bff_t *bf);
    int  PFXerrno(bff_t *bf);

For PFXopen(), mode is 1 if you need write access and 0 otherwise. PFXopen() and PFXcreate() return non-zero if the operation fails. PFXerrno() returns the latest errno value.
For each object type OBJ, you get one struct declaration and two functions:
    typedef struct {
        /* fields go here */
    } PFXOBJ_t;

    int PFXread_OBJ (bff_t *bf, uint64_t objid, PFXOBJ_t **R);
    int PFXwrite_OBJ(bff_t *bf, PFXOBJ_t *obj);

In PFXread_OBJ(), the object and its fields are allocated using malloc(). When you create a new object, it's important to set its _objid field to 0. This way, it gets a valid object identifier when it's written.
The file begins with a header of the following form:

    MAGIC  : 16 bytes
    FILESIZ: uint64
    NEXTID : uint64

The file header is padded with zeroes to a total size of 128 bytes.
All data, including the header and metadata, is written in little-endian order. For example, FILESIZ is an 8-byte unsigned integer written with the least significant byte first. All file offsets are relative to the beginning of the file.
In the file header, MAGIC is an arbitrary 16-byte sequence; it isn't interpreted in any way. FILESIZ is the offset of the object index, which is also where the data section of the file ends. NEXTID is the next available object identifier. Identifiers start from 32; values below 32 are reserved.
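Because every integer on disk is little-endian regardless of the host CPU, a portable reader can assemble values byte by byte instead of copying raw bytes into a uint64_t. The following is a minimal sketch; the helper names are hypothetical and not part of the generated code.

```c
#include <stdint.h>

/* Write a 64-bit value least significant byte first. */
static void put_u64le(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 64-bit little-endian value, independent of host byte order. */
static uint64_t get_u64le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* 32-bit variants, e.g. for NENTRIES, TYPE and element counts. */
static void put_u32le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get_u32le(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}
```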
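A minimal sketch of decoding and sanity-checking the header follows. It assumes the three fields are packed back to back (MAGIC at offset 0, FILESIZ at 16, NEXTID at 24); the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 128  /* header is zero-padded to 128 bytes */
#define FIRST_ID  32  /* identifiers below 32 are reserved */

/* Hypothetical in-memory view of the file header. */
typedef struct {
    uint8_t  magic[16];
    uint64_t filesiz;  /* offset of the object index == end of data section */
    uint64_t nextid;   /* next object identifier to hand out */
} file_header;

static uint64_t rd_u64le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the first HDR_SIZE bytes of a file of length file_len.
   Returns 0 on success, non-zero if the header is inconsistent. */
static int parse_header(const uint8_t *buf, uint64_t file_len, file_header *h)
{
    memcpy(h->magic, buf, 16);           /* MAGIC is opaque: not checked */
    h->filesiz = rd_u64le(buf + 16);     /* assumed packed layout */
    h->nextid  = rd_u64le(buf + 24);

    /* The index starts after the header and needs at least NENTRIES (4 bytes). */
    if (h->filesiz < HDR_SIZE || h->filesiz > file_len || file_len - h->filesiz < 4)
        return 1;
    if (h->nextid < FIRST_ID)            /* reserved identifier range */
        return 1;
    return 0;
}
```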
At the end of the file, we have an object index. This index holds the positions of objects as well as free areas. The following is the format of the object index:
    NENTRIES: uint32
    <entry1>
    <entry2>
    ..
    <entryN>

Each entry has the following information:

    TYPE  : uint32
    OFFSET: uint64
    IDENT : uint64

TYPE is 0 for free areas. Otherwise, it identifies the type of the object; this is just an integer assigned by the system. OFFSET is the offset of the object or free area within the file. IDENT is the object identifier or, if the entry refers to a free area, the size of the free area.
The rest of the file consists of blocks. Each block has the following header:

    SIZE: uint64
    PREV: uint64
    NEXT: uint64

The PREV and NEXT fields form a doubly linked list. They are meaningful only for blocks that are part of objects; for free areas, they are ignored.
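The index can be decoded in one pass and then searched in memory, which is what makes reads fast. The sketch below assumes the entry fields are packed with no padding (20 bytes per entry) and uses a linear scan for brevity; the generated code may well use a faster lookup structure, and all names here are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRY_DISK_SIZE 20  /* assumed packed: 4 + 8 + 8 bytes */

/* Hypothetical in-memory form of one index entry. */
typedef struct {
    uint32_t type;    /* 0 = free area, otherwise object type */
    uint64_t offset;  /* position of the object or free area in the file */
    uint64_t ident;   /* object identifier, or size for free areas */
} index_entry;

static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd_u64le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the index located at FILESIZ; len is the number of bytes from
   buf to the end of the file. Returns a malloc()ed array, or NULL. */
static index_entry *load_index(const uint8_t *buf, uint64_t len, uint32_t *count)
{
    if (len < 4)
        return NULL;
    uint32_t n = rd_u32le(buf);
    if ((len - 4) / ENTRY_DISK_SIZE < n)
        return NULL;                              /* truncated index */
    index_entry *e = malloc(n ? (size_t)n * sizeof *e : 1);
    if (!e)
        return NULL;
    const uint8_t *p = buf + 4;
    for (uint32_t i = 0; i < n; i++, p += ENTRY_DISK_SIZE) {
        e[i].type   = rd_u32le(p);
        e[i].offset = rd_u64le(p + 4);
        e[i].ident  = rd_u64le(p + 12);
    }
    *count = n;
    return e;
}

/* Find an object by identifier. Free areas (type 0) are skipped because
   their IDENT field holds a size, not an identifier. */
static const index_entry *find_object(const index_entry *e, uint32_t n, uint64_t objid)
{
    for (uint32_t i = 0; i < n; i++)
        if (e[i].type != 0 && e[i].ident == objid)
            return &e[i];
    return NULL;
}
```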
The SIZE field tells us how big the block is. This size includes the size of the block header.
An object may be split into several blocks. For every block except the last one, the amount of data stored in the block equals the block size minus the block header size. The last block may contain less data than SIZE indicates. This happens when freeing the remaining space would leave a region too small to be useful.
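The rule about the last block can be illustrated by carving space for an object out of a free area. The 24-byte header size follows from the three uint64 fields; the threshold MIN_FREE_BLOCK and the function carve_block() are hypothetical, since this document doesn't say what counts as "too small".

```c
#include <stdint.h>

#define BLOCK_HDR_SIZE 24  /* SIZE + PREV + NEXT, three uint64 fields */
#define MIN_FREE_BLOCK 64  /* hypothetical threshold, not documented */

/* Carve up to `need` payload bytes out of a free block whose SIZE is
   free_size (header included; assumed > BLOCK_HDR_SIZE). Returns the SIZE
   of the new object block. *rest receives the SIZE of the leftover free
   block, or 0 if the whole free block is used. */
static uint64_t carve_block(uint64_t free_size, uint64_t need, uint64_t *rest)
{
    uint64_t cap = free_size - BLOCK_HDR_SIZE;  /* payload capacity */
    if (need > cap)
        need = cap;  /* block is filled; the caller continues in another block */
    uint64_t used = BLOCK_HDR_SIZE + need;
    if (free_size - used < MIN_FREE_BLOCK) {
        /* Leftover too small to be useful: keep it inside this block,
           which then holds less data than its SIZE suggests. */
        *rest = 0;
        return free_size;
    }
    *rest = free_size - used;
    return used;
}
```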
For free blocks, only the SIZE field is meaningful, which stores the same value as the associated index entry does.
As mentioned above, an object may span several blocks. The first block for an object contains the object header followed by some object data. The remaining blocks contain only object data within their payload section. An object header is:
    SIZE: uint64

Type and identifier information is already present in the index, so it is not repeated here. SIZE doesn't include the size of the object header; it is simply the size of the object data.
The object data can contain aggregate data such as strings or dynamic arrays. For these types of data, the number of elements is given before the elements themselves. For strings, the terminating null is not included in this count. The count is an unsigned 4-byte integer.
Arrays are another form of aggregate data. Array dimensions are not stored on disk, since the generated code already knows them. Array elements are stored row-first, i.e. in row-major order, as in C.
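Putting the block and object headers together, reading an object means following its chain of blocks and copying payload bytes until SIZE bytes have been collected. This sketch works on an in-memory image of the file and assumes that SIZE, PREV and NEXT are packed in that order, that NEXT holds the file offset of the next block, and that 0 ends the chain; none of this is confirmed above, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_HDR_SIZE 24  /* SIZE, PREV, NEXT */
#define OBJ_HDR_SIZE    8  /* object SIZE, only in the first block */

static uint64_t rd_u64le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Copy the data of the object whose first block starts at `off` into `out`
   (capacity `cap`). Returns the object data size, or UINT64_MAX if the
   chain is malformed or the data doesn't fit. */
static uint64_t read_object_data(const uint8_t *img, uint64_t img_len,
                                 uint64_t off, uint8_t *out, uint64_t cap)
{
    if (img_len < BLOCK_HDR_SIZE + OBJ_HDR_SIZE ||
        off > img_len - (BLOCK_HDR_SIZE + OBJ_HDR_SIZE))
        return UINT64_MAX;
    uint64_t total = rd_u64le(img + off + BLOCK_HDR_SIZE);  /* object header */
    if (total > cap)
        return UINT64_MAX;

    uint64_t done = 0;
    uint64_t skip = OBJ_HDR_SIZE;  /* object header occupies the first payload */
    while (done < total) {
        if (off == 0 || off > img_len || img_len - off < BLOCK_HDR_SIZE)
            return UINT64_MAX;     /* chain ended early or points outside */
        uint64_t size = rd_u64le(img + off);
        /* Require a non-empty payload so every iteration makes progress. */
        if (size <= BLOCK_HDR_SIZE + skip || size > img_len - off)
            return UINT64_MAX;
        uint64_t avail = size - BLOCK_HDR_SIZE - skip;
        uint64_t take = total - done < avail ? total - done : avail;  /* last block may be partly used */
        memcpy(out + done, img + off + BLOCK_HDR_SIZE + skip, take);
        done += take;
        skip = 0;
        off = rd_u64le(img + off + 16);  /* NEXT */
    }
    return total;
}
```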
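As an illustration of the count prefix, here is a sketch of encoding and decoding a str8 value. The count is written little-endian like every other integer; re-adding the terminating null in memory is an assumption about how strings are represented in the generated structs, and the function names are hypothetical. For str16 and str32, the count would still be the number of elements, not bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Serialize a null-terminated str8 value: uint32 count, then the bytes,
   without the terminating null. `out` must hold 4 + strlen(s) bytes.
   Returns the number of bytes written. */
static size_t put_str8(uint8_t *out, const uint8_t *s)
{
    uint32_t n = (uint32_t)strlen((const char *)s);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(n >> (8 * i));
    memcpy(out + 4, s, n);
    return 4 + (size_t)n;
}

/* Decode a str8 value from `in` (len bytes available) into `dst`
   (capacity cap, which must leave room for the re-added null).
   Returns the number of bytes consumed, or 0 on error. */
static size_t get_str8(const uint8_t *in, size_t len, uint8_t *dst, size_t cap)
{
    if (len < 4)
        return 0;
    uint32_t n = (uint32_t)in[0] | (uint32_t)in[1] << 8 |
                 (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
    if (len - 4 < n || cap <= n)
        return 0;
    memcpy(dst, in + 4, n);
    dst[n] = 0;
    return 4 + (size_t)n;
}
```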
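Because the dimensions are fixed and the elements are stored in row-major order, the position of an element can be computed directly from its indices, at least when the element type has a fixed on-disk size. A sketch with a hypothetical helper name:

```c
#include <stddef.h>

/* Flat position of element (idx[0], ..., idx[ndim-1]) in an array with
   dimensions dims[0..ndim-1], stored row-major (last index varies fastest).
   Multiply by the element's on-disk size to get a byte offset. */
static size_t row_major_index(const size_t *dims, const size_t *idx, size_t ndim)
{
    size_t flat = 0;
    for (size_t k = 0; k < ndim; k++)
        flat = flat * dims[k] + idx[k];
    return flat;
}
```

For a field declared as array(2,3,4), element (1,2,3) is the 24th and last element, at flat position 23.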