Due to its purpose, it will not handle recursion and is not suitable for storing linked information.
Lines starting with the character # are ignored and can be used for comments.

    %end

Text in between is not interpreted by the program in any way, even comments (starting with #) are passed thru.

    object_definition : struct_name "deprecated" ";"
    object_definition : name_specifier ";"
    object_definition : name_specifier "{" field_list "}"
    name_specifier    : struct_name
    name_specifier    : struct_name struct_tag
    struct_name       : identifier
    struct_tag        : "@" identifier
    field_list        : field_definition ";" field_list
    field_list        : field_definition ";"
    field_definition  : type_specifier field_name_list
    field_name_list   : field_name
    field_name_list   : field_name "," field_name_list
    field_name        : identifier
    type_specifier    : type_name
    type_specifier    : type_name "[" "]"
    type_specifier    : type_name "[" array_size "]"
    array_size        : identifier
    array_size        : decimal_number

In this syntax, quote characters are just there to emphasise that the quoted strings are input tokens. You don't need actual quote characters for any tokens.
The given struct name and tag are used as follows in the output:

    typedef struct struct_tag { fields } struct_name;

In the input, the struct tag is written with an @ sign at the front, without any spaces between.
A type specifier specifies either a singular field, an array or a variable sized array. The base type can be a basic type or a type specified using an object definition. Here are the basic types:
Basic Type Name | Corresponding C Type |
---|---|
int8 | int8_t |
unt8 | uint8_t |
int16 | int16_t |
unt16 | uint16_t |
int32 | int32_t |
unt32 | uint32_t |
int64 | int64_t |
unt64 | uint64_t |
string | char* |
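
As an illustration of the mapping above, here is a minimal sketch of what a definition might turn into. The names `Point`, `point_s`, `x`, `y`, `label` and the helper `point_set` are made up for this example, and the real output may contain additional members (such as the `_type` field discussed later), so treat it as an approximation rather than the tool's exact output. The assumed input is shown in the leading comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed input (hypothetical names):
 *
 *   Point @point_s {
 *     int32 x, y;
 *     string label;
 *   }
 */
typedef struct point_s {
    int32_t x, y;   /* int32  -> int32_t */
    char *label;    /* string -> char*   */
} Point;

/* Hypothetical helper showing the typedef name and the mapped field types. */
static void point_set(Point *p, int32_t x, int32_t y, char *label)
{
    assert(p != NULL);
    p->x = x;
    p->y = y;
    p->label = label;
}
```

The struct tag (`point_s`) and the typedef name (`Point`) come from the `@` tag and the struct name in the input, respectively.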
The header contains an enumeration which identifies the type of an encoded object.
Following that, a typedef struct is output for each object definition. Finally, all structs are combined into a typedef union. This type is used for encoding/decoding functions.
In the source file, you have a table describing the objects. This table contains information about all versions. Following that, encoding, decoding and helper functions are output.
In order to use a different version, you should call:

    const szTable* szVersion(const szTable* table, int version);

This will return NULL if the required version is not found. You can call this function using sztab as the first argument or any other szTable* typed value obtained from szVersion.
In the input file, when you use the directive %version, a new version of the protocol is declared. This new version inherits all objects from the previous version. Any object definition following the directive will be inserted into the new version of the protocol. If you want to deprecate any message, you may use the deprecated object definition as shown before:

    object_definition : struct_name "deprecated" ";"

From this version on, the given object will no longer be recognized.

    uint8_t* szEncodePad (const szTable *table, szObject *obj, size_t hdr, size_t ftr, size_t *size);
    uint8_t* szEncode    (const szTable *table, szObject *obj, size_t *size);

The first function adds some header and footer space to the allocated buffer.
The return value is the allocated buffer. The allocated size is returned thru the size pointer. If the given object is not recognized or if there is a memory allocation failure, the return value is NULL.
In order to encode a struct, you need to set its _type field to the corresponding enum value. This needs to be done only for the top level struct passed to Encode since the types of lower level structs are inferred from the top level struct.
The encoder is capable of handling 0 sized variable arrays and NULL strings.
The encoded buffer contains the following (after any header space):

    byte index: 0 1 2 3 4 5 6 7 8 9 ....
                T T T T S S S S D D D ...... D

The first 4 bytes encode the enum value corresponding to the given struct (copied from _type). The second 4 bytes encode the size of the data portion, as shown by D bytes above. Therefore, this size is 8 less than the encoded size.
All integers are encoded little endian.
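
To make the 8-byte prefix concrete, here is a small sketch that writes a type value and a data size in little-endian order. The helpers `put_u32_le` and `write_prefix` are hypothetical names invented for this illustration; they are not part of the generated library, which does this work internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value, least significant byte first. */
static void put_u32_le(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value & 0xff);
    out[1] = (uint8_t)((value >> 8) & 0xff);
    out[2] = (uint8_t)((value >> 16) & 0xff);
    out[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Hypothetical helper: write the T and S bytes (type, then data size). */
static void write_prefix(uint8_t *out, uint32_t type, uint32_t data_size)
{
    assert(out != NULL);
    put_u32_le(out, type);          /* bytes 0..3 */
    put_u32_le(out + 4, data_size); /* bytes 4..7 */
}
```

For a type value of 3 and a data size of 258 (0x0102), the prefix bytes would be `03 00 00 00 02 01 00 00`.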
The encoded buffer is independent of the given object. All data stored in the object is copied to the buffer. The object may be freed without affecting the buffer.
When you're done with the buffer, you may call free() to deallocate it.

    szObject* szDecode (const szTable *table, uint8_t **buffer, size_t *length);

The initial contents of (*buffer) are expected to conform to the format described above: 4 bytes of type, followed by 4 bytes of size and the corresponding number of bytes of data.
The return value is NULL if there are any encoding errors or memory allocation failures. The _type field of the returned object may be used to tell which message has arrived.
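
When data arrives in pieces (from a socket, for instance), it can be handy to check whether a whole message is present before attempting to decode it. The following is only a sketch based on the buffer format described earlier; `get_u32_le` and `complete_message_size` are hypothetical helpers, not functions provided by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value. */
static uint32_t get_u32_le(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/*
 * Hypothetical helper: returns the total encoded size (8 + data size) when
 * buf holds at least one complete message, or 0 when more bytes are needed.
 */
static size_t complete_message_size(const uint8_t *buf, size_t len)
{
    uint64_t total;

    if (buf == NULL || len < 8)
        return 0;                        /* prefix itself is incomplete */
    total = (uint64_t)get_u32_le(buf + 4) + 8;
    if (total > len)
        return 0;                        /* data portion is incomplete */
    return (size_t)total;
}
```

The 64-bit intermediate avoids overflow when the size field is close to the 32-bit maximum.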
The decoded object is independent of the decode buffer. They may be deallocated independently. Functions below are provided for your convenience.

    void szFree(const szTable *table, szObject *obj);
    void szDestroy(const szTable *table, szObject *obj);

They can be used to destroy objects created by the user or by the szDecode function. They simply call free() recursively on all pointers.
When you provide objects you have created yourself, make sure that all pointers point to separately allocated addresses. strdup() is your friend. Also, the _type field in the top level object must be correctly set.
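
Here is a sketch of what "separately allocated" means in practice. The struct `person` and the helpers `dup_string` and `make_person` are hypothetical stand-ins; real structs come from the tool's output. `dup_string` does the same job as strdup() and is written out only to keep the example within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a generated struct with two string fields. */
struct person {
    char *first;
    char *last;
};

/* Same effect as POSIX strdup(): a fresh heap copy of s, or NULL. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Every pointer gets its own allocation, so each can be freed on its own. */
static struct person *make_person(const char *first, const char *last)
{
    struct person *p;

    assert(first != NULL && last != NULL);
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->first = dup_string(first);
    p->last = dup_string(last);
    if (p->first == NULL || p->last == NULL) {
        free(p->first);   /* free(NULL) is a no-op */
        free(p->last);
        free(p);
        return NULL;
    }
    return p;
}
```

Pointing both fields at the same buffer, or at a string literal, would make a recursive free() fail, which is why each field is copied even when the contents are identical.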
The second function doesn't free the top level pointer.
License, object definitions and enum specifications should reside in one file, which will be the protocol specification. This file should ideally contain all the versions. If additional versions are kept in additional files, those files must always be given to the tool in the correct order.
Other specifications regarding code generation such as %extra-fields, source and header top/bottom code etc. should reside in a separate file for each program.
There should be some sort of version negotiation at the start of the protocol. Be careful to include messages related to this in the default version.
In order to develop genser further, you need to be aware that the source file also contains a shell script which is used to re-encode the library code into string literals within the rest of the code. These string literals are used to copy the library code to the output. If you make modifications to the library code, quit the editor and run:

    $ sh genser.c

This will create a temporary tool in /tmp and update genser.c. The old version will be stored in genser.c.bak.
Within the library code, some prefixes are used to refer to functions or types. These are then replaced by the program during output.
Each integer is encoded little endian.
A string is encoded as a 4-byte length, followed by string data, including the null terminator. The length also counts the null byte. If length is zero, then the string is NULL.
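
The string rule can be sketched as follows. `put_u32_le` and `encode_string` are hypothetical helpers written for this illustration, not the library's own code, and the caller is assumed to supply a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value, least significant byte first. */
static void put_u32_le(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value & 0xff);
    out[1] = (uint8_t)((value >> 8) & 0xff);
    out[2] = (uint8_t)((value >> 16) & 0xff);
    out[3] = (uint8_t)((value >> 24) & 0xff);
}

/*
 * Hypothetical helper: write one string field and return the bytes written.
 * The caller must provide at least 4 + strlen(s) + 1 bytes of space.
 */
static size_t encode_string(uint8_t *out, const char *s)
{
    size_t n;

    assert(out != NULL);
    if (s == NULL) {
        put_u32_le(out, 0);       /* zero length marks a NULL string */
        return 4;
    }
    n = strlen(s) + 1;            /* length counts the null terminator */
    put_u32_le(out, (uint32_t)n);
    memcpy(out + 4, s, n);
    return 4 + n;
}
```

Note that an empty string is distinct from NULL under this rule: `""` is encoded with length 1 followed by a single null byte, while NULL is encoded as length 0 with no data.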
A variable length array is encoded as a 4-byte number of elements, followed by each element.
Data is encoded in a depth-first fashion. If you have arrays, strings or variable sized arrays, data for these are encoded directly after the previous field's data. End of the encoded buffer corresponds to the end of the top level object.
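
To show how fields follow one another in the data portion, here is a sketch that encodes a made-up record holding an integer, a string and a variable sized array of 16-bit integers. The struct `sample`, its member names (including the separate `count` member) and the helpers `put_le` and `encode_sample` are assumptions for this example only; the generated structs and encoder may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record used only for this illustration. */
struct sample {
    int32_t id;
    const char *name;   /* may be NULL */
    uint32_t count;     /* number of elements in values */
    int16_t *values;
};

/* Hypothetical helper: write the low nbytes of value, little endian. */
static uint8_t *put_le(uint8_t *out, uint32_t value, size_t nbytes)
{
    size_t i;
    for (i = 0; i < nbytes; i++)
        *out++ = (uint8_t)(value >> (8 * i));
    return out;
}

/* Writes each field right after the previous one; returns bytes written. */
static size_t encode_sample(uint8_t *out, const struct sample *s)
{
    uint8_t *p = out;
    uint32_t i;

    assert(out != NULL && s != NULL);
    p = put_le(p, (uint32_t)s->id, 4);            /* integer field */
    if (s->name == NULL) {
        p = put_le(p, 0, 4);                      /* NULL string */
    } else {
        size_t n = strlen(s->name) + 1;
        p = put_le(p, (uint32_t)n, 4);            /* string length */
        memcpy(p, s->name, n);                    /* string data */
        p += n;
    }
    p = put_le(p, s->count, 4);                   /* element count */
    for (i = 0; i < s->count; i++)
        p = put_le(p, (uint16_t)s->values[i], 2); /* each element */
    return (size_t)(p - out);
}
```

With `id = 7`, `name = "ab"` and `values = {1, -2}`, this produces 19 bytes: 4 for the id, 4 + 3 for the string, 4 for the element count and 2 for each element.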
Although the generated library has no memory leaks, the same can't be said for the tool itself. It never frees any memory. Normally this is not an issue since this is a build tool and the input size is not likely to be huge. However, properly deallocating stuff sometimes exposes bugs that are not normally encountered by chance.
For example, if you have two things that are (supposed to be) copies of each other, deallocating one should not create a problem for deallocating the other. If you never deallocate, you won't find this bug easily. On the other hand, valgrind will happily tell you that a pointer is freed multiple times if you do the deallocations.
Error handling has been tested in neither the tool nor the library. It does work for simple cases, but I expect some bugs to emerge later.
In szFree and szDestroy, the table argument is redundant. These functions could easily use the all-versions table to locate the requested object definition. Also, these functions should really return an integer, indicating success(0)/failure(non-0). Failure happens when the library is unable to find the object definition.
Some checks are intentionally omitted. For instance, if you have an object which claims to have N elements in a variable sized array, the library doesn't check whether the corresponding pointer is NULL or not. It simply goes ahead with the encoding. If it ends up trying to read from a NULL address, the program will naturally die with a segfault. This is intentional because this inconsistency is a serious bug and allowing Encode to treat it as a regular error would make it harder to find the bug.
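
If you would rather catch this inconsistency yourself before calling the encoder, a debug-time check along these lines is one option. This is a sketch only: the struct `readings`, its member names and the helper `readings_consistent` are hypothetical, and your generated structs will look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: an element count paired with a heap-allocated array. */
struct readings {
    uint32_t count;
    int16_t *values;
};

/* Returns 1 when the count and the pointer agree, 0 otherwise. */
static int readings_consistent(const struct readings *r)
{
    assert(r != NULL);
    /* An empty array may have any pointer; a non-empty one needs storage. */
    return r->count == 0 || r->values != NULL;
}
```

Wrapping the call as `assert(readings_consistent(&r));` keeps the fail-fast behaviour: the program still stops at the point of the bug instead of reporting it as a regular encoding error.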