Other LP tools assume that what the programmer writes is the source for both the document and the program, and that these need to be extracted separately (hence "weaving" and "tangling"). This forces the source to be strictly structured (with sections, list items etc.) and clutters it with markup commands which are useful only for generating a document.
However, these documents aren't what a programmer uses most frequently during daily work. The programmer works much more on the source code itself. Therefore, the source code itself must be easy to read and shouldn't disturb a reader's focus with markup commands.
Another point I have noticed is that typesetting doesn't improve a well-written text much. I have read many books in plain ASCII format (thanks to Project Gutenberg) and haven't been disturbed by the fact that they aren't typeset with lots of different fonts or that they aren't paginated.
Based on these points, I concluded that the best literate programming tool would let the programmer write a document in plain ASCII and then the tool would figure out the code from the document. The source text will normally need no further processing: it's the final product in the documentation branch. t2c is an implementation of this idea.
$ make
Put the resulting binaries somewhere in your $PATH. I used the following platforms so far:
The -f switch causes t2c to force all writes. Normally, t2c doesn't touch an output file if the write operation wouldn't change the file. This enables use of t2c with the 'make' program to re-build only the changed files after each modification.
When given the -f option, all output files will be touched even if the write operation wouldn't change anything. This can be used to update the modification times of output files.
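The write-only-if-changed behaviour can be sketched in C as follows. This is a minimal illustration, not code from the t2c sources: the names file_matches and write_if_changed are hypothetical, and I assume the whole output is already available in a memory buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `path` already holds exactly `len`
 * bytes equal to `data`, 0 otherwise (including when it can't be read). */
static int file_matches(const char *path, const char *data, size_t len)
{
    FILE *fp = fopen(path, "rb");
    size_t i;
    int c, same = 1;

    if (fp == NULL)
        return 0;
    for (i = 0; same && i < len; i++) {
        c = fgetc(fp);
        if (c == EOF || (unsigned char)data[i] != (unsigned char)c)
            same = 0;
    }
    /* The file must also end where the new contents end. */
    if (same && fgetc(fp) != EOF)
        same = 0;
    fclose(fp);
    return same;
}

/* Writes `data` to `path` unless the file is already identical and
 * `force` is 0. Returns 1 if written, 0 if skipped, -1 on error. */
int write_if_changed(const char *path, const char *data, size_t len, int force)
{
    FILE *fp;
    int ok;

    assert(path != NULL && data != NULL);
    if (!force && file_matches(path, data, len))
        return 0;               /* leave the modification time alone */
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    ok = fwrite(data, 1, len, fp) == len;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

With force set to 0, a second call with the same contents returns 0 and leaves the file's timestamp untouched, which is what lets 'make' skip targets that depend on it.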
All other arguments are taken to be names of input files. t2c processes the input files in the given order.
t2c cannot process the standard input stream. All input must be
stored as files and then given to t2c as command line arguments.
An input file consists of blocks. There are two kinds of blocks.
You can think of a section as an in-memory file. When you redirect
some text into a section, the text is put into the in-memory file.
After this, when you declare a position for the section, the contents
of the in-memory file are inserted at that position.
Before going into more detail, let's look at a simple input file:
When t2c processes the contents of "file.out", it sees that the section
'B' is needed. It finds that section from its internal list of sections
and then inserts the body of it in the current position. At this point,
"file.out" buffer contains the following:
Let's now get into the details. As previously stated, a file consists
of blocks. Other commands are embedded inside the contents of blocks.
A block starts with a redirection command and ends at the next
redirection command or at the end of input file. The text between the
command and the end of the block is the 'body' of the block.
It's an error to start a file without a redirection command. So, the
first line in an input file must be such a command. See Things to Do.
A command consists of a single character command code and a command
argument. The command code must be at the first column of a line,
without preceding whitespace. The argument is the rest of the line
until the newline character. There is no way of continuing a command
to the next line.
The body of a block starts after the newline terminating the associated
command. Therefore, you will need an empty line after the command if
you wish to insert a newline before the body.
The argument of a command is usually normalized before being passed on. The
normalization process converts all whitespace and control byte sequences
to a single space character (ASCII 32), then removes the leading and
trailing spaces.
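A minimal C sketch of this normalization, assuming the argument is held in a writable NUL-terminated buffer. The function name normalize_argument is hypothetical and the real implementation in t2c may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the t2c sources: collapses every
 * run of whitespace or control bytes in `arg` into one space and strips
 * leading and trailing spaces. Works in place, returns the new length. */
size_t normalize_argument(char *arg)
{
    char *src, *dst = arg;
    int pending_space = 0;

    assert(arg != NULL);
    for (src = arg; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isspace(c) || iscntrl(c)) {
            /* Remember the gap; emit it only if more text follows. */
            pending_space = 1;
        } else {
            if (pending_space && dst != arg)
                *dst++ = ' ';
            pending_space = 0;
            *dst++ = (char)c;
        }
    }
    *dst = '\0';
    return (size_t)(dst - arg);
}
```

For example, an argument written as "  Public <TAB>  Functions " comes out as "Public Functions", so differently spaced spellings of a name refer to the same section.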
The following are the commands recognized by the program:
The argument to an append command cannot start with an asterisk (*)
or a bang (!) sign. These are reserved for template usage. A sole
dot (.) character as the section name signifies a document section,
which is treated specially. This section name (a sole dot) should
be exclusively used for document blocks.
This command has two variations: numbered append and PREV append.
Note that you can't use trailing numbers in Declare commands (:).
For instance:
Input Files
t2c was designed to work with ASCII files. UTF-8 should also be
fine. Just make sure you don't use any non-ASCII whitespace characters
in t2c commands.
Text in redirection blocks is put into the given section. Text in
file blocks is put into the given file. Both redirection blocks
and file blocks may contain section declarations. What this means
will be evident later.
+ A
Text to be put in section A
+ B
Section B header
: A
Section B footer
> file.out
File header
: B
File footer
Here, we have 3 blocks. Two sections called 'A', 'B', and a file block
for "file.out". After reaching the end of the input, t2c starts to
emit output files. In this case we have only one. If there were many,
they would be processed in the order they were given.
File header
Section B header
: A
Section B footer
File footer
Now, it sees that the contents of section 'A' are needed. It searches
for the section and inserts the contents of section A.
File header
Section B header
Text to be put in section A
Section B footer
File footer
These are the final contents of the file. Since there are no more files
to output, the program exits with success indication.
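The expansion walked through above can be sketched in C as a small recursive function. This is only an illustration under simplifying assumptions: sections live in a fixed table, output goes to a fixed-size buffer that already holds a NUL-terminated string, an unknown section expands to nothing, and there is no protection against sections that include each other. The names struct section, find_section and expand are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory representation; the real t2c structures differ. */
struct section {
    const char *name;
    const char *body;
};

static const struct section *find_section(const struct section *tab, size_t n,
                                          const char *name, size_t name_len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(tab[i].name) == name_len &&
            strncmp(tab[i].name, name, name_len) == 0)
            return &tab[i];
    return NULL;
}

/* Appends the expansion of `text` to the string already in `out`.
 * Lines of the form ": name" are replaced by the expanded body of that
 * section. Returns 0 on success, -1 if `out` (capacity `cap`) is full. */
int expand(const struct section *tab, size_t n, const char *text,
           char *out, size_t cap)
{
    assert(tab != NULL || n == 0);
    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) + 1 : strlen(text);

        if (len >= 2 && text[0] == ':' && text[1] == ' ') {
            /* The name runs from after ": " up to the newline. */
            size_t name_len = len - 2 - (eol ? 1 : 0);
            const struct section *s =
                find_section(tab, n, text + 2, name_len);

            if (s != NULL && expand(tab, n, s->body, out, cap) != 0)
                return -1;
        } else {
            size_t used = strlen(out);

            if (used + len + 1 > cap)
                return -1;
            memcpy(out + used, text, len);
            out[used + len] = '\0';
        }
        text += len;
    }
    return 0;
}
```

Feeding it the body of "file.out" from the example, with sections 'A' and 'B' in the table, produces the five lines shown as the final contents.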
The append command has some variations as well; we will see them in a
moment.
The Append Command
This command (+) takes a section name as its argument and then puts the
body of the block at the end of the given section. If there are multiple
appends to a given section, they are put into the destination section
in the order they are seen in the input.
Numbered Append
Normally, the append command adds the body to the end of the given
section. However, if the section name ends with an integer, something else
is done. When the argument of an append command ends with an integer, the
integer is removed from the argument but it's remembered. After this,
the body of the append command is inserted into the corresponding
section. When doing so, the integer is used as a position indicator
much like line numbers in some languages. If there is another append
command for the same section but with a smaller integer ending the
command argument, then the body of that command will precede the body
of this command. For example if you have:
+ A 100
Body 1
+ A 200
Body 2
+ A 50
Body 3
The contents of section 'A' will be:
Body 3
Body 1
Body 2
If there is an append command for the same section without an integer
suffix, then that command works normally, appending to the end of the
section, after all numbered bodies.
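The resulting order can be expressed as a sort over the parts of one section. The C sketch below is an illustration only: struct part, compare_parts and order_parts are hypothetical names, and placing parts with equal numbers in input order is my assumption, since the text above doesn't say what happens on ties.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one append command targeting a section. */
struct part {
    int has_number;     /* 1 if the argument ended with an integer */
    long number;        /* the position indicator, if any */
    size_t seq;         /* order of appearance in the input */
    const char *body;
};

static int compare_parts(const void *pa, const void *pb)
{
    const struct part *a = pa, *b = pb;

    /* Unnumbered parts go after all numbered ones. */
    if (a->has_number != b->has_number)
        return b->has_number - a->has_number;
    if (a->has_number && a->number != b->number)
        return a->number < b->number ? -1 : 1;
    /* Equal keys: fall back to input order (an assumption). */
    return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Sorts the parts of one section into the order they will be emitted. */
void order_parts(struct part *parts, size_t n)
{
    assert(parts != NULL || n == 0);
    if (n > 1)
        qsort(parts, n, sizeof *parts, compare_parts);
}
```

Including the input sequence number in the comparison makes the result deterministic even though qsort itself is not guaranteed to be stable.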
: Foo 300
is an invalid usage. See the description of the declare command for more details.
Anyway, this feature is particularly useful for writing type definitions in the document order but having them output in compiler order. Ditto for variables, functions without prototypes etc.
A file command can be given multiple times for the same file. If this is the case, the later command will append to the file rather than re-opening it.
This command can take more than one argument. The first one is the file name. The others are options about the file. Two options are recognized:
The declare command is executed after all append commands have been
executed. Therefore it isn't necessary to do all definitions related
to a section before you can insert it somewhere.
The declare command can't be used with a section name ending with
an integer. For example, if you have:
The body of a filter
command is fed to the standard input stream of the executed program
and is replaced by the contents of its standard output stream. If the
external program fails, t2c will fail as well. The standard error
stream of the executed program is redirected elsewhere for
successful invocations. Therefore, you won't see warning messages
if an external program executes successfully with the given input but
gives diagnostic messages on the standard error stream.
The filter command can also be used in a nested fashion. If this is
the case, the commands are processed from inside out. First, the
innermost command is executed, then the one enclosing that etc.
For instance, this works properly:
If you need to pass such delicate arguments to an external command,
it's best to put the invocation in a shell script and do the quoting
there. The quoting mechanism of t2c is not very robust and is designed
to be used only in the simplest of cases.
As a final note, the filter command uses execvp to execute the external
program. Therefore, shell features such as environment variable expansion,
home directory expansion, globbing etc. are not available. If you need
these, put them in a shell script and execute that from t2c.
Within template code, you can have variables which will be replaced by
some values later. The system also includes a couple of macros with
arguments.
The template feature also lets you import a library in private or
public scope.
There are some builtins to help you turn some sections of your code on
and off based on variable settings.
Such a variable is automatically assigned a value by the system.
This value is suitable for use as a C identifier and has the
general form _YQRZnumber. So, if you see such nonsense in your
output, you probably capitalized some variable name, like $Gain
or something.
When you do that, the public code sections are output at the same
places as their private counterparts. For instance, public prototypes
will be placed just before the private prototypes etc.
In addition to this, the variable static will be set to the
value static. In public scope, this variable is set to an
empty string. This means that, you should write your public functions
and public prototypes with the prefix $static, as follows:
For the instantiated code to appear in your C files, you need to
provide locations for different parts of the code. There are two
ways of doing this.
First, you can use the IFACE and CODE variables to tell the program
where the corresponding elements should go. Public definitions, types
and prototypes go into the section given by the IFACE variable.
The rest goes to the place defined by the CODE variable. For instance:
The second method is to micro-manage where each single part goes.
To do this, you need to specify section names for each used
template section. The variable names are the same as the corresponding
codesection identifiers explained before. Here is an example:
Another warning is given when any empty sections are found.
If you have
You can declare a section in multiple places to repeat the content.
This could be useful for making two slightly different programs
sharing some of their private code.
If you want to include code samples within documentation sections,
it's a good idea to write them all in uppercase in order to tell them
apart from normal code sections.
When you write a program using literate programming techniques, it's
good practice to concentrate on the document and make it as clear as
possible. In my case, this causes me to interrupt function definitions in
the middle and explain something. When that happens, I lose focus and I
sometimes omit some critical piece of code from the interrupted function.
Therefore, I now try to never interrupt a function with comments. Instead,
I try to keep the functions small, less than one screenful if possible.
Here is a list of things that could be nice to have.
An input file must start with a block command. It's an error to
put anything before a block command at the top of the file since
the program then doesn't know where to put that text. I should
probably allow this for empty lines.
It would also be nice to be able to print t2c documents. Printing them
as plain text files is possible, but those would not be so easy to read
on paper. It would be nice to have a table of contents, section titles,
indices, page numbers etc. Also, I could format code and comments
differently to make them more pleasant to read.
Speaking of formatting, it could be nice to have an HTML formatter
for t2c input files. This could hyperlink section names to append
commands, format the code parts differently etc.
I should mention the prototype generator and other tools in this document.
The code is a mess since its original form was just a hack to see whether
something like this could be useful. Then it grew bigger and I have many
things that could go in properly named functions. I should clean it up
some time.
This document also needs touch-up. A proper man page should be written,
showing the synopsis for each command along with short descriptions.
A proper regression test suite should be written. I did only very basic
testing and can't be confident in the program.
After parsing, for each append-part, we find the block which has the
same name as the part's target. If not found, it's created. Then,
the contents of the part are dumped into this block.
Finally, we process the output files. When a declare command is seen
within the contents of an output file, the corresponding block is found.
The contents of the block are output at that position.
Lines of text are represented by the line_t object. This object
contains a field called kind. This field determines the function
of the line. If it's one of + : < > then it's taken to be
a command line. Otherwise, it can be T, Y or Z.
The program reads a set of function definitions from the standard input
and prints their prototypes to standard output. Since this program doesn't
parse the C code at all, it looks for a specific pattern to do its job.
The program simply replaces text in top-level {} pairs with semicolons.
For example
Another consequence of the program's algorithm is that you can't have struct
declarations inside parameter declarations, but who does that?
Finally, the function definitions must not be generated by CPP macros. Since
t2c.proto doesn't try to interpret the meaning of its input in any way, there
is no way it can generate a declaration when the function is specified like:
Speaking of directives, since the program doesn't process them at all, it
can be fooled by inactive '{' or '}' tokens. For example, the following
will break:
There are no recognized command line options or arguments. The regular use
is as shown in the previous example:
Below are the links to yet another companion program, which worked
something like the template feature. It's called t2c.ar.
Here are the sources for t2c.ar and
here is a document describing the code
within it.
+ Types 100
stuff
and later
> file.c
A
: Types 100
B
you will end up with only
A
B
in file.c. This is because the append command removes the integer suffix
from its argument and there is no longer a section named 'Types 100'
when the declare command runs. Therefore, nothing gets emitted for the
declare command with argument "Types 100".
The Filter Command
The filter command (<) filters its body using the given program
and inserts the output of the executed program in the output stream.
This command is not a block command. i.e. it doesn't end a previously
started block command. However, it does have a body which specifies
the text to be input to an external program. End of the input is
marked by an empty filter command, a '<' character on a line by
itself. For example:
+ A
< perl
print(" hello world!\n");
<
will put the line
hello world!
in section A. The filter command gets executed after both the append
and declare commands. Therefore, it's not possible to execute append
or declare commands output by an external program. For instance:
< perl
print(": Types\n");
<
Would insert the line
: Types
into the output stream literally and no processing for this line would
be done since the program is way past that stage.
< perl
< perl
print("print(\"hello\");");
<
<
It's also possible to pass command line arguments to external programs.
These arguments are separated by whitespace characters. If you wish
to include space characters in the arguments, you must enclose the
corresponding arguments in single quotes. This also means that if you
have single quotes surrounding the arguments, they will be removed.
There are no other escape encoding mechanisms. For example, you can't
have an argument which contains a single quote character.
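The quoting rules above are simple enough to sketch in a few lines of C. This is an illustration, not the t2c code: split_arguments and T2C_WS are hypothetical names, the line is split in place, and treating an unterminated quote as an error is my assumption.

```c
#include <assert.h>
#include <string.h>

#define T2C_WS " \t\r\n\v\f"    /* hypothetical name for the separator set */

/* Splits `line` in place into at most `max` arguments. Whitespace
 * separates arguments unless it appears between single quotes; the
 * quotes themselves are removed. Returns the argument count, or -1 on
 * an unterminated quote or when there are more than `max` arguments. */
int split_arguments(char *line, char **argv, int max)
{
    int argc = 0;
    char *src, *dst;

    assert(line != NULL && argv != NULL);
    src = line;
    for (;;) {
        int in_quote = 0;

        src += strspn(src, T2C_WS);         /* skip separators */
        if (*src == '\0')
            return argc;
        if (argc == max)
            return -1;
        argv[argc++] = dst = src;
        while (*src != '\0' && (in_quote || strchr(T2C_WS, *src) == NULL)) {
            if (*src == '\'')
                in_quote = !in_quote;       /* quotes are removed */
            else
                *dst++ = *src;
            src++;
        }
        if (in_quote)
            return -1;                      /* unterminated quote */
        if (*src != '\0')
            src++;                          /* step past the separator */
        *dst = '\0';
    }
}
```

Since a quote character always toggles quoting and is never copied, there is indeed no way to get a literal single quote into an argument with this scheme.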
Templates
This is a new feature in version 1.5. You can write template code which
can be later customized to specific situations. This is done by a little
bit of macro processing.
Defining Templates
In order to define a template, you write append commands in a specific
form. Here is the general syntax for it:
+* templatename.codesection
.. code ..
Here, templatename is the identifier for the template. It should
be a C identifier, but that restriction is not enforced yet. The
codesection identifier should be one of the following:
Within the template code, you can use variables to generate new
symbol names based on the settings of the library user. For instance:
void $pfx_print();
can generate something like
void foo_print();
if the user has set the variable pfx to value foo. Each variable
consists of a dollar sign, followed by one letter and
optionally more letters and digits. The underscore sign cannot be part
of a variable name.
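The variable syntax can be illustrated with a small substitution routine in C. This sketch covers only plain variables: the builtin macros and generated variables are left out, and copying an unknown variable through unchanged is my assumption rather than documented behaviour. The names struct variable and substitute are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical name/value pair as set by the user of a template. */
struct variable {
    const char *name;
    const char *value;
};

/* Copies `text` to `out`, replacing each $name with its value. A name
 * is one letter followed by letters and digits, so an underscore ends
 * it. Returns 0 on success, -1 if `out` (capacity `cap`) is too small. */
int substitute(const char *text, const struct variable *vars, size_t nvars,
               char *out, size_t cap)
{
    size_t used = 0;

    assert(text != NULL && out != NULL && cap > 0);
    assert(vars != NULL || nvars == 0);
    while (*text != '\0') {
        const char *piece = text;   /* what to copy */
        size_t len = 1;             /* how much of it */
        size_t consumed = 1;        /* how far to advance in text */

        if (text[0] == '$' && isalpha((unsigned char)text[1])) {
            size_t i;

            while (isalnum((unsigned char)text[consumed]))
                consumed++;
            len = consumed;         /* unknown variables are copied as-is */
            for (i = 0; i < nvars; i++) {
                if (strlen(vars[i].name) == consumed - 1 &&
                    strncmp(vars[i].name, text + 1, consumed - 1) == 0) {
                    piece = vars[i].value;
                    len = strlen(piece);
                    break;
                }
            }
        }
        if (used + len + 1 > cap)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
        text += consumed;
    }
    out[used] = '\0';
    return 0;
}
```

Because the underscore terminates the name, "$pfx_print" is read as the variable pfx followed by the literal text "_print", which is what makes the prefix idiom work.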
The expressions here can be single variables or a combination of
builtin macros. For instance, if you want to include some code
only if both A and B are defined, you can do this:
$def($def($B)($A)) (conditional code)
If you want to make an OR operator, you can simply write your
expressions side by side:
$def( $A $B ) (conditional code)
There are also generated variables. If the variable name
looks like $Gxy where x is a lower case letter and y is an
optional sequence of letters and digits, then the variable is
a generated variable.
Using Templates
In order to instantiate a template, you need to make an append
command with the following form:
+! templatename
var1 value1
var2 value2
..
The variables are given without the dollar sign. Values are optional.
If some variable is used only for deciding whether some code is to
be output etc, then you can omit the value part. The system will
automatically assign the value 1 to the variable. Leading
and trailing spaces are removed from values. No more processing
is applied to values. Each variable assignment spans a single line;
there is no way to make a line continuation.
Scope and Code Placement
Templates are instantiated in public scope by default. You may
override this by setting the scope variable to private.
+* map.public_functions
$static $pfx_new()
{ .. do something .. }
This makes it possible for the public function to have the static
storage specifier in case the user imports the template in private
scope.
+! map
IFACE Map Interface
CODE Map Implementation
could be used to relocate the corresponding code into
sections marked as ': Map Interface' and ': Map Implementation'
respectively.
+! map
public_definitions Map Header
public_types Map Types
private_variables Map Variables
..
You can actually skip some, there is an amount of leniency here.
However, you shouldn't use this unless you absolutely have to.
Errors and Warnings
t2c gives a warning when a section is not output into any file.
This prevents code from being lost inadvertently. Document sections
are marked with a lone dot and they don't generate this warning
if not output to a file.
: Pretty Functions
and there is no relocation to 'Pretty Functions', then t2c will give
another warning.
Some Tips
It's OK to not declare some sections. For example, document sections
need not be output to any file since the source code is already the
document. Therefore, I use very simple section names for documentation,
such as a lone dot, or a minus sign.
An Example
t2c requires commands to start at the first column so
be careful if you copy-paste the example.
+.
This is an example file playing a game of 'guess the number'. In order
to have some amount of code to discuss, let's make it a non-standard
game which cheats and changes its secret number every turn to make it
hard to win. Of course, this needs to be done in a way consistent with
previous guesses and responses to make sense.
Let's start with the output file:
> guess.c
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
: Variables
: Functions
+.
This is pretty much how I use t2c to write files. I put nothing in the
body of the file command other than trivial stuff such as headers etc.
I have two sections here, since it's a simple program. More complex
programs typically have type sections, prototype sections etc.
Here is the main function.
+ Functions
int main()
{
int guess;
int turns;
+.
Since we're trying to keep the user from finding out our secret number,
we're going to change it constantly. In fact, we don't even need a
secret number in the first place, just the boundaries established by
previous guesses.
+ PREV
int upper=100, lower=1;
+.
Did you notice that I'm using a single dot as the section name for
comment blocks? This is so, because I won't be writing the comments to
any file at all.
Let's move on with the problem at hand.
+ PREV
: Initialize
for(turns=0;;turns++) {
guess= get_guess(turns);
if (guess==0)
{
printf("My number was %d. You made %d guess%s. Good day.\n",
(lower+upper)/2, turns, turns>1? "es":"");
return 1;
}
if (found(guess, &lower, &upper))
{
printf("Yes! My number was %d. You "
"found it in %d guess%s.\n",
guess, turns, turns>1? "es" : "");
return 0;
}
}
return 1;
}
+.
As you can see, the PREV append command is not so useful if you use it
to split up a function. It's best to write the whole function in one go,
and then surround it with comments on both ends. For that to work well,
the function needs to be small. I did it here just to demonstrate PREV.
So let's implement the get_guess function.
+ Functions 100
int get_guess(int turns)
{
int guess;
if (turns)
printf("You made %d guess%s so far.\n",turns,turns>1?"es":"");
printf("What is your %sguess?\n", turns? "next ": "");
scanf("%d",&guess);
return guess;
}
+.
I added an integer suffix to the section name so that this function gets
emitted before main(). I do this because I don't want to write function
prototypes. This kind of tracking dependencies becomes tiresome after
some point. There is another program distributed along with t2c, called
t2c.proto. This program helps generate prototypes automatically.
Now, let's proceed with the 'found' function. This function simply
adjusts the upper and lower bounds and tells the user that he was
unsuccessful with his guess.
+ Functions 100
int found(int guess, int *lower, int *upper)
{
int secret_is_lower;
int valid_guess= 0;
if (guess<*lower) secret_is_lower= 0;
else if (guess>*upper) secret_is_lower= 1;
else {
valid_guess= 1;
if (*lower==*upper) return 1;
else if (guess-*lower>*upper-guess) secret_is_lower= 1;
else secret_is_lower= 0;
}
if (secret_is_lower) {
print_message(guess, secret_lower_message,
sizeof(secret_lower_message)/sizeof(char*));
if (valid_guess) { *upper= guess-1; }
} else {
print_message(guess, secret_higher_message,
sizeof(secret_higher_message)/sizeof(char*));
if (valid_guess) { *lower= guess+1; }
}
return 0;
}
+.
We could have simply printed a string for the message, but I didn't
want the program to be boring :)
+ Functions 50
void print_message(int guess, char **message_set, int nmsg)
{
int mno= rand() % nmsg;
printf(message_set[mno], guess);
}
+.
We're writing the program top-down. Therefore our line numbers are
getting smaller and smaller as we go into more detail.
Now, we're using 'rand' for something completely different. Let's
put an srand call into main:
+ Initialize
srand(time(NULL));
printf("I have the number ready, let the game begin.\n");
printf("Enter 0 any time to quit the game.\n\n");
+.
Now all that needs to be written is the message set.
+ Variables
static char *secret_higher_message[]=
{
"My secret number is higher than %d.\n",
"You guessed %d, but it is too small. Try again.\n",
"Guess a number greater than %d.\n",
"I wouldn't make my secret number as small as %d, would I?\n"
};
static char *secret_lower_message[]=
{
"%d is too big.\n",
"You should try a number smaller than %d.\n",
"I don't think my secret number is that big.\n",
"This time, I chose a number smaller than %d.\n"
};
+.
So that's about it for this example. If you want to see a bigger
and much more complicated example, look for self_print.u in the
source distribution.
Things to Do
There are no bugs I know of. There are some things that would be
nice to have, but I'm quite happy with the state the program is in.
Things That Won't Be Done
Change Log
1.0 Initial Build
1.1 May 2013
- Cancelled the .w stuff, writing directly in C.
- Separated the proc library from the main program,
importing from alib.
- Same for strbuf
1.2 May 2013
- Added line number information to output files.
1.5 20170512
- Templates
Hacking Guide
There are 3 major object types in the program: blocks, parts and files.
When you use the append command, the related text gets stored in
a part object. The same thing happens when you use the
file command. An input file is thus separated into parts.
Companion Programs
The following programs are distributed along with the t2c sources.
Eventually, I might incorporate them into the main executable since
they are quite simplistic.
Prototype Generator
The t2c.proto program generates function prototypes. However, it's not for
use in the general case; it can be used only as a filter inside a t2c
source file.
int foo(int a) { body }
becomes
int foo(int a) ;
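The replacement rule itself fits in a short C function. The sketch below is an illustration and not the t2c.proto source: bodies_to_semicolons is a hypothetical name, and unlike the real program it doesn't strip comments or pre-processor directives first.

```c
#include <assert.h>
#include <stddef.h>

/* Copies `src` to `dst`, replacing every top-level {...} group with a
 * single ';'. No C parsing is done, so braces inside strings or
 * comments are counted too. `dst` needs room for strlen(src) + 1
 * bytes. Returns the length of the result. */
size_t bodies_to_semicolons(const char *src, char *dst)
{
    int depth = 0;
    char *out = dst;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '{') {
            depth++;                /* entering a body: stop copying */
        } else if (*src == '}' && depth > 0) {
            if (--depth == 0)
                *out++ = ';';       /* the outermost brace just closed */
        } else if (depth == 0) {
            *out++ = *src;          /* text outside any braces */
        }
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

Everything outside the braces is copied untouched, which is why the space before the semicolon survives in the output above.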
This has some consequences. First, the input must consist of only function
definitions. For example, things like:
typedef struct foo { body } foo_t;
will be translated to:
typedef struct foo ; foo_t;
which will fail spectacularly. In order to give only function definitions
to this program, you can use the relocation feature of t2c. For instance,
I use the following idiom.
> file.h
< t2c.proto
: Public Functions
<
> file.c
#include "file.h"
:Public Functions
and later ..
+ Public Functions
void foo()
{
}
This way, I get prototypes for all functions and no warnings :).
FUNC_BEGIN(foo)
FUNC_END
or similar. However, the following does work:
FUNC_DEF(foo) {
}
Note that t2c.proto removes all comments and pre-processor directives from
the input before generating prototypes. Therefore, things like the
following will break:
#ifdef _WIN32
int print(DisplayDevice *device,char *str)
#else
int print(XDisplay *dpy, char *str)
#endif
When you have such things, it's just easier to not relocate the corresponding
function to the t2c.proto input.
#if 0
if (a==3) {
#else
if (a==5) {
#endif
stuff
}
Since the program sees two '{' tokens, it expects two '}' tokens to close
them. It doesn't find those, so the rest of the file is gobbled up into
the first '{' block, until something equally weird happens.
+ Private Prototypes
< t2c.proto
: Private Functions
<
Quoter
The program t2c.q quotes its standard input and emits it as a multi-part
C string literal. By multi-part, I mean one string literal token per
input line. For instance,
Roses are red,
Violets are blue,
I made this program,
Just for you.
will be translated to
" Roses are red,\n"
" Violets are blue,\n"
" I made this program,\n"
" Just for you.\n"
The program encodes special characters using octal escape sequences.
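A rough C sketch of such a quoter follows. It is not the t2c.q source: quote_lines is a hypothetical name, and the exact set of characters that get an octal escape (here the double quote, the backslash and anything outside printable ASCII) is my assumption. The newline is written as \n to match the example above.

```c
#include <assert.h>
#include <stdio.h>

/* Reads `in` to the end and writes one C string literal per input line
 * to `out`. Characters that can't appear verbatim inside a literal are
 * written as three-digit octal escapes. */
void quote_lines(FILE *in, FILE *out)
{
    int c, open = 0;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (!open) {
            fputc('"', out);            /* start a new literal */
            open = 1;
        }
        if (c == '\n') {
            fputs("\\n\"\n", out);      /* close the literal for this line */
            open = 0;
        } else if (c == '"' || c == '\\' || c < 32 || c > 126) {
            fprintf(out, "\\%03o", (unsigned)c);
        } else {
            fputc(c, out);
        }
    }
    if (open)
        fputs("\"\n", out);             /* last line had no newline */
}
```

Always emitting three octal digits keeps an escape from swallowing a digit that happens to follow it in the input.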
There are no options or command line arguments; you can simply invoke
it as:
< t2c.q
Roses are red,
Violets are blue,
I made this program,
Just for you.
: Poem Footer
<
and such.
Downloads
Main program:
I keep the older versions in case I have messed up in the new version.
I really should clean up this place.