HTMLGen


I write all my own HTML pages; I don't use a special tool or a website publisher. When I was starting in early 2003, I looked around at some Beeb-interest websites for ideas and picked up the basics of embedding images in pages, creating and laying out tables, and adding hyperlinks. To this day, that's pretty much all that I use in my own HTML. Nothing fancy at all.

Over the years, I made up a document of useful bits of HTML which I called my "HTML Library", which also had certain hard-to-generate character codes in it, like fraction symbols and the micro prefix.

The first 16 years of BeebMaster webpages were written in a simple text editor - GEdit since November 2008. I also had one or two template files, but generally, to make a new picture set, I would pick one existing set in the right section, save it, and then find and replace the set name with the new name. Then I would take the first picture page in the set, replace and save that, and go through the whole picture set updating the link for each HTML page and adding a new caption.

It was very time consuming and pretty boring, and had no spell check, no grammar check, and no syntax check.

One benefit to it, at least as I saw it, was that I was writing the captions off the cuff, so they are very natural and unrehearsed, but as this process was often several years after I made the sets, it didn't always make for the most illuminating explanations - if you've ever followed a picture set and felt I didn't know what I was doing, you were probably right! I'm sure I did at the time, but years later perhaps not!

During 2013, I started to make an accompanying caption text file at the time I made the picture sets. Initially this was a brief overview, but over time it became a caption for each picture. This served as a very helpful aide-memoire when writing up the picture sets.

What I wanted for a long time was a way to auto-generate the HTML pages from captions I'd already written. I did try a couple of things on occasion. I had a spreadsheet which would make up the main index page table, but it didn't get much use.

In July 2019, I set to work on an HTML Generator written in BBC BASIC, called "HTMLGen". I wanted to run it on a BBC Micro, but it runs on my RISC PC, which is connected via Sunfish to my shared network storage drive, so that I don't have to worry about truncating filenames or anything like that. It takes a simple plain text file of various variables, index page caption and individual picture captions, which I devised through several iterations, and outputs the information into the index table HTML page and then all the individual picture HTML pages. It does in a few seconds what it used to take me days to do manually.
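As a rough illustration of the idea (in Python rather than BBC BASIC), the sketch below turns one captions file into an index page plus one page per picture. The real HTMLGen file format is not described here, so the layout used (name=value settings, a blank line, the index caption, a blank line, then one caption per line), the setting names "set" and "ext", and the function names are all assumptions made up for the example.

```python
def parse_captions(text):
    """Split a captions file into settings, index caption and picture captions.

    Hypothetical layout (not the real HTMLGen format): 'name=value' lines,
    a blank line, the index caption, a blank line, one caption per line.
    """
    header, index_caption, pictures = text.strip().split("\n\n", 2)
    settings = dict(line.split("=", 1) for line in header.splitlines())
    return settings, index_caption.strip(), pictures.splitlines()


def generate_pages(text):
    """Return {filename: html} for the index page and every picture page."""
    settings, index_caption, captions = parse_captions(text)
    name, ext = settings["set"], settings["ext"]  # assumed setting names
    pages = {}
    rows = []
    for number, caption in enumerate(captions, start=1):
        page = f"{name}{number:02d}.html"
        image = f"{name}{number:02d}.{ext}"
        # One table row on the index page for each picture page.
        rows.append(
            f'<tr><td><a href="{page}">{number}</a></td><td>{caption}</td></tr>'
        )
        pages[page] = (
            f'<html><body><img src="{image}"><p>{caption}</p></body></html>'
        )
    pages[f"{name}.html"] = (
        f"<html><body><p>{index_caption}</p><table>"
        + "".join(rows)
        + "</table></body></html>"
    )
    return pages
```

Calling generate_pages on a file with two captions would give three pages: the index and two picture pages. Captions are inserted as-is, so any HTML escaping would need to be added separately.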

The first HTMLGen sets were uploaded in August 2019. Even referring to the caption files I'd already written, I was still able to keep the spontaneous feel of the set captions. From August 2019 onwards, every new picture set I make has a captions file written in the HTMLGen format ready for generation at the time the set is uploaded, so when that filters through things might start to make more sense!

In April 2020, I used one of my existing captions text files for the first time in my HTMLGen source file, by applying some regex find and replaces to convert the captions into the right format for my HTMLGen template.
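The kind of regex find and replace involved might look something like the Python sketch below. Both layouts are invented for illustration: it assumes the older notes numbered each caption ("1. " or "2) ") with blank lines in between, and that the target format wants one bare caption per line. The actual patterns used are not given here.

```python
import re


def convert_notes(text):
    """Convert numbered caption notes to one bare caption per line.

    Both the input and output layouts are assumptions for this example.
    """
    # Drop a leading picture number such as "12. " or "12) " from each line.
    text = re.sub(r"^[ \t]*\d+[.)][ \t]+", "", text, flags=re.MULTILINE)
    # Collapse the blank lines between captions.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip() + "\n"
```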

If you look at the HTML source for any new pages, you'll see the HTMLGen credit applied near the top:

<!-- Generated with HTMLGen 4.5 using RISC OS 3.70 on Fri,10 Apr 2020.13:31:25 -->

This also helps me with another thing I've wanted to do, but not been able to figure out how - date-stamp the HTML pages within the source. I thought of using a batch regex find and replace to insert the date, but trying to do that with a date other than now, such as using the file modification time, was very difficult. Being able to search to the right point in the file to insert a hidden date stamp also presented a problem owing to the unstandardised state of many of my existing HTML pages which had developed organically over time. Also it meant that I would have to remember to update the date stamp manually if I ever edited the page.

It was all too much trouble, but the time stamp added by HTMLGen is definitely a step in the direction of knowing when each new page was created. A further bonus is that the pages are now all in a standard format, so if I ever did want to insert something extra at a specific point, it should be possible - at least with pages made by HTMLGen.
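Building a stamp like the HTMLGen credit at generation time is straightforward. The Python sketch below is modelled on the credit line quoted earlier and reproduces its layout; the function name and parameters are hypothetical, and the two-digit day padding is an assumption, since the quoted example only shows a two-digit day.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def credit_line(version, platform, when):
    """Return a hidden HTML comment recording how and when a page was made."""
    # Day and month names are spelled out here so the result does not
    # depend on the locale of the machine running the sketch.
    stamp = (
        f"{DAYS[when.weekday()]},{when.day:02d} {MONTHS[when.month - 1]} "
        f"{when.year}.{when:%H:%M:%S}"
    )
    return f"<!-- Generated with HTMLGen {version} using {platform} on {stamp} -->"
```

Because the stamp is written as the page is generated, it never has to be searched for or inserted afterwards.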

By May 2020 I had generated over 3,000 HTML pages in 80 picture sets, and this natty little tool of mine has enabled me to make the May 2020 update one of the biggest ever.

In November 2020, I made a concerted effort to adapt HTMLGen to run on a BBC Micro so that I could use my Beebs to do the HTML page generation, and also to write the captions text files as well. Please click here to see my pages on HTMLGEN generated on a Beeb by HTMLGEN!

I started out writing the captions files on my PC, but I wanted to do them on a Beeb. Unfortunately, my favourite BBC Micro Word Processor, Inter-Word, let me down as it's not possible for Inter-Word to save plain text files; it applies its own file format and control codes to saved documents. Even spooling a file still applies some formatting for printing, adding a carriage return at the end of each line, which I couldn't turn off. For a long time I had to use Edit for writing the captions. It worked well, but wasn't perfect as it doesn't word-wrap.

In August 2022, I had another go with Inter-Word and remembered my Inter-Word converter of 20 years earlier, which I wrote to get round the file format problem I'd encountered in a different context all those years ago! I updated it to deal with some new situations with my captions files, mainly to do with hash and pound sign characters and paragraph marks, which all needed to stay as-is and not be converted to anything else, and brought it into use. Now I write the captions in Inter-Word and use the IW converter to make a plain text file out of the Inter-Word document which works a treat in HTMLGEN!

One thing HTMLGEN couldn't cope with was picture sets with different image filetypes. I still had to process these manually to some extent, editing the HTML generated to correct the filetype of the picture where it changed part way through a set. In October 2021, I made a solution by introducing a "flip" flag which could be inserted at the beginning of a picture's caption, which would then switch the picture filetype associated with that caption. I expanded this in February 2023 to allow video files to be included in the picture sets which would play when clicking the main picture, by use of another "flip" flag at the beginning of the caption.
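A flag of this sort can be sketched in a few lines of Python. The flag character ("~"), the two filetypes and the function name below are all made up for the example, and it assumes the switch stays in force for the following captions until the flag appears again; the real HTMLGen flag and its exact behaviour may differ.

```python
def assign_filetypes(captions, first="jpg", second="png", flag="~"):
    """Pair each caption with an image filetype, switching at flagged captions.

    The flag character and filetypes are hypothetical, and the switch is
    assumed to persist until the next flagged caption.
    """
    current, other = first, second
    result = []
    for caption in captions:
        if caption.startswith(flag):
            # Swap the active filetype and strip the flag from the caption.
            current, other = other, current
            caption = caption[len(flag):].lstrip()
        result.append((current, caption))
    return result
```

With captions such as "Front", "~Screenshot", "Menu" and "~Back", the first and last would be paired with the first filetype and the middle two with the second.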

Version 5 of HTMLGEN came out shortly afterwards, with modifications to allow "Domesday Navigation", and as at May 2023, HTMLGEN stood at version 5.3.

HTMLGEN is still entirely written in BBC BASIC, though there has long been a bit of 6502 assembler included to get the OS version string on pre-RISC OS machines. The program is getting longer and longer as new features and developments are added, which has made it more difficult to run on a BBC Model B, even using mode 7. The BASIC program is now over 16K and all the remaining user RAM was used to assemble each HTML page in memory before saving it to file. Generally the individual HTML pages are well under 2K, but the index pages can be many times larger, especially in the case of very long picture sets.

In June 2023 I finally took the plunge to do something I had long been advised to do - use file handling calls to write the file contents directly to file instead of assembling the full HTML page in memory and saving the memory block. I had been reluctant to do this as I thought it would be difficult to keep track of different pointers between the reading of the captions file and the writing of the output file. In fact, with OSGBPB, it's simplicity itself, as the call updates the control block with new pointers following the writing of each chunk of data, so the mere user like me doesn't even have to get involved in knowing where to start and finish. Now, instead of my procedures to tack on the next bits at the memory pointer, which I had to keep updated, I just use the same procedures but replace the memory writing with an OSGBPB call, and I don't even have to keep an "output" pointer at all, everything is done for me by the OS!
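The same change of approach can be shown in Python as an analogy only; it is not OSGBPB itself, and the function name is hypothetical. Each piece of the page is written straight to the open file, the file pointer is advanced by the system after every write, and no page-sized buffer or hand-kept output pointer is needed.

```python
def write_page(path, chunks):
    """Write each chunk of HTML straight to the file as it is produced.

    Returns the final file pointer, which the system has kept up to date
    after every write, much as the OSGBPB control block is updated.
    """
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk.encode("ascii"))
        return out.tell()
```

Since only one chunk is held in memory at a time, the size of the finished page no longer matters, which is the point of the change on a machine with very little free RAM.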

Version 6.0 was the first to use file handling to create the files, so I can now run a picture set of any size through HTMLGEN even on a Model B without worrying about running out of memory! Version 6 also includes expanded information in the hidden HTML tag about the machine being used to make the page, like this:

<!-- Generated with HTMLGen 6.4 using Econet Station 201 (BBC Master 128) with Acorn MOS 3.50 on Fri,29 Sep 2023.23:35:12 -->

At the end of September 2023, I brought out HTMLGEN version 6.4 which added the facility to generate table links for a picture set's parent index page, and the "What's New" page, which I had still been doing manually. To automate the process even more, the "What's New" link includes most of the index page caption, so you will see from October 2023 onwards much longer descriptions in "What's New" as these are now taken directly from the wording used on the picture set index page.

The table below shows the progress I have made over the years since I started using HTMLGEN:

Date                            Number of HTML Pages Generated   Number of Sets
May 2020 (approx figures)       3,000                            80
January 2022 (approx figures)   8,000                            280
January 2023                    10,129                           374
May 2023                        11,231                           422

Updated 1st October 2023
