Moving the pfSense® Documentation to GitHub
We've moved the documentation from MediaWiki to GitHub. More information can be found in the following blog post:
There is still some cleanup left to do from the conversion, so if you find any content that broke in the process, feel free to post your findings here.
I have a similar doc site I would like to convert from MediaWiki to GitHub. How did you execute the actual conversion of the MediaWiki pages? What tooling did you use?
I used a php script to grab all the wiki pages from the DB and save them as flat files (note: I forgot to add the title at the top of each page in this version):
<?php
$servername = "localhost";
$username = "user";
$password = "pa$$";
$dbname = "mediaWiki";

function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d]+~u', '-', $text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);
    // trim
    $text = trim($text, '-');
    // remove duplicate -
    $text = preg_replace('~-+~', '-', $text);
    // lowercase
    $text = strtolower($text);
    if (empty($text)) {
        return 'n-a';
    }
    return $text;
}

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

$sql = "SELECT page_title, page_touched, old_text
        FROM revision,page,text
        WHERE revision.rev_id=page.page_latest
          AND text.old_id=revision.rev_text_id
          AND page.page_namespace=0
          AND substring(text.old_text,2,8) NOT IN ('REDIRECT')";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
    // output data of each row
    while($row = $result->fetch_assoc()) {
        $myfile = fopen(slugify($row["page_title"]).".mw", "w") or die("Unable to open file!");
        fwrite($myfile, $row["old_text"]);
        fclose($myfile);
    }
} else {
    echo "0 results";
}
$conn->close();
?>
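As noted above, this version of the script does not write the page title at the top of each file. A minimal sketch of how that could be handled is shown here in Python rather than PHP. The function name `with_title` is hypothetical, and the sketch assumes a level-1 MediaWiki heading is the desired form, so that pandoc later turns it into the page's top-level heading:

```python
def with_title(page_title: str, wikitext: str) -> str:
    """Return the page text with its title prepended as a level-1 MediaWiki heading."""
    # MediaWiki stores page titles with underscores in place of spaces.
    title = page_title.replace("_", " ")
    return f"= {title} =\n\n{wikitext}"


if __name__ == "__main__":
    # Hypothetical page title and body, for illustration only.
    print(with_title("Example_Page_Title", "Some ''wiki'' text."))
```

The same idea would be a one-line change in the PHP loop: write the heading before writing `old_text`.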
I also found some commands on StackOverflow to download all the images as a zip file.
I used pandoc to convert the MediaWiki syntax to RST syntax (you could target Markdown or whatever here and go in a different direction):
mkdir -p ./output   # pandoc will not create the output directory itself
files=($(find . -type f -name '*.mw'))
for item in "${files[@]}"
do
    filename=${item##*/}
    #printf "   %s\n" "$filename"
    pandoc "$item" -f mediawiki -t rst -o "./output/${filename%.*}.rst" || { printf "   %s conversion failed\n" "$filename" ; }
done
Then I massaged all that into the desired Sphinx formatting. It took several custom Python/bash scripts to clean up the pandoc conversion (it isn't perfect).
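To give a flavor of what such a cleanup pass might look like, here is a rough sketch. It is not one of the actual scripts used. It assumes the converted files contain anonymous RST hyperlinks of the form `Text <Wiki_Page_Title>`__ for links between wiki pages, and that those should become Sphinx `:doc:` references pointing at the slugified file names. The names `slugify` (an approximate port of the PHP function above) and `wiki_links_to_doc_roles` are hypothetical:

```python
import re


def slugify(text: str) -> str:
    """Approximate Python port of the PHP slugify() used for the file names."""
    text = re.sub(r"[\W_]+", "-", text)                  # non-letters/digits -> "-"
    text = re.sub(r"[^-\w]+", "", text, flags=re.ASCII)  # drop remaining non-ASCII characters
    text = re.sub(r"-+", "-", text.strip("-"))           # trim, then collapse repeated "-"
    return text.lower() or "n-a"


# Matches anonymous RST hyperlinks such as `Link text <Target>`__
LINK = re.compile(r"`([^`<]+?)\s*<([^`>]+)>`__")


def wiki_links_to_doc_roles(rst: str) -> str:
    """Rewrite links to wiki pages as :doc: roles; leave external links alone."""
    def repl(match):
        text, target = match.group(1), match.group(2)
        if re.match(r"(https?|ftp)://|mailto:", target) or target.startswith("#"):
            return match.group(0)  # external URL or in-page anchor: keep as-is
        return f":doc:`{text} <{slugify(target)}>`"
    return LINK.sub(repl, rst)


if __name__ == "__main__":
    sample = "See `Port Forwards <Port_Forwards>`__ and `the site <https://example.com>`__."
    print(wiki_links_to_doc_roles(sample))
```

A real pass would need more cases (images, categories, links into other namespaces), which is likely why several scripts were needed.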
Then I built the Sphinx docs as HTML and ran the npm package broken-link-checker-local against the output to check for broken links (more Python scripts were involved in fixing them).
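For readers without Node available, the core of a local link check can be approximated with the Python standard library. This is a simplified stand-in, not how broken-link-checker-local works internally: it only verifies that relative `href` targets exist on disk and ignores external URLs and anchors. The function name `broken_local_links` and the `_build/html` path are assumptions:

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def broken_local_links(html_root):
    """Yield (page, href) for each local link whose target file does not exist."""
    for dirpath, _dirs, files in os.walk(html_root):
        for name in files:
            if not name.endswith(".html"):
                continue
            page = os.path.join(dirpath, name)
            parser = HrefCollector()
            with open(page, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            for href in parser.hrefs:
                parsed = urlparse(href)
                if parsed.scheme or parsed.netloc or not parsed.path:
                    continue  # external link or pure "#anchor"
                path = unquote(parsed.path)
                # Site-absolute paths are resolved against the build root.
                base = html_root if path.startswith("/") else dirpath
                target = os.path.normpath(os.path.join(base, path.lstrip("/")))
                if os.path.isdir(target):
                    target = os.path.join(target, "index.html")
                if not os.path.exists(target):
                    yield page, href


if __name__ == "__main__":
    # "_build/html" is assumed to be the Sphinx HTML output directory.
    for page, href in broken_local_links("_build/html"):
        print(f"{page}: broken link -> {href}")
```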
I also used git as a backup so I could git checkout if my scripts blew anything up along the way.
That's about all the advice I can offer... It's a lot of work, but worth it in the end. Good luck!
Great, thanks for sharing!
How did you come to choose Sphinx over the many other possible options (Jekyll, Hugo, and so on)?
I'd say the big reason is that Sphinx (reStructuredText) has a syntax geared towards documentation (info/warning boxes, etc.), so we didn't have to adopt a custom Markdown flavor to achieve the same results. We also have the option to easily build other variants such as PDFs, man pages, texinfo, etc. should we choose. In a perfect world, I would prefer a "standard" Markdown flavor for documentation that mimics the key directives of the RST syntax. The Markdown ecosystem is getting there, but until then Sphinx works for us.
We currently use Jekyll (possibly Gatsby in the future) for and so we are familiar with those SSGs as well.
Plus we've been using Sphinx for years for the pfSense book and other internal documentation, so it made sense to keep the converted wiki consistent.
There used to be documentation for installing pfSense on WatchGuard Firebox appliances; this seems to have disappeared. I had bookmarked the link at which now redirects to your docs page, and I can't seem to find it via your search box. What's happened here?
Some of the pages were considered outdated, redundant, or misplaced and are still being manually organized; that might be one of them. We are still working on cleaning up everything from the move. Until then you may want to use the web archive: