Archives Africa: A brief technical overview

The Archives Africa Online Catalogue is the heart of this project. Many African archives are almost invisible to the research community because their information is held in individual repositories, sometimes with little or no public interface.

A key project requirement for Archives Africa has always been the ability to search and discover archival materials from multiple repositories, regardless of the user’s geographic location or time zone. With this in mind, it was obvious that a web-based ‘app’ was the best approach for us to take, rather than a conventional desktop or server ‘application’.

Web benefits…

An online, web-based app enables authorised Archives Africa editors to add and update their records at any time, from anywhere. Most importantly, it also publishes and ‘surfaces’ their archival assets for better discovery by search engines, other websites and social media.

We had to keep in mind that some users may only have access to slow or unreliable internet connections, so the app needed to be fast and efficient in terms of page delivery and file size. We also needed to keep things as simple as possible, as one-to-one training is impossible to provide to some users.

After assessing several available online apps, we selected Access to Memory (AtoM) due to its speed, simplicity and ease of installation / maintenance.

AtoM is multilingual, multi-repository, open source and freely available. It is based on international standards and well supported by Artefactual (the authors) who are based in Canada.

Our AtoM installation runs under Linux (Ubuntu 16.04) on a virtual cloud server in a UK-based data centre. The server currently has a minimal specification, but can easily be allocated more processing power and storage as future site traffic demands.

Collecting data…

For most Archive Editors, the creation and maintenance of records directly through the AtoM user interface works well. However, for some Archivists in the field with slow or unreliable internet connections, we needed to design a lightweight and reliable way to gather catalogue data, and then import this into AtoM on their behalf.

The technical teams from IMAGIZ and King's Digital Lab (KDL) have collaborated to develop a prototype approach, based on Microsoft Excel, for capturing descriptions (currently at collection level only).

A structured Excel form has been designed in English and French – and can be further translated into additional languages as required in the future. It contains separate worksheets for several required ISAD(G) fields, plus additional areas for Subjects, Places, Personal and Corporate names.

KDL have developed a server-based application (written in ‘Python’) which can receive this Excel form as an email attachment. It extracts the data from the form, validates it and then converts it into several custom XML files, which are saved on the server. If any errors are detected in the form, the application emails a message to the sender asking them to check the data and to resend the form once it has been corrected.
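The validate-then-convert step can be illustrated with a minimal Python sketch. It is not the KDL application: the field names, the `<record>` XML layout and the wording of the reply are all assumptions made for illustration, and the form data is assumed to have already been read out of the Excel attachment into a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real form's field list is not given here.
REQUIRED_FIELDS = ("reference_code", "title", "dates", "level_of_description")


def validate_record(record):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"Missing required field: {field}")
    return errors


def record_to_xml(record):
    """Serialise a record as a small XML document (illustrative layout only).

    Keys are used directly as element names, so they must be valid XML names.
    """
    root = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value).strip()
    return ET.tostring(root, encoding="unicode")


def process_submission(record):
    """Produce XML for a valid record, or the text of a reply asking for corrections."""
    errors = validate_record(record)
    if errors:
        reply = "Please check and resend the form:\n" + "\n".join(errors)
        return {"ok": False, "reply": reply}
    return {"ok": True, "xml": record_to_xml(record)}
```

Returning the list of problems, rather than stopping at the first one, means the sender can correct everything in a single round trip, which matters when connectivity is poor.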

Any saved sets of valid XML files will then be detected and processed by a further custom server-based application (written in ‘Node.js’) currently in development by IMAGIZ. This application will convert the XML into a clean, structured EAD file, which will be saved, ready for import into AtoM.
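As a rough illustration of the XML-to-EAD conversion, the sketch below maps a small intermediate document onto a skeletal EAD structure. The IMAGIZ application itself is written in Node.js; Python is used here purely for illustration. The input layout (a `<record>` element with `reference_code`, `title`, `dates` and `level_of_description` children) is hypothetical, and the output is a simplified subset of EAD 2002 elements that has not been validated against the EAD schema.

```python
import xml.etree.ElementTree as ET


def intermediate_to_ead(xml_text):
    """Map a hypothetical intermediate <record> document to a skeletal EAD string."""
    src = ET.fromstring(xml_text)

    def text_of(tag):
        # Return the stripped text of a child element, or "" if it is absent.
        node = src.find(tag)
        return (node.text or "").strip() if node is not None else ""

    ead = ET.Element("ead")

    # Header: identifier and title of the finding aid.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = text_of("reference_code")
    filedesc = ET.SubElement(header, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = text_of("title")

    # Description: defaults to collection level when no level is supplied.
    level = text_of("level_of_description") or "collection"
    archdesc = ET.SubElement(ead, "archdesc", level=level)
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unitid").text = text_of("reference_code")
    ET.SubElement(did, "unittitle").text = text_of("title")
    ET.SubElement(did, "unitdate").text = text_of("dates")

    return ET.tostring(ead, encoding="unicode")
```

A real conversion would also need to carry over the Subjects, Places and name entries, and to check the result against the EAD schema before import.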

This data-capture process works as follows:

  • The Excel form is distributed to archivists in the field, who complete it and, when connectivity allows, email it to their supervisor for an initial check.
  • The checked form is sent (from ‘known’ email addresses only) to Archives Africa as an email attachment.
  • The KDL and IMAGIZ applications will work together to process validated data into EAD which is imported into AtoM.
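
The ‘known’ email addresses check in the second step could be sketched as follows. This is an assumption about how such a check might look, not a description of the deployed code; the address in `KNOWN_SENDERS` is a placeholder.

```python
from email.utils import parseaddr

# Placeholder allowlist; real supervisor addresses would be configured elsewhere.
KNOWN_SENDERS = {"supervisor@example.org"}


def is_known_sender(from_header, known=KNOWN_SENDERS):
    """Return True if the address in a From header is on the allowlist.

    The comparison is case-insensitive and ignores any display name.
    """
    _, address = parseaddr(from_header)
    return address.lower() in {a.lower() for a in known}
```

For example, `is_known_sender("Jane Doe <Supervisor@Example.org>")` returns `True`, while a header from any other address returns `False`. A From header can be forged, so a check like this filters out stray submissions rather than providing strong authentication.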

EAD is currently imported manually in order to monitor data quality and structural integrity. It also enables us to correct any remaining minor issues – but the intention is to refine both applications in the future so that they can automatically detect and correct some (or all) issues with incoming data.

Both server applications have been initially designed to collect and process collection (summary) level records but the ambition is to update them to enable the collection and processing of detailed catalogue items.

Again, the intention at all stages is to keep things as simple as possible. This keeps development and maintenance time to a minimum.

This project is still in the relatively early stages of technical development. The prototype systems we have put in place work well, but will benefit from future enhancement as additional time and budget allow.