I am super excited and incredibly honoured to have the opportunity to undertake an overseas research travel project in August/September 2019. Supported by the Gordon Darling Foundation through the Darling Travel Grants - Global, the aim of my project is to investigate how cultural institutions abroad are acquiring, preserving and providing access to born-digital collections.

This will be my first time travelling to the United Kingdom and Europe, as well as my first time travelling solo overseas. I will be spending time in London, Edinburgh, Glasgow and Amsterdam, where I will be visiting some amazing organisations (and I am open to more visits - get in touch!) and attending iPRES for the first time as well! I will also be making a quick stopover in Dublin at the end of the trip, which I am super excited about (more details to come).

I look forward to meeting new people as well as catching up with those I have previously met online and face to face in Australia. I haven't quite decided whether I will actively blog while travelling, but I will definitely be posting to my Twitter and Instagram accounts.

Cover image credit: Map by Free Vector Maps


I have come to appreciate the availability of free graphics given the number of presentations I have had to put together, particularly over the last six months with teaching as well as presenting as part of my role at the Library. I am a heavy user of the wonderful digital preservation illustrations from Digitalbevaring.dk, which have inspired me to make my own illustrations available under a Creative Commons licence.

I have put together a small pack containing eight illustrations (with some extra colour variations) of digital physical carriers. This includes a 3.5" floppy disk, 5.25" floppy disk, Apple PowerBook 1400c, CD-ROM, Digital Linear Tape, Jaz Disk, USB drive and Zip Disk.

I am providing access to these under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.

You can download a zip file on the following page, which may be updated in the future with more illustrations: https://blog.matthewburgess.net/p/illustrations.html

The International Digital Curation Conference travelled to Australia and the southern hemisphere for the first time this year, hosted by the University of Melbourne from 4 - 7 February 2019. With the theme of collaborations and partnerships in the field of digital curation and preservation, this event highlighted the collaborative nature of the community of practice surrounding this field of work on a global scale. Even as a relatively new professional, I recognised many names and faces ("I follow you on Twitter!") and found the programme engaging. I was fortunate to take part in many aspects of the event, from the pre-conference workshops to the unconference, and along with the excitement of attending the conference itself I was also excited to present at an international conference for the first time.

It all began on Monday morning with Digital Preservation Carpentry - a full day, pre-conference workshop that aimed to trial a hands-on technical lesson for digital preservation processes using the pedagogical teaching style of the Carpentries; and to gather feedback from participants to enhance further development of digital preservation lessons. The idea for the workshop was raised at the inaugural Australasia Preserves event in February 2018, where it was highlighted that there was a lack of training and education in this space within the Australasian context. It was a pleasure to be involved with a team of talented professionals in developing this first iteration. My focus was on the BagIt File Packaging Format and the use of Bagger as a tool. Overall, it was great to see a mix of participants (skill level, background, location etc) in an open and welcoming environment, keen to engage with the content. As organisers, we came away with some great feedback to improve on what we developed, as well as interest from the community in developing further lessons. Where to from here will be explored through Australasia Preserves, so make sure you join the Google Group if you would like to know more. Collaborative notes from the workshop are available here: https://tinyurl.com/y8zrk8oo


My lightning talk was part of the afternoon Parallel Session C - Digital curation & preservation on day one of the main conference and was chaired by Paul Wheatley, Head of Research and Practice at the Digital Preservation Coalition (DPC). We were fortunate to have Wheatley also volunteer his time in the Digital Preservation Carpentry workshop where he provided an impromptu thought session on what a manifest is really about in relation to the BagIt File Packaging Format. He highlighted threats to digital objects and noted that we should 'trust nothing, validate everything' and contemplate what minimum information is required for a meaningful, verifiable manifest. This reminded me of Ross Spencer's discourse on file digests, noting that 'understanding how to create a file digest, and what that means, provides a mechanism to ensure that a file transferred from a donor, or from a central government agency, to an archive remains unchanged'.
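To make the idea of a file digest concrete, here is a minimal sketch in Python using only the standard library. It assumes SHA-256 as the algorithm, and the function names `file_digest` and `is_unchanged` are hypothetical names chosen for illustration; they are not taken from the workshop material or from Spencer's writing.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, expected_digest, algorithm="sha256"):
    """Compare a file's current digest with one recorded earlier, e.g. by a donor."""
    return file_digest(path, algorithm) == expected_digest.lower()
```

If a donor records a digest before transfer and the archive recomputes it on receipt, a match from `is_unchanged` suggests the bytes arrived intact. A mismatch only says that something changed, not what or when.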


The lightning talk session included some fascinating talks, from thoughts on the DCC Curation Lifecycle model by Sayeed Choudhury, to the use of BitCurator for processing, appraisal and iterative selection of email by Cal Lee, to Lachlan Glanville discussing how the Germaine Greer archive drove digital preservation at the University of Melbourne Archives. The session, and day one, concluded with Carolyn Hank's energetic talk 'Dead, Dormant, Zoetic: Modeling the Blog Lifecycle', which made me think of my own blog that has been dormant since early 2018 (until now!). My lightning talk 'Digital preservation at the point of acquisition: Collecting born-digital photographs' aimed to highlight the collaborative process of developing new guidelines and specifications for collecting born-digital photographs and upskilling librarians through a hands-on photography workshop to understand the requirements being asked of donors and vendors. It is available as a blog post via the following link, along with a copy of the specifications and guidelines: http://bit.ly/2EGNMIj


BitCuratorEdu

After hearing Cal Lee speak during the minute madness rapid fire poster presentations, I was keen to hear about BitCuratorEdu - a two-year project to study and advance the adoption of digital forensics tools and methods in libraries and archives through professional education efforts. I will certainly be keeping a close eye on this project as one of the outputs includes the production and dissemination of a publicly accessible set of learning objects to be used in providing hands-on digital forensics education. This is something that is clearly lacking for both students and information professionals in the context of galleries, libraries, archives and museums (GLAM).

As someone fortunate to have had the opportunity to undertake computer forensics training utilising Forensic Toolkit (FTK) in 2017, I have since been interested in finding something that is specific to GLAM. The requirements for law enforcement are quite different when it comes to digital forensics, and the power of this software also raises ethical concerns when dealing with collection material.

I managed to catch Lee in between all of his engagements at the conference to discuss the project/poster. In discussing the preliminary finding that instructors desire realistic datasets and mechanisms to connect students to real-world projects, I commented that this is relevant when learning about many aspects of digital preservation and digital asset management. My education included the use of open source digital library software and dummy data to analyse and document requirements and specifications for the design of a digital asset management system. This makes me wonder how we can connect students with GLAM organisations in an effort to provide real-world projects, particularly in Australia. I think the challenge here revolves around the learning outcomes required for the project, and whether the organisation can provide enough autonomy for students to meet them. Lee pointed out Digital Corpora as a useful resource for computer forensics education, which contains freely available disk images and other files.

Scaling Emulation and Software Preservation Infrastructure, the EaaSI network 

Unfortunately I missed the demo by Euan Cochrane at the conference, but I managed to have a discussion with him regarding the EaaSI network over a drink or two and it sounds like an exciting program. Led by the Digital Preservation Services team at Yale University Library, EaaSI aims to enable broader access and use of preserved software and digital objects. Being able to click a link in an online catalogue to open an emulated environment in a web browser, looking at born-digital files within their original software configuration, is a future I would like to see! Unfortunately it sounds like this will not be an easy feat on a global scale, with Cochrane noting that the copyright and legal requirements for different jurisdictions create the need for local instances of the EaaSI network. Hearing about this project reinforced my current thinking, and the current policy in my organisation, to retain original files when acquiring born-digital collections - even after normalising them. It is important to be able to go back to the original, with emulation developments in the future providing alternative access mechanisms.

There was a lot of involvement in IDCC19 by Australasia Preserves, a digital preservation community of practice (CoP) established by the University of Melbourne in February 2018. From the Digital Preservation Carpentry workshop, to multiple sessions during the unconference, it was great to see. Jaye Weatherburn, Data Stewardship Coordinator at UniMelb, also gave a talk on 'Advancing digital preservation capability through collaborative connection' that promoted Australasia Preserves, highlighting its achievements in its first year. I have enjoyed my involvement with this CoP over the past 12 months and it provided me with the opportunity to organise my first event in July last year, which was a learning experience to say the least.

Further general comments and selected highlights from both days of the conference:

  • Christine Keneally's keynote discussed the significance of data curation in democracy, where institutions can destroy and rewrite important truths, with data curators as frontline guardians to the bedrock of society 
  • The importance of metadata was noted in several talks: Joakim Philipson commented that validation is key to keeping metadata in good shape and adaptable for the future; Lars Vilhuber discussed the lack of consistent, reliable metadata for restricted data (e.g. no information on licences or accessibility); and Donna Hensler's lessons learned included that existing metadata needs to be in good shape before importing into new systems, as the curation of metadata is a substantial, time-consuming activity 
  • In discussion on collaboration across communities with Nancy McGovern and Clifford Lynch, chaired by Kevin Ashley, the question was raised on whether content creators should be involved. Lynch said that capturing intent of creators is important and has emerged as a key issue in the preservation of digital art. McGovern stated that we have to 'have our ducks in a row' before we start talking to content creators and understand what we are trying to do 
  • Flora Feltham's talk on building an Aotearoa New Zealand-wide digital curation community of practice for sector-wide collaboration, to give people the confidence and expertise to collect and manage born-digital materials 
  • Michelle Negus Cleary and Peter Neish spoke about collaborating across borders with the Anzac Gallipoli Archaeological Database (AGAD), highlighting the challenges in ongoing custodianship and the need for data management plans 
  • Dr Patricia Brennan's keynote, highlighting the importance of digital curation to maximise reuse of data for other studies, mitigate obsolescence, maintain value, facilitate reproducibility and increase pathways of discovery 


For me, the unconference was heavily geared towards Australasia Preserves where we had morning and afternoon sessions that looked at the past 12 months, discussing what worked well, what did not and how we can make it work going forward. This included discussions on how we could connect more with the private sector, how the Digital Preservation Coalition can help, how the National and State Libraries Australia (NSLA) digital preservation CoP operates and how we can ensure sustainability for another year. The day finished with some outcomes and next steps, including the development of a briefing pack to enable people to advocate to their management for involvement in the community as part of their professional development. Well done to Jaye for putting together such a great document. The collaborative notes from the unconference can be found here: https://bit.ly/2SBcxgw

The unconference also saw an impromptu, brief introduction to BitCurator workshop with Cal Lee where he helped participants install the environment using VirtualBox. Unfortunately for me, my laptop did not have a suitable processor to make it work so I could not get it running on the day, but it did give me a very brief overview that spurred me to install and look at it once I returned to work and had access to a suitable computer. Lee highlighted that when working with disk images, you should create them in the virtual environment and then determine whether any further actions could be undertaken in the host environment where you will have more processing power. He provided a very quick overview of Bulk Extractor Viewer, which is a graphical user interface that can be used to scan for personally identifiable information (PII). It was great to be able to attend this short session as I did not have the opportunity to stay in Melbourne for Lee's workshop the following morning.


As a first time attendee and presenter at a conference in this field of work, I found IDCC19 to be a welcoming and invigorating experience with great diversity in attendees and the programme. It provided the opportunity to meet professionals from across the globe as well as the Australasian region. While I was completely exhausted after a whirlwind four days, I returned home with a renewed passion for the work I do as well as practical plans for further research and actions based on presentations and impromptu conversations during networking events.

You can find out more details about the conference, and links to collaborative notes and slides at the following location: http://www.dcc.ac.uk/events/idcc19

I have also created an #IDCC19 TAGS archive of tweets that you can access via the following link: http://bit.ly/2Eqm94K

Related links and useful resources:

  1. Australasia Preserves Briefing Pack
    https://blogs.unimelb.edu.au/digital-preservation-project/2019/02/27/australasia-preserves-briefing-pack-2019
  2. Australasia Preserves Google Group
    https://groups.google.com/forum/#!forum/australasia-preserves
  3. Australasia Preserves at IDCC 2019 blog post
    https://blogs.unimelb.edu.au/digital-preservation-project/2019/02/14/australasia-preserves-at-idcc-2019
  4. Bagger
    https://github.com/LibraryOfCongress/bagger
  5. BagIt File Packaging Format
    https://tools.ietf.org/html/rfc8493
  6. BitCuratorEDU project website
    https://educopia.org/bitcurator-edu
  7. Digital Corpora for computer forensics education research
    https://digitalcorpora.org
  8. Digital Preservation Carpentry workshop
    http://www.dcc.ac.uk/events/workshops/digital-preservation-carpentry
  9. Digital Preservation Coalition website
    https://www.dpconline.org
  10. Scaling Emulation and Software Preservation Infrastructure (EaaSI) website
    https://www.softwarepreservationnetwork.org/eaasi

Cover image: My view from the plane window on the way to Melbourne from Sydney, 3 February 2019.

Standards and best practice are important when dealing with digital assets in a cultural collecting institution, as they are when dealing with information in general. They facilitate the access, discovery and sharing of digital resources, as well as their long-term preservation [1]. This is where preferred file formats, procedures for offloading digital assets from physical carriers, documenting preconditioning actions and many other activities come into play.

But you cannot always control what you receive when it comes to digital collections. Standards are there for guidance and sometimes decisions need to be made on whether to allow something into the collection that does not meet them. The intrinsic value of the object, its uniqueness and rarity may very well trump the technical requirements for digital collecting. When dealing with born-digital photographs for example, where some institutions prefer a Camera Raw or uncompressed TIFF file format, a low resolution JPEG would also be accepted under the right circumstances.

The digital collecting workflow has changed significantly in the last 12 months in my workplace with the introduction of new standards and tools such as BagIt [2] and Bagger [3], as well as beginning to ingest the significant backlog of both digitised and born-digital collections into our digital preservation system. We have strict control over the process for new acquisitions, but our legacy collections are another story.

While checksums have been generated for acquisitions for a number of years, there are legacy collections that do not contain as much metadata as we generate and use today. This includes checksums, virus scans and information relating to the physical carrier it was received on. So now that we have these new procedures, guidelines and workflows in place, what do we do with these legacy collections? Should we go back to the creator and ask them to submit the files again? Should we try and locate the physical carrier it was received on before we had a policy in place to manage and store them? While this may be possible in some cases, there is a point where you need to draw the line and accept things as they are.

Authenticity is an attribute that is highly valued in digital preservation, where appropriate steps need to be taken to ensure that it is not compromised during the process of managing digital assets [4]. It is important to establish authenticity (including fixity) as early as possible. Drawing the line with legacy material means accepting it as it is, generating checksums now and bringing it up to our current standards for our ingestion processes to make it accessible now and into the future.

Custodial control of digital assets can only be maintained within the context of both organisation and system policies, procedures, guidelines and following best practice. These will change over time and it is important to understand that you may have to let go of strict control requirements under some circumstances and do the best with what you have at the time.


This post is my contribution to the GLAM Blog Club April theme: 'Control'.


1. Standards and best practice, Digital Preservation Handbook. Digital Preservation Coalition.
https://www.dpconline.org/handbook/institutional-strategies/standards-and-best-practice

2. Kunze, J., Littman, J., Madden, L., Summers, E., Boyko, A., Vargas, B., 2016. The BagIt File Packaging Format (V0.97).
https://tools.ietf.org/html/draft-kunze-bagit-14

3. Bagger, Library of Congress.
https://github.com/LibraryOfCongress/bagger

4. Harvey, R., Weatherburn, J., 2018. Requirements for Successful Digital Preservation, in: Preserving Digital Materials. Rowman & Littlefield, Lanham, MD, United States, p. 86.



Cover image credit: Illustration copyright of digitalbevaring.dk and shared under a CC BY 2.5 Denmark licence (illustrations) https://creativecommons.org/licenses/by/2.5/dk/deed.en_GB , and a CC0 1.0 licence (icons) https://creativecommons.org/about/cc0.

Last year was a big year for me in terms of professional development (PD) and I am fortunate to be in a role that invariably includes learning (What file format is that? How do I get something off this physical carrier? How does this thing connect to that thing?). As 2018 begins, I feel like this year will be much the same. One of my goals this year is to actively take the time to learn more. It has been very easy to get swept up in everyday work over the last six months, so this year I plan on setting aside some dedicated time for learning.

I joined the ALIA PD Scheme this time last year and successfully completed my first year of compliance at the end of June. As the scheme is based on the financial year, and I admittedly have not taken the time to reflect on the last six months, my first goal is to catch up on tracking my PD and plan the next five months to complete my second year of the scheme.

With that in mind, here are some things I want to learn in 2018:

  • Python: this has been at the top of my "to do" list for a while and I am increasingly finding myself in situations where I believe it could be quite useful. I have no interest in becoming a software developer, but this coding language has become quite popular with working professionals who are using programming skills to get better at their jobs. I plan on starting with Automate The Boring Stuff With Python and going from there. 

  • Public speaking: I have mentioned previously that this has not always been a strong point for me. I had several opportunities to present last year, including to students and to industry, as well as developing and running a workshop for the first time. I am also looking forward to taking part in Australasia Preserves in Melbourne next month. This is not so much something I want to learn, more an area where I would like to get more practice and experience.

  • Writing: GLAM Blog Club has provided a great opportunity to keep me active in writing something at least once a month for the past 12 months. While I plan on continuing that, I also want to start exploring other avenues. I had my first attempt at submitting an abstract to a conference last year. While unsuccessful, my plan is to build on that and keep trying.


This post is my contribution to the GLAM Blog Club January theme: 'What I want to learn in the year ahead'.

I have been taking photos with digital cameras since 2002. I still have a lot of those photos, but there are two distinct occasions where a computer or hard drive failure resulted in the loss or corruption of some images. I have previously discussed my attitude to naming files over the years. Thankfully my education in both photography and information management improved that quite significantly and I recently finished going through my archive and giving everything a meaningful file name and organising it in a meaningful way. But the question remains: how can I mitigate the risk of losing my digital files? Can I do anything to salvage the corrupt photos?

My first instance of a hard drive failure occurred in 2006. I was backing everything up to an external hard drive and occasionally to CD. It had been months since I had backed up to CD so I lost some photos. I wrote a personal blog entry at the time, stating "this enforces the fact that digital photography is not safe and I should take more precautions in the future". In 2007 my PC hard drive failed right after I purchased my first iMac. I was due to have some event photography published in a local magazine but missed the deadline due to the failure. I wrote at the time, "I no longer trust technology. It hates me. It's the second time in two years a hdd has died and i've lost my work."

Since then, my backup system has not been much of a system. Up until recently I had my archive across multiple hard drives, with at least two copies of everything. I was also using DVDs up until mid-2009. At the end of 2016, I decided it was time to bring my archive onto a single backup system so I had everything in one, accessible location. This led to the purchase of a two bay RAID enclosure, which I set up in a RAID 1 drive mirroring configuration using two identical 3TB hard drives. I was still using an iMac at the time as my main computer so I decided to set this up in Apple's HFS+ format.

I recently decided to build myself a computer for the first time, which presented its own challenges (I have nightmares about thermal paste). Switching from OS X to Windows 10 is problematic with an HFS+ formatted external hard drive. There are programs available that can both read and write to Apple formatted hard drives on Windows computers (such as Paragon HFS+ or MacDrive), but that is not ideal. I made the decision to copy the archive onto a dedicated hard drive on my new computer before erasing the external hard drive and reformatting it for Windows, which provided the perfect opportunity to assess and organise my archive. I knew there were corrupted images in my archive but hadn't tried to do anything about it until now.

The photo at the beginning of this post is an example of one of the many corrupted Canon CR2 camera raw files I have in my archive from the time of the second hard drive failure. These images were recovered by a family member at the time, but obviously not everything was a success. All of my backup copies appear to have the same corrupted files. I am thankful that I had started shooting in the camera raw file format at the time, as that gives me some options for recovery: most cameras embed a JPEG preview image into a raw file. Depending on the camera model and manufacturer, embedded JPEG files may be full resolution or smaller.

So how do you extract a JPEG from a camera raw file? With a tool I have been using a lot both personally and professionally: ExifTool by Phil Harvey. ExifTool has a lot of great uses, particularly when it comes to digital photographs and metadata. Using the command line tool, it is possible to extract the JPEG image from the raw file and embed all the metadata from the original file (example provided by Harvey here under "copying examples"). Unfortunately for me, the camera model I was using at the time embedded smaller resolution previews so I will never have the full resolution images again, but a low resolution JPEG is better than nothing!
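As a rough illustration of that two-step process, the sketch below wraps ExifTool in a small Python script. It assumes the `exiftool` command is installed and on the PATH, and that the preview is stored in the `PreviewImage` tag (other cameras may use a different tag, such as `JpgFromRaw`). The function names and output file naming are hypothetical, and this is not necessarily the exact command from Harvey's examples.

```python
import subprocess
from pathlib import Path


def preview_commands(raw_path, jpeg_path):
    """Build the two ExifTool invocations as argument lists (nothing is run here)."""
    # -b writes the tag value as binary to stdout; PreviewImage is the embedded JPEG.
    extract = ["exiftool", "-b", "-PreviewImage", str(raw_path)]
    # -tagsFromFile copies metadata from the raw file into the extracted JPEG.
    copy_tags = ["exiftool", "-overwrite_original", "-tagsFromFile",
                 str(raw_path), str(jpeg_path)]
    return extract, copy_tags


def extract_preview(raw_path, jpeg_path=None):
    """Extract the embedded preview from a raw file and copy its metadata across."""
    raw_path = Path(raw_path)
    # Assumption: by default, save the preview next to the raw file with a .jpg suffix.
    jpeg_path = Path(jpeg_path) if jpeg_path else raw_path.with_suffix(".jpg")
    extract, copy_tags = preview_commands(raw_path, jpeg_path)

    result = subprocess.run(extract, capture_output=True)
    if not result.stdout:
        raise ValueError(f"No embedded preview found in {raw_path}")
    jpeg_path.write_bytes(result.stdout)

    # check=True raises CalledProcessError if ExifTool exits with an error status.
    subprocess.run(copy_tags, capture_output=True, check=True)
    return jpeg_path
```

Calling `extract_preview("IMG_0001.CR2")` would, under these assumptions, write `IMG_0001.jpg` alongside the raw file. A badly corrupted raw file may still fail at either step, so it is worth checking each extracted JPEG by eye.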

Preview JPEG extracted from corrupted Canon CR2 camera raw file.

Once I have finished extracting all of the JPEGs from my corrupted CR2 files, my next step will be to reformat my external RAID hard drive to Windows-based NTFS and copy everything back across from my computer. Before I do that, I will first make a backup copy of the files currently on my new computer to a regular, portable external hard drive (encrypted with BitLocker). This will become my offsite copy that will be stored at a relative's house.

Ultimately, I will have three separate devices with a copy of all of my digital files. Technically this will be four copies with the RAID 1 setup. To combat the issue of file corruption I plan to use the BagIt standard to store checksums which can be validated on a regular basis. It is not ideal for working files, though, as any changes to the files will change their checksum.
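The checksum idea can be sketched with nothing but the Python standard library. The example below borrows only the manifest line format from BagIt (a checksum followed by a relative file path) and does not create a complete bag: there is no `bagit.txt` declaration or `data/` payload directory, which a tool such as Bagger would handle. The function names are hypothetical and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-sha256.txt"


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root):
    """Record a checksum for every file under root, one 'checksum  path' line each."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != MANIFEST_NAME:
            lines.append(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}")
    manifest = root / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def validate_manifest(root):
    """Return {relative path: 'missing' or 'changed'} for every file that fails."""
    root = Path(root)
    problems = {}
    for line in (root / MANIFEST_NAME).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Split on the first run of whitespace so file names containing spaces survive.
        expected, relative = line.split(maxsplit=1)
        target = root / relative
        if not target.is_file():
            problems[relative] = "missing"
        elif sha256_of(target) != expected:
            problems[relative] = "changed"
    return problems
```

Running `write_manifest` once on a finished folder and `validate_manifest` on a schedule afterwards would flag files that have gone missing or no longer match. It does not notice files added after the manifest was written, and it cannot repair anything; that is what the other copies are for.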

It is hard to follow digital preservation standards on a personal level because it is not cheap. Good practice includes multiple independent copies that are geographically separated, using different storage technologies and actively monitoring storage to ensure any problems are detected and corrected quickly (Digital Preservation Handbook). I did not even bother looking at the costs involved in cloud storage for my almost 3TB digital archive and I am not sure if there are any consumer equivalents to digital asset and preservation management systems used by collecting institutions. My experience has taught me a lesson, though, and it is important to do the best you can within your means to back up, monitor and organise your digital life. Digital preservation is an ongoing activity. I have previously made the mistake of just putting things on a hard drive/CD/DVD and thinking it was safe. I will not make that same mistake again.


One of my highlights for 2017 was my involvement in an Agile project to create software for the bulk ingestion of digital collections into a digital preservation system. This was my first time coming across Scrum and Agile, as well as being involved in software development. While I had not previously had direct involvement in the big project management framework used by my organisation, I understood the amount of effort it required in planning, as well as its rigid structure and inability to deal with rapid changes in requirements. One of the first things I realised with the Agile approach was the ability for the project and the developers to adapt quickly to changing requirements while still delivering a minimum viable product (MVP). As the project progressed, we understood what was achievable and adjusted the MVP accordingly, with the knowledge that it does not mark the completion of the project.

From a future user and subject matter expert point of view it was really engaging to see my input affect the development of the software, from the user interface to how the system deals with digital files. It was satisfying working collaboratively as a team, with regular meetings for sprint planning and review as well as regular stand-up meetings to keep the conversation going. 

Since I am not a fan of sports ball, Scrum terminology is lost on me. But the methodology is not. It is a lightweight process for managing and controlling software development in rapidly changing environments and is an intentionally iterative, team-based approach (Cervone 2011). It is relatively simple with clearly defined roles for each team member. It also provides the ability to quickly develop and test features before moving on to the next task and once again it was great to see my feedback from testing applied directly and quickly to the product. 

I look forward to continuing work on this collaborative project in 2018. It will be exciting to see its impact on workflows going forward, with the aim of more automation and less human intervention in the preparation and ingestion of files into the digital preservation system.

This post is my contribution to the GLAM Blog Club December theme: 'Collaboration'.

Further reading and references:


Cover image credit: British Rugby Union football players in a scrum, New South Wales, ca. 1930 [picture]. National Library of Australia. http://nla.gov.au/nla.obj-162078997

GLAM Blog Club provides a great opportunity to interact with the professional community while also presenting a topic and deadline to keep me active with my blog posts each month. It also provides a challenge with regard to the balance between professional and personal with topics such as "how I ended up here" and "identity". I made the decision early on that my blog would focus on the professional side of things, but some of the topics this year have really pushed that boundary and I have had to make a conscious decision on the amount of detail I provide in my blog posts when verging on the personal.

It is really important to understand who your audience is when writing a blog post. That audience is essentially anyone with an internet connection. You never know who may come across your blog and that includes work colleagues, your boss, family and recruiters. I was really struck by Edward Shaddow's July blog post discussing identity and the use of real names on the internet. My earlier years on the internet involved the use of a pseudonym and I am now thankful for that. My angst-filled Livejournal posts are not linked to my name (I am glad Facebook did not exist when I was a teen). The internet does not make a distinction between personal and professional, so it is important to understand what you are putting out into the world and the implications it may have on your professional life when someone searches the internet for information about you.

To keep the balance in favour of professional, I try and relate my posts to my work as much as possible. Any excuse to write about digital preservation. And the great thing about GLAM Blog Club is that there is no problem with that. While some of my posts have verged into the personal realm, discussing my career or my identity in the information management profession, they were written with all of the above in mind.

This post is my contribution to the GLAM Blog Club November theme: 'Balance'.