Standards and best practice are important when dealing with digital assets in a cultural collecting institution, just as they are when dealing with information more generally. They facilitate the access, discovery and sharing of digital resources, as well as their long-term preservation [1]. This is where preferred file formats, procedures for offloading digital assets from physical carriers, the documentation of preconditioning actions and many other activities come into play.

But you cannot always control what you receive when it comes to digital collections. Standards are there for guidance and sometimes decisions need to be made on whether to allow something into the collection that does not meet them. The intrinsic value of the object, its uniqueness and rarity may very well trump the technical requirements for digital collecting. When dealing with born-digital photographs, for example, some institutions prefer a Camera Raw or uncompressed TIFF file format, but a low-resolution JPEG may still be accepted under the right circumstances.

The digital collecting workflow in my workplace has changed significantly in the last 12 months. We have introduced new standards and tools such as BagIt [2] and Bagger [3], and have begun ingesting the significant backlog of both digitised and born-digital collections into our digital preservation system. We have strict control over the process for new acquisitions, but our legacy collections are another story.

While checksums have been generated for acquisitions for a number of years, there are legacy collections that do not have as much metadata as we generate and use today. The missing metadata includes checksums, virus scans and information relating to the physical carrier the collection was received on. So now that we have these new procedures, guidelines and workflows in place, what do we do with these legacy collections? Should we go back to the creator and ask them to submit the files again? Should we try to locate the physical carrier a collection was received on before we had a policy in place to manage and store carriers? While this may be possible in some cases, there is a point where you need to draw the line and accept things as they are.

Authenticity is an attribute that is highly valued in digital preservation, where appropriate steps need to be taken to ensure that it is not compromised during the process of managing digital assets [4]. It is important to establish authenticity (including fixity) as early as possible. Drawing the line with legacy material means accepting it as it is, generating checksums now and bringing it up to the current standards for our ingestion processes, so that it is accessible now and into the future.
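As a rough illustration of what "generating checksums now" can look like for a legacy folder, here is a minimal Python sketch. It is not the workflow used at my workplace: the function names, the choice of SHA-256 and the record fields are all assumptions made for the example. It records the time the checksum was generated, because for legacy material that date is later than the date of acquisition.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline_fixity(collection_dir):
    """Build one fixity record per file under collection_dir (hypothetical layout)."""
    root = Path(collection_dir)
    # One timestamp for the whole run: when fixity was first established.
    generated = datetime.now(timezone.utc).isoformat()
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        records.append({
            "path": path.relative_to(root).as_posix(),
            "sha256": sha256_of(path),
            "bytes": path.stat().st_size,
            "checksum_generated": generated,
        })
    return records
```

Calling `baseline_fixity("legacy_collection")` (a hypothetical folder name) returns a list of dictionaries that could be written out as CSV or JSON alongside the files.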

Custodial control of digital assets can only be maintained within the context of both organisational and system policies, procedures and guidelines, and by following best practice. These will change over time, and it is important to understand that you may have to let go of strict control requirements under some circumstances and do the best with what you have at the time.


This post is my contribution to the GLAM Blog Club April theme: 'Control'.


1. Standards and best practice, Digital Preservation Handbook. Digital Preservation Coalition.
https://www.dpconline.org/handbook/institutional-strategies/standards-and-best-practice

2. Kunze, J., Littman, J., Madden, L., Summers, E., Boyko, A., Vargas, B., 2016. The BagIt File Packaging Format (V0.97).
https://tools.ietf.org/html/draft-kunze-bagit-14

3. Bagger, Library of Congress.
https://github.com/LibraryOfCongress/bagger

4. Harvey, R., Weatherburn, J., 2018. Requirements for Successful Digital Preservation, in: Preserving Digital Materials. Rowman & Littlefield, Lanham, MD, United States, p. 86.



Cover image credit: Illustration copyright of digitalbevaring.dk and shared under a CC BY 2.5 Denmark licence (illustrations) https://creativecommons.org/licenses/by/2.5/dk/deed.en_GB , and a CC0 1.0 licence (icons) https://creativecommons.org/about/cc0.

Last year was a big year for me in terms of professional development (PD) and I am fortunate to be in a role that invariably includes learning (What file format is that? How do I get something off this physical carrier? How does this thing connect to that thing?). As 2018 begins, I feel like this year will be much the same. One of my goals this year is to actively take the time to learn more. It has been very easy to get swept up in everyday work over the last six months, so this year I plan on setting aside some dedicated time for learning.

I joined the ALIA PD Scheme this time last year and successfully completed my first year of compliance at the end of June. The scheme runs on a financial year basis and I admittedly have not taken the time to reflect on the last six months, so my first goal is to catch up on tracking my PD and plan the next five months to complete my second year of the scheme.

With that in mind, here are some things I want to learn in 2018:

  • Python: this has been at the top of my "to do" list for a while and I am increasingly finding myself in situations where I believe it could be quite useful. I have no interest in becoming a software developer, but this coding language has become quite popular with working professionals who are using programming skills to get better at their jobs. I plan on starting with Automate The Boring Stuff With Python and going from there.

  • Public speaking: I have mentioned previously that this has not always been a strong point for me. I had several opportunities to present last year, including to students and to industry, as well as developing and running a workshop for the first time. I am also looking forward to taking part in Australasia Preserves in Melbourne next month. This is not so much something I want to learn as an area in which I would like to get more practice and experience.

  • Writing: GLAM Blog Club has provided a great opportunity to keep me active in writing something at least once a month for the past 12 months. While I plan on continuing that, I also want to start exploring other avenues. I had my first attempt at submitting an abstract to a conference last year. While unsuccessful, my plan is to build on that and keep trying.


This post is my contribution to the GLAM Blog Club January theme: 'What I want to learn in the year ahead'.

I have been taking photos with digital cameras since 2002. I still have a lot of those photos, but there have been two distinct occasions where a computer or hard drive failure resulted in the loss or corruption of some images. I have previously discussed my attitude to naming files over the years. Thankfully my education in both photography and information management improved that quite significantly, and I recently finished going through my archive, giving everything a meaningful file name and organising it in a meaningful way. But the question remains: how can I mitigate the risk of losing my digital files? Can I do anything to salvage the corrupt photos?

My first instance of a hard drive failure occurred in 2006. I was backing everything up to an external hard drive and occasionally to CD. It had been months since I had backed up to CD so I lost some photos. I wrote a personal blog entry at the time, stating "this enforces the fact that digital photography is not safe and I should take more precautions in the future". In 2007 my PC hard drive failed right after I purchased my first iMac. I was due to have some event photography published in a local magazine but missed the deadline due to the failure. I wrote at the time, "I no longer trust technology. It hates me. It's the second time in two years a hdd has died and i've lost my work."

Since then, my backup system has not been much of a system. Up until recently I had my archive across multiple hard drives, with at least two copies of everything. I was also using DVDs up until mid-2009. At the end of 2016, I decided it was time to bring my archive onto a single backup system so I had everything in one accessible location. This led to the purchase of a two-bay RAID enclosure, which I set up in a RAID 1 drive mirroring configuration using two identical 3TB hard drives. I was still using an iMac at the time as my main computer so I decided to set this up in Apple's HFS+ format.

I recently decided to build myself a computer for the first time, which presented its own challenges (I have nightmares about thermal paste). Switching from OS X to Windows 10 is problematic with an HFS+ formatted external hard drive. There are programs that can both read and write to Apple formatted hard drives on Windows computers (such as Paragon HFS+ or MacDrive), but that is not ideal. I made the decision to copy the archive onto a dedicated hard drive on my new computer before erasing the external hard drive and reformatting it for Windows, which provided the perfect opportunity to assess and organise my archive. I knew there were corrupted images in my archive but hadn't tried to do anything about it until now.

The photo at the beginning of this post is an example of one of the many corrupted Canon CR2 camera raw files I have in my archive from the time of the second hard drive failure. These images were recovered by a family member at the time, but obviously not everything was a success. All of my backup copies appear to have the same corrupted files. I am thankful that I had started shooting in the camera raw file format at the time, because most cameras embed a JPEG preview image into a raw file and that gives me some options for recovery. Depending on the camera model and manufacturer, embedded JPEG files may be full resolution or smaller.

So how do you extract a JPEG from a camera raw file? By using a tool I have been relying on a lot both personally and professionally: ExifTool by Phil Harvey. ExifTool has a lot of great uses, particularly when it comes to digital photographs and metadata. Using the command line tool, it is possible to extract the JPEG image from the raw file and embed all the metadata from the original file (Harvey provides an example under "copying examples" in the ExifTool documentation). Unfortunately for me, the camera model I was using at the time embedded smaller resolution previews so I will never have the full resolution images again, but a low-resolution JPEG is better than nothing!
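The two ExifTool steps described above (extract the embedded JPEG, then copy the metadata across from the raw file) could be scripted roughly as follows. This is only a sketch, not the exact command from Harvey's documentation. It assumes `exiftool` is installed and on the PATH, and that the embedded preview is stored under the `PreviewImage` tag, which varies by camera and file type. The function names are made up for the example.

```python
import subprocess
from pathlib import Path


def extract_cmd(raw_path, tag="PreviewImage"):
    """Build the ExifTool command that writes an embedded image to stdout."""
    # -b outputs the tag value in binary form; the tag name is an assumption.
    return ["exiftool", "-b", f"-{tag}", str(raw_path)]


def copy_metadata_cmd(raw_path, jpeg_path):
    """Build the ExifTool command that copies metadata from the raw file to the JPEG."""
    return ["exiftool", "-overwrite_original",
            "-TagsFromFile", str(raw_path), str(jpeg_path)]


def extract_preview(raw_path, jpeg_path, tag="PreviewImage"):
    """Save the embedded preview as a JPEG, then copy the raw file's metadata into it."""
    result = subprocess.run(extract_cmd(raw_path, tag), capture_output=True)
    # A JPEG starts with the bytes FF D8; anything else means nothing usable came out.
    if not result.stdout.startswith(b"\xff\xd8"):
        raise ValueError(f"No embedded JPEG found under {tag} in {raw_path}")
    Path(jpeg_path).write_bytes(result.stdout)
    subprocess.run(copy_metadata_cmd(raw_path, jpeg_path), check=True)
```

For a single file this would be called as `extract_preview("IMG_0001.CR2", "IMG_0001_preview.jpg")`, where both file names are placeholders. With badly corrupted files ExifTool may not be able to read the preview at all, in which case the function raises an error instead of writing an empty file.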

Preview JPEG extracted from corrupted Canon CR2 camera raw file.

Once I have finished extracting all of the JPEGs from my corrupted CR2 files, my next step will be to reformat my external RAID hard drive to Windows-based NTFS and copy everything back across from my computer. Before I do that, I will first make a backup copy of the files currently on my new computer to a regular, portable external hard drive (encrypted with BitLocker). This will become my offsite copy that will be stored at a relative's house.

Ultimately, I will have three separate devices with a copy of all of my digital files. Technically this will be four copies with the RAID 1 setup. To combat the issue of file corruption I plan to use the BagIt standard to store checksums which can be validated on a regular basis. It is not ideal for working files, though, as any changes to the files will change their checksum.
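To show the idea behind storing checksums and validating them on a regular basis, here is a simplified Python sketch. It is not a full BagIt implementation: it only writes a manifest in the same "checksum, then file path" line style that BagIt manifests use, and checks files against it later. The function names and folder layout are assumptions for the example, and the manifest is assumed to live outside the folder it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Write one 'checksum  relative/path' line per file under data_dir."""
    root = Path(data_dir)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        lines.append(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(data_dir, manifest_path):
    """Return a list of (path, problem) pairs; an empty list means every file checked out."""
    root = Path(data_dir)
    problems = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)
        target = root / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative, "checksum mismatch"))
    return problems
```

Running `write_manifest` once after organising the archive and `verify_manifest` on a schedule would flag any file that has gone missing or changed. It also shows why this approach does not suit working files: every legitimate edit shows up as a checksum mismatch until the manifest is regenerated.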

It is hard to follow digital preservation standards on a personal level because it is not cheap. Good practice includes multiple independent copies that are geographically separated, using different storage technologies and actively monitoring storage to ensure any problems are detected and corrected quickly (Digital Preservation Handbook). I did not even bother looking at the costs involved in cloud storage for my almost 3TB digital archive and I am not sure if there are any consumer equivalents to the digital asset and preservation management systems used by collecting institutions. My experience has taught me a lesson, though, and it is important to do the best you can within your means to back up, monitor and organise your digital life. Digital preservation is an ongoing activity. I have previously made the mistake of just putting things on a hard drive/CD/DVD and thinking they were safe. I will not make that same mistake again.