The following notes and information came out of Peer Forum I: Disk Imaging, a discussion that took place at MoMA on December 7 and 8, 2017. They represent the views of the speakers and participants who attended the meeting.
Policy and Procedures
Disk Image Acquisition (Compiled by Group 1)
Unpack - remove drive or not - evaluate hardware
Use write-blocker and disk image, OR
Linux boot USB, OR
Boot in target mode and disk image
Raw and forensic acquisition tools (choose from: FTK Imager GUI or CLI, Guymager, dd, etc). Whether you do raw, forensic, or both depends on your context.
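The raw (dd) path can be sketched as a minimal script. The commands below are a hedged sketch, not a definitive workflow: an ordinary file stands in for a write-blocked source device (e.g. /dev/sdb) so the example is runnable, and the before/after checksum comparison is the point.

```shell
# Sketch of a raw acquisition with dd, with source/image checksum comparison.
# "$SOURCE" stands in for a write-blocked device such as /dev/sdb; here it is
# an ordinary file so the example can actually run.
set -eu
SOURCE=$(mktemp)            # stand-in for the source device
IMAGE=$(mktemp)             # destination raw image
head -c 1048576 /dev/urandom > "$SOURCE"   # fake 1 MiB of disk contents

# Image the source, then checksum both sides and compare.
dd if="$SOURCE" of="$IMAGE" bs=4M status=none
SRC_SUM=$(sha256sum "$SOURCE" | cut -d' ' -f1)
IMG_SUM=$(sha256sum "$IMAGE" | cut -d' ' -f1)
if [ "$SRC_SUM" = "$IMG_SUM" ]; then
  echo "checksums match: $IMG_SUM"
else
  echo "checksum MISMATCH" >&2
  exit 1
fi
```

With a real drive you would substitute the device path for `$SOURCE` and keep the write-blocker between the drive and the imaging workstation.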
Troubleshooting: if a physical image acquisition fails, try acquiring a logical image instead and see if that works (see condition checking, Group 3 Activity, for ways to test); then try ddrescue. The last resort is DriveSavers - a significant decision given the high cost. Whether to skip straight to DriveSavers for a high-value drive showing signs of failure depends on the value of the contents, the availability of copies, etc.
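When reads start failing, GNU ddrescue's multi-pass approach is the usual step before DriveSavers. The device name below is hypothetical and the ddrescue commands are shown as comments only; the runnable part demonstrates dd's own fallback, which continues past read errors and pads unreadable blocks with zeros so offsets stay aligned.

```shell
# ddrescue sketch (hypothetical device /dev/sdb; not executed here):
#   ddrescue -n /dev/sdb image.raw mapfile     # first pass: grab the easy areas
#   ddrescue -r3 /dev/sdb image.raw mapfile    # retry the bad areas 3 times
#
# Plain dd can also be told to continue past read errors, zero-padding
# unreadable blocks so the image stays sector-aligned:
set -eu
SRC=$(mktemp); OUT=$(mktemp)
head -c 524288 /dev/urandom > "$SRC"           # stand-in for a failing drive
dd if="$SRC" of="$OUT" bs=512 conv=noerror,sync status=none
cmp -s "$SRC" "$OUT" && echo "copy complete: $(stat -c%s "$OUT") bytes"
```

The ddrescue mapfile is what makes it resumable - later passes only revisit the regions recorded as bad.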
Physical check first (good for all media). Unpack, photograph labels, activate write-protect on disks (just to be 100% safe), use KryoFlux hardware/software for 3.5"/5.25" drives (board and software with license). KryoFlux has a built-in write-blocker that is set by the position of a jumper, so it is hard to bypass.
KryoFlux produces a flux-transition stream file as well as a raw .img file, and includes tools to identify disk type, file system, etc., to add to the collection management system.
DVD and Blu-ray discs, for the ones without copy protection (.iso image with dd, IsoBuster, dvdisaster; ddrescue for bad sectors) - try different drives, and check the firmware of your drive to ensure it works properly. Make sure your tool supports Blu-ray. Explore use of Carbon Copy Cloner. .iso simplifies access - most video players (e.g., VLC) can play it. FTK Imager only offers .iso acquisition from optical media. Guymager is not appropriate here - it makes the wrong kind of image.
As of iOS 11, physical images of non-jailbroken devices are no longer possible.
BlackBag Technologies' MacQuisition and BlackLight are the go-to tools for iOS - expensive and fast-changing ($1,500-3,000 for extracting artifacts, so there has to be a real need). These tools stopped working with iOS 11; the situation may evolve.
Documentation (Compiled by Group 2)
Source (medium/hardware, serial numbers, make/model, OS, environment - how best to capture this is under debate)
Creation of the disk image (which software tools, what version, what interfaces, write-blocker, peripherals, data connections, author name, date; outline the purpose of the disk image to inform its status within the collection) - also document any bad sectors or other issues in the imaging process
File name, as a component of the work (define your destination element that you are creating - status, component number, relation to artwork, include the source this disk image came from, point to related objects)
Disk image itself (technical metadata, FTK report for example - sectors, checksum, file size, etc. automatically generated, logical vs physical, dependencies of image)
Content of the image itself (a Fiwalk report details every file in the disk image - characteristics, checksums for every file - all automatically generated; add it to the repository or documentation. Fiwalk runs under the hood of BitCurator and its output is more human-readable, listing file types, etc.) - correlate any damaged sectors to the files affected
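The per-file checksum listing that Fiwalk produces can be approximated for an already-mounted image with standard tools. A minimal sketch, where a temporary directory stands in for a mount point:

```shell
# Sketch: per-file SHA-256 manifest, roughly what Fiwalk records per file.
# "$MOUNT" stands in for a mounted disk image.
set -eu
MOUNT=$(mktemp -d)
echo "readme"  > "$MOUNT/readme.txt"        # sample contents
mkdir -p "$MOUNT/docs"
echo "report" > "$MOUNT/docs/report.txt"

MANIFEST=$(mktemp)
find "$MOUNT" -type f -exec sha256sum {} + > "$MANIFEST"
wc -l < "$MANIFEST"                         # one manifest line per file
```

Unlike this sketch, Fiwalk works directly on the image (including deleted and unallocated content), which is why it is preferred for the formal record.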
Tools for documentation: RegRipper (Windows), OSXRipper (Mac) - pulls report on everything installed, peripherals, etc - document the state of the computer - helps you review what you might want to preserve.
Digital Forensics XML (DFXML) - many tools use, export, and ingest it, but it is specific to certain metadata and file systems. You may have to transpose this data into different formats (human-readable, machine-readable, etc.). Information about the physical computer can be captured by transcribing from the machine itself, as well as from machine self-reporting such as a file system report. The DFXML working group and schema can be leveraged.
Condition Assessment (Compiled by Group 3)
What to check:
Validity as well as usability
Primary cases under discussion during this activity:
Optical media and disk drives
Raw image acquisition: Before and after checksum manually.
Forensic image acquisition: the acquisition process involves three checksums: the checksum stored in the E01 header, the checksum of the raw image inside the E01, and the checksum of the raw contents of the original drive. Checksum validation of a forensic image is often automatic in the acquisition tool. Set the tool to checksum the original drive while it is still connected and compare that to the E01 - not just compare the checksum in the E01 header to the checksum of the raw file inside the E01. Alternatively, the tool can re-verify the raw image in the E01 against the original.
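The three-way comparison can be sketched generically. This is not the real E01 container format - just the comparison pattern: a hash stored at acquisition time, a hash recomputed from the stored payload, and a hash of the still-connected source.

```shell
# Generic sketch of forensic-image checksum validation (not real E01 format).
set -eu
SOURCE=$(mktemp)                       # stand-in for the original drive
head -c 65536 /dev/urandom > "$SOURCE"

# "Acquisition": copy the payload and record its hash in a header file,
# the way an E01 stores a hash of the raw stream it contains.
PAYLOAD=$(mktemp); HEADER=$(mktemp)
cp "$SOURCE" "$PAYLOAD"
sha256sum "$SOURCE" | cut -d' ' -f1 > "$HEADER"

# Check 1: stored header hash vs hash recomputed from the stored payload.
STORED=$(cat "$HEADER")
RECOMPUTED=$(sha256sum "$PAYLOAD" | cut -d' ' -f1)
[ "$STORED" = "$RECOMPUTED" ] && echo "image internally consistent"

# Check 2: stored hash vs the still-connected source drive.
LIVE=$(sha256sum "$SOURCE" | cut -d' ' -f1)
[ "$STORED" = "$LIVE" ] && echo "image matches source"
```

Check 1 alone only proves the container has not changed since acquisition; check 2 is what proves the acquisition captured the drive correctly.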
Optical image type: logical. (Write-once media have no deleted files, etc.) Bin/cue or .iso: bin/cue can capture more than one file system; an .iso cannot. Whether you can use a bin/cue depends on the computer you are reading the image from - a Windows system may not read HFS.
For optical images, use a tool that can recognize all the optical file systems (including HFS, UDF, ISO 9660) - such as Isolyzer.
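Isolyzer works by looking for file-system signatures inside the image. The ISO 9660 check, greatly simplified, looks for the string "CD001" one byte into sector 16 (byte offset 32769, with 2048-byte sectors). A runnable sketch that builds a minimal fake image and checks it:

```shell
# Simplified ISO 9660 signature check (the idea behind tools like Isolyzer).
set -eu
ISO=$(mktemp)
# Build a minimal fake image: zeros through sector 16, then a volume
# descriptor type byte (0x01) followed by the "CD001" identifier.
dd if=/dev/zero of="$ISO" bs=2048 count=17 status=none
printf '\001CD001' | dd of="$ISO" bs=1 seek=32768 conv=notrunc status=none

# The check itself: read 5 bytes at offset 32769 and compare.
MAGIC=$(dd if="$ISO" bs=1 skip=32769 count=5 status=none)
if [ "$MAGIC" = "CD001" ]; then
  echo "ISO9660 signature found"
else
  echo "not ISO9660"
fi
```

A real validator also checks for HFS and UDF signatures at their own offsets, which is why a dedicated tool beats a one-off check.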
Troubleshooting optical image acquisition: try DVDisaster on failing optical media. Also try different drives - they read differently, and correct/hide/repair optical defects differently. Plextor is the standard for CD-ROM playback and can report error rates directly from the drive to see if a sector has problems - it reports, rather than just skips, damaged sectors/errors. Acquire the image more than once and see if the checksums agree - they may not, depending on the condition of the disc and the drive used (the drive's error correction means the checksum is not repeatable). Whole-disc before/after checksums may therefore not be valuable - it is more useful to checksum key files themselves than the whole disc.
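The acquire-twice-and-compare test scripts directly. A sketch, with a file standing in for a hypothetical optical drive such as /dev/sr0; on a healthy source the two sums agree, while a degrading disc (or a drive error-correcting differently between reads) will not be this stable:

```shell
# Sketch: image the same source twice and compare checksums.
set -eu
DISC=$(mktemp)                          # stand-in for /dev/sr0
head -c 262144 /dev/urandom > "$DISC"

PASS1=$(mktemp); PASS2=$(mktemp)
dd if="$DISC" of="$PASS1" bs=2048 status=none   # 2048-byte optical sectors
dd if="$DISC" of="$PASS2" bs=2048 status=none

SUM1=$(sha256sum "$PASS1" | cut -d' ' -f1)
SUM2=$(sha256sum "$PASS2" | cut -d' ' -f1)
if [ "$SUM1" = "$SUM2" ]; then
  echo "stable read: $SUM1"
else
  echo "reads differ - suspect media condition or drive error correction" >&2
fi
```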
Know the source file system and make sure your OS can read it. The label on the disk is key for old floppies, for example. In one example, floppies thought to be damaged were actually MFS - the files needed to go back to System 7 for the file system to be recognized.
Put image onto a new drive and run it in the original system - does it work in context of itself, THEN open that system in a different computer and see if it works - are there dependencies you didn’t think of that become apparent?
Run a clone next to original system.
Verify through mounting that you created an image with a valid file system, that you can view files when it’s mounted, and can extract files from it. If you can see the directory structure, that’s a good sign. See if by mounting it you can recognize the file system - not all file systems open on all computers. BitCurator is able to browse HFS and other file systems. Just because you can’t mount it doesn’t mean it’s not valid. Browse first, as a first check, and try emulation later. See if you can view file contents. This won’t tell you if the system will run.
There may be dependencies in the environment, such as a script that is set to auto-run: if you don't have that exact memory state, it might not relaunch the same way. Document that state from the exhibition copy computer, for instance. In EaaS (Emulation as a Service) you can pause the environment and relaunch it in that exact state.
Do qualitative comparisons after bit-checking and checksumming - does it work in the way the artist intended and the way we documented it upon acquisition?
When to do usability testing:
Better to do now. Don’t wait. Better to know now than when you want to exhibit it. Adds to the acquisition workflow. Especially with vulnerable optical media. Test performability / operability.
We did not touch on, for example, microcontrollers, media players, and removable media; some of those have significant hardware dependencies. The technology exists to emulate a Raspberry Pi - if you have an image of the SD card, you can run it in an emulator. There are limits to the data you can retrieve from some microcontrollers that contain binary files. You can dump the compiled binary hex file from an Arduino with avrdude (the compiler itself is avr-gcc, as used by AVR Studio and the Arduino IDE) and move that hex file to another microcontroller in the same family of devices. Better to get the source code at acquisition.
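Dumping an Arduino's program with avrdude can be sketched as below. The part (`-p atmega328p`), programmer (`-c arduino`), and port (`-P /dev/ttyACM0`) are assumptions that must match the actual board; to stay safe, the script only prints the commands rather than touching a chip.

```shell
# Hedged sketch: read an AVR microcontroller's flash to an Intel hex file
# with avrdude, then write it to a replacement chip in the same family.
# The -p/-c/-P values are placeholders - match them to the real board.
READ_CMD='avrdude -p atmega328p -c arduino -P /dev/ttyACM0 -U flash:r:dump.hex:i'
WRITE_CMD='avrdude -p atmega328p -c arduino -P /dev/ttyACM0 -U flash:w:dump.hex:i'

if command -v avrdude >/dev/null 2>&1; then
  echo "would run: $READ_CMD"      # remove the echo to actually read the chip
else
  echo "avrdude not installed; commands shown for reference:"
  echo "  $READ_CMD"
  echo "  $WRITE_CMD"
fi
```

Note that a flash dump recovers only the compiled binary; if the chip's lock bits are set, even that may be unreadable - which is the argument for acquiring source code up front.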
Uses and Access (Compiled by Group 4)
Uses for disk images besides preservation:
Exhibition in the gallery (EaaS, MAME, emulators)
Online exhibition (EaaS)
Internal access/appraisal/curator access (through emulation, web-based is easiest such as MoMA/SFMOMA Susan Kare example)
Loans, packaging, etc. - for sending complex software-based packages around, as long as the target machine is well-known. Great for exhibition replacements and emergencies: for BrightSign and other media player/microcontroller SD cards, you can just make a new one by putting the clone on a new card - you don't need to fully structure or understand it, just deploy the package. Easy and fast.
Archiving a server environment - Rackspace (a cloud computing vendor) allows users to create a disk image of a virtual server environment. A server image was created for a web-based artwork, to capture the specific state and configuration of the virtual server. FTK can read this type of image (a VHD file, Virtual Hard Disk) - other formats may be available depending on virtualization options
Documentation of processes (when an artwork is changed or re-created from scratch, to test whether the documentation works - non-invasive, and you can go back to prior versions). VMWare and QEMU (managing QCOW files yourself) support this; EaaS provides it too, much like VMWare. You can also generate exhibition copies from these snapshot trees without affecting the original, although storage is a concern
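QCOW overlay files are the mechanism that keeps the original image untouched while copies diverge: the overlay records only changed blocks on top of a read-only base. A sketch, assuming qemu-img is available (the script skips gracefully when it is not):

```shell
# Sketch: create a qcow2 overlay so an exhibition copy never writes to the
# preserved original. Skips cleanly when qemu-img is not installed.
set -eu
if ! command -v qemu-img >/dev/null 2>&1; then
  echo "qemu-img not installed; skipping"
  exit 0
fi
WORK=$(mktemp -d)
# Base image stands in for the preserved original disk image.
qemu-img create -f qcow2 "$WORK/original.qcow2" 10M >/dev/null
# Overlay records only the changes; the base stays effectively read-only.
qemu-img create -f qcow2 -b "$WORK/original.qcow2" -F qcow2 \
  "$WORK/exhibition-copy.qcow2" >/dev/null
qemu-img info "$WORK/exhibition-copy.qcow2" | grep -q "original.qcow2" \
  && echo "overlay created on top of original"
```

Booting the emulator against the overlay rather than the base is what makes the "snapshot tree" safe: discard the overlay and the original is exactly as acquired.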
Virtual Toolkit - Perform file format conversions or other types of file manipulation in legacy systems (Final Cut to Premiere, or move from an old codec to a newer one, migrate a file, such as Photoshop, from an older to a newer version)
Output from legacy systems - EaaS to print and make PDFs out of emulated systems
Take images of legacy systems/software (software you don’t want to lose, gallery/archival display, legacy uses)