Friday, September 11, 2015

Testing a Permanent Digital Storage Archive – Part 2: OAS and Rosetta

Testing a Permanent Digital Storage Archive – Part 2: OAS and Rosetta. Chris Erickson. September 10, 2015.
The Optical Archive System (OAS) from Hitachi LG Data Storage fits in a server rack and can contain 10 units, called libraries. Each library unit holds 100 TB of data storage on 500 long-term optical discs. More information is available in the Rosetta Users Group 2015 presentation New Sources and Storage Options For Rosetta (slides 13–16) or in this YouTube video.
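The capacity figures above can be combined with some simple arithmetic. The Python sketch below is only a worked calculation from the numbers quoted (10 libraries per rack, 100 TB and 500 discs per library). It assumes decimal units and an even spread of capacity across discs, so the per-disc figure is an implied average, not a vendor specification.

```python
# Figures quoted above for the Hitachi LG Optical Archive System (OAS).
LIBRARIES_PER_RACK = 10   # library units that fit in one server rack
TB_PER_LIBRARY = 100      # data capacity of one library unit, in TB
DISCS_PER_LIBRARY = 500   # long-term optical discs in one library unit


def rack_capacity_tb(libraries: int = LIBRARIES_PER_RACK) -> int:
    """Total capacity, in TB, of a rack holding `libraries` library units."""
    return libraries * TB_PER_LIBRARY


def disc_capacity_gb() -> float:
    """Implied average capacity per disc, in GB (decimal: 1 TB = 1,000 GB)."""
    return TB_PER_LIBRARY * 1000 / DISCS_PER_LIBRARY


if __name__ == "__main__":
    print(f"Full rack: {rack_capacity_tb()} TB")              # 1000 TB, about 1 PB
    print(f"Average per disc: {disc_capacity_gb():.0f} GB")   # 200 GB
```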

Connecting the OAS and Rosetta Systems:
Once the optical archive was installed in our Library, connecting it to our Rosetta system was very easy and took only a couple of minutes. In the Rosetta administrative module I created a new File storage group with the OAS path and the storage capacity. The IE and Metadata storage groups were left as they were, directed to our library server, because the files in those groups are much smaller and accessed more often than the files in the File storage group. I then added a new storage rule so Rosetta could determine whether to write each file to our library server, to our Amazon storage account, or to the OAS.
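To make the routing idea concrete, here is a minimal Python sketch of how ordered storage rules might map objects to destinations. It is purely illustrative: Rosetta's storage rules are configured in its administrative module, not written in code, and the names `StoredObject` and `choose_storage`, the producer values `oas_test` and `cloud_backup`, and the first-match-wins evaluation order are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the three destinations described above.
LIBRARY_SERVER = "library_server"
AMAZON = "amazon"
OAS = "oas"


@dataclass
class StoredObject:
    kind: str      # "ie", "metadata", or "file"
    producer: str  # hypothetical attribute a storage rule could match on


def choose_storage(obj: StoredObject) -> str:
    """Return a destination for one object (illustrative rules only)."""
    # IE and Metadata storage groups stay on the library server.
    if obj.kind in ("ie", "metadata"):
        return LIBRARY_SERVER
    # File storage group: check hypothetical rules in order; first match wins.
    if obj.producer == "oas_test":
        return OAS
    if obj.producer == "cloud_backup":
        return AMAZON
    return LIBRARY_SERVER  # default when no rule matches


if __name__ == "__main__":
    for obj in (StoredObject("metadata", "oas_test"),
                StoredObject("file", "oas_test"),
                StoredObject("file", "cloud_backup")):
        print(obj.kind, obj.producer, "->", choose_storage(obj))
```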

Write functionality:
When data is written to the optical discs, a fixity check is done to ensure that each file is 100% accurate. Once a file is written to an optical disc, the data is permanent: even if the system were to go away, the discs could still be read on any Blu-ray device. I ingested a couple hundred GB into Rosetta, which was then written out to the OAS discs. (Overall, I added over 4 TB of data.) We never encountered any difficulties writing data to the OAS. We did try to disrupt or corrupt the writing process to see whether we could make it fail or write bad data, but even our systems engineer, with root access, was unable to affect the data in any way.
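A fixity check compares a checksum computed from the stored copy with one computed from the original. The post does not say which algorithm the OAS uses internally, so the Python sketch below simply assumes SHA-256 from the standard `hashlib` module; `file_digest` and `fixity_ok` are illustrative names, not part of Rosetta or the OAS.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large master files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fixity_ok(original: Path, written_copy: Path, algorithm: str = "sha256") -> bool:
    """True if the copy read back from storage has the same digest as the original."""
    return file_digest(original, algorithm) == file_digest(written_copy, algorithm)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "master.tif")
        dst = Path(tmp, "copy.tif")
        src.write_bytes(b"example master file")
        dst.write_bytes(src.read_bytes())        # simulate a faithful write
        print(fixity_ok(src, dst))               # True
        dst.write_bytes(b"example master fil3")  # simulate a corrupted copy
        print(fixity_ok(src, dst))               # False
```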

Our test Rosetta system is normally configured for only a small number of files, so it has limited processing space, about 45 GB. (Our live production Rosetta system has 2 TB of processing space.) Because of this limit, I could not run an unrestricted ingest on the test server without filling up the disk. Instead, I ingested a limited number of items at a time and cleared the processing space before ingesting more. The table below shows the ingest amounts for two of the afternoons when the ingest processes were run; each run took about 5 hours. (An unrestricted ingest would likely result in at least four times as many items per day.)
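Ingesting in limited batches and clearing space between them amounts to packing items, in order, into batches under a size budget. A minimal Python sketch, assuming each item needs roughly its own size in processing space (in practice an ingest may need extra working space) and using the 45 GB figure as the default budget; `plan_batches` is a hypothetical helper, not a Rosetta feature.

```python
from typing import Iterable, List


def plan_batches(sizes_gb: Iterable[float], budget_gb: float = 45.0) -> List[List[float]]:
    """Split items, in order, into batches whose total size fits within budget_gb.

    Each batch would be ingested, then the processing space cleared, before the next.
    """
    batches: List[List[float]] = []
    current: List[float] = []
    used = 0.0
    for size in sizes_gb:
        if size > budget_gb:
            raise ValueError(f"item of {size} GB exceeds the {budget_gb} GB budget")
        if used + size > budget_gb:  # next item would overflow: close this batch
            batches.append(current)
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(plan_batches([20, 20, 10, 30, 5], budget_gb=45))  # [[20, 20], [10, 30, 5]]
```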


Afternoon   IEs     Files    GB
1           8,019   50,856   352.53
2           6,639   48,736   346.42
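The rates implied by the ingest figures can be worked out directly. This Python sketch uses only those figures and the approximate 5-hour run time, so the results are rough estimates rather than measured throughput.

```python
# Figures from the two ingest afternoons (IEs, files, GB).
RUNS = [
    {"ies": 8019, "files": 50856, "gb": 352.53},
    {"ies": 6639, "files": 48736, "gb": 346.42},
]
HOURS_PER_RUN = 5.0  # "each took about 5 hours" (approximate)


def throughput(run: dict, hours: float = HOURS_PER_RUN) -> dict:
    """Rough per-hour rates and the average number of files per IE for one run."""
    return {
        "gb_per_hour": run["gb"] / hours,
        "files_per_hour": run["files"] / hours,
        "files_per_ie": run["files"] / run["ies"],
    }


if __name__ == "__main__":
    for i, run in enumerate(RUNS, start=1):
        t = throughput(run)
        print(f"Afternoon {i}: {t['gb_per_hour']:.1f} GB/h, "
              f"{t['files_per_hour']:.0f} files/h, {t['files_per_ie']:.2f} files per IE")
```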
 
Read functionality:
This is an optical device, so I did not know whether Rosetta would be able to read the discs. And because it is an optical device, the OAS has to locate the correct disc and load it into a drive to retrieve the data (there are 12 read/write drives in each library). The retrieval process can take up to 90 seconds. Our Rosetta system is used as a dark archive, so the retrieval time was not a problem; the question was whether Rosetta would wait while the file was being retrieved or would time out. From the first request, the OAS read functionality worked flawlessly: Rosetta waited through the retrieval time while the disc was loaded and the file read. Once a disc was in a drive, access to any other files on that disc was about as fast as if they were on spinning disk.
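The post does not describe how the OAS decides which discs stay loaded in its 12 drives. As a toy model only, assuming least-recently-used (LRU) eviction, the Python sketch below counts how many requests in a sequence would require loading a disc (the slow path, up to about 90 seconds) versus finding it already in a drive. `simulate_reads` and the disc IDs are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def simulate_reads(disc_ids: Iterable[str], drives: int = 12) -> Tuple[int, int]:
    """Count (loads, hits) for a sequence of disc requests.

    Toy model: the drives act as an LRU cache of loaded discs (an assumption).
    """
    loaded: "OrderedDict[str, None]" = OrderedDict()
    loads = hits = 0
    for disc in disc_ids:
        if disc in loaded:
            hits += 1                       # disc already in a drive: fast read
            loaded.move_to_end(disc)        # mark as most recently used
        else:
            loads += 1                      # disc must be fetched and loaded
            if len(loaded) >= drives:
                loaded.popitem(last=False)  # unload the least recently used disc
            loaded[disc] = None
    return loads, hits


if __name__ == "__main__":
    requests = ["disc-A", "disc-A", "disc-B", "disc-A", "disc-B"]
    print(simulate_reads(requests))            # (2, 3)
    print(simulate_reads(requests, drives=1))  # (4, 1)
```

Repeated requests to the same disc count as hits, which mirrors the observation that later files on an already-loaded disc come back quickly.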

Here is a table of access times for one of the groups that I checked:


Title                                     Files in item   Item size (MB)   Access time (min:sec)
List of titles of genealogical articles   9               169              1:16
Jackson collection image                  2               16               1:20
Jackson collection image                  2               8                0:23
John O. Bird children                     1               6                1:25
Cardston Alberta Temple                   1               5                0:18
Piano                                     1               11               1:34
F Edwards                                 1               2                0:14
E O Haymond                               1               4                0:09
Taj Mahal,                                2               14               0:27
Taj Mahal,                                2               14               0:09
Millie Gallup                             1               5                0:14
History of the Lemen family               9               528              0:17
The Boynton family.                       9               405              0:11
Register and almanac                      9               537              0:25
The crawfishes of the state               9               214              0:14
Tank                                      1               6                0:20
Parley D. Thomas                          1               4                0:19
Blake family : a genealogical history     9               146              0:10
 
The access time column makes it clear when a new disc had to be loaded, because those times are over 60 seconds. Once a disc has been loaded, the access times for subsequent files on it are much lower. These access times are for the master files, which can be quite large.
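The 60-second rule of thumb can be applied mechanically to the measurements. This Python sketch copies the access-time rows and splits them into likely disc loads (over 60 seconds) and reads from an already-loaded disc; the threshold comes from the text, and the helper names are illustrative.

```python
# (title, files in item, item size in MB, access time "m:ss"), copied from the access-time table
ACCESS_LOG = [
    ("List of titles of genealogical articles", 9, 169, "1:16"),
    ("Jackson collection image", 2, 16, "1:20"),
    ("Jackson collection image", 2, 8, "0:23"),
    ("John O. Bird children", 1, 6, "1:25"),
    ("Cardston Alberta Temple", 1, 5, "0:18"),
    ("Piano", 1, 11, "1:34"),
    ("F Edwards", 1, 2, "0:14"),
    ("E O Haymond", 1, 4, "0:09"),
    ("Taj Mahal,", 2, 14, "0:27"),
    ("Taj Mahal,", 2, 14, "0:09"),
    ("Millie Gallup", 1, 5, "0:14"),
    ("History of the Lemen family", 9, 528, "0:17"),
    ("The Boynton family.", 9, 405, "0:11"),
    ("Register and almanac", 9, 537, "0:25"),
    ("The crawfishes of the state", 9, 214, "0:14"),
    ("Tank", 1, 6, "0:20"),
    ("Parley D. Thomas", 1, 4, "0:19"),
    ("Blake family : a genealogical history", 9, 146, "0:10"),
]
DISC_LOAD_THRESHOLD_S = 60  # from the text: new disc loads take over 60 seconds


def to_seconds(mmss: str) -> int:
    """Convert an "m:ss" string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def split_by_disc_load(log, threshold_s: int = DISC_LOAD_THRESHOLD_S):
    """Separate likely disc loads (slow) from reads of an already-loaded disc."""
    loads = [row for row in log if to_seconds(row[3]) > threshold_s]
    cached = [row for row in log if to_seconds(row[3]) <= threshold_s]
    return loads, cached


if __name__ == "__main__":
    loads, cached = split_by_disc_load(ACCESS_LOG)
    for label, rows in (("disc load", loads), ("already loaded", cached)):
        mean_s = sum(to_seconds(r[3]) for r in rows) / len(rows)
        print(f"{label}: {len(rows)} items, mean {mean_s:.1f} s")
```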

The setup process, writing and reading all went extremely well. The next step was to run an automated fixity check on the OAS files from within Rosetta.
(Updated to clarify and answer questions.)
