diff --git a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java deleted file mode 100644 index 645e905f7..000000000 --- a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java +++ /dev/null @@ -1,178 +0,0 @@ -/* $This file is distributed under the terms of the license in /doc/license.txt$ */ - -/** - *
- * The code in this package implements the Vitro file-storage system. - *
- * - *- * The system incorporates a number of ideas from the PairTree specification, - *
- * A typical structure would look like this: - *
- * + basedir - * | - * +--+ file_storage_namespaces.properties - * | - * +--+ file_storage_root - *- * The
file_storage_root
directory contains the subdirectories
- * that implement the encoded IDs, and the final directory for each ID will
- * contain a single file that corresponds to that ID.
- *
- *
- * - * To reduce the length of the file paths, the system can be initialized - * to recognize certain sets of characters (loosely termed "namespaces") and - * to replace them with a given prefix and separator character during ID - * encoding. - *
- *- * For example, the system might be initialized with a "namespace" of - * "http://vivo.mydomain.edu/file/". If that is the only namespace, it will - * be internally assigned a prefix of "a", so a URI like this: - *
http://vivo.mydomain.edu/file/n3424/myPhoto.jpg- * would be converted to this: - *
a~n3424/myPhoto.jpg- * - *
- * The namespaces and their assigned prefixes are stored in a properties file - * when the structure is initialized. When the structure is re-opened, the - * file is read to find the correct prefixes. The file - * might look like this: - *
- * a = http://the.first.namespace/ - * b = http://the.second.namespace/ - *- * - * - *
- * This is a multi-step process: - *
" * + , < = > ? ^ | \ ~- * The hexadecimal encoding consists of a caret followed by 2 hex digits, - * e.g.: ^7C - *
ark:/13030/xt12t3
becomes
- * ark/+=1/303/0=x/t12/t3
- * http://n2t.info/urn:nbn:se:kb:repos-1
becomes
- * htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1
- * what-the-*@?#!^!~?
becomes
- * wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f
- * http://vivo.myDomain.edu/file/n3424
with namespace
- * http://vivo.myDomain.edu/file/
and prefix
- * a
becomes
- * a~n/342/4
- *
- *
- * - * The name of the file is encoded as needed to guard against illegal - * characters for the filesystem, but in practice we expect little encoding - * to be required, since few files are named with the special characters. - *
- * - *- * The encoding process is the same as the "rare character encoding" and - * "common character encoding" steps used for ID encoding, except that - * periods are not encoded. - *
- */ - -package edu.cornell.mannlib.vitro.webapp.filestorage.backend; diff --git a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html new file mode 100644 index 000000000..ffa218f3c --- /dev/null +++ b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html @@ -0,0 +1,282 @@ + + + ++The code in this package implements the Vitro file-storage system. +
+ ++The system incorporates a number of ideas from the PairTree specification, +
+ A typical structure would look like this: +
+ + basedir + | + +--+ file_storage_namespaces.properties + | + +--+ file_storage_root ++ The
file_storage_root
directory contains the subdirectories
+ that implement the encoded IDs, and the final directory for each ID will
+ contain a single file that corresponds to that ID.
+
+
++ To reduce the length of the file paths, the system can be initialized + to recognize certain sets of characters (loosely termed "namespaces") and + to replace them with a given prefix and separator character during ID + encoding. +
++ For example, the system might be initialized with a "namespace" of + "http://vivo.mydomain.edu/file/". If that is the only namespace, it will + be internally assigned a prefix of "a", so a URI like this: +
http://vivo.mydomain.edu/file/n3424/myPhoto.jpg+ would be converted to this: +
a~n3424/myPhoto.jpg+ +
+ The namespaces and their assigned prefixes are stored in a properties file + when the structure is initialized. When the structure is re-opened, the + file is read to find the correct prefixes. The file + might look like this: +
+ a = http://the.first.namespace/ + b = http://the.second.namespace/ ++ + +
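As a rough illustration of the namespace step described above, the following sketch reads a properties text in the format shown and swaps a matching namespace for its prefix and a "~" separator. The class and method names (`NamespacePrefixSketch`, `parseNamespaces`, `applyPrefix`) are hypothetical and are not taken from the Vitro code; the real implementation may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch; not the actual Vitro implementation.
class NamespacePrefixSketch {

    /** Parse lines such as "a = http://the.first.namespace/" into a prefix-to-namespace map. */
    static Map<String, String> parseNamespaces(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> namespaces = new TreeMap<>();
        for (String prefix : props.stringPropertyNames()) {
            namespaces.put(prefix, props.getProperty(prefix).trim());
        }
        return namespaces;
    }

    /** Replace a leading namespace with its prefix and "~"; return the ID unchanged if none matches. */
    static String applyPrefix(String id, Map<String, String> namespaces) {
        for (Map.Entry<String, String> entry : namespaces.entrySet()) {
            String namespace = entry.getValue();
            if (id.startsWith(namespace)) {
                return entry.getKey() + "~" + id.substring(namespace.length());
            }
        }
        return id;
    }
}
```

With a single entry `a = http://vivo.mydomain.edu/file/`, calling `applyPrefix` on `http://vivo.mydomain.edu/file/n3424/myPhoto.jpg` yields `a~n3424/myPhoto.jpg`, matching the example above.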
+ This is a multi-step process: +
" * + , < = > ? ^ | \ ~+ The hexadecimal encoding consists of a caret followed by 2 hex digits, + e.g.: ^7C +
ark:/13030/xt12t3
becomes
+ ark/+=1/303/0=x/t12/t3
+ http://n2t.info/urn:nbn:se:kb:repos-1
becomes
+ htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1
+ what-the-*@?#!^!~?
becomes
+ wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f
+ http://vivo.myDomain.edu/file/n3424
with namespace
+ http://vivo.myDomain.edu/file/
and prefix
+ a
becomes
+ a~n/342/4
+
+
++ The name of the file is encoded as needed to guard against illegal + characters for the filesystem, but in practice we expect little encoding + to be required, since few files are named with the special characters. +
+ ++ The encoding process is the same as the "rare character encoding" and + "common character encoding" steps used for ID encoding, except that + periods are not encoded. +
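The encoding steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in this package: the common-character mapping (`/` to `=`, `:` to `+`, `.` to `,`) is inferred from the worked examples and the PairTree convention, lower-case hex digits are used because the worked examples show them, and only the rare characters listed above are hex-encoded. All names (`IdEncodingSketch`, `encodeName`, `splitIntoGroups`, `idToPath`) are hypothetical.

```java
// Hypothetical sketch; not the actual Vitro implementation.
class IdEncodingSketch {

    // The rare characters listed above: " * + , < = > ? ^ | \ ~
    private static final String RARE = "\"*+,<=>?^|\\~";

    /** Hex-encode rare characters, then substitute common ones. Periods are optional (filenames keep them). */
    static String encodeName(String name, boolean encodePeriods) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (RARE.indexOf(c) >= 0) {
                out.append('^').append(String.format("%02x", (int) c));
            } else if (c == '/') {
                out.append('=');
            } else if (c == ':') {
                out.append('+');
            } else if (c == '.' && encodePeriods) {
                out.append(',');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Break the encoded string into groups of at most 3 characters, joined by "/". */
    static String splitIntoGroups(String encoded) {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < encoded.length(); i += 3) {
            if (i > 0) {
                path.append('/');
            }
            path.append(encoded, i, Math.min(i + 3, encoded.length()));
        }
        return path.toString();
    }

    /** Encode an ID; if it starts with the namespace, encode only the remainder and prepend "prefix~". */
    static String idToPath(String id, String namespace, char prefix) {
        if (namespace != null && id.startsWith(namespace)) {
            String local = encodeName(id.substring(namespace.length()), true);
            return splitIntoGroups(prefix + "~" + local);
        }
        return splitIntoGroups(encodeName(id, true));
    }
}
```

Note that the sketch adds the prefix and "~" after encoding the remainder of the ID, so the separator itself is not hex-encoded. Passing `false` for `encodePeriods` gives the filename variant, so `lily1.jpg` stays `lily1.jpg`.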
+ ++ The uploaded image files are identified by a combination of URI and filename. + The URI is used as the principal identifier so we don't need to worry about + collisions if two people each upload an image named "image.jpg". + + The filename is retained so the user can use their browser to download their + image from the system and it will be named as they expect it to be. +
+ ++ We wanted a way to store thousands of image files so they would not + all be in the same directory. We took our inspiration from the + PairTree + folks, and modified their algorithm to suit our needs. + + The general idea is to store files in a multi-layer directory structure + based on the URI assigned to the file. +
+ ++ Let's consider a file with this information: +
+ URI = http://vivo.mydomain.edu/individual/n3156 + Filename = lily1.jpg ++ + +
+ We want to turn the URI into the directory path, but the URI contains + prohibited characters. Using a PairTree-like character substitution, + we might store it at this path: +
+ /usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg ++ + +
+ Using that scheme would mean that each file sits in its own directory + under the storage root. At a large institution, there might be hundreds of + thousands of directories under that root. +
++ By breaking this into PairTree-like groupings, we ensure that all files + don't go into the same directory. + + Limiting to 3-character names will ensure a maximum of about 30,000 files + per directory. In practice, the number will be considerably smaller. + + So then it would look like this: +
+ /usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg ++ + +
+ But almost all of our URIs will start with the same namespace, so the + namespace just adds unnecessary and unhelpful depth to the directory tree. + We assign a single-character prefix to that namespace, using the + file_storage_namespaces.properties file in the uploads directory, like this: +
+ a = http://vivo.mydomain.edu/individual/ ++ + And our URI now looks like this: +
+ a~n3156 ++ + Which translates to: +
+ /usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg ++ + +
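On the filesystem side, the translation from a prefixed ID to a file location could look roughly like this. It is a sketch only: `StoragePathSketch` and `fileLocation` are hypothetical names, and the sketch assumes the ID has already been shortened and encoded (for example `a~n3156`).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the actual Vitro implementation.
class StoragePathSketch {

    /** Nest one directory per 3-character group of the encoded ID, then append the filename. */
    static Path fileLocation(Path storageRoot, String encodedId, String filename) {
        Path dir = storageRoot;
        for (int i = 0; i < encodedId.length(); i += 3) {
            int end = Math.min(i + 3, encodedId.length());
            dir = dir.resolve(encodedId.substring(i, end));
        }
        return dir.resolve(filename);
    }
}
```

For the storage root `/usr/local/vivo/uploads/file_storage_root`, the ID `a~n3156` and the filename `lily1.jpg`, this produces the path shown above.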
+ So what we hope we have implemented is a system where: +
+ By the way, almost all of this is implemented in +
+ edu.cornell.mannlib.vitro.webapp.filestorage.backend.FileStorageHelper ++ and illustrated in +
+ edu.cornell.mannlib.vitro.webapp.filestorage.backend.FileStorageHelperTest ++