diff --git a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java deleted file mode 100644 index 645e905f7..000000000 --- a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package-info.java +++ /dev/null @@ -1,178 +0,0 @@ -/* $This file is distributed under the terms of the license in /doc/license.txt$ */ - -/** - *

- * The code in this package implements the Vitro file-storage system. - *

- * - *

Relationship to PairTree

- * - *

- * The system incorporates a number of ideas from the PairTree specification, - *

- * but is different in several respects: - * - *

- * - *

Directory structure

- * - *

- * A typical structure would look like this: - *

- *   + basedir
- *   |
- *   +--+ file_storage_namespaces.properties
- *   |
- *   +--+ file_storage_root
- *   
- * The file_storage_root directory contains the subdirectories - * that implement the encoded IDs, and the final directory for each ID will - * contain a single file that corresponds to that ID. - *

- * - *

Namespaces

- * - *

- * To reduce the length of the file paths, the system will can be initialized - * to recognize certain sets of characters (loosely termed "namespaces") and - * to replace them with a given prefix and separator character during ID - * encoding. - *

- *

- * For example, the sytem might be initialized with a "namespace" of - * "http://vivo.mydomain.edu/file/". If that is the only namespace, it will - * be internally assigned a prefix of "a", so a URI like this: - *

http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
- * would be converted to this: - *
a~n3424/myPhoto.jpg
- *

- *

- * The namespaces and their assigned prefixes are stored in a properties file - * when the structure is initialized. When the structure is re-opened, the - * file is read to find the correct prefixes. The file - * might look like this: - *

- *     a = http://the.first.namespace/
- *     b = http://the.second.namespace/
- *   
- *

- * - *

ID encoding

- * - *

- * This is a multi-step process: - *

- * Examples: - *
ark:/13030/xt12t3 becomes - * ark/+=1/303/0=x/t12/t3 - *
http://n2t.info/urn:nbn:se:kb:repos-1 becomes - * htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1 - *
what-the-*@?#!^!~? becomes - * wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f - *
http://vivo.myDomain.edu/file/n3424 with namespace - * http://vivo.myDomain.edu/file/ and prefix - * a becomes - * a~n/342/4 - *

- * - *

Filename encoding

- * - *

- * The name of the file is encoded as needed to guard against illegal - * characters for the filesystem, but in practice we expect little encoding - * to be required, since few files are named with the special characters. - *

- * - *

- * The encoding process is the same as the "rare character encoding" and - * "common character encoding" steps used for ID encoding, except that - * periods are not encoded. - *

- */ - -package edu.cornell.mannlib.vitro.webapp.filestorage.backend; diff --git a/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html new file mode 100644 index 000000000..ffa218f3c --- /dev/null +++ b/webapp/src/edu/cornell/mannlib/vitro/webapp/filestorage/backend/package.html @@ -0,0 +1,282 @@ + + + +

+The code in this package implements the Vitro file-storage system. +

+ +

Relationship to PairTree

+ +

+The system incorporates a number of ideas from the PairTree specification, +

+but is different in several respects: + +

+ +

Directory structure

+ +

+ A typical structure would look like this: +

+  + basedir
+  |
+  +--+ file_storage_namespaces.properties
+  |
+  +--+ file_storage_root
+  
+ The file_storage_root directory contains the subdirectories + that implement the encoded IDs, and the final directory for each ID will + contain a single file that corresponds to that ID. +

+ +

Namespaces

+ +

+ To reduce the length of the file paths, the system can be initialized + to recognize certain sets of characters (loosely termed "namespaces") and + to replace them with a given prefix and separator character during ID + encoding. +

+

+ For example, the system might be initialized with a "namespace" of + "http://vivo.mydomain.edu/file/". If that is the only namespace, it will + be internally assigned a prefix of "a", so a URI like this: +

http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
+ would be converted to this: +
a~n3424/myPhoto.jpg
+

+

+ The namespaces and their assigned prefixes are stored in a properties file + when the structure is initialized. When the structure is re-opened, the + file is read to find the correct prefixes. The file + might look like this: +

+    a = http://the.first.namespace/
+    b = http://the.second.namespace/
+  
+
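The prefix substitution described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this package: the class NamespaceFileSketch and its methods parse and shorten are hypothetical names, and the sketch assumes the file uses the plain "prefix = namespace" layout shown above and that "~" is the separator, as in the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the Vitro implementation.
final class NamespaceFileSketch {

    /** Parses "prefix = namespace" lines into a sorted prefix-to-namespace map. */
    static Map<String, String> parse(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> prefixes = new TreeMap<>();
        for (String prefix : props.stringPropertyNames()) {
            prefixes.put(prefix, props.getProperty(prefix));
        }
        return prefixes;
    }

    /** Replaces a leading known namespace with its prefix and the '~' separator. */
    static String shorten(String uri, Map<String, String> prefixes) {
        for (Map.Entry<String, String> entry : prefixes.entrySet()) {
            String namespace = entry.getValue();
            if (uri.startsWith(namespace)) {
                return entry.getKey() + "~" + uri.substring(namespace.length());
            }
        }
        return uri; // no namespace matched; leave the URI unchanged
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = parse("a = http://vivo.mydomain.edu/file/\n");
        System.out.println(
                shorten("http://vivo.mydomain.edu/file/n3424/myPhoto.jpg", prefixes));
    }
}
```

A URI that matches no stored namespace is returned unchanged in this sketch; how the real system treats that case is not stated here.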

+ +

ID encoding

+ +

+ This is a multi-step process: +

+ Examples: +
ark:/13030/xt12t3 becomes + ark/+=1/303/0=x/t12/t3 +
http://n2t.info/urn:nbn:se:kb:repos-1 becomes + htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1 +
what-the-*@?#!^!~? becomes + wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f +
http://vivo.myDomain.edu/file/n3424 with namespace + http://vivo.myDomain.edu/file/ and prefix + a becomes + a~n/342/4 +

+ +

Filename encoding

+ +

+ The name of the file is encoded as needed to guard against illegal + characters for the filesystem, but in practice we expect little encoding + to be required, since few files are named with the special characters. +

+ +

+ The encoding process is the same as the "rare character encoding" and + "common character encoding" steps used for ID encoding, except that + periods are not encoded. +

+ +
+
+
+ +

+ This was summarized in a post to the vivo-dev-all list on 11/29/2010 +

+ +

+ The uploaded image files are identified by a combination of URI and filename. + The URI is used as the principal identifier so we don't need to worry about + collisions if two people each upload an image named "image.jpg". + + The filename is retained so the user can use their browser to download their + image from the system and it will be named as they expect it to be. +

+ +

+ We wanted a way to store thousands of image files so they would not + all be in the same directory. We took our inspiration from the + PairTree + folks, and modified their algorithm to suit our needs. + + The general idea is to store files in a multi-layer directory structure + based on the URI assigned to the file. +

+ +

+ Let's consider a file with this information: +

+		URI = http://vivo.mydomain.edu/individual/n3156
+		Filename = lily1.jpg
+	
+

+ +

+ We want to turn the URI into the directory path, but the URI contains + prohibited characters. Using a PairTree-like character substitution, + we might store it at this path: +

+		/usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
+	
+

+ +

+ Using that scheme would mean that each file sits in its own directory + under the storage root. At a large institution, there might be hundreds of + thousands of directories under that root. +

+ +

+ By breaking this into PairTree-like groupings, we ensure that the files + don't all go into the same directory. + + Limiting to 3-character names will ensure a maximum of about 30,000 files + per directory. In practice, the number will be considerably smaller. + + So then it would look like this: +

+		/usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
+	
+

+ +

+ But almost all of our URIs will start with the same namespace, so the + namespace just adds unnecessary and unhelpful depth to the directory tree. + We assign a single-character prefix to that namespace, using the + file_storage_namespaces.properties file in the uploads directory, like this: +

+		a = http://vivo.mydomain.edu/individual/
+	
+ + And our URI now looks like this: +
+		a~n3156
+	
+ + Which translates to: +
+		/usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
+	
+
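The grouping step in the paths above can be sketched as below. This is a minimal sketch that assumes the ID has already been through character substitution and namespace shortening; PathGroupingSketch and toPath are hypothetical names, and the real logic lives in FileStorageHelper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the Vitro implementation.
final class PathGroupingSketch {

    /** Splits an already-encoded ID into 3-character directory names under root. */
    static String toPath(String root, String encodedId, String filename) {
        List<String> parts = new ArrayList<>();
        parts.add(root);
        for (int i = 0; i < encodedId.length(); i += 3) {
            int end = Math.min(i + 3, encodedId.length());
            parts.add(encodedId.substring(i, end)); // last group may be shorter
        }
        parts.add(filename);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String root = "/usr/local/vivo/uploads/file_storage_root";
        System.out.println(toPath(root, "a~n3156", "lily1.jpg"));
    }
}
```

Called with the encoded ID a~n3156 and the filename lily1.jpg, the sketch yields the same path as the last example above. The path is joined with "/" rather than a platform separator to keep the output identical to the examples.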

+ +

+ So what we hope we have implemented is a system where: +

+

+ +

+ By the way, almost all of this is implemented in +

+		edu.cornell.mannlib.vitro.webapp.filestorage.backend.FileStorageHelper
+	
+ and illustrated in +
+		edu.cornell.mannlib.vitro.webapp.filestorage.backend.FileStorageHelperTest
+	
+