From abac59cd184e4151cc0b28095ba0f86aae92ede3 Mon Sep 17 00:00:00 2001
From: Graham Triggs
+ * The code in this package implements the Vitro file-storage system.
+ *
+ * The system incorporates a number of ideas from the PairTree specification,
+ * Relationship to PairTree
+ *
+ *
+ * but is different in several respects:
+ *
+ *
+ *
+ * A typical structure would look like this:
+ *
+ *   + basedir
+ *   |
+ *   +--+ file_storage_namespaces.properties
+ *   |
+ *   +--+ file_storage_root
+ *
+ * The file_storage_root directory contains the subdirectories
+ * that implement the encoded IDs, and the final directory for each ID will
+ * contain a single file that corresponds to that ID.
+ *
+ *
+ * To reduce the length of the file paths, the system can be initialized
+ * to recognize certain sets of characters (loosely termed "namespaces") and
+ * to replace them with a given prefix and separator character during ID
+ * encoding.
+ *
+ * For example, the system might be initialized with a "namespace" of
+ * "http://vivo.mydomain.edu/file/". If that is the only namespace, it will
+ * be internally assigned a prefix of "a", so a URI like this:
+ *     http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
+ * would be converted to this:
+ *     a~n3424/myPhoto.jpg
+ *
+ * The namespaces and their assigned prefixes are stored in a properties file
+ * when the structure is initialized. When the structure is re-opened, the
+ * file is read to find the correct prefixes. The file
+ * might look like this:
+ *
+ *   a = http://the.first.namespace/
+ *   b = http://the.second.namespace/
+ *
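As a rough illustration of the re-open step described above, the prefix file can be read with the standard `java.util.Properties` parser. The class name `NamespaceFileSketch` and the method `parse` are hypothetical names chosen for this sketch; the actual Vitro code may read the file differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: not the actual Vitro implementation. */
final class NamespaceFileSketch {
    /** Parse "prefix = namespace" lines into a sorted prefix-to-namespace map. */
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            // Properties.load treats the first '=' (or ':') after the key as the separator,
            // so the colons and slashes inside the namespace URI stay in the value.
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> prefixToNamespace = new TreeMap<>();
        for (String prefix : props.stringPropertyNames()) {
            prefixToNamespace.put(prefix, props.getProperty(prefix));
        }
        return prefixToNamespace;
    }
}
```

Calling `NamespaceFileSketch.parse` on the two-line example file yields a map with keys `a` and `b`, each bound to its namespace URI.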
+ * This is a multi-step process:
+ *
+ *   " * + , < = > ? ^ | \ ~
+ * The hexadecimal encoding consists of a caret followed by 2 hex digits,
+ * e.g.: ^7C
+ *
+ * ark:/13030/xt12t3 becomes
+ *     ark/+=1/303/0=x/t12/t3
+ * http://n2t.info/urn:nbn:se:kb:repos-1 becomes
+ *     htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1
+ * what-the-*@?#!^!~? becomes
+ *     wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f
+ * http://vivo.myDomain.edu/file/n3424 with namespace
+ *     http://vivo.myDomain.edu/file/ and prefix a becomes
+ *     a~n/342/4
+ *
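The four examples above can be reproduced with a short sketch. This is not the code in FileStorageHelper. The class and method names are hypothetical, the three "common character" substitutions ('/' to '=', ':' to '+', '.' to ',') are inferred from the examples rather than stated in the text, lower-case hex digits follow the examples, and only the characters listed above are treated as "rare".

```java
import java.util.Map;

/** Hypothetical sketch of the ID-to-path steps; not the actual FileStorageHelper code. */
final class IdPathSketch {
    // The characters the text lists as hex-encoded.
    private static final String RARE = "\"*+,<=>?^|\\~";

    /** Replace each listed character with a caret and two lower-case hex digits. */
    static String encodeRare(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (RARE.indexOf(c) >= 0) {
                out.append('^').append(String.format("%02x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Single-character substitutions, inferred from the examples (an assumption). */
    static String encodeCommon(String s) {
        return s.replace('/', '=').replace(':', '+').replace('.', ',');
    }

    /** Break the encoded name into directory names of at most 3 characters. */
    static String splitIntoGroups(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i += 3) {
            if (i > 0) {
                out.append('/');
            }
            out.append(s, i, Math.min(i + 3, s.length()));
        }
        return out.toString();
    }

    /** Map keys are namespaces, values are their prefixes; "~" is the separator. */
    static String idToPath(String id, Map<String, String> prefixByNamespace) {
        String prefix = "";
        String localName = id;
        for (Map.Entry<String, String> e : prefixByNamespace.entrySet()) {
            if (id.startsWith(e.getKey())) {
                prefix = e.getValue() + "~";
                localName = id.substring(e.getKey().length());
                break;
            }
        }
        // Rare characters are encoded first, so a literal '+', '=' or ',' in the ID
        // cannot be confused with the substitutes introduced by encodeCommon.
        return splitIntoGroups(prefix + encodeCommon(encodeRare(localName)));
    }
}
```

The prefix and separator are added after encoding, which is why the "~" in "a~n/342/4" appears literally while a tilde inside an ID becomes "^7e".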
+ *
+ * The name of the file is encoded as needed to guard against illegal
+ * characters for the filesystem, but in practice we expect little encoding
+ * to be required, since few files are named with the special characters.
+ *
+ * The encoding process is the same as the "rare character encoding" and
+ * "common character encoding" steps used for ID encoding, except that
+ * periods are not encoded.
+ *
+ * The uploaded image files are identified by a combination of URI and filename.
+ * The URI is used as the principal identifier so we don't need to worry about
+ * collisions if two people each upload an image named "image.jpg".
+ *
+ * The filename is retained so the user can use their browser to download their
+ * image from the system and it will be named as they expect it to be.
+ *
+ * We wanted a way to store thousands of image files so they would not
+ * all be in the same directory. We took our inspiration from the
+ * PairTree folks, and modified their algorithm to suit our needs.
+ *
+ * The general idea is to store files in a multi-layer directory structure
+ * based on the URI assigned to the file.
+ *
+ *
+ * Let's consider a file with this information:
+ *
+ *   URI = http://vivo.mydomain.edu/individual/n3156
+ *   Filename = lily1.jpg
+ *
+ * We want to turn the URI into the directory path, but the URI contains
+ * prohibited characters. Using a PairTree-like character substitution,
+ * we might store it at this path:
+ *
+ *   /usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
+ *
+ * Using that scheme would mean that each file sits in its own directory
+ * under the storage root. At a large institution, there might be hundreds of
+ * thousands of directories under that root.
+ *
+ * By breaking this into PairTree-like groupings, we ensure that the files
+ * do not all go into the same directory.
+ *
+ * Limiting to 3-character names will ensure a maximum of about 30,000 files
+ * per directory. In practice, the number will be considerably smaller.
+ *
+ * So then it would look like this:
+ *
+ *   /usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
+ *
+ * But almost all of our URIs will start with the same namespace, so the
+ * namespace just adds unnecessary and unhelpful depth to the directory tree.
+ * We assign a single-character prefix to that namespace, using the
+ * file_storage_namespaces.properties file in the uploads directory, like this:
+ *
+ *   a = http://vivo.mydomain.edu/individual/
+ *
+ * And our URI now looks like this:
+ *
+ *   a~n3156
+ *
+ * Which translates to:
+ *
+ *   /usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
+ *
+ * So what we hope we have implemented is a system where: + *
+ * By the way, almost all of this is implemented in
+ *
+ *   edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelper
+ *
+ * and illustrated in
+ *
+ *   edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelperTest
+ *
+ */
+package edu.cornell.mannlib.vitro.webapp.filestorage.impl;
\ No newline at end of file
diff --git a/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html b/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html
deleted file mode 100644
index 56cddec11..000000000
--- a/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html
+++ /dev/null
@@ -1,282 +0,0 @@
-
-
-
-The code in this package implements the Vitro file-storage system. -
- --The system incorporates a number of ideas from the PairTree specification, -
- A typical structure would look like this:
-
-   + basedir
-   |
-   +--+ file_storage_namespaces.properties
-   |
-   +--+ file_storage_root
-
- The file_storage_root directory contains the subdirectories
- that implement the encoded IDs, and the final directory for each ID will
- contain a single file that corresponds to that ID.
-
-
-
- To reduce the length of the file paths, the system will can be initialized
- to recognize certain sets of characters (loosely termed "namespaces") and
- to replace them with a given prefix and separator character during ID
- encoding.
-
-
- For example, the sytem might be initialized with a "namespace" of
- "http://vivo.mydomain.edu/file/". If that is the only namespace, it will
- be internally assigned a prefix of "a", so a URI like this:
-     http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
- would be converted to this:
-     a~n3424/myPhoto.jpg
-
- The namespaces and their assigned prefixes are stored in a properties file
- when the structure is initialized. When the structure is re-opened, the
- file is read to find the correct prefixes. The file
- might look like this:
-
-   a = http://the.first.namespace/
-   b = http://the.second.namespace/
-
- This is a multi-step process:
-
-   " * + , < = > ? ^ | \ ~
- The hexadecimal encoding consists of a caret followed by 2 hex digits,
- e.g.: ^7C
-
- ark:/13030/xt12t3 becomes
-     ark/+=1/303/0=x/t12/t3
- http://n2t.info/urn:nbn:se:kb:repos-1 becomes
-     htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1
- what-the-*@?#!^!~? becomes
-     wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f
- http://vivo.myDomain.edu/file/n3424 with namespace
-     http://vivo.myDomain.edu/file/ and prefix a becomes
-     a~n/342/4
-
-
-
- The name of the file is encoded as needed to guard against illegal
- characters for the filesystem, but in practice we expect little encoding
- to be required, since few files are named with the special characters.
-
-
- The encoding process is the same as the "rare character encoding" and
- "common character encoding" steps used for ID encoding, except that
- periods are not encoded.
-
-
- The uploaded image files are identified by a combination of URI and filename.
- The URI is used as the principal identifier so we don't need to worry about
- collisions if two people each upload an image named "image.jpg".
-
- The filename is retained so the user can use their browser to download their
- image from the system and it will be named as they expect it to be.
-
-
- We wanted a way to store thousands of image files so they would not
- all be in the same directory. We took our inspiration from the
- PairTree
- folks, and modified their algorithm to suit our needs.
-
- The general idea is to store files in a multi-layer directory structure
- based on the URI assigned to the file.
-
-
-
- Let's consider a file with this information:
-
-   URI = http://vivo.mydomain.edu/individual/n3156
-   Filename = lily1.jpg
-
- We want to turn the URI into the directory path, but the URI contains
- prohibited characters. Using a PairTree-like character substitution,
- we might store it at this path:
-
-   /usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
-
- Using that scheme would mean that each file sits in its own directory
- under the storage root. At a large institution, there might be hundreds of
- thousands of directories under that root.
-
-
- By breaking this into PairTree-like groupings, we insure that all files
- don't go into the same directory.
-
- Limiting to 3-character names will insure a maximum of about 30,000 files
- per directory. In practice, the number will be considerably smaller.
-
- So then it would look like this:
-
-   /usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
-
- But almost all of our URIs will start with the same namespace, so the
- namespace just adds unnecessary and unhelpful depth to the directory tree.
- We assign a single-character prefix to that namespace, using the
- file_storage_namespaces.properties file in the uploads directory, like this:
-
-   a = http://vivo.mydomain.edu/individual/
-
- And our URI now looks like this:
-
-   a~n3156
-
- Which translates to:
-
-   /usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
-
- So what we hope we have implemented is a system where: -
- By the way, almost all of this is implemented in
-
-   edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelper
-
- and illustrated in
-
-   edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelperTest
-
-
diff --git a/pom.xml b/pom.xml
index e505234fa..a88b25586 100644
--- a/pom.xml
+++ b/pom.xml
@@ -10,13 +10,35 @@