diff --git a/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package-info.java b/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package-info.java new file mode 100644 index 000000000..386f40f4d --- /dev/null +++ b/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package-info.java @@ -0,0 +1,259 @@ +/** + * + *

+ * The code in this package implements the Vitro file-storage system. + *

+ *

Relationship to PairTree

+ *

+ * The system incorporates a number of ideas from the PairTree specification, + *

+ * but is different in several respects: + * + *

+ *

Directory structure

+ *

+ * A typical structure would look like this: + *

+ * + basedir
+ * |
+ * +--+ file_storage_namespaces.properties
+ * |
+ * +--+ file_storage_root
+ * 
+ * The file_storage_root directory contains the subdirectories + * that implement the encoded IDs, and the final directory for each ID will + * contain a single file that corresponds to that ID. + *

+ *

Namespaces

+ *

+ * To reduce the length of the file paths, the system can be initialized + * to recognize certain sets of characters (loosely termed "namespaces") and + * to replace them with a given prefix and separator character during ID + * encoding. + *

+ *

+ * For example, the system might be initialized with a "namespace" of + * "http://vivo.mydomain.edu/file/". If that is the only namespace, it will + * be internally assigned a prefix of "a", so a URI like this: + *

http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
+ * would be converted to this: + *
a~n3424/myPhoto.jpg
+ *

+ *

+ * The namespaces and their assigned prefixes are stored in a properties file + * when the structure is initialized. When the structure is re-opened, the + * file is read to find the correct prefixes. The file + * might look like this: + *

+ * a = http://the.first.namespace/
+ * b = http://the.second.namespace/
+ * 
+ *

+ *

ID encoding

+ *

+ * This is a multi-step process: + *

+ * Examples: + *
ark:/13030/xt12t3 becomes + * ark/+=1/303/0=x/t12/t3 + *
http://n2t.info/urn:nbn:se:kb:repos-1 becomes + * htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1 + *
what-the-*@?#!^!~? becomes + * wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f + *
http://vivo.myDomain.edu/file/n3424 with namespace + * http://vivo.myDomain.edu/file/ and prefix + * a becomes + * a~n/342/4 + *

+ *

Filename encoding

+ *

+ * The name of the file is encoded as needed to guard against illegal + * characters for the filesystem, but in practice we expect little encoding + * to be required, since few files are named with the special characters. + *

+ *

+ * The encoding process is the same as the "rare character encoding" and + * "common character encoding" steps used for ID encoding, except that + * periods are not encoded. + *

+ *
+ *
+ *
+ *

+ * This was summarized in a post to the vivo-dev-all list on 11/29/2010 + *

+ *

+ * The uploaded image files are identified by a combination of URI and filename. + * The URI is used as the principal identifier so we don't need to worry about + * collisions if two people each upload an image named "image.jpg". + * + * The filename is retained so the user can use their browser to download their + * image from the system and it will be named as they expect it to be. + *

+ *

+ * We wanted a way to store thousands of image files so they would not + * all be in the same directory. We took our inspiration from the + * PairTree + * folks, and modified their algorithm to suit our needs. + * + * The general idea is to store files in a multi-layer directory structure + * based on the URI assigned to the file. + *

+ *

+ * Let's consider a file with this information: + *

+ * 		URI = http://vivo.mydomain.edu/individual/n3156
+ * 		Filename = lily1.jpg
+ * 	
+ *

+ *

+ * We want to turn the URI into the directory path, but the URI contains + * prohibited characters. Using a PairTree-like character substitution, + * we might store it at this path: + *

+ * 		/usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
+ * 	
+ *

+ *

+ * Using that scheme would mean that each file sits in its own directory + * under the storage root. At a large institution, there might be hundreds of + * thousands of directories under that root. + *

+ *

+ * By breaking this into PairTree-like groupings, we ensure that the files + * do not all go into the same directory. + * + * Limiting to 3-character names will ensure a maximum of about 30,000 files + * per directory. In practice, the number will be considerably smaller. + * + * So then it would look like this: + *

+ * 		/usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
+ * 	
+ *

+ *

+ * But almost all of our URIs will start with the same namespace, so the + * namespace just adds unnecessary and unhelpful depth to the directory tree. + * We assign a single-character prefix to that namespace, using the + * file_storage_namespaces.properties file in the uploads directory, like this: + *

+ * 		a = http://vivo.mydomain.edu/individual/
+ * 	
+ * And our URI now looks like this: + *
+ * 		a~n3156
+ * 	
+ * Which translates to: + *
+ * 		/usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
+ * 	
+ *

+ *

+ * So what we hope we have implemented is a system where: + *

+ *

+ *

+ * By the way, almost all of this is implemented in + *

+ * 		edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelper
+ * 	
+ * and illustrated in + *
+ * 		edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelperTest
+ * 	
+ *

+ */ +package edu.cornell.mannlib.vitro.webapp.filestorage.impl; \ No newline at end of file diff --git a/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html b/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html deleted file mode 100644 index 56cddec11..000000000 --- a/api/src/main/java/edu/cornell/mannlib/vitro/webapp/filestorage/impl/package.html +++ /dev/null @@ -1,282 +0,0 @@ - - - -

-The code in this package implements the Vitro file-storage system. -

- -

Relationship to PairTree

- -

-The system incorporates a number of ideas from the PairTree specification, -

-but is different in several respects: - -

- -

Directory structure

- -

- A typical structure would look like this: -

-  + basedir
-  |
-  +--+ file_storage_namespaces.properties
-  |
-  +--+ file_storage_root
-  
- The file_storage_root directory contains the subdirectories - that implement the encoded IDs, and the final directory for each ID will - contain a single file that corresponds to that ID. -

- -

Namespaces

- -

- To reduce the length of the file paths, the system will can be initialized - to recognize certain sets of characters (loosely termed "namespaces") and - to replace them with a given prefix and separator character during ID - encoding. -

-

- For example, the sytem might be initialized with a "namespace" of - "http://vivo.mydomain.edu/file/". If that is the only namespace, it will - be internally assigned a prefix of "a", so a URI like this: -

http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
- would be converted to this: -
a~n3424/myPhoto.jpg
-

-

- The namespaces and their assigned prefixes are stored in a properties file - when the structure is initialized. When the structure is re-opened, the - file is read to find the correct prefixes. The file - might look like this: -

-    a = http://the.first.namespace/
-    b = http://the.second.namespace/
-  
-

- -

ID encoding

- -

- This is a multi-step process: -

- Examples: -
ark:/13030/xt12t3 becomes - ark/+=1/303/0=x/t12/t3 -
http://n2t.info/urn:nbn:se:kb:repos-1 becomes - htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1 -
what-the-*@?#!^!~? becomes - wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f -
http://vivo.myDomain.edu/file/n3424 with namespace - http://vivo.myDomain.edu/file/ and prefix - a becomes - a~n/342/4 -

- -

Filename encoding

- -

- The name of the file is encoded as needed to guard against illegal - characters for the filesystem, but in practice we expect little encoding - to be required, since few files are named with the special characters. -

- -

- The encoding process is the same as the "rare character encoding" and - "common character encoding" steps used for ID encoding, except that - periods are not encoded. -

- -
-
-
- -

- This was summarized in a post to the vivo-dev-all list on 11/29/2010 -

- -

- The uploaded image files are identified by a combination of URI and filename. - The URI is used as the principal identifier so we don't need to worry about - collisions if two people each upload an image named "image.jpg". - - The filename is retained so the user can use their browser to download their - image from the system and it will be named as they expect it to be. -

- -

- We wanted a way to store thousands of image files so they would not - all be in the same directory. We took our inspiration from the - PairTree - folks, and modified their algorithm to suit our needs. - - The general idea is to store files in a multi-layer directory structure - based on the URI assigned to the file. -

- -

- Let's consider a file with this information: -

-		URI = http://vivo.mydomain.edu/individual/n3156
-		Filename = lily1.jpg
-	
-

- -

- We want to turn the URI into the directory path, but the URI contains - prohibited characters. Using a PairTree-like character substitution, - we might store it at this path: -

-		/usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
-	
-

- -

- Using that scheme would mean that each file sits in its own directory - under the storage root. At a large institution, there might be hundreds of - thousands of directories under that root. -

- -

- By breaking this into PairTree-like groupings, we insure that all files - don't go into the same directory. - - Limiting to 3-character names will insure a maximum of about 30,000 files - per directory. In practice, the number will be considerably smaller. - - So then it would look like this: -

-		/usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
-	
-

- -

- But almost all of our URIs will start with the same namespace, so the - namespace just adds unnecessary and unhelpful depth to the directory tree. - We assign a single-character prefix to that namespace, using the - file_storage_namespaces.properties file in the uploads directory, like this: -

-		a = http://vivo.mydomain.edu/individual/
-	
- - And our URI now looks like this: -
-		a~n3156
-	
- - Which translates to: -
-		/usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
-	
-

- -

- So what we hope we have implemented is a system where: -

-

- -

- By the way, almost all of this is implemented in -

-		edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelper
-	
- and illustrated in -
-		edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelperTest
-	
-

diff --git a/pom.xml b/pom.xml index e505234fa..a88b25586 100644 --- a/pom.xml +++ b/pom.xml @@ -10,13 +10,35 @@ 1.9.0-SNAPSHOT pom + Vitro + Vitro semantic web application project + http://vivoweb.org/ + + + + BSD 3-Clause License + https://raw.github.com/vivo-project/Vitro/develop/LICENSE + repo + + + + + + scm:git:git@github.com:vivo-project/Vitro.git + scm:git:git@github.com:vivo-project/Vitro.git + git@github.com:vivo-project/Vitro.git + HEAD + + yyyy-MM-dd HH:mm:ss ${maven.build.timestamp} - Vitro - api dependencies @@ -29,6 +51,53 @@ vitro-dev + + release + + + + org.apache.maven.plugins + maven-source-plugin + 2.2.1 + + + attach-sources + + jar-no-fork + + + + + + org.apache.maven.plugins + maven-javadoc-plugin + 2.9.1 + + + attach-javadocs + + jar + + + + + true + + + + org.sonatype.plugins + nexus-staging-maven-plugin + 1.6.3 + true + + ossrh + https://oss.sonatype.org/ + true + + + + + @@ -71,15 +140,10 @@ - - - scm:git:git@github.com:vivo-project/Vitro.git - scm:git:git@github.com:vivo-project/Vitro.git - git@github.com:vivo-project/Vitro.git - HEAD - + + + ossrh + https://oss.sonatype.org/content/repositories/snapshots + +
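Reviewer note on the "ID encoding" section of package-info.java: the step list under "This is a multi-step process:" is empty in this patch, so the following is only a minimal sketch inferred from the four examples in that section. It is not the code in FileStorageHelper. The class name `IdEncodingSketch`, the method names, and the exact set of hex-encoded "rare" characters (PairTree's list plus `~`) are assumptions made for illustration, and only ASCII input is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual FileStorageHelper implementation.
final class IdEncodingSketch {
    // Assumed set of "rare" characters: PairTree's list plus '~',
    // which the Javadoc examples show being hex-encoded.
    private static final String RARE_CHARS = "\"*+,<=>?\\^|~";

    // Step 1 (assumed): hex-encode rare characters as ^xx (ASCII input only).
    static String encodeRareChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c <= 0x20 || c == 0x7f || RARE_CHARS.indexOf(c) >= 0) {
                sb.append('^').append(String.format("%02x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Step 2 (assumed): single-character substitutions for common characters.
    static String encodeCommonChars(String s) {
        return s.replace(':', '+').replace('/', '=').replace('.', ',');
    }

    // Replace a known namespace with "prefix~", encode the remainder,
    // then break the result into directory names of at most 3 characters.
    static String idToPath(String id, Map<String, String> prefixToNamespace) {
        String prefix = "";
        String rest = id;
        for (Map.Entry<String, String> e : prefixToNamespace.entrySet()) {
            if (id.startsWith(e.getValue())) {
                prefix = e.getKey() + "~";
                rest = id.substring(e.getValue().length());
                break;
            }
        }
        String encoded = prefix + encodeCommonChars(encodeRareChars(rest));
        List<String> segments = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i += 3) {
            segments.add(encoded.substring(i, Math.min(i + 3, encoded.length())));
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        // Mirrors a file_storage_namespaces.properties entry: a = <namespace>
        Map<String, String> namespaces = new LinkedHashMap<>();
        namespaces.put("a", "http://vivo.myDomain.edu/file/");
        System.out.println(idToPath("ark:/13030/xt12t3", namespaces));
        System.out.println(idToPath("http://vivo.myDomain.edu/file/n3424", namespaces));
    }
}
```

The namespace is stripped before encoding so that the `~` separator itself is not hex-encoded; a literal `~` inside an ID still becomes `^7e`, which matches the third example in the Javadoc.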