How to Write to Centera Storage Appliances: Introduction
This article is part of a series of articles that I am writing to illustrate the use of the EMC Centera SDK and the .NET wrapper being developed as an open source project to store "fixed content" on the EMC Centera storage appliance. Before I start, I'd like to explain what "fixed content" is and give an overview of the reasoning behind the emergence of this type of storage.
Fixed Content Definition
Fixed content is information that never changes after creation. It is actively referenced, typically shared among users, and must be retained for a long period of time, often a mandatory one. Examples include electronic documents, presentations and e-books; rich media such as movies, videos, digital photographs and audio files; check images and financial statements; bioinformatics data, X-rays, MRIs and CAT scans; CAD/CAM diagrams and blueprints; and e-mail messages.
Examples of "Fixed Content"
- An average enterprise (a 250-person organization) generates approximately 1.5 TB of e-mails per year.
- A picture archive in a large hospital may generate more than 5 TB per year in digital X-rays or MRIs.
- Banks are scanning millions of check images per year, requiring multiple terabytes of storage.
State of the Industry
A large portion of all digital information is fixed content. It is estimated that fixed content will be the largest portion of digital content created by the human race in the next century, exceeding all dynamic content put together.
Also, the information life cycle drives towards more fixed content. Enterprises embracing e-mail and electronic documents are rapidly increasing the need for fixed content storage. Finally, emerging regulations in the financial and healthcare industries that require retention (maintaining a copy of fixed content for a mandatory period of time) are creating a huge need for fixed content storage and fixed content solutions.
The EMC Centera appliance is one of the appliances available on the market today to satisfy that need. Other companies, such as NetApp, have solutions equivalent to Centera, but this series of articles focuses on how to code using the Centera SDK.
What You Will Need to be Able to Develop Against the Appliance
To start writing content to the Centera appliance, you will need the Centera SDK. You will need to register on the EMC site to download the SDK. There are a number of versions of the SDK available for download. Use the 3.1SP1 version. Click here to download the SDK. Note that the only way to save content on most "fixed content" storage devices is through the proprietary API(s) that the manufacturer of the device publishes. Some manufacturers do offer open standard interfaces (CIFS, NFS, HTTP and WebDAV) to read from and write to their devices. Usually, however, you end up losing a lot of the device's power. Features like WORM (write-once-read-many) functionality or retention capabilities are usually lost with the open standards.
You will also need the .NET wrapper for the Centera SDK. The latest version of the open source .NET project is on SourceForge here.
You need to have access to the "Public Centera" appliances. EMC recognized that the Centera device is not available everywhere and so set up appliances on the Internet that developers can develop against. The content of these appliances is purged periodically by EMC. The latest IP addresses can be found on the EMC site. As of this writing, the valid IP addresses are:
- EMEA1 - 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
- EMEA2 - 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
- EMEA3 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
- EMEA4 - 22.214.171.124, 126.96.36.199
- EMEA5 - 188.8.131.52, 184.108.40.206
- US1 - 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
- US2 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
- US3 - 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
- US4 - 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
- US5 - 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168
Special Architecture Knowledge You Need
The Centera appliance stores content, and each piece of content is stored and retrieved using an address. This content/address combination is called CAS, or "content addressable storage," a term you will come across often in the industry.
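To make the idea of an address derived from content more concrete, here is a minimal sketch. It is only an illustration of content addressing in general: the class name ContentAddressSketch is hypothetical, and SHA-256 is a stand-in hash chosen for the example, not a statement about how the Centera computes its own content addresses.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the Centera SDK or the .NET wrapper.
public static class ContentAddressSketch
{
    // Derives a hexadecimal "address" from the bytes of the content itself.
    // SHA-256 is an illustrative stand-in; it is not claimed to be the
    // scheme the Centera uses.
    public static string AddressOf(byte[] content)
    {
        if (content == null) throw new ArgumentNullException("content");

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            StringBuilder hex = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }
}
```

The same bytes always yield the same address, and changing a single byte yields a different one, which is why this style of storage suits content that never changes after creation.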
The smallest block of data that can be stored must be housed inside a memory block that the SDK calls a "C-Clip." In other words, you have to create a C-Clip and place your content inside the C-Clip first. Then you send the C-Clip to the Centera to be saved. The C-Clip itself is made of two components: the Content Descriptor File, or CDF for short, and the BLOB.
The Content Descriptor File is an XML file that holds metadata. The CDF contains TAGS and ATTRIBUTES.
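- Tag: an XML tag in the CDF with a user-defined name
- Attribute: an XML attribute in the CDF with a user-defined value
In the following example, My_App is the tag, name is the attribute and ImageStoreServer is the attribute's value:
<My_App name="ImageStoreServer"/>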
The C-Clip also holds a BLOB. The BLOB is usually the content you want to store. BLOBs have the following characteristics:
- They hold objects stored on Centera
- They are represented as distinct bit sequences of the objects you are trying to store
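The tag and attribute structure of the CDF described above can be sketched with plain XML tooling. The snippet below only builds the XML text of a single tag using System.Xml.Linq; it does not use the Centera SDK or the .NET wrapper, which expose their own calls for adding tags to a C-Clip. The class and method names (CdfTagSketch, BuildTag) are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper that illustrates the shape of a CDF tag.
// It produces XML text only and does not talk to a Centera.
public static class CdfTagSketch
{
    // Builds one XML element (the tag) carrying one attribute,
    // both with user-defined names and values.
    public static string BuildTag(string tagName, string attributeName, string attributeValue)
    {
        XElement tag = new XElement(tagName, new XAttribute(attributeName, attributeValue));
        return tag.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling CdfTagSketch.BuildTag("My_App", "name", "ImageStoreServer") yields an element equivalent to the My_App example shown earlier.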
Centera runs an OS called "CentraStar." This OS is optimized for writing and reading the C-Clip objects.
Centera objects have metadata. The applications you develop create metadata associated with one or more objects. These objects are stored independently of volume/directory information, as in the image below:
Overall process overview:
Centera: Three Modes
Basic Mode
Centera acts like standard magnetic storage. An object marked for deletion is deleted immediately.
Compliance Mode
Active retention protection ensures availability of objects for a configurable period of time. An object marked for deletion is not deleted until the retention period passes.
Compliance Plus Mode
Similar to compliance mode, compliance plus mode uses retention periods. The default retention period is infinite. Unlike compliance mode, data is never purged.
Benefits of the Compliance Modes
- Retention enforcement
  - Retention is set on the clip; applies to all BLOBs that are referenced by the clip
  - Cannot delete a clip/BLOB when retention has not expired
  - Once retention expires, clip is eligible for deletion
- Data deletion enhancements: shredding
  - Overwrites data multiple times with a random bit pattern
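The retention rule described above (a clip cannot be deleted until its retention period has passed) can be modeled locally as follows. This is only a sketch of the rule for illustration: the real enforcement happens on the cluster, and the names RetentionSketch and IsEligibleForDeletion are hypothetical. A null retention period is used here as an assumption to model an infinite retention.

```csharp
using System;

// Hypothetical local model of the retention rule; the Centera cluster
// enforces retention itself, this class only illustrates the logic.
public static class RetentionSketch
{
    // Returns true once the retention period, counted from the creation
    // time, has passed. A null retention models an infinite retention
    // period, so such a clip is never eligible.
    public static bool IsEligibleForDeletion(DateTime createdUtc, TimeSpan? retention, DateTime nowUtc)
    {
        if (!retention.HasValue)
        {
            return false;
        }
        return nowUtc >= createdUtc + retention.Value;
    }
}
```

With a retention of TimeSpan.Zero the clip is eligible immediately, which matches the behavior described for a cluster without retention.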
The Centera-supplied Software Development Kit (SDK) contains:
- C callable libraries
- Java interface that utilizes JNI
- Sample code
It can be downloaded here. You will need to create an account with EMC to be able to download the full SDK.
Why Is It Needed?
- Provides content-addressing framework
- No file system and associated drawbacks
- Applications access the Centera via API calls only
A cluster is a logical CAS archive that appears to your application as a single unit. A cluster can be accessed by one or more applications via a set of node IP addresses and access profiles.
A pool is an SDK object that represents one or more clusters. Your application must OPEN a pool by providing a series of node IP addresses and access profile credentials for the desired set of clusters. The first accessible IP address in the list represents the primary cluster, while subsequent IP addresses are considered the secondary clusters (assuming that they represent distinct clusters). The pool object also auto-discovers any replica clusters that are configured via the primary or secondary clusters.
The system administrator creates access profiles for applications. Profiles are a means to enforce authentication and authorization. The system administrator can determine which applications have access to a cluster and what operations they can perform. An application can only log into Centera if a profile for that application has been created on the Centera cluster and the credentials for that profile have been made available to the application server.
Once the profiles have been created on the Centera cluster, the system administrator exports the profile information to a Pool Entry Authorization (PEA) file and copies this file to the application server. The system administrator can set an environment variable that points to the PEA file or can leave it to the application to give the path to this file.
When you code your application, you can omit the PEA file path, in which case the SDK uses the PEA file that the environment variable set by the system administrator points to. Alternatively, your enterprise may have created specific PEA files and distributed them to the development team. In that case, you can give the full path of the PEA file in your code when opening the pool. It is important to note that for these articles, the publicly available PEA profiles will be used. The files have the following naming convention: ClusterName_ProfileName_CapabilitiesList.pea. For example, us2_armTest2_rdqeDcwh.pea translates to:
- Application Profile belongs to Centera Cluster US2
- Profile Test2, Advanced Retention Management (arm) enabled
- Capabilities: all enabled – please refer to the list below
- r: read
- w: write
- d: delete
- q: query
- e: exists
- D: privileged delete
- c: clip copy
- h: retention hold
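The naming convention and capability letters above are regular enough to decode in code. The following sketch splits a PEA file name into its three parts and expands the capability letters using the list above. It relies only on the convention as described in this article; the PeaFileName class is hypothetical and is not part of the SDK or the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical parser for the ClusterName_ProfileName_CapabilitiesList.pea
// convention described in this article.
public sealed class PeaFileName
{
    // Capability letters as listed in the article; 'd' and 'D' are distinct.
    private static readonly Dictionary<char, string> CapabilityNames = new Dictionary<char, string>
    {
        { 'r', "read" }, { 'w', "write" }, { 'd', "delete" }, { 'q', "query" },
        { 'e', "exists" }, { 'D', "privileged delete" }, { 'c', "clip copy" },
        { 'h', "retention hold" }
    };

    public string Cluster { get; private set; }
    public string Profile { get; private set; }
    public IList<string> Capabilities { get; private set; }

    public static PeaFileName Parse(string fileName)
    {
        if (fileName == null) throw new ArgumentNullException("fileName");

        string stem = Path.GetFileNameWithoutExtension(fileName);
        string[] parts = stem.Split('_');
        if (parts.Length != 3)
        {
            throw new FormatException("Expected ClusterName_ProfileName_CapabilitiesList.pea");
        }

        List<string> capabilities = new List<string>();
        foreach (char code in parts[2])
        {
            string name;
            if (!CapabilityNames.TryGetValue(code, out name))
            {
                throw new FormatException("Unknown capability letter: " + code);
            }
            capabilities.Add(name);
        }

        return new PeaFileName { Cluster = parts[0], Profile = parts[1], Capabilities = capabilities };
    }
}
```

For example, PeaFileName.Parse("us2_armTest2_rdqeDcwh.pea") gives the cluster us2, the profile armTest2 and all eight capabilities, in the order the letters appear in the file name.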
All profiles except "Profile1" are configured to enable the "monitor" capability. Each profile also comes enabled with a name/secret combination that corresponds to the profile name. Thus, to access the profile defined by the us2_armTest2_rdqeDcwh.pea file, the application could alternatively use "name=armTest2,secret=armTest2" in the connection string.
And as Forrest Gump said in the movie of the same name, "That's all I have to say about that." This introduction should give you enough knowledge to be able to read the SDK and write code to use the Centera appliance. Since this article is one in a series I am writing about different functionalities, each individual article will have this introduction and then will discuss the specific Centera functionality the article addresses.
How to Set Up the Development Environment
In Visual Studio, create a new project called "AdrdProjectCentera1" as in the figure below:
Note: I am creating the project on my E: drive in the CAS directory. The project name in this article is "AdrdProjectCentera1." This will create the directory structure needed by Visual Studio. The directory of interest in this solution structure is the debug directory that gets created by Visual Studio. In this article, the full path of the directory of interest is as follows: E:\CAS\AdrdProjectCentera1\AdrdProjectCentera1\AdrdProjectCentera1\bin\Debug. Note that your path will be different depending on the location of your project.
The next step is to unzip the EMC Centera SDK files. The SDK is delivered from the EMC site as a single zipped file. The default zip file name is 3.1_SDK_Windows_gcc.zip as of Oct 13, 2007. Once the file is unzipped, a number of directories will be created. Copy the files in the lib directory to the debug directory created by Visual Studio in step 1.
The files that you will copy are FPLibrary.dll, fpos32.dll, fpparser.dll and pai_module.dll. There is also an FPLibrary.jar file that exists in that lib directory. You do not need to copy that file. The FPLibrary.jar file is the Java wrapper for FPLibrary.dll. This JAR is the equivalent of the .NET wrapper that the SourceForge project is all about. Also, all the LIB files are to be used if you are developing using C or C++. Just ignore these files for this article.
Next, download all the PEA files to be able to develop against the "public Centera." I will use the "US X" PEA files from the EMC website here as of Oct 13th, 2007. Make sure you copy the PEA files to the debug directory described in step 1 above.
The next step would be to unzip the .NET wrapper you downloaded from the SourceForge site. The default zipped file that you downloaded would be FPApi.NET.zip. Once it is fully unzipped, the following directories will be created:
The ZIP file from SourceForge does not include the binary file of the wrapper (the compiled version of the code). So, you will need to compile the code to generate the final wrapper that you will use in this article's project. To do so, double-click Wapper.sln, which the ZIP file extraction created. This should start a new instance of Visual Studio and the solution should look as follows:
Compile the solution by selecting the "Build" --> "Build Solution" menu options, as in the next figure:
Once the build is complete, copy the files FPSDK.dll and FPSDK.pdb that are generated as a result of the solution build to the debug directory created in step 1.
The final debug directory for the solution should look like this:
The final step is to set a reference to FPSDK.DLL in your solution. To do so, open the original solution you created in step 1 (if it is not already open).
Finally, the Article Content: How to Retrieve the Centera Cluster Capabilities
The following screen shot is this article's UI.
To actually retrieve the cluster information, you need to make the following API calls:
- Open the Centera cluster by creating an instance of the wrapper's FPPool class.
- Use the FPPool instance you created in the step above to retrieve the cluster capabilities.
- Close the FPPool.
Open the Pool
To open the Centera pool, you will need the cluster "Connection String." This is usually a single IP address for a single Centera, or a comma-separated list of IP addresses if Centera is configured as a cluster. Appended to the IP list are a "?" sign and the full path of the PEA file. In the code associated with this article, the PEA files are included in the debug directory.
Sample of the Connection String
22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206?us1_profile1_rwqe.pea
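The format described above (a comma-separated address list, then "?", then the PEA file path) can be assembled with a small helper. This is a sketch under the assumption that the format is exactly as described in this article; ConnectionStringSketch is a hypothetical name, and the addresses in the usage note are placeholders from the documentation range, not real Centera nodes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that assembles a connection string in the
// "ip1,ip2?path-to-pea-file" format described in this article.
public static class ConnectionStringSketch
{
    public static string Build(IEnumerable<string> nodeAddresses, string peaFilePath)
    {
        if (nodeAddresses == null) throw new ArgumentNullException("nodeAddresses");

        string addresses = string.Join(",", nodeAddresses);
        if (addresses.Length == 0)
        {
            throw new ArgumentException("At least one node address is required.", "nodeAddresses");
        }

        // Without a PEA file path, only the address list is returned.
        return string.IsNullOrEmpty(peaFilePath) ? addresses : addresses + "?" + peaFilePath;
    }
}
```

For instance, ConnectionStringSketch.Build(new[] { "192.0.2.1", "192.0.2.2" }, "us1_profile1_rwqe.pea") returns "192.0.2.1,192.0.2.2?us1_profile1_rwqe.pea".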
Retrieve the Cluster Capabilities
#region Build the string to display in the UI
// myPool is the open FPPool instance; strPoolInfo is a string declared earlier.
strPoolInfo = "\nPool Information" + "\n================" +
    "\nCluster ID: " + myPool.ClusterID +
    "\nCluster Time: " + myPool.ClusterTime +
    "\nCluster Name: " + myPool.ClusterName +
    "\nCentraStar software version: " + myPool.CentraStarVersion +
    "\nSDK version: " + FPPool.SDKVersion +
    "\nCluster Capacity (Bytes): " + myPool.Capacity +
    "\nCluster Free Space (Bytes): " + myPool.FreeSpace +
    "\nCluster BlobNamingSchemes: " + myPool.BlobNamingSchemes +
    "\nCluster CenteraEdition: " + myPool.CenteraEdition +
    "\nCluster ClipBufferSize: " + myPool.ClipBufferSize.ToString() +
    "\nCluster DeleteAllowed: " + myPool.DeleteAllowed.ToString() +
    "\nCluster DeletionsLogged: " + myPool.DeletionsLogged.ToString() +
    "\nCluster ExistsAllowed: " + myPool.ExistsAllowed.ToString() +
    "\nCluster QueryAllowed: " + myPool.QueryAllowed.ToString() +
    "\nCluster RetentionDefault: " + myPool.RetentionDefault.ToString() +
    "\nCluster ReadAllowed: " + myPool.ReadAllowed.ToString() +
    "\nCluster WriteAllowed: " + myPool.WriteAllowed.ToString();
#endregion
Close the FPPool
In the sample included, I have opened the pool inside a using statement. Therefore, when the block is done, the FPPool will be closed. It is also possible to close the pool explicitly instead of relying on the using statement.
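The difference between the two ways of closing can be sketched with a stand-in class. DemoPool below is hypothetical and is not the wrapper's FPPool; it only assumes, for illustration, a disposable resource with a Close method, so that the example runs without the SDK or a Centera.

```csharp
using System;

// Hypothetical stand-in for a pool-like resource. It is NOT the wrapper's
// FPPool; it exists only to show the two ways of releasing a resource.
public sealed class DemoPool : IDisposable
{
    public bool IsOpen { get; private set; }

    public DemoPool() { IsOpen = true; }

    public void Close() { IsOpen = false; }

    // Dispose is what the using statement calls when the block ends.
    public void Dispose() { Close(); }
}

public static class PoolLifetimeSketch
{
    // The using statement closes the pool when the block ends,
    // even if an exception is thrown inside it.
    public static DemoPool WithUsing()
    {
        DemoPool pool = new DemoPool();
        using (pool)
        {
            // Work with the pool here.
        }
        return pool;
    }

    // The explicit alternative: close the pool yourself in a finally block.
    public static DemoPool WithExplicitClose()
    {
        DemoPool pool = new DemoPool();
        try
        {
            // Work with the pool here.
        }
        finally
        {
            pool.Close();
        }
        return pool;
    }
}
```

Both methods return a pool whose IsOpen property is false. The using form is shorter and harder to get wrong, which is why the sample code prefers it.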
Explaining the Capabilities
ClusterID: Unique ID of the cluster
ClusterTime: Time on the cluster; note that all Centera clusters keep their time in GMT
ClusterName: The name given to the cluster; Centera administrators often leave this value unset
CentraStarVersion: The version of the OS running on Centera
SDKVersion: The version of the SDK your application is using; usually it is the version you downloaded from EMC, but note that newer versions of the SDK can talk to earlier versions of the CentraStar OS
Capacity: Total space on the Centera pool you are connecting to
FreeSpace: Total available space on the Centera pool you are connecting to
CenteraEdition: The edition the cluster is running, for example CE+. Please see the Centera modes section earlier in this article
DeleteAllowed: Whether deletion of clips is allowed on this pool
DeletionsLogged: Whether deletions are logged; usually this is set to true for auditing purposes and especially if the pool/cluster is in
RetentionDefault: The default retention period; most of the public Centera clusters have this value set to 00:00:00, which implies that there is no retention. In other words, C-Clips can be deleted immediately
For all other capabilities, please see the Centera API reference guide, Centera_SDK_3.1_API_Ref_Guide.pdf, and review the FPPool_GetCapability API. Also included in the demo code are two classes that are used to serialize the capabilities; one of them is named AdrdCenteraRetentionInfoItem. These classes represent most of the capabilities that you will ever use when developing with Centera. I will use them in my next two articles on how to write to Centera and how to read from Centera.
You can get the Microsoft or a PDF version of this article from here.
- 19 October, 2007 -- Original version posted