Working with Document Classification-related Objects
To enable documents of a specified MIME type to be automatically classified, you need the following objects:
- A document classifier to apply a Content Engine class to the documents of the specified MIME type. See Creating a Document Classifier.
- A DocumentClassificationAction object that associates the specified MIME type with the document classifier. See Creating a DocumentClassificationAction Object.
You can also retrieve DocumentClassificationAction objects.
To submit a document for classification and to view its classification status, you must check in a document with auto-classification enabled. See Auto-classifying a Document.
For an overview of automatic document classification, see Document Classifications.
Creating a Document Classifier
To create a document classifier, you must implement the DocumentClassifier interface as a Java™ or JavaScript component.
A classifier implementation determines the Content Engine class to which a checked-in Document object
belongs, and then applies the class to the object. Typically, this
effort involves parsing the content of the Document object
and mapping metadata from the content to properties of the Content Engine class.
The
following examples show Java and JavaScript implementations,
each of which classifies documents of MIME type "text/pdf". Retrieving
the document's PDF content as an InputStream object,
the classify method uses a third-party API to parse
the content. It tests the subject field of the PDF content. If the
subject indicates that the PDF document is a loan application, then
the method uses the changeClass method to apply the
"PdfLoanApplication" class to the Document object.
Also, the method maps metadata from the PDF content to properties
of the "PdfLoanApplication" class. If the PDF document is not a loan
application, then the default class of the Document object
is maintained.
To view the sample DocumentClassifier implementation that
is packaged with the Content Engine,
go to this Content Engine directory:
- Windows: C:\Program Files\Filenet\Content Engine\samples
- non-Windows: /opt/IBM/FileNet/ContentEngine/samples
Java Example
package sample.actionhandler;
import com.filenet.api.core.*;
import com.filenet.api.engine.DocumentClassifier;
import com.filenet.api.exception.*;
import java.io.*;
import com.ticdoc.pdfextract.*; // 3rd-party API for parsing PDF documents
public class DocClassifyHandler implements DocumentClassifier
{
public void classify(Document doc)
{
try
{
// Get PDF content from the document passed to this method.
InputStream IS= doc.accessContentStream(0);
// Use 3rd-party API to get PDF document metadata.
PDFDocument pdfDoc = PDFDocument.load(IS);
PDFDocumentInformation pdfProperties = pdfDoc.getDocumentInformation();
pdfDoc.close();
// Get subject of PDF document.
String pdfSubject = pdfProperties.getSubject();
// Classify based on PDF subject.
// Constant-first comparison keeps the default class when the PDF has no subject.
if ( "loan application".equalsIgnoreCase(pdfSubject) )
{
// Apply new class.
doc.changeClass("PdfLoanApplication");
// Get PDF properties to be mapped to document.
String pdfloanType = pdfProperties.getLoanType();
String pdfApplicantName = pdfProperties.getApplicant();
String pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();
// Set properties for Document stored in object store.
doc.getProperties().putValue("LoanType", pdfloanType);
doc.getProperties().putValue("ApplicantName", pdfApplicantName);
doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");
// Set security owner based on loan type.
if ( pdfloanType.equalsIgnoreCase("home loan application") )
doc.set_Owner("GEvans");
else if (pdfloanType.equalsIgnoreCase("auto loan application") )
doc.set_Owner("EMesker");
}
}
catch(Exception e)
{
throw new RuntimeException(e);
}
}
}
JavaScript Example
importPackage(java.lang);
importPackage(Packages.com.filenet.api.core);
importPackage(Packages.com.ticdoc.pdfextract); // 3rd-party API for parsing PDF documents
function classify(doc)
{
try {
// Get PDF content from document passed to this method.
var IS= doc.accessContentStream(0);
// Use 3rd-party API to get PDF document metadata.
var pdfDoc = PDFDocument.load(IS);
var pdfProperties = pdfDoc.getDocumentInformation();
pdfDoc.close();
// Get subject of PDF document.
var pdfSubject = pdfProperties.getSubject();
// Classify based on PDF subject.
// The null check keeps the default class when the PDF has no subject.
if ( pdfSubject != null && pdfSubject.equalsIgnoreCase("loan application") )
{
// Apply new class.
doc.changeClass("PdfLoanApplication");
// Get PDF properties to be mapped to document.
var pdfloanType = pdfProperties.getLoanType();
var pdfApplicantName = pdfProperties.getApplicant();
var pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();
// Set properties for Document stored in object store.
doc.getProperties().putValue("LoanType", pdfloanType);
doc.getProperties().putValue("ApplicantName", pdfApplicantName);
doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");
// Set security owner based on loan type.
if ( pdfloanType.equalsIgnoreCase("home loan application") )
doc.set_Owner("GEvans");
else if (pdfloanType.equalsIgnoreCase("auto loan application") )
doc.set_Owner("EMesker");
}
}
catch (e) {
throw new RuntimeException(e);
}
}
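The decision rules in the two classifiers above can also be kept apart from the Content Engine and PDF-parsing calls, which makes them easy to test without a server. The following is a minimal sketch under that assumption. The class LoanClassificationRules and its methods are hypothetical names introduced for illustration. The class name, subject text, loan types, and owners are taken from the examples above.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Content Engine API. It isolates the
// decision rules used by the classify method so they can be tested alone.
final class LoanClassificationRules {

    // Class applied to loan applications, as in the examples above.
    static final String LOAN_CLASS = "PdfLoanApplication";

    // Returns the class name to apply, or null to keep the document's default class.
    static String targetClass(String pdfSubject) {
        // Constant-first comparison tolerates a PDF that has no subject field.
        return "loan application".equalsIgnoreCase(pdfSubject) ? LOAN_CLASS : null;
    }

    // Returns the owner for a loan type, or null to leave the owner unchanged.
    static String ownerFor(String loanType) {
        if (loanType == null) {
            return null;
        }
        switch (loanType.toLowerCase(Locale.ROOT)) {
            case "home loan application":
                return "GEvans";
            case "auto loan application":
                return "EMesker";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("Class for a loan subject: " + targetClass("Loan Application"));
        System.out.println("Class for another subject: " + targetClass("Invoice"));
        System.out.println("Owner for an auto loan: " + ownerFor("Auto Loan Application"));
    }
}
```

A classify method could call targetClass first and invoke changeClass only when the result is not null, then call ownerFor before setting the owner.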
Creating a DocumentClassificationAction Object
A DocumentClassificationAction object identifies
the document classifier to be started when a document is checked in
with auto-classification enabled. The following Java and C# code examples show how to create
a DocumentClassificationAction object and set its
properties. The MimeType property, which the examples set to "text/pdf", associates the DocumentClassificationAction object
with documents of that MIME type.
When documents of this MIME type are checked in with auto-classification
enabled, the document classifier that is associated with this DocumentClassificationAction object will be started.
A document classifier is associated with a DocumentClassificationAction object
through the ProgId and, conditionally, CodeModule properties. For
a classifier that is implemented with JavaScript, you must set the ProgId property
to "Javascript". For a classifier that is implemented with Java, you must set the ProgId
property to the fully qualified name of the document classifier. The
following examples assume a classifier that is implemented with Java.
If, as shown in the examples, the document classifier is contained
within a CodeModule stored in an object store, you
must also get the CodeModule object, then assign
it to the CodeModule property of the DocumentClassificationAction object. Note that
you cannot set the CodeModule property to a reservation (in progress)
version of CodeModule. For more information, see Creating a CodeModule Object.
When saved, a DocumentClassificationAction object
is stored in the Document Classification Actions folder of a Content Engine object store.
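The ProgId and CodeModule rules described above can be restated as a small lookup. This is only a sketch of those rules in plain Java. The class ClassifierBindingRules, the Packaging enum, and the method names are hypothetical and do not exist in the Content Engine API.

```java
// Hypothetical helper, not part of the Content Engine API. It restates the
// ProgId and CodeModule rules from the text as plain Java.
final class ClassifierBindingRules {

    // How the classifier is packaged (names invented for this sketch).
    enum Packaging { JAVASCRIPT, JAVA_IN_CODE_MODULE, JAVA_ON_CLASS_PATH }

    // Returns the value to assign to the ProgId property.
    static String progIdFor(Packaging packaging, String qualifiedClassName) {
        if (packaging == Packaging.JAVASCRIPT) {
            return "Javascript";
        }
        if (qualifiedClassName == null || qualifiedClassName.isEmpty()) {
            throw new IllegalArgumentException(
                    "A Java classifier needs its fully qualified class name");
        }
        return qualifiedClassName;
    }

    // Returns true when the CodeModule property must also be set.
    static boolean needsCodeModule(Packaging packaging) {
        return packaging == Packaging.JAVA_IN_CODE_MODULE;
    }

    public static void main(String[] args) {
        String className = "sample.actionhandler.DocClassifyHandler";
        for (Packaging packaging : Packaging.values()) {
            System.out.println(packaging
                    + ": ProgId=" + progIdFor(packaging, className)
                    + ", CodeModule required=" + needsCodeModule(packaging));
        }
    }
}
```

The examples that follow correspond to the JAVA_IN_CODE_MODULE case: ProgId holds the fully qualified class name and the CodeModule property is set.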
Java Example
...
// Create document classification action.
DocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.createInstance(os,
ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
// Set MIME type that associates action to documents of same MIME type.
docClassAction.set_MimeType("text/pdf");
// Set ProgId property with fully qualified name of classifier.
docClassAction.set_ProgId("sample.actionhandler.DocClassifyHandler");
// Get CodeModule object.
CodeModule cm = Factory.CodeModule.getInstance( os,
ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}") );
// Set CodeModule property.
docClassAction.set_CodeModule(cm);
docClassAction.set_DisplayName("DocumentClassificationAction");
docClassAction.save(RefreshMode.REFRESH);
}
C# Example
...
// Create document classification action.
IDocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.CreateInstance(os,
ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
// Set MIME type that associates action to documents of same MIME type.
docClassAction.MimeType = "text/pdf";
// Set ProgId property with fully qualified name of classifier.
docClassAction.ProgId = "sample.actionhandler.DocClassifyHandler";
// Get CodeModule object.
ICodeModule cm = Factory.CodeModule.GetInstance( os,
ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}"));
// Set CodeModule property.
docClassAction.CodeModule = cm;
docClassAction.DisplayName = "DocumentClassificationAction";
docClassAction.Save(RefreshMode.REFRESH);
}
Retrieving DocumentClassificationAction Objects
You can get a single DocumentClassificationAction object with a Factory.DocumentClassificationAction method.
You can also get a collection of DocumentClassificationAction objects
(DocumentClassificationActionSet) by retrieving the
DocumentClassificationActions property on an ObjectStore object.
The following Java and C# examples show how to retrieve a DocumentClassificationActionSet collection
from an object store. The examples iterate the set and, for each DocumentClassificationAction object
in the collection, retrieve the object's MimeType, ProgId,
and CodeModule properties. Note that a document classifier referenced
by a DocumentClassificationAction object might not
be contained within a CodeModule stored in an object
store. This scenario occurs for a classifier that is implemented with
JavaScript, or with Java when the classifier is specified in the class
path of the application server where the Content Engine is running.
Java Example
...
DocumentClassificationActionSet actionSet = os.get_DocumentClassificationActions();
DocumentClassificationAction actionObject;
Iterator iter = actionSet.iterator();
while ( iter.hasNext() )
{
actionObject = (DocumentClassificationAction)iter.next();
System.out.println("DocumentClassificationAction: " +
actionObject.get_DisplayName() +
"\n MimeType is " + actionObject.get_MimeType() +
"\n ProgId is " + actionObject.get_ProgId() );
String cmName = actionObject.get_CodeModule() != null ?
actionObject.get_CodeModule().getProperties().getStringValue("Name") :
"not assigned to this action";
System.out.println(" CodeModule is " + cmName);
}
}
C# Example
...
IDocumentClassificationActionSet actionSet = os.DocumentClassificationActions;
IDocumentClassificationAction actionObject;
System.Collections.IEnumerator iter = actionSet.GetEnumerator();
while (iter.MoveNext())
{
actionObject = (IDocumentClassificationAction)iter.Current;
System.Console.WriteLine("IDocumentClassificationAction: " +
actionObject.DisplayName +
"\n MimeType is " + actionObject.MimeType +
"\n ProgId is " + actionObject.ProgId );
String cmName = actionObject.CodeModule != null ?
actionObject.CodeModule.Properties.GetStringValue("Name") :
"not assigned to this action";
System.Console.WriteLine(" CodeModule is " + cmName);
}
}
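Once the MimeType, ProgId, and CodeModule values have been read as in the examples above, an application can check whether any action covers a given MIME type before it checks in a document with auto-classification enabled. The following is a minimal sketch of that check. ActionSummary is a hypothetical stand-in for the retrieved property values, and the exact, case-sensitive comparison is an assumption of this sketch rather than documented Content Engine behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the property values read from each action.
final class ActionSummary {
    final String displayName;
    final String mimeType;
    final String progId;
    final String codeModuleName; // null when no CodeModule is assigned

    ActionSummary(String displayName, String mimeType, String progId, String codeModuleName) {
        this.displayName = displayName;
        this.mimeType = mimeType;
        this.progId = progId;
        this.codeModuleName = codeModuleName;
    }

    // Returns the summaries whose MIME type equals the given value.
    // Exact, case-sensitive matching is an assumption of this sketch.
    static List<ActionSummary> forMimeType(List<ActionSummary> actions, String mimeType) {
        List<ActionSummary> matches = new ArrayList<>();
        for (ActionSummary action : actions) {
            if (mimeType != null && mimeType.equals(action.mimeType)) {
                matches.add(action);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample values only; the code module name is a placeholder.
        List<ActionSummary> actions = Arrays.asList(
                new ActionSummary("DocumentClassificationAction", "text/pdf",
                        "sample.actionhandler.DocClassifyHandler", "placeholder code module"),
                new ActionSummary("Script classifier", "text/xml", "Javascript", null));

        for (ActionSummary match : forMimeType(actions, "text/pdf")) {
            System.out.println(match.displayName + " handles text/pdf via " + match.progId);
        }
        System.out.println("Coverage for image/png: "
                + !forMimeType(actions, "image/png").isEmpty());
    }
}
```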
Auto-classifying a Document
You can automatically classify documents
with MIME types for which a classification infrastructure has previously been
set up. That is, for a particular MIME type, a corresponding document
classifier and a DocumentClassificationAction object
must exist.
The following Java and C# examples show how to submit a document of MIME type "text/pdf" for automatic classification.
In the code examples, a Document object is created for a PDF document,
and the object's properties are set, most notably the ContentElements property and the MimeType
property. The ContentElements property is set to the PDF content of the document, and this content
is later parsed by the document classifier. The value of the Document object's
MimeType property must match the value of the DocumentClassificationAction object's
MimeType property. The Document object is then checked in, with the checkin method that specifies the AUTO_CLASSIFY constant.
The examples also include code to monitor the classification process by reading the checked-in document's ClassificationStatus property, which is set to a DocClassificationStatus constant. When an auto-classification request is made, the initial ClassificationStatus value is CLASSIFICATION_PENDING. The code repeatedly refreshes the document and checks the status until the property's value changes.
Because a document classifier runs as an asynchronous action, an auto-classification request is initially queued, and represented by a DocumentClassificationQueueItem object. This queued state corresponds with the CLASSIFICATION_PENDING status.
Java Example
...
Document doc = Factory.Document.createInstance(os, "Document");
FileInputStream fileIS = new FileInputStream("C:\\EclipseWorkspace\\Documents\\loanapplication.pdf");
// Create content transfer list.
ContentTransferList contentList = Factory.ContentTransfer.createList();
ContentTransfer ctNew = Factory.ContentTransfer.createInstance();
ctNew.setCaptureSource(fileIS);
contentList.add(ctNew);
// Set content on Document object.
doc.set_ContentElements(contentList);
// Set Document properties.
doc.getProperties().putValue("DocumentTitle", "PDF Document");
doc.set_MimeType("text/pdf");
// Check in document and commit to server.
doc.checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.save(RefreshMode.REFRESH);
// Check classification status during auto classify.
while (doc.get_ClassificationStatus() == DocClassificationStatus.CLASSIFICATION_PENDING)
{
System.out.println( "Classification status is " + doc.get_ClassificationStatus() );
doc.refresh();
}
System.out.println("Classification status is " + doc.get_ClassificationStatus() );
}
C# Example
...
IDocument doc = Factory.Document.CreateInstance(os, "Document");
Stream fileStream = File.OpenRead(@"C:\EclipseWorkspace\Documents\loanapplication.pdf");
// Create content transfer list.
IContentTransferList contentList = Factory.ContentTransfer.CreateList();
IContentTransfer ctNew = Factory.ContentTransfer.CreateInstance();
ctNew.SetCaptureSource(fileStream);
contentList.Add(ctNew);
// Set content on Document object.
doc.ContentElements = contentList;
// Set Document properties.
doc.Properties["DocumentTitle"] = "PDF Document";
doc.MimeType = "text/pdf";
// Check in document and commit to server.
doc.Checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.Save(RefreshMode.REFRESH);
// Check classification status during auto classify.
while (doc.ClassificationStatus == DocClassificationStatus.CLASSIFICATION_PENDING)
{
System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
doc.Refresh();
}
System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
}
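The polling loops in the examples above refresh the document without pausing while the status is CLASSIFICATION_PENDING, and they do not stop if the request stays queued. A bounded variant with a delay between reads might look like the following sketch. ClassificationPoller and waitForResult are hypothetical names, the status is modeled as plain text supplied by the caller, and "FINAL_STATUS" is a placeholder used only by the simulation. In a real application the supplier would refresh the Document object and return its ClassificationStatus value as text.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the Content Engine API.
final class ClassificationPoller {

    // Text form of the pending status named in the text above.
    static final String PENDING = "CLASSIFICATION_PENDING";

    // Reads the status up to maxAttempts times, sleeping between reads,
    // and returns the last status seen. If the result is still PENDING,
    // the caller decides whether to keep waiting.
    static String waitForResult(Supplier<String> statusSource, long delayMillis, int maxAttempts) {
        String status = statusSource.get();
        for (int attempt = 1; attempt < maxAttempts && PENDING.equals(status); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return status;
            }
            status = statusSource.get();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated status source: pending twice, then a placeholder final value.
        final String[] statuses = { PENDING, PENDING, "FINAL_STATUS" };
        final int[] next = { 0 };
        Supplier<String> simulated =
                () -> statuses[Math.min(next[0]++, statuses.length - 1)];

        System.out.println("Classification status is " + waitForResult(simulated, 10L, 5));
    }
}
```

The delay and attempt limit are tuning choices. Because classification runs asynchronously on the server, a longer delay reduces round trips at the cost of noticing the result later.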