The DAF Platform™ Classification functionality from Concept Searching Inc. gives users a powerful new tool for working with millions of records and hundreds of terabytes of content.
The Classifier option for the platform provides automatic classification during the content acquisition process. The Classifier is an integral part of the advanced indexing and search technology supported by the DAF Platform and is subject to a separate license for its support.
Classification is the allocation of items into groups according to type; automatic classification is the process of classifying items in an automated manner. The Classifier supports the automatic classification (allocation) of documents into groups or classes based on their determined content. Unlike structured or numeric records, documents represent natural language as unstructured textual or visual data. These elements present a unique set of challenges in determining their inherent content. The following terms are helpful in understanding the technology of classification.
- Natural language processing - the use of linguistic analytics to infer meaning from grouped words.
- Ontology - the relationships between language and logic, which pave a logical avenue to highly accurate automated document classification.
- Taxonomies - predetermined hierarchies of industry- or domain-related terms that, when applied to logical classification, further improve the accuracy of automated document classification.
From a classification perspective, the words spud and oil in a food and beverage industry classification would likely relate to potatoes, vegetable oil, and potato chips. The same words classified under an energy industry taxonomy would produce document sets related to holes and petroleum. In DAF we refer to the development and implementation of these elements as “building a vocabulary”.
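The spud and oil example can be sketched as a lookup against industry-specific vocabularies. This is only an illustration: the `VOCABULARIES` table, its class names, and the `interpret` function are hypothetical and are not part of the DAF Platform.

```python
# Hypothetical vocabularies: each maps a term to the class it suggests
# within one industry taxonomy. The entries are illustrative only.
VOCABULARIES = {
    "food_and_beverage": {"spud": "Potatoes", "oil": "Vegetable Oil"},
    "energy": {"spud": "Well Drilling", "oil": "Petroleum"},
}


def interpret(terms, industry):
    """Return the classes the given terms suggest under one industry vocabulary."""
    vocabulary = VOCABULARIES[industry]
    return sorted({vocabulary[term] for term in terms if term in vocabulary})
```

Calling `interpret(["spud", "oil"], "energy")` and then the same terms with `"food_and_beverage"` returns two different class lists, which is the effect the vocabulary is meant to achieve.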
Automatic classification of documents adds structure to the data source based on the retrieval functionality and information management requirements of the user’s research objectives. Once documents have been classified, the Virtual Fileroom™ application allows users to browse the document collection using an expanding tree view that represents the taxonomy structure. Taxonomies can also be used as a method of filtering search requests so that results are restricted to a selected node in the hierarchy.
Another use of the automatic classification process is to automatically tag business records with controlled metadata to increase the quality of the search results. Using the Taxonomy management tool, a controlled vocabulary can be created and used not only to implement the automatic classification process but also to implement the automatic tagging of business records for your Records Manager.
A third use of the automatic classification process is to present users, within a portal environment or application, with the most recent documents on subjects relevant to each user's profile.
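Restricting search results to a selected taxonomy node can be sketched as follows. The sketch assumes that selecting a node also includes its descendants; the `PARENT` table, node names, and function names are hypothetical and do not describe the Virtual Fileroom™ implementation.

```python
# Hypothetical taxonomy: each node name maps to its parent (None for the root).
PARENT = {
    "Energy": None,
    "Drilling": "Energy",
    "Offshore Drilling": "Drilling",
    "Refining": "Energy",
}


def is_under(node, selected):
    """True if node is the selected node or one of its descendants."""
    while node is not None:
        if node == selected:
            return True
        node = PARENT.get(node)
    return False


def filter_results(results, selected):
    """Keep search hits that have at least one class under the selected node.

    `results` is a list of (document_id, classes) pairs.
    """
    return [
        doc_id
        for doc_id, classes in results
        if any(is_under(cls, selected) for cls in classes)
    ]
```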
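A minimal sketch of this third use, assuming each document record carries its assigned classes and an index date. The record layout (`id`, `classes`, `indexed`) and the `fresh_documents` function are hypothetical.

```python
from datetime import date


def fresh_documents(documents, profile_subjects, limit=3):
    """Return ids of the newest documents classified under any profile subject.

    `documents` is a list of dicts with hypothetical keys: "id", "classes"
    (a set of class names), and "indexed" (a date). `profile_subjects` is a set.
    """
    matching = [doc for doc in documents if profile_subjects & doc["classes"]]
    matching.sort(key=lambda doc: doc["indexed"], reverse=True)
    return [doc["id"] for doc in matching[:limit]]
```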
The Classifier Engine
The Classifier is a leading-edge, rules-based categorization module providing customers complete control of rules-based descriptors unique to their implementation.
The Classifier provides a categorization descriptor table that is easy to implement and maintain, and through which all rules and terms can be defined and managed. This approach eliminates the error-prone results of "training" algorithms typically found in other text retrieval solutions.
The Classifier automatically identifies, as part of the indexing process, the categories to which each incoming document belongs. Each category is identified by a unique descriptor and is associated with key descriptive words and/or phrases held in the database.
Finally, the Classifier enables subject matter experts to quickly implement a vocabulary (taxonomy) in which all documents are automatically classified (assigned) to multiple nodes at index time. The vocabulary may then be used as a way of browsing the document collection or as a filter when running ad hoc text-based searches. This technology can also be used as part of a SharePoint® solution, allowing you to leverage your investment.
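The idea of a descriptor table can be illustrated with a small sketch in which each category descriptor is associated with key words and phrases, and a document is assigned to every category with at least one match. The `DESCRIPTORS` table, its entries, and the matching rule are assumptions made for illustration; the actual rule syntax of the Classifier is not described here.

```python
import re

# Hypothetical descriptor table: category descriptor -> key words and phrases.
DESCRIPTORS = {
    "ENERGY/DRILLING": ["spud", "drill bit", "well bore"],
    "ENERGY/PETROLEUM": ["crude oil", "petroleum"],
    "FOOD/SNACKS": ["potato chips", "vegetable oil"],
}


def classify(text):
    """Return every category whose descriptor has a word or phrase in the text."""
    # Lowercase and strip punctuation so phrases match on whole words only.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = f" {normalized} "
    return sorted(
        category
        for category, phrases in DESCRIPTORS.items()
        if any(f" {phrase} " in padded for phrase in phrases)
    )
```

Because every matching category is returned, a single document can land in several classes at once, and a document with no matches is simply left unclassified.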
Key points to note. The Engine:
- is a rules-based engine.
- overcomes the inherent issues with engines requiring training sets that tend to extract erroneous data and reduce accuracy.
- automatically classifies documents to multiple classifications.
- offers the users the ability to browse via one or more vocabularies.
- enables the use of Boolean filters to automatically limit the results to a selected classification.
The Taxonomy Manager
The conceptTaxonomyManager is an application from Concept Searching designed specifically to build and manage taxonomies that work with the Classification option to automatically classify and automatically tag business records. This application is also used with conceptClassifier for SharePoint® to provide automatic classification and automatic tagging of business records in the SharePoint® environment.
The Taxonomy Manager application provides fully automatic generation of taxonomy node clues from compound terms found in the document corpus. The advantage is that there is no need for training sets or complex Boolean rules. This delivers a much faster ROI when used in conjunction with the Categorization and Classification option of the DAF Platform.
Generating related topics by extracting compound terms from your document set has been proven to significantly reduce the amount of time required to create a taxonomy. In addition, the output is normal text that business staff can understand and further extend.
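To give a feel for what extracting compound terms from a document set can look like, here is a deliberately simple sketch that counts adjacent word pairs and keeps those that recur. It is a plain frequency heuristic chosen for illustration, not the method used by the Taxonomy Manager; the `STOPWORDS` list and the `compound_terms` function are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; pairs containing these words are ignored.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def compound_terms(documents, min_count=2):
    """Return two-word terms that occur at least min_count times across documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return sorted(term for term, n in counts.items() if n >= min_count)
```

The output is a list of ordinary phrases, which fits the point above that business staff can read the suggested terms and extend them by hand.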
The Taxonomy Manager Benefits:
- Used to build and maintain taxonomies
- No need for training sets
- No need for complex Boolean rules
- Provides Folksonomy support
- Technology is delivered as web parts for support in SharePoint®
- Fully SOA compliant services for automatic classification and taxonomy management
- Based on open standards
- Simple to install, maintain, and administer
- Easy to use by subject matter experts
Visit our Document Library for more Concept Searching offerings.