CDI Software

DAF Platform™ Classification functionality from Concept Searching Inc. gives users a powerful new tool for working with millions of records and hundreds of terabytes of data.

The Classifier option for the platform provides automatic classification during the content acquisition process. The Classifier is an integral part of the advanced indexing and search technology supported by the DAF Platform and is licensed separately.

Classification is the allocation of items into groups according to type; automatic classification is the process of classifying items in an automated manner. The Classifier supports the automatic classification (allocation) of documents into groups or classes based on their determined content. Unlike structured or numeric records, documents represent natural language as unstructured textual or visual data. These elements present a unique set of challenges in determining their inherent content. The following terms are helpful in understanding the technology of classification.

Natural language processing - the use of linguistic analytics to infer meaning from grouped words.
Ontology - the relationships between language and logic, which pave a logical avenue to highly accurate automated document classification.
Taxonomies - pre-determined hierarchies of industry or domain related terms that, when applied to logical classification, further improve the accuracy of automated document classification.

From a classification perspective, the words spud and oil in a food and beverage industry classification would likely relate to potatoes, vegetable oil, and potato chips. The same words classified under an energy industry taxonomy would produce document sets related to drilling holes and petroleum. In DAF we refer to the development and implementation of these elements as “building a vocabulary”.
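To make the vocabulary idea concrete, the following is a minimal Python sketch of how the same text can fall into different classes depending on which vocabulary is applied. The vocabularies, class names, and the `classify` function are hypothetical illustrations; they are not the DAF Platform or Concept Searching API, and real vocabularies would also handle phrases and hierarchy.

```python
import re

# Hypothetical vocabularies: each maps a class name to single-word clue terms.
FOOD_VOCABULARY = {
    "Potato Products": {"spud", "potato", "chips"},
    "Cooking Oils": {"oil", "frying"},
}
ENERGY_VOCABULARY = {
    "Drilling Operations": {"spud", "well", "rig"},
    "Petroleum": {"oil", "crude", "barrel"},
}


def classify(text, vocabulary):
    """Return the sorted class names whose clue terms occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, clues in vocabulary.items() if words & clues)


text = "The spud date was confirmed before the oil shipment."
print(classify(text, FOOD_VOCABULARY))    # ['Cooking Oils', 'Potato Products']
print(classify(text, ENERGY_VOCABULARY))  # ['Drilling Operations', 'Petroleum']
```

The document is identical in both calls; only the vocabulary changes, which is why the choice of taxonomy drives the resulting document sets.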

Automatic classification of documents adds structure to the data source based on the retrieval functionality and information management requirements of the user’s research objectives. Once documents have been classified, the Virtual Fileroom™ application allows users to browse the document collection using an expanding tree view that represents the taxonomy structure. Taxonomies can also be used to filter search requests so that results are restricted to a selected node in the hierarchy.

Another use of the automatic classification process is to automatically tag business records with controlled metadata to increase the quality of the search results. Using the Taxonomy management tool, a controlled vocabulary can be created and used not only to implement the automatic classification process but also to implement the automatic tagging of business records for your Records Manager.

A third use of the automatic classification process is to present users, within a portal environment or application, with the most recent documents on subjects relevant to each user's profile.

The Classifier Engine

The Classifier is a leading-edge, rules-based categorization module providing customers complete control of rules-based descriptors unique to their implementation.

The Classifier provides a categorization descriptor table that is easy to implement and maintain, and through which all rules and terms can be defined and managed. This approach eliminates the error-prone results of "training" algorithms typically found in other text retrieval solutions.

The Classifier automatically identifies, as part of the indexing process, the categories to which each incoming document belongs. Each category is identified by a unique descriptor and is associated with key descriptive words and/or phrases held in the database.

Finally, the Classifier enables subject matter experts to quickly implement a vocabulary (taxonomy) in which all documents are automatically classified (assigned) to multiple nodes at index time. The vocabulary may then be used as a way of browsing the document collection or as a filter when running ad hoc text-based searches. This technology can also be used as part of a SharePoint® solution, allowing you to leverage your investment.

Key points to note. The Engine:

is a rules-based engine.
overcomes the inherent issues with engines that require training sets, which tend to extract erroneous data and reduce accuracy.
automatically assigns documents to multiple classifications.
offers users the ability to browse via one or more vocabularies.
enables the use of Boolean filters to automatically limit results to a selected classification.
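As an illustration of the last three points, the sketch below shows documents assigned to several taxonomy nodes at index time and a search that is optionally restricted to a selected node and its descendants. It is a simplified Python model under stated assumptions: the `Document` class, the slash-separated node paths, and the `search` function are hypothetical and do not reflect the actual Classifier interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Taxonomy nodes assigned at index time, written as slash-separated paths.
    nodes: list = field(default_factory=list)


def under_node(doc, selected):
    """True if any assigned node is the selected node or one of its descendants."""
    return any(n == selected or n.startswith(selected + "/") for n in doc.nodes)


def search(documents, term, selected_node=None):
    """Plain substring search, optionally restricted to a taxonomy node."""
    hits = [d for d in documents if term.lower() in d.text.lower()]
    if selected_node is not None:
        hits = [d for d in hits if under_node(d, selected_node)]
    return [d.doc_id for d in hits]


docs = [
    Document("d1", "Oil well spud report", ["Energy/Drilling", "Energy/Petroleum"]),
    Document("d2", "Vegetable oil pricing", ["Food/Cooking Oils"]),
    Document("d3", "Crude oil contract", ["Energy/Petroleum", "Legal/Contracts"]),
]

print(search(docs, "oil"))                     # ['d1', 'd2', 'd3']
print(search(docs, "oil", "Energy"))           # ['d1', 'd3']
print(search(docs, "oil", "Energy/Drilling"))  # ['d1']
```

Selecting a higher node such as `Energy` widens the filter to everything beneath it, which mirrors browsing a tree view and then searching within the chosen branch.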

The Taxonomy Manager

The conceptTaxonomyManager is an application from Concept Searching designed specifically to build and manage taxonomies that work with the Classification option to automatically classify and tag business records. This application is also used with conceptClassifier for SharePoint® to provide automatic classification and automatic tagging of business records in the SharePoint® environment.

The Taxonomy Manager application provides fully automatic generation of taxonomy node clues from compound terms found in the document corpus. The advantage is that there is no need for training sets or complex Boolean rules. This delivers a much faster ROI when used in conjunction with the Categorization and Classification option from DAF Platform.

Generating related topics by extracting compound terms from your document set has been proven to significantly reduce the amount of time required to create a taxonomy. In addition, the output is normal text that business staff can understand and further extend.
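The general idea of mining compound terms from a corpus can be sketched in a few lines of Python. This is only an approximation for illustration: it treats a compound term as a pair of adjacent non-stopword words that recurs across the corpus. The stopword list, the `compound_terms` function, and the frequency threshold are assumptions; the actual technique used by the Taxonomy Manager is not described here.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "to", "is", "on", "was", "as"}


def compound_terms(corpus, min_count=2):
    """Return adjacent word pairs (no stopwords) seen at least min_count times."""
    counts = Counter()
    for text in corpus:
        words = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return sorted(term for term, n in counts.items() if n >= min_count)


corpus = [
    "The drilling rig reached the target depth.",
    "A second drilling rig was moved to the site.",
    "Crude oil prices rose as crude oil stocks fell.",
]

print(compound_terms(corpus))  # ['crude oil', 'drilling rig']
```

The output is ordinary text, so a subject matter expert could review candidate terms like these and attach them to taxonomy nodes as clues.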

The Taxonomy Manager Benefits:

Used to build and maintain taxonomies
No need for training sets
No need for complex Boolean rules
Provides Folksonomy support
Technology is delivered as web parts for support in SharePoint®
Fully SOA compliant services for automatic classification and taxonomy management
Based on open standards
Simple to install, maintain, and administer
Easy to use by subject matter experts

For more information on the Taxonomy Manager, click here, and visit our Document Library for more Concept Searching offerings.

The DAF Platform is designed to integrate easily with search engines through their APIs, allowing us to deliver the latest and most advanced knowledge retrieval technology for indexing and searching a wide range of distributed information sources.

The supported Text Search Engines process hundreds of document types stored on file servers, in GroupWare systems, relational databases, document management systems, Intranets, and the Internet. Today's technology excels in distributed client/server environments and scales to large numbers of digital assets and users.

We support users in a distributed architecture using the sophisticated search engine infrastructure to simultaneously access digital assets in multiple repositories. Depending on the search engine used, users can perform Concept, Pattern, Soundex or Boolean searches over all configured repositories.

The Text Search Engines currently supported are:

The Text Search Engines have client/server, SOA-compliant architectures. The architecture is made up of the following basic code components:

Client Handler - Handles client requests for queries (which are then passed to the search servers), for document text (which may be extracted, filtered, and reformatted before being returned to the user), and for document metadata.

Search Servers - These are the servers that actually perform the search. They can handle multiple simultaneous queries (multi-threaded) and search over multiple repositories of information. Search servers may be clustered to handle larger databases or larger user loads.

Highly scalable - DAF's Text Engine is highly scalable to meet customer requirements. As the data repositories grow, query servers can be added to maintain performance. At least one query server is required on every DAF system to execute the searches requested by clients. Each query server can execute queries over multiple repositories, and each server can handle multiple queries simultaneously (multi-threaded). When multiple query servers are used, their results are merged and presented to the client as a single result set. This architecture gives you flexibility in distributing tasks across multiple query servers.
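The pattern of querying several servers at once and merging their results can be sketched as follows. This is a minimal Python model, not DAF code: the `QueryServer` class is a hypothetical in-memory stand-in, the score is a simple term count, and `federated_search` is an assumed name for the merging step.

```python
from concurrent.futures import ThreadPoolExecutor


class QueryServer:
    """Hypothetical in-memory stand-in for a query server."""

    def __init__(self, repositories):
        # repositories: {repository name: {document id: text}}
        self.repositories = repositories

    def search(self, term):
        """Return (score, repository, doc_id) for each document containing the term."""
        term = term.lower()
        hits = []
        for repo, docs in self.repositories.items():
            for doc_id, text in docs.items():
                score = text.lower().split().count(term)
                if score:
                    hits.append((score, repo, doc_id))
        return hits


def federated_search(servers, term):
    """Query all servers in parallel and merge the hits, best score first."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        per_server = list(pool.map(lambda s: s.search(term), servers))
    merged = [hit for hits in per_server for hit in hits]
    return sorted(merged, key=lambda h: (-h[0], h[1], h[2]))


servers = [
    QueryServer({"contracts": {"c1": "oil lease oil rights", "c2": "office lease"}}),
    QueryServer({"reports": {"r1": "oil production report"}, "email": {"e1": "lunch"}}),
]

print(federated_search(servers, "oil"))
# [(2, 'contracts', 'c1'), (1, 'reports', 'r1')]
```

The client sees one ranked list regardless of how many servers or repositories contributed to it.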

Security is provided at the repository and document level based on Access Controls assigned to the content and users of the system. The DAF Administration module interacts with security configurations maintained in systems such as Active Directory and LDAP, making the information available to the Virtual FileRoom™ administrator for configuration into the system. FinderManager™ enforces document-level security at the time a search is executed, while repository security is enforced at login time. Field-level security is not available out of the box but may be provided based on specific implementation needs.

Repositories in the DAF Platform™ are easily integrated with other applications (external applications) used in your enterprise, whether home grown or from a third party. The access controls managed by external applications are made available to the DAF Administration module for configuration into the system. These external application repositories contain the security information forwarded from the external applications, allowing FinderManager™ to enforce it.

Security in the DAF Platform is complete and enforceable. In short, users may access only those documents for which they have privileges, and only in the repositories to which they have access.
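The two enforcement points described above, repository access resolved at login and document access checked at search time, can be modeled in a short Python sketch. All names here (`Session`, `login`, `search`, the group names, and the ACL layout) are hypothetical and chosen for illustration; they do not represent FinderManager™ or the DAF Administration module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    repository: str
    text: str
    allowed_groups: frozenset


@dataclass(frozen=True)
class Session:
    user: str
    groups: frozenset
    repositories: frozenset  # resolved once, at login time


def login(user, groups, repository_acl):
    """Repository-level check: keep only repositories the user's groups may open."""
    groups = frozenset(groups)
    allowed = frozenset(r for r, acl in repository_acl.items() if groups & acl)
    return Session(user, groups, allowed)


def search(session, documents, term):
    """Document-level check: return matching ids the session is allowed to see."""
    return [
        d.doc_id
        for d in documents
        if d.repository in session.repositories
        and session.groups & d.allowed_groups
        and term.lower() in d.text.lower()
    ]


repository_acl = {"hr": {"hr-staff"}, "legal": {"legal", "hr-staff"}}
documents = [
    Document("h1", "hr", "Salary review", frozenset({"hr-staff"})),
    Document("l1", "legal", "Salary dispute", frozenset({"legal"})),
    Document("l2", "legal", "Salary policy", frozenset({"legal", "hr-staff"})),
]

hr_user = login("ana", ["hr-staff"], repository_acl)
legal_user = login("bo", ["legal"], repository_acl)
print(search(hr_user, documents, "salary"))     # ['h1', 'l2']
print(search(legal_user, documents, "salary"))  # ['l1', 'l2']
```

In this model a document is returned only when both checks pass, so access to a repository does not by itself expose every document inside it.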

Available Contracts