Content in Search and Inference Engines

Written by Angel Sancho Ferrer on August 15, 2012

“Actionable Content,” as Jack Lynch explained in a recent post, “means moving from a model where we structure content through meta tags and taxonomies to make it more ‘discoverable,’ to a model where we continue to structure for discoverability but also use Natural Language Processing, Text Analytics and domain expertise to make the content more ‘actionable’.”

The integration of content and software is a topic as broad as the creation of Artificial Intelligence itself, and it is closely tied to the core assets of Wolters Kluwer: enriched content, and algorithms that understand its special structures, for research or workflow tools.

There are three broad categories of content.

  1. Documents, created for human consumption (the unstructured information in primary and secondary sources);
  2. Complementary structures, such as metadata and indexes, to help human searches, initially on paper;
  3. New kinds of structured content, such as graphs, rules, or ontologies, created for consumption by sophisticated software rather than by humans.

Inference engines versus search engines

A search engine is like a librarian who assembles a dossier of photocopies with the relevant parts highlighted, so that there is less information to work through. A more proactive adviser could hold a dialog with the user to refine the information request, and reason internally to present something closer to being actionable.

In terms of input and output, the differences between the two main knowledge technologies can be summarized as follows.

Search technologies, as powerful as they have proven to be, have limits. They:

  • are reactive, and so depend on the quality of the query;
  • cannot create information, just select the best documents and fragments (without modifying them, just copy and paste).

Rule systems, on the other hand:

  • Can follow a dialog-based approach to obtain more information from the user, or even from another software system, changing their internal state and strategy as they go.
  • Can create information that was not there (i.e. a computable document).

Both use content, and their algorithms must be designed and evolved to leverage each kind of data structure available.
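To make the contrast concrete, here is a minimal sketch in Python. The tiny corpus, the rule base, and the function names `search` and `advise` are all invented for illustration; they are not taken from any Wolters Kluwer product. The search function can only select existing documents, while the rule system asks for the facts it needs and produces a conclusion that appears in no document.

```python
# Hypothetical toy corpus and rule base, invented for this illustration.
DOCUMENTS = {
    "doc1": "VAT registration is required above the turnover threshold.",
    "doc2": "Small companies may apply for a simplified VAT scheme.",
    "doc3": "Payroll taxes are due monthly.",
}

def search(query):
    """Reactive: rank existing documents by term overlap, unchanged."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in DOCUMENTS.items():
        words = set(text.lower().replace(".", "").split())
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Each rule: (facts that must hold, (concluded fact, value)).
RULES = [
    ({"turnover_above_threshold": True},
     ("must_register_vat", True)),
    ({"turnover_above_threshold": False, "is_small_company": True},
     ("eligible_simplified_scheme", True)),
]

def advise(ask):
    """Dialog-based: call ask(fact_name) only for facts the rules need."""
    facts = {}
    conclusions = {}
    for conditions, (name, value) in RULES:
        satisfied = True
        for key, expected in conditions.items():
            if key not in facts:
                facts[key] = ask(key)  # the "dialog" with the user
            if facts[key] != expected:
                satisfied = False
                break
        if satisfied:
            conclusions[name] = value  # new information, in no document
    return conclusions
```

Here `ask` could be a prompt to the user or a call to another system. Note that the adviser stops asking as soon as a rule is ruled out, which is one simple way the dialog changes with the internal state.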

Content for search

In the beginning, the search technology was the content itself: metadata, topical indexes, and tables of contents that allowed humans to discover what was inside the repositories of information.

This value-added content can also be leveraged by search engines, creating experiences that simply could not exist on paper:

  • Build more complex queries and post-search filters (facets);
  • Define business models (system queries);
  • Help define what a “relevant” document is, with the understanding of available or computed metadata, at index or search time;
  • Improve the extraction of better fragments with the knowledge of the inner structure of the document;
  • Improve the query quality with semantic dictionaries and natural language processing algorithms;
  • Add new functionality, such as suggested queries or documents.

Content is necessary to improve the quality of the query, the results, and chunk extraction, but new algorithms and new types of auxiliary content must also be created.
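Two of the ideas above, facets and semantic dictionaries, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `SYNONYMS` dictionary, the document metadata fields, and the function names are hypothetical, and a real search engine would use an inverted index rather than a linear scan.

```python
from collections import Counter

# Hypothetical semantic dictionary: term -> equivalent expressions.
SYNONYMS = {"vat": {"value added tax"}, "firm": {"company"}}

# Hypothetical documents carrying editorial metadata (type, year).
DOCS = [
    {"id": 1, "text": "value added tax rules for a company",
     "type": "commentary", "year": 2011},
    {"id": 2, "text": "vat ruling on imports",
     "type": "case law", "year": 2012},
    {"id": 3, "text": "payroll guidance for a firm",
     "type": "commentary", "year": 2012},
]

def expand(query):
    """Improve the query with the semantic dictionary."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms

def search(query, **facets):
    """Match expanded terms, then apply metadata filters (facets)."""
    terms = expand(query)
    # Naive substring match, so that multi-word synonyms also match.
    hits = [d for d in DOCS if any(t in d["text"] for t in terms)]
    return [d for d in hits if all(d.get(k) == v for k, v in facets.items())]

def facet_counts(hits, field):
    """Count hits per metadata value, as shown next to a result list."""
    return Counter(d[field] for d in hits)
```

For example, `search("vat")` also finds the document that only says "value added tax", and `search("vat", year=2012)` narrows the result with a facet. Neither behavior is possible without the auxiliary content: the dictionary and the metadata.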

The driving forces for the next generation of search engines are:

  1. How to leverage existing CMS to improve search (e.g. fancy hits, authority, document types, or fragments);
  2. What new content to create (e.g. semantic dictionaries);
  3. New functionalities that anticipate patterns in user behavior (e.g. suggestions, best bets, dialogs).

Content for artificial intelligence

Creating an inspiring and natural artificial intelligence system is not as easy as it seems: “The expert system has a major flaw, which explains its low success despite the principle having existed for 70 years: knowledge collection and its interpretation into rules, or knowledge engineering.” There are plenty of problems still to be solved:

  • Knowledge acquisition – Humans don’t know what they know, and even when there are “formal rules” it can be difficult to translate them into programs.
  • Knowledge representation – Conrad Wolfram said at the Wolters Kluwer 2012 Technology Conference that Wolfram Alpha cannot simply use one rule or programming language to encode its computable documents.
  • Verifying knowledge base completeness and consistency – Because of their “actionable” nature, these knowledge encoding systems are more like a software system than a CMS, with the added problem that a rule system, like a legal system, can have loopholes, ambiguities, overlaps and inconsistencies.
  • Creation of inference engines that know how to traverse the chosen representation of rules, facts, ontologies, goals and meta-rules.
  • Business models – Because these are highly targeted solutions, the market for each one is limited, which makes it difficult to justify the high investment needed for this new kind of content encoding.
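The completeness and consistency problem can be illustrated with a small Python sketch. The rules below are hypothetical and deliberately flawed, and the brute-force approach (try every combination of boolean facts) is only feasible for toy rule sets; it is meant to show what a "loophole" and an "inconsistency" look like in data, not how a production verifier works.

```python
from itertools import product

# Hypothetical rules: (conditions, (attribute, concluded value)).
RULES = [
    ({"resident": True}, ("taxable", True)),
    ({"resident": False, "local_income": True}, ("taxable", True)),
    ({"diplomat": True}, ("taxable", False)),
]

def audit(rules):
    """Enumerate every case and report gaps and contradictions."""
    facts = sorted({name for conditions, _ in rules for name in conditions})
    gaps, conflicts = [], []
    for values in product([False, True], repeat=len(facts)):
        case = dict(zip(facts, values))
        # Conclusions of all rules whose conditions hold in this case.
        outcomes = {
            conclusion
            for conditions, conclusion in rules
            if all(case[k] == v for k, v in conditions.items())
        }
        if not outcomes:
            gaps.append(case)       # loophole: no rule applies
        elif len({attr for attr, _ in outcomes}) < len(outcomes):
            conflicts.append(case)  # same attribute, different values
    return gaps, conflicts
```

Running `audit(RULES)` on this rule base reports one gap (a non-resident, non-diplomat with no local income is covered by no rule) and three conflicts, all involving diplomats, for whom two rules disagree. An editor would then have to decide which rule takes priority, which is exactly the kind of work that makes this content closer to software than to documents.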

However, Wolfram Alpha, Apple Siri and IBM Watson have demonstrated that Artificial Intelligence is moving past the plateau it seemed to have hit.

Advanced technology trends

In the medium term, products will certainly be different, because knowledge workers still have many tasks that need to be improved, and information technologies have plenty of room to develop:

  1. New types of structured content (documents, metadata, advanced structures), through editors or software.
  2. New algorithms, to leverage those structures for search or computing, and others to extract structure from the unstructured text. Main research products will move from reactive search to a more proactive paradigm, making search behave more like an expert system.
  3. New content, which needs new people skills and methodologies.

This whole range of opportunities is central to Wolters Kluwer's core competencies, so there are clear driving forces shaping the next generation of professional information tools.


