Linking NPG ontologies to external datasets


We (Macmillan Science and Education) have begun linking our domain models to external datasets. Our article-types, journals and subjects models are now linked to DBpedia (Wikipedia) and Wikidata, and our subjects model is additionally linked to Bio2RDF and MeSH.

Our core model is also now linked to a number of other external models (CIDOC, FaBiO, schema.org, etc.).

Our models, now including these links, are available to view or download at nature.com/ontologies.
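
For readers who want to explore those links programmatically, the sketch below (Python with rdflib) loads a downloaded copy of one of the models and prints its cross-dataset links. The file name and the choice of linking predicates (skos:exactMatch, skos:closeMatch, owl:sameAs) are assumptions – check the downloaded data for the predicates actually used.

    # Hedged sketch: list the external links in a locally downloaded model.
    # "npg-subjects-ontology.ttl" is a placeholder file name.
    from rdflib import Graph
    from rdflib.namespace import OWL, SKOS

    g = Graph()
    g.parse("npg-subjects-ontology.ttl", format="turtle")

    # The linking predicates below are assumptions; adjust to match the data.
    for predicate in (SKOS.exactMatch, SKOS.closeMatch, OWL.sameAs):
        for concept, target in g.subject_objects(predicate):
            label = g.value(concept, SKOS.prefLabel)
            print(label, predicate, target, sep="  ")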

We will continue to refine and expand these links and would be interested in any thoughts, ideas and feedback from the community, particularly around any additional datasets we should consider linking to.

NPG Article Types Ontology

The NPG Article Types Ontology is a categorization of the kinds of publication used to index and group content published by Springer Nature. The taxonomy is organised into a single tree using the SKOS vocabulary. It includes article-types that are applied directly to content, such as Article, Review Article, News, or Book Review, plus higher-level groupings such as Research, News and Comment, or Amendments and Corrections.
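
As a concrete illustration of that shape – not the published data itself – the fragment below builds a tiny SKOS tree in Python with rdflib, with a leaf article-type sitting under a higher-level grouping. The URIs are placeholders, not the identifiers used in the ontology.

    # Illustrative only: placeholder URIs, not the published identifiers.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import SKOS

    EX = Namespace("http://example.org/article-types/")  # placeholder namespace

    g = Graph()
    g.bind("skos", SKOS)

    # A higher-level grouping and an article-type applied directly to content.
    g.add((EX.Research, SKOS.prefLabel, Literal("Research", lang="en")))
    g.add((EX.Article, SKOS.prefLabel, Literal("Article", lang="en")))
    g.add((EX.Article, SKOS.broader, EX.Research))

    print(g.serialize(format="turtle"))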

URL OF KNOWLEDGE MODEL:

http://www.nature.com/ontologies/models/domain/article-types/

OWNED/DEVELOPED BY:

ADOPTED (AS OPPOSED TO OWNED) BY ORGANIZATIONS/PUBLISHERS:

Springer Nature

HOW IS THIS KM APPLIED?

Applied manually by authors or editorial staff as part of the standard publishing workflow.

DESCRIPTION OF THE CURRENT USE CASE(S) OF THE KM

This model categorizes content by type of publication, so that content of a similar type can be grouped or filtered at varying degrees of granularity.

IS THE KM BEING ACTIVELY DEVELOPED?

Yes, internally

LICENSE INFORMATION:

CC0 – http://creativecommons.org/about/cc0

Auto-tagging

We have a deep taxonomy of CS concepts – its deepest branches run seven levels down. The most useful concepts for precision search are, of course, the most granular ones, represented by the leaves of the tree.

However, concepts can be multi-parented, so accurately applying a concept to a text requires understanding its context within the tree, i.e., the correct branch.
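
To make the problem concrete, the sketch below (again Python with rdflib, and purely illustrative) collects every skos:broader path from a concept to the top of the tree; a multi-parented leaf returns two or more paths, each representing a different branch context. It assumes the broader hierarchy is acyclic, as a SKOS tree normally is.

    # Illustrative sketch: enumerate the branch contexts of a (possibly
    # multi-parented) concept by walking skos:broader to the top of the tree.
    from rdflib import Graph
    from rdflib.namespace import SKOS

    def broader_paths(g: Graph, concept):
        """Return every path from `concept` up to a top concept."""
        parents = list(g.objects(concept, SKOS.broader))
        if not parents:
            return [[concept]]  # top concept: the path ends here
        paths = []
        for parent in parents:
            for path in broader_paths(g, parent):
                paths.append([concept] + path)
        return paths

    # Each returned path is one branch, i.e. one possible context for the
    # concept; a multi-parented concept yields more than one path.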

Expert authors bring varying degrees of interest and attention to this indexing task, but in our experience they rarely misapply terms when tagging their own articles – at worst they take the easy route and assign only high-level concepts such as “Software”, which is not very useful.

Our experience with an auto-tagger, however, is that it generates a huge amount of “noise”. We consider that noise unacceptable – presenting it to users would create distrust in the taxonomy itself.

We have been expanding the logical rules of the auto-tagger in an effort to reduce the noise to an acceptable level. So far, without success.

I have been trying to understand why.

So far, the best explanation I can come up with is that while the human brain readily grasps hierarchical context, auto-taggers based on the statistical occurrence of a concept’s terms in proximity to other words and concepts cannot accurately reproduce that context.
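
One heuristic worth trying – offered as a sketch, not as a description of our auto-tagger – is to keep a suggested leaf concept only when the tagger also finds independent evidence for at least one of its ancestors, so that the branch itself has support. The data structures, scores and threshold below are assumptions for illustration.

    # Hypothetical noise filter: keep a suggested concept only if its branch
    # has independent support among the other suggestions.
    def filter_by_branch_support(suggestions, broader_of, min_score=0.5):
        """
        suggestions: dict of concept-id -> tagger confidence score.
        broader_of:  dict of concept-id -> list of parent concept-ids.
        min_score:   illustrative confidence threshold.
        """
        def ancestors(concept, seen=None):
            seen = set() if seen is None else seen
            for parent in broader_of.get(concept, []):
                if parent not in seen:
                    seen.add(parent)
                    ancestors(parent, seen)
            return seen

        kept = {}
        for concept, score in suggestions.items():
            if score < min_score:
                continue
            branch = ancestors(concept)
            # Top-level concepts pass; deeper concepts need an ancestor that
            # was also suggested, i.e. some evidence for the branch as a whole.
            if not branch or branch & set(suggestions):
                kept[concept] = score
        return kept

Variants of the same idea could require sibling support instead, or down-weight unsupported leaves rather than dropping them outright.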

Any advice would be appreciated.