All posts by Christian Kohl

Director Information & Publishing Technology, De Gruyter

Linking NPG ontologies to external datasets

 

We (Macmillan Science and Education) have begun linking our domain models to external datasets. Our article-types, journals and subjects models are now linked to DBpedia (Wikipedia) and Wikidata. Our subjects model is additionally linked to Bio2RDF and MeSH.

Also, our core model is now linked to a number of other external models (CIDOC, FaBIO, schema.org, etc.).

Our models, now including these links, are available to view or download at nature.com/ontologies.

We will continue to refine and expand these links and would be interested in any thoughts, ideas and feedback from the community, particularly around any additional datasets we should consider linking to.

NPG Article Types Ontology

The NPG Article Types Ontology is a categorization of the kinds of publication used to index and group content published by Springer Nature. This taxonomy is organised into a single tree using the SKOS vocabulary. It includes article-types that are directly applied to content, such as Article, Review Article, News, or Book Review, plus higher-level groupings such as Research, News and Comment, or Amendments and Corrections.

URL OF KNOWLEDGE MODEL:

http://www.nature.com/ontologies/models/domain/article-types/

OWNED/DEVELOPED BY:

ADOPTED (AS OPPOSED TO OWNED) BY ORGANIZATIONS/PUBLISHERS:

Springer Nature

HOW IS THIS KM APPLIED?

Applied manually by authors or editorial staff as part of the standard publishing workflow.

DESCRIPTION OF THE CURRENT USE CASE(S) OF THE KM

This model allows us to categorize content based on the type of publication, allowing content of similar type to be grouped or filtered at varying degrees of granularity.
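As a rough illustration of grouping and filtering at different levels of granularity, the sketch below models the tree as child-to-parent links (in the spirit of `skos:broader`) in plain Python. The type labels come from the description above, but the assignment of types to groupings, the `BROADER` table, the helper names and the sample records are all assumptions made for illustration; they are not taken from the published model.

```python
# Hypothetical child -> parent links standing in for skos:broader.
# Which grouping each article-type belongs to is assumed, not sourced.
BROADER = {
    "Article": "Research",
    "Review Article": "Research",
    "News": "News and Comment",
    "Book Review": "News and Comment",
}

def ancestors(article_type):
    """Return the chain of broader groupings above an article-type."""
    chain = []
    while article_type in BROADER:
        article_type = BROADER[article_type]
        chain.append(article_type)
    return chain

def filter_by_type(records, wanted):
    """Keep records whose type is `wanted` or sits anywhere below it."""
    return [r for r in records
            if r["type"] == wanted or wanted in ancestors(r["type"])]

# Made-up sample content records.
records = [
    {"id": "a1", "type": "Article"},
    {"id": "a2", "type": "Review Article"},
    {"id": "a3", "type": "News"},
]

coarse = filter_by_type(records, "Research")      # whole grouping
fine = filter_by_type(records, "Review Article")  # single leaf type
print([r["id"] for r in coarse], [r["id"] for r in fine])
```

Because the tree has a single parent per node, one upward walk is enough to decide whether a record falls under a given grouping.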

IS THE KM BEING ACTIVELY DEVELOPED?

Yes, internally

LICENSE INFORMATION:

CC0 – http://creativecommons.org/about/cc0

Springer Nature

Springer Nature is one of the world’s leading global research, educational and professional publishers, home to an array of respected and trusted brands providing quality content through a range of innovative products and services.

Springer Nature is the world’s largest academic book publisher, publisher of the world’s most influential journals and a pioneer in the field of open research. Springer Nature was formed in 2015 through the merger of Nature Publishing Group, Palgrave Macmillan, Macmillan Education and Springer Science+Business Media.

Springer Nature is embracing linked data technologies as an integral part of its content publishing operations and has developed a data model which is highly responsive to new and legacy business requirements. Linked data is central to the customer experience in providing content discovery applications and in facilitating emergent behaviours in interacting with content.

Many of the models developed have recently been published on our Ontologies Portal at nature.com/ontologies and are shared in order to contribute to the wider linked data community and to provide a public reference. These models cover publication things – articles, figures, etc. – and classification things – article-types, subjects, etc. – plus additional things used to manage our content publishing operation – assets, events, etc.

We have also published a model for conference proceedings on our LOD Conference Portal at lod.springer.com

Knowledge Models Used

Contact

NPG Subjects Ontology

The NPG Subjects Ontology is a polyhierarchical categorization of scholarly subject areas which are used for the indexing of content by Springer Nature. It includes subject terms of varying levels of specificity such as Biological sciences (top level), Cancer (level 2), or B-2 cells (level 7). In total there are more than 2750 subject terms, organised into a polyhierarchical tree using the SKOS vocabulary.
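To make "polyhierarchical" concrete, here is a minimal sketch in plain Python in which a term may have more than one broader term. Only the labels Biological sciences, Cancer and B-2 cells come from the description above; "Health sciences", "Immunology", every link in the `BROADER` table and the helper names are hypothetical, and the toy tree is far shallower than the real one (where B-2 cells sits at level 7).

```python
# Hypothetical term -> set of broader terms (skos:broader-style links).
# "Health sciences", "Immunology" and all links are assumed for illustration.
BROADER = {
    "Cancer": {"Biological sciences", "Health sciences"},
    "Immunology": {"Biological sciences"},
    "B-2 cells": {"Immunology"},
}

def all_broader(term):
    """Collect every ancestor of `term`, following all of its parents."""
    seen = set()
    stack = [term]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def level(term):
    """1 for a top-level term, otherwise 1 + the shortest path upward."""
    parents = BROADER.get(term)
    if not parents:
        return 1
    return 1 + min(level(p) for p in parents)

print(sorted(all_broader("Cancer")))
print(level("Cancer"))
```

The difference from a single tree is that `BROADER` maps to a set of parents, so a term such as Cancer can be reached by browsing down from more than one top-level subject.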

URL

http://www.nature.com/ontologies/models/domain/subjects/

Owned/Developed by:

Adopted by Organizations / Publishers:

How is this KM applied?

Applied manually, by authors or editorial staff as part of the standard publishing workflow, or by professional indexers

Description of the current use cases of the KM

The NPG Subjects Ontology constitutes the main backbone of nature.com subject areas, a new section on nature.com that allows users to browse content topically rather than navigate via the more usual journal paradigm. Each of the terms in the ontology includes a link to the relevant subject page on nature.com.

Is the KM being actively developed?

Yes, internally

 License Information:

CC0 – http://creativecommons.org/about/cc0

NICEM Thesaurus

Short Description

Thesaurus of the National Information Center for Educational Media (NICEM), used for indexing the records of NICEM’s bibliographic database.

Owned / Developed by

  1. Name of Owner – Access Innovations, Inc.
  2. Name of Developer – Access Innovations, Inc.
  3. Technical Contact – Mary Garcia, *protected email*
  4. License Contact – *protected email*

How is this KM applied?

  1. Manually | Auto-tagging software | Both
  2. By Authors | Editorial Staff | Professional indexers

How is this KM used?

  1. Direct Bibliographic Search | Indirect (e.g., used to expose content resulting from other user actions)
  2. Display | Grouping of results
  3. People search | Author profiles | Publication profile

Description of the current use cases of the KM

Used for indexing bibliographic records of non-print educational media in the NICEM database.

What are the main goals for using this KM?

  1. Enhance UX
  2. Increase Search Engine Ranking
  3. Increase time user spends on site
  4. Increase traffic
  5. Increase downloads

Rationale for KM vs other means of searching and browsing?

The thesaurus contains terms that reflect the subject matter of educational material, especially at the K-12 levels. An associated rule base that has been developed specifically for those terms enables appropriate indexing and accurate retrieval of bibliographic records, as well as user-friendly browsing in conjunction with a search interface.

 Is the KM being actively developed?

  1. Yes, internally

License Information:

  1. Terms of license or link to license terms – Contact *protected email*.

PLOS

PLOS (Public Library of Science) is a nonprofit publisher and advocacy organization founded to accelerate progress in science and medicine by leading a transformation in research communication.

Our core objectives are to provide ways to overcome unnecessary barriers to immediate availability, access and use of research, pursue a publishing strategy that optimizes the quality and integrity of the publication process, and develop innovative approaches to the assessment, organization and reuse of ideas and data.

Knowledge Models Used:

  • PLOS Thesaurus

Contact:

Thesaurus of Psychological Index Terms®

SHORT DESCRIPTION/ABOUT

The Thesaurus of Psychological Index Terms® is the controlled vocabulary used by APA’s professional indexers to index all of APA’s databases: PsycINFO®, PsycARTICLES®, PsycBOOKS®, PsycEXTRA®, PsycTESTS®, PsycTHERAPY®, and PsycCRITIQUES®. With the wide variety of concepts and vocabulary used in the psychological literature, search and retrieval of specific psychological concepts is virtually impossible without the controlled vocabulary of the Thesaurus. It provides a way of structuring the diverse concepts in the field of psychology to assist in the creation of efficient and consistent indexing. The Thesaurus, first published in 1974, has an influential role in research because it reflects the most current trends found in the behavioral and social science literature. The Thesaurus can help authenticate the use of terms as they become accepted nomenclature.

URL OF KNOWLEDGE MODEL:

OWNED/DEVELOPED BY:

  1. Name of Owner: The American Psychological Association
  2. Name of Developer: Ian Galloway
  3. Technical Contact: igalloway [at] apa.org
  4. License Contact: Jan Fleming, jfleming [at] apa.org

HOW IS THIS KM APPLIED?

  1. Manually | Auto-tagging software | Both
  2. By Authors | Editorial Staff | Professional indexers

HOW IS THIS KM USED?

  1. Direct Bibliographic Search | Indirect (e.g., used to expose content resulting from other user actions)
  2. Display | Grouping of results
  3. People search | Author profiles | Publication profile

DESCRIPTION OF THE CURRENT USE CASE(S) OF THE KM:

The Thesaurus of Psychological Index Terms is currently used primarily to index over 3.7 million records that can be found in PsycINFO.

DESCRIPTION OF FUTURE/POTENTIAL USE CASES OF THE KM (NOT YET REALIZED)

Development of an ontology based on the structure and concepts found in the Thesaurus is currently underway.

WHAT ARE THE MAIN GOALS FOR USING THIS KM?

  1. Enhance UX
  2. Increase Search Engine Ranking
  3. Increase time user spends on site
  4. Increase traffic
  5. Increase downloads

RATIONALE FOR KM VS OTHER MEANS OF SEARCHING AND BROWSING?

The sheer volume of APA’s databases means that researchers need a pragmatic and efficient way to discover the precise records they are seeking. The Thesaurus provides the most targeted way to find the major concepts within our database records. The Thesaurus has been developed specifically to work with the interdisciplinary nature of the psychological literature. In conjunction with faceted searching, researchers can quickly sift through over 3 million records dating back to the 19th century with confidence.

IS THE KM BEING ACTIVELY DEVELOPED?

Yes, internally

LICENSE INFORMATION:

Terms of license or link to license terms: http://www.apa.org/about/contact/copyright/index.aspx

American Psychological Association

Knowledge Models Used

About

The American Psychological Association is the largest scientific and professional organization representing psychology in the United States. APA is the world’s largest association of psychologists, with nearly 130,000 researchers, educators, clinicians, consultants and students as its members.

Our mission is to advance the creation, communication and application of psychological knowledge to benefit society and improve people’s lives. We do this by:

  • Encouraging the development and application of psychology in the broadest manner.
  • Promoting research in psychology, the improvement of research methods and conditions and the application of research findings.
  • Improving the qualifications and usefulness of psychologists by establishing high standards of ethics, conduct, education and achievement.
  • Increasing and disseminating psychological knowledge through meetings, professional contacts, reports, papers, discussions and publications.

Contact:

ACM Computing Classification System (CCS)

Short Description

ACM has published a de facto standard taxonomy for classifying and indexing computing literature and researchers’ areas of expertise since the 1960s. The CCS underwent a major overhaul in 1982 with substantive updates in 1998 and 2012.

The 2012 CCS was created by a group of 120 ACM volunteers, a third of them ACM Fellows, who collaborated with ACM staff and with Semedica, a division of Silverchair.

The Update Project was led by Professor Zvi Kedem of NYU who served as its Editor-in-Chief, working closely with Bernard Rous, ACM Director of Publications.

The 2012 ACM Computing Classification System has been developed as a poly-hierarchical ontology that can be utilized in semantic web applications. It relies on a semantic vocabulary as the single source of categories and concepts that reflect the state of the art of the computing discipline and is receptive to structural change as it evolves in the future.

ACM has provided tools to facilitate the application of 2012 CCS categories to forthcoming papers.

URL OF KNOWLEDGE MODEL:

http://dl.acm.org/ccs.cfm

The full CCS classification tree is freely available for educational and research purposes in these downloadable formats: SKOS (xml), Word, and HTML. In the ACM Digital Library, the CCS is presented in a visual display format that facilitates navigation and feedback. The full CCS classification tree is also viewable as a flat file in the Digital Library.

 OWNED/DEVELOPED BY:

  1. Owner: ACM
  2. Developer: ACM-Semedica
  3. Technical Contact: Bernard Rous (rous [at] hq.acm.org)
  4. License Contact: Deborah Cotton (cotton [at] hq.acm.org)

ADOPTED BY:

Various libraries, companies, and publishers such as Springer, IEEE, and Emerald have made use of the ACM CCS.

 HOW IS THIS KM APPLIED?

  1. ACM articles are generally indexed manually. Auto-tagging software is being evaluated.
  2. Index terms are applied by ACM authors and by professional indexers.
  3. A map of the 1998 CCS to the 2012 version has been built and automatically run against all articles in the ACM Digital Library. Both the 1998 and 2012 sets of concepts are available on Citation Pages of all indexed articles at this time.
  4. In displays of Article Citation Page under “Index Terms” tab. (See: 1145/1963190.1963191)
  5. In tag cloud displays of topics covered by specific publications (See: http://dl.acm.org/pub.cfm?id=J401) or Special Interest Groups (See: http://dl.acm.org/sig.cfm?id=SP923) or Institutional Profile pages (See: http://dl.acm.org/inst_page.cfm?id=60022148)
  6. CCS subjects are included in the index for Simple Search
  7. CCS subjects are directly searchable in Advanced Search. See left bottom of page: http://dl.acm.org/advsearch.cfm
  8. CCS subjects are themselves clickable to return papers indexed by those concepts
  9. CCS subjects are currently displayed on Author Profiles pages under Subject Areas. (See: http://dl.acm.org/author_page.cfm?id=81100246710)
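Item 3 above describes a map from the 1998 CCS to the 2012 version that was run automatically against all indexed articles, with both sets of concepts kept available. A minimal sketch of that kind of batch conversion might look as follows. The concept labels, the `CCS_1998_TO_2012` table, the record layout and the function name are all hypothetical placeholders; ACM's actual map and tooling are not described in this profile.

```python
# Hypothetical 1998 -> 2012 concept map; real CCS labels and codes differ.
# A 1998 concept may map to several 2012 concepts, or to none at all.
CCS_1998_TO_2012 = {
    "old-concept-A": ["new-concept-A"],
    "old-concept-B": ["new-concept-B1", "new-concept-B2"],
}

def add_2012_terms(article, mapping):
    """Return a copy of the article with 2012 terms derived from its 1998 terms.

    The 1998 terms are left in place, so both sets stay available.
    """
    derived = []
    for old in article["ccs_1998"]:
        for new in mapping.get(old, []):
            if new not in derived:  # skip duplicates, preserve order
                derived.append(new)
    return {**article, "ccs_2012": derived}

# Made-up article records.
articles = [
    {"id": "p1", "ccs_1998": ["old-concept-A", "old-concept-B"]},
    {"id": "p2", "ccs_1998": ["old-concept-C"]},  # concept with no mapping
]

converted = [add_2012_terms(a, CCS_1998_TO_2012) for a in articles]
for a in converted:
    print(a["id"], a["ccs_1998"], a["ccs_2012"])
```

Returning a new record rather than overwriting the old terms mirrors the statement that both the 1998 and 2012 concepts are shown on citation pages; unmapped concepts simply yield an empty 2012 list that could be flagged for manual indexing.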

DESCRIPTION OF FUTURE/POTENTIAL USE CASES OF THE KM (NOT YET REALIZED)

ACM is building a community and people-oriented search where the primary objects returned are experts, their attributes, and their contextual relationships. Published works will become attributes of the author (rather than the primary object of bibliographic search, where authors are attributes of the published work). One clear use of the CCS in this new facility is the direct and immediate ability to discover people who are expert in one of the defined subject areas, to order them by their impact in that area, and to display their working relationships.

Additionally, the 2012 CCS is only partially deployed in the ACM Digital Library today. Sections of the ACM DL still rely on the 1998 version of the CCS.

 WHAT ARE THE MAIN GOALS FOR USING THIS KM?

To enable efficient and precise discovery and exploration of topics

The ACM CCS is a hierarchical taxonomy. It is designed to provide a cognitive map of the computing space from the most general subject areas to the most specific topics.

RATIONALE FOR KM VS OTHER MEANS OF SEARCHING AND BROWSING?

When speaking of taxonomies in computer science circles, the question is often asked “Why bother? Taxonomies are antiquated; Google renders them unnecessary; and the ACM CCS is not used by anyone other than authors who are required to index their ACM articles with it, much to their irritation.”

There are certainly camps within the Information Retrieval community on this issue; one tends to dismiss the usefulness of taxonomies in today’s world while another sees them as powerful and with growing application. In the scientific, technical, and medical (STM) publishing domain, the taxonomic approach to semantic classification is booming — with publishers using taxonomy to allow users to cross-cut content topically, increasing application usage.

CCS searches in the ACM Digital Library have been a relatively small percentage of total searches. Yet some part of the user community finds the CCS very useful in search. Despite the fact that direct searching on CCS subject categories is not highly visible (being found rather cryptically at the bottom left of the Advanced Search page, http://portal.acm.org/advsearch.cfm), the annual number of CCS searches launched in the ACM DL is still about half a million. Adjustments in the Digital Library user interface to promote the CCS as a retrieval tool should multiply this number manyfold.

Many scholarly publishers in the scientific, technical, and medical fields are making use of taxonomic classification to create topic collections and virtual journals that dynamically rebuild as new content is added.

ACM efforts to derive topical visualizations from our full-text index proved inferior to those derived from taxonomic terms. Using author-supplied keywords (which are not selected from a controlled vocabulary) was somewhat better but also proved inferior.

Google-type searching appears best suited to directed searches where the user knows exactly what he is looking for. Google supplies almost total recall and the user supplies the precision. For more general subject exploration and discovery purposes, the searches do not work quite as well. And the page-ranking algorithm itself skews results by defining relevance in terms of popularity. Finally, it should be noted that Google is well aware of how its indexes are enhanced by structured, fielded data leading to improved precision in searches; that is why Google Scholar at least has tried to make arrangements with all the publishers whose sites it crawls to provide specific standard meta-tagging. Most publishers, including ACM, have complied in their indexing agreements with Google Scholar.

Lastly, a robust up-to-date taxonomy provides a cognitive map of the discipline. This in itself can be useful in understanding what computer science is all about; where a specific area of concentration fits within the broader discipline; and in development of curricula.

IS THE KM BEING ACTIVELY DEVELOPED?

Yes. The ACM CCS itself is evolving along with its deployment.

 LICENSE INFORMATION:

The full CCS classification tree is freely available for educational and research purposes in these downloadable formats: SKOS (xml), Word, and HTML.

For commercial use, please write cotton [at] hq.acm.org

ACM (Association for Computing Machinery, Inc.)

Knowledge Models Used:

 About

ACM is widely recognized as the premier membership organization for computing professionals, delivering resources that advance computing as a science and a profession; enable professional development; and promote policies and research that benefit society.

ACM hosts the computing industry’s leading Digital Library, and serves its global members and the computing profession with journals and magazines, conferences, workshops, electronic forums, and Learning Center.

Contact:

  • Bernard Rous, ACM Director of Publications, rous [at] hq.acm.org
  • http://www.acm.org/