Semantics and metadata

Content and context: content is not data

It may seem odd to begin the discussion of content and context with data, but stay with me. The importance of having good data hygiene, good reference data sets, and data standards for interoperability is growing as we need to present better visualisations or crunch data in more sophisticated ways. The ability to carry out increasingly complex data analysis often means bringing multiple data sets together, using more sophisticated algorithms, to perform more complex calculations.

However, data does not tell the whole story; presenting data doesn’t mean that the audience will understand what it means. Content provides the context. In a 2018 study reported in ScienceDaily, between half and three-quarters of the study group had trouble interpreting statistical data, particularly when it was presented as probabilities instead of natural frequencies. Content completes the story by answering “what does that data actually mean?” for the audience segments who are not data scientists, statistics enthusiasts, or experts in a particular field.

Frontiers. “Why don’t we understand statistics? Fixed mindsets may be to blame.” ScienceDaily. ScienceDaily, 12 October 2018.

Data tells only part of the story

For all the importance of delivering content along with data, content has been largely ignored in the data arena. The manipulation of content to turn copy into meaningful context bears little resemblance to the processes for bringing meaning to data. Editing data means changing and possibly normalising a data point in a database cell. Editing content means checking the accuracy, the consistency of the language, the spelling and grammar, and, most importantly, the context. Sandwiching a sentence between two other sentences is not a neutral act; it can enhance the telling of the data story, or it can backfire terribly, detracting from the story and offending your audiences in the process.

Content and data are maintained in very different ways

Content is a very separate discipline, and although the automated delivery of content dovetails with the need for automated data delivery, that’s where the similarity ends. Yet there is a very strong temptation to treat content like data, confining content to cells in a database, moving it around in static chunks like so much boxed cargo. The mechanisms meant for processing data limit content in many ways. The complexity and nuance are dampened; the contexts are limited; its potential is hobbled. The editing process becomes cumbersome and error-prone; content bloat occurs as copies are pasted into multiple database cells; and the overhead of content maintenance becomes unwieldy. To operate at its full potential, content needs its own standards and its own semantics, which are ultimately what enable it to interoperate.

How we produce content affects how much value it accrues

The end-to-end method of producing content that can co-exist alongside data is best described by breaking down the production process into three distinct areas.

  • The first area is the authoring and editing environment. This is where authors can manipulate content by combining content components into larger content objects; these objects use one of the content standards appropriate for authoring systems. These systems allow authors to transclude content fragments for use across many content objects, do batch checking for many levels of content quality, and add semantics to content objects for outcomes such as personalisation, aggregation by keywords, or automated delivery and presentation by a CMS.
  • The second area is a processing environment. Once the content is considered final, the authoring and editing environment sends the content to be processed. Processing is similar to doing a software build; the build produces publication-ready strings, resolving all of the transclusions and cross-references and automatically applying further semantics. The build also transforms the content into whichever formats are needed by downstream systems. That could mean HTML for a website, specialised XML for an enterprise system, strings destined for a bot or AI model, or even a PDF destined for a print publication.
  • The third area is a delivery environment. The “built” content is then treated as content objects that are ready for use by other systems. The content is held “on offer” in a repository or database, and is pulled by other management systems, as needed. This is where the content is delivered to, though it is not the final destination. It is at this point that it would be appropriate for a database to pull content for use alongside data. This ensures that a database stores only publication-ready strings, and allows authors to retain the powerful features they need to produce content efficiently.
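
The three areas above can be sketched in a few lines of Python. This is a minimal illustration, not a real authoring system: the `{{include:...}}` marker syntax, the component names, and the `resolve` and `build` functions are all assumptions made up for this example. It shows a component being authored once, transcluded into a larger content object, resolved at build time, and held in a repository as publication-ready strings.

```python
import html
import re

# Hypothetical transclusion marker (an assumption for this sketch):
# {{include:component-id}}
INCLUDE = re.compile(r"\{\{include:([a-z0-9-]+)\}\}")

# Authoring environment: each reusable component is written and stored once.
components = {
    "product-name": "Acme Data Portal",
    "support-line": "Contact {{include:product-name}} support for help.",
}

# Larger content objects are assembled from components by reference.
content_objects = {
    "welcome": "Welcome to {{include:product-name}}. {{include:support-line}}",
}


def resolve(text, components, seen=()):
    """Replace every include marker with its component text, recursively."""
    def substitute(match):
        key = match.group(1)
        if key in seen:
            raise ValueError(f"circular transclusion: {key}")
        if key not in components:
            raise KeyError(f"unknown component: {key}")
        return resolve(components[key], components, seen + (key,))
    return INCLUDE.sub(substitute, text)


def build(content_objects, components):
    """Processing environment: produce publication-ready strings per format."""
    repository = {}
    for object_id, source in content_objects.items():
        resolved = resolve(source, components)
        repository[object_id] = {
            "text": resolved,                           # e.g. for a bot
            "html": f"<p>{html.escape(resolved)}</p>",  # e.g. for a website
        }
    return repository


# Delivery environment: downstream systems pull finished strings by id and format.
repository = build(content_objects, components)
print(repository["welcome"]["text"])
```

Changing the product name in `components` and rebuilding updates every content object that transcludes it, while the repository that a database pulls from never contains anything other than finished strings.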

Intelligent content is valuable content

These content principles are known in the content industry as “intelligent content”: content that is structurally rich and semantically categorised, and therefore automatically discoverable, re-usable, reconfigurable, and adaptive. The demand for intelligent content is being driven by the offer of Content as a Service, which has opened the door for both content standards and applied semantics as ways of automatically delivering content alongside data for more targeted contexts.
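
As a rough illustration of what “semantically categorised” and “automatically discoverable” can mean in practice, the sketch below attaches metadata to content objects and selects them by that metadata. The `ContentObject` fields, the keyword and audience values, and the `find` function are hypothetical, chosen only for this example; real content standards define much richer metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentObject:
    """A content object carrying its own semantics (hypothetical fields)."""
    object_id: str
    body: str
    keywords: frozenset = field(default_factory=frozenset)
    audience: str = "general"


# Placeholder catalogue; the bodies stand in for real copy.
catalogue = [
    ContentObject(
        "rate-explainer",
        "Plain-language explanation of the rate.",
        frozenset({"statistics", "rates"}),
        "general",
    ),
    ContentObject(
        "rate-method",
        "Technical note on how the rate is calculated.",
        frozenset({"statistics", "methodology"}),
        "expert",
    ),
]


def find(catalogue, keyword, audience):
    """Select content by its metadata rather than by where it is stored."""
    return [
        obj for obj in catalogue
        if keyword in obj.keywords and obj.audience == audience
    ]


# A delivery system could request context for a chart aimed at non-experts.
for obj in find(catalogue, "statistics", "general"):
    print(obj.object_id, "->", obj.body)
```

Because the selection depends only on the metadata, the same data point can be accompanied by different content for different audiences without anyone copying text between database cells.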

Related:

To learn more about this topic, you can watch my presentation at ENDORSE.

The Quest for Content in Context: Using Standards and Semantics for Interoperability

ENDORSE: The European Data Conference on Reference Data and Semantics

Information operations in a digital environment
