Using analytics to take advantage of structured data in content
Guest column: Sachin Kamdar, the co-founder and CEO of the analytics platform Parse.ly, says there is a wealth of structured data readily available to digital publishers
What do digital publishers want out of their analytics? At the most basic level, anyone who puts in the effort to create content, whether that’s investigative journalism or a listicle, wants to understand whether the information they’re providing is actually being read.
Of course, there are deeper questions that publishers want answered: is my work making an impact? How can I get more people to pay for a subscription? How can I get more people to read this?
It’s easy to come up with questions to ask about an audience or readership, and many people get frustrated that there isn’t an easy way to answer these questions with analytics.
The disconnect lies with computers, and how they process information. Computers find counting easy, so the first question analytics typically tackles is a “counting” one: how many people read my site?
But getting an easy answer isn’t the same as getting the best answer, as many media creators and producers have realized since the advent of page views as the primary metric of publishing.
So, how can we get answers to the more complex questions? These are the crucial questions that will help the industry move forward: whether the work we do reaches the audience it’s meant for, whether it is making an impact, and more.
First, let’s talk about what is hard for a computer: pattern recognition. When you or I look at a story on a website, it’s easy for us to identify a few things, usually based on the layout of the page. If I asked you, you could tell me what the title of the article is, who the author is, and what the story is about.
However, computers have a much harder time with this. In fact, helping computers learn to recognize patterns is the basis for much of the artificial intelligence work being done by companies like IBM!
Luckily for us, there is a bit of a shortcut we can take to help out the computers on our way to getting this information analyzed: taking advantage of content’s structured data.
Structured data, in theory, is simple: it means sorting things into defined categories. But if you’ve ever tried to organize a junk drawer, or worse, a garage filled with “uncategorized” items, you understand that the process can be challenging.
Luckily, in digital publishing, there’s a wealth of structured data readily available, no spring-cleaning initiative required. Your CMS, which was built to help publish content to the web, creates a massive amount of structured data, almost as an accidental byproduct.
By default, most publishers have a plethora of categorizations built into their sites, including things that you can see easily like sections, titles, and author names. There is also information about things you might not be able to see, like the word count of an article, the date of any modifications, or the text in an image tag associated with the article. Developers can add further structured data; a list of common structured data for online articles is available on schema.org: http://schema.org/Article.
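To make this concrete, here is a minimal sketch of how a program might read that structured data off a page. It assumes the CMS embeds schema.org Article markup as JSON-LD in a script tag, which is one common approach but not the only one. The sample page, the class name JsonLdExtractor and the function article_metadata are made up for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: assumes the CMS emits schema.org Article data as JSON-LD.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "City council approves new budget",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "articleSection": "Local News",
  "wordCount": 850,
  "dateModified": "2015-06-01"
}
</script>
</head><body><h1>City council approves new budget</h1></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def article_metadata(html):
    """Return a flat dict of Article fields, or None if the page has none."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for item in parser.items:
        if isinstance(item, dict) and item.get("@type") == "Article":
            return {
                "headline": item.get("headline"),
                "author": item.get("author", {}).get("name"),
                "section": item.get("articleSection"),
                "wordCount": item.get("wordCount"),
                "dateModified": item.get("dateModified"),
            }
    return None


meta = article_metadata(SAMPLE_PAGE)
print(meta)
```

The point is that no pattern recognition is needed: the title, author, section and word count arrive already labeled, so the computer only has to look them up.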
The result is that publishers and digital media producers are working with some of the richest data sets when it comes to structured data, but many of them aren’t taking advantage of it.
Using this structured data is the first step in answering questions other than “how many people read my site,” especially when it is combined with audience segmentation. For example: how likely were people to subscribe when they came from a specific source? Or, were articles with a certain number of words more or less likely to be seen by new readers?
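As a rough illustration of those two questions, the sketch below groups a handful of visit records by a segment and computes a simple rate for each group. The records, the field names (source, word_count, new_reader, subscribed) and the 800-word cutoff are all invented for the example; real data would come from an analytics export and would be far larger.

```python
from collections import defaultdict

# Hypothetical visit records: one row per visit, joining audience data
# (source, new_reader, subscribed) with article metadata (word_count).
VISITS = [
    {"source": "search", "word_count": 450, "new_reader": True, "subscribed": False},
    {"source": "search", "word_count": 1200, "new_reader": False, "subscribed": True},
    {"source": "social", "word_count": 300, "new_reader": True, "subscribed": False},
    {"source": "social", "word_count": 1500, "new_reader": True, "subscribed": False},
    {"source": "newsletter", "word_count": 1800, "new_reader": False, "subscribed": True},
    {"source": "newsletter", "word_count": 600, "new_reader": False, "subscribed": False},
]


def rate_by(records, key_fn, flag):
    """Share of records in each group (chosen by key_fn) where flag is true."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = key_fn(record)
        totals[group] += 1
        if record[flag]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def length_bucket(record):
    """Assumed cutoff: treat articles under 800 words as short."""
    return "short" if record["word_count"] < 800 else "long"


# How likely were people to subscribe when they came from a specific source?
subscribe_rate = rate_by(VISITS, lambda r: r["source"], "subscribed")

# Were shorter or longer articles more likely to be seen by new readers?
new_reader_share = rate_by(VISITS, length_bucket, "new_reader")

print(subscribe_rate)
print(new_reader_share)
```

The same small function answers both questions because the segment is just a label attached to each visit. With six made-up rows the numbers mean nothing, but the shape of the analysis carries over to real data.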
When I talk to clients about “understanding their audience,” ultimately, this is the kind of understanding I want them to have. Knowing when they have more or fewer readers is nice, but taking advantage of the structured data at their fingertips is where publishers can really make a difference when it comes to analysis.
Sachin Kamdar is the co-founder and CEO of Parse.ly, an analytics platform that provides audience insight for digital publishers. He has been in the content and digital media business since 2009, when Parse.ly launched as part of DreamIt Ventures’ incubator program.