iMatrics, Auto-tagging, and Making News Relevant

iMatrics, metadata, and content auto-tagging

For more than a year, Sourcefabric and our partner news agency in Norway, NTB, have been working with iMatrics, a provider of AI solutions, to integrate an auto-tagging feature into our newsroom management software, Superdesk. Once complete, NTB’s Superdesk instance will use the iMatrics tool to analyse and categorise text, creating metadata that can be used to deliver highly relevant content to subscribers.

Sourcefabric recently spoke with iMatrics Senior Marketing and Sales Developer Fredrik Lundberg to learn more about their auto-tagging solution, and to hear how AI has the potential to make news more relevant, scalable, and equitable.

How did iMatrics get its start?

iMatrics started as a spin-off from Linköping University. Our founder, Berkant Savas, had a lot of experience and knowledge in clustering and other kinds of AI models, and based on that knowledge, he founded iMatrics as an early-stage start-up looking for a problem to solve.

But it took three tries to get the model just right. The company was first established at LEAD, a Swedish business incubator, and the original idea was to tackle a major problem in Sweden – hate speech on web forums, blogs, and the comment sections of news sites. Berkant and his business partner, Mari Ahlquist, looked into developing an AI model that could analyse this content and determine whether it was hate speech. If it was, the comment could be blocked or deleted automatically. Everybody thought it was a great idea, but no one wanted to pay for it, so it was back to the drawing board.

Eventually, Berkant and Mari were introduced to Gota Media, a Swedish newspaper company. Gota Media was looking to solve a newsroom challenge: automatically tagging content with keywords and metadata. At the time, it was a loathed job at the end of the day. No one wanted to do it, and when they did, it was done badly, producing inconsistent data. Gota Media asked if our technology could automate this task.

As a result, iMatrics created what it calls a “language-independent AI solution.” Explain what that is and how it works.

Basically, we developed our own language-independent AI. It’s ‘language-independent’ because it works with semantics and the relationships between words. For example, when we analyse text and extract the data, the metadata that we extract does not need to be mentioned explicitly in the text. That’s because we know from the relationships between the words what the article is about. Our system interprets text like the human brain would – through meaning, rather than simple keyword analysis – which increases the accuracy of the metadata tagging.

How many languages is your system operating in?

Our solution is language independent, which means we can support any language. Currently, our model is customised for the languages that our customers work in – Danish, Dutch, English, French, Finnish, German, Norwegian and Swedish.

In addition to AI-driven entity recognition, the iMatrics tool integrates with IPTC Media Topics and Wikidata, right?

We work with several taxonomies based on our customers’ needs – both custom taxonomies and with standards like IPTC Media Topics, which are the standard classification for the news industry. Wikidata, meanwhile, is becoming the standard knowledge base for entities – such as persons, organisations, places, or objects. By tagging news content with IPTC Media Topics and Wikidata, it’s possible to deliver content to customers interested in those specific topics. But this requires precision, and that’s where iMatrics comes in.

For example, there are a lot of places called ‘Sweden’ around the world. It would be bad to send push notifications to people who are interested in Sweden, the country, with articles from Sweden, the town, in New York. The entities need to be correct, and one way of getting them right is to verify what we find in the articles against the associated Wikidata entries. That’s the challenge of auto-tagging – how to correctly recognise entities and do it consistently.

With NTB, we are also enabling a feedback loop with a link to Wikidata. If an NTB journalist is writing an article with new entities that should be in Wikidata, they can add the information that they have for others to use later.
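
To make the ‘Sweden’ problem above concrete, here is a minimal Python sketch of one simple way to choose between candidate entities: score each candidate by how many of its context terms appear in the article. This is an illustration only and not iMatrics’ actual method. The `Candidate` class, the `disambiguate` function, the identifiers, and the context terms are all hypothetical; in practice such terms might be drawn from each entity’s knowledge-base entry.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity_id: str            # hypothetical identifier, not a real Wikidata ID
    label: str
    context_terms: frozenset  # words we would expect near a mention of this entity


# Illustrative candidates for the mention "Sweden" (terms are assumptions).
COUNTRY = Candidate(
    "example:sweden-country",
    "Sweden (country)",
    frozenset({"stockholm", "nordic", "scandinavia", "riksdag", "krona"}),
)
TOWN = Candidate(
    "example:sweden-town",
    "Sweden (town, New York)",
    frozenset({"town", "york", "monroe", "county", "brockport"}),
)


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(article_text, candidates):
    """Return the candidate sharing the most context terms with the article.

    Returns None when no candidate shares any term, so that an unsupported
    tag is not attached to the article.
    """
    words = tokenize(article_text)
    scored = [(len(c.context_terms & words), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > 0 else None
```

Calling `disambiguate("The town board in Sweden, New York, met in Monroe County.", [COUNTRY, TOWN])` selects the town rather than the country. Returning `None` when nothing matches reflects the point made above: a wrong entity is worse than a missing one.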

What are the practical newsroom benefits of these AI tools that iMatrics has created?

There are three. First, we help newsrooms save time and money. Second, with good data, news organisations can personalise content for their audiences with recommendations, newsfeeds, newsletters, and topic/theme pages that more precisely match their interests. And third, the iMatrics API is compatible with just about any CMS.

You’re also working on a gender analytics feature for your clients. What is that?

It’s something that came from one of our Swedish customers. At that paper, readers saw themselves as consumers of an ‘old man’s newspaper.’ Editors wanted to change that; they wanted to produce news for the entire population. One way of doing that is to represent genders more evenly in the news. So, we set our tool to measure gender-specific pronouns, providing editors with a way to gauge the gender bias of their coverage.
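
As a rough illustration of what measuring gender-specific pronouns can look like, the Python sketch below counts feminine and masculine pronouns in a text and reports the feminine share. It is an assumption-laden simplification, not the iMatrics feature itself: the word lists cover English only, and the function names `pronoun_counts` and `feminine_share` are hypothetical.

```python
import re
from collections import Counter

# English-only pronoun lists (an assumption; other languages need their own).
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Count feminine and masculine pronouns, matching whole words only."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in FEMININE:
            counts["feminine"] += 1
        elif word in MASCULINE:
            counts["masculine"] += 1
    return counts


def feminine_share(text):
    """Return the feminine fraction of gendered pronouns, or None if there are none."""
    counts = pronoun_counts(text)
    total = counts["feminine"] + counts["masculine"]
    return counts["feminine"] / total if total else None
```

For the text "He said his team won. She thanked him." the function counts three masculine pronouns and one feminine pronoun, giving a feminine share of 0.25. Aggregated over many articles, a figure like this gives editors one simple signal to track over time.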

Is iMatrics the future of news innovation?

We believe that we create value and help news organisations better understand their customers while saving money and time in the process. Our mission is to make the news industry data driven, and we do this by helping newsrooms create good data that they can use to drive their digital journey. We also have a start-up approach, which makes us very agile in helping customers with product development and metadata-related projects.
