Cobaltmetrics: Web-Scale Citation Tracking
Aim
With Cobaltmetrics, Thunken is on a mission to make altmetrics genuinely alternative. Traditional citation indexes have stringent inclusion criteria and focus on privileged publication venues. Altmetrics were designed to overcome some of these limitations, but most data providers still rely on predefined lists of citable or indexable research outputs, and they only scratch the surface of the web. We argue that the only way forward is to embrace web-scale citation tracking.
Methods
Cobaltmetrics crawls the web to index hyperlinks and persistent identifiers as first-class citations. We analyze a wide range of websites to reveal insightful links between documents. Cobaltmetrics goes deeper than backlink databases and altmetrics aggregators to help you report on all types of content: publications, books, clinical trials, patents, software artifacts, derivative works, etc. The web is our corpus, and our URI transmutation API collates citations to all known versions of a document.
Cobaltmetrics combines the best of citation indices, altmetrics aggregators, and backlink databases. Citation indices like OpenCitations, Scopus, or Web of Science focus on citations between traditional scholarly publications. Our approach is both complementary and much broader. In Cobaltmetrics, we track citations between all types of content on the web, not only publications. We think that it is not up to citation aggregators to define what is citable, so we have no selection criteria based on a document's format, language, publication venue, persistent identifiers, etc.
Altmetrics aggregators like Altmetric, Crossref Event Data, or Plum Analytics are quite similar to Cobaltmetrics. However, we think that they are not "alt" enough: for many data sources, they focus on data published in a handful of languages and/or have restrictive selection criteria regarding the documents they index. Our goal is to go deeper: the web is our corpus, and we index all citations, no matter the language, the format, or the identifier. We also think that our URI transmutation API surpasses their search engines when it comes to aggregating or deduplicating results.
On the web, backlinks and citations are similar objects. That being said, backlink databases also lack our URI transmutation API, i.e. the ability to collate backlinks to all known versions of a document. With Cobaltmetrics, you can discover not only that a given page links to your content, but also all the short URLs and other identifiers that directly or indirectly identify it.
One of our core principles is that it is not up to altmetrics data providers to decide what is citable; our role is to observe all citation patterns on the web. The web is not FAIR (and will most likely never be), and that is just fine. To produce a corpus that is diverse and inclusive, we track all URIs: every hyperlink, every occurrence of a URI is a citation. One of our biggest challenges is to collate URIs that directly or indirectly identify the same resource, so that citation counts and attention scores can be tallied accurately. We will present the design rationale of our URI transmutation API, and discuss how it relates to other approaches like meta-resolvers (e.g. identifiers.org and n2t.net) and PID graphs (e.g. FREYA).
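To make the collation problem concrete, here is a minimal sketch of how several URIs that identify the same resource might be reduced to one key before counting. It is an illustration only, not the URI transmutation API: the function names `canonical_key` and `tally`, the `doi:` key format, and the restriction to DOI-style identifiers are all assumptions made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, unquote

# Assumption: a loose shape check for DOIs, good enough for a sketch.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Assumption: only these two resolver hosts are recognized here.
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def canonical_key(uri: str) -> str:
    """Map a URI to an illustrative canonical key (hypothetical scheme)."""
    uri = uri.strip()

    # Bare identifiers such as "doi:10.1000/xyz123".
    if uri.lower().startswith("doi:"):
        candidate = uri[4:].strip()
        if DOI_PATTERN.match(candidate):
            return "doi:" + candidate.lower()  # DOIs are case-insensitive

    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()

    # Resolver URLs such as "https://doi.org/10.1000/xyz123".
    if host in DOI_HOSTS:
        candidate = unquote(parts.path.lstrip("/"))
        if DOI_PATTERN.match(candidate):
            return "doi:" + candidate.lower()

    # Fallback: lowercase scheme and authority, drop the fragment.
    if parts.scheme and parts.netloc:
        path = parts.path or "/"
        query = "?" + parts.query if parts.query else ""
        return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"
    return uri


def tally(cited_uris):
    """Count citations per canonical key rather than per raw URI."""
    return Counter(canonical_key(u) for u in cited_uris)
```

With this sketch, `https://doi.org/10.1000/XYZ123`, `doi:10.1000/xyz123`, and `http://dx.doi.org/10.1000/xyz123#sec` all count toward the same key. A real collation service would also have to follow short URLs, redirects, and many other identifier schemes, which this example deliberately leaves out.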
Results and Discussion
We will then move into a discussion of web-scale altmetrics, a.k.a. alt-altmetrics. We must forget all limitations regarding publishing formats, languages, APIs, and, most importantly, data sources. Metrics are a sampling game, and the web is our corpus. We have started building an infrastructure for web-scale altmetrics by ingesting the massive datasets produced by the CommonCrawl project. Cobaltmetrics is thus in no way restricted to the scholarly web, and we hope the corpus will be useful to other communities. We will discuss how Cobaltmetrics compares to other altmetrics data providers such as Altmetric, Crossref Event Data, and PlumX Metrics. We will then share the lessons we have learned in the past 18 months, including implementation choices, negative results, and tips to pull citation data at scale with our API.
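The idea that every hyperlink in a crawled page is a citation can be sketched with the Python standard library alone. The example below assumes the HTML of a page and its URL are already in hand (reading crawl archives is out of scope here); `LinkExtractor` and `extract_citations` are hypothetical names, and this is not a description of the actual Cobaltmetrics pipeline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) targets of <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the citing page's URL.
                absolute = urljoin(self.base_url, value)
                # Keep web links only; skip mailto:, javascript:, etc.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_citations(page_url, html):
    """Return (citing URL, cited URL) pairs for one crawled page."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return [(page_url, target) for target in parser.links]
```

Each pair is a raw citation; collating the cited URLs that identify the same resource is a separate step.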
In particular, we will present results from the analysis of legal citations. Recent initiatives to open access to the law now make it possible to track and analyze legal data on a large scale. Cobaltmetrics partnered with CourtListener to explore the potential in tracking and analyzing citations to and from court opinions from all state and federal courts in the US. Evaluating legal data gives insight into how resources are used, how resources influence other courts and other resources, and how different resources are connected across jurisdictions. We will discuss the main challenges in extracting and normalizing citations in court opinions.
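As a rough illustration of what extracting and normalizing citations in court opinions involves, the sketch below finds "volume reporter page" citations for a handful of US reporters and normalizes spacing variants such as "U. S." and "U.S.". The reporter list, the regular expression, and the name `extract_legal_citations` are assumptions made for this example; they do not describe the method used by Cobaltmetrics or CourtListener, and real opinions contain many more citation forms.

```python
import re

# Assumption: a tiny, illustrative subset of US reporters.
REPORTER = (
    r"U\.\s?S\."                          # United States Reports
    r"|S\.\s?Ct\."                        # Supreme Court Reporter
    r"|F\.\s?(?:2d|3d|4th)"               # Federal Reporter series
    r"|F\.\s?Supp\.(?:\s?(?:2d|3d))?"     # Federal Supplement series
)

CITATION = re.compile(rf"\b(\d{{1,4}})\s+({REPORTER})\s+(\d{{1,5}})\b")


def extract_legal_citations(text):
    """Return (volume, reporter, page) tuples found in an opinion's text."""
    citations = []
    for volume, reporter, page in CITATION.findall(text):
        # Normalize spacing so "U. S." and "U.S." collate to the same key.
        normalized = re.sub(r"\s+", "", reporter)
        citations.append((int(volume), normalized, int(page)))
    return citations
```

Normalizing the reporter abbreviation is what lets two differently typeset citations to the same opinion be counted together.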
Conclusion
We will then present preliminary results regarding the most cited domains in the CourtListener corpus. We will conclude with a special announcement about Cobaltmetrics, linked data, and permissive data licenses!
- Damien Vannson, Thunken
Builder at heart, driven by the satisfaction of turning shower thoughts and back-of-the-envelope plans into full-fledged, user-friendly…
- Luc Boruta, Thunken
Ph.D. in computational linguistics, natural language processor, interested in linked data and linguistic diversity. In previous lives, Luc…