Drowning in the literature? These smart software tools can help


Artwork by The Project Twins

Whenever Eddie Smolyansky had a few moments to himself, he tried to keep up to date with new publications in his field. But in 2016, the Tel Aviv, Israel-based computer vision researcher was receiving hundreds of automated literature recommendations a day. “At one point, bathroom breaks weren’t enough,” he says. The recommendations were “far too many and impossible to follow”.

Smolyansky’s “feed fatigue” will be familiar to many scholars. Academic alerting tools, originally designed to draw attention to relevant articles, have themselves become a hindrance, flooding the inboxes of scientists around the world.

“I haven’t even read my automated searches on PubMed lately, because it’s really overwhelming,” says Craig Kaplan, a biologist at the University of Pittsburgh in Pennsylvania. “Honestly, I can’t keep up with the literature.”

But change is on the way. In 2019, Smolyansky co-founded Connected Papers, one of a new generation of visual literature mapping and recommendation tools. Other services that promise to tame information overload, integrating Twitter feeds and daily news as well as search, are also available.

Origin story

Instead of providing a daily list of new articles via email, Connected Papers uses a single user-chosen “source article” to create a map of related research, based in part on overlapping citations. The service recently surpassed one million users, Smolyansky says.
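To make the idea of “overlapping citations” concrete, here is a minimal sketch, not Connected Papers’ actual method, which the article does not describe in detail. It assumes each paper is represented by the list of references it cites and scores two papers by the share of references they have in common (Jaccard overlap). The function names `citation_overlap` and `rank_related` and the toy data are hypothetical.

```python
def citation_overlap(refs_a, refs_b):
    """Jaccard overlap of two reference lists: 0.0 = nothing shared, 1.0 = identical."""
    a, b = set(refs_a), set(refs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_related(source_id, references):
    """Rank every other paper by how much its reference list overlaps the source's."""
    source_refs = references[source_id]
    scores = {
        paper_id: citation_overlap(source_refs, refs)
        for paper_id, refs in references.items()
        if paper_id != source_id
    }
    # Highest overlap first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): paper id -> ids of the works it cites.
references = {
    "source": ["r1", "r2", "r3", "r4"],
    "paper_a": ["r1", "r2", "r3", "r9"],
    "paper_b": ["r4", "r7", "r8"],
    "paper_c": ["r5", "r6"],
}

ranking = rank_related("source", references)
```

In this toy example, `paper_a` shares three of the source article’s four references and ranks first, whereas `paper_c` shares none and ranks last. A real service would combine a signal like this with others, which is why the article says the maps are based only “in part” on overlapping citations.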

Maps are color-coded according to publication date, and users can switch between earlier, seminal articles and later, “derivative” articles that build on them. The idea is that scientists can search for an original paper they’re interested in and see on the resulting map which recent papers have caused a stir in their field, how they relate to other research, and how many citations they have accumulated.

“You don’t have to sit on the pipe of papers and watch each paper that comes out for fear of missing it,” Smolyansky says. The tool is also useful when scientists want to delve into an entirely new area, he adds, offering an overview of essential literature.

Another visual mapping tool is Open Knowledge Maps, a service offered by a Vienna-based nonprofit of the same name. It was founded in 2015 by Peter Kraker, a former academic communication researcher at Graz University of Technology in Austria.

Open Knowledge Maps creates its maps based on keywords rather than a central article, and relies on text similarity and metadata to determine how articles are related. The tool organizes 100 articles in similar subfields into bubbles whose relative positions suggest similarity; a search for articles on ‘climate change’, for example, might generate a related bubble on ‘risk knowledge’.
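As a rough illustration of what “text similarity” can mean, the sketch below compares two snippets of text by the words they share, using cosine similarity over simple word counts. This is an assumption chosen for clarity, not a description of Open Knowledge Maps’ own pipeline; the function names `word_counts` and `cosine_similarity` and the sample titles are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two word-count vectors (0.0 = no shared words)."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical article titles for illustration.
close = cosine_similarity("climate change risk", "climate risk perception")
far = cosine_similarity("climate change risk", "protein folding dynamics")
```

Here the two climate titles score higher than the unrelated pair, so a layout algorithm fed such scores would place them nearer each other. Production systems typically weight words by rarity and draw on metadata as well, as the article notes.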

Maps for these bubbles can be constructed in about 20 seconds, and users can modify them to include the 100 most recently published relevant articles, or other resources. Open Knowledge Maps includes not only journal articles, but also content such as datasets and research software. Its users have created more than 400,000 maps so far, says Kraker.

Amie Fairs, who studies language at the University of Aix-Marseille in France, is a self-proclaimed Open Knowledge Maps enthusiast. “One particularly nice thing about Open Knowledge Maps is that you can search for very broad topics, like ‘language production’, and it can group articles into topics you might not have contemplated,” says Fairs. For example, when she searched for “phonological brain regions” (the areas of the brain that process sound and meaning), Open Knowledge Maps suggested a subfield for research into age-related processing differences. “I hadn’t considered looking at the aging literature for information on this before, but now I will,” she says.

Yet despite her enthusiasm for the service, Fairs still tends to find new articles through alerts from Google Scholar, the dominant tool in the field; it’s easier to go “down the rabbit hole,” she explains, following a chain of papers that cite each other.

Click to recommend

Google Scholar recommends articles based on articles users have written and listed in their profiles. The algorithm is not public, but the company says recommendations are based on “what topics you write about, where you publish, authors you work with and cite, authors who work in the same domain as you and the citation graph.” Users can manually set up additional email alerts based on searches by keywords or particular authors.

Aaron Tay, a librarian at Singapore Management University who studies academic research tools, receives literature recommendations from Twitter and Google Scholar, and finds that the latter often highlights the same articles as his human colleagues, albeit a few days later. Google Scholar “is almost always on target,” he says.

Besides published articles, Google Scholar can also pick up preprints as well as “poor quality theses and dissertations,” Tay says. Even so, “you get some gems you might not have seen,” he says. (Scopus, a competing literature database run by Amsterdam-based publisher Elsevier, began incorporating preprints earlier this year, a spokesperson said. But it does not index theses and dissertations, which are covered by Google Scholar, he says.)

Google Scholar does not disclose the size of its database, but it is widely acknowledged to be the largest corpus in existence, with nearly 400 million articles by one estimate (M. Gusenbauer Scientometrics 118, 177–214; 2019). Open Knowledge Maps, meanwhile, is built on top of the open-source Bielefeld Academic Search Engine, which has more than 270 million documents, including preprints, and is curated to remove spam.

Connected Papers uses the publicly available corpus compiled by Semantic Scholar – a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington – amounting to approximately 200 million papers, including preprints. Smolyansky acknowledges that this size discrepancy means that “very rarely” Google Scholar will find “a niche article from the 1970s” that Semantic Scholar does not find.

Semantic Scholar’s alert system, called the Adaptive Search Feed, builds a list of recommended articles that users can train by liking or disliking the articles they see. To decide which papers are similar to these, it uses a machine learning model trained on mutual citations and on papers that Semantic Scholar users viewed sequentially. Semantic Scholar has some 8 million monthly users.


Feedly, launched in 2008, also uses upvotes and downvotes to find out what new academic research is most relevant to the user, and boasts an AI assistant that can be trained on specific keywords or topics. But Feedly isn’t aimed specifically at researchers – it aims to be a comprehensive dashboard for monitoring news, RSS feeds (which help alert users to new content on websites), online forum Reddit, Twitter and podcasts. A free version is available, but additional features, such as the ability to track 100+ sources and hide ads, cost $6 or more per month (unlike most of the other tools mentioned here, which are completely free; another paid option is ResearchGate +Plus, which boosts user visibility and offers advanced statistics).

ResearchRabbit, which fully launched in August 2021, describes itself as “Spotify for Articles.” Users begin by saving relevant papers to a collection. With each paper added, ResearchRabbit updates its list of recommended papers, mirroring how the music streaming platform makes recommendations based on the songs users add to their playlists. The company behind it, based in Seattle, Washington, hasn’t revealed exactly how it assesses relevance, though it says it focuses on precise recommendations rather than floods of alerts. “We only want to send the most relevant articles to our users,” says chief executive Michael Ma.

Amber Brown Ruiz, a doctoral candidate in special education and disability policy at Virginia Commonwealth University in Richmond, finds ResearchRabbit’s alerts to be more personalized than Google Scholar’s, which sometimes feeds her articles that are superficially similar to her own work but turn out to be well outside her discipline.

Brown Ruiz also uses Connected Papers to find new articles. She finds it less automated than Google Scholar, which emails new articles, “but you can manually go in and figure out which articles are newer,” she says.

What all of these tools have in common is that they use a kind of artificial intelligence to come up with their recommendations. But some researchers appreciate the human touch, valuing recommendations from colleagues and contacts on Twitter, for example. ResearchGate, the long-running platform that bills itself as a sort of social network for scientists, says it offers the best of both worlds (ResearchGate is in a content-sharing partnership with Springer Nature, which publishes Nature).

Founded in 2008, ResearchGate sends article recommendations via email and delivers them via a streaming feed when users are logged in. (Users can also see a chronological feed of articles posted by their ResearchGate contacts.) Although it does not make its algorithm public, it does use information about a user’s publications and the publications they have viewed on the platform to understand their interests. It then calculates related articles based on shared citations and extracted topics and keywords. ResearchGate currently comprises some 149 million publication pages and has 20 million users.

“ResearchGate’s secret sauce is the combination of an active social network and a huge research graph,” says Joseph Debruin, director of product management at ResearchGate, who is based in Los Angeles, California.

Five years after realizing he was drowning in new papers, Smolyansky is finally able to get rid of his “scientific fear of missing out.” “You don’t have to have that FOMO feeling,” he says.

