A simple library for communicating with publication information servers: PubMed and Semantic Scholar.
Currently it allows (a) searching on conjunctions and disjunctions, (b) fetching the details of a paper,
(c) fetching the publications citing a paper, (d) fetching the publications cited by a paper, (e) simple reporting of fetched information,
and (f) storing fetched information to local databases.
Since version 0.1 the library supports caching of paper information in Prolog-term or CSV data files,
and in ODBC-connected or SQLite databases. Also as of 0.1, pub_graph is debug/1 aware. To see information regarding
the progress of execution, use
?- debug(pub_graph).
The pack requires the curl executable to be in the path. It has only been tested on Linux.
It is being developed on SWI-Prolog 6.1.8 and should also work on YAP Prolog.
To install under SWI simply do
?- pack_install(pub_graph).
and load with
?- use_module(library(pub_graph)).
Storing papers and citations depends on db_facts, and SQLite connectivity depends on proSQLite (both are available as SWI packs and from http://stoics.org.uk/~nicos/sware/).
ncbi (https://www.ncbi.nlm.nih.gov/pubmed/) and semscholar (http://semanticscholar.org/) are the known IdTypes.
The predicate does not connect to the server; it only type-checks the shape of Id.
If Id is an integer, or an atom that can be turned into an integer, then IdType is instantiated to ncbi.
There are three term forms for semscholar.
The following two ids correspond to the same paper:

?- pub_graph_id( 12075665, Type ).
Type = ncbi.

?- pub_graph_id( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Type ).
Type = semscholar.
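The shape check described above can be sketched as follows. This is a minimal, hypothetical illustration: id_shape_type/2 is not part of pub_graph. It assumes that a semscholar id is a 40-character hexadecimal atom, as in the example above, and it does not cover the other semscholar term forms.

```prolog
:- use_module(library(lists)).

% id_shape_type(+Id, -IdType): hypothetical shape-only check.
% No server is contacted; only the form of Id is inspected.

% Integers are taken to be ncbi (PubMed) ids.
id_shape_type(Id, ncbi) :-
    integer(Id),
    !.
% Atoms that convert to an integer are also ncbi ids.
id_shape_type(Id, ncbi) :-
    atom(Id),
    atom_number(Id, N),
    integer(N),
    !.
% Assumption: a 40-character hexadecimal atom is a semscholar id.
% This covers only one of the semscholar term forms.
id_shape_type(Id, semscholar) :-
    atom(Id),
    atom_length(Id, 40),
    atom_codes(Id, Codes),
    forall(member(C, Codes), code_type(C, xdigit(_))).
```

For example, `id_shape_type('12075665', T)` gives `T = ncbi`, while an atom of any other shape simply fails.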
?- pub_graph_version(V,D).
V = 1:2:0, D = date(2023, 9, 20).
Search in pub_graph for the terms in the search term STerm.

In this, conjunction is marked by , (comma) and
disjunction by ; (semicolon). Key=Value pair terms (as used in the examples below) are
interpreted as Value[Key] in the query.
Lists are taken to be flat conjoint search terms with no pair values in them; these are
also interpreted by pub_graph as OR operations.
(See the examples below.)
Known keys are: journal, pdat, au, All Fields.
The predicate constructs a query that is posted via the http API provided
by NCBI (http://www.ncbi.nlm.nih.gov/books/NBK25500/).
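As an illustration of how a search term might map onto a PubMed query string, here is a hypothetical sketch. term_to_query/2 is not pub_graph's code. It assumes that comma becomes AND, semicolon becomes a parenthesised OR, Key=Value becomes Value[Key], and list elements are joined with spaces, which matches the request URL printed in the first verbose example below. URL encoding is left out.

```prolog
% term_to_query(+STerm, -Query): hypothetical translation of a search term.
term_to_query((A,B), Q) :- !,                 % conjunction
    term_to_query(A, QA),
    term_to_query(B, QB),
    format(atom(Q), '~w AND ~w', [QA, QB]).
term_to_query((A;B), Q) :- !,                 % disjunction, bracketed
    term_to_query(A, QA),
    term_to_query(B, QB),
    format(atom(Q), '(~w OR ~w)', [QA, QB]).
term_to_query(Key=Value, Q) :- !,             % field-qualified value
    format(atom(Q), '~w[~w]', [Value, Key]).
term_to_query(List, Q) :-                     % flat list of words
    is_list(List), !,
    atomic_list_concat(List, ' ', Q).
term_to_query(Atomic, Q) :-                   % plain word or number
    atomic(Atomic),
    format(atom(Q), '~w', [Atomic]).
```

With this sketch, `(journal=science,[breast,cancer],pdat=2008)` becomes `science[journal] AND breast cancer AND 2008[pdat]`.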
Options should be a term or list of terms from:
  ncbi
  gap(Gap)
    the number of intervening words allowed between the words of a value, for the
    terms: Title, Title/Abstract and Affiliation.
    The higher the number, the looser the match. The default allows for no intervening words, so only
    exact sub-matches will be returned (see the gap(1) examples below);
    see: https://pubmed.ncbi.nlm.nih.gov/help/#proximity-searching
  qtranslation(QTrans)
    QTrans is the actual query as run on the pub_graph server
  Tmp
    if Tmp is a variable, it is bound to the file that was used
    to receive the results from pub_graph
  Keep==true
  verbose(Verbose)
    if Verbose == true, the predicate is verbose about its progress; for instance,
    the request URL is printed on the current output stream

?- St = (journal=science,[breast,cancer],pdat=2008),
   pub_graph_search( St, Ids, [verbose(true),qtranslation(QTrans)] ),
   length( Ids, Len ), write( number_of:Len ), nl,
   pub_graph_summary_display( Ids, _, display(all) ).
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=100&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
tmp_file(/tmp/swipl_3884_9)
number_of:6
----
1:19008416
Author=[Varambally S,Cao Q,Mani RS,Shankar S,Wang X,Ateeq B,Laxman B,Cao X,Jing X,Ramnarayanan K,Brenner JC,Yu J,Kim JH,Han B,Tan P,Kumar-Sinha C,Lonigro RJ,Palanisamy N,Maher CA,Chinnaiyan AM]
Title=Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer.
Source=Science
Pages=1695-9
PubDate=2008 Dec 12
Volume=322
Issue=5908
ISSN=0036-8075
PmcRefCount=352
PubType=Journal Article
FullJournalName=Science (New York, N.Y.)
----
2:18927361
Author=Couzin J
Title=Genetics. DNA test for breast cancer risk draws criticism.
Source=Science
...
...
...
6:18239125
Author=[Silva JM,Marran K,Parker JS,Silva J,Golding M,Schlabach MR,Elledge SJ,Hannon GJ,Chang K]
Title=Profiling essential genes in human mammary cells by multiplex RNAi screening.
Source=Science
Pages=617-20
PubDate=2008 Feb 1
Volume=319
Issue=5863
ISSN=0036-8075
PmcRefCount=132
PubType=Journal Article
FullJournalName=Science (New York, N.Y.)
----
St = (journal=science, [breast, cancer], pdat=2008),
Ids = ['19008416', '18927361', '18787170', '18487186', '18239126', '18239125'],
QTrans = ['("Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND 2008[pdat]'],
Len = 6.
?- date(Date), St = (author='Borst Piet'),
   pub_graph_search( St, Ids, verbose(true) ),
   length( Ids, Len ), write( number_of:Len ), nl.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=100&term=Borst%20Piet\[author\]
tmp_file(/tmp/swipl_18703_0)
number_of:83
Date = date(2018, 9, 22),
St = (author='Borst Piet'),
Ids = ['29894693', '29256493', '28821557', '27021571', '26774285', '26530471', '26515061', '25799992', '25662217'|...],
Len = 83.

?- date(Date), pub_graph_search( prolog, Ids ), length( Ids, Len ), write( number_of:Len ), nl.
number_of:100
Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 100.

?- date(Date), pub_graph_search( prolog, Ids, retmax(200) ), length( Ids, Len ), write( number_of:Len ), nl.
number_of:127
Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 127.

?- St = ('breast','cancer','Publication Type'='Review'), date(Date),
   pub_graph_search( St, Ids, reldate(30) ), length( Ids, Len ).
Date = date(2018, 9, 22),
Ids = ['30240898', '30240537', '30240152', '30238542', '30238005', '30237735', '30236642', '30236594', '30234119'|...],
Len = 100.

?- pub_graph_summary_display( 30243159, _, true ).
----
1:30243159
Author=[Wang K,Yee C,Tam S,Drost L,Chan S,Zaki P,Rico V,Ariello K,Dasios M,Lam H,DeAngelis C,Chow E]
Title=Prevalence of pain in patients with breast cancer post-treatment: A systematic review.
----
true.
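Many predicates in this library take Options as either a single term or a list of terms. A minimal sketch of that convention, using SWI-Prolog's library(option), is shown below. options_to_list/2 and search_settings/3 are hypothetical names, and the defaults (gap 0, verbose false) are assumptions based on the option descriptions above.

```prolog
:- use_module(library(option)).

% options_to_list(+Opts, -List): accept one option term or a list of them.
options_to_list(Opts, List) :-
    (   is_list(Opts)
    ->  List = Opts
    ;   List = [Opts]
    ).

% search_settings(+Opts, -Gap, -Verbose): read two options, falling back to
% hypothetical defaults when an option is absent.
search_settings(Opts, Gap, Verbose) :-
    options_to_list(Opts, List),
    option(gap(Gap), List, 0),
    option(verbose(Verbose), List, false).
```

Normalising to a list first lets every later lookup use option/3 uniformly, whichever form the caller chose.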
Version 0:3 (pub_graph_version(1:2:0,_D)).
?- date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20), Ids = ['35379892'], Len = 1.

?- date(Date), pub_graph_search(title='Bayesian elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20), Ids = [], Len = 0.

?- date(Date), pub_graph_search(title='Bayesian elucidate', Ids, gap(1)), length(Ids, Len),
   pub_graph_summary_display(Ids, _, true).
----
1:35379892
Author=[Angelopoulos N,Chatzipli A,Nangalia J,Maura F,Campbell PJ]
Title=Bayesian networks elucidate complex genomic landscapes in cancer.
----
Date = date(2023, 9, 20), Ids = ['35379892'], Len = 1.

?- date(D), write('Appears in abstract: "explainable Artificial Intelligence models"'), nl,
   pub_graph_search('Title/Abstract'='explainable Artificial Intelligence models', Ids, true),
   pub_graph_summary_display(Ids).
1
...
10:32417928
Author=[Payrovnaziri SN,Chen Z,Rengifo-Moreno P,Miller T,Bian J,Chen JH,Liu X,He Z]
Title=Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review.

?- date(D), pub_graph_search('Title/Abstract'='explainable Intelligence models', Ids, true).
D = date(2023, 9, 20), Ids = [].

?- date(D), pub_graph_search((tiab='explainable Intelligence models',affiliation=sanger), Ids, gap(1)).
D = date(2023, 9, 20), Ids = ['35379892'].
Version 0:3 also added the option quote_value(Qv). Compare:
?- date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20), Ids = ['35379892'], Len = 1.

?- date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, quote_value(false)), length(Ids,Len).
Date = date(2023, 9, 20), Ids = ['35923659', '35379892', '32609725', '29055062', '27303742', '26362267'], Len = 6.
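The effect of gap(Gap) and quote_value(Qv) on a single field can be pictured with the hypothetical field_query/5 below. It assumes that a quoted value is sent as a phrase, and that a non-zero gap uses PubMed's proximity form "terms"[field:~N] described on the help page linked above. The strings pub_graph actually sends may differ.

```prolog
% field_query(+Key, +Value, +Gap, +Quote, -Q): hypothetical field term builder.
field_query(Key, Value, Gap, Quote, Q) :-
    (   Quote == false
    ->  format(atom(Q), '~w[~w]', [Value, Key])          % unquoted value
    ;   Gap =:= 0
    ->  format(atom(Q), '"~w"[~w]', [Value, Key])        % exact phrase
    ;   format(atom(Q), '"~w"[~w:~~~w]', [Value, Key, Gap])  % proximity; ~~ prints ~
    ).
```

For instance, `field_query(tiab, 'explainable Intelligence models', 1, true, Q)` yields `'"explainable Intelligence models"[tiab:~1]'`.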
pub_graph_summary_display( Ids, _Summary, [] ).
pub_graph_summary_display( Ids, Summary, [] ).
Opts
Ids.

Disp values of var(Disp), '*' and 'all' list all available values.

?- date(Date), pub_graph_search((programming,'Prolog'), Ids), length( Ids, Len), Ids = [A,B,C|_],
   pub_graph_summary_display( [A,B,C] ).
----
1:28486579
Author=[Holmes IH,Mungall CJ]
Title=BioMake: a GNU make-compatible utility for declarative workflow management.
----
2:24995073
Author=[Melioli G,Spenser C,Reggiardo G,Passalacqua G,Compalati E,Rogkakou A,Riccio AM,Di Leo E,Nettis E,Canonica GW]
Title=Allergenius, an expert system for the interpretation of allergen microarray results.
----
3:22215819
Author=[Mørk S,Holmes I]
Title=Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
----
Date = date(2018, 9, 22),
Ids = ['28486579', '24995073', '22215819', '21980276', '15360781', '11809317', '9783213', '9293715', '9390313'|...],
Len = 43,
A = '28486579', B = '24995073', C = '22215819'.
?- pub_graph_summary_display( 30235570, _, display(*) ).
----
1:30235570
Author=[Morgan CC,Huyck S,Jenkins M,Chen L,Bedding A,Coffey CS,Gaydos B,Wathen JK]
Title=Adaptive Design: Results of 2012 Survey on Perception and Use.
Source=Ther Innov Regul Sci
Pages=473-481
PubDate=2014 Jul
Volume=48
Issue=4
ISSN=2168-4790
PmcRefCount=0
PubType=Journal Article
FullJournalName=Therapeutic innovation & regulatory science
----
?- pub_graph_cited_by( 20195494, These ),
   pub_graph_summary_display( These, _, [display(['Title','Author','PubDate'])] ).
----
1:29975690
Author=[Tang K,Boudreau CG,Brown CM,Khadra A]
Title=Paxillin phosphorylation at serine 273 and its effects on Rac, Rho and adhesion dynamics.
PubDate=2018 Jul
----
2:29694862
Author=[McKenzie M,Ha SM,Rammohan A,Radhakrishnan R,Ramakrishnan N]
Title=Multivalent Binding of a Ligand-Coated Particle: Role of Shape, Size, and Ligand Heterogeneity.
PubDate=2018 Apr 24
----
3:29669897
Author=[Padmanabhan P,Goodhill GJ]
Title=Axon growth regulation by a bistable molecular switch.
PubDate=2018 Apr 25
...
...
26:20473365
Author=[Welf ES,Haugh JM]
Title=Stochastic Dynamics of Membrane Protrusion Mediated by the DOCK180/Rac Pathway in Migrating Cells.
PubDate=2010 Mar 1
----
These = [29975690, 29694862, 29669897, 28752950, 27939309, 27588610, 27276271, 25969948, 25904526|...].

?- pub_graph_summary_display( 20195494, _Res, true ).
----
1:20195494
Author=[Cirit M,Krajcovic M,Choi CK,Welf ES,Horwitz AF,Haugh JM]
Title=Stochastic model of integrin-mediated signaling and adhesion dynamics at the leading edges of migrating cells.
----
true.

?- pub_graph_summary_display( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, _, display(all) ).
----
1:cbd251a03b1a29a94f7348f4f5c2f830ab80a909
arxivId=[]
authors=[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
doi=10.1109/TITB.2002.1006298
title=Architecture of a mediator for a bioinformatics database federation
topics=[]
venue=IEEE Transactions on Information Technology in Biomedicine
year=2002
----
true.
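Field selection for display can be sketched over a Key-Value summary list, the shape seen in the Results binding of a pub_graph_summary_display/3 call further below. display_fields/3 is a hypothetical helper. A variable, '*' or all selects every field. Otherwise only the named keys are kept, in summary order, which is what the three-field example above appears to do (Author is printed before Title although Title was requested first).

```prolog
:- use_module(library(lists)).

% display_fields(+Disp, +Pairs, -Selected): hypothetical field filter.
% var(Disp), '*' and all keep every Key-Value pair.
display_fields(Disp, Pairs, Pairs) :-
    (   var(Disp) ; Disp == '*' ; Disp == all ),
    !.
% Otherwise keep only the requested keys, preserving the summary's order.
display_fields(Disp, Pairs, Selected) :-
    (   is_list(Disp) -> Keys = Disp ; Keys = [Disp] ),
    findall(K-V, (member(K-V, Pairs), memberchk(K, Keys)), Selected).
```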
Options is a term or a list of terms from the following:
true
  update the cache if you do an explicit retrieval.

?- date(D), pub_graph_cited_by( 12075665, By ), length( By, Len ).
D = date(2018, 9, 22), By = [25825659, 19497389, 19458771], Len = 3.

?- date(D), pub_graph_cited_by( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, By ), length( By, Len ).
D = date(2018, 9, 22),
By = ['2e1f686c2357cead711c8db034ff9aa2b7509621', '6f125881788967e1eec87e78b3d2db61d1a8d0ac'|...],
Len = 12.
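The caching behaviour can be pictured as a look-up-then-fetch pattern over a dynamic predicate. Everything in this sketch is hypothetical. cached_cited_by/2 stands in for the cache, and fetch_cited_by/2 is a stub that replaces the real server request with the three ids from the first example above.

```prolog
:- dynamic cached_cited_by/2.

% Hypothetical stub standing in for a request to the server.
fetch_cited_by(12075665, [25825659, 19497389, 19458771]).

% cited_by_cached(+Id, -By, +Update): use the cache if it has Id; otherwise
% fetch, and store the fetched result only when Update is true.
cited_by_cached(Id, By, Update) :-
    (   cached_cited_by(Id, By0)
    ->  By = By0
    ;   fetch_cited_by(Id, By),
        (   Update == true
        ->  assertz(cached_cited_by(Id, By))
        ;   true
        )
    ).
```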
Options is a term or a list of terms from the following:

?- date(D), pub_graph_cites( 20195494, Ids ), length( Ids, Len ), write( D:Len ), nl.
date(2018,9,22):38
D = date(2018, 9, 22),
Ids = ['19160484', '19118212', '18955554', '18800171', '18586481'|...],
Len = 38.

% PubMed does not have the references cited by the following paper:
?- date(D), pub_graph_cites( 12075665, Ids ), length( Ids, Len ), write( D:Len ), nl.
false.

% whereas semanticscholar.org finds 17 (non-'') of the 21:
?- date(D), pub_graph_cites( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Ids ), length( Ids, Len ), write( D:Len ), nl.
date(2018,9,22):17
D = date(2018, 9, 22),
Ids = ['6477792829dd059c7d318927858d307347c54c2e', '1448901572d1afd0019c86c42288108a94f1fb25', |...],
Len = 17.

?- pub_graph_summary_display( 12075665, Results, true ).
----
1:12075665
Author=[Kemp GJ,Angelopoulos N,Gray PM]
Title=Architecture of a mediator for a bioinformatics database federation.
----
Results = [12075665-['Author'-['Kemp GJ', 'Angelopoulos N', 'Gray PM'], ... - ...|...]].
Can include the journal impact factor if jif/6 is provided.
Output rows contain: #citing, [IF,] Date, Journal, Title, Author (the Title is linked to pubmed/$id).
Opts
has(Val)
,quite(Val)
]?- pub_graph_table
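A table row might be assembled from a Key-Value summary as sketched below. table_row/4 is hypothetical. It assumes that PmcRefCount supplies the #citing column, that the atom none marks a missing impact factor, and that the Title link points at https://www.ncbi.nlm.nih.gov/pubmed/Id.

```prolog
:- use_module(library(lists)).

% table_row(+Id, +Pairs, +IF, -Row): hypothetical row builder.
% Pairs is a Key-Value summary such as pub_graph_summary_info/3 returns.
table_row(Id, Pairs, IF, Row) :-
    memberchk('PmcRefCount'-Citing, Pairs),   % assumed #citing column
    memberchk('PubDate'-Date, Pairs),
    memberchk('Source'-Journal, Pairs),
    memberchk('Title'-Title, Pairs),
    memberchk('Author'-Author, Pairs),
    % Pair the title with a link to the paper's PubMed page.
    format(atom(Url), 'https://www.ncbi.nlm.nih.gov/pubmed/~w', [Id]),
    (   IF == none
    ->  Row = [Citing, Date, Journal, link(Url, Title), Author]
    ;   Row = [Citing, IF, Date, Journal, link(Url, Title), Author]
    ).
```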
IdS: id(s).
Options is a single term, or a list of the following terms:
true
  be verbose.
false
  if you don't want the cache to be updated with newly downloaded information.

?- date(Date), Opts = names(['Author','PmcRefCount','Title']),
   pub_graph_summary_info( 12075665, Results, Opts ),
   write( date:Date ), nl, member( R, Results ), write( R ), nl, fail.
date:date(2018,9,22)
Author-[Kemp GJ,Angelopoulos N,Gray PM]
PmcRefCount-3
Title-Architecture of a mediator for a bioinformatics database federation.
false.

?- pub_graph_summary_info(12075665,Res,[]), member(R,Res), write( R ), nl, fail.
Author-[Kemp GJ,Angelopoulos N,Gray PM]
Title-Architecture of a mediator for a bioinformatics database federation.
Source-IEEE Trans Inf Technol Biomed
Pages-116-22
PubDate-2002 Jun
Volume-6
Issue-2
ISSN-1089-7771
PmcRefCount-3
PubType-Journal Article
FullJournalName-IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society

?- pub_graph_summary_info( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Res, true ), member( R, Res ), write( R ), nl, fail.
arxivId-[]
authors-[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
doi-10.1109/TITB.2002.1006298
title-Architecture of a mediator for a bioinformatics database federation
topics-[]
venue-IEEE Transactions on Information Technology in Biomedicine
year-2002
false.
?- pub_graph_abstracts( 24939894, Abs ).
Abs = ['Lemur tyrosine kinase 3 (LMTK3) is associated with cell proliferation and',...].
Options is a single term, or a list of the following terms:
Type == false
  or absent, to turn caching off
true
Type is one of csv, prolog, sqlite and odbc. In the first 3 cases, Object should be a filename,
and for odbc it should be a DSN token. In the case of filenames, the default value for Object
is formed as <type>_<id1>{_<id2>}.<type_ext>.
<type_ext> is either set to Ext or, if this is missing, it is deduced from Type. It can be set to ''
if you want no extension added.
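The default file-name rule can be sketched as follows. cache_file_name/4 and the extensions listed in type_default_ext/2 are assumptions for illustration, not pub_graph's own definitions.

```prolog
% Hypothetical default extensions; pub_graph's actual choices may differ.
type_default_ext(csv, csv).
type_default_ext(prolog, pl).
type_default_ext(sqlite, sqlite).

% cache_file_name(+Type, +Ids, ?Ext, -File): builds <type>_<id1>{_<id2>}.<type_ext>.
% An unbound Ext is deduced from Type; Ext = '' adds no extension.
cache_file_name(Type, Ids, Ext, File) :-
    atomic_list_concat([Type|Ids], '_', Base),
    (   var(Ext)
    ->  type_default_ext(Type, Ext1)
    ;   Ext1 = Ext
    ),
    (   Ext1 == ''
    ->  File = Base
    ;   atomic_list_concat([Base, '.', Ext1], File)
    ).
```

Here `alpha` and `beta` in the tests are placeholder ids only.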
Graph is compatible with the representation of unweighted graphs used by Prolog's library(ugraphs). That is, all vertices appear in a keysorted list as V-Ns pairs, where V is the vertex and Ns is the sorted list of all its neighbours. Ns is the empty list if V has no neighbours; here this should only happen if one of the input Ids has no citing papers, or for the nodes at the edge of Depth.
?- pub_graph_cited_by_graph( 12075665, G, cache(sqlite) ).
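A depth-limited expansion that yields a graph in this form can be sketched with SWI-Prolog's library(ugraphs). The citation facts and depth_limited_graph/3 are hypothetical. The point of the sketch is that nodes reached at the depth limit keep an empty neighbour list, even when they do have citing papers.

```prolog
:- use_module(library(ugraphs)).
:- use_module(library(ordsets)).
:- use_module(library(lists)).

% Hypothetical local citation table: cited_by_fact(Paper, CitingPaper).
cited_by_fact(a, b).
cited_by_fact(a, c).
cited_by_fact(b, d).
cited_by_fact(d, e).

% depth_limited_graph(+Ids, +Depth, -Graph): Graph is a list of V-Ns pairs.
depth_limited_graph(Ids, Depth, Graph) :-
    sort(Ids, Frontier),
    expand(Frontier, Depth, Frontier, [], Edges, Seen),
    vertices_edges_to_ugraph(Seen, Edges, Graph).

% expand(+Frontier, +Depth, +Seen0, +Edges0, -Edges, -Seen)
% Stops when no new paper is reached or the depth budget is used up.
expand([], _, Seen, Edges, Edges, Seen) :- !.
expand(_, 0, Seen, Edges, Edges, Seen) :- !.
expand(Frontier, D, Seen0, Edges0, Edges, Seen) :-
    findall(V-N, (member(V, Frontier), cited_by_fact(V, N)), New),
    append(Edges0, New, Edges1),
    findall(N, member(_-N, New), Ns0),
    sort(Ns0, Ns),
    ord_subtract(Ns, Seen0, Next),      % only unseen papers are expanded next
    ord_union(Seen0, Next, Seen1),
    D1 is D - 1,
    expand(Next, D1, Seen1, Edges1, Edges, Seen).
```

With depth 1, b is on the edge of the graph, so it maps to [] although b-d is in the table; with depth 2 it maps to [d].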
Options is a single term or a list of the following:
file(File)
  file to use for storage
single_file(Single)
  boolean value, default is true.
  If false, separate (aggregating) files are created at each iteration.
depth(D)
  the overall depth limit

csv, prolog, odbc and sqlite files are recognised.
The former two are consulted into module pub_graph_cache, and Handle is therefore not used.
For odbc/sqlite files, lookups and database access go via the odbc and prosqlite libraries respectively.
Handle can be set to an alias of choice; otherwise an opaque atom is returned with which the db is accessed.
Which should be either cited_by or info.
Options is a term or a list of terms from:
ext(Ext)
  extension to try on the file. Use the empty atom if you do not want the library to
  use the default extension for the type of cache used.
Options are also passed to the underlying open operations for the chosen type. So, for instance,
you can provide the username and password for the odbc connection with user(U) and password(P).
Opts is a term or a list of terms from: