Textmining Tool
The Pubget Textmining Tool can transform your subscription PDFs into XML, including full text and associated metadata
- Uploading New Projects
- Importing From Embase
- Managing Existing Projects
- Project Status View
- Creating Projects via API
Uploading New Projects
To begin a textmining project, enter a project name, change the email address if desired, and choose one of three methods to add PMIDs:
- Direct input via text box: simply copy and paste your PMIDs
- Upload a comma- or newline-separated file containing your chosen PMIDs, or a references file in RIS format
- Input a PubMed search phrase. All PubMed operators will work.



If you use a search phrase, check the number of articles it returns with the "Check query size" button
Submit your project and wait for confirmation
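If you use the file-upload option, it can help to clean the PMID list before submitting it. The Python sketch below is a hypothetical helper, not part of Pubget. It assumes every PMID is a plain run of ASCII digits, splits on commas and any whitespace, drops duplicates while keeping the first occurrence, and renders one PMID per line (the newline-separated form described above). The names `parse_pmids` and `to_upload_file` are illustrative.

```python
import re

_SEPARATORS = re.compile(r"[,\s]+")  # commas and/or any whitespace, including newlines
_PMID = re.compile(r"[0-9]+")        # assumption: a PMID is a plain run of ASCII digits


def parse_pmids(text: str) -> list[str]:
    """Return PMIDs in first-seen order, without duplicates."""
    pmids: list[str] = []
    seen: set[str] = set()
    for token in _SEPARATORS.split(text):
        if not token:
            continue  # empty pieces come from leading/trailing separators
        if not _PMID.fullmatch(token):
            raise ValueError(f"not a PMID: {token!r}")
        if token not in seen:
            seen.add(token)
            pmids.append(token)
    return pmids


def to_upload_file(pmids: list[str]) -> str:
    """Render one PMID per line, i.e. the newline-separated upload format."""
    return "\n".join(pmids) + "\n"


if __name__ == "__main__":
    raw = "1001, 1002\n1003\n1002\n"
    print(parse_pmids(raw))  # ['1001', '1002', '1003']
```

Rejecting non-numeric tokens (rather than silently skipping them) is a deliberate choice in this sketch, so that a stray title or DOI in the file is noticed before upload.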
Importing From Embase 
You can import documents from Embase into Pubget using Embase's "export" feature
First, search and select your documents for export, either using the checkboxes...


Then, export to RIS format


Import this RIS file into Pubget as a references file, as described under Uploading New Projects
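Before uploading, you may want to confirm how many references an Embase export contains. The Python sketch below is a hypothetical, minimal RIS reader, not a Pubget or Embase tool. It assumes the standard RIS line layout (a two-character tag, two spaces, a hyphen, then the value) and treats each `ER` line as the end of a record. Which RIS field carries the PMID in Embase exports is not stated in this document, so the sketch only groups values by tag and counts records.

```python
import re

# Standard RIS line: two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Group RIS lines into records; each record maps a tag to its values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    # Strip a leading byte-order mark so the first TY line is not lost.
    for line in text.lstrip("\ufeff").splitlines():
        m = RIS_LINE.match(line)
        if m is None:
            continue  # blank or wrapped continuation lines are skipped in this sketch
        tag, value = m.group(1), m.group(2).strip()
        if tag == "ER":  # "ER" closes the current reference
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records
```

For example, `len(parse_ris(text))` gives the number of exported references, which you can compare with the Count column once the project is loaded; the two may differ if some references cannot be matched to a PMID.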
Managing Existing Projects 
Once your project is running, you can see the status of all projects you have created. Select the "everyone" radio button to see your colleagues' projects as well
What each column means:
- Name: Project name
- Email: User to be emailed upon completion
- Count: Number of documents in initial set (final count may be decreased for recency on email-extraction accounts)
- Full text: Documents in project with full text found
- Abstract only: Documents in project with abstracts found
- Status: Status code indicating progress of listed projects
- Holdings: Documents in project with no full text found, but whose full text should be available through your subscriptions
- Delivered: Number of articles delivered in the resulting XML or Excel document (full text and abstract-only)
Status codes
- Finished: Project has completed and user has been notified via email
- Loading: Articles are being loaded into Pubget's systems for download
- Queued: Articles have been loaded, but project has not started downloading yet
- Running: Project is currently downloading
- Paused: Project is paused for system maintenance
- Stalled: A system error is causing downloading to stall; Pubget support is automatically notified when this happens
Any project that has not finished can be cancelled with the cancel button
Click an individual project name to show the project status view
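If you track projects in your own scripts, the status codes above can be captured in a small data type. The Python sketch below is hypothetical: the `ProjectStatus` name and its two properties are illustrative, and this document does not describe how a script would read a project's status. It only encodes the rules stated above: any project that has not finished can be cancelled, and Stalled signals a system error.

```python
from enum import Enum


class ProjectStatus(Enum):
    """Status codes as shown in the projects list (hypothetical model)."""
    FINISHED = "Finished"
    LOADING = "Loading"
    QUEUED = "Queued"
    RUNNING = "Running"
    PAUSED = "Paused"
    STALLED = "Stalled"

    @property
    def is_cancellable(self) -> bool:
        # Any project that has not finished can be cancelled.
        return self is not ProjectStatus.FINISHED

    @property
    def needs_attention(self) -> bool:
        # Stalled is the error state; Pubget support is notified automatically.
        return self is ProjectStatus.STALLED
```

Usage: `ProjectStatus("Paused").is_cancellable` is `True`, because a paused project has not finished.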
Project Status View
What each column means:
- PMID: Link to the article on Pubget
- PDF path: Actual path to the PDF online
- Confidence: Accuracy of the conversion to XML, based on comparison with the abstract on Pubget
- Word count: Number of full-text words converted from the PDF
Creating Projects via API
In addition to creating projects via the web form, you can create them through the XML API
To do so, pass an XML string to the API URL
The XML format is shown in this example, which creates two projects:
<project_list>
  <project>
    <project_name>default2011</project_name>
    <user_email>matt@pubget.com</user_email>
    <pmids>
      <pmid>1001</pmid>
      <pmid>1002</pmid>
    </pmids>
    <query/>
  </project>
  <project>
    <project_name>default2012</project_name>
    <user_email>matt@pubget.com</user_email>
    <pmids>
      <pmid>1001</pmid>
      <pmid>1002</pmid>
    </pmids>
    <query/>
  </project>
</project_list>
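Building the XML with a library, rather than by string concatenation, keeps the structure well-formed and escapes special characters in project names. The Python sketch below uses the standard-library `xml.etree.ElementTree` to produce the `<project_list>` structure from the example. The `Project` class and `build_project_list` function are hypothetical names. The `<query/>` element is always left empty, as in the example, because this document does not describe what it may contain.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str   # <project_name>; must not already be used at your institution
    email: str  # <user_email>; must match an existing user
    pmids: list[str] = field(default_factory=list)


def build_project_list(projects: list[Project]) -> str:
    """Serialize projects into the <project_list> structure shown above."""
    root = ET.Element("project_list")
    for p in projects:
        proj = ET.SubElement(root, "project")
        ET.SubElement(proj, "project_name").text = p.name
        ET.SubElement(proj, "user_email").text = p.email
        pmids_el = ET.SubElement(proj, "pmids")
        for pmid in p.pmids:
            ET.SubElement(pmids_el, "pmid").text = str(pmid)
        # Left empty, matching the example; this sketch does not use search queries.
        ET.SubElement(proj, "query")
    # encoding="unicode" returns a str without an XML declaration; text is escaped.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_project_list([
        Project("default2011", "matt@pubget.com", ["1001", "1002"]),
        Project("default2012", "matt@pubget.com", ["1001", "1002"]),
    ]))
```

ElementTree writes empty elements as `<query />`, which is equivalent XML to `<query/>`.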
You can add up to 10 projects at once, with up to 50,000 PMIDs total
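If you have more projects than one submission allows, you can group them into several submissions. The Python sketch below assumes the limits above apply to each API request (up to 10 projects and 50,000 PMIDs in total per request). It fills batches greedily in order, and it rejects any single project that alone exceeds the PMID limit. The function name `plan_requests` is hypothetical.

```python
MAX_PROJECTS_PER_REQUEST = 10
MAX_PMIDS_PER_REQUEST = 50_000

Entry = tuple[str, list[str]]  # (project name, PMIDs)


def plan_requests(projects: list[Entry]) -> list[list[Entry]]:
    """Split projects into batches that each respect the per-request limits."""
    batches: list[list[Entry]] = []
    current: list[Entry] = []
    current_pmids = 0
    for name, pmids in projects:
        n = len(pmids)
        if n > MAX_PMIDS_PER_REQUEST:
            raise ValueError(
                f"project {name!r} has {n} PMIDs; the limit is {MAX_PMIDS_PER_REQUEST}"
            )
        # Start a new batch if adding this project would break either limit.
        if current and (
            len(current) == MAX_PROJECTS_PER_REQUEST
            or current_pmids + n > MAX_PMIDS_PER_REQUEST
        ):
            batches.append(current)
            current, current_pmids = [], 0
        current.append((name, pmids))
        current_pmids += n
    if current:
        batches.append(current)
    return batches
```

Each batch can then be serialized and sent as its own request. Greedy filling keeps the original project order, although it does not always produce the fewest possible requests.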
Sample code is available in Ruby, Perl, and bash
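For readers working in Python, the sketch below shows one way to send the XML string with the standard-library `urllib.request`. Several parts are assumptions, because this document does not specify them. The API URL is a hypothetical placeholder (`https://example.invalid/textmining-api`). The sketch also assumes the XML is sent as the body of an HTTP POST with an XML content type, and the function names are illustrative.

```python
import urllib.request

# Hypothetical placeholder: the real API URL is not given in this document.
API_URL = "https://example.invalid/textmining-api"


def make_request(xml_payload: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build the HTTP request (assumption: the XML is POSTed as the body)."""
    return urllib.request.Request(
        api_url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


def submit(xml_payload: str, api_url: str = API_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    req = make_request(xml_payload, api_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything goes over the network.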
Your code will need to handle two possible errors, which are listed in the response:
- No user exists with the email address given in the <user_email> tag
- The project name has already been used at your institution
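Only the second error can be partly anticipated before submitting. The Python sketch below checks a batch of new project names against names repeated within the batch and against a set of names you already know are taken. The `used_names` set is a hypothetical local record that you would maintain yourself. The API itself remains the authority, so you still need to handle its error response, whose exact format is not described here. Names are compared exactly; whether Pubget treats names case-insensitively is not stated.

```python
def find_name_conflicts(new_names: list[str], used_names: set[str]) -> list[str]:
    """Return names that repeat within the batch or appear in used_names.

    used_names is a hypothetical local record of names already used at
    your institution; it cannot replace the server-side check.
    """
    conflicts: list[str] = []
    seen: set[str] = set()
    for name in new_names:
        if (name in seen or name in used_names) and name not in conflicts:
            conflicts.append(name)
        seen.add(name)
    return conflicts
```

For example, `find_name_conflicts(["default2011", "default2012", "default2011"], set())` flags `default2011` because it appears twice in the same batch.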