Textmining Tool

The Pubget Textmining Tool can transform your subscription PDFs into XML, including full text and associated metadata

Uploading New Projects


To begin a textmining project, enter a project name, change the email address if desired, and choose one of three methods to add PMIDs:

  1. Direct input via textbox -- simply copy and paste

  2. Upload a comma- or newline-separated file containing your chosen PMIDs, or a references file in RIS format

  3. Input a PubMed search phrase. All PubMed operators will work. Check the number of articles returned by this search with the "Check query size" button
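
For the file upload method, the PMID file can be prepared with a short script. The following is a minimal Python sketch, not part of Pubget: the function names `format_pmids` and `write_pmid_file` are illustrative, and dropping duplicates is a tidy-up assumption rather than a documented requirement.

```python
from pathlib import Path

def format_pmids(pmids, separator="\n"):
    """Join PMIDs with the separator, dropping duplicates but keeping order."""
    seen = set()
    unique = []
    for pmid in pmids:
        text = str(pmid).strip()
        if not text.isdigit():
            raise ValueError(f"not a numeric PMID: {pmid!r}")
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return separator.join(unique)

def write_pmid_file(pmids, path, separator="\n"):
    """Write the formatted PMIDs to a text file ready for upload."""
    Path(path).write_text(format_pmids(pmids, separator), encoding="utf-8")
```

Pass `separator=","` to produce a comma-separated file instead of a newline-separated one.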

Submit your project and wait for confirmation


Importing From Embase

Importing documents from Embase into Pubget is possible by using Embase's "export" feature

First, search and select your documents for export, using either the checkboxes or the mass select buttons

Then, export to RIS format


Import this file into Pubget using the file upload method described above
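
Before uploading, it can help to confirm that the exported file contains the number of records you selected. The sketch below is one way to do that in Python, assuming the export follows the common RIS layout in which each record starts with a `TY` line and ends with an `ER` line; the function name `count_ris_records` is illustrative and not part of Pubget or Embase.

```python
import re

# RIS lines look like "TY  - JOUR": a two-character tag, two spaces, a hyphen.
RIS_TAG = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

def count_ris_records(text):
    """Count records, checking that every TY (start) has a matching ER (end)."""
    open_record = False
    count = 0
    for number, raw in enumerate(text.splitlines(), start=1):
        match = RIS_TAG.match(raw.lstrip("\ufeff").rstrip())
        if not match:
            continue  # blank line or continuation text
        tag = match.group(1)
        if tag == "TY":
            if open_record:
                raise ValueError(f"line {number}: TY before previous record ended")
            open_record = True
        elif tag == "ER":
            if not open_record:
                raise ValueError(f"line {number}: ER without a matching TY")
            open_record = False
            count += 1
    if open_record:
        raise ValueError("last record is missing its ER line")
    return count
```

If the count differs from the number of documents selected in Embase, re-export before uploading.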

Managing Existing Projects

Once your project is running, you can see the status of all projects you have created. By clicking the "everyone" radio button, you can also see those of your colleagues


What each column means:

  • Name: Project name
  • Email: User to be emailed upon completion
  • Count: Number of documents in initial set (final count may be decreased for recency on email-extraction accounts)
  • Full text: Documents in project with full text found
  • Abstract only: Documents in project with abstracts found
  • Status: Status code indicating progress of listed projects
  • Holdings: Documents in project with no full text that should be available through subscriptions
  • Delivered: Number of articles delivered in resultant XML or Excel Doc (full text and abstract-only)

Status codes

  • Finished: Project has completed and user has been notified via email
  • Loading: Articles are being loaded into Pubget's systems for download
  • Queued: Articles have been loaded, but project has not started downloading yet
  • Running: Project is currently downloading
  • Paused: Project is paused for system maintenance
  • Stalled: There is an error with the system that is causing downloading to stall; Pubget support is automatically notified when this happens

Individual projects can be cancelled, if they are not finished, with the cancel button
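
If you track projects in your own scripts, the status codes and the cancellation rule can be modelled as below. This is only a sketch: the codes are the ones shown in the web view, while the names `ProjectStatus` and `can_cancel` are hypothetical, and nothing here implies that the API returns these values.

```python
from enum import Enum

class ProjectStatus(Enum):
    """Status codes as listed in the project overview."""
    FINISHED = "Finished"
    LOADING = "Loading"
    QUEUED = "Queued"
    RUNNING = "Running"
    PAUSED = "Paused"
    STALLED = "Stalled"

def can_cancel(status):
    """A project can be cancelled as long as it has not finished."""
    return status is not ProjectStatus.FINISHED
```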

Click an individual project name to show the project status view

Project Status View


What each column means

  • PMID: link to article on Pubget
  • PDF path: actual path to PDF online
  • Confidence: accuracy of conversion to XML based on comparison to Pubget's abstract
  • Word count: full text words converted from PDF

Creating Projects via API

In addition to creating projects via a web form, you can use the XML API to do the same

To do so, pass an XML string to the API URL

The XML fields are:

<project_list>
  <project>
    <project_name>default2011</project_name>
    <user_email>matt@pubget.com</user_email>
    <pmids>
      <pmid>1001</pmid>
      <pmid>1002</pmid>
    </pmids>
    <query/>
  </project>
  <project>
    <project_name>default2012</project_name>
    <user_email>matt@pubget.com</user_email>
    <pmids>
      <pmid>1001</pmid>
      <pmid>1002</pmid>
    </pmids>
    <query/>
  </project>
</project_list>
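
An XML string in this shape can be generated rather than written by hand. The following Python sketch uses the standard library's `xml.etree.ElementTree`; the function name `build_project_list` and the dictionary keys (`name`, `email`, `pmids`, `query`) are illustrative choices. The sample above only shows an empty `<query/>`, so placing a PubMed search phrase inside it is an assumption.

```python
import xml.etree.ElementTree as ET

def build_project_list(projects):
    """Serialize a list of project dicts into a <project_list> XML string."""
    root = ET.Element("project_list")
    for project in projects:
        node = ET.SubElement(root, "project")
        ET.SubElement(node, "project_name").text = project["name"]
        ET.SubElement(node, "user_email").text = project["email"]
        pmids = ET.SubElement(node, "pmids")
        for pmid in project.get("pmids", []):
            ET.SubElement(pmids, "pmid").text = str(pmid)
        # Assumption: a search phrase, if any, goes inside <query>.
        ET.SubElement(node, "query").text = project.get("query") or None
    return ET.tostring(root, encoding="unicode")

xml_string = build_project_list([
    {"name": "default2011", "email": "matt@pubget.com", "pmids": [1001, 1002]},
])
```

Building the tree with `ElementTree` also escapes characters such as `&` and `<` in project names and search phrases, which is easy to miss when concatenating strings.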

You can add up to 10 projects at once, with up to 50,000 PMIDs total
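
These limits can be checked on your side before a request is sent. The sketch below is a minimal Python example; it reads "50,000 PMIDs total" as a sum across all projects in one request, which is an interpretation of the sentence above, and the names `check_limits`, `MAX_PROJECTS`, and `MAX_TOTAL_PMIDS` are illustrative.

```python
MAX_PROJECTS = 10
MAX_TOTAL_PMIDS = 50_000

def check_limits(projects):
    """Raise ValueError if a batch exceeds the limits; return the PMID total."""
    if len(projects) > MAX_PROJECTS:
        raise ValueError(
            f"{len(projects)} projects in one request; the limit is {MAX_PROJECTS}"
        )
    # Assumption: the PMID limit applies to the whole request, not per project.
    total = sum(len(project.get("pmids", [])) for project in projects)
    if total > MAX_TOTAL_PMIDS:
        raise ValueError(f"{total} PMIDs in one request; the limit is {MAX_TOTAL_PMIDS}")
    return total
```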

Sample code is available in Ruby, Perl, and Bash

Note that you will have to handle two possible errors, which will be enumerated in the response:

  • No such user as named by the <user_email> tag
  • Project name has already been used at your institution
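
The second error can often be avoided with a local check before submitting. The Python sketch below flags project names that repeat within a batch or that appear in a set of names you already know to be in use; the function name `find_name_conflicts` and the `known_names` argument are hypothetical, and exact, case-sensitive matching is an assumption. It does not replace handling the errors reported in the response, since colleagues at your institution may have used a name you do not know about.

```python
def find_name_conflicts(project_names, known_names=()):
    """Return names that repeat within the batch or appear in known_names."""
    seen = set(known_names)
    conflicts = []
    for name in project_names:
        if name in seen and name not in conflicts:
            conflicts.append(name)
        seen.add(name)
    return conflicts
```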