Research Profile

Table of contents

   Introduction & overview
   Research Profile screens
      Identified Items
      Automatic Suggestions
      Refused Items
      Manual Search
      Auto Update Preferences
      Document-to-Document Links
      Full-text URLs
         bin/export_fturls_choices script
      Main Research Profile screen
   Automatic research search
      Exact searches
      Additional searches
      Fuzzy name search
         bin/fuzzy_search_table script
   Configuration of the Research Profile and its screens
      Automatic search
      Document-to-document links configuration
      Full-text URLs configuration
      Fuzzy search
   Selected technical details
      Technical: database tables
         resources
         rp_suggestions
         ft_urls
         ft_urls_choices
      Technical: code structure

Introduction & overview

Research profile (RP) is one of the main parts of a personal record in ACIS. It lists the research works that the person has authored or otherwise helped to create. Research works are usually documents such as articles or papers, but a work can also be a book, a chapter in a book, a software component, a series, et cetera.

At the same time, the research profile is the part of the ACIS web interface that lets users manage their list of research works.

When a person includes a work in his or her RP, we often refer to the event as “claiming”; we say, for instance, that the user claimed a document.

ACIS maintains its own database of documents and other research items. (We sometimes use the general word “resource” to refer to them.) Users cannot directly add their own items to the document database. A personal RP can only include items that are already present in the resource database.

Research Profile screens

Identified Items

The Identified Items screen lists all the works currently claimed by a person. It also lets the user remove items from the list, for example to correct the mistake of adding a wrong item.

Automatic Suggestions

Automatic search is the main procedure that we execute to find works for a person’s RP. The Automatic Suggestions screen shows the results of the automatic search and lets the user decide, item by item, whether to accept them.

Refused Items

The refused items list contains research items that should not be suggested for inclusion in the person’s RP. It is a blacklist of sorts.

The Refused Items screen lets the user review the list and delete items from it, if desired.

Manual Search

While automatic search should find every resource an ACIS service has for a person, the metadata is sometimes inaccurate. For this and other reasons, automatic search does not always find everything. Therefore, we let users run their own searches by several criteria: by work title, by author or editor name, or by record identifier.

On the Manual Search screen users run those searches and handle their results.

Auto Update Preferences

ACIS provides APU, the automatic profile update service, which executes automatic research searches for a person even when the user has not asked for one. It may automatically add closely matching items to the person’s RP. A user who does not want that service can disable it on the Auto Update Preferences screen.

Document-to-Document Links

Document-to-document links are an advanced feature of RP. They let users connect the works in their RP with each other, specifying the type of relation between them. For instance, several works may be different versions of the same research report, or one work may be a continuation of an earlier one.

The range of possible relation types is defined by the system administrator.

On the Document-to-Document Links screen users can review and delete the links they have previously created, and they can create new ones.

The link data is then exported in AMF with the user profile (if AMF export is configured with metadata-amf-output-dir). It may look like this:

<text ref="repec:wop:cirano:96s14">
  <follow-up xmlns="http://acis.openlib.org/2007/doclinks-relations">
    <text xmlns="http://amf.openlib.org" ref="repec:mit:worpap:382"/>
  </follow-up>
</text>
<text ref="repec:wop:epruwp:9701">
  <isreferencedby>
    <text ref="repec:wop:cirano:97s41"/>
  </isreferencedby>
</text>
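
A consumer of such an export might read the links back as (source, relation, target) triples. The following is a minimal Python sketch, not part of ACIS. The enclosing <amf> wrapper element, the function names, and the choice to strip namespaces from element names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The two records from the example above, wrapped in a hypothetical
# <amf> root element so that the fragment parses as one XML document.
SAMPLE = """<amf>
<text ref="repec:wop:cirano:96s14">
  <follow-up xmlns="http://acis.openlib.org/2007/doclinks-relations">
    <text xmlns="http://amf.openlib.org" ref="repec:mit:worpap:382"/>
  </follow-up>
</text>
<text ref="repec:wop:epruwp:9701">
  <isreferencedby>
    <text ref="repec:wop:cirano:97s41"/>
  </isreferencedby>
</text>
</amf>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_links(xml_text):
    """Return (source ref, relation name, target ref) for every link found."""
    root = ET.fromstring(xml_text)
    links = []
    for source in root:
        if local_name(source.tag) != "text":
            continue
        for relation in source:          # e.g. follow-up, isreferencedby
            for target in relation:      # the linked document(s)
                if local_name(target.tag) == "text" and target.get("ref"):
                    links.append(
                        (source.get("ref"), local_name(relation.tag), target.get("ref"))
                    )
    return links


for link in extract_links(SAMPLE):
    print(link)
```

Stripping namespaces keeps the sketch short; code that must tell apart relation types from different namespaces would compare the full qualified names instead.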

Full-text URLs

Full-text URLs are another advanced and optional feature of RP. If you have full-text links for your research works (articles, papers, etc.) but the data is not 100% authoritative, you may ask the authors to review the links and flag them as right or wrong. At the same time, you may ask them for their permission to archive the full-text file (if it is correct). Please refer to the Textilshchiki document, section Full-text file recognition, for a fuller description of the rationale for this feature.

The Full-text URLs screen shows the currently known URLs for each of the RP items. (There may be several URLs per item.) For each URL it shows the current status. If the user has made no decision about it yet, the assumed default status is shown. Otherwise the screen shows the latest decision made by the user. Thus the user can review his or her previous decisions and change them.

See below for instructions on how to configure the feature and for its input data format.

The collected data of users’ decisions can then be exported out of ACIS in a simple format with the bin/export_fturls_choices script.

bin/export_fturls_choices script

The script is for exporting data from the ft_urls_choices table (and some related fields in other tables). It outputs data on the standard output in a simple tab-delimited one-record-per-line format. The following fields are included (in this order):

The script may optionally accept one or two date parameters on the command line. With such parameters, the script only outputs decisions taken in the given period. If only one date is supplied, the script outputs all data from that day on. The dates are expected in the YYYY-MM-DD format.
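
As an illustration of how a caller might prepare the date parameters and consume the output, here is a minimal Python sketch. It is not part of ACIS: the function names are hypothetical, the fields are kept as plain lists because their order is not assumed here, and the sample record is a placeholder.

```python
from datetime import date, datetime


def parse_date_args(args):
    """Turn zero, one or two YYYY-MM-DD strings into a (start, end) pair.

    A missing date is returned as None; with one date only the start is set,
    mirroring the "from that day on" behaviour described above.
    """
    if len(args) > 2:
        raise ValueError("expected at most two dates")
    dates = [datetime.strptime(a, "%Y-%m-%d").date() for a in args]
    start = dates[0] if dates else None
    end = dates[1] if len(dates) == 2 else None
    return start, end


def read_records(lines):
    """Split tab-delimited, one-record-per-line text into lists of fields."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


print(parse_date_args(["2007-01-01"]))
print(read_records(["field1\tfield2\tfield3\n"]))
```

In practice the lines would come from the script's standard output, for instance through a pipe, rather than from a literal list.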

Main Research Profile screen

Displays a menu of all the screens with a brief introduction to each and some general status information. Provides a button to force an automatic search for the person, using his or her current name variations.

Automatic research search

Exact searches

This is a search for the person’s name variations among the names of the document authors (and editors). As its name states, it finds exact matches only.
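
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ACIS implementation: documents are represented as hypothetical (identifier, list of author names) pairs, all sample names are invented, and "exact" is taken to mean plain string equality.

```python
def exact_matches(name_variations, documents):
    """Return (document id, matched names) for documents whose author or
    editor names contain at least one of the person's name variations."""
    variations = set(name_variations)
    hits = []
    for doc_id, names in documents:
        matched = [name for name in names if name in variations]
        if matched:
            hits.append((doc_id, matched))
    return hits


# Hypothetical sample data.
variations = ["John Smith", "Smith, John", "J. Smith"]
documents = [
    ("doc:1", ["Smith, John", "Jane Doe"]),
    ("doc:2", ["Jon Smith"]),  # mistyped name: not an exact match
]
print(exact_matches(variations, documents))
```

The second document is missed because of a single mistyped letter, which is the kind of case the fuzzy name search is meant to cover.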

Additional searches

Fuzzy name search

This feature finds mistyped author (or editor) names in the document metadata.

It requires some configuration and running the bin/fuzzy_search_table utility every once in a while.

Find a detailed explanation of how this is supposed to work in the Textilshchiki document, section Fuzzy searching.

bin/fuzzy_search_table script

The script initializes the database tables which are needed for the fuzzy name search to work. It should be run regularly. Depending on the size of your document database, it may take a while to do its job.

It takes no arguments and prints its progress (the executed database statements) to standard output.

Configuration of the Research Profile and its screens

See all research profile parameters.

Automatic search

Document-to-document links configuration

The whole feature has to be enabled with the document-document-links-profile parameter.

The relation types have to be specified in the XML file doclinks.conf.xml in the ACIS installation directory. The file has a simple structure; a self-explanatory example file is supplied as doclinks.conf.xml.eg.

Full-text URLs configuration

The feature is not available unless you have enabled it with the full-text-urls-recognition parameter.

The input data format is AMF-based. Authoritative URLs are given as:

<text id=".."> 
 <file>
  <url>url</url>
 </file>
</text>

Automatically found URLs:

<text id=".."> 
 <hasversion>
  <text>
   <file>
    <url>url</url>
   </file>
  </text>
 </hasversion>
</text>
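
To make the difference between the two forms concrete, the following Python sketch sorts the URLs of each record into authoritative and automatically found ones. It is a rough illustration only: the <amf> wrapper element, the record identifiers, and the example.org URLs are invented, and the AMF namespace that real data would carry is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical input following the two forms shown above.
SAMPLE = """<amf>
<text id="example:paper:1">
 <file>
  <url>http://example.org/paper1.pdf</url>
 </file>
</text>
<text id="example:paper:2">
 <hasversion>
  <text>
   <file>
    <url>http://example.org/mirror/paper2.pdf</url>
   </file>
  </text>
 </hasversion>
</text>
</amf>"""


def collect_urls(xml_text):
    """Map each record id to its authoritative and automatically found URLs."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.findall("text"):
        entry = result.setdefault(
            doc.get("id"), {"authoritative": [], "automatic": []}
        )
        # <file><url> directly under the record: authoritative URL.
        entry["authoritative"].extend(
            u.text.strip() for u in doc.findall("file/url") if u.text
        )
        # <file><url> nested under <hasversion><text>: automatically found URL.
        entry["automatic"].extend(
            u.text.strip() for u in doc.findall("hasversion/text/file/url") if u.text
        )
    return result


print(collect_urls(SAMPLE))
```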

If you have full-text URL data separate from the document data, configure it as a special metadata collection in main.conf. Use FullTextUrlsAMF as its type. For example, here the collection is named ‘URLs’:

metadata-collections="Papers URLs ..."
metadata-Papers-home=/path/to/Papers
metadata-Papers-type=AMF
metadata-URLs-home=/path/to/URLs/data
metadata-URLs-type=FullTextUrlsAMF
...

Before this data becomes available to users, it has to be processed by the update daemon. You have to explicitly request an update (see bin/updareq).

Fuzzy search

Selected technical details

Technical: database tables

resources

+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| id       | varchar(255) |      | PRI |         |       |
| sid      | varchar(15)  |      | MUL |         |       |
| type     | varchar(20)  |      |     |         |       |
| title    | varchar(255) |      | MUL |         |       |
| classif  | varchar(50)  | YES  |     | NULL    |       |
| location | text         | YES  |     | NULL    |       |
| authors  | text         | YES  |     | NULL    |       |
| urlabout | text         | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+

rp_suggestions

+--------+----------+------+-----+---------------------+-------+
| Field  | Type     | Null | Key | Default             | Extra |
+--------+----------+------+-----+---------------------+-------+
| psid   | char(15) |      | PRI |                     |       |
| dsid   | char(15) |      | PRI |                     |       |
| role   | char(15) |      |     |                     |       |
| reason | char(30) |      |     |                     |       |
| time   | datetime |      |     | 0000-00-00 00:00:00 |       |
+--------+----------+------+-----+---------------------+-------+
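
The sketch below shows how the two tables above might be queried together. It makes several assumptions that the table listings alone do not confirm: that psid and dsid are the short identifiers of a person and a document, that dsid corresponds to resources.sid, and that the role and reason values look like the invented ones used here. It also uses an in-memory SQLite database with simplified column types purely so that the example is self-contained.

```python
import sqlite3


def suggestions_for(conn, psid):
    """List suggested resources for one person, newest suggestion first.

    Assumes rp_suggestions.dsid refers to resources.sid.
    """
    cur = conn.execute(
        "SELECT r.id, r.title, s.role, s.reason "
        "FROM rp_suggestions AS s "
        "JOIN resources AS r ON r.sid = s.dsid "
        "WHERE s.psid = ? "
        "ORDER BY s.time DESC",
        (psid,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (
    id TEXT PRIMARY KEY, sid TEXT NOT NULL,
    type TEXT NOT NULL, title TEXT NOT NULL
);
CREATE TABLE rp_suggestions (
    psid TEXT NOT NULL, dsid TEXT NOT NULL,
    role TEXT NOT NULL, reason TEXT NOT NULL, time TEXT NOT NULL,
    PRIMARY KEY (psid, dsid)
);
""")
# Hypothetical rows, for illustration only.
conn.execute(
    "INSERT INTO resources VALUES (?, ?, ?, ?)",
    ("repec:wop:cirano:96s14", "d1", "paper", "Example title"),
)
conn.execute(
    "INSERT INTO rp_suggestions VALUES (?, ?, ?, ?, ?)",
    ("p1", "d1", "author", "exact-name", "2007-01-01 00:00:00"),
)
print(suggestions_for(conn, "p1"))
```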

ft_urls

PRIMARY KEY( dsid, checksum ), index url_i(url(30)), index source_i(source(50))

ft_urls_choices

primary key prim(dsid, checksum, psid), index t_i(time), index psid_i(psid)

Technical: code structure

Core modules:

APU modules:

Document to document links:

Full-text URLs: