Titles of original works starting with an article are left-stripped (#4549)

oddrun · September 20, 2021, 7:18pm

Example:

Advanced search for person=Henrik Ibsen → 1 entry in result list, good!
Click on entry to show original works → Most of his original works were listed, I think (plus some aggregations, but that’s another matter), which is also very good.
However, beginning articles are stripped from the title:
Et dukkehjem is shown as Dukkehjem
De unges forbund is shown as Unges forbund
The same bug appeared in v. 1.0, and I wonder if the source of the problem is the number of nonfiling characters encoded in the second indicator in title fields?

filipjakobsen · September 20, 2021, 11:24pm

Thanks for testing and for your comment, @oddrun !

Let me look into this!

oddrun · September 21, 2021, 6:07am

Fine! I suspect this bug is a result of the bibliographic data processing, though, not the UI.

annadis · September 22, 2021, 9:31am

Hi,

@oddrun you’re right:

There is a rule that strips the article from the title in all tags that carry the nonfiling (skip) indicator (tags 130/240/245/730/830/740).
In particular, for the title Et dukkehjem (present in the new SVDE), there are many records with tag 240 where the title is entered without the article.

oddrun · September 22, 2021, 2:10pm

In my opinion, we must distinguish between which characters of a title are used in sorting/filing, and the actual title itself. For example, a title field like
240 #3 $aEt dukkehjem
doesn’t mean that the first 3 characters of the title should be removed, only that the first 3 characters should be ignored in sorting procedures, i.e. when finding its place in an alphabetical index or filing system. There, this title should appear in the ‘D’ part.
However the (preferred) title is still Et dukkehjem, and should be rendered as such. I for one think it is very important to show the preferred titles of original works correctly.
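
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration, not SVDE's actual processing: `TitleField`, `display_title` and `sort_key` are hypothetical names, and the field is reduced to the $a text plus the nonfiling count from the second indicator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleField:
    """Simplified title field (hypothetical model): $a text plus nonfiling count."""
    text: str            # title exactly as catalogued, e.g. "Et dukkehjem"
    nonfiling: int = 0   # second indicator: characters to skip when filing


def display_title(field: TitleField) -> str:
    # The rendered title is always the full catalogued string.
    return field.text


def sort_key(field: TitleField) -> str:
    # Only the filing/sorting form drops the leading nonfiling characters.
    return field.text[field.nonfiling:].casefold()


titles = [
    TitleField("Et dukkehjem", 3),
    TitleField("De unges forbund", 3),
    TitleField("Brand", 0),
]

# Sort by the filing form, but show the full title.
ordered = [display_title(t) for t in sorted(titles, key=sort_key)]
```

Here "Et dukkehjem" files under D and "De unges forbund" under U, yet both are still displayed with their articles.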

annadis · September 23, 2021, 11:28am

Dear Oddrun, you’re right: the skip is used for sorting the titles.
Let me try to explain better why we decided to apply this rule.
We process millions of records (bibliographic and authority) that come from different libraries, organizations and so on, each of which applies different cataloging practices.
As I mentioned, the title of the opus comes from tags 130/240/245/730/830/740, but also from $t of tags 1XX/7XX. So we can have:
• Bibliographic records where the title is in:

1. 240 00$aEt dukkehjem
2. 240 03$aEt dukkehjem
3. 240 10$aDukkehjem
4. 700 1 $aIbsen, Henrik,$d1828-1906$tDukkehjem

• Authority records where the title is in:
5. 130 #0 $aDukkehjem
6. 130 #0 $aEt Dukkehjem
7. 130 #3 $aEt Dukkehjem
8. 100 1# ‡aIbsen, Henrik ‡d(1828-1906). ‡tDukkehjem

Analyzing a large number of records, we noticed that in most of them the article is not accounted for in the skip indicator (see cases 1 and 6), and that it is also omitted from the title itself (see cases 3 and 5).
Moreover, many opus titles come from $t in name-title (NT) tags (1XX, 7XX), so in order to reconcile all entities we decided to cluster the title without the article.
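
As a rough sketch of the reconciliation problem described here (not the actual SVDE rule set), the eight cases can be reduced to one matching key while the catalogued strings are kept. `cluster_key` and the tiny `ARTICLES` set are assumptions made up for this example; a real rule would need a proper, language-aware article list.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny article list for illustration only.
ARTICLES = {"et", "en", "ei", "de", "den", "det"}


def cluster_key(text: str, nonfiling: int = 0) -> str:
    """Matching key: title without a leading article, lower-cased."""
    rest = text[nonfiling:] if nonfiling else text
    words = rest.split()
    # Catch records where the article is present but the indicator is 0.
    if nonfiling == 0 and len(words) > 1 and words[0].casefold() in ARTICLES:
        words = words[1:]
    return " ".join(words).casefold()


# (title text, nonfiling count) for the eight cases listed above;
# $t subfields carry no indicator, so they are given a count of 0.
variants = [
    ("Et dukkehjem", 0), ("Et dukkehjem", 3), ("Dukkehjem", 0), ("Dukkehjem", 0),
    ("Dukkehjem", 0), ("Et Dukkehjem", 0), ("Et Dukkehjem", 3), ("Dukkehjem", 0),
]

# Group the catalogued strings under their matching key.
clusters = defaultdict(list)
for text, skip in variants:
    clusters[cluster_key(text, skip)].append(text)
```

All eight variants land in a single cluster. Because the key is used only for matching, the original strings remain available if a form with the article is wanted for display.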

oddrun · September 24, 2021, 7:30am

I see (I think). So, if every reference (in all the incoming records) to this title had been “Et dukkehjem”, then you would have kept that title in SVDE as well, including the article? Does this mean that it’s not the number (if any) in indicator 2 that causes the exclusion of articles, but the fact that some of the incoming data encode titles without a leading article?
I think this is not quite satisfactory in the long run, and it will cause a lot of J.Cricketing… Giving precedence to the records from the country of the author (or the country of first publication) as source of title could solve some of it.
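
One way the precedence idea could look, purely as a sketch: the `(title, source_country)` pairs and the `preferred_title` helper are hypothetical, and nothing in this thread says that SVDE records carry such a country value in this form.

```python
from collections import Counter


def preferred_title(candidates, author_country):
    """Pick a display title from (title, source_country) pairs (hypothetical fields)."""
    # Prefer title forms coming from the author's own country, if there are any.
    home = [title for title, country in candidates if country == author_country]
    pool = home or [title for title, _ in candidates]
    # Within the chosen pool, take the most frequent form.
    return Counter(pool).most_common(1)[0][0]


candidates = [
    ("Dukkehjem", "us"),
    ("Dukkehjem", "us"),
    ("Dukkehjem", "it"),
    ("Et dukkehjem", "no"),
]
chosen = preferred_title(candidates, author_country="no")
```

With these made-up inputs the single Norwegian record wins over the more frequent article-less form. If no record comes from the author's country, the function falls back to the most common form overall.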

annadis · October 1, 2021, 11:27am

Dear Oddrun,
I opened a development ticket to find a solution to this request. In the meantime, could you please send a list of Norwegian stop words that we can use to improve our rules and algorithm?

oddrun · October 13, 2021, 6:00am

Hi, Here is a stopword list for Norwegian (bokmål and nynorsk): Stopwords in Norwegian (by Snowball)

annadis · October 13, 2021, 7:44am

Thanks very much @oddrun .
