This search interface uses Solr as its backend, so it accepts any search query that Solr accepts. Two terms are used frequently in these instructions: hits and clusters.
To search for a word, type it into the search bar; the search engine will find hits whose passage contains that word.
When doing simple searches for a single word you can use the word alone, but you can also specify a field (which is also necessary for some more advanced searches).
E.g. typing text:word into the search field will find hits that contain the word word in the field text.
The available fields depend on whether you're searching for hits or clusters. To search for clusters, you must specify this in the advanced search.
Below is a list of all available fields for searching, followed by examples that show how some different search terms can be combined for more advanced queries.
Available fields when searching for hits:
cluster_id - Specifies this hit's cluster.
country - Country of the issue.
date - Full date of the issue.
doc_id - The exact ID of the page.
length - The length of the hit.
location - The city of the issue.
text - The text of the hit.
title - The title of the issue.
year - Year of the issue.
Available fields when searching for clusters:
all_countries - All countries the cluster spread to.
all_locations - All cities the cluster spread to.
average_length - The average length of all hits in the cluster.
cluster_id - The ID of the cluster.
count - The count of unique hits in the cluster.
crossed - true / false. True if the cluster spanned two or more countries.
ending_country - The country of the last hit in the cluster.
ending_date - The date of the last hit in the cluster.
ending_location - The city of the last hit in the cluster.
first_text - The text of the first hit in the cluster.
gap - The biggest gap in the cluster, i.e. the maximum difference in publishing date between two consecutive hits.
locations - The number of unique locations in the cluster.
starting_country - The country of the first hit in the cluster.
starting_date - The date of the first hit in the cluster.
starting_location - The city of the first hit in the cluster.
starting_year - The year of the first hit in the cluster.
in_city - The incoming city of this cluster.
in_country - The incoming country of this cluster.
in_date - Upcoming.
out_city - The port city of this cluster.
out_country - The port country of this cluster.
out_date - Upcoming.
timespan - The number of days between the first and last hit in the cluster.
titles - The number of unique titles in the cluster.
virality_score - The virality score of the cluster.
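The field lists above can be turned into a small check that a field name is valid before a query is typed or generated. The following is only a sketch in Python: the helper name field_query and the target argument are hypothetical and not part of this search interface; only the field names are taken from the lists above.

```python
# Field names copied from the lists above.
HIT_FIELDS = {
    "cluster_id", "country", "date", "doc_id", "length",
    "location", "text", "title", "year",
}
CLUSTER_FIELDS = {
    "all_countries", "all_locations", "average_length", "cluster_id",
    "count", "crossed", "ending_country", "ending_date", "ending_location",
    "first_text", "gap", "locations", "starting_country", "starting_date",
    "starting_location", "starting_year", "in_city", "in_country",
    "in_date", "out_city", "out_country", "out_date",  # in_date/out_date are listed as upcoming
    "timespan", "titles", "virality_score",
}


def field_query(field, term, target="hits"):
    """Build a field:term query string (hypothetical helper).

    target is "hits" or "clusters" and selects which field list applies.
    """
    allowed = HIT_FIELDS if target == "hits" else CLUSTER_FIELDS
    if field not in allowed:
        raise ValueError(f"{field!r} is not a field for {target}")
    return f"{field}:{term}"


print(field_query("text", "word"))                         # text:word
print(field_query("crossed", "true", target="clusters"))   # crossed:true
```

The resulting string is what would be typed into the search bar; the helper only guards against using a cluster field in a hit search or the other way round.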
This engine uses Solr's default query parser, lucene.
Below are some of the most common terms that can be used.
More information can be found in Solr's documentation.
Boolean
+text:word -- The word word must appear in the text field of the hit.
-text:word -- The word word must not appear in the text field of the hit.
These two can also be combined:
+text:word -text:cat -- The word word must appear in the text field and the word cat must not.
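When such queries are generated rather than typed, the required and excluded words can be joined mechanically. This is a minimal sketch in Python; the function boolean_query and its parameters are hypothetical names chosen for illustration.

```python
def boolean_query(must=(), must_not=(), field="text"):
    """Join required (+) and excluded (-) words into one query string."""
    parts = [f"+{field}:{word}" for word in must]
    parts += [f"-{field}:{word}" for word in must_not]
    # Solr separates clauses with whitespace.
    return " ".join(parts)


print(boolean_query(must=["word"], must_not=["cat"]))  # +text:word -text:cat
```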
Phrase search
Different Solr terms are separated by whitespace, so if you want to search for a multi-word phrase, you must wrap it in quotation marks:
text:"this is a word" - "This is a word" must appear in the passage.
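If a phrase is inserted into a query by a script, it helps to add the quotation marks in one place and to escape any quotation marks or backslashes inside the phrase, since these would otherwise end the phrase early. The sketch below is in Python, and phrase_query is a hypothetical helper name.

```python
def phrase_query(field, phrase):
    """Wrap a multi-word phrase in quotation marks for a phrase search."""
    # Escape backslashes first, then embedded quotation marks.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(phrase_query("text", "this is a word"))  # text:"this is a word"
```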
Fuzzy matching
Solr can perform fuzzy matching, where words that are very similar to the query word are accepted.
text:dog~ - Words that are similar to dog are accepted. E.g. dag.
This can be useful for finding hits in the database, because the OCR process may have degraded the text quality so that exact matches are no longer sufficient.
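To give a feel for what "similar" means, the sketch below computes the edit distance between two words in Python. Solr's fuzzy operator is based on Damerau-Levenshtein distance and by default accepts up to two edits; the code uses plain Levenshtein distance (no transpositions) as a simplification, and the candidate word list is made up for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn a into b (plain Levenshtein, no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical candidate words; a real index contains the OCR'd terms.
candidates = ["dog", "dag", "dogs", "cat"]
close = [w for w in candidates if levenshtein("dog", w) <= 2]
print(close)  # ['dog', 'dag', 'dogs']
```

An OCR error such as dag for dog is a single substitution, which is why text:dog~ still finds it.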
Range queries:
count:[50 TO *] - Shows clusters that have 50 or more hits. Square brackets include both end values.
locations:[4 TO 5] - Shows clusters that have spread to 4 or 5 different unique locations.
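Range queries follow a fixed pattern, so they are easy to generate. The following Python sketch builds inclusive ranges and uses * for an open end; range_query is a hypothetical helper, not something provided by the interface.

```python
def range_query(field, low=None, high=None):
    """Build an inclusive range query; None means an open end (*)."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"


print(range_query("count", low=50))      # count:[50 TO *]
print(range_query("locations", 4, 5))    # locations:[4 TO 5]
print(range_query("timespan", high=10))  # timespan:[* TO 10]
```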
Wildcards:
word* - Searches for words that start with word, followed by any ending (including none).
word? - Searches for words that consist of word plus exactly one extra character at the end.
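The difference between the two wildcards can be tried out locally. Python's fnmatch module happens to use the same meaning for * (any number of characters) and ? (exactly one character), so the sketch below uses it as a stand-in; it only approximates what Solr does against its index, and the word list is made up.

```python
from fnmatch import fnmatchcase

# Hypothetical word list for illustration.
words = ["word", "words", "wordy", "wording", "sword"]

any_ending = [w for w in words if fnmatchcase(w, "word*")]
one_extra = [w for w in words if fnmatchcase(w, "word?")]

print(any_ending)  # ['word', 'words', 'wordy', 'wording']
print(one_extra)   # ['words', 'wordy']
```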
Real example queries:
If you want to find and see all the hits and/or clusters, type: *:*
brand* AND Åbo
Finds hits or clusters in which a form of the word brand (for example, branden) and the word Åbo occur in the same text.
brand* AND Åbo NOT Brandenburg
Finds hits or clusters in which a form of the word brand (for example, branden) and the word Åbo occur, but excludes those containing the word Brandenburg.
Finds all hits from locations starting with Mal-.
Finds all hits from the paper Vårt Land, and also all clusters that include hits from Vårt Land.
timespan:[* TO 10]
Finds clusters with a timespan from 0 to 10 days. This covers reprints made within the same day (0) up to those made within ten days.
Finds a particular cluster.
count:[* TO 50]
Finds clusters with a count from 2 to 50. (The minimum count is 2, basically a text with one reprint.)
crossed:true AND count:[100 TO *]
Finds clusters that spanned two or more countries and with a count of 100 or more.
locations:[10 TO *]
Finds clusters which contain 10 or more printing locations.
all_locations:(Umeå AND Oulu)
Finds all clusters which have both Umeå and Oulu among their printing locations.
all_locations:(Umeå OR Oulu)
Finds all clusters which have either Umeå or Oulu in their printing locations.
all_countries:(Finland AND Norway)
Finds all clusters which have been printed in both Finland and Norway.
all_countries:(Finland OR "United States")
Finds all clusters which have been printed in either Finland or in the United States.
Finds clusters that started on 1 October 1904.
starting_date:[1804-10-01T00:00:00Z TO 1904-10-01T00:00:00Z]
Finds clusters that have started between 1 October 1804 and 1 October 1904.
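Date ranges use full timestamps in the form YYYY-MM-DDTHH:MM:SSZ, which are tedious to type by hand. The Python sketch below formats two calendar dates into such a range; the helper names solr_date and date_range_query are hypothetical, and midnight is assumed as the time of day, as in the example above.

```python
from datetime import date


def solr_date(d):
    """Format a date as a timestamp at midnight UTC, e.g. 1904-10-01T00:00:00Z."""
    return f"{d.isoformat()}T00:00:00Z"


def date_range_query(field, start, end):
    """Build an inclusive date range query for the given field."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


print(date_range_query("starting_date", date(1804, 10, 1), date(1904, 10, 1)))
# starting_date:[1804-10-01T00:00:00Z TO 1904-10-01T00:00:00Z]
```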
Combinations:
in_city:Turku AND all_locations:(Oulu AND Vaasa)
Finds all clusters where the first printing in Finland occurred in Turku (incoming city) and that have Oulu and Vaasa among their printing locations.
virality_score:[90 TO 100]
Finds the clusters with the highest virality scores. The virality score is a number from 0 to 100.
gap:[100 TO *]
Finds clusters in which the largest time distance between two consecutive reprints (the gap) is 100 years or more.