Lost in Translatiøn

Photo by Bit Cloud / Unsplash

TL;DR Our API vessel search just got a lot more forgiving. We now recognize special characters in names that were lost in translation or corrupted during transcription. We also say hei to 5,511 new 🇳🇴 Norwegian vessel friends.

As someone who only watches films where at least one thing explodes, I get a lot of joy from cinephiles filmsplaining non-s’plody movies to me. Let’s take Sofia Coppola’s 2003 Lost in Translation. It’s a film everyone says they watched, but there is rarely consensus on the plot. Public datasets work in the same way. They are potent resources that seldom reconcile with the source data once they reach a user's desktop. For example, ‘ø’ can quickly become ‘o’. Character formatting between systems is a hellscape built from bad copy-paste, typos, special characters, and corrupted formatting, kinda like film reviews or your opinion of my analogies. They can be wrong.

Our focus this week was on improving our country-specific datasets. What held us back were the accented and special characters (ê, ø, é) in our existing country data. Clients often left them out of their searches, so the API returned a null response even though the vessel existed. Then there are the decorative non-ASCII symbols. Larry, you may recognize these from your tricked-out MySpace profile: °❦▒ ↳ꍏ☈☈☿ ▒❦°. These generally show up when the data was transcribed from one language to another by a system that did not recognize the text. Characters like these break search results in “exact match” systems like ours and are hard for developers to plan for.
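To see why this is harder than it sounds, here is a minimal Python sketch. It is not our production code, and `strip_accents` is a hypothetical helper written just for this post. It uses the standard library's Unicode normalization to strip accents, which handles é, ê, and å but leaves ø and æ untouched because those letters have no Unicode decomposition.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accented letters decompose into a base letter plus a combining mark,
# so the mark can be filtered out.
print(strip_accents("é ê å"))  # e e a

# Ø and Æ are standalone letters with no decomposition, so they survive.
print(strip_accents("Ø Æ"))  # Ø Æ

# An exact-match comparison therefore still fails after stripping.
print(strip_accents("VARDØYFISK") == "VARDOYFISK")  # False
```

So generic accent stripping alone would still miss a search for “VARDOYFISK”, which is why names like these need explicit variants.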

Norway was the perfect dataset to begin resolving these problems. It is one of the largest seafood providers in the world for herring, cod, mackerel, and haddock, and it has a booming aquaculture industry. To add countries like Norway, which, by the way, is now live on the API, we needed to resolve all the potential character mismatches in our Variants logic. Here are a few examples. Let’s search for the vesselName “KILVÆRFJORD II”. With the new input settings, we now recognize variations for things like Roman numerals and regional accent marks:

Example 1:

KILVÆRFJORD II
KILVÆRFJORD 2
KILVRFJORD II
KILVRFJORD 2
KILVAERFJORD II
KILVAERFJORD 2
KILVARFJORD II
KILVARFJORD 2
KILVERFJORD II
KILVERFJORD 2


Example 2:

VARDØYFISK II
VARDØYFISK 2
VARDYFISK II
VARDYFISK 2
VARDOYFISK II
VARDOYFISK 2
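For the curious, lists like the ones above can be approximated with a few lines of Python. This is a rough sketch under stated assumptions, not our actual Variants logic. The substitution table is inferred from the examples in this post, the Roman numeral table only covers 1 to 10, and every name in it (`SUBSTITUTIONS`, `name_variants`, and so on) is hypothetical.

```python
import itertools

# Assumed substitution table, inferred from the examples above: a special
# character may survive as-is, be dropped, or be transliterated.
SUBSTITUTIONS = {
    "Æ": ["Æ", "", "AE", "A", "E"],
    "Ø": ["Ø", "", "O"],
    "Å": ["Å", "", "A"],
}

# Roman <-> Arabic lookup for a trailing vessel number (assumed range: 1-10).
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
ROMAN_TO_ARABIC = {roman: str(i + 1) for i, roman in enumerate(ROMAN)}
ARABIC_TO_ROMAN = {arabic: roman for roman, arabic in ROMAN_TO_ARABIC.items()}


def character_variants(name):
    """Expand every special character into each of its assumed spellings."""
    options = [SUBSTITUTIONS.get(ch, [ch]) for ch in name]
    return ["".join(combo) for combo in itertools.product(*options)]


def numeral_variants(name):
    """Return the name plus a copy with its trailing numeral converted."""
    head, sep, tail = name.rpartition(" ")
    if sep and tail in ROMAN_TO_ARABIC:
        return [name, f"{head} {ROMAN_TO_ARABIC[tail]}"]
    if sep and tail in ARABIC_TO_ROMAN:
        return [name, f"{head} {ARABIC_TO_ROMAN[tail]}"]
    return [name]


def name_variants(name):
    """Combine character and numeral variants, keeping first-seen order."""
    name = name.strip().upper()
    variants = [
        numbered
        for spelled in character_variants(name)
        for numbered in numeral_variants(spelled)
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(variants))


print(name_variants("REINSBÅEN"))  # ['REINSBÅEN', 'REINSBEN', 'REINSBAEN']
```

Calling `name_variants("KILVÆRFJORD II")` reproduces the ten spellings in Example 1. One caveat with this approach: the number of variants multiplies with every special character in a name, so a real system would likely want to cap or index them rather than generate them on every search.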

Ex. Sandbar

If you don’t have an account with us and want to try it yourself, we added some boats from Norway’s full vessel registry to our public Sandbar. Try searching with the vesselName: “REINSBÅEN”, “REINSBEN”, or “REINSBAEN”.

That’s it. Until next week, remember: for relaxing times, make it Suntory time. Your թɾօժմcԵ Եҽαต.