More than 6,800 languages and dialects are spoken worldwide; of these, only 600 have a written form.
Vision Objects continually adds new languages through a well-established industrial process based on collecting complete handwriting samples. This process has proved its worth: today's accuracy rates beat all records.
Vision Objects supports more than 97 languages in character-by-character recognition and 54 languages in cursive handwriting recognition, including the world's most widely spoken languages.
At the heart of MyScript handwriting recognition technology
Different types of handwriting
Vision Objects’ technology recognizes all types of writing styles in the world’s most widely used writing systems: Arabic, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Japanese, Korean, Latin, Tamil and Thai.
In order to be recognized, handwriting first needs to be segmented into characters, words and sentences. This segmentation differs according to the style of writing. We identify three different types of handwriting:
- Isolated characters: each character is written separately in boxed fields and the segmentation of consecutive characters is explicit. This is often used in form processing where high recognition accuracy is required.
- Handprinted characters: letters do not touch each other and the pen is lifted between two consecutive characters. Segmentation is implicit and needs to be computed by the software.
- Cursive handwriting: the most difficult type to recognize. It requires the additional use of data formats, lexicons and language models for reliable, accurate recognition.
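Implicit segmentation of handprinted characters can be illustrated with a deliberately simple heuristic. The sketch below is not MyScript's algorithm. It assumes a hypothetical ink format in which each stroke is a list of (x, y) points captured between pen-down and pen-up, all on a single line of text. Strokes whose horizontal extents overlap are grouped into one character candidate. The heuristic breaks down on cursive handwriting, where a single stroke spans several letters, which is one reason cursive recognition needs extra linguistic resources.

```python
from typing import List, Tuple

# Hypothetical ink representation: a stroke is the list of (x, y) points
# captured between pen-down and pen-up.
Point = Tuple[float, float]
Stroke = List[Point]


def x_extent(stroke: Stroke) -> Tuple[float, float]:
    """Return the leftmost and rightmost x coordinate of a stroke."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def segment_handprinted(strokes: List[Stroke], gap: float = 0.0) -> List[List[int]]:
    """Group stroke indices into character candidates.

    Strokes are visited from left to right. A stroke joins the current group
    when its left edge starts within `gap` of the group's right edge;
    otherwise it opens a new group (a new character).
    """
    order = sorted(range(len(strokes)), key=lambda i: x_extent(strokes[i])[0])
    groups: List[List[int]] = []
    right_edge = float("-inf")
    for i in order:
        left, right = x_extent(strokes[i])
        if groups and left <= right_edge + gap:
            groups[-1].append(i)
            right_edge = max(right_edge, right)
        else:
            groups.append([i])
            right_edge = right
    return groups


# "it" written as four strokes: i stem, i dot, t stem, t cross-bar.
strokes = [
    [(0, 0), (0, 10)],
    [(0, 12), (0.5, 12.5)],
    [(5, 0), (5, 10)],
    [(3, 7), (7, 7)],
]
print(segment_handprinted(strokes))  # [[0, 1], [3, 2]]
```

The `gap` tolerance is an illustrative parameter. A larger value merges strokes that nearly touch, at the risk of fusing neighbouring characters.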
Natural handwriting is a mix of handprinted and cursive handwriting, since some letters are connected to each other and others are not.
In some languages, such as Chinese, natural handwriting looks completely different compared to standardized handwriting. Dealing with these variations in writing styles is a real challenge for the MyScript recognizer:
Chinese characters in regular script (left) and semi-cursive script (right).
The recognizer takes into account a wide range of language-specific features:
- Characters: some languages are written with alphabets (e.g. English, Greek, Russian), others with ideographic characters (e.g. Chinese, Japanese).
- Mix of Asian languages with English cursive handwriting: MyScript recognizes cursive English when combined with Asian languages (Simplified and Traditional Chinese, Japanese, Korean).
- Writing directions: languages can be written in different directions (left to right, right to left).
- Japanese vertical writing: MyScript identifies and converts Japanese vertical writing.
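Japanese vertical writing is read from top to bottom within a column, and the columns are read from right to left. The sketch below is for illustration only and does not describe MyScript's internals. It assumes the characters have already been segmented and recognized into hypothetical `(label, x_center, y_center)` tuples in screen coordinates, where y grows downward. It also assumes a fixed `column_width` is enough to tell the columns apart.

```python
from typing import List, Tuple

# Hypothetical input: one recognized character with the centre of its
# bounding box, in screen coordinates (x grows rightward, y grows downward).
Char = Tuple[str, float, float]


def vertical_reading_order(chars: List[Char], column_width: float) -> str:
    """Rebuild vertically written text: columns right to left, top to bottom."""
    columns: List[List[Char]] = []
    # Visit characters from the rightmost to the leftmost.
    for ch in sorted(chars, key=lambda c: -c[1]):
        # Same column if close to the x position of the column's first character.
        if columns and abs(columns[-1][0][1] - ch[1]) <= column_width / 2:
            columns[-1].append(ch)
        else:
            columns.append([ch])
    # Within each column, read from top (smallest y) to bottom.
    return "".join(
        label
        for column in columns
        for label, _, _ in sorted(column, key=lambda k: k[2])
    )


chars = [("で", 10, 5), ("日", 30, 5), ("す", 11, 25), ("本", 29, 25), ("語", 30, 45)]
print(vertical_reading_order(chars, column_width=10))  # 日本語です
```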
Variation in writing styles
The main challenge in handwriting recognition is to deal with individual handwriting styles, including writing slants and shapes.
Also, handwriting varies from country to country. English, for example, is spoken in many countries (UK, USA, Canada, etc.) but the vocabulary and writing styles may vary from one place to another.
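One common way to reduce the effect of writer slant is to estimate the dominant slant from near-vertical pen movements, then shear the ink so those movements become vertical. This sketch only illustrates the problem and is not a description of how MyScript handles it. It assumes a hypothetical ink format in which each stroke is a list of (x, y) points.

```python
from typing import List, Tuple

# Hypothetical ink representation: each stroke is a list of (x, y) points.
Stroke = List[Tuple[float, float]]


def estimate_slant(strokes: List[Stroke]) -> float:
    """Average dx/dy over near-vertical segments (|dy| > |dx|)."""
    ratios = []
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            if abs(dy) > abs(dx):  # keep only mostly vertical movements
                ratios.append(dx / dy)
    return sum(ratios) / len(ratios) if ratios else 0.0


def deslant(strokes: List[Stroke], slant: float) -> List[Stroke]:
    """Shear x by -slant * y so segments with dx/dy == slant become vertical."""
    return [[(x - slant * y, y) for x, y in stroke] for stroke in strokes]


# Two parallel down-strokes leaning by 2 units over a height of 10.
strokes = [[(0, 0), (2, 10)], [(5, 0), (7, 10)]]
slant = estimate_slant(strokes)
print(deslant(strokes, slant))  # both strokes now have constant x
```

A shear like this only normalizes one aspect of individual style. Shape variation still has to be learned from many writers' samples.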
Handwritten text analysis
To manage the complex nature of a language, MyScript takes into account a wide range of linguistic information:
- Lexicons: narrow down the recognition possibilities and therefore increase accuracy.
- DataFormats: describe the expected format for specific information (e.g. phone numbers, email addresses).
- Language models: supply “linguistic intelligence” to the recognition engine. They describe statistically how common language is formed and the probability of written words occurring together.
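To make the roles of these three resources concrete, the sketch below reranks a hypothetical list of word candidates. Each candidate carries a shape score from the recognizer, expressed as a log-probability. This is a simplification and not MyScript's API: a production engine would normally apply such constraints during the recognition search rather than as separate post-filters. The toy lexicon, regular expression and bigram table are made up for the example.

```python
import math
import re
from typing import Dict, List, Set, Tuple

# Hypothetical recognizer output: (candidate text, shape log-probability).
Candidate = Tuple[str, float]


def apply_lexicon(cands: List[Candidate], lexicon: Set[str]) -> List[Candidate]:
    """Lexicon: keep only candidates that are known words."""
    return [c for c in cands if c[0].lower() in lexicon]


def apply_data_format(cands: List[Candidate], pattern: str) -> List[Candidate]:
    """Data format: keep only candidates that fully match the expected pattern."""
    return [c for c in cands if re.fullmatch(pattern, c[0])]


def apply_language_model(
    cands: List[Candidate],
    prev_word: str,
    bigram: Dict[Tuple[str, str], float],
    weight: float = 1.0,
    floor: float = 1e-6,
) -> List[Candidate]:
    """Language model: add weight * log P(word | prev_word), best candidate first.

    Word pairs missing from the bigram table get the small `floor` probability.
    """
    scored = [
        (text, shape_lp + weight * math.log(bigram.get((prev_word, text.lower()), floor)))
        for text, shape_lp in cands
    ]
    return sorted(scored, key=lambda c: c[1], reverse=True)


# The recognizer slightly prefers "dear", but "all clear" is far more likely.
cands = [("dear", -0.9), ("clear", -1.0), ("dlear", -1.1)]
in_lexicon = apply_lexicon(cands, {"dear", "clear"})
ranked = apply_language_model(in_lexicon, "all", {("all", "clear"): 0.2, ("all", "dear"): 0.001})
print(ranked[0][0])  # clear

# A phone-number field with a hypothetical NNN-NNNN format.
print(apply_data_format([("555-0199", -2.0), ("S55-0l99", -1.5)], r"\d{3}-\d{4}"))  # [('555-0199', -2.0)]
```

The `weight` parameter trades off shape evidence against linguistic context. Its value here is arbitrary and would normally be tuned on real data.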
MyScript language offering
To handle variation in handwriting style, Vision Objects collects handwriting samples written by thousands of individuals for each different language and country. These handwriting samples are used to train the MyScript handwriting recognition engine.
To ensure the most efficient recognition for each application, Vision Objects offers two lines of linguistic resources:
- MyScript Lingo: MyScript Lingo is a set of 54 language packs available with MyScript Builder Standard Edition, MyScript Builder Mobile Edition or MyScript Builder Embedded Edition software development kits. MyScript Lingo is designed particularly for note-taking applications requiring cursive handwriting, including on embedded platforms. MyScript Lingo includes language models and takes into account variations in writing styles and the linguistic context in order to provide highly accurate recognition of all types of handwriting, from isolated characters to natural handwriting.
- MyScript Letra: MyScript Letra provides resources for more than 97 languages and enables the recognition of isolated and possibly handprinted characters. It does not include advanced linguistic resources or language models, so it has a lower memory footprint than MyScript Lingo. MyScript Letra is particularly suitable for embedded devices.
Each language pack contains its own unique group of resources. These are used by different parts of the MyScript Builder Software Development Kits for the recognition process.