In the middle of the highway, my father turned to our car navigation system for directions. Before chalking out a route from Calcutta to Santiniketan, the system asked our language preference — would we want to be guided through the alleys, bridges and villages in English, Bengali, Hindi, Marathi, Malayalam, the list went on and on. We have come a long way since the days when the only way to communicate with devices was via the keyboard or keypad.
But how does a computer, or any system built on technology, think, understand, and, in the case of car navigation systems, even speak in regional languages? Artificial intelligence, of course — one of the most important parts of which is the concept of natural language processing or NLP. It explains how a machine can understand and interpret human language.
To teach a computer one or more languages, we feed it certain rules called “grammars”. Based on a certain grammar, the computer builds an understanding of the corresponding language. Sounds exactly like what we humans do, right? No wonder Alan Turing, father of theoretical computer science and artificial intelligence, claimed, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.”
And now they can. In 2017, The New Yorker published a Donald Trump speech composed entirely by a computer. They fed 2,70,000 words previously spoken by the current President of the United States of America into an algorithm, which analysed patterns in the language, and generated words, phrases and sentences based on the data it interpreted. It really sounded like a Trump speech.
Just as a large amount of training data is fed to an algorithm, humans read books and listen to and participate in conversations. The algorithm identifies patterns in the language input the same way that we recognise the behaviour of words and the relationships between words and phrases. Based on that acquired knowledge, we (a computer or a human) present our own permutations and combinations of the language; that is, we speak in it. This highlights the two broad phases of any text mining algorithm, namely, analysis and generation.
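To make the two phases concrete, here is a minimal sketch in Python. It is not the method used for the speech mentioned above; it assumes a far simpler technique (a table of which word follows which, often called a Markov chain), and the toy sentence and the function names `analyse` and `generate` are invented for illustration.

```python
import random
from collections import defaultdict


def analyse(text):
    """Analysis phase: record, for every word, the words seen right after it."""
    words = text.lower().split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length=8, seed=0):
    """Generation phase: start from a word and keep picking a recorded follower."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    word = start
    output = [word]
    for _ in range(length - 1):
        options = followers.get(word)
        if not options:  # this word was never followed by anything: stop
            break
        word = rng.choice(options)
        output.append(word)
    return " ".join(output)


# Toy training text, made up for this example.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = analyse(corpus)
print(generate(table, "the"))
```

Every pair of neighbouring words in the output has been seen in the training text, yet the sentence as a whole may be new. With a few hundred thousand words instead of thirteen, the same idea begins to imitate a speaker's style.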
I recently came across a meme saying, “Once you read a dictionary, everything else is simply a remix.” From the perspective of computational linguistics, it is quite right.
Numerous fields in today’s world are dependent on computational linguistics, NLP and text mining — such as translation applications, sentiment analysis, literature and library databases, and even psychology and neuroscience.
Shail Shah, who is pursuing his master’s degree in Data Science at the Illinois Institute of Technology in Chicago, US, recently published work on a Query-Based Text Summariser. “Given a document, we analyse how accurately the system can pick up a relevant answer,” he said. This concept has already been standardised by the Google search engine.
Language systems are based on statistical models, where each model is built on a humongous volume of data (hence the name, “big data”). Computational linguistics is of use in social media analytics, where, using machine-learning tools, a computer can determine the sentiment attached to a tweet or Facebook post. The data collected for this is dynamic: it is constantly updated by each one of us, as every update, text and hashtag we post adds to the data sets, or text corpus.
The aim of Bodhisattwa Majumder, PhD student at the University of California, San Diego, US, is to enable “sample-efficient methods to improve machine reading”. One of the areas he is working on is structured prediction. In fact, NLP forms the basis of various software programs that have a prediction module embedded, such as predictive text in mobile phone keypads. When we key in words, the sequences we use, and their frequencies, get added to the database. An analysis of this input allows the mobile software to predict what we are about to type. We can also see this feature while typing an email on our personal computers, where data are extracted from various fields, such as email subject, contacts and past usage, and the program predicts the sentence we are typing. “In the US, from the industry to academia, there is a surge of work involving NLP,” says Majumder, who has worked with Walmart Labs and Google AI. “Working with languages will play a key role in building artificial general intelligence and common sense AI. This will allow us to improve machine understanding,” he adds.
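The predictive keypad described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it counts which word follows which in the messages a user has typed and suggests the most frequent follower. Real keyboards use far richer models, and the class name `NextWordPredictor` and the sample messages are hypothetical.

```python
from collections import Counter, defaultdict


class NextWordPredictor:
    """Toy predictive-text model based on counts of word pairs."""

    def __init__(self):
        # For each word, a tally of the words typed immediately after it.
        self.counts = defaultdict(Counter)

    def learn(self, sentence):
        """Add the word sequences of one typed message to the tallies."""
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def suggest(self, word, k=3):
        """Return up to k of the most frequent followers of `word`."""
        tally = self.counts.get(word.lower(), Counter())
        return [w for w, _ in tally.most_common(k)]


# Hypothetical typing history, invented for this example.
predictor = NextWordPredictor()
for message in ["see you soon", "see you tomorrow",
                "talk to you soon", "thank you so much"]:
    predictor.learn(message)

print(predictor.suggest("you", k=1))  # ['soon']
```

Each new message passed to `learn` updates the counts, which is why the suggestions of such a system drift towards the phrases its user types most often.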
Any student interested in the field should follow Google Careers. They have internship opportunities, summer programmes and full-time positions. When hiring for computational linguistics positions in India, they wanted applicants who were fluent in certain regional languages. The NLP wing of the company seems to be developing its different modules, such as the virtual assistant and the navigation system, to be accessible to everyone, in whichever part of the world they might be located. This allows everyone to make the technology their own. That, I believe, is the key to progress.