10 years later, deep learning 'revolution' rages on, say AI pioneers ...

What Is NLP? Natural Language Processing Explained

Natural language processing is a branch of AI that enables computers to understand, process, and generate language just as people do — and its use in business is rapidly growing.

Natural language processing definition

Natural language processing (NLP) is the branch of artificial intelligence (AI) that deals with training computers to understand, process, and generate language. Search engines, machine translation services, and voice assistants are all powered by the technology.

While the term originally referred to a system's ability to read, it's since become a colloquialism for all computational linguistics. Subcategories include natural language generation (NLG) — a computer's ability to create communication of its own — and natural language understanding (NLU) — the ability to understand slang, mispronunciations, misspellings, and other variants in language.
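To make the NLU idea concrete, here is a minimal sketch of one small piece of it: mapping misspelled words onto a known vocabulary by string similarity. It assumes a hypothetical five-word vocabulary and uses Python's standard `difflib` module. Real NLU systems rely on learned models rather than a fixed word list, so treat this only as an illustration of tolerating variant spellings.

```python
import difflib

# Hypothetical vocabulary for a toy assistant; a real NLU system learns far more.
KNOWN_WORDS = ("weather", "tomorrow", "forecast", "restaurant", "reservation")

def normalize(text, vocabulary=KNOWN_WORDS, cutoff=0.75):
    """Map each token to its closest known word, or keep it if nothing is similar enough."""
    normalized = []
    for token in text.lower().split():
        # get_close_matches ranks candidates by SequenceMatcher similarity (0 to 1).
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        normalized.append(matches[0] if matches else token)
    return normalized

print(normalize("whats the wether tommorow"))
```

The `cutoff` trades recall for precision: lower it and more misspellings are corrected, but unrelated words also start getting "corrected" into vocabulary terms.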

The introduction of transformer models in the 2017 paper "Attention Is All You Need" by Google researchers revolutionized NLP, leading to the creation of large language models such as Bidirectional Encoder Representations from Transformers (BERT) and its smaller, faster, and more efficient variant DistilBERT; Generative Pre-trained Transformer (GPT); and the models behind Google Bard.


How natural language processing works

NLP leverages machine learning (ML) algorithms trained on unstructured data, typically text, to analyze how elements of human language are structured together to impart meaning. Phrases, sentences, and sometimes entire books are fed into ML engines where they're processed using grammatical rules, people's real-life linguistic habits, and the like. An NLP algorithm uses this data to find patterns and extrapolate what comes next. For example, a translation algorithm that recognizes that, in French, "I'm going to the park" is "Je vais au parc" will learn to predict that "I'm going to the store" also begins with "Je vais au." All the algorithm then needs is the word for "store" to complete the translation task.
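The pattern-finding idea can be illustrated with a deliberately simple sketch: a bigram model that counts which word follows which and predicts the most frequent follower. The tiny French corpus echoes the translation example in the paragraph above and is an assumption. Production systems use neural networks trained on vastly more text, but the principle of learning "what comes next" from examples is the same.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word in the training text."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus.
corpus = [
    "je vais au parc",
    "je vais au magasin",
    "tu vas au parc",
]
model = train_bigrams(corpus)
print(predict_next(model, "vais"))
print(predict_next(model, "au"))
```

Here "au" is followed by "parc" twice and "magasin" once, so the model predicts "parc"; an unseen word yields `None` because the model has no evidence to extrapolate from.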

NLP applications

Machine translation is a powerful NLP application, but search is the most widely used. Every time you look something up in Google or Bing, you're helping to train the system. When you click on a search result, the system interprets it as confirmation that the results it has found are correct and uses this information to improve search results in the future.
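Search engines don't publish their ranking formulas, so the following is only a hypothetical sketch of how click feedback could be folded into ranking: each result's text-relevance score is blended with a smoothed click-through rate. The `Result` class, the smoothing priors, and the `click_weight` parameter are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    relevance: float      # score from a hypothetical text-matching stage
    impressions: int = 0  # times the result was shown
    clicks: int = 0       # times it was clicked

    def record(self, clicked):
        """Log one impression of this result and whether the user clicked it."""
        self.impressions += 1
        if clicked:
            self.clicks += 1

    def click_rate(self, prior_clicks=1.0, prior_impressions=10.0):
        """Smoothed click-through rate, so a result shown only once or twice isn't over-trusted."""
        return (self.clicks + prior_clicks) / (self.impressions + prior_impressions)

def rerank(results, click_weight=0.5):
    """Sort best-first by text relevance blended with observed click-through rate."""
    return sorted(results, key=lambda r: r.relevance + click_weight * r.click_rate(), reverse=True)

a = Result("example.com/a", relevance=0.60)
b = Result("example.com/b", relevance=0.55)
print([r.url for r in rerank([a, b])])  # a first: no clicks yet, so text relevance decides
for i in range(20):  # both shown 20 times; users click b far more often
    a.record(clicked=i < 2)
    b.record(clicked=i < 15)
print([r.url for r in rerank([a, b])])  # b now ranks first
```

Smoothing keeps a single lucky click from vaulting a rarely shown result to the top. Real systems also have to correct for position bias, since top results get clicked more simply because they are on top.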

Chatbots work the same way. They integrate with Slack, Microsoft Messenger, and other chat programs where they read the language you use, then turn on when you type in a trigger phrase. Voice assistants such as Siri and Alexa also kick into gear when they hear phrases like "Hey, Alexa." That's why critics say these programs are always listening; if they weren't, they'd never know when you need them. Unless you turn an app on manually, NLP programs must operate in the background, waiting for that phrase.
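For text-based chatbots, the "waiting for a trigger phrase" behavior can be sketched as a pattern match that ignores every message until a trigger appears. The phrases `hey bot` and `ok bot` are hypothetical. Voice assistants do the equivalent on audio with dedicated wake-word models, which this text-only sketch doesn't attempt.

```python
import re

def make_trigger_detector(phrases):
    """Build a matcher that fires when any trigger phrase appears as whole words."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

    def detect(message):
        """Return the text after the trigger phrase, or None if no trigger was found."""
        match = pattern.search(message)
        if match is None:
            return None  # stay dormant: the message wasn't addressed to the bot
        return message[match.end():].lstrip(" ,:")
    return detect

detect = make_trigger_detector(["hey bot", "ok bot"])  # hypothetical trigger phrases
for line in ["lunch at noon?", "Hey bot, what's on my calendar?"]:
    command = detect(line)
    print("ignored" if command is None else f"handling: {command}")
```

The word boundaries (`\b`) keep the bot from waking on text that merely contains the trigger's letters inside other words.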

Transformer models take applications such as language translation and chatbots to a new level. Innovations such as the self-attention mechanism and multi-head attention enable these models to better weigh the importance of various parts of the input, and to process those parts in parallel rather than sequentially.
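The core of the self-attention mechanism is scaled dot-product attention. Each token's query is compared with every token's key, the scores are turned into weights with a softmax, and the output is a weighted average of the value vectors. Below is a minimal single-head sketch in plain Python. The token vectors and identity projection matrices are made-up assumptions, since real models learn these projections. Multi-head attention runs several such heads with different projections and combines their outputs.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1 (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    # Each query row is independent of the others, which is why hardware can
    # compute all of them in parallel; the loop here is only for readability.
    for q_row in q:
        scores = [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all value vectors.
        outputs.append([sum(wj * v_row[c] for wj, v_row in zip(w, v)) for c in range(len(v[0]))])
    return outputs, weights

# Three toy "token" vectors and identity projections (hypothetical values; real
# models learn w_q, w_k, w_v and use hundreds of dimensions).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(p, 2) for p in row])
```

Each printed row shows how much one token "attends to" each of the three tokens; every row sums to 1.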

Rajeswaran V, senior director at Capgemini, notes that OpenAI's GPT-3 model has mastered language without using any labeled data. By relying on morphology (the study of words, how they are formed, and their relationship to other words in the same language), GPT-3 can perform language translation much better than existing state-of-the-art models, he says.

NLP systems that rely on transformer models are especially strong at NLG.

Natural language processing examples

Data comes in many forms, but the largest untapped pool of data consists of text — and unstructured text in particular. Patents, product specifications, academic publications, market research, news, not to mention social media feeds, all have text as a primary component and the volume of text is constantly growing. Apply the technology to voice and the pool gets even larger. Here are three examples of how organizations are putting the technology to work:

  • Edmunds drives traffic with GPT: The online resource for automotive inventory and information has created a ChatGPT plugin that exposes its unstructured data — vehicle reviews, ratings, editorials — to the generative AI. The plugin enables ChatGPT to answer user questions about vehicles with its specialized content, driving traffic to its website.
  • Eli Lilly overcomes translation bottleneck: With global teams working in a variety of languages, the pharmaceutical firm developed Lilly Translate, a home-grown NLP solution, to help translate everything from internal training materials and formal, technical communications to regulatory agencies. Lilly Translate uses NLP and deep learning language models trained with life sciences and Lilly content to provide real-time translation of Word, Excel, PowerPoint, and text for users and systems.
  • Accenture uses NLP to analyze contracts: The company's Accenture Legal Intelligent Contract Exploration (ALICE) tool helps the global services firm's legal organization of 2,800 professionals perform text searches across its million-plus contracts, including searches for contract clauses. ALICE uses "word embedding" to go through contract documents paragraph by paragraph, looking for keywords to determine whether the paragraph relates to a particular contract clause type.
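The article doesn't describe ALICE's internals, so here is only a toy sketch of the general approach it names. Word embeddings for each paragraph are averaged and compared with embeddings of clause-type keywords using cosine similarity. The three-dimensional vectors, clause types, keywords, and 0.8 threshold are hypothetical stand-ins for learned embeddings with hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional word vectors; real embeddings are learned from text.
EMBEDDINGS = {
    "terminate":   [0.9, 0.1, 0.0],
    "termination": [0.85, 0.15, 0.0],
    "notice":      [0.7, 0.2, 0.1],
    "cancel":      [0.8, 0.0, 0.1],
    "indemnify":   [0.1, 0.9, 0.0],
    "liability":   [0.0, 0.8, 0.2],
    "damages":     [0.1, 0.7, 0.2],
    "payment":     [0.0, 0.1, 0.9],
    "invoice":     [0.1, 0.0, 0.9],
}

# Hypothetical clause types, each described by a few seed keywords.
CLAUSE_KEYWORDS = {
    "termination": ["terminate", "cancel", "notice"],
    "indemnification": ["indemnify", "liability", "damages"],
    "payment terms": ["payment", "invoice"],
}

def average_vector(words):
    """Mean of the embeddings of the words we have vectors for (None if none are known)."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(values) / len(vectors) for values in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

CLAUSE_VECTORS = {name: average_vector(kw) for name, kw in CLAUSE_KEYWORDS.items()}

def classify_paragraph(paragraph, threshold=0.8):
    """Return the most similar clause type, or None if the paragraph matches none closely."""
    words = [w.strip(".,;:()").lower() for w in paragraph.split()]
    vec = average_vector(words)
    if vec is None:
        return None
    best, score = max(((name, cosine(vec, cv)) for name, cv in CLAUSE_VECTORS.items()),
                      key=lambda pair: pair[1])
    return best if score >= threshold else None

print(classify_paragraph("Either party may terminate this Agreement upon 30 days written notice."))
```

Because related words such as "terminate" and "cancel" sit close together in embedding space, a paragraph can match a clause type even when it doesn't use the exact seed keyword.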
Natural language processing software

Whether you're building a chatbot, voice assistant, predictive text application, or other application with NLP at its core, you'll need tools to help you do it. According to Technology Evaluation Centers, the most popular software includes:

  • Natural Language Toolkit (NLTK), an open-source framework for building Python programs to work with human language data. It was developed in the Department of Computer and Information Science at the University of Pennsylvania and provides interfaces to more than 50 corpora and lexical resources, a suite of text processing libraries, wrappers for natural language processing libraries, and a discussion forum. NLTK is offered under the Apache 2.0 license.
  • Mallet, an open-source, Java-based package for statistical NLP, document classification, clustering, topic modeling, information extraction, and other ML applications to text. It was primarily developed at the University of Massachusetts Amherst.
  • spaCy, an open-source library for advanced natural language processing explicitly designed for production use rather than research. Released under the MIT license, spaCy was made with high-level data science in mind and allows deep data mining.
  • Amazon Comprehend. This Amazon service doesn't require ML experience. It's intended to help organizations find insights from email, customer reviews, social media, support tickets, and other text. It uses sentiment analysis, part-of-speech extraction, and tokenization to parse the intention behind the words.
  • Google Cloud Translation. This API uses NLP to examine a source text to determine its language and then uses neural machine translation to dynamically translate the text into another language. The API allows users to integrate the functionality into their own programs.
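As an illustration of the first step Google Cloud Translation performs, detecting the source language, here is a toy detector that counts overlaps with small stopword lists. This is not how Google's service works internally, and the three stopword lists are hypothetical. It only shows the idea that common function words are a strong signal of which language a text is in.

```python
# Hypothetical, tiny stopword lists; real detectors use far richer statistical models.
STOPWORDS = {
    "en": {"the", "and", "is", "to", "of", "i", "am", "going"},
    "fr": {"le", "la", "et", "est", "je", "vais", "au", "de"},
    "es": {"el", "la", "y", "es", "voy", "al", "de", "yo"},
}

def detect_language(text):
    """Guess the language whose stopwords overlap most with the text; None on no overlap."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the language listed first
    return best if scores[best] > 0 else None

print(detect_language("Je vais au parc"))
```

Short inputs are the hard case: a two-word text may share no stopwords with any list, which is one reason production detectors work on character patterns rather than whole words.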
Natural language processing courses

There's a wide variety of resources available for learning to create and maintain NLP applications, many of which are free. They include:

  • NLP – Natural Language Processing with Python from Udemy. This course provides an introduction to natural language processing in Python, building to advanced topics such as sentiment analysis and the creation of chatbots. It consists of 11.5 hours of on-demand video, two articles, and three downloadable resources. The course costs $94.99, which includes a certificate of completion.
  • Data Science: Natural Language Processing in Python from Udemy. Aimed at NLP beginners who are conversant with Python, this course involves building a number of NLP applications and models, including a cipher decryption algorithm, spam detector, sentiment analysis model, and article spinner. The course consists of 12 hours of on-demand video and costs $99.99, which includes a certificate of completion.
  • Natural Language Processing Specialization from Coursera. This intermediate-level set of four courses is intended to prepare students to design NLP applications such as sentiment analysis, translation, text summarization, and chatbots. It includes a career certificate.
  • Hands On Natural Language Processing (NLP) using Python from Udemy. This course is for individuals with basic programming experience in any language, an understanding of object-oriented programming concepts, knowledge of basic to intermediate mathematics, and knowledge of matrix operations. It's completely project-based and involves building a text classifier for predicting sentiment of tweets in real time, and an article summarizer that can fetch articles and summarize them. The course consists of 10.5 hours of on-demand video and eight articles, and costs $19.99, which includes a certificate of completion.
  • Natural Language Processing in TensorFlow by Coursera. This course is part of Coursera's TensorFlow in Practice Specialization, and covers using TensorFlow to build natural language processing systems that can process text and input sentences into a neural network. Coursera says it's an intermediate-level course and estimates it will take four weeks of study at four to five hours per week to complete.
NLP salaries

Here are some of the most popular job titles related to NLP and the typical salary range (in US$) for each position, according to data from PayScale.

  • Computational linguist: $60,000 to $126,000
  • Data scientist: $79,000 to $137,000
  • Data science director: $107,000 to $215,000
  • Lead data scientist: $115,000 to $164,000
  • Machine learning engineer: $83,000 to $154,000
  • Senior data scientist: $113,000 to $177,000
  • Software engineer: $80,000 to $166,000

Studies in Natural Language Processing

Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes. This fascinating problem offers numerous research challenges, but promises insight useful to anyone interested in opinion analysis and social media analysis. This comprehensive introduction to the topic takes a natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs commonly used to express opinions, sentiments, and emotions. The book covers core areas of sentiment analysis and also includes related topics such as debate analysis, intention mining, and fake-opinion detection. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences. In addition to traditional computational methods, this second edition includes recent deep learning methods to analyze and summarize sentiments and opinions, and also new material on emotion and mood analysis techniques, emotion-enhanced dialogues, and multimodal emotion analysis.
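A classic starting point for sentiment analysis is a lexicon-based scorer. It looks up each word in a polarity lexicon, adds up the scores, and flips the sign of a word that directly follows a negator such as "not." The lexicon, the one-word negation window, and the labels below are simplifying assumptions. The deep learning methods the book covers learn these patterns from data instead.

```python
import re

# Hypothetical mini-lexicon; real lexicons cover thousands of words and phrases.
LEXICON = {"good": 1, "great": 2, "love": 2, "helpful": 1,
           "bad": -1, "terrible": -2, "hate": -2, "slow": -1}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def label(text):
    """Map the numeric score to a coarse label."""
    s = sentiment(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(label("The support was not helpful and very slow"))
```

The one-word negation window is the weakest assumption here: "not very helpful" slips past it, which is exactly the kind of language construct the book examines in depth.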

Natural Language Processing Creates An Audit Trail For Risk Adjustment

The best way for insurers to make sure they're in compliance with the mandates of risk adjustment is to use natural language processing for accurate documentation and auditing, according to Dr. Calum Yacoubian, director of Healthcare Strategy for Linguamatics, an IQVIA company that offers an NLP-based AI platform.

Last week's publication of the final rule for risk adjustment data validation (RADV) comes after increasingly high-profile instances of apparent overcoding by Medicare Advantage Organizations (MAOs), Yacoubian said.

There must be an audit trail, he said.

NLP identifies gaps in care from unstructured notes in the clinical record. It also enables the creation of a longitudinal patient record that draws on multiple providers.
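To illustrate how NLP can surface missed diagnoses while leaving an audit trail, here is a deliberately crude sketch. It scans a clinical note sentence by sentence for condition phrases, skips sentences containing negation cues, and compares what is documented with what was coded. Each flagged condition keeps the sentence that supports it. The condition dictionary, negation cues, and note are hypothetical, and this is not how Linguamatics' platform works; real clinical NLP uses standard terminologies and far more robust negation and context handling.

```python
import re

# Hypothetical condition labels and the phrases that document them in free text.
CONDITION_TERMS = {
    "diabetes": ["diabetes", "t2dm"],
    "heart failure": ["heart failure", "chf"],
    "copd": ["copd", "chronic obstructive pulmonary disease"],
}
CONDITION_PATTERNS = {
    name: re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    for name, terms in CONDITION_TERMS.items()
}
# Crude negation check: ignore any sentence containing one of these cues.
NEGATION = re.compile(r"\b(?:no|denies|negative for|ruled out)\b", re.IGNORECASE)

def find_documented(note):
    """Map each non-negated condition in the note to the first sentence that mentions it."""
    evidence = {}
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        if NEGATION.search(sentence):
            continue
        for name, pattern in CONDITION_PATTERNS.items():
            if name not in evidence and pattern.search(sentence):
                evidence[name] = sentence  # the supporting text is the audit trail
    return evidence

def audit(note, coded_conditions):
    """Return (documented-but-not-coded with evidence, coded-but-not-documented)."""
    documented = find_documented(note)
    missed = {name: text for name, text in documented.items() if name not in coded_conditions}
    unsupported = sorted(set(coded_conditions) - set(documented))
    return missed, unsupported

note = ("Patient with long-standing T2DM, on metformin. Denies chest pain. "
        "Exam consistent with CHF exacerbation. No evidence of COPD.")
missed, unsupported = audit(note, {"diabetes", "copd"})
print(missed)       # heart failure is documented but was never coded
print(unsupported)  # copd was coded but the note does not support it
```

Checking both directions matters: the first list points to possible under-coding, the second to the kind of unsupported diagnoses an audit would challenge.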


In value-based care arrangements, payers need accurate risk adjustment to ensure they are properly compensated for assuming greater financial risk for patients. The resulting savings are shared with providers.

When payers don't capture the full spectrum of a patient's diagnoses, they may be at risk for cost overruns associated with treating those unidentified conditions.

"There is a huge amount of medical record review for risk adjustment, to look at missed diagnoses," Yacoubian said.

The coding must be correct, as payment amounts are determined by risk scores associated with Hierarchical Condition Categories (HCCs), groups of medical codes linked to specific clinical diagnoses.
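Because payment depends on the risk score, a simplified HCC-style calculation shows why both missed diagnoses and overcoding matter. The sketch assumes an additive model: a demographic factor plus one coefficient per condition category, with a hierarchy so that only the most severe category in a family counts. All category names, coefficients, the demographic factor, and the base rate are hypothetical, and the actual CMS-HCC model includes further terms and adjustments.

```python
# Hypothetical coefficients and hierarchy for illustration; these are NOT CMS values.
HCC_COEFFICIENTS = {
    "diabetes with complications": 0.30,
    "diabetes without complications": 0.10,
    "heart failure": 0.33,
}
# A more severe category suppresses the less severe ones in the same family.
HIERARCHY = {
    "diabetes with complications": {"diabetes without complications"},
}

def risk_score(demographic_factor, conditions):
    """Demographic factor plus the coefficients of the categories left after hierarchies."""
    kept = set(conditions)
    for higher, lower in HIERARCHY.items():
        if higher in kept:
            kept -= lower
    return demographic_factor + sum(HCC_COEFFICIENTS[c] for c in kept)

def payment(base_rate, score):
    """Payment scales linearly with the risk score in this simplified model."""
    return base_rate * score

full = risk_score(0.45, ["diabetes without complications",
                         "diabetes with complications", "heart failure"])
missed_hf = risk_score(0.45, ["diabetes with complications"])
print(round(full, 2), round(payment(1000, full), 2))            # 1.08 1080.0
print(round(missed_hf, 2), round(payment(1000, missed_hf), 2))  # 0.75 750.0
```

Dropping the heart failure diagnosis lowers the score from 1.08 to 0.75 in this toy setup, the kind of gap record review aims to catch. Conversely, coding a condition the record doesn't support inflates the score, which is what RADV audits target.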

    "As the population continues to age, the Medicare Advantage population, and therefore burden of care, is also increasing – and only set to get larger," Yacoubian said. "For the payers who are claiming appropriately, these new rules pose an increased burden upon them to ensure their submissions are audit proof."

NLP can also be used for other predictive risk modeling, such as identifying patients at risk for hospital admission or readmission, he said.

NLP has gone from something relatively niche and research-focused to being used by more than 50% of healthcare organizations in the United States, Yacoubian said.


On January 30, the Centers for Medicare and Medicaid Services (CMS) finalized risk adjustment policies in a final rule to prevent overpayments to Medicare Advantage Organizations.

The Medicare Advantage Risk Adjustment Data Validation program is CMS's primary audit and oversight tool for MAO program payments.

As required by law, CMS's payments to MAOs are adjusted based on the health status of enrollees, as determined through medical diagnoses.

Studies and audits done separately by CMS and the Health and Human Services Office of Inspector General have shown that Medicare Advantage enrollees' medical records do not always support the diagnoses reported by MAOs, which leads to billions of dollars in overpayments to plans and increased costs to the Medicare program as well as taxpayers, CMS said.

The Risk Adjustment Data Validation final rule holds insurers accountable.

Twitter: @SusanJMorse

This post first appeared on Autonomous AI.
