tl;dr: We feature 10 of the top Natural Language Processing (NLP) code repositories on GitHub from Singapore, ranked by the total stars (stargazer count) of each repository.

10. A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction

Architecture of the multilayer convolutional model with seven encoder and seven decoder layers. Retrieved from the official paper.

Code and model files for the paper: “A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction” (Published at AAAI-18).

The authors improve the automatic correction of grammatical, orthographic, and collocation errors in text using a multilayer convolutional encoder-decoder neural network.

Repository: nusnlp/mlconvgec2018
License: GNU General Public License v3.0
Author: NUS NLP Group (nusnlp)
Vocation: NLP Group at National University of Singapore
LanguageStarsForksOpen Issues
Python160731

9. Project Insight: NLP as a Service

Project Insight: NLP as a Service. Retrieved from GitHub.

Project Insight is an NLP-as-a-service project with a frontend UI (Streamlit) and a backend server (FastAPI) serving transformer models on various downstream NLP tasks.

The downstream NLP tasks covered are:

  • News Classification
  • Entity Recognition
  • Sentiment Analysis
  • Summarization
Repository: abhimishra91/insight
License: GNU General Public License v3.0
Author: Abhishek Kumar Mishra (abhimishra91)
Vocation: Operations Innovation Lead at IHS Markit
LanguageStarsForksOpen Issues
Python225294

8. An Unsupervised Neural Attention Model for Aspect Extraction

An example of an Attention-based Aspect Extraction (ABAE) structure. Retrieved from the official paper.

Code for the ACL 2017 paper “An Unsupervised Neural Attention Model for Aspect Extraction”.

Aspect extraction is one of the key tasks in sentiment analysis. It aims to extract the entity aspects on which opinions have been expressed. For example, in the sentence “The beef was tender and melted in my mouth”, the aspect term is “beef”. According to the paper, experimental results on real-life datasets show that the approach discovers more meaningful and coherent aspects and substantially outperforms baseline methods on several evaluation tasks. Aspect-Based Sentiment Analysis (ABSA) can then be performed on the extracted aspects as a downstream task.

Repository: ruidan/Unsupervised-Aspect-Extraction
License: Apache License 2.0
Author: Ruidan He (ruidan)
Vocation: NLP scientist at Alibaba DAMO Academy. Ph.D. from NUS.
LanguageStarsForksOpen Issues
Python26510716

7. ✍🏻 gpt2-client: Easy-to-use TensorFlow Wrapper for GPT-2 🤖 📝

Exploring GPT-2 models in less than five lines of code. Retrieved from GitHub.

GPT-2 is a Natural Language Processing model developed by OpenAI for text generation. It comes in four versions (117M, 345M, 774M, and 1558M) that differ in the number of parameters they contain.

gpt2-client is a wrapper around the original gpt-2 repository that offers the same functionality with more accessibility, comprehensibility, and utility. You can play around with all four GPT-2 models in less than five lines of code.

Repository: rish-16/gpt2client
License: MIT License
Author: Rishabh Anand (rish-16)
Vocation: CS + ML Research at Advanced Robotics Center @ NUS
LanguageStarsForksOpen Issues
Python291595

6. A hierarchical CNN based model to detect Big Five personality traits

Architecture of the implemented network from the paper. Retrieved from the official paper.

This code implements the model discussed in the paper “Deep Learning-Based Document Modeling for Personality Detection from Text”, which detects the Big Five personality traits, namely:

  • Extroversion
  • Neuroticism
  • Agreeableness
  • Conscientiousness
  • Openness
Repository: SenticNet/personality-detection
License: MIT License
Author: SenticNet (SenticNet)
Vocation: Computational Intelligence Lab (CIL) in the School of Computer Science and Engineering (SCSE) of Nanyang Technological University (NTU).
LanguageStarsForksOpen Issues
Python31512622

5. SymSpell: Very Fast Spell Checking in Python

Performance of SymSpell (C# version) vs. other edit distance/spell check algorithms. Retrieved from GitHub.

A Python port of SymSpell, which provides spelling correction and fuzzy search that its authors describe as 1 million times faster, using the Symmetric Delete spelling correction algorithm.

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It generates only deletes, rather than the deletes + transposes + replaces + inserts of the standard approach, which makes it six orders of magnitude faster than that approach and language independent.
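To make the idea concrete, here is a minimal sketch of a delete-only lookup. It is not the symspellpy API or its implementation: the function names (`deletes_within`, `build_index`, `lookup`) and the tiny word list are hypothetical, and the distance check uses the optimal string alignment variant of Damerau-Levenshtein as an assumption.

```python
from collections import defaultdict


def deletes_within(word, max_distance):
    """Return the word plus every string obtained by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for item in frontier:
            for i in range(len(item)):
                next_frontier.add(item[:i] + item[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def edit_distance(a, b):
    """Optimal string alignment distance (an adjacent transposition counts as one edit)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]


def build_index(words, max_distance=2):
    """Precompute, once, which dictionary words each delete variant can come from."""
    index = defaultdict(set)
    for word in words:
        for variant in deletes_within(word, max_distance):
            index[variant].add(word)
    return index


def lookup(term, index, max_distance=2):
    """Generate deletes of the input only, then verify the candidates they point to."""
    candidates = set()
    for variant in deletes_within(term, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(term, word), word) for word in candidates]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world"])  # hypothetical dictionary
suggestions = lookup("helo", index)
```

The point of the design is that inserts, replaces, and transposes are never enumerated at query time: a misspelling and its correction meet at a shared delete variant, and the full distance is computed only for those few candidates.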

Repository: mammothb/symspellpy
License: MIT License
Author: mmb L (mammothb)
Vocation: Unknown
LanguageStarsForksOpen Issues
Python3917510

4. Textstat: Python Package to Calculate Readability Statistics of Text

Example code to use the Textstat library.

Textstat is an easy-to-use library for calculating statistics from text. It helps determine readability, complexity, and grade level.

It supports various statistics including: Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Fog Scale (Gunning FOG Formula), SMOG Index, Automated Readability Index, Coleman-Liau Index, Linsear Write Formula and the Dale-Chall Readability Score.
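As an illustration of what two of these statistics measure, the sketch below computes the Flesch Reading Ease and Flesch-Kincaid Grade Level from their standard published formulas. It does not use or reproduce Textstat itself: the helper names are hypothetical, and the vowel-group syllable counter is a rough assumption, so scores will differ from the library's.

```python
import re


def count_syllables(word):
    """Naive estimate (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (sentence count, word count, syllable count) for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(word) for word in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences, words, syllables):
    """Higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences, words, syllables):
    """Approximate US school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


counts = text_counts("The cat sat on the mat. It was happy.")
ease = flesch_reading_ease(*counts)
grade = flesch_kincaid_grade(*counts)
```

Both formulas depend only on average sentence length and average syllables per word, which is why short sentences made of short words score as very easy to read.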

Repository: shivam5992/textstat
License: MIT License
Author: Shivam Bansal (shivam5992)
Vocation: Data Scientist | Natural Language Processing + Machine Learning Enthusiast | Data Stories + Visuals | Programmer + Coder | at H2O.ai
LanguageStarsForksOpen Issues
Python55410718

3. Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc.

Flowchart of the majority voting ensemble used. Retrieved from the official report.

Sentiment classification on a Twitter dataset. The authors apply a number of machine learning and deep learning methods (Naive Bayes, SVM, CNN, LSTM, etc.) to perform sentiment analysis, then combine their 5 best models in a majority-vote ensemble to achieve a classification accuracy of 83.58% on the Kaggle public leaderboard.
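A majority-vote ensemble can be sketched in a few lines. This is a generic illustration, not the repository's code: the `majority_vote` function and the prediction values are hypothetical, with 1 standing for a positive tweet and 0 for a negative one.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine one label list per model into a single label per example."""
    combined = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one example.
    for votes in zip(*predictions_per_model):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from five classifiers on four tweets.
model_predictions = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
ensemble = majority_vote(model_predictions)
```

Using an odd number of models on a two-class problem means a vote can never tie, which is one practical reason to ensemble five models rather than four.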

Repository: abdulfatir/twitter-sentiment-analysis
License: MIT License
Author: Abdul Fatir (abdulfatir)
Vocation: CS PhD Student at National University of Singapore
LanguageStarsForksOpen Issues
Python91844222

2. RASA NLU for Chinese

Natural Language Understanding outputs from the RASA NLU Chinese model. Retrieved from GitHub.

A fork of the RASA (contextual AI assistant/chatbot) NLU repository, focused on the Natural Language Understanding (NLU) task for Chinese (中文自然语言理解). It turns a Chinese natural language sentence/utterance into structured data.

Repository: crownpku/Rasa_NLU_Chi
License: Apache License 2.0
Author: Guan Wang (crownpku)
Vocation: Senior Data Scientist, VP at Swiss Re
LanguageStarsForksOpen Issues
Python1,10335874

1. Chinese Named Entity Recognition and Relation Extraction

Visualization of 3x3, 7x7 and 15x15 receptive fields produced by 1, 2 and 4 dilated convolutions by the IDCNN model.

An NLP repository including state-of-the-art deep learning methods for various tasks in the Chinese/Mandarin language (中文): named entity recognition (NER/实体识别), relation extraction (RE/关系提取) and word segmentation.
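The 3x3, 7x7 and 15x15 receptive fields mentioned in the figure caption follow from simple arithmetic when convolutions with dilation rates 1, 2 and 4 are stacked. The sketch below reproduces those numbers under two assumptions that the caption does not state: every layer uses a kernel of width 3 and a stride of 1. The function name is hypothetical and this is not code from the repository.

```python
def receptive_fields(dilations, kernel_size=3):
    """Receptive field width after each stacked dilated convolution (stride 1)."""
    field = 1  # a single input position before any convolution
    widths = []
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
        widths.append(field)
    return widths


widths = receptive_fields([1, 2, 4])
```

Because the dilation doubles at each layer, the receptive field grows exponentially with depth while the number of parameters grows only linearly.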

Repository: crownpku/Information-Extraction-Chinese
License: Not Specified
Author: Guan Wang (crownpku)
Vocation: Senior Data Scientist, VP at Swiss Re
LanguageStarsForksOpen Issues
Python1,692748102

More NLP repositories

Visit machinelearning.sg to view a full list of ML repositories from Singapore.