Abdessamad Nafissi | AI Language Engineer & Computational Linguist

About Me

Bridging Two Intellectual Worlds

Pioneering solutions at the intersection of human linguistics and machine computation.

I’m an AI Language Engineer and NLP Researcher who bridges linguistics and machine learning to build reliable language systems. With 10+ years of experience in translation, proofreading, and localization (Arabic, French, and English), I developed a deep intuition for how language works, down to morphology, syntax, semantics, and the details that most systems struggle with, like Arabic diacritics and dialect variation. I also pursue independent NLP research, with a focus on Arabic data and real-world language use.

Today, I focus on LLM training and evaluation data, dataset quality, and language tooling. I use Python, SQL, and regex to extract and validate high-signal data, and I build practical NLP workflows with tools like Hugging Face, PyTorch, spaCy, NLTK, and scikit-learn. My goal is simple: turn linguistic precision into measurable improvements in AI quality, especially for underrepresented and high-complexity languages like Arabic, while contributing research on topics like Arabic romanization (Arabizi) and dialectal variation.

10+

Years Experience

Translation & Localization

Core Languages

Arabic, French, English

10+

NLP Systems

Pipelines, Summarizers, MT

Published Volume

Manuscript Preparation

Skills Matrix

My Technical Toolkit

A customized combination of expert linguistic analysis, software engineering, and machine learning.

Linguistic Engineering

Translation & Proofreading 95%

Software Localization 90%

Morphological & Syntactic Parsing 90%

Multilingual Terminology DBs 85%

Natural Language Processing

Custom Preprocessing Pipelines 90%

Sentiment Analysis (VAD/Categorical) 85%

Transformers & MT (Seq2Seq) 80%

Named Entity Recognition (NER) 80%

Machine Learning & Code

Python (Pandas, Numpy, Scikit) 85%

PyTorch & Neural Networks 75%

Hugging Face APIs & Fine-Tuning 80%

Azure Cloud & ML Ops 70%

Timeline

My Professional Journey

My academic training and professional work across linguistics and language technology.

Education

2011 - 2013

Master of Translation (MIT)

École Supérieure Marocaine de Traduction et d'Interprétation (ESMTI)

Specialized in advanced multilingual translation techniques, linguistic methodologies, and legal and economic terminology across English, French, and Arabic.

2008 - 2011

Bachelor of Arts (BA) in English Studies

Faculty of Letters and Humanities Mohammedia - University Hassan II

Thorough training in structural linguistics, grammatical synthesis, semantics, translation protocols, and literature studies.

2021 - Present

Self-Directed Advanced NLP & DL Specialization

Continuous Academic Study

Rigorous self-directed curriculum in advanced calculus, probability, neural network architecture (RNNs, LSTMs, Transformers), NLP pipeline design, and machine translation tuning.

Experience

Sep 2024 - May 2026

AI Language Engineer

Senior Data Linguist

Apple

Promoted to lead linguistic asset curation, localization QA, and data evaluation workflows for Siri voice features. Engineered custom NLP workflows, developed core terminology structures, and collaborated closely with machine learning engineers to turn linguistic precision into measurable Siri quality improvements.

May 2021 - Dec 2021

Data Linguist

Apple

Analyzed complex morphological, syntactic, and semantic patterns to build voice assistant systems. Designed large-scale text datasets and resolved systemic edge cases, including Arabic diacritics, dialectal variation, and cultural nuance, to train Siri NLU classifiers.

2013 - May 2021

Freelance Translator & Localization Specialist

Self-employed / Professional Translation Platforms

Provided expert localization and translation services across English, French, and Arabic. Specialized in app/web software localization and dense subject matters (ecology, economics, law, AI/IT) using CAT tools like Trados Studio and MemoQ.

My Portfolio

Featured Research & Projects

Explore the tangible outputs of my work in advanced computational linguistics and machine learning.

Research Project Maghribi Arabizi Lexical Mapping NLP Decoders

Maghribi Arabizi De-Romanization into Arabic Script

This project investigates how Maghribi Arabizi, an informal Romanized form of Moroccan Arabic, can be automatically converted into Arabic script. I benchmarked three de-romanization approaches: rule-based character mapping, statistical MLE word mapping, and a neural character-level Seq2Seq model. The results show that the MLE approach performs strongest on the held-out test set, highlighting the value of data-driven lexical mappings for noisy low-resource dialect text.

Under Review (Coming Soon)

Python API Pipeline Data Cleaning Social Media NLP

Modular Social Media Text Pipeline

A customizable Python NLP pipeline that scrapes, cleans, and structures raw social media text. Built-in components parse URLs, extract base domain names, sanitize characters, and convert emojis into descriptive text tokens without disturbing the syntax or context of the surrounding sentence.
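The cleaning steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the repository's actual code: the names clean_post, base_domain, and emoji_to_token are hypothetical, the emoji ranges are a rough approximation, and the "base domain" is simply the host with a leading "www." removed (a real registrable-domain lookup would need the Public Suffix List).

```python
import re
import unicodedata
from urllib.parse import urlparse

# Assumption: http(s) URLs only; other schemes are ignored in this sketch.
URL_RE = re.compile(r"https?://\S+")
# Assumption: a rough set of emoji code-point ranges, not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def base_domain(url: str) -> str:
    """Return the host of a URL without a leading 'www.' (naive approximation)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def emoji_to_token(match: re.Match) -> str:
    """Turn one emoji into a padded text token built from its Unicode name."""
    name = unicodedata.name(match.group(0), "")
    return f" :{name.lower().replace(' ', '_')}: " if name else " "


def clean_post(text: str) -> dict:
    """Extract URL domains, drop the URLs, and replace emojis with text tokens."""
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    domains = [base_domain(u) for u in urls]
    without_urls = URL_RE.sub(" ", text)
    with_tokens = EMOJI_RE.sub(emoji_to_token, without_urls)
    cleaned = re.sub(r"\s+", " ", with_tokens).strip()
    return {"text": cleaned, "domains": domains}


if __name__ == "__main__":
    print(clean_post("Great match \U0001F600 see https://www.example.com/news?id=1"))
```

Padding each token with spaces keeps it a separate word, so downstream tokenizers see the emoji as its own unit and do not glue it to a neighboring word.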

GitHub Repository

ACADEMIC PRESS

Coviability of Social & Ecological Systems

RABIAA • SOUGRI • NAFISSI

Book Translation Scientific Terminology Ecology

Book Translation: Ecological Coviability

Jointly provided manuscript preparation and technical translation for the academic volume "Coviability of Social and Ecological Systems: Reconnecting Mankind to the Biosphere". Resolved dense economic, regulatory, and ecological terminology across English, French, and Arabic.

Publication Details

Research Sandbox

Arabizi De-romanization

Test the hybrid MLE and RegEx fallback decoding system for Moroccan Arabizi (Darija) proposed in our forthcoming research.

Arabizi is a romanized writing system in which Latin characters and numerals are used to transcribe Arabic dialects, especially in chat messages. Moroccan Darija is morphologically rich and written phonetically, and training data for it is scarce, so general Seq2Seq models often struggle.

The research proposes a hybrid approach that combines a Maximum Likelihood Estimation (MLE) model with a phonological RegEx fallback decoder. On my held-out blind test split, the MLE system achieved the strongest performance of the three implemented baselines.
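As a rough illustration of how an MLE word mapping and a regex fallback can be combined, the sketch below picks the most frequent Arabic-script form seen for each Arabizi word and falls back to character rules for unseen words. Everything in it is an assumption made for illustration: the training pairs are toy data, the character map is a small hypothetical subset, and the function names are invented. It is not the system evaluated in the research.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy pairs (Arabizi word, Arabic-script word); not the real dataset.
PAIRS = [
    ("salam", "سلام"),
    ("salam", "سلام"),
    ("salam", "سلم"),
    ("bzaf", "بزاف"),
]

# Illustrative character rules only; the real phonological rules are not given here.
CHAR_MAP = {
    "ch": "ش", "kh": "خ", "gh": "غ",
    "3": "ع", "7": "ح", "9": "ق",
    "b": "ب", "d": "د", "f": "ف", "h": "ه", "k": "ك", "l": "ل",
    "m": "م", "n": "ن", "r": "ر", "s": "س", "t": "ت", "z": "ز",
    "a": "ا", "i": "ي", "o": "و", "u": "و", "w": "و", "y": "ي",
}
# Longest keys first so digraphs such as "kh" win over single letters.
_KEYS = sorted(CHAR_MAP, key=len, reverse=True)
FALLBACK_RE = re.compile("|".join(re.escape(k) for k in _KEYS))


def train_mle(pairs):
    """Map each source word to its most frequent target: argmax of count(src, tgt)."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src.lower()][tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def regex_fallback(word):
    """Transliterate character by character; unmapped characters pass through."""
    return FALLBACK_RE.sub(lambda m: CHAR_MAP[m.group(0)], word.lower())


def decode(sentence, table):
    """Use the MLE table when the word is known, the regex rules otherwise."""
    return " ".join(table.get(w.lower(), regex_fallback(w)) for w in sentence.split())


if __name__ == "__main__":
    table = train_mle(PAIRS)
    print(decode("salam 3la", table))
```

Because the ratio count(src, tgt) / count(src) has the same denominator for every candidate of a given source word, taking the most frequent target is equivalent to maximizing the estimated conditional probability.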

Notebook Experiment Results

Internal evaluation on a held-out blind test split from my research notebook.

Rule-Based: 7.24

MLE Prediction: 35.44

Seq2Seq NMT: 9.62

Note: Results are from an internal research experiment and are not a published benchmark.

Dataset Credit: Powered by the UBC-NLP/nilechat-arabizi-mor dataset, which is based on the original NileChat corpus.

Arabizi-Decoder v1.0

Get in Touch

Start a Conversation

Looking to collaborate on LLM training data, custom NLP tooling, or advanced Arabic linguistics? Send me a message below.

Email Me

abdessamadnfs@gmail.com

Location

Tampa Bay Area, FL

Connect Internationally

For open-source work, a closer look at my resume, or professional networking inquiries, find me on these networks.
