Unicorn NLP

Language Understanding APIs


Profanity & Toxicity Detection for User-Generated Content


  • You can use the public SaaS version (RapidAPI), contact us for a custom-made private API, or get the On-Premise version at 0 cents/text (Big Data compatible)
  • A set of dedicated semantic models for toxic and aggressive content (profanity, toxicity, obscene words, threats, insults, identity hate, and others)
  • Trained and tested on tens of thousands of comments, reviews, and other user-generated content
  • Get a custom-made version tailored to your needs, trained and tested on your data (you get ready-to-use technology that you can integrate into your product in days)

What is inside Profanity & Toxicity Detection for User-Generated Content

Profanity & Toxicity Detection for User-Generated Content is a set of dedicated semantic models for toxic and aggressive content. It was built on various types of user-generated content (comments, forums, tweets, Facebook posts, etc.).

You can use the publicly available version on RapidAPI, or Contact Us for more information about custom-made products tailored to your requirements and your type of texts.
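A minimal sketch of calling the SaaS version over HTTP. The host name, endpoint path, and request field (`text`) below are placeholders, not the real values from the RapidAPI listing; only the `X-RapidAPI-Key`/`X-RapidAPI-Host` headers are the standard RapidAPI authentication scheme.

```python
import json

# NOTE: host and path are hypothetical; the real values come from the
# product's RapidAPI listing.
RAPIDAPI_HOST = "unicorn-nlp-example.p.rapidapi.com"

def build_toxicity_request(text, api_key):
    """Assemble the URL, headers, and JSON body for a toxicity-check call."""
    url = f"https://{RAPIDAPI_HOST}/toxicity"   # hypothetical endpoint path
    headers = {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": api_key,              # standard RapidAPI auth headers
        "X-RapidAPI-Host": RAPIDAPI_HOST,
    }
    body = json.dumps({"text": text})           # field name is an assumption
    return url, headers, body

url, headers, body = build_toxicity_request("You make me sick", "MY_KEY")
# Send with any HTTP client, e.g.:
#   requests.post(url, headers=headers, data=body)
```

The request-building step is kept separate from the sending step so it can be reused with any HTTP client.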

The public version of the API detects the following types of information. Please forgive the language; we are only quoting creative users.

Profanity Words

We detect 2000+ profanity words and their variations.
Yes, humans are very creative.
Below are just a few examples:
pimp, boner, slut, ballsack, shit, dick, 2000+ more...


Toxicity

Listen you dumbass
You make me sick
Are you gay or what?
Go back to your boring life, etc.

Severe Toxicity

Go f**k yourself
Suck my d**k
White trash
Hope you die, etc.


Obscene Words

Out of your ass
Ass f**k
Jerk off, etc.


Threats

Last warning
I will kill you
I will track you down
I'm gonna find you and your family
Watch your steps, etc.


Insults

You piece of s**t
Bloody liar
Stupid dummy
Moron of yourself
Admin is a pratt, etc.

Identity Hate

I hate you, etc.
Screw you, Nazi filth
Are you f**king stupid?? Hey dickhead, etc.


Other important information
Custom-made semantic models based on your data

If you want a custom-made private API with different models, or more information about the custom On-Premise version at 0 cents/text (Big Data compatible):

Contact us

How it works on a single text

Example text (We apologize for the language):

Nice opinion. Go back to your boring life you idiot. I will find where you live. Beware of the dark.

Simplified output after processing by Profanity & Toxicity Detection for User-Generated Content:

1. Nice opinion
2. Go back to your boring life you idiot => Profanity
3. I will find where you live => Threat
4. Beware of the dark => Threat
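The per-sentence output above can be consumed programmatically. This is a sketch only: the JSON field names (`sentences`, `text`, `labels`) are assumptions about the response shape, not the documented schema.

```python
# Hypothetical response shape; the real API's field names may differ.
response = {
    "sentences": [
        {"text": "Nice opinion", "labels": []},
        {"text": "Go back to your boring life you idiot", "labels": ["Profanity"]},
        {"text": "I will find where you live", "labels": ["Threat"]},
        {"text": "Beware of the dark", "labels": ["Threat"]},
    ]
}

def flagged_sentences(resp):
    """Return (text, labels) for every sentence with at least one detection."""
    return [(s["text"], s["labels"]) for s in resp["sentences"] if s["labels"]]

def is_toxic(resp):
    """A simple moderation decision: flag the comment if any sentence was flagged."""
    return bool(flagged_sentences(resp))
```

A moderation pipeline could then hide or queue any comment for which `is_toxic` returns `True`.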

Use the SaaS version - pay per text (hosted via RapidAPI & Amazon AWS on a highly scalable architecture)

Link to the SaaS version (RapidAPI)

If you want to test it or see how it works on your data, send us a dataset
(we do not collect your data)

Test it on your data

How do we differ?

We specialize in creating dedicated Language Understanding APIs for reviews and other user-generated content. We focus on travel, food, apps, surveys, and profanity & toxicity detection, and have prepared a set of Language Understanding APIs for these domains. We also develop private, custom-made Language Understanding APIs for any kind of text (reviews, comments, or other user-generated content). Contact Us to discuss what is possible, or Send Us Your Dataset to see how it works on your data.

We stand by the statement that we detect more information than Watson or any other deep learning solution, and that we are more accurate than Google within a specific domain. We are also much more cost-effective, and we allow you to process as much data as you want at no additional cost (0 cents/text).

This is possible because we created a next-generation process for developing NLU systems in which we can build Semantic Models 10x faster. It is designed from the ground up to process reviews and other user-generated content, not proper English texts. It is the result of 13 years of experience in NLP/NLU, 8 years of processing reviews, and more than 50 implementations (2007-2020) of NLP/NLU systems and Language Understanding APIs in companies, corporations, and startups.

We are passionate and proud of our technology and aim to provide the optimal technology for reviews and other user-generated content. We will be happy to help you implement your brilliant ideas and discover what is possible.

1. Sentiment Analysis & Concept/Keyword/Topic Analysis - Current tech available

"Breakfast was tasty" => Breakfast: positive
"Breakfast was huge" => Breakfast: positive
"Breakfast was not included" => Breakfast: negative
"Breakfast was very tasty but limited choices" => Breakfast: neutral
"Breakfast was delicious but you had to pay extra" => Breakfast: neutral

2. Sentiment Analysis 2.0 - Unicorn NLP (new Language Understanding APIs)

"Breakfast was tasty" => Breakfast: tasty
"Breakfast was huge" => Breakfast: plenty options
"Breakfast was not included" => Breakfast: not included
"Breakfast was very tasty but limited choices" => Breakfast: tastypoor, limited
"Breakfast was delicious but you had to pay extra" => Breakfast: tastynot included

Language has colors. Do not reduce it to black & white.

Technological differences of the Unicorn NLP Cognitive Learning solution (compared to other NLP/NLU/ML/Deep Learning solutions):

  • High resolution/depth and richness of output: the different types of information we capture from reviews (state of the art: 120+ dedicated semantic models in each Sentiment Analysis 2.0 product)
  • Very high fact coverage (state of the art: 90%): the percentage of pertinent information we extract from reviews
  • Very high, human-like accuracy (state-of-the-art F1: precision = 90-95%, recall = 70-85%)
  • Detailed, easy-to-use, human-friendly semantic models (e.g., comfy room, spacious room, tasty breakfast, will come back) instead of statistical models (e.g., room - confidence: 0.84623, score: 0.735267) or grammatical models (e.g., NP, VP, ADV)
  • Very fast! We are far faster than deep learning or any other solution; highly optimized technology was important to us from the beginning. As an example: on an Amazon AWS medium-CPU instance we process up to 1 million reviews a day.

Other differences:

  • You can get the on-premise version with no maintenance fees and 0 cents/text
  • No linguistic or machine learning expertise is required to implement it into your product/system! The output is simple enough to be used by anyone, and even simple enough to be displayed to the user directly (no post-processing needed, no configuration required; most solutions can be implemented in days)
  • All data extracted by our tech makes sense: we do not return hard-to-understand keywords with confidence scores, some of which make no sense. Each Semantic Model is actionable data (tasty breakfast, wifi does not work, staff was unfriendly, elevator not working, etc.). Each Semantic Model was built and tested on hundreds of thousands of hotel reviews and consists of between a hundred and a thousand ways to express a concrete situation. That is why you do not have to configure it to your data.

More about Unicorn NLP Technology

What is a Semantic Model

A semantic model is the newest approach to information extraction, where the boundaries of a model are determined not by the structure of a language (or its grammar, or the words used) but by the type of information you are detecting. It does not matter how the user phrases an opinion about a specific object, e.g. “breakfast was a joke”, “Muffin and cold coffee is not a breakfast”, “Breakfast wasn’t included as written”; our semantic model still detects it and provides structured data.

With the “breakfast” example, it is relatively easy to provide a shallow analysis (e.g. Sentiment Analysis) with current NLP techniques. If you are looking for more sophisticated information (e.g. is it safe? is it handicap-friendly? will someone come back? is it sustainable? is it pet-friendly?), the situation gets more complicated. Current techniques provide only part of this information, and their accuracy relies on the keywords used in reviews. Users are very creative in reviews, and they do not follow a keyword approach.

Let’s take one example. To answer the question “is it safe?”, you need to capture a lot of information from reviews that never contain the common keywords safe/dangerous. In practice, users write this in tens or hundreds of ways, e.g. “there was a lot of drunken people outside”, “I did not feel good because of the neighbors”, “someone yelled in the night and the front desk lady did nothing”. We believe that if a human can understand what is written in a review, then technology should detect it as well. That is why we call them Semantic Models, not information extraction models.
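The “is it safe?” example can be illustrated as code. This is a toy illustration only: real Semantic Models are written in QL4Reviews and contain hundreds of hand-crafted rules, not the handful of regular expressions sketched here.

```python
import re

# Toy stand-in for a "safety concern" Semantic Model: many different surface
# phrasings, none containing the keywords safe/dangerous, map to one meaning.
SAFETY_CONCERN_PATTERNS = [
    r"\bdrunken people\b",
    r"\bdid not feel (good|safe)\b",
    r"\b(yelled|screaming) in the night\b",
    r"\bfront desk .* did nothing\b",
]

def detect_safety_concern(review):
    """Return True if any known phrasing of a safety concern appears."""
    text = review.lower()
    return any(re.search(pattern, text) for pattern in SAFETY_CONCERN_PATTERNS)
```

The point of the sketch is the shape of the model: one semantic label, many phrasings, no reliance on a single keyword.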

Where and how we develop Semantic Models

Our Unicorn NLP Cognitive Learning Environment is where we design and precisely craft all the semantic models. It is a combination of human and computer intelligence, the result of 13 years of experience in the NLP/NLU field and 50+ implementations from 2007 to 2020. It is the secret sauce of our technology, and it allows us to deliver the best technology in the world for reviews and other user-generated content.

Statistical algorithms provide data based on processing hundreds of thousands of reviews, but humans make all the crucial decisions. Humans do not only annotate data (as in the machine learning approach); they also create semantic boundaries, semantic rules, and domain-specific dictionaries, describe common domain-specific misspellings and grammar errors, and do whatever it takes to achieve precision of 90-95%. Given the pace of AI development, this is a significant advantage in information extraction (in NLP/NLU), and it will remain so for the next decade. Think of our environment as a way of creating tens of thousands of very precise, hand-crafted semantic rules, with an AI assistant rigorously validating new ideas and marking the parts of algorithms and semantic code that need improvement. The AI assistant proposes new ideas based on constant simulations on real data and tens of statistical tools.

We designed a unique semantic programming language (QL4Reviews) to extract information from reviews and other user-generated content. With this language, you can build semantic models ten times faster than with other current technologies. Every Semantic Model is programmed in QL4Reviews and consists of 50-2000 lines of semantic code. A heavily optimized engine runs text through 150+ Semantic Models on one medium-CPU Amazon AWS instance at 1 million reviews a day. The speed of the core engine was essential to us from the beginning: it speeds up daily development and lets us run several iterations and validations on data every day.
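The throughput figure implies a tight per-text budget. A quick back-of-the-envelope calculation from the numbers above (1 million reviews a day, 150+ models per review):

```python
REVIEWS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds

# Time budget per review on a single instance.
per_review_ms = SECONDS_PER_DAY / REVIEWS_PER_DAY * 1000
print(f"{per_review_ms:.1f} ms per review")   # prints: 86.4 ms per review

# With 150+ Semantic Models applied to every review, the engine has
# well under a millisecond per model per review.
per_model_ms = per_review_ms / 150
print(f"{per_model_ms:.2f} ms per model")     # prints: 0.58 ms per model
```

So sustaining the claimed rate means each of the 150+ models must run in roughly half a millisecond per review on that single instance.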

How to process non-English Reviews

Our technology is designed from the ground up to process reviews and other user-generated content, not proper-grammar texts. Its resistance to errors also allows it to process machine-translated reviews/texts (via Google Translate or Microsoft Translator). This approach provides results almost as good as processing native English reviews, with no need to rewrite and maintain semantic models in other languages. Consequently, it is a much more cost-effective solution, and it is easier to maintain and scale (using the Google Translation API lets you process 105 languages on day one).
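The translate-then-analyze pipeline can be sketched as below. Both `machine_translate` and `run_semantic_models` are hypothetical stubs standing in for a real translation API call and for the English-language engine; only the pipeline shape is the point.

```python
def machine_translate(text, source_lang):
    """Placeholder: a real implementation would call a translation API
    (e.g. Google or Microsoft); here a tiny lookup table stands in."""
    translations = {"Das Frühstück war lecker": "The breakfast was tasty"}
    return translations.get(text, text)

def run_semantic_models(text):
    """Toy stand-in for the real engine's 150+ Semantic Models."""
    return {"Breakfast": "tasty"} if "breakfast was tasty" in text.lower() else {}

def analyze_review(review, source_lang="en"):
    """Translate non-English reviews to English, then run the English models.

    Only the English Semantic Models are maintained; every other language
    goes through machine translation first.
    """
    if source_lang != "en":
        review = machine_translate(review, source_lang)
    return run_semantic_models(review)
```

Because only the English models are maintained, adding a new input language is a translation-configuration change rather than a modeling effort.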

Read more & See demo: Processing machine-translated non-English Travel Reviews

Are we another NLP/NLU API?

This is not just another limited black box that you still need to configure to your data and learn how to use. You get a dedicated Language Understanding API with human-like accuracy, trained and tested on your data. We redesigned every layer of the technology because we wanted to address the biggest challenge in the NLP/NLU industry: to change the way you use and think about a Natural Language Understanding API.

In our opinion, if you are using an NLP API, you should not have to get your hands dirty and spend inordinate time thinking about how best to implement it in your application. We wanted to hide all the NLP complexity inside the system and provide you with ready-to-use data. We believe an NLP API should be simple and easy to use without linguistic, machine learning, or other expertise. You should not need to spend weeks improving the accuracy of a "domain-independent" NLP API, adjusting it to your domain, or making decisions about complex output parameters. Output parameters like confidence (e.g. 0.7852), polarity (e.g. 0.4352), or detected keywords (e.g. friendly, 0.6786; disappointed, -0.7564) give every developer problems, and frankly, they are neither standard nor predictable. They are rather a consequence of the technology used inside, and exposing them in the output is, in our opinion, a way of leaving configuration and responsibility for decision-making to you.

We believe in simple, human-like output where a person can understand what the API detected. We believe we have developed excellent semantic models for specific domains, so you can get tailored results and focus on your business. We position ourselves as a ready-to-use technology, not a standard API tool that you need to configure and train on your data.

What's next

Use the SaaS version - pay per text (hosted via RapidAPI & Amazon AWS on a highly scalable architecture)

Link to the SaaS version (RapidAPI)

If you want a custom-made private API, more information about the custom On-Premise version (0 cents/text),
or to discuss what is possible:

Contact us

If you want to test it or see how it works on your data, send us a dataset
(we do not collect your data)

Test on your data