Preventing users from posting toxic content

I'm not 100% sure if this fits here, but I would guess it does...

I'm building an app where users can post and talk to each other anonymously. With that comes a lot of responsibility, since I expect some people will be tempted to post toxic content when their name isn't attached to it.

To prevent this, I have come up with a few solutions, and I wonder if you guys can give me your feedback on whether you think this will have a positive or negative impact on the people who use the app (i.e., will they be more inclined to post better content, or will they try to game the system and come up with creative ways to post worse content?).

So here's what I have:

Perspective API uses machine learning to score how toxic a piece of text is, on a scale from 0 to 1. It's not 100% reliable, but it works pretty well, I'd say.
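
For context, here is a rough sketch of how I get a score for a post. The endpoint URL and the JSON field names are as I understand them from the Perspective documentation, so treat them as assumptions and check the current docs. The `api_key` parameter and the helper names are just placeholders of mine.

```python
import json
import urllib.request

# Assumed endpoint, taken from my reading of the Perspective docs.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text):
    """Build the JSON body asking only for the TOXICITY attribute."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response):
    """Pull the overall toxicity score (0 to 1) out of a parsed response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def fetch_toxicity(text, api_key):
    """Send one post to the API and return its toxicity score."""
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=json.dumps(build_request_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return extract_toxicity(json.load(resp))
```

Keeping the parsing in `extract_toxicity` separate from the network call makes it easy to test the rest of the app with canned responses.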

  1. Prevent users with an account age of less than 3 days from posting content with a toxicity score above 0.8 (according to Perspective).

For example, "You seem like a pretty dull person." has a toxicity score of 0.78, which is just under that cutoff.

  2. My app has a reputation score (similar to this website, where the better the content you post, the higher your score). After 3 days, users can post content with any toxicity score, but if the score is greater than 0.6 or 0.7, the app notifies them that posting it will cost them 10 reputation and asks whether they still want to post or would rather revise their post.

  3. Block all users from posting really bad words, like the n-word, entirely.
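
Put together, the rules above could look roughly like this. It is only a sketch: the names are made up for illustration, `BANNED_WORDS` holds a placeholder, I picked 0.7 out of "0.6 or 0.7" arbitrarily, and I'm assuming new accounts below the 0.8 cutoff get no warning, since the warning rule only kicks in after 3 days.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"    # ask the user to confirm (and lose reputation) or revise
    BLOCK = "block"


NEW_ACCOUNT_DAYS = 3
NEW_ACCOUNT_MAX_TOXICITY = 0.8
WARN_TOXICITY = 0.7              # assumption: could just as well be 0.6
REPUTATION_PENALTY = 10          # charged only if a warned user posts anyway
BANNED_WORDS = {"badword"}       # placeholder for the real list


def moderate(text, toxicity, account_age_days):
    """Decide what to do with a post, given its Perspective toxicity score."""
    # The banned-word rule applies to everyone, regardless of score or age.
    words = re.findall(r"[a-z']+", text.lower())
    if any(word in BANNED_WORDS for word in words):
        return Decision.BLOCK

    # The hard cutoff for accounts younger than 3 days.
    if account_age_days < NEW_ACCOUNT_DAYS:
        if toxicity > NEW_ACCOUNT_MAX_TOXICITY:
            return Decision.BLOCK
        return Decision.ALLOW

    # Older accounts can post anything, but get a reputation warning.
    if toxicity > WARN_TOXICITY:
        return Decision.WARN
    return Decision.ALLOW
```

With these numbers, the "dull person" example at 0.78 is allowed without comment from a 1-day-old account but triggers the reputation warning from a 10-day-old one, which might be worth keeping in mind when judging the rules.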

What do you guys think of this? Is it too much? Too little? Do you think people would be more inclined or less inclined to post toxic content?

I want people to feel like they can post freely and openly, but within reason, of course.

Any feedback would be amazing! Thank you.