Criteria for what qualifies as a heuristic violation?

I've been conducting a heuristic evaluation of a web application with roughly 15 evaluators, all of whom are single experts (i.e., they know the application well but are not usability experts). We're using NN/g's 10 heuristics.

In short, some of the issues they've reported are hard to classify as heuristic violations.

As an example, if an evaluator reports that the 'X' icon inside a search input doesn't clear the text the user has entered, is that a violation?

Arguably, it could fall under:

  1. Heuristic 1 - Visibility of system status: clicking the icon changes nothing, so the user gets no feedback that the action registered
  2. Heuristic 2 - Match between the system and the real world: the icon doesn't follow the convention most users expect for clearing an input
  3. Heuristic 4 - Consistency and standards: the icon behaves differently from other inputs in the app, where the 'X' icon does clear the value

I'm inclined to categorize these types of issues as implementation defects, i.e. failures to meet acceptance criteria or gaps in testing, rather than heuristic violations. My concern is that including these kinds of submissions turns the heuristic evaluation into a bug list instead of an evaluation of the app's user experience.

What are your thoughts? Does anyone know of articles where NN/g discusses what criteria make a usability issue a heuristic violation?