
Add tests for StandardHtmlEncodingDetector content-encoding, EncodingResult fields, and markLimit #2917

Open: vasiliy-mikhailov wants to merge 1 commit into apache:main from vasiliy-mikhailov:add-StandardHtmlEncodingDetector-tests


@vasiliy-mikhailov


Adds four JUnit 5 tests to StandardHtmlEncodingDetectorTest covering the charset-from-content-encoding detection path, full EncodingResult field verification (charset, confidence, label, DECLARATIVE result type), the default markLimit (65536), and custom setMarkLimit behavior including a meta tag placed beyond the configured limit. The tests reuse the existing assertCharset/detectCharset helpers and exercise previously uncovered branches in the detector.

| metric | before | after |
| --- | --- | --- |
| mutation score | 80% | 93% |
| test methods added | n/a | 4 |

The additions are append-only (no existing test is modified) and pass against the current code.


How this was produced

This PR was generated with an AI-assisted pipeline built around mutation testing (PIT). The pipeline mutates the target class (flipping conditions and changing boundary/edge cases) and runs the existing tests against each mutant. Where a mutant survives (the existing tests do not catch that edge case), it writes a focused test for that case and reruns PIT to confirm the new test actually kills that specific mutant. So every added test is verified to catch a concrete edge case the suite missed before, rather than being speculative or redundant. The change is additive only (no production code modified), and the module builds green under its CI JDK.
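The "surviving mutant, then a focused test that kills it" loop described above can be pictured with a toy example. This is a hand-written sketch, not PIT output and not Tika code: the class `MutantKillSketch`, the threshold of 100, and both helper tests are hypothetical names chosen for illustration only.

```java
import java.util.function.IntPredicate;

// Toy illustration of killing a boundary mutant by hand.
// Assumption: everything here is hypothetical; it is neither PIT nor Tika code.
class MutantKillSketch {
    // Hypothetical production rule: a position is at or past the limit from 100 on.
    static final IntPredicate ORIGINAL = pos -> pos >= 100;

    // Conditional-boundary mutant: ">=" has been changed to ">".
    static final IntPredicate MUTANT = pos -> pos > 100;

    // A weak test that only probes values far away from the boundary.
    static boolean weakTestPasses(IntPredicate impl) {
        return !impl.test(0) && impl.test(500);
    }

    // A focused test that probes the boundary value itself.
    static boolean boundaryTestPasses(IntPredicate impl) {
        return !impl.test(99) && impl.test(100);
    }

    public static void main(String[] args) {
        // The weak test passes for both versions, so the mutant survives it.
        System.out.println("weak test, original: " + weakTestPasses(ORIGINAL));
        System.out.println("weak test, mutant:   " + weakTestPasses(MUTANT));
        // The boundary test passes for the original and fails for the mutant,
        // which is what "killing the mutant" means.
        System.out.println("boundary test, original: " + boundaryTestPasses(ORIGINAL));
        System.out.println("boundary test, mutant:   " + boundaryTestPasses(MUTANT));
    }
}
```

In this sketch a test counts as useful only if it passes against the original and fails against the mutant, which is the same criterion the description above applies to each added test.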

@tballison

Contributor

From Claude's review:

  1. The comments are mutation-tooling exhaust and will rot. Lines like // kills the surviving mutants on lines 70-71 (EQUAL_ELSE + removed call...), // InlineConstant (1.0 -> 2.0) ... on lines 81-82, // line 94 (NO_COVERAGE) describe why PIT generated the test and hard-code production line numbers. The moment StandardHtmlEncodingDetector.java shifts by a line, those comments are wrong. I'd ask the contributor to rewrite them to state the behavior under test (e.g. "charset comes from Content-Encoding when Content-Type is absent") and drop the mutant/line-number references entirely.
  2. customMarkLimit's comment is slightly off. It says "the meta tag beyond 100 bytes won't be found", but the tag actually starts at byte 80 (inside the limit); it's the charset value that gets truncated at byte 100. The test logic is correct; only the wording misleads.
  3. Minor/stylistic: assertEquals(1.0f, getConfidence()) has no delta. It compiles (JUnit 5 has the (float,float) overload) and passes because production hard-codes 1.0f, so it's fine, though a purist might add a delta.

These make sense to me.

Thank you for opening this and improving our unit tests.
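Point 2 of the review is easier to see with byte offsets written out. The following is a simplified, standard-library-only sketch of the general "scan only the first markLimit bytes" idea; it is not Tika's implementation, and `MarkLimitSketch`, `sniffCharset`, `htmlWithMetaAt`, and the regex are hypothetical names and choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch: look for <meta charset="..."> in the first markLimit bytes only.
// Assumption: this is NOT how StandardHtmlEncodingDetector is written; names are hypothetical.
class MarkLimitSketch {
    // Requires the closing quote, so a value cut off by the limit does not match.
    private static final Pattern META_CHARSET =
            Pattern.compile("<meta\\s+charset=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static String sniffCharset(InputStream in, int markLimit) {
        try {
            in.mark(markLimit);                      // remember the start of the stream
            byte[] head = in.readNBytes(markLimit);  // read at most markLimit bytes
            in.reset();                              // leave the stream untouched for callers
            Matcher m = META_CHARSET.matcher(new String(head, StandardCharsets.ISO_8859_1));
            return m.find() ? m.group(1) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a document whose meta tag starts at the given byte offset.
    static byte[] htmlWithMetaAt(int offset) {
        String html = " ".repeat(offset) + "<meta charset=\"windows-1252\">";
        return html.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] doc = htmlWithMetaAt(80);
        // The tag opens at byte 80 and the value starts at byte 95, so a limit of 100
        // sees only "windo" and no closing quote: nothing is detected.
        System.out.println(sniffCharset(new ByteArrayInputStream(doc), 100));
        // With a 65536-byte limit the whole tag is visible.
        System.out.println(sniffCharset(new ByteArrayInputStream(doc), 65536));
    }
}
```

Under these assumptions the tag itself sits inside the 100-byte window; detection fails only because the charset value straddles the limit, which is the distinction the review asks the comment to make.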
