package Test::Spelling::Stopwords;

$Test::Spelling::Stopwords::VERSION   = '0.02';
$Test::Spelling::Stopwords::AUTHORITY = 'cpan:MANWAR';

use strict;
use warnings;

use Cwd           qw(abs_path);
use Exporter      qw(import);
use File::Find    qw(find);
use File::Spec    qw();
use Test::Builder qw();

our @EXPORT = qw(
    all_pod_files_spelling_ok
    pod_file_spelling_ok
    set_stopwords_file
    set_spell_lang
    set_spell_dirs
    get_stopwords_file
);

=head1 NAME

Test::Spelling::Stopwords - POD spell-checking with project-specific stopwords

=head1 VERSION

Version 0.02

=head1 SYNOPSIS

Minimal - just drop this into your F<xt/> directory:

    # xt/spell-pod.t
    use Test::More;
    use Test::Spelling::Stopwords;

    unless ($ENV{AUTHOR_TESTING} || $ENV{RELEASE_TESTING} || $ENV{CI}) {
        plan skip_all => 'Spelling tests only run under AUTHOR_TESTING';
    }

    all_pod_files_spelling_ok();

Or with explicit configuration:

    use Test::Spelling::Stopwords;

    set_spell_lang('en_US');
    set_stopwords_file('xt/.stopwords');
    set_spell_dirs('lib', 'bin');

    all_pod_files_spelling_ok();

Or with per-call overrides:

    all_pod_files_spelling_ok(
        lang           => 'en_US',
        stopwords_file => 'xt/.stopwords',
        dirs           => ['lib', 'bin'],
    );

Check a single file:

    use Test::Spelling::Stopwords;
    use Test::More tests => 1;

    pod_file_spelling_ok('lib/My/Module.pm');

=head1 DESCRIPTION

C<Test::Spelling::Stopwords> is a drop-in POD spell-checker that integrates
project-specific stopword files with B<aspell>. It is designed to work
alongside the companion L<gen-stopwords> script, which auto-generates a
F<.stopwords> file containing only the vocabulary unique to your project -
after filtering out the common Perl ecosystem terms already covered by
L<Pod::Wordlist>.

=head2 How it differs from L<Test::Spelling>

L<Test::Spelling> is the established CPAN module for POD spell-checking.
C<Test::Spelling::Stopwords> does not replace it - it complements it by
addressing two specific gaps:

=over 4

=item Automatic stopwords file discovery

L<Test::Spelling> requires you to call C<add_stopwords()> explicitly or
maintain a C<__DATA__> section in your test.
C<Test::Spelling::Stopwords> automatically discovers and loads a
F<.stopwords> file from your project root (or any path you configure),
so your test file contains no project-specific content and can be reused
across projects unchanged.

=item Line-number reporting

When L<Test::Spelling> finds a misspelled word it tells you the word but
not where it is.  C<Test::Spelling::Stopwords> reports the exact line
number(s) in the source file where each misspelling appears, making
failures fast to locate and fix.

=back
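
The line-number reporting comes from checking the POD one line at a time and
recording, for every unknown word, each line it appears on. Below is a minimal
sketch of that bookkeeping. The C<%dictionary> hash is a hypothetical stand-in
for aspell, so the example runs without aspell installed.

    use strict;
    use warnings;

    # Hypothetical stand-in for aspell: any word missing from %dictionary
    # (compared in lower case) counts as misspelled.
    my %dictionary = map { $_ => 1 } qw(the module parses pod);

    my @lines = ('The module parses POD', 'The modle parses pod', 'the modle');

    my %errors;    # misspelled word => [ line numbers ]
    for my $i (0 .. $#lines) {
        for my $word (split ' ', $lines[$i]) {
            next if $dictionary{ lc $word };
            push @{ $errors{$word} }, $i + 1;    # report 1-based line numbers
        }
    }

    # One report line per word, in the same shape as the module's diag output
    my @report;
    for my $word (sort keys %errors) {
        push @report, sprintf "  '%s'  line(s): %s",
            $word, join ', ', @{ $errors{$word} };
    }
    print "$_\n" for @report;

Because each word maps to a list of lines, a word that is misspelled several
times produces a single diagnostic that names every line.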

=head2 Two-layer stopword architecture

The module merges two sources of known words before checking any file:

=over 4

=item Layer 1 - L<Pod::Wordlist>

The CPAN-maintained vocabulary of common Perl and technical terms (C<ok>,
C<undef>, C<dbi>, C<CPAN>, C<accessor>, C<mutators>, etc.). This mirrors
what C<gen-stopwords> filters out when building F<.stopwords>, so the
module and the generator always agree on what counts as a known word.

Without this layer, the test would be stricter than the generator and would
flag words that L<Pod::Wordlist> covers, causing false failures even on a
freshly generated F<.stopwords>.

=item Layer 2 - F<.stopwords>

Project-specific vocabulary generated by C<gen-stopwords>. Contains only
terms not already covered by L<Pod::Wordlist>.

=back
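
Below is a minimal sketch of how the two layers combine into one lookup table.
The word lists are hypothetical samples, not the real L<Pod::Wordlist>
contents, and C<is_known> is an illustrative helper. The lower-casing and the
removal of a trailing C<'s> follow what the module does before each lookup.

    use strict;
    use warnings;

    # Layer 1: a few sample words standing in for Pod::Wordlist (hypothetical).
    my @layer1 = qw(CPAN undef accessor);
    # Layer 2: sample project-specific terms, as if read from .stopwords.
    my @layer2 = qw(dbic Mojolicious resultset);

    # Merge both layers into one case-folded lookup table.
    my %known;
    $known{ lc $_ } = 1 for @layer1, @layer2;

    # Lower-case the word and drop a trailing possessive before the lookup.
    sub is_known {
        my ($word) = @_;
        (my $clean = lc $word) =~ s/'s$//;
        return $known{$clean} ? 1 : 0;
    }

    print join(' ', map { is_known($_) } qw(CPAN DBIC's Mojolicious typo)), "\n";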

=head2 Stopwords file format

The F<.stopwords> file is a plain text file with one word per line.
Lines beginning with C<#> and blank lines are ignored.

    # Auto-generated stopwords for en_GB
    dbic
    mojolicious
    resultset
    myauthor

Generate it with the companion C<gen-stopwords> script:

    gen-stopwords --dir lib --dir bin
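
Below is a minimal sketch of reading that format. C<parse_stopwords> is a
hypothetical helper that follows the same rules as the module's internal
loader: comment lines and blank lines are skipped, and each word is stored in
lower case. An in-memory file stands in for F<.stopwords>.

    use strict;
    use warnings;

    # One word per line; lines starting with '#' and blank lines are skipped.
    sub parse_stopwords {
        my ($fh) = @_;
        my %words;
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^#/ || $line =~ /^\s*$/;
            $words{ lc $line } = 1;    # stored lower-cased for lookup
        }
        return \%words;
    }

    my $content = "# Auto-generated stopwords for en_GB\ndbic\n\nMojolicious\nresultset\n";
    open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
    my $words = parse_stopwords($fh);
    print join(',', sort keys %$words), "\n";

As in the module, whitespace around a word is not trimmed, and a C<#> is
treated as a comment only in the first column. Keep each line to a single
bare word.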

=head2 Freshness check

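Each time C<all_pod_files_spelling_ok> runs, it compares the modification
time of your F<.stopwords> file with those of the source files (F<.pm>,
F<.pl>, F<.pod>, F<.t>) under the current directory. Version-control and build
directories such as F<.git> and F<blib> are skipped. If any source file is
newer, it emits a C<diag> warning: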

    # ------------------------------------------------------------
    # WARNING: .stopwords is out of date!
    # Run gen-stopwords to regenerate it.
    # ------------------------------------------------------------

This is advisory only - the test continues to run.
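
Below is a minimal sketch of that comparison using only core modules.
C<stopwords_stale> is a hypothetical helper written for this example. Unlike
the module, it takes an explicit list of source files instead of walking the
directory tree.

    use strict;
    use warnings;
    use File::Spec;
    use File::Temp qw(tempdir);

    # True if any source file was modified after the stopwords file.
    sub stopwords_stale {
        my ($stopwords, @sources) = @_;
        my $stop_mtime = (stat $stopwords)[9];
        for my $src (@sources) {
            return 1 if (stat $src)[9] > $stop_mtime;
        }
        return 0;
    }

    # Create an empty .stopwords and an empty module in a temporary directory.
    my $dir  = tempdir(CLEANUP => 1);
    my $stop = File::Spec->catfile($dir, '.stopwords');
    my $pm   = File::Spec->catfile($dir, 'Module.pm');
    for my $path ($stop, $pm) {
        open my $fh, '>', $path or die "Cannot create $path: $!";
        close $fh;
    }

    # Backdate .stopwords by 100 seconds so the module looks newer.
    my $now = time;
    utime $now - 100, $now - 100, $stop;
    utime $now,       $now,       $pm;
    print stopwords_stale($stop, $pm) ? "stale\n" : "fresh\n";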

=head2 POD cleaning

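Before each line is handed to aspell, all POD formatting codes are stripped
B<entirely>, together with the text they enclose. Words inside bold,
italic, code or link markup are therefore never spell-checked: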

    E<gt>           removed  (not 'gt', preventing the 'Egt' artefact)
    L<Some::Module> removed
    C<code>         removed
    B<bold>         removed

This is more aggressive than simple content extraction and prevents a
class of false positives caused by POD entity fragments appearing as
bare words.
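
The stripping comes down to two substitutions applied to each POD line, which
you can see in C<_check_file> in the source. Below is a minimal sketch that
runs them on sample lines. C<strip_pod_codes> is a hypothetical wrapper, not an
exported function.

    use strict;
    use warnings;

    # Remove every formatting code with its contents, then any stray angle
    # brackets, using the same two substitutions as the module.
    sub strip_pod_codes {
        my ($line) = @_;
        $line =~ s/[A-Z]<[^>]+>//g;
        $line =~ s/[<>]//g;
        return $line;
    }

    print strip_pod_codes('Use C<foo> and E<gt> with L<Some::Module> here'), "\n";

    # Limitation: [^>]+ stops at the first '>', so in the double-angle form
    # only 'C<< $obj->' is removed and 'method' still reaches aspell.
    print strip_pod_codes('C<< $obj->method >>'), "\n";

The second call shows that the double-angle form is only partly handled. The
match ends at the C<< > >> inside C<< -> >>, so words after it are still
checked.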

=head2 Environment variables

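All defaults can be overridden without editing the test file. C<SPELL_LANG>,
C<STOPWORD_FILE> and C<SPELL_DIRS> are read once, when the module is loaded.
The C<set_*> functions override them, and per-call arguments override both.
C<ASPELL_CMD> is read each time a file is checked: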

=over 4

=item C<SPELL_LANG>

Aspell language code.  Default: C<en_GB>.

=item C<STOPWORD_FILE>

Path to the stopwords file.  Default: C<.stopwords>.

=item C<SPELL_DIRS>

Colon- or comma-separated list of directories to scan.
Default: C<lib:bin:script>.

=item C<ASPELL_CMD>

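Complete aspell command string, including all flags. When set, it replaces
the default command entirely, so the configured language is not applied.
Default: C<aspell list -l LANG --run-together>, where C<LANG> is the
configured language. The availability check that decides whether to skip
always runs C<aspell --version>, whatever this variable contains.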

=back
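
Below is a minimal sketch of how those defaults and overrides resolve, in
particular how C<SPELL_DIRS> is split on colons or commas. C<resolve_config>
is a hypothetical helper that mirrors the module's start-up code. It is not
part of the API.

    use strict;
    use warnings;

    # Hypothetical helper reproducing the defaults listed above; it takes the
    # environment as a plain hash so the example does not touch %ENV.
    sub resolve_config {
        my (%env) = @_;
        return {
            lang           => $env{SPELL_LANG}    || 'en_GB',
            stopwords_file => $env{STOPWORD_FILE} || '.stopwords',
            dirs           => $env{SPELL_DIRS}
                ? [ split /[:,]/, $env{SPELL_DIRS} ]
                : [qw(lib bin script)],
        };
    }

    # Colons and commas may be mixed in SPELL_DIRS.
    my $cfg = resolve_config(SPELL_LANG => 'en_US', SPELL_DIRS => 'lib:bin,script');
    print "$cfg->{lang} $cfg->{stopwords_file} @{ $cfg->{dirs} }\n";

If a test sets these variables from Perl code rather than the shell, it should
do so in a C<BEGIN> block ahead of C<use Test::Spelling::Stopwords>, for
example C<BEGIN { $ENV{SPELL_LANG} = 'en_US' }>.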

=cut

my $TB = Test::Builder->new;

my %_config = (
    lang           => $ENV{SPELL_LANG}    || 'en_GB',
    stopwords_file => $ENV{STOPWORD_FILE} || '.stopwords',
    dirs           => do {
        $ENV{SPELL_DIRS}
            ? [ split /[:,]/, $ENV{SPELL_DIRS} ]
            : [qw(lib bin script)]
    },
);

my %_prune = map { $_ => 1 } qw(
    .git .svn .hg .build
    blib _build local extlib cover_db
    node_modules vendor
);

my $SOURCE_RE = qr/\.(pm|pod|pl|t)$/;

=head1 CONFIGURATION API

=head2 set_spell_lang

    set_spell_lang('en_US');

Sets the aspell language code. May also be set via the C<SPELL_LANG>
environment variable.

=cut

sub set_spell_lang { $_config{lang} = $_[0] }

=head2 set_stopwords_file

    set_stopwords_file('xt/.stopwords');

Sets the path to the stopwords file. May also be set via the
C<STOPWORD_FILE> environment variable.

=cut

sub set_stopwords_file { $_config{stopwords_file} = $_[0] }

=head2 set_spell_dirs

    set_spell_dirs('lib', 'bin', 'script');
    set_spell_dirs( ['lib', 'bin'] );

Sets the list of directories to search for POD files. Accepts either a
list or an arrayref. May also be set via the C<SPELL_DIRS> environment
variable.
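
The list-or-arrayref handling is a one-line check on the first argument.
Below is a minimal sketch. C<normalise_dirs> is a hypothetical name for the
same expression that C<set_spell_dirs> uses.

    use strict;
    use warnings;

    # Same test as set_spell_dirs: a leading reference is used as-is,
    # otherwise the arguments are copied into a new array.
    sub normalise_dirs { return ref $_[0] ? $_[0] : [@_] }

    print join(',', @{ normalise_dirs('lib', 'bin') }), "\n";     # lib,bin
    print join(',', @{ normalise_dirs(['lib', 'bin']) }), "\n";   # lib,bin

When the first argument is an arrayref, any further arguments are ignored. For
example, C<set_spell_dirs(['lib'], 'bin')> configures only F<lib>.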

=cut

sub set_spell_dirs { $_config{dirs} = ref $_[0] ? $_[0] : [@_] }

=head2 get_stopwords_file

    my $path = get_stopwords_file();

Returns the currently configured stopwords file path.

=cut

sub get_stopwords_file { $_config{stopwords_file} }

=head1 EXPORTED FUNCTIONS

=head2 all_pod_files_spelling_ok

    all_pod_files_spelling_ok();

    all_pod_files_spelling_ok(
        lang           => 'en_US',
        stopwords_file => 'xt/.stopwords',
        dirs           => ['lib', 'bin'],
    );

Finds all Perl source files (F<.pm>, F<.pl>, F<.pod>, F<.t>) under the
configured source directories, and runs a spell-check on the POD in each
one.  Emits one TAP pass/fail per file.

Misspelled words are reported via C<diag> with their line numbers:

    not ok 1 - POD spelling: lib/My/Module.pm
    #   'conection'  line(s): 17, 83
    #   'serialiisable'  line(s): 42

Accepts an optional hash of per-call overrides (C<lang>, C<stopwords_file>,
C<dirs>) that take precedence over the module-level configuration for the
duration of the call.
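
The per-call overrides rely on Perl's C<local> applied to individual elements
of the module's configuration hash. The previous values come back
automatically when the call returns, even if it dies. Below is a minimal
sketch of the same pattern. C<%config> and C<with_overrides> are illustrative
names.

    use strict;
    use warnings;

    my %config = (lang => 'en_GB');

    # Override a setting for the duration of one call only.
    sub with_overrides {
        my %args = @_;
        local $config{lang} = $args{lang} if exists $args{lang};
        return $config{lang};    # value in effect during the call
    }

    my $during = with_overrides(lang => 'en_US');
    print "$during $config{lang}\n";    # en_US en_GB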

Skips gracefully (via C<skip_all>) if:

=over 4

=item * aspell is not installed or not on C<$PATH>

=item * The stopwords file does not exist

=item * No POD files are found in the configured directories

=back
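
Below is a minimal sketch of the file-discovery step, which uses L<File::Find>
with directory pruning as the module does. The prune list is a subset of the
module's, and C<source_files> is a hypothetical helper. The throwaway tree
under a temporary directory exists only so that the example runs on its own.

    use strict;
    use warnings;
    use File::Find qw(find);
    use File::Path qw(make_path);
    use File::Spec;
    use File::Temp qw(tempdir);

    my %prune     = map { $_ => 1 } qw(.git blib local);   # subset of the module's list
    my $source_re = qr/\.(pm|pod|pl|t)$/;

    # Collect matching files, skipping pruned directories entirely.
    sub source_files {
        my @dirs = grep { -d } @_;
        my @files;
        find({
            wanted => sub {
                if (-d $_ && $prune{$_}) { $File::Find::prune = 1; return }
                push @files, $File::Find::name if -f $_ && /$source_re/;
            },
        }, @dirs);
        return sort @files;
    }

    # Throwaway tree: one module, one copy under a pruned dir, one non-source file.
    my $root = tempdir(CLEANUP => 1);
    make_path(map { File::Spec->catdir($root, @$_) } ['lib', 'My'], ['lib', 'blib']);
    for my $rel (['lib', 'My', 'Module.pm'], ['lib', 'blib', 'Copy.pm'], ['lib', 'README']) {
        my $path = File::Spec->catfile($root, @$rel);
        open my $fh, '>', $path or die "Cannot create $path: $!";
        close $fh;
    }
    my @found = source_files(File::Spec->catdir($root, 'lib'));
    print "$_\n" for @found;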

=cut

sub all_pod_files_spelling_ok {
    my %args = @_;

    local $_config{lang}           = $args{lang}           if exists $args{lang};
    local $_config{stopwords_file} = $args{stopwords_file} if exists $args{stopwords_file};
    local $_config{dirs}           = $args{dirs}           if exists $args{dirs};

    unless (_check_aspell()) {
        $TB->plan(skip_all => 'aspell is not installed or not on $PATH');
        return;
    }

    unless (-e $_config{stopwords_file}) {
        $TB->plan(
            skip_all =>
            "No $_config{stopwords_file} found. Run gen-stopwords to create one."
        );
        return;
    }

    _freshness_check();

    my $stopwords = _load_stopwords();
    my @files     = _pod_files();

    unless (@files) {
        $TB->plan(skip_all => 'No POD files found to check');
        return;
    }

    $TB->plan(tests => scalar @files);

    for my $file (@files) {
        pod_file_spelling_ok($file, $stopwords);
    }
}

=head2 pod_file_spelling_ok

    pod_file_spelling_ok($file);
    pod_file_spelling_ok($file, \%stopwords);
    pod_file_spelling_ok($file, \%stopwords, $test_name);

Spell-checks the POD in a single file. Emits one pass or fail.

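If C<\%stopwords> is omitted, the configured stopwords file is loaded
automatically and merged with L<Pod::Wordlist> (when installed). A supplied
C<\%stopwords> hash is used as-is. Words are lower-cased before the lookup,
so its keys must be lower case. C<$test_name> defaults to
C<"POD spelling: $file">.

Returns true if the file passes, false otherwise. Unlike
C<all_pod_files_spelling_ok>, this function does not check that aspell is
installed. If aspell is missing, no words are reported and the test passes.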

=cut

sub pod_file_spelling_ok {
    my ($file, $stopwords, $test_name) = @_;

    $stopwords //= _load_stopwords();
    $test_name //= "POD spelling: $file";

    my ($passed, $errors) = _check_file($file, $stopwords);

    if ($passed) {
        $TB->ok(1, $test_name);
    }
    else {
        $TB->ok(0, $test_name);
        for my $word (sort keys %$errors) {
            $TB->diag(sprintf "  '%s'  line(s): %s",
                $word, join ', ', @{ $errors->{$word} });
        }
    }

    return $passed;
}

#
#
# Internal Helpers

sub _aspell_cmd {
    return $ENV{ASPELL_CMD}
        || "aspell list -l $_config{lang} --run-together";
}

sub _check_aspell {
    my $out = `aspell --version 2>&1`;
    return $? == 0 && $out =~ /aspell/i;
}

# Build the combined stopword lookup - two layers:
#
#   Layer 1: Pod::Wordlist  - the shared Perl community vocabulary (~1000 words)
#   Layer 2: .stopwords     - project-specific terms only
#
# This mirrors gen-stopwords exactly: gen-stopwords filters Pod::Wordlist words
# OUT of .stopwords, so we must add them back here at runtime.  Without Layer 1
# the test is stricter than the generator and flags words like 'ok', 'undef',
# 'dbi', 'CPAN' that Pod::Wordlist covers - causing false failures on a freshly
# generated .stopwords file.
sub _load_stopwords {
    my %words;

    # Layer 1 - Pod::Wordlist
    if (eval 'use Pod::Wordlist; 1') {
        my $wl = do { no strict 'refs'; \%{'Pod::Wordlist::Wordlist'} };
        $words{ lc $_ } = 1 for keys %$wl;
    }
    else {
        $TB->diag(
            'Pod::Wordlist not found - install it for best results '
            . '(cpanm Pod::Wordlist).'
        );
    }

    # Layer 2 - project .stopwords file
    my $file = $_config{stopwords_file};
    if (-e $file) {
        open my $fh, '<', $file
            or do { $TB->diag("Cannot open $file: $!"); return \%words };

        while (<$fh>) {
            chomp;
            next if /^#/ || /^\s*$/;
            $words{ lc $_ } = 1;
        }
        close $fh;
    }

    return \%words;
}

# Warn via diag if any source file is newer than the stopwords file.
sub _freshness_check {
    my $file = $_config{stopwords_file};
    return unless -e $file;

    my $stop_mtime       = (stat $file)[9];
    my $latest_src_mtime = 0;
    my @search_dirs      = grep { -d } ('.', @{ $_config{dirs} });

    find({
        wanted => sub {
            if (-d $_ && $_prune{$_}) { $File::Find::prune = 1; return }
            return unless -f $_ && /$SOURCE_RE/;
            return if $_ eq $file;
            my $m = (stat _)[9];
            $latest_src_mtime = $m if $m > $latest_src_mtime;
        },
        no_chdir => 0,
    }, @search_dirs);

    if ($latest_src_mtime > $stop_mtime) {
        $TB->diag('-' x 60);
        $TB->diag("WARNING: $file is out of date!");
        $TB->diag('Run gen-stopwords to regenerate it.');
        $TB->diag('-' x 60);
    }
}

# Collect all POD files from the configured source directories.
sub _pod_files {
    my @dirs  = grep { -d } @{ $_config{dirs} };
    my @files;

    return () unless @dirs;

    find({
        wanted => sub {
            if (-d $_ && $_prune{$_}) { $File::Find::prune = 1; return }
            return unless -f $_ && /$SOURCE_RE/;
            push @files, $File::Find::name;
        },
        no_chdir => 0,
    }, @dirs);

    return sort @files;
}

# Spell-check a single file.
# Returns ( $passed, \%errors ) where %errors is word => [ line numbers ].
sub _check_file {
    my ($file, $stopwords) = @_;

    my %errors;
    my $in_pod  = 0;
    my $line_no = 0;
    my $cmd     = _aspell_cmd();

    open my $fh, '<', $file
        or return (0, { _open_error => ["Cannot open: $!"] });

    while (my $line = <$fh>) {
        $line_no++;

        $in_pod = 1 if $line =~ /^=(?:head\d|item|over|back|pod|begin|for|method|attr)\b/;
        $in_pod = 0 if $line =~ /^=cut\b/;
        next unless $in_pod;

        # Strip POD formatting codes entirely - prevents 'Egt' artefacts
        $line =~ s/[A-Z]<[^>]+>//g;
        $line =~ s/[<>]//g;

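        # Single-quote the line for the shell. printf is used instead of echo,
        # which may reinterpret backslashes or a leading '-n' in the text.
        (my $escaped = $line) =~ s/'/'\\''/g;
        my $misspelled = `printf '%s' '$escaped' | $cmd 2>/dev/null`;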
        next unless $misspelled;

        for my $word (split /\n/, $misspelled) {
            $word =~ s/^\s+|\s+$//g;
            next unless length $word;

            my $clean = lc $word;
            $clean    =~ s/'s$//;

            next if $stopwords->{$clean};

            push @{ $errors{$word} }, $line_no;
        }
    }

    close $fh;

    return (!%errors, \%errors);
}

=head1 RECOMMENDED WORKFLOW

=over 4

=item 1.

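Install the module and its Perl dependencies:

    cpanm -vS Test::Spelling::Stopwords

This also installs the companion C<gen-stopwords> script. B<aspell> itself is
not a Perl module and must be installed separately, for example with your
operating system's package manager.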

=item 2.

Generate your project's stopwords file:

    gen-stopwords --dir lib --dir bin

This scans your source files, runs aspell, filters out terms already
covered by L<Pod::Wordlist>, and writes a lean F<.stopwords> containing
only project-specific vocabulary.

=item 3.

Create F<xt/spell-pod.t>:

    use Test::More;
    use Test::Spelling::Stopwords;

    unless ($ENV{AUTHOR_TESTING} || $ENV{RELEASE_TESTING} || $ENV{CI}) {
        plan skip_all => 'Spelling tests only run under AUTHOR_TESTING';
    }

    all_pod_files_spelling_ok();

=item 4.

Run:

    AUTHOR_TESTING=1 prove -lv xt/spell-pod.t

=item 5.

After adding or editing source files, regenerate:

    gen-stopwords

The test will warn you if you forget.

=back

=head1 DEPENDENCIES

=over 4

=item * L<Test::Builder> (core; ships with L<Test::More>)

=item * L<File::Find> (core)

=item * L<File::Spec> (core)

=item * L<Cwd> (core)

=item * L<Pod::Wordlist> (strongly recommended - C<cpanm Pod::Wordlist>)

=item * B<aspell> - must be installed on the system and available on C<$PATH>

=back

=head1 BUGS AND LIMITATIONS

=over 4

=item * aspell must be installed externally. The module skips gracefully
if it is absent but cannot install it for you.

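=item * aspell is invoked through a shell pipe (via backticks), so Windows is
not currently supported. Patches welcome.

=item * aspell is started once for every POD line, so checking a large
distribution can be slow.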

=item * The freshness check uses file modification times, which are reset
by C<git checkout> and similar operations. It is advisory only.

=back
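
The Unix-only limitation comes from how each line reaches aspell. The line is
wrapped in single quotes for the shell, and each embedded quote is escaped as
'\'' (close the quote, add an escaped quote, reopen). Below is a minimal
sketch of that quoting. As an assumption that lets it run without aspell, it
uses C<printf> in place of aspell to echo the text back, so it is Unix-only
as well.

    use strict;
    use warnings;

    # Wrap a line in single quotes for /bin/sh, escaping embedded quotes as '\''.
    my $line = "It's a 'quoted' line\n";
    (my $escaped = $line) =~ s/'/'\\''/g;

    # printf stands in for aspell here, returning the text unchanged.
    my $out = `printf '%s' '$escaped'`;
    print $out;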

=head1 SEE ALSO

=over 4

=item * L<Test::Spelling> - the established base module this complements

=item * L<Pod::Wordlist> - the community Perl vocabulary list

=item * L<Pod::Spell> - POD-aware text extraction for spell-checking

=item * L<gen-stopwords> - companion script for generating F<.stopwords>

=back

=head1 AUTHOR

Mohammad Sajid Anwar C<< <mohammad.anwar@yahoo.com> >>

=head1 REPOSITORY

L<https://github.com/manwar/Test-Spelling-Stopwords>

=head1 BUGS

Please report any bugs or feature requests through the web interface at
L<https://github.com/manwar/Test-Spelling-Stopwords/issues>. I will be notified,
and you will then be notified automatically of progress on your bug as I make
changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Test::Spelling::Stopwords

You can also look for information at:

=over 4

=item * Bug Report

L<https://github.com/manwar/Test-Spelling-Stopwords/issues>

=item * Search MetaCPAN

L<https://metacpan.org/dist/Test-Spelling-Stopwords/>

=back

=head1 LICENSE AND COPYRIGHT

Copyright (C) 2026 Mohammad Sajid Anwar.

This program is free software; you can redistribute it and/or modify it under
the terms of the Artistic License (2.0). You may obtain a copy of the full
license at:

L<http://www.perlfoundation.org/artistic_license_2_0>

Any use, modification, and distribution of the Standard or Modified Versions is
governed by this Artistic License. By using, modifying or distributing the
Package, you accept this license. Do not use, modify, or distribute the Package,
if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone
other than you, you are nevertheless required to ensure that your Modified
Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark,
tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license
to make, have made, use, offer to sell, sell, import and otherwise transfer the
Package with respect to any patent claims licensable by the Copyright Holder that
are necessarily infringed by the Package. If you institute patent litigation
(including a cross-claim or counterclaim) against any party alleging that the
Package constitutes direct or contributory patent infringement, then this
Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND
CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS
REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE
OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=cut

1; # End of Test::Spelling::Stopwords
