42 changes: 21 additions & 21 deletions docs/source/examples.rst
@@ -1,12 +1,12 @@
Example usage
=============

Daachorse contains some search options, ranging from basic matching with the Aho-Corasick algorithm
to trickier matching. All of them will run very fast based on the double-array data structure and
can be easily plugged into your application as shown below.
Daachorse contains some search options, ranging from standard matching with the Aho-Corasick
algorithm to more advanced matching. All of them run efficiently, powered by the double-array data
structure, and can be easily plugged into your application, as shown below.

Finding overlapped occurrences
------------------------------
Finding overlapping occurrences
-------------------------------

To search for all occurrences of registered patterns that allow for positional overlap in the input
text, use ``find_overlapping()``. When you instantiate a new automaton, unique identifiers are
@@ -21,11 +21,11 @@ occurrence and its identifier.
>>> pma.find_overlapping(b'abcd')
[(0, 1, 2), (0, 2, 1), (1, 4, 0)]
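
The overlapping semantics can be illustrated with a naive pure-Python reference. This is only a sketch for reasoning about the reported tuples, not how Daachorse works internally (it does not use a double-array automaton). The function name ``find_overlapping_naive`` is hypothetical, and the pattern list is an assumption chosen to reproduce the output shown above, since the lines that construct the automaton are collapsed in this diff.

```python
def find_overlapping_naive(patterns, haystack):
    """Return (start, end, pattern_id) for every occurrence, overlaps allowed."""
    matches = []
    for end in range(1, len(haystack) + 1):
        # Collect every non-empty pattern that ends exactly at `end`.
        ending_here = [
            (end - len(p), end, pid)
            for pid, p in enumerate(patterns)
            if p and haystack.endswith(p, 0, end)
        ]
        # A smaller start means a longer pattern, so this sorts by descending
        # length; duplicates keep their registration order via the pattern ID.
        ending_here.sort(key=lambda m: (m[0], m[2]))
        matches.extend(ending_here)
    return matches


# Assumed pattern set (IDs follow list order): 0 = b'bcd', 1 = b'ab', 2 = b'a'.
print(find_overlapping_naive([b'bcd', b'ab', b'a'], b'abcd'))
# [(0, 1, 2), (0, 2, 1), (1, 4, 0)]
```

The sketch runs in time proportional to the text length times the number of patterns, which is exactly the cost an Aho-Corasick automaton avoids.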

Finding non-overlapped occurrences with standard matching
---------------------------------------------------------
Finding non-overlapping occurrences with standard matching
----------------------------------------------------------

If you do not want to allow positional overlap, use ``find()`` instead. It performs the search on
the Aho-Corasick automaton and reports patterns first found in each iteration.
To disallow positional overlap, use ``find()`` instead. It performs the search on the Aho-Corasick
automaton and reports the first matching pattern found at each search position.

.. code-block:: python

@@ -35,11 +35,11 @@ the Aho-Corasick automaton and reports patterns first found in each iteration.
>>> pma.find(b'abcd')
[(0, 1, 2), (1, 4, 0)]
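
A naive sketch of the standard non-overlapping behavior follows. It scans end positions from left to right, reports the first pattern that ends inside the unconsumed part of the text, and resumes from the end of that match. The name ``find_standard_naive`` and the pattern list are assumptions for illustration. The tie-break used when several patterns end at the same position is also an assumption of this sketch, not confirmed library behavior.

```python
def find_standard_naive(patterns, haystack):
    """Return non-overlapping (start, end, pattern_id) matches, earliest end first."""
    matches = []
    resume = 0  # start of the part of the text not yet covered by a match
    for end in range(1, len(haystack) + 1):
        # Patterns that end at `end` and begin at or after `resume`.
        candidates = [
            (end - len(p), end, pid)
            for pid, p in enumerate(patterns)
            if p and haystack.endswith(p, resume, end)
        ]
        if candidates:
            # Assumed tie-break: smallest start, then smallest pattern ID.
            matches.append(min(candidates, key=lambda m: (m[0], m[2])))
            resume = end
    return matches


# Assumed pattern set: 0 = b'bcd', 1 = b'ab', 2 = b'a'.
print(find_standard_naive([b'bcd', b'ab', b'a'], b'abcd'))
# [(0, 1, 2), (1, 4, 0)]
```

Here ``ab`` is never reported because ``a`` ends first and consumes position 0, which is why the result differs from the overlapping search.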

Finding non-overlapped occurrences with longest matching
--------------------------------------------------------
Finding non-overlapping occurrences with longest matching
---------------------------------------------------------

If you want to search for the longest pattern without positional overlap in each iteration, use
``MATCH_KIND_LEFTMOST_LONGEST`` in the construction.
To search for the longest pattern without positional overlap in each iteration, use
``MATCH_KIND_LEFTMOST_LONGEST`` during construction.

.. code-block:: python

@@ -49,14 +49,14 @@ If you want to search for the longest pattern without positional overlap in each
>>> pma.find(b'abcd')
[(0, 4, 2)]

Finding non-overlapped occurrences with leftmost-first matching
---------------------------------------------------------------
Finding non-overlapping occurrences with leftmost-first matching
----------------------------------------------------------------

If you want to find the the earliest registered pattern among ones starting from the search
position, use ``MATCH_KIND_LEFTMOST_FIRST``.
To search for the earliest registered pattern among those starting from the search position,
use ``MATCH_KIND_LEFTMOST_FIRST``.

This is so-called *the leftmost first match*, a bit tricky search option. For example, in the
following code, ab is reported because it is the earliest registered one.
These semantics are known as the *leftmost-first match*, a tricky search option. For example,
in the following code, ``ab`` is reported because it is the earliest registered one.

.. code-block:: python

@@ -66,8 +66,8 @@ following code, ab is reported because it is the earliest registered one.
>>> pma.find(b'abcd')
[(0, 2, 0)]
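
The difference between the two leftmost variants can be sketched with one naive helper. Both pick the leftmost start position at which any pattern begins. They differ only in which candidate at that position wins: the longest one, or the one registered first. The function ``find_leftmost_naive`` and its ``longest`` flag are hypothetical names for this illustration and are not part of the Daachorse API. The pattern list is an assumption chosen to reproduce the outputs shown in the two sections above.

```python
def find_leftmost_naive(patterns, haystack, longest):
    """Return non-overlapping matches under leftmost-longest or leftmost-first rules."""
    matches = []
    pos = 0
    while pos < len(haystack):
        chosen = None
        for start in range(pos, len(haystack)):
            # All non-empty patterns that begin exactly at `start`.
            candidates = [
                (start, start + len(p), pid)
                for pid, p in enumerate(patterns)
                if p and haystack.startswith(p, start)
            ]
            if candidates:
                if longest:
                    # Leftmost-longest: largest end, earlier ID on ties.
                    chosen = max(candidates, key=lambda m: (m[1], -m[2]))
                else:
                    # Leftmost-first: smallest pattern ID (earliest registered).
                    chosen = min(candidates, key=lambda m: m[2])
                break
        if chosen is None:
            break  # no pattern starts anywhere in the rest of the text
        matches.append(chosen)
        pos = chosen[1]  # resume after the reported match
    return matches


# Assumed pattern set: 0 = b'ab', 1 = b'a', 2 = b'abcd'.
patterns = [b'ab', b'a', b'abcd']
print(find_leftmost_naive(patterns, b'abcd', longest=True))   # [(0, 4, 2)]
print(find_leftmost_naive(patterns, b'abcd', longest=False))  # [(0, 2, 0)]
```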

Find patterns on a string
-------------------------
Finding patterns on a string
----------------------------

To build an automaton for strings, use ``CharwiseDoubleArrayAhoCorasick`` instead.

88 changes: 48 additions & 40 deletions src/lib.rs
@@ -45,17 +45,20 @@ impl DoubleArrayAhoCorasick {

/// Returns a list of non-overlapping matches in the given haystack.
///
/// According to the ``match_kind`` option you specified in the construction, the behavior is
/// changed for multiple possible matches, as follows.
/// This function searches from the beginning of the input bytes, adding a pattern immediately
/// to the resulting list when a pattern is found. The next search resumes from the end of the
/// previously found pattern. When the end of the input string is reached, it returns the
/// resulting list.
///
/// * If you set ``MATCH_KIND_STANDARD`` (default), the automaton searches from the beginning of
/// the input string, yielding a value immediately when a pattern is found.
/// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, the automaton reports matches corresponding to
/// the longest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, the automaton reports matches corresponding to
/// the pattern earlier registered to the automaton.
/// Depending on the ``match_kind`` option specified during construction, the behavior differs
/// for multiple possible matches, as follows.
///
/// The next search resumes from the end of the previously found pattern.
/// * If you set ``MATCH_KIND_STANDARD`` (default), it reports the match corresponding to the
/// shortest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, it reports the match corresponding to the
/// longest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, it reports matches corresponding to the pattern
/// that was registered earlier in the automaton.
///
/// Example 1: Standard semantics
/// >>> import daachorse
@@ -84,7 +87,7 @@ impl DoubleArrayAhoCorasick {
/// >>> pma.find(b'abcd')
/// [(0, 2, 0)]
///
/// :param haystack: Bytes to search for.
/// :param haystack: Bytes to search in.
/// :type haystack: bytes
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.
@@ -108,12 +111,13 @@ impl DoubleArrayAhoCorasick {

/// Returns a list of overlapping matches in the given haystack.
///
/// The automaton follows the standard behavior of the Aho-Corasick algorithm. It searches from
/// the beginning of the input string, and upon reaching a given position, it yields the
/// patterns ending at that position in descending order of length.
/// This function follows the standard behavior of the Aho-Corasick algorithm. It searches from
/// the beginning of the input bytes, and upon reaching a given position, it adds the patterns
/// ending at that position to the resulting list in descending order of length. When the end of
/// the input bytes is reached, it returns the resulting list.
///
/// If the pattern set contains duplicate patterns, they are yielded in the order they were
/// registered.
/// If the pattern set contains duplicate patterns, they are added to the list in the order they
/// were registered.
///
/// Examples:
/// >>> import daachorse
@@ -122,7 +126,7 @@ impl DoubleArrayAhoCorasick {
/// >>> pma.find_overlapping(b'abcd')
/// [(0, 1, 2), (0, 2, 1), (1, 4, 0)]
///
/// :param haystack: Bytes to search for.
/// :param haystack: Bytes to search in.
/// :type haystack: bytes
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.
@@ -143,9 +147,9 @@ impl DoubleArrayAhoCorasick {

/// Returns a list of overlapping matches without suffixes in the given haystack.
///
/// The behavior of the automaton is similar to ``find_overlapping()``, except that upon
/// reaching a given position, it yields only the single longest pattern ending at that
/// position.
/// The behavior of this function is similar to ``find_overlapping()``, except that upon
/// reaching a given position, it adds only the single longest pattern ending at that position
/// to the resulting list.
///
/// Examples:
/// >>> import daachorse
@@ -154,7 +158,7 @@ impl DoubleArrayAhoCorasick {
/// >>> pma.find_overlapping_no_suffix(b'abcd')
/// [(0, 3, 2), (1, 4, 0)]
///
/// :param haystack: Bytes to search for.
/// :param haystack: Bytes to search in.
/// :type haystack: bytes
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.
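
As a side note on the ``find_overlapping_no_suffix()`` description in the hunks above, the rule "only the single longest pattern ending at each position" can be sketched naively in Python. The name ``find_overlapping_no_suffix_naive`` is hypothetical, and the pattern list is an assumption chosen to reproduce the documented output, since the construction lines are collapsed in this diff.

```python
def find_overlapping_no_suffix_naive(patterns, haystack):
    """Return, for each end position, only the longest pattern ending there."""
    matches = []
    for end in range(1, len(haystack) + 1):
        ending_here = [
            (end - len(p), end, pid)
            for pid, p in enumerate(patterns)
            if p and haystack.endswith(p, 0, end)
        ]
        if ending_here:
            # Smallest start is the longest pattern; shorter ones ending at the
            # same position are suffixes of it and are dropped.
            matches.append(min(ending_here, key=lambda m: (m[0], m[2])))
    return matches


# Assumed pattern set: 0 = b'bcd', 1 = b'cd', 2 = b'abc'.
print(find_overlapping_no_suffix_naive([b'bcd', b'cd', b'abc'], b'abcd'))
# [(0, 3, 2), (1, 4, 0)]
```

With this pattern set, ``cd`` also ends at position 4 but is suppressed because ``bcd`` is longer.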
@@ -275,17 +279,20 @@ impl CharwiseDoubleArrayAhoCorasick {

/// Returns a list of non-overlapping matches in the given haystack.
///
/// According to the ``match_kind`` option you specified in the construction, the behavior is
/// changed for multiple possible matches, as follows.
/// This function searches from the beginning of the input string, adding a pattern immediately
/// to the resulting list when a pattern is found. The next search resumes from the end of the
/// previously found pattern. When the end of the input string is reached, it returns the
/// resulting list.
///
/// * If you set ``MATCH_KIND_STANDARD`` (default), the automaton searches from the beginning of
/// the input string, yielding a value immediately when a pattern is found.
/// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, the automaton reports matches corresponding to
/// the longest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, the automaton reports matches corresponding to
/// the pattern earlier registered to the automaton.
/// Depending on the ``match_kind`` option specified during construction, the behavior differs
/// for multiple possible matches, as follows.
///
/// The next search resumes from the end of the previously found pattern.
/// * If you set ``MATCH_KIND_STANDARD`` (default), it reports the match corresponding to the
/// shortest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, it reports the match corresponding to the
/// longest pattern.
/// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, it reports matches corresponding to the pattern
/// that was registered earlier in the automaton.
///
/// Example 1: Standard semantics
/// >>> import daachorse
@@ -314,7 +321,7 @@ impl CharwiseDoubleArrayAhoCorasick {
/// >>> pma.find('abcd')
/// [(0, 2, 0)]
///
/// :param haystack: String to search for.
/// :param haystack: String to search in.
/// :type haystack: str
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.
@@ -363,12 +370,13 @@ impl CharwiseDoubleArrayAhoCorasick {

/// Returns a list of overlapping matches in the given haystack.
///
/// The automaton follows the standard behavior of the Aho-Corasick algorithm. It searches from
/// the beginning of the input string, and upon reaching a given position, it yields the
/// patterns ending at that position in descending order of length.
/// This function follows the standard behavior of the Aho-Corasick algorithm. It searches from
/// the beginning of the input string, and upon reaching a given position, it adds the patterns
/// ending at that position to the resulting list in descending order of length. When the end of
/// the input string is reached, it returns the resulting list.
///
/// If the pattern set contains duplicate patterns, they are yielded in the order they were
/// registered.
/// If the pattern set contains duplicate patterns, they are added to the list in the order they
/// were registered.
///
/// Examples:
/// >>> import daachorse
@@ -377,7 +385,7 @@ impl CharwiseDoubleArrayAhoCorasick {
/// >>> pma.find_overlapping('abcd')
/// [(0, 1, 2), (0, 2, 1), (1, 4, 0)]
///
/// :param haystack: String to search for.
/// :param haystack: String to search in.
/// :type haystack: str
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.
@@ -414,9 +422,9 @@ impl CharwiseDoubleArrayAhoCorasick {

/// Returns a list of overlapping matches without suffixes in the given haystack.
///
/// The behavior of the automaton is similar to ``find_overlapping()``, except that upon
/// reaching a given position, it yields only the single longest pattern ending at that
/// position.
/// The behavior of this function is similar to ``find_overlapping()``, except that upon
/// reaching a given position, it adds only the single longest pattern ending at that position
/// to the resulting list.
///
/// Examples:
/// >>> import daachorse
@@ -425,7 +433,7 @@ impl CharwiseDoubleArrayAhoCorasick {
/// >>> pma.find_overlapping_no_suffix('abcd')
/// [(0, 3, 2), (1, 4, 0)]
///
/// :param haystack: String to search for.
/// :param haystack: String to search in.
/// :type haystack: str
/// :return: A list of matches. Each match is a tuple consisting of the start position, end
/// position, and pattern ID.