diff --git a/docs/source/examples.rst b/docs/source/examples.rst index 99b374c..7b38536 100644 --- a/docs/source/examples.rst +++ b/docs/source/examples.rst @@ -1,12 +1,12 @@ Example usage ============= -Daachorse contains some search options, ranging from basic matching with the Aho-Corasick algorithm -to trickier matching. All of them will run very fast based on the double-array data structure and -can be easily plugged into your application as shown below. +Daachorse contains some search options, ranging from standard matching with the Aho-Corasick +algorithm to more advanced matching. All of them run efficiently, powered by the double-array data +structure, and can be easily plugged into your application, as shown below. -Finding overlapped occurrences ------------------------------- +Finding overlapping occurrences +------------------------------- To search for all occurrences of registered patterns that allow for positional overlap in the input text, use ``find_overlapping()``. When you instantiate a new automaton, unique identifiers are @@ -21,11 +21,11 @@ occurrence and its identifier. >>> pma.find_overlapping(b'abcd') [(0, 1, 2), (0, 2, 1), (1, 4, 0)] -Finding non-overlapped occurrences with standard matching ---------------------------------------------------------- +Finding non-overlapping occurrences with standard matching +---------------------------------------------------------- -If you do not want to allow positional overlap, use ``find()`` instead. It performs the search on -the Aho-Corasick automaton and reports patterns first found in each iteration. +To disallow positional overlap, use ``find()`` instead. It performs the search on the Aho-Corasick +automaton and reports the first matching pattern found at each search position. .. code-block:: python @@ -35,11 +35,11 @@ the Aho-Corasick automaton and reports patterns first found in each iteration. 
>>> pma.find(b'abcd') [(0, 1, 2), (1, 4, 0)] -Finding non-overlapped occurrences with longest matching --------------------------------------------------------- +Finding non-overlapping occurrences with longest matching +--------------------------------------------------------- -If you want to search for the longest pattern without positional overlap in each iteration, use -``MATCH_KIND_LEFTMOST_LONGEST`` in the construction. +To search for the longest pattern without positional overlap in each iteration, use +``MATCH_KIND_LEFTMOST_LONGEST`` during construction. .. code-block:: python @@ -49,14 +49,14 @@ If you want to search for the longest pattern without positional overlap in each >>> pma.find(b'abcd') [(0, 4, 2)] -Finding non-overlapped occurrences with leftmost-first matching ---------------------------------------------------------------- +Finding non-overlapping occurrences with leftmost-first matching +---------------------------------------------------------------- -If you want to find the the earliest registered pattern among ones starting from the search -position, use ``MATCH_KIND_LEFTMOST_FIRST``. +To search for the earliest registered pattern among those starting from the search position, +use ``MATCH_KIND_LEFTMOST_FIRST``. -This is so-called *the leftmost first match*, a bit tricky search option. For example, in the -following code, ab is reported because it is the earliest registered one. +This is the so-called *leftmost-first match*, a tricky search option. For example, +in the following code, ``ab`` is reported because it is the earliest registered one. .. code-block:: python @@ -66,8 +66,8 @@ following code, ab is reported because it is the earliest registered one. >>> pma.find(b'abcd') [(0, 2, 0)] -Find patterns on a string -------------------------- +Finding patterns on a string +---------------------------- To build an automaton for strings, use ``CharwiseDoubleArrayAhoCorasick`` instead. 
diff --git a/src/lib.rs b/src/lib.rs index 81fd786..781cb61 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -45,17 +45,20 @@ impl DoubleArrayAhoCorasick { /// Returns a list of non-overlapping matches in the given haystack. /// - /// According to the ``match_kind`` option you specified in the construction, the behavior is - /// changed for multiple possible matches, as follows. + /// This function searches from the beginning of the input bytes, adding a pattern immediately + /// to the resulting list when a pattern is found. The next search resumes from the end of the + /// previously found pattern. When the end of the input bytes is reached, it returns the + /// resulting list. /// - /// * If you set ``MATCH_KIND_STANDARD`` (default), the automaton searches from the beginning of - /// the input string, yielding a value immediately when a pattern is found. - /// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, the automaton reports matches corresponding to - /// the longest pattern. - /// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, the automaton reports matches corresponding to - /// the pattern earlier registered to the automaton. + /// Depending on the ``match_kind`` option specified during construction, the behavior differs + /// for multiple possible matches, as follows. /// - /// The next search resumes from the end of the previously found pattern. + /// * If you set ``MATCH_KIND_STANDARD`` (default), it reports the match corresponding to the + /// shortest pattern. + /// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, it reports the match corresponding to the + /// longest pattern. + /// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, it reports matches corresponding to the pattern + /// that was registered earlier in the automaton. /// /// Example 1: Standard semantics /// >>> import daachorse @@ -84,7 +87,7 @@ impl DoubleArrayAhoCorasick { /// >>> pma.find(b'abcd') /// [(0, 2, 0)] /// - /// :param haystack: Bytes to search for. 
+ /// :param haystack: Bytes to search in. /// :type haystack: bytes /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID. @@ -108,12 +111,13 @@ impl DoubleArrayAhoCorasick { /// Returns a list of overlapping matches in the given haystack. /// - /// The automaton follows the standard behavior of the Aho-Corasick algorithm. It searches from - /// the beginning of the input string, and upon reaching a given position, it yields the - /// patterns ending at that position in descending order of length. + /// This function follows the standard behavior of the Aho-Corasick algorithm. It searches from + /// the beginning of the input bytes, and upon reaching a given position, it adds the patterns + /// ending at that position to the resulting list in descending order of length. When the end of + /// the input bytes is reached, it returns the resulting list. /// - /// If the pattern set contains duplicate patterns, they are yielded in the order they were - /// registered. + /// If the pattern set contains duplicate patterns, they are added to the list in the order they + /// were registered. /// /// Examples: /// >>> import daachorse @@ -122,7 +126,7 @@ impl DoubleArrayAhoCorasick { /// >>> pma.find_overlapping(b'abcd') /// [(0, 1, 2), (0, 2, 1), (1, 4, 0)] /// - /// :param haystack: Bytes to search for. + /// :param haystack: Bytes to search in. /// :type haystack: bytes /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID. @@ -143,9 +147,9 @@ impl DoubleArrayAhoCorasick { /// Returns a list of overlapping matches without suffixes in the given haystack. /// - /// The behavior of the automaton is similar to ``find_overlapping()``, except that upon - /// reaching a given position, it yields only the single longest pattern ending at that - /// position. 
+ /// The behavior of this function is similar to ``find_overlapping()``, except that upon + /// reaching a given position, it adds only the single longest pattern ending at that position + /// to the resulting list. /// /// Examples: /// >>> import daachorse @@ -154,7 +158,7 @@ impl DoubleArrayAhoCorasick { /// >>> pma.find_overlapping_no_suffix(b'abcd') /// [(0, 3, 2), (1, 4, 0)] /// - /// :param haystack: Bytes to search for. + /// :param haystack: Bytes to search in. /// :type haystack: bytes /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID. @@ -275,17 +279,20 @@ impl CharwiseDoubleArrayAhoCorasick { /// Returns a list of non-overlapping matches in the given haystack. /// - /// According to the ``match_kind`` option you specified in the construction, the behavior is - /// changed for multiple possible matches, as follows. + /// This function searches from the beginning of the input string, adding a pattern immediately + /// to the resulting list when a pattern is found. The next search resumes from the end of the + /// previously found pattern. When the end of the input string is reached, it returns the + /// resulting list. /// - /// * If you set ``MATCH_KIND_STANDARD`` (default), the automaton searches from the beginning of - /// the input string, yielding a value immediately when a pattern is found. - /// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, the automaton reports matches corresponding to - /// the longest pattern. - /// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, the automaton reports matches corresponding to - /// the pattern earlier registered to the automaton. + /// Depending on the ``match_kind`` option specified during construction, the behavior differs + /// for multiple possible matches, as follows. /// - /// The next search resumes from the end of the previously found pattern. 
+ /// * If you set ``MATCH_KIND_STANDARD`` (default), it reports the match corresponding to the + /// shortest pattern. + /// * If you set ``MATCH_KIND_LEFTMOST_LONGEST``, it reports the match corresponding to the + /// longest pattern. + /// * If you set ``MATCH_KIND_LEFTMOST_FIRST``, it reports matches corresponding to the pattern + /// that was registered earlier in the automaton. /// /// Example 1: Standard semantics /// >>> import daachorse @@ -314,7 +321,7 @@ impl CharwiseDoubleArrayAhoCorasick { /// >>> pma.find('abcd') /// [(0, 2, 0)] /// - /// :param haystack: String to search for. + /// :param haystack: String to search in. /// :type haystack: str /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID. @@ -363,12 +370,13 @@ impl CharwiseDoubleArrayAhoCorasick { /// Returns a list of overlapping matches in the given haystack. /// - /// The automaton follows the standard behavior of the Aho-Corasick algorithm. It searches from - /// the beginning of the input string, and upon reaching a given position, it yields the - /// patterns ending at that position in descending order of length. + /// This function follows the standard behavior of the Aho-Corasick algorithm. It searches from + /// the beginning of the input string, and upon reaching a given position, it adds the patterns + /// ending at that position to the resulting list in descending order of length. When the end of + /// the input string is reached, it returns the resulting list. /// - /// If the pattern set contains duplicate patterns, they are yielded in the order they were - /// registered. + /// If the pattern set contains duplicate patterns, they are added to the list in the order they + /// were registered. /// /// Examples: /// >>> import daachorse @@ -377,7 +385,7 @@ impl CharwiseDoubleArrayAhoCorasick { /// >>> pma.find_overlapping('abcd') /// [(0, 1, 2), (0, 2, 1), (1, 4, 0)] /// - /// :param haystack: String to search for. 
+ /// :param haystack: String to search in. /// :type haystack: str /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID. @@ -414,9 +422,9 @@ impl CharwiseDoubleArrayAhoCorasick { /// Returns a list of overlapping matches without suffixes in the given haystack. /// - /// The behavior of the automaton is similar to ``find_overlapping()``, except that upon - /// reaching a given position, it yields only the single longest pattern ending at that - /// position. + /// The behavior of this function is similar to ``find_overlapping()``, except that upon + /// reaching a given position, it adds only the single longest pattern ending at that position + /// to the resulting list. /// /// Examples: /// >>> import daachorse @@ -425,7 +433,7 @@ impl CharwiseDoubleArrayAhoCorasick { /// >>> pma.find_overlapping_no_suffix('abcd') /// [(0, 3, 2), (1, 4, 0)] /// - /// :param haystack: String to search for. + /// :param haystack: String to search in. /// :type haystack: str /// :return: A list of matches. Each match is a tuple consisting of the start position, end /// position, and pattern ID.
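
Reviewer note: the semantics described in the revised docstrings can be cross-checked with a brute-force reference. The sketch below is pure Python and does not use daachorse or a double-array automaton. Every function name, the `STANDARD`/`LEFTMOST_LONGEST`/`LEFTMOST_FIRST` constants, and the pattern lists in the usage note are assumptions made for illustration (the pattern lists are not visible in this diff; they were chosen so the results agree with the outputs shown in the doctests). The tie-break used for standard matching (earliest end position, then longest pattern) is likewise an assumption.

```python
# Brute-force reference for the match semantics described in the docstrings.
# NOT daachorse's implementation; all names here are hypothetical.

STANDARD, LEFTMOST_LONGEST, LEFTMOST_FIRST = range(3)


def all_matches(patterns, haystack):
    """Return every (start, end, pattern_id) occurrence, ordered by pattern ID."""
    found = []
    for pid, pat in enumerate(patterns):
        start = haystack.find(pat)
        while start != -1:
            found.append((start, start + len(pat), pid))
            start = haystack.find(pat, start + 1)
    return found


def find_overlapping(patterns, haystack):
    """Group by end position; longer patterns first; duplicates in ID order."""
    return sorted(all_matches(patterns, haystack), key=lambda m: (m[1], m[0], m[2]))


def find_overlapping_no_suffix(patterns, haystack):
    """Keep only the single longest pattern ending at each position."""
    result = []
    for match in find_overlapping(patterns, haystack):
        if not result or result[-1][1] != match[1]:
            result.append(match)
    return result


# How each match kind ranks the candidates that start at or after the cursor.
_RANK = {
    STANDARD: lambda m: (m[1], m[0]),           # earliest end (assumed tie-break: longest)
    LEFTMOST_LONGEST: lambda m: (m[0], -m[1]),  # leftmost start, then longest
    LEFTMOST_FIRST: lambda m: (m[0], m[2]),     # leftmost start, then earliest registered
}


def find(patterns, haystack, kind=STANDARD):
    """Non-overlapping search: each search resumes at the end of the previous match."""
    matches = all_matches(patterns, haystack)
    result, pos = [], 0
    while True:
        candidates = [m for m in matches if m[0] >= pos]
        if not candidates:
            return result
        best = min(candidates, key=_RANK[kind])
        result.append(best)
        pos = best[1]
```

With the assumed pattern list `['bcd', 'ab', 'a']`, `find_overlapping(..., 'abcd')` gives `[(0, 1, 2), (0, 2, 1), (1, 4, 0)]` and `find(..., 'abcd')` gives `[(0, 1, 2), (1, 4, 0)]`. With `['ab', 'a', 'abcd']`, the leftmost-longest and leftmost-first kinds give `[(0, 4, 2)]` and `[(0, 2, 0)]` respectively. The same functions accept `bytes` haystacks when the patterns are `bytes`.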