Skip to content

C++ data structures update#47

Merged
DavidLapous merged 90 commits intoDavidLapous:mainfrom
hschreiber:update_from_gudhi
Dec 16, 2025
Merged

C++ data structures update#47
DavidLapous merged 90 commits intoDavidLapous:mainfrom
hschreiber:update_from_gudhi

Conversation

@hschreiber
Copy link
Copy Markdown
Contributor

In this PR

  • Update of Gudhi classes (matrix, simplex tree etc.)
  • 1-critical and k-critical filtration values were merged together in a single class with therefore same interface for both uses. A template argument of the class can be set to true to ensure that the value remains 1-critical (to optimize some loops and throw if the user tries to add a generator which breaks this property).
  • There are now three different multi-parameter filtration value classes: Dynamic_multi_parameter_filtration (corresponding to the two old ones merged), Multi_parameter_filtration (same than the other, but the generators are aligned in a single vector instead of a vector of generators) and Degree_rips_filtration (specialized class for Rips bi-filtration of the form $([0,*], [1,*], ..., [k,*])$)
  • Only Dynamic_multi_parameter_filtration is used for now, as the goal of the PR was to integrate the new interface before anything else
  • Re-organization of the following files in the multipers/gudhi folder:
    • gudhi/Simplex_tree_multi.h is now gudhi/Multi_filtration/multi_filtration_utils.h, except for the method linear_projection as I was told to not integrate it into Gudhi yet. So it was moved temporarily into Simplex_tree_multi_interface.h,
    • cubical_to_boundary.h is now build_[complex/slicer]_from_bitmap in gudhi/slicer_helpers.h,
    • mma_interface_coh.h is now gudhi/Multi_persistence/Persistence_interface_cohomology.h and gudhi/Multi_persistence/Multi_parameter_filtered_complex_pcoh_interface.h,
    • mma_interface_h0.h and naive_merge_tree.h are now in the subfolder tmp_h0_pers as they are useless in the current state,
    • mma_interface_matrix.h is now gudhi/Multi_persistence/Persistence_interface_matrix.h,
    • scc_io.h is transferred into gudhi/slicer_helpers.h,
    • truc.h is now decomposed into gudhi/Slicer.h, gudhi/Thread_safe_slicer.h, gudhi/Multi_parameter_filtered_complex.h and gudhi/Projective_cover_kernel.h (and the Truc constructor from a simplex tree is now externalize in gudhi/slicer_helpers.h).
  • Box and Line use a proper Point class now instead of filtration values, which is much cleaner. The new point class has the same properties as a vector, but has additionally all arithmetic properties of the filtration values
  • The simplex tree does not have a is_multi_parameter parameter in its options anymore. We agreed with Vincent and Marc that it was fine to have int number_of_parameters_ as a permanent attribute (i.e. not optional) with default value at 1
  • I had to disable the (de)serialization of the number_of_parameters_ attribute in this PR, as it is used to convert gudhi.SimplexTree to multipers.SimplexTree and therefore they methods have to agree. But once the version with number_of_parameters_ is officially released in Gudhi, the attribute will be properly taken into account
  • All changes made anywhere else were only made to make the new interfaces compile and work. There were no thoughts of optimization or clean up or whatever.

To do before merging (for @DavidLapous)

  • As the last point above says, I just changed the rest such that it works. You should go over it to be sure that I didn't made anything worse at least
  • Go through all class and method names. I had to improvise some, so I am sure that they are not all satisfying...
  • Complete the TODO:s in the doc. There are very few of them. There are also the introduction pages for Gudhi (so not existing in this repo, but here)
  • Look at least quickly at the documentation done, to see if nothing is wrong
  • Test the branch with whatever other tests you have. I just made sure that all unitary tests passes and the notebooks included in this repo
  • This point is more for me: some changes in the simplex tree are still ongoing PR's in the Gudhi repertory, so they will with have to be updated again with high probability

To do for the next PR(s)

  • Decide how and where to use each new filtration value class and where we want to allow all of them (you should also think on how you want to present all this in Python)
  • Decide which of the C++ algorithms should be prepared for Gudhi next (and give me an idea of how they will be used, to see if I should re-think the interfaces, interactions etc.)
  • The new Multi_parameter_filtered_complex class needs some changes. For now, it is very basic, as I wanted just to separate it from the Slicer class, but I would like to experiment with it in the future. Having for example an alternative where all containers are continuous vectors (and not vectors of vectors more or less), which would not be usable with its current interface. But as the class is internal, any changes to it will not have any impact on the interface of Slicer or anything in Python. I also do not think it is priority, the point above is more important.

Copy link
Copy Markdown
Owner

@DavidLapous DavidLapous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Hannah for this PR, I did a first quick pass over the code today.

Comment thread multipers/gudhi/gudhi/Multi_persistence/Box.h Outdated
* @ingroup multi_filtration
*
* @brief Class encoding the different generators, i.e., apparition times, of a \f$ k \f$-critical
* \f$\mathbb R^2\f$-filtration value in a degree-Rips filtration. That is, a \f$ k \f$-critical filtration value is
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a common name yet. Maybe Degree_geometric_bifiltration or Multi_geometric_bifiltration or ...

Suggested change
* \f$\mathbb R^2\f$-filtration value in a degree-Rips filtration. That is, a \f$ k \f$-critical filtration value is
* \f$\mathbb R^2\f$-filtration value in a degree-Rips-like filtration. That is, a \f$ k \f$-critical filtration value is

@odinhg Do you have an idea of a name that encompasses all such filtrations, i.e., multicover, degree-rips, core, etc that can be encoded this way ?

Also, as python can efficiently reparametrize filtrations, this can (without any change in the python interface), any bifiltration could be encoded this way when the cpp type parameter is an integer.
-- > We can also use a generic name like flat_multicritical_bifiltration

I'll update the doc when we settle on something

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think flat_multicritical_bifiltration is fine and generic enough to capture potential new bifiltration on the same form. The ones mentioned are all parameter-free density-based bifiltrations, but naming it by these properties might end up being too specific, and it doesn't really convey anything about the data structure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not a bit too general or does the "flat" convey enough the constrains on the first parameter values? It does not sound very convenient to represent any random bi-filtration such way, only those whose first parameter is somehow parametrizable such that they can be mapped to the sequence of natural numbers.

Copy link
Copy Markdown
Owner

@DavidLapous DavidLapous Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point is the following : for these kinds of $k$-critical filtrations, almost every simplex is (almost) $k$-critical, so you don't lose space storing them that way.
So, as we delegate the reparametrization to python, this structure can efficiently encode any dense (in terms of $k$) $k$-critical filtrations, i.e., filtrations such that simplices are almost all $k$-critical.

So would something like flat_dense_multicritical_bifiltration be ok ? I'm not sure about the dense (nor the flat), as it's not very explicit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think you can omit the multicritical in the name ($k$ could be $1$ and the theory still works even if not used in practice). Flat_dense_bifiltration? Or just Flat_bifiltration in the end? At least the "flat" makes it clear that is does not encode any bifiltration, even if the meaning is not very clear.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well if you want to encode a 1-critical generator at [x,k], you'll need room for [x,1]=inf, [x,2]=inf, ... [x,k-1]=inf, before. So it's really not well suited for 1-critical filtrations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not well-suited indeed, but not wrong. I don't think we should charge the class name too much (as it is an extensive use class, its name can pop up a lot in a single code if no alias was used), if it is not necessary.

Comment on lines +194 to +200
Degree_rips_bifiltration(Underlying_container &&generators, [[maybe_unused]] int number_of_parameters)
: generators_(std::move(generators))
{
if constexpr (Ensure1Criticality) {
if (generators_.size() > 1) throw std::logic_error("Multiparameter filtration value is not 1-critical.");
}
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Degree_rips_bifiltration(Underlying_container &&generators, [[maybe_unused]] int number_of_parameters)
: generators_(std::move(generators))
{
if constexpr (Ensure1Criticality) {
if (generators_.size() > 1) throw std::logic_error("Multiparameter filtration value is not 1-critical.");
}
}
Degree_rips_bifiltration(Underlying_container &&generators, [[maybe_unused]] int number_of_parameters = 2)
: generators_(std::move(generators))
{
if (number_of_parameters != 2){
throw std::invalid_argument("This filtration can only encode. Got "+ std::to_string(number_of_parameters) + ".");
}
if constexpr (Ensure1Criticality) {
if (generators_.size() > 1) throw std::logic_error("Multiparameter filtration value is not 1-critical.");
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you want to consider the use of any other parameter as wrong, not just ignoring it? Should we not this for all the constructors then?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (only for this class)

static Degree_rips_bifiltration inf(int number_of_parameters = 2)
{
if constexpr (Co) {
throw std::logic_error("No biggest value possible for Co-filtrations yet.");
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put -inf by default, unless this cases an issue

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inf(p) returns the biggest possible value for Co == false, but the smallest possible value for Co == true (behavior we can change if you want) and vice versa for minus_inf(p). And representing the smallest possible negative corner with the underlying format and finitely many generators is not possible...

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I got that. Morally, if you use co (which I think should be called op I don;t know where the co comes from, maybe from co-relation) you're reversing the poset, i.e. $a\le_{co} b \Leftrightarrow b \le a$.
So, apart from int filtration values which would be shifted by one, I dont understand where is the issue.

Anyway, we're not using this yet, so we can deal with that later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly because we reverse the poset, $inf() \leq a$ for all $a$ when Co == true. You cannot achive that with $(0,inf)$ ($(0,-inf)$ would be the biggest), as $(0,inf), (1,inf)$ is smaller and $(0,inf), (1,inf), (2,inf)$ is even smaller etc.

auto d = std::distance(
values.begin(),
std::lower_bound(
values.begin(), values.end(), static_cast<typename OneDimArray::value_type>(generators_[g])));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A later todo:
This projects it to the right (which is sound), but due to float imprecisions I think I now prefer to fix this, with something like

auto temp = std::distance(grid[i].begin(), std::lower_bound(grid[i].begin(), grid[i].end(), pt[i]));
if (std::abs(grid[i][temp] - pt[i]) < std::abs(grid[i][temp - 1] - pt[i])) {
out[i] = temp;
} else {
out[i] = temp - 1;
}

maybe as a template ?

/**
* @brief Builds a new complex by reordering the cells in the given complex with the given permutation map.
*/
friend Multi_parameter_filtered_complex build_permuted_complex(const Multi_parameter_filtered_complex& complex,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
friend Multi_parameter_filtered_complex build_permuted_complex(const Multi_parameter_filtered_complex& complex,
friend Multi_parameter_filtered_complex build_permuted_filtered_complex(const Multi_parameter_filtered_complex& complex,

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the input/output type already explicit on this, so I don't feel the need to have a long method name. But as you prefer.

Comment on lines +719 to +720
// TODO: This doesn't allow for negative dimensions
// Hannah: not sure what this comment means ?
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, negative things could be useful, but not yet. So let's ignore this ftm

Comment on lines +176 to +177
if len(default_values) > num_parameters:
del default_values[num_parameters:]
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks cursed haha. I'll see what I can do later for this, in particular, multipers.array_api should be able to take care of this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it works :D
But, yes, don't hesitate to make a cleaner version of this. I was surprised how weirdly complicated a resize is in python.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess your issue is that if you do a

default_values = default_values[num_parameters:]

then default_values becomes a local variable and doesn't affect the original one.
There are two options :

  • use the nonlocal keyword, i.e.,
nonlocal default_values
default_values=default_values[num_parameters:]
  • or cheat a little bit with something like:
default_values[:]=default_values[num_parameters:]

In any case, I'll have to put that in the multipers.array_api interface to recover the autodiff in default_values if it exists, so I'll fix that later.

Copy link
Copy Markdown
Contributor Author

@hschreiber hschreiber Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But at that line I want to delete default_values[num_parameters:] (i.e., prune all values at indices higher than num_parameters in default_values). default_values=default_values[num_parameters:] is doing the opposite. Or did you meant default_values=default_values[:num_parameters-1]?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I meant the other.

Comment on lines +31 to +33
# TODO: Is there no directer way to convert a T[:] into a vector[T]?
# A memcpy would be much quicker than a python/cython for loop...
# With a continuous memory use, we could also pass the pointers as iterators if we have access to it?
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the compiler is smart enough. As you have the nogil flag, this converts to pure cpp code. IIRC, as f is a vector, f[i] converts to *(f.data + i) and the same for filtration[i] with cython (with the same pointer type). I guess the compiler can then guess the memcpy, non ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in the produced cpp file, there is also a for loop. I was more thinking of a bit by bit copy. At least, the loop is a C++ one, not a python one and the vector size is allocated from the beginning. So, it should not make much of a difference. It is just that we are calling this method a lot.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there will be a loop in the cpp file, but this is an "easy optimization" for the compiler, so I hope that with high prob the generated assembly will have a memcpy.

Comment thread multipers/slicer.pyx.tp Outdated
@DavidLapous DavidLapous requested a review from Copilot September 9, 2025 14:54

This comment was marked as off-topic.

Copy link
Copy Markdown
Owner

@DavidLapous DavidLapous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally finished (in particular degree-rips and stuff) but I don't want to manage multiple branches or PRs over this PR so LGTM.

@DavidLapous DavidLapous merged commit 6fe7b26 into DavidLapous:main Dec 16, 2025
5 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants