Skip to content

feat: add c language support #167

Open
chinaux wants to merge 6 commits intomainfrom
feat/add_c_api
Open

feat: add c language support #167
chinaux wants to merge 6 commits intomainfrom
feat/add_c_api

Conversation

@chinaux
Copy link
Collaborator

@chinaux chinaux commented Feb 25, 2026

Summary

Add C language bindings.

Changes

src/c_api/c_api.cc

  • Implement C API wrapper in C++ to expose functionality to C

src/include/zvec/c_api.h

  • Add C-compatible header file defining the public C API

examples/c_api/

  • Add example programs demonstrating usage of the C API

Testing

  • Add C API unit tests in tests/c_api/c_api_test.c

@Cuiyus
Copy link
Collaborator

Cuiyus commented Feb 26, 2026

@greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds comprehensive C language bindings to the zvec project, including a C API wrapper (c_api.cc), public C header (c_api.h), examples, and test suite.

Key additions:

  • Complete C API covering collections, documents, schemas, queries, and configuration
  • 6 example programs demonstrating API usage
  • Comprehensive test suite with 2,350+ lines of test code
  • Proper CMake integration with BUILD_EXAMPLES option

Critical issue:

  • Memory management bug throughout c_api.cc: functions allocate memory with new[] but document/use free() for deallocation, causing undefined behavior. This affects copy_string(), zvec_string_create(), zvec_doc_get_pk_copy(), and all error message handling.

The implementation is otherwise well-structured with thread-safe error handling, proper API versioning, and comprehensive functionality coverage.

Confidence Score: 1/5

  • This PR contains critical memory management bugs that will cause undefined behavior and potential crashes in production
  • Score reflects the severity of memory allocation/deallocation mismatches (mixing new[]/delete[] with malloc/free) that occur in multiple core functions throughout the C API implementation. These bugs will cause undefined behavior on most platforms and must be fixed before merging
  • src/c_api/c_api.cc requires immediate attention to fix memory management issues. All functions that allocate strings need to be updated to use consistent allocation/deallocation methods

Important Files Changed

Filename Overview
src/c_api/c_api.cc Implements C API wrapper with critical memory management bugs - mixing new[]/delete[] with malloc/free causes undefined behavior in string allocation functions
src/include/zvec/c_api.h Comprehensive C API header with proper structure, though documentation inconsistencies exist regarding memory deallocation methods
CMakeLists.txt Added BUILD_EXAMPLES option and examples directory - straightforward and safe changes
examples/c_api/basic_example.c Demonstrates C API usage but contains memory deallocation bug at line 227 (uses free() on memory allocated with new[])
tests/c_api/c_api_test.c Comprehensive test suite covering major C API functionality

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class ZVec_C_API {
        +zvec_initialize()
        +zvec_shutdown()
        +zvec_get_version()
    }
    
    class Collection_API {
        +zvec_collection_create()
        +zvec_collection_open()
        +zvec_collection_insert()
        +zvec_collection_query()
        +zvec_collection_flush()
    }
    
    class Schema_API {
        +zvec_collection_schema_create()
        +zvec_field_schema_create()
        +zvec_field_schema_set_hnsw_index()
        +zvec_field_schema_set_invert_index()
    }
    
    class Document_API {
        +zvec_doc_create()
        +zvec_doc_set_pk()
        +zvec_doc_add_field()
        +zvec_doc_get_pk_copy()
    }
    
    class Index_API {
        +zvec_index_params_hnsw_create()
        +zvec_index_params_invert_create()
    }
    
    class Config_API {
        +zvec_config_data_create()
        +zvec_config_log_create()
    }
    
    class Memory_Management {
        +zvec_free()
        +zvec_free_string()
        +zvec_free_str()
        +zvec_docs_free()
    }
    
    ZVec_C_API --> Collection_API
    ZVec_C_API --> Config_API
    Collection_API --> Schema_API
    Collection_API --> Document_API
    Schema_API --> Index_API
    Document_API --> Memory_Management
    Collection_API --> Memory_Management
Loading

Last reviewed commit: 67ff247

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

19 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@feihongxu0824 feihongxu0824 requested a review from egolearner March 4, 2026 02:21
@feihongxu0824 feihongxu0824 self-requested a review March 10, 2026 08:19
This was referenced Mar 16, 2026
@kdy1 kdy1 mentioned this pull request Mar 17, 2026
chinaux and others added 4 commits March 19, 2026 10:12
format c_api.h

fix some code

format c_api code

build rpm

fix some code

fix some code

remove rpm build

Add GitHub Actions release workflow for C API

add release linux-arm64

remove c api version

fix some code

fix some code

fix some code

refact some code

format some code

format some code

format some code

add api reference

// RAII guard for array allocated with new[]
template <typename T>
struct DeleteArrayGuard {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those guards are not used? Can they be removed?
BTW, you can used unique_ptr instead of Delete(Array)Guard.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌, removed

/**
* @brief Mutable string structure (owns memory)
*/
typedef struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use c99 flexible array to reduce malloc count(2->1)

struct A {
    int a;
    char data[];   // flexible array member
};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach restricts usage to the end of a struct only, making it inconvenient to use within other structs.

set_last_error(status.message());
return ZVEC_ERROR_INTERNAL_ERROR;
}
} g_initialized.store(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing new line

cpp_config.log_config = log_config;
}
// Initialize global configuration
auto status = zvec::GlobalConfig::Instance().Initialize(cpp_config);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize and error check common logic could be extracted out of if/else.

return ZVEC_OK;)
}

ZVecErrorCode zvec_is_initialized(bool *initialized) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to bool zvec_is_initialized() to simplify usage?


void zvec_config_file_log_destroy(ZVecFileLogConfig *config) {
if (config) {
if (config->dir.data) free((void *)config->dir.data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast is unnecessary, void * can be implicitly casted.

static char *copy_string(const std::string &str) {
if (str.empty()) return nullptr;
char *copy = static_cast<char *>(malloc(str.length() + 1));
strcpy(copy, str.c_str());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use strncpy to support binary str input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌,I've fixed.

for (size_t i = 0; i < *result_count; ++i) {
const std::string pk = i < pks.size() ? pks[i] : std::string();
const std::string message = statuses[i].message();
(*results)[i].pk = copy_string(pk);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ordered style so that pk does not need to be stored in result?

ZVecDoc *zvec_doc_create(void) {
ZVEC_TRY_RETURN_NULL("Failed to create document", {
auto doc_ptr =
new std::shared_ptr<zvec::Doc>(std::make_shared<zvec::Doc>());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just use plain Doc?

auto* doc = new Doc();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to raw pointers would make interoperability with C++ code more difficult: it would require frequent conversions between shared_ptr and raw pointers, and it would be prone to memory leaks. It's recommended to keep using shared_ptr.

set_last_error("Invalid value size for vector_fp64 type");
return error_code;
}
(*doc_ptr)->set(name, vec);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move(vec) for better performance

reinterpret_cast<const int8_t *>(field->value.vector_value.data),
reinterpret_cast<const int8_t *>(field->value.vector_value.data) +
field->value.vector_value.length);
(*doc_ptr)->set(name, vec);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move(vec) for better performance

@feihongxu0824 feihongxu0824 self-assigned this Mar 20, 2026
include_directories(${PROJECT_ROOT_DIR}/src/include)
include_directories(${PROJECT_ROOT_DIR}/src)
# Add generated headers to global include path
include_directories(${PROJECT_BINARY_DIR}/src/generated)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以不用设置为全局的include路径吧?

@@ -0,0 +1,193 @@
name: Release
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在macos/linux的ci里面也加上c_api的执行逻辑吧,可以参考c++ example的写法

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants