1 change: 1 addition & 0 deletions .github/workflows/regression.yml
@@ -19,6 +19,7 @@ jobs:
['id']
['Path']
['ToolVersion']
['ASTVersion']
['Modules']['a.b/c']['Dependencies']['a.b/c']
['Modules']['a.b/c/cmdx']['Dependencies']['a.b/c/cmdx']
steps:
51 changes: 49 additions & 2 deletions docs/uniast-en.md
@@ -1,4 +1,4 @@
# Universal Abstract-Syntax-Tree Specification (v0.1.3)
# Universal Abstract-Syntax-Tree Specification (v0.2.0)

Universal Abstract-Syntax-Tree is an LLM-friendly, language-agnostic code context data structure established by ABCoder. It represents a unified abstract syntax tree of a repository's code, collecting definitions of language entities (functions, types, constants/variables) and their interdependencies for subsequent AI understanding and coding-workflow development.

@@ -370,6 +370,23 @@ Function type AST Node entity, corresponding to [NodeType] as FUNC, including fu

- Vars: Global variables referenced within the current function, including variables and constants

- Extra: Additional information for storing language-specific details or extra metadata


  - AnonymousFunctions: Anonymous functions defined within the function; each element is the FileLine of the corresponding anonymous function


    - File: Name of the file containing the anonymous function


    - Line: **Line number of the starting position in the file (starting from 1)**


    - StartOffset: **Byte offset of the start of the code, relative to the beginning of the file**


    - EndOffset: **Byte offset of the end of the code, relative to the beginning of the file**
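
The FileLine fields above can be illustrated with Go's standard `go/ast` and `go/token` packages. This is only a sketch: the `FileLine` struct here is a local stand-in with the four documented fields, `anonymousFunctions` and `sampleAnon` are hypothetical names, and it assumes EndOffset is exclusive (the `go/token` convention), which the specification does not state.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FileLine is a hypothetical stand-in for the record described in the spec.
type FileLine struct {
	File        string
	Line        int // 1-based line of the starting position
	StartOffset int // byte offset of the first byte of the literal
	EndOffset   int // byte offset just past the literal (assumed exclusive)
}

// anonymousFunctions returns one FileLine per function literal found in src.
func anonymousFunctions(filename, src string) ([]FileLine, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var out []FileLine
	ast.Inspect(f, func(n ast.Node) bool {
		if lit, ok := n.(*ast.FuncLit); ok {
			start := fset.Position(lit.Pos())
			end := fset.Position(lit.End())
			out = append(out, FileLine{
				File:        start.Filename,
				Line:        start.Line,
				StartOffset: start.Offset,
				EndOffset:   end.Offset,
			})
		}
		return true // keep descending so nested literals are found too
	})
	return out, nil
}

// sampleAnon contains a single anonymous function on line 4.
const sampleAnon = "package p\n\nfunc run() {\n\tgo func() {}()\n}\n"

func main() {
	fls, err := anonymousFunctions("sample.go", sampleAnon)
	if err != nil {
		panic(err)
	}
	for _, fl := range fls {
		// Slicing the source by the two offsets yields the literal's text.
		fmt.Printf("%+v %q\n", fl, sampleAnon[fl.StartOffset:fl.EndOffset])
	}
}
```

Because both offsets are byte offsets from the beginning of the file, a consumer can recover the anonymous function's source text by slicing the file content, without re-parsing it.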


###### Dependency

@@ -384,7 +401,10 @@ Represents a dependency relationship, containing the dependent node Id, dependen
"File": "manager.go",
"Line": 140,
"StartOffset": 3547,
"EndOffset": 3564
"EndOffset": 3564,
"Extra": {
"FunctionIsCall": true
}
}
```

@@ -409,6 +429,12 @@ Represents a dependency relationship, containing the dependent node Id, dependen
- EndOffset: Offset, relative to the code file, of the end of the token at the dependency point (not of the dependent node)


- Extra: Additional information for storing language-specific details or extra metadata


  - FunctionIsCall: When the Dependency refers to a function, indicates whether the function is actually called at this point rather than merely referenced
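
The difference between calling a function and merely referencing it can be sketched with the standard `go/ast` package. The example below is an illustration under assumptions, not the ABCoder implementation: `use`, `classify` and `sampleCalls` are hypothetical names. It first records the position of every callee identifier, then reports for each use of a name whether it sits at one of those positions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// use is a hypothetical record: one occurrence of the name inside a function body.
type use struct {
	Line   int
	IsCall bool // true when the identifier is the callee of a call expression
}

// classify lists every use of name inside function bodies of src, in source order.
func classify(src, name string) ([]use, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	var uses []use
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Body == nil {
			continue
		}
		// Pass 1: remember the position of each callee identifier.
		called := map[token.Pos]bool{}
		ast.Inspect(fd.Body, func(n ast.Node) bool {
			if call, ok := n.(*ast.CallExpr); ok {
				switch fun := call.Fun.(type) {
				case *ast.Ident: // plain call: f()
					called[fun.Pos()] = true
				case *ast.SelectorExpr: // qualified or method call: x.f()
					called[fun.Sel.Pos()] = true
				}
			}
			return true
		})
		// Pass 2: classify each occurrence of the name.
		ast.Inspect(fd.Body, func(n ast.Node) bool {
			if id, ok := n.(*ast.Ident); ok && id.Name == name {
				uses = append(uses, use{Line: fset.Position(id.Pos()).Line, IsCall: called[id.Pos()]})
			}
			return true
		})
	}
	return uses, nil
}

// sampleCalls calls handler on line 6 and only references it on line 7.
const sampleCalls = "package p\n\nfunc handler() {}\n\nfunc run() {\n\thandler()\n\tcb := handler\n\t_ = cb\n}\n"

func main() {
	uses, err := classify(sampleCalls, "handler")
	if err != nil {
		panic(err)
	}
	for _, u := range uses {
		fmt.Printf("line %d: call=%v\n", u.Line, u.IsCall)
	}
}
```

In this sample, `handler()` would carry `FunctionIsCall: true`, while the assignment `cb := handler` is a reference only.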


##### Type

Type definition, corresponding to [NodeType] TYPE. It covers type definitions in specific languages, such as structs, enums, interfaces, and type aliases.
@@ -490,6 +516,9 @@ Type definition, [NodeType] is TYPE, including type definitions in specific lang
- Implements: Identities of the interfaces this type implements


- Extra: Additional information for storing language-specific details or extra metadata


##### Var

Variables and constants, **but only those declared globally**
@@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
- Groups: Definitions in the same group. For example, for `const( A=1, B=2, C=3)` in Go, Groups would be `[C=3, B=2]` (assuming A is the variable itself)


- Extra: Additional information for storing language-specific details or extra metadata


  - AnonymousFunctions: Anonymous functions defined in the initialization function of the current variable; each element is the FileLine of the corresponding anonymous function


    - File: Name of the file containing the anonymous function


    - Line: **Line number of the starting position in the file (starting from 1)**


    - StartOffset: **Byte offset of the start of the code, relative to the beginning of the file**


    - EndOffset: **Byte offset of the end of the code, relative to the beginning of the file**


### Graph

The dependency topology graph of all AST Nodes in the repository. It is formatted as an Identity => Node mapping, where each Node contains its dependency relationships with other nodes.
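
Since the graph is a mapping from Identity to Node, the transitive context of a node can be gathered with a recursive walk. The sketch below is only an illustration: `Identity` reuses the ModPath, PkgPath and Name fields seen in the Dependency example, while `Node`, `Graph`, `collectContext`, `id` and `sampleGraph` are simplified, hypothetical stand-ins rather than the actual uniast types.

```go
package main

import "fmt"

// Identity uniquely names a node (fields taken from the Dependency example).
type Identity struct {
	ModPath, PkgPath, Name string
}

// Node is a simplified, hypothetical node that only lists what it depends on.
type Node struct {
	Dependencies []Identity
}

// Graph is the Identity => Node mapping.
type Graph map[Identity]Node

// collectContext returns root and everything reachable from it, in depth-first
// order. The visited set keeps cyclic dependencies from looping forever.
// Identities missing from the graph are still reported but have no children.
func collectContext(g Graph, root Identity) []Identity {
	visited := map[Identity]bool{}
	var order []Identity
	var walk func(id Identity)
	walk = func(id Identity) {
		if visited[id] {
			return
		}
		visited[id] = true
		order = append(order, id)
		for _, dep := range g[id].Dependencies {
			walk(dep)
		}
	}
	walk(root)
	return order
}

// id builds an Identity inside a made-up module for the sample graph.
func id(name string) Identity {
	return Identity{ModPath: "example.com/m", PkgPath: "example.com/m/pkg", Name: name}
}

// sampleGraph has a cycle: a -> b, c; b -> c; c -> a.
func sampleGraph() Graph {
	return Graph{
		id("a"): {Dependencies: []Identity{id("b"), id("c")}},
		id("b"): {Dependencies: []Identity{id("c")}},
		id("c"): {Dependencies: []Identity{id("a")}},
	}
}

func main() {
	for _, n := range collectContext(sampleGraph(), id("a")) {
		fmt.Println(n.Name)
	}
}
```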
53 changes: 50 additions & 3 deletions docs/uniast-zh.md
@@ -1,4 +1,4 @@
# Universal Abstract-Syntax-Tree Specification (v0.1.3)
# Universal Abstract-Syntax-Tree Specification (v0.2.0)

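Universal Abstract-Syntax-Tree is an LLM-friendly, language-agnostic code context data structure established by ABCoder. It represents a unified abstract syntax tree of a repository's code, collecting definitions of language entities (functions, types, constants/variables) and their interdependencies for subsequent AI understanding and coding-workflow development.

@@ -371,20 +371,40 @@ Universal Abstract-Syntax-Tree is an LLM-friendly, language
- Vars: Global values referenced within the current function, including variables and constants


- Extra: Additional information for storing language-specific details or extra metadata


  - AnonymousFunctions: Anonymous functions defined within the function; each element is the FileLine of the corresponding anonymous function


    - File: Name of the file containing the anonymous function


    - Line: **Line number of the starting position in the file (starting from 1)**


    - StartOffset: Start of the code, as a **byte offset relative to the beginning of the file**


    - EndOffset: End of the code, as a **byte offset relative to the beginning of the file**

###### Dependency

Represents a dependency relationship, containing the dependent node Id, the location where the dependency arises, and other information, so that the LLM can identify it accurately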


```
```json
{
"ModPath": "github.com/cloudwego/localsession",
"PkgPath": "github.com/cloudwego/localsession",
"Name": "transmitSessionIdentity",
"File": "manager.go",
"Line": 140,
"StartOffset": 3547,
"EndOffset": 3564
"EndOffset": 3564,
"Extra": {
"FunctionIsCall": true
}
}
```

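@@ -409,6 +429,12 @@ Universal Abstract-Syntax-Tree is an LLM-friendly, language
- EndOffset: Offset, relative to the code file, of the end of the token at the dependency point (not of the dependent node)


- Extra: Additional information for storing language-specific details or extra metadata


  - FunctionIsCall: When the Dependency refers to a function, indicates whether the function is actually called at this point rather than merely referenced


##### Type

Type definition, corresponding to [NodeType] TYPE. It covers type definitions in specific languages, such as structs, enums, interfaces, and type aliases
@@ -490,6 +516,9 @@ Universal Abstract-Syntax-Tree is an LLM-friendly, language
- Implements: **Identity** of each interface this type implements


- Extra: Additional information for storing language-specific details or extra metadata


##### Var

Global values, including variables and constants, **but they must be global**
@@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
- Groups: Definitions in the same group. For example, for `const( A=1, B=2, C=3)` in Go, Groups is `[C=3, B=2]` (assuming A is the variable itself)


- Extra: Additional information for storing language-specific details or extra metadata


  - AnonymousFunctions: Anonymous functions defined in the initialization function of the current variable; each element is the FileLine of the corresponding anonymous function


    - File: Name of the file containing the anonymous function


    - Line: **Line number of the starting position in the file (starting from 1)**


    - StartOffset: Start of the code, as a **byte offset relative to the beginning of the file**


    - EndOffset: End of the code, as a **byte offset relative to the beginning of the file**


### Graph

The dependency topology graph of all AST Nodes in the repository. It takes the form of an Identity => Node mapping, where each Node contains its dependency relationships with other nodes. Based on this graph, **the context of any node can be retrieved recursively**.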
2 changes: 1 addition & 1 deletion go.mod
@@ -4,6 +4,7 @@ go 1.23.4

require (
github.com/Knetic/govaluate v3.0.1-0.20171022003610-9aa49832a739+incompatible
github.com/bytedance/sonic v1.14.1
github.com/cloudwego/eino v0.3.52
github.com/cloudwego/eino-ext/components/model/ark v0.1.16
github.com/cloudwego/eino-ext/components/model/claude v0.1.1
@@ -43,7 +44,6 @@ require (
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/bytedance/gopkg v0.1.3 // indirect
github.com/bytedance/sonic v1.14.1 // indirect
github.com/bytedance/sonic/loader v0.3.0 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cloudwego/base64x v0.1.6 // indirect
63 changes: 60 additions & 3 deletions lang/golang/parser/file.go
@@ -26,6 +26,11 @@ import (
. "github.com/cloudwego/abcoder/lang/uniast"
)

const (
ExtraKey_FunctionIsCall = "FunctionIsCall"
ExtraKey_AnonymousFunctions = "AnonymousFunctions"
)

func (p *GoParser) parseFile(ctx *fileContext, f *ast.File) error {
cont := true
ast.Inspect(f, func(node ast.Node) bool {
@@ -121,7 +126,9 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool

// collect func value dependencies, in case of var a = func() {...}
if val != nil && !isConst {
collects := collectInfos{}
collects := collectInfos{
directCalls: map[FileLine]bool{},
}
ast.Inspect(*val, func(n ast.Node) bool {
return p.parseASTNode(ctx, n, &collects)
})
@@ -137,6 +144,16 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool
for _, dep := range collects.tys {
v.Dependencies = InsertDependency(v.Dependencies, dep)
}
if len(collects.directCalls) > 0 {
for i, dep := range v.Dependencies {
if collects.directCalls[dep.FileLine] {
v.Dependencies[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
}
if len(collects.anonymousFunctions) > 0 {
v.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
}
}

if vspec.Type != nil {
@@ -392,12 +409,19 @@ func (p *GoParser) parseSelector(ctx *fileContext, expr *ast.SelectorExpr, infos
type collectInfos struct {
	functionCalls, methodCalls []Dependency
	tys, globalVars            []Dependency

	directCalls        map[FileLine]bool // positions of callee identifiers in direct calls
	anonymousFunctions []FileLine        // positions of anonymous functions (function literals)
}

func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collectInfos) bool {
switch expr := node.(type) {
case *ast.SelectorExpr:
return p.parseSelector(ctx, expr, collect)
case *ast.CallExpr:
p.parseCall(ctx, expr, collect)
case *ast.FuncLit:
collect.anonymousFunctions = append(collect.anonymousFunctions, ctx.FileLine(expr))
case *ast.Ident:
callName := expr.Name
// println("[parseFunc] ast.Ident:", callName)
@@ -462,6 +486,22 @@ func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collec
return true
}

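// parseCall records the position of the callee identifier of a call expression,
// so that the matching Dependency can later be marked with FunctionIsCall.
func (p *GoParser) parseCall(ctx *fileContext, expr *ast.CallExpr, collect *collectInfos) {
	var ident *ast.Ident

	switch idt := expr.Fun.(type) {
	case *ast.Ident:
		// plain call: f()
		ident = idt
	case *ast.SelectorExpr:
		// qualified or method call: x.f()
		ident = idt.Sel
	}

	if ident != nil {
		collect.directCalls[ctx.FileLine(ident)] = true
	}
}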

// parseFunc parses all function declaration in one file
func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Function, bool) {
// method receiver
@@ -511,7 +551,9 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
// collect content
content := string(ctx.GetRawContent(funcDecl))

collects := collectInfos{}
collects := collectInfos{
directCalls: map[FileLine]bool{},
}
if funcDecl.Body == nil {
goto set_func
}
@@ -521,7 +563,6 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
})

set_func:

if fname == "init" && p.repo.GetFunction(NewIdentity(ctx.module.Name, ctx.pkgPath, fname)) != nil {
// according to https://go.dev/ref/spec#Program_initialization_and_execution,
// multiple init() functions are allowed and can never be referenced, so add a suffix
@@ -544,6 +585,22 @@ set_func:
f.Types = InsertDependency(f.Types, t)
}
f.Signature = string(sig)

if len(collects.directCalls) > 0 {
for i, dep := range f.FunctionCalls {
if collects.directCalls[dep.FileLine] {
f.FunctionCalls[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
for i, dep := range f.MethodCalls {
if collects.directCalls[dep.FileLine] {
f.MethodCalls[i].SetExtra(ExtraKey_FunctionIsCall, true)
}
}
}
if len(collects.anonymousFunctions) > 0 {
f.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
}
return f, false
}
