Commit a3cc253

gfs and Copilot authored
Add parameter type signatures for IL and JVM methods (#358)
* Add parameter type signatures for IL methods/calls

Extract and propagate parenthesized parameter type signatures to enable overload-precise identification and matching of methods and unresolved call targets.

- Extractor: ILExtractor now emits il_method_param_signature and il_call_target_param_signature tuples.
- DB schema: Added il_method_param_signature and il_call_target_param_signature to semmlecode.binary.dbscheme.
- QL API/AST: Exposed/getters for param signatures across CilInstructions, IR, InstructionSig, TranslatedElement/Function/Instruction and transform layers so signatures flow through translation.
- Translated implementations: TranslatedCilMethod and relevant translated call/new-object logic return the extracted signatures; non-CIL backends return wildcards where appropriate.
- VulnerableCalls: Expanded the vulnerableCallModel and related predicates to include paramSignature and updated matching logic to accept exact signatures or wildcard '*'.
- Models: Updated example YAML models to include a '*' paramSignature for existing entries.

This change improves precision when matching overloaded methods for analyses such as vulnerable-call detection.

* Add method param signatures and JVM stack metadata

Expose a getParamSignature API on InstructionSig (and the TransformInstruction implementation) to return parenthesized parameter-type signatures (e.g. "(System.String,System.Int32)"). Extend the extraction DB schema with il_method_param_signature and il_call_target_param_signature to enable overload-precise method identification, and add jvm_stack_height and jvm_stack_slot tables to record JVM stack heights and map stack slots to producer instructions to simplify stack-based dataflow analysis.

* Include same-assembly method definitions in vulnerable method closure

For root cause mode analysis, where the vulnerable methods being traced are defined in the same binary being analyzed (not referenced cross-assembly), getAVulnerableMethod needs a base case that matches method definitions by their fully-qualified name and parameter signature.

Previously, only cross-assembly calls via ExternalRefInstruction were matched as the base case. Intra-assembly calls are handled by the existing transitive getStaticTarget() clause, but the closure never started because the base case only found external ref call sites.

The new clause matches methods defined in the current binary against the model, respecting the paramSignature field (including wildcard '*'). For standard cross-assembly analysis this is a no-op since the model methods won't be defined in the binary being analyzed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sync JVM extractor dbscheme with ql lib

The ql lib dbscheme was updated with il_method_param_signature, il_call_target_param_signature, jvm_stack_height, and jvm_stack_slot tables but the JVM extractor's copy was not updated. This causes a schema mismatch when building a JVM database and then running the binary-ql queries against it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add parameter type signature extraction for JVM bytecode

The CIL extractor already emits il_method_param_signature and il_call_target_param_signature for overload-precise method matching. This commit adds the same capability to the JVM bytecode extractor.

JVM extractor changes:
- ParseParamSignature: converts JVM descriptors (e.g. '(Ljava/lang/Object;JJ)V') to human-readable signatures (e.g. '(Object,long,long)')
- ExtractMethod: emits il_method_param_signature for method definitions
- ExtractMethodRef: emits il_call_target_param_signature for call sites

QL library changes:
- JvmMethod: add getParamSignature() backed by il_method_param_signature
- JvmInvoke: add getParamSignature() backed by il_call_target_param_signature
- TranslatedJvmInvoke: wire getExternalParamSignature to instr.getParamSignature()
- TranslatedJvmFunction: use method.getParamSignature() instead of wildcard '*'

VulnerableCalls.qll:
- VulnerableMethodCall: handle case where extRef lacks param signature (backwards compat for databases built before this change)
- Root cause base case: handle functions with wildcard param signature

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix JVM param signature to use JVM-specific dbscheme table

il_call_target_param_signature references @il_instruction which is incompatible with JVM's @jvm_instruction type. Add jvm_call_target_param_signature table for JVM call target signatures and update the extractor and QL to use it.

Also sync all extractor dbschemes (JVM and CIL) with the canonical ql/lib copy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
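The matching rule described in the commit message (exact signature or wildcard '*', plus tolerance for databases that predate the new tables) might look roughly like the sketch below. It is a Python illustration only, not the QL in VulnerableCalls.qll, which this page does not show. The name `signature_matches` is hypothetical, and treating a missing or wildcard call-site signature as a match is an assumption based on the message's wording.

```python
from typing import Optional

WILDCARD = "*"  # model entries use '*' to mean "any overload"


def signature_matches(model_sig: str, actual_sig: Optional[str]) -> bool:
    """Decide whether a model entry applies to a method or call target.

    Hypothetical helper: follows the rule described in the commit message,
    not the actual QL predicates.
    """
    if model_sig == WILDCARD:
        # The model does not care which overload is called.
        return True
    if actual_sig is None:
        # Assumption: no signature tuple was extracted (older database),
        # so fall back to name-only matching.
        return True
    if actual_sig == WILDCARD:
        # Assumption: backends without signature support report '*'.
        return True
    return model_sig == actual_sig
```

For example, a model entry with `(System.String)` would match a call site carrying the same signature and reject one carrying `(System.Int32)`, while a `*` entry accepts both.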
1 parent 3f4d327 commit a3cc253

20 files changed

Lines changed: 356 additions & 22 deletions

binary/extractor/cil/Semmle.Extraction.CSharp.IL/ILExtractor.cs

Lines changed: 9 additions & 0 deletions
@@ -111,6 +111,11 @@ private void ExtractMethod(MethodDefinition method, int typeId) {
 // Write access flags
 trap.WriteTuple("cil_method_access_flags", methodId, (int)method.Attributes);

+// Write parameter type signature for overload-precise identification
+var methodParamTypes = string.Join(",",
+    method.Parameters.Select(p => p.ParameterType.FullName.Replace('/', '.')));
+trap.WriteTuple("il_method_param_signature", methodId, $"({methodParamTypes})");
+
 if (method.HasBody) {
     ExtractMethodBody(method, methodId);
 }
@@ -182,6 +187,10 @@ private void ExtractMethodBody(MethodDefinition method, int methodId) {
 var targetMethodName = $"{declaringTypeName}.{methodRef.Name}";
 trap.WriteTuple("il_call_target_unresolved", instrId, targetMethodName);
 trap.WriteTuple("il_number_of_arguments", instrId, methodRef.Parameters.Count);
+// Emit parameter type signature for overload-precise matching
+var paramTypes = string.Join(",",
+    methodRef.Parameters.Select(p => p.ParameterType.FullName.Replace('/', '.')));
+trap.WriteTuple("il_call_target_param_signature", instrId, $"({paramTypes})");
 if(methodRef.MethodReturnType.ReturnType.MetadataType is not Mono.Cecil.MetadataType.Void) {
     trap.WriteTuple("il_call_has_return_value", instrId);
 }
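Both hunks above build the signature string the same way: take each parameter's full type name, replace `/` with `.`, join with commas, and wrap the result in parentheses. A minimal Python sketch of that string construction follows. The function name `cil_param_signature` is hypothetical, and the input is assumed to be a list of type names as the metadata reader reports them.

```python
from typing import Iterable


def cil_param_signature(param_type_full_names: Iterable[str]) -> str:
    """Build a parenthesized, comma-separated parameter signature.

    Hypothetical helper mirroring the string.Join/Replace calls in the diff.
    """
    # '/' is replaced with '.', as in the C# code (e.g. Outer/Inner -> Outer.Inner).
    normalized = [name.replace("/", ".") for name in param_type_full_names]
    return "(" + ",".join(normalized) + ")"
```

With this shape, a method taking no parameters yields `()`, which matches the format documented in the dbscheme comments.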

binary/extractor/cil/semmlecode.binary.dbscheme

Lines changed: 32 additions & 0 deletions
@@ -2467,6 +2467,28 @@ il_call_target_unresolved(
   string target_method_name: string ref
 );

+/**
+ * Parameter type signature for method definitions.
+ * The param_signature is a parenthesized, comma-separated list of fully-qualified
+ * parameter type names, e.g. "(System.String,System.Int32)" or "()" for no parameters.
+ * This enables overload-precise identification of methods during export.
+ */
+il_method_param_signature(
+  int method: @method ref,
+  string param_signature: string ref
+);
+
+/**
+ * Parameter type signature for unresolved method call targets.
+ * The param_signature is a parenthesized, comma-separated list of fully-qualified
+ * parameter type names, e.g. "(System.String,System.Int32)" or "()" for no parameters.
+ * This enables overload-precise matching of call targets.
+ */
+il_call_target_param_signature(
+  int instruction: @il_instruction ref,
+  string param_signature: string ref
+);
+
 il_field_operand(
   int instruction: @il_instruction ref,
   string declaring_type_name: string ref,
@@ -2990,3 +3012,13 @@ jvm_stack_slot(
   int slot: int ref,
   int producer_id: @jvm_instruction ref
 );
+
+/**
+ * Parameter type signature for JVM method call targets.
+ * The param_signature is a parenthesized, comma-separated list of human-readable
+ * parameter type names, e.g. "(Object,long,long)" or "()" for no parameters.
+ */
+jvm_call_target_param_signature(
+  int instruction: @jvm_instruction ref,
+  string param_signature: string ref
+);

binary/extractor/jvm/Semmle.Extraction.Java.ByteCode/JvmExtractor.cs

Lines changed: 70 additions & 0 deletions
@@ -142,6 +142,10 @@ private void ExtractMethod(Method method, int typeId, ClassFile classFile, strin
 // Extract access flags as raw bitmask
 trap.WriteTuple("jvm_method_access_flags", methodId, (int)method.AccessFlags);

+// Write parameter type signature for overload-precise identification
+var descriptorUtf8ForSig = classFile.Constants.Get(method.Descriptor);
+trap.WriteTuple("il_method_param_signature", methodId, ParseParamSignature(descriptorUtf8ForSig.Value));
+
 // Check if this is a static method (for parameter indexing)
 bool isStatic = (method.AccessFlags & AccessFlag.Static) != 0;

@@ -647,6 +651,12 @@ private void ExtractMethodRef(Instruction instr, int instrId, ClassFile classFil
 int paramCount = CountParameters(descriptor);
 trap.WriteTuple("jvm_number_of_arguments", instrId, paramCount);

+// Write parameter type signature for overload-precise matching
+if (!string.IsNullOrEmpty(descriptor))
+{
+    trap.WriteTuple("jvm_call_target_param_signature", instrId, ParseParamSignature(descriptor));
+}
+
 if (!IsVoidReturn(descriptor))
 {
     trap.WriteTuple("jvm_call_has_return_value", instrId);
@@ -782,6 +792,66 @@ private static int CountParameters(string descriptor)
     return count;
 }

+/// <summary>
+/// Converts a JVM method descriptor to a parenthesized, comma-separated
+/// parameter type signature, e.g. "(Ljava/lang/Object;JJ)V" becomes
+/// "(Object,long,long)".
+/// </summary>
+private static string ParseParamSignature(string descriptor)
+{
+    if (!descriptor.StartsWith("("))
+        return "(*)";
+
+    int closeParenIdx = descriptor.IndexOf(')');
+    if (closeParenIdx < 0)
+        return "(*)";
+
+    var paramPart = descriptor.Substring(1, closeParenIdx - 1);
+    var types = new System.Collections.Generic.List<string>();
+    int i = 0;
+    while (i < paramPart.Length)
+    {
+        int arrayDims = 0;
+        while (i < paramPart.Length && paramPart[i] == '[')
+        {
+            arrayDims++;
+            i++;
+        }
+
+        if (i >= paramPart.Length)
+            break;
+
+        string baseType;
+        char c = paramPart[i];
+        switch (c)
+        {
+            case 'B': baseType = "byte"; i++; break;
+            case 'C': baseType = "char"; i++; break;
+            case 'D': baseType = "double"; i++; break;
+            case 'F': baseType = "float"; i++; break;
+            case 'I': baseType = "int"; i++; break;
+            case 'J': baseType = "long"; i++; break;
+            case 'S': baseType = "short"; i++; break;
+            case 'Z': baseType = "boolean"; i++; break;
+            case 'L':
+                int semiIdx = paramPart.IndexOf(';', i);
+                if (semiIdx < 0) semiIdx = paramPart.Length;
+                // Extract class name, convert / to ., strip leading L
+                baseType = paramPart.Substring(i + 1, semiIdx - i - 1).Replace('/', '.');
+                i = semiIdx + 1;
+                break;
+            default:
+                baseType = "?";
+                i++;
+                break;
+        }
+
+        types.Add(baseType + new string('[', arrayDims) + new string(']', arrayDims));
+    }
+
+    return "(" + string.Join(",", types) + ")";
+}
+
 private static bool IsVoidReturn(string descriptor)
 {
     return descriptor.EndsWith(")V");
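To make the descriptor walk in `ParseParamSignature` easier to follow, here is a line-by-line Python port of the C# shown in the hunk above. It is a sketch for experimentation, not part of the commit; the name `parse_param_signature` is hypothetical, and the port deliberately keeps the behavior of the C# as written rather than the behavior its doc comment describes.

```python
# JVM descriptor letters for primitive types, as in the C# switch statement.
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean",
}


def parse_param_signature(descriptor: str) -> str:
    """Port of the C# ParseParamSignature shown in the diff (hypothetical name)."""
    if not descriptor.startswith("("):
        return "(*)"
    close = descriptor.find(")")
    if close < 0:
        return "(*)"

    params = descriptor[1:close]
    types = []
    i = 0
    while i < len(params):
        # Count leading '[' markers (array dimensions).
        dims = 0
        while i < len(params) and params[i] == "[":
            dims += 1
            i += 1
        if i >= len(params):
            break

        c = params[i]
        if c in _PRIMITIVES:
            base = _PRIMITIVES[c]
            i += 1
        elif c == "L":
            # Class type: runs up to the next ';' (or end of input if missing).
            semi = params.find(";", i)
            if semi < 0:
                semi = len(params)
            base = params[i + 1:semi].replace("/", ".")
            i = semi + 1
        else:
            base = "?"
            i += 1

        # Same concatenation as the C#: all '[' first, then all ']'.
        types.append(base + "[" * dims + "]" * dims)

    return "(" + ",".join(types) + ")"
```

Traced by hand, this port returns `(java.lang.Object,long,long)` for the descriptor `(Ljava/lang/Object;JJ)V`, because the class-name branch only swaps `/` for `.` and does not drop the package, whereas the doc comment gives `(Object,long,long)`. Likewise, a two-dimensional `int` array comes out as `int[[]]` rather than `int[][]`. The commit does not say whether either result is intended, so model authors may want to check extracted signatures before writing exact-match entries.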

binary/extractor/jvm/semmlecode.binary.dbscheme

Lines changed: 56 additions & 0 deletions
@@ -2467,6 +2467,28 @@ il_call_target_unresolved(
   string target_method_name: string ref
 );

+/**
+ * Parameter type signature for method definitions.
+ * The param_signature is a parenthesized, comma-separated list of fully-qualified
+ * parameter type names, e.g. "(System.String,System.Int32)" or "()" for no parameters.
+ * This enables overload-precise identification of methods during export.
+ */
+il_method_param_signature(
+  int method: @method ref,
+  string param_signature: string ref
+);
+
+/**
+ * Parameter type signature for unresolved method call targets.
+ * The param_signature is a parenthesized, comma-separated list of fully-qualified
+ * parameter type names, e.g. "(System.String,System.Int32)" or "()" for no parameters.
+ * This enables overload-precise matching of call targets.
+ */
+il_call_target_param_signature(
+  int instruction: @il_instruction ref,
+  string param_signature: string ref
+);
+
 il_field_operand(
   int instruction: @il_instruction ref,
   string declaring_type_name: string ref,
@@ -2966,3 +2988,37 @@ jvm_method_access_flags(
   unique int method: @method ref,
   int flags: int ref
 );
+
+/**
+ * Stack height at entry to a JVM instruction.
+ * This is computed by abstract interpretation during extraction.
+ */
+jvm_stack_height(
+  unique int instr: @jvm_instruction ref,
+  int height: int ref
+);
+
+/**
+ * Maps a stack slot at a specific instruction to the instruction that produced the value.
+ * slot 0 is the top of the stack, slot 1 is below that, etc.
+ * producer_id is the instruction ID that pushed this value onto the stack.
+ *
+ * This allows QL to determine data flow through the operand stack without
+ * expensive recursive CFG traversal.
+ */
+#keyset[instr, slot]
+jvm_stack_slot(
+  int instr: @jvm_instruction ref,
+  int slot: int ref,
+  int producer_id: @jvm_instruction ref
+);
+
+/**
+ * Parameter type signature for JVM method call targets.
+ * The param_signature is a parenthesized, comma-separated list of human-readable
+ * parameter type names, e.g. "(Object,long,long)" or "()" for no parameters.
+ */
+jvm_call_target_param_signature(
+  int instruction: @jvm_instruction ref,
+  string param_signature: string ref
+);
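The `jvm_stack_height` and `jvm_stack_slot` comments above describe rows computed by abstract interpretation, but the extractor code that produces them is not part of the files shown here. The Python sketch below illustrates what such rows could look like for straight-line code only. Everything in it is an assumption for illustration: the `(id, pops, pushes)` instruction model, the function name `stack_tables`, and the absence of branches or merge points, which a real implementation would have to handle.

```python
from typing import List, Tuple

# Hypothetical simplified instruction: (instruction id, values popped, values pushed).
Instr = Tuple[int, int, int]


def stack_tables(instrs: List[Instr]):
    """Simulate the operand stack over straight-line code.

    Returns rows shaped like jvm_stack_height(instr, height) and
    jvm_stack_slot(instr, slot, producer_id). Illustrative only.
    """
    heights = []
    slots = []
    stack: List[int] = []  # producer instruction ids, bottom of stack first

    for instr_id, pops, pushes in instrs:
        # Height and slot contents are recorded at entry to the instruction.
        heights.append((instr_id, len(stack)))
        for slot, producer in enumerate(reversed(stack)):
            # slot 0 is the top of the stack, as in the dbscheme comment.
            slots.append((instr_id, slot, producer))

        if pops:
            del stack[-pops:]
        # Each pushed value is attributed to the instruction that pushed it.
        stack.extend([instr_id] * pushes)

    return heights, slots
```

For a sequence such as push (id 1), push (id 2), add (id 3), the add instruction sees height 2, with slot 0 produced by instruction 2 and slot 1 by instruction 1, which is the kind of lookup the schema comment says replaces a recursive CFG traversal in QL.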

binary/ql/lib/semmle/code/binary/ast/internal/CilInstructions.qll

Lines changed: 6 additions & 0 deletions
@@ -141,6 +141,9 @@ class CilMethod extends @method {
     result.getIndex() = i
   }

+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  string getParamSignature() { il_method_param_signature(this, result) }
+
   CilType getDeclaringType() { methods(this, _, _, result) }

   Location getLocation() { none() } // TODO: Extract
@@ -430,6 +433,9 @@ abstract class CilCallOrNewObject extends CilInstruction {
   final int getNumberOfArguments() { il_number_of_arguments(this, result) }

   final string getExternalName() { il_call_target_unresolved(this, result) }
+
+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  final string getParamSignature() { il_call_target_param_signature(this, result) }
 }

 abstract class CilCall extends CilCallOrNewObject {

binary/ql/lib/semmle/code/binary/ast/internal/JvmInstructions.qll

Lines changed: 5 additions & 0 deletions
@@ -64,6 +64,9 @@ class JvmMethod extends @method {

   private string getSignature() { methods(this, _, result, _) }

+  /** Gets the parenthesized parameter type signature, e.g. `(Object,long,long)`. */
+  string getParamSignature() { il_method_param_signature(this, result) }
+
   predicate isVoid() { this.getSignature().matches("%)V") }

   JvmInstruction getAnInstruction() { jvm_instruction_method(result, this) }
@@ -1209,6 +1212,8 @@ class JvmPutfield extends @jvm_putfield, JvmFieldStore { }
 abstract class JvmInvoke extends JvmInstruction {
   string getCallTarget() { jvm_call_target_unresolved(this, result) }

+  string getParamSignature() { jvm_call_target_param_signature(this, result) }
+
   int getNumberOfArguments() { jvm_number_of_arguments(this, result) }

   predicate hasReturnValue() { jvm_call_has_return_value(this) }

binary/ql/lib/semmle/code/binary/ast/ir/IR.qll

Lines changed: 6 additions & 0 deletions
@@ -25,6 +25,9 @@ private module FinalInstruction {

   predicate isPublic() { super.isPublic() }

+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  string getParamSignature() { result = super.getParamSignature() }
+
   /**
    * Gets the fully qualified name of this method in the format:
    * "Namespace.ClassName.MethodName".
@@ -302,6 +305,9 @@ private module FinalInstruction {
 class ExternalRefInstruction extends Instruction instanceof Instruction::ExternalRefInstruction {
   string getExternalName() { result = super.getExternalName() }

+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  string getExternalParamSignature() { result = super.getExternalParamSignature() }
+
   cached
   predicate hasFullyQualifiedName(string namespace, string className, string methodName) {
     exists(string s, string r |

binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Function.qll

Lines changed: 3 additions & 0 deletions
@@ -25,5 +25,8 @@ class Function extends TFunction {

   predicate isPublic() { f.isPublic() }

+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  string getParamSignature() { result = f.getParamSignature() }
+
   Type getDeclaringType() { result.getAFunction() = this }
 }

binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Instruction.qll

Lines changed: 3 additions & 0 deletions
@@ -176,6 +176,9 @@ class ExternalRefInstruction extends Instruction {

   string getExternalName() { result = te.getExternalName(tag) }

+  /** Gets the parenthesized parameter type signature, e.g. `(System.String,System.Int32)`. */
+  string getExternalParamSignature() { result = te.getExternalParamSignature(tag) }
+
   final override string getImmediateValue() { result = this.getExternalName() }
 }

binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/TranslatedElement.qll

Lines changed: 6 additions & 0 deletions
@@ -263,6 +263,12 @@ abstract class TranslatedElement extends TTranslatedElement {
    */
   string getExternalName(InstructionTag tag) { none() }

+  /**
+   * Gets the parameter type signature for an external call with the given tag, e.g.
+   * `(System.String,System.Int32)`. This `tag` must refer to an `ExternalRef` instruction.
+   */
+  string getExternalParamSignature(InstructionTag tag) { none() }
+
   /**
    * Gets the name of the field referenced by an instruction with the given tag. This `tag` must refer to
    * a `FieldAddress` instruction (that is, an instruction for which
0 commit comments

Comments
 (0)