[clang][diagnostics] Stable IDs for Clang diagnostics#168153
Conversation
SARIF diagnostics require that each rule have a stable `id` property to identify that rule across runs, even when the compiler or analysis tool has changed. We were previously setting the `id` property to the numeric value of the enum value for that diagnostic within the Clang implementation; this value changes whenever an unrelated diagnostic is inserted or removed earlier in the list. This change sets the `id` property to the _text_ of that same enum value. This value would only change if someone renames the enum value for that diagnostic, which should happen much less frequently than renumbering. For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for `deprecatedIds`, which let a rule specify the IDs by which it was previously known. This would let us rename, split, or combine diagnostics while still being able to correlate the new diagnostic IDs with older SARIF logs and/or suppressions. Nothing in this change affects how warnings are configured on the command line or in `#pragma clang diagnostic`. Those still use warning groups, not the stable IDs.
|
@llvm/pr-subscribers-clangd @llvm/pr-subscribers-clang Author: Dave Bartolomeo (dbartol) ChangesSARIF diagnostics require that each rule have a stable This change sets the For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for Nothing in this change affects how warnings are configured on the command line or in Full diff: https://github.com/llvm/llvm-project/pull/168153.diff 4 Files Affected:
diff --git a/clang/include/clang/Basic/DiagnosticIDs.h b/clang/include/clang/Basic/DiagnosticIDs.h
index 06446cf580389..9fc49325205a2 100644
--- a/clang/include/clang/Basic/DiagnosticIDs.h
+++ b/clang/include/clang/Basic/DiagnosticIDs.h
@@ -332,6 +332,9 @@ class DiagnosticIDs : public RefCountedBase<DiagnosticIDs> {
/// Given a diagnostic ID, return a description of the issue.
StringRef getDescription(unsigned DiagID) const;
+ /// Given a diagnostic ID, return the stable ID of the diagnostic.
+ std::string getStableID(unsigned DiagID) const;
+
/// Return true if the unmapped diagnostic levelof the specified
/// diagnostic ID is a Warning or Extension.
///
diff --git a/clang/lib/Basic/DiagnosticIDs.cpp b/clang/lib/Basic/DiagnosticIDs.cpp
index a1d9d0f34d20d..e3903a3edadfd 100644
--- a/clang/lib/Basic/DiagnosticIDs.cpp
+++ b/clang/lib/Basic/DiagnosticIDs.cpp
@@ -51,6 +51,22 @@ const StaticDiagInfoDescriptionStringTable StaticDiagInfoDescriptions = {
#undef DIAG
};
+struct StaticDiagInfoStableIDStringTable {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ char ENUM##_id[sizeof(#ENUM)];
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
+const StaticDiagInfoStableIDStringTable StaticDiagInfoStableIDs = {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ #ENUM,
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
extern const StaticDiagInfoRec StaticDiagInfo[];
// Stored separately from StaticDiagInfoRec to pack better. Otherwise,
@@ -63,6 +79,14 @@ const uint32_t StaticDiagInfoDescriptionOffsets[] = {
#undef DIAG
};
+const uint32_t StaticDiagInfoStableIDOffsets[] = {
+#define DIAG(ENUM, CLASS, DEFAULT_SEVERITY, DESC, GROUP, SFINAE, NOWERROR, \
+ SHOWINSYSHEADER, SHOWINSYSMACRO, DEFERRABLE, CATEGORY) \
+ offsetof(StaticDiagInfoStableIDStringTable, ENUM##_id),
+#include "clang/Basic/AllDiagnosticKinds.inc"
+#undef DIAG
+};
+
enum DiagnosticClass {
CLASS_NOTE = DiagnosticIDs::CLASS_NOTE,
CLASS_REMARK = DiagnosticIDs::CLASS_REMARK,
@@ -95,6 +119,7 @@ struct StaticDiagInfoRec {
uint16_t Deferrable : 1;
uint16_t DescriptionLen;
+ uint16_t StableIDLen;
unsigned getOptionGroupIndex() const {
return OptionGroupIndex;
@@ -107,6 +132,14 @@ struct StaticDiagInfoRec {
return StringRef(&Table[StringOffset], DescriptionLen);
}
+ StringRef getStableID() const {
+ size_t MyIndex = this - &StaticDiagInfo[0];
+ uint32_t StringOffset = StaticDiagInfoStableIDOffsets[MyIndex];
+ const char *Table =
+ reinterpret_cast<const char *>(&StaticDiagInfoStableIDs);
+ return StringRef(&Table[StringOffset], StableIDLen);
+ }
+
diag::Flavor getFlavor() const {
return Class == CLASS_REMARK ? diag::Flavor::Remark
: diag::Flavor::WarningOrError;
@@ -159,7 +192,8 @@ const StaticDiagInfoRec StaticDiagInfo[] = {
SHOWINSYSMACRO, \
GROUP, \
DEFERRABLE, \
- STR_SIZE(DESC, uint16_t)},
+ STR_SIZE(DESC, uint16_t), \
+ STR_SIZE(#ENUM, uint16_t)},
#include "clang/Basic/DiagnosticCommonKinds.inc"
#include "clang/Basic/DiagnosticDriverKinds.inc"
#include "clang/Basic/DiagnosticFrontendKinds.inc"
@@ -434,6 +468,15 @@ StringRef DiagnosticIDs::getDescription(unsigned DiagID) const {
return CustomDiagInfo->getDescription(DiagID).GetDescription();
}
+/// getIDString - Given a diagnostic ID, return the stable ID of the diagnostic.
+std::string DiagnosticIDs::getStableID(unsigned DiagID) const {
+ if (const StaticDiagInfoRec *Info = GetDiagInfo(DiagID))
+ return Info->getStableID().str();
+ assert(CustomDiagInfo && "Invalid CustomDiagInfo");
+ // TODO: Stable IDs for custom diagnostics?
+ return std::to_string(DiagID);
+}
+
static DiagnosticIDs::Level toLevel(diag::Severity SV) {
switch (SV) {
case diag::Severity::Ignored:
diff --git a/clang/lib/Frontend/SARIFDiagnostic.cpp b/clang/lib/Frontend/SARIFDiagnostic.cpp
index ac27d7480de3e..ac72a7af05429 100644
--- a/clang/lib/Frontend/SARIFDiagnostic.cpp
+++ b/clang/lib/Frontend/SARIFDiagnostic.cpp
@@ -46,7 +46,9 @@ void SARIFDiagnostic::emitDiagnosticMessage(
if (!Diag)
return;
- SarifRule Rule = SarifRule::create().setRuleId(std::to_string(Diag->getID()));
+ std::string StableID =
+ Diag->getDiags()->getDiagnosticIDs()->getStableID(Diag->getID());
+ SarifRule Rule = SarifRule::create().setRuleId(StableID);
Rule = addDiagnosticLevelToRule(Rule, Level);
diff --git a/clang/test/Frontend/sarif-diagnostics.cpp b/clang/test/Frontend/sarif-diagnostics.cpp
index 767c5802ca13d..3f7adb80c67fd 100644
--- a/clang/test/Frontend/sarif-diagnostics.cpp
+++ b/clang/test/Frontend/sarif-diagnostics.cpp
@@ -34,35 +34,35 @@ void f1(t1 x, t1 y) {
// Omit filepath to llvm project directory
// CHECK: test/Frontend/sarif-diagnostics.cpp"},"mimeType":"text/plain","roles":["resultFile"]}],"columnKind":"unicodeCodePoints","results":
// CHECK: [{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
-// CHECK: {"endColumn":1,"startColumn":1,"startLine":12}}}],"message":{"text":"'main' must return 'int'"},"ruleId":"{{[0-9]+}}","ruleIndex":0},
+// CHECK: {"endColumn":1,"startColumn":1,"startLine":12}}}],"message":{"text":"'main' must return 'int'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":0},
// CHECK: {"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":11,"startColumn":11,"startLine":13}}}],"message":{"text":"use of undeclared identifier
-// CHECK: 'hello'"},"ruleId":"{{[0-9]+}}","ruleIndex":1},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: 'hello'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":1},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":17,"startColumn":17,"startLine":15}}}],"message":{"text":"invalid digit 'a' in decimal
-// CHECK: constant"},"ruleId":"{{[0-9]+}}","ruleIndex":2},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: constant"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":2},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":5,"startColumn":5,"startLine":19}}}],"message":{"text":"misleading indentation; statement is not part
-// CHECK: of the previous 'if'"},"ruleId":"{{[0-9]+}}","ruleIndex":3},{"level":"note","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: of the previous 'if'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":3},{"level":"note","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":3,"startColumn":3,"startLine":17}}}],"message":{"text":"previous statement is
-// CHECK: here"},"ruleId":"{{[0-9]+}}","ruleIndex":4},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
+// CHECK: here"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":4},{"level":"warning","locations":[{"physicalLocation":{"artifactLocation":
// CHECK: {"index":0,"uri":"file://{{.+}}"},"region":{"endColumn":10,"startColumn":10,"startLine":18}}}],"message":{"text":"unused variable
-// CHECK: 'Yes'"},"ruleId":"{{[0-9]+}}","ruleIndex":5},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: 'Yes'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":5},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":12,"startColumn":12,"startLine":21}}}],"message":{"text":"use of undeclared identifier
-// CHECK: 'hi'"},"ruleId":"{{[0-9]+}}","ruleIndex":6},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: 'hi'"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":6},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":1,"startColumn":1,"startLine":23}}}],"message":{"text":"extraneous closing brace
-// CHECK: ('}')"},"ruleId":"{{[0-9]+}}","ruleIndex":7},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
+// CHECK: ('}')"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":7},{"level":"error","locations":[{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":6,"endLine":27,"startColumn":5,"startLine":27}}},{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":10,"endLine":27,"startColumn":9,"startLine":27}}},{"physicalLocation":{"artifactLocation":{"index":0,"uri":"file://
// CHECK: {"endColumn":7,"startColumn":7,"startLine":27}}}],"message":{"text":"invalid operands to binary expression ('t1' and
-// CHECK: 't1')"},"ruleId":"{{[0-9]+}}","ruleIndex":8}],"tool":{"driver":{"fullName":"","informationUri":"https://clang.llvm.org/docs/
+// CHECK: 't1')"},"ruleId":"{{[A-Za-z0-9_]+}}","ruleIndex":8}],"tool":{"driver":{"fullName":"","informationUri":"https://clang.llvm.org/docs/
// CHECK: UsersManual.html","language":"en-US","name":"clang","rules":[{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"note","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
-// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[0-9]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"note","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"warning","rank":-1},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
+// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":{"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""},{"defaultConfiguration":
// CHECK: {"enabled":true,"level":"error","rank":50},"fullDescription":
-// CHECK: {"text":""},"id":"{{[0-9]+}}","name":""}],"version":"{{[0-9]+\.[0-9]+\.[0-9]+[^" ]*}}"}}}],"version":"2.1.0"}
+// CHECK: {"text":""},"id":"{{[A-Za-z0-9_]+}}","name":""}],"version":"{{[0-9]+\.[0-9]+\.[0-9]+[^" ]*}}"}}}],"version":"2.1.0"}
// CHECK: 2 warnings and 6 errors generated.
|
Sirraide
left a comment
There was a problem hiding this comment.
The implementation of this seems fine to me, but the RFC has to be accepted first before this can be merged.
I don’t think there is a good way to deal w/ custom diag ids. Those shouldn’t be used anywhere in Clang anyway (I know there are a few places that use them, but those should all be updated to use proper diagnostics instead).
|
The RFC has been approved to proceed by the Clang area team. I've updated the PR description to note a few open questions from the RFC, and how this PR chooses to answer each of those questions. |
|
From the Area Team meeting notes, regarding diagnostic IDs:
Should we automatically strip the prefix when generating the stable ID (either in the SARIF emitter, or in tablegen)? |
+1, those custom diagnostics should be using real diagnostics instead.
I think we may want an actual mapping instead; the problem is, we want the ability to rename any part of the diagnostic identifier because it is most readable when it matches the text of the diagnostic to some extent. The prefix happens to be something we change somewhat often as an example. But we sometimes reword diagnostics either because the diagnostic is now being used for more circumstances or because someone just found much better phrasing and it would be bad if our "stable" IDs either 1) weren't actually stable, or 2) were stable but prevented us from improving maintenance costs. In the PR summary you discuss this a bit with an idea of how we could improve it in the future, but I think that's going to be a problem sooner rather than later and we could consider addressing it up front. Because these IDs are not being exposed to the user (either documented on a webpage, displayed as part of the diagnostic text to the user, etc), should we auto-generate a UUID for the diagnostic entries which don't have one already associated with them yet (which is all of them at first), and use that as the stable ID? Then it doesn't matter how the diagnostic changes, the underlying UUID is still the same. I'm not certain of how we would persist this across builds, though, so it might be a terrible idea. But I'm hoping we can find something that 1) doesn't require the developer to manually pick an ID and put it in the .td file, 2) allows us to modify anything about the diagnostic we want without impacting the stable ID. WDYT? |
To persist across builds, the IDs would either have to be in the The SARIF stable ID doesn't have to be meaningful. If we went with UUIDs, someone adding a new diagnostic could just generate one easily enough. We'd just need a check somewhere to prevent duplicates in case of copy-paste mistakes. We could also just use explicitly-assigned, dense-ish stable integer IDs, MSVC-style. Those are easier to use in both written and oral conversation than UUIDs, but they might require slightly more process. We'd do an initial pass to add them to the I suppose we could split the difference and use one of those code phrase generators to make diagnostic IDs. Easier to use in conversation than UUIDs, but less likely to collide than manually-assigned dense integers. The downside is that at some point, we'll find ourselves writing something like "We're getting too many false positive reports for purple-monkey-dishwasher, so we'll need to tune the heuristcs." Does any one of the above approaches seem workable? |
Yeah, and we don't have a way to persist information across builds that I'm aware of.
Yeah, that's why I didn't think we'd want to use dense integers. With UUIDs, downstreams have far less risk of conflicts.
Yeah, that does split the middle, but seems like a nightmare for people adding new diagnostics because they have to figure out a code phrase for each one, and I'm a bit worried about new contributor experience from that too.
Maybe, but what about this as an idea: we have a script that generates a UUID and gives it a What do you think of something along those lines? It's a bit heavier of a process to add a diagnostic than it is today, but we could actually make this a nicer script that does more than just SARIF IDs. e.g., maybe this script generates the full diagnostic for you. Like: (the tool could even automatically add the diagnostic text to the correct .td files directly instead of making people copy and paste it, or have a Maybe I'm scope-creeping too. :-D |
|
@AaronBallman Your idea of having a tool to add a diagnostic got me thinking. Please bear with me... I'll start with a couple of claims: Warnings onlyWe should only really care about stable IDs for warnings, not errors/fatals/notes. While SARIF is capable of representing errors, I don't expect anyone to try to track instances of errors over time. Either your code compiles successfully or it doesn't. If it has errors, you have to fix them. The only reason why I think we should emit errors to the SARIF log at all is that having both errors and warnings in the same file makes it easier to use SARIF as a common structured diagnostic format to drive IDE support, CI reports, etc. So, while SARIF requires us to emit a We seem to use notes only to annotate other warnings/errors. In SARIF, those are supposed to show up as Focus on stability and evolutionThe SARIF ID doesn't need to meaningful; it just needs to be unique and stable. So, instead of focusing on how to choose a stable ID for a warning, we should focus on how to ensure that the ID remains stable. We should make creation of a stable ID trivial, but then detect when new warnings are added or removed, and ensure that the developer has taken any necessary steps to preserve the stable IDs or to record how the stable IDs have evolved. ProposalFor errors, fatals, and notes, we use the in-source identifier of the diagnostic as its SARIF ID. We do not expect this to be stable, but we do not expect SARIF tools to care about the stability of error IDs over time. For warnings, we use the in-source identifier of the diagnostic as its SARIF ID by default. We also allow two additional properties: The effect on common contributor scenarios is as follows:
EnforcementIt should be easy enough to create a CI job that compares the set of diagnostics between the base commit of a PR and the head commit of that PR, and, if anything suspicious changes have occurred, posts instructions for how to deal with renaming/merging/splitting warnings. To give you an idea of how often this would trigger, I wrote a simple script to crawl our Git history, run
(details here) So, the bot would fire ~5 times a month, and actually provoke additional work from the contributor maybe once every month or two. Other than that, the whole diagnostic process would remain the same as today, which still seems like less work than managing GUIDs for every diagnostic. |
This change modifies the one existing Clang test for SARIF output to use the same `normalize_sarif` sed script that we use for diffing SARIF in the Clang Static Analyzer tests, rather than using a complicated set of `CHECK` statements. This did require changing the SARIF output to include line breaks.
|
Pushed an additional commit to reuse the Static Analyzer's |
|
✅ With the latest revision this PR passed the Python code formatter. |
|
@AaronBallman I went ahead and implemented the support for explicitly specifying the stable ID, and for listing previous stable IDs, in a separate PR against this PR. Any thoughts about this approach to managing stable IDs over time? |
I'm not certain I agree with this; there are reasons to track instances of errors over time (for example, gathering statistics about how often users run into a particular error, automatic reverts for certain kinds of errors, etc).
I think we should make the error IDs stable from the start, along with warnings. Notes and remarks, however, don't seem to really have any kind of ID that I think would make sense to track. At least, I can't think of anything!
+1
+1
I think this should apply to warnings and errors both.
I think this makes sense. One question I have is: are these stable IDs usually displayed to the user in tools consuming SARIF? If so, are we sure these default IDs are reasonable for those cases? Or do we have to worry about tools expecting stable IDs to be short (so our long identifiers might get cut off such that the user only sees
Thank you, that's a fantastic analysis of the impacts! I think that's quite reasonable. |
I didn't do a full review, but based on what I saw, I think this is heading in a reasonable direction. But because diagnostics are so central to the compiler, I'd love some second opinions about the design here. I can't quite tell whether we're in RFC territory for this or not, too. CC @erichkeane @Sirraide @cor3ntin @rnk @Xazax-hun for some additional opinions (feel free to tag others if you think they'd interested too). |
|
Just catching up to this, sorry if I'm repeating above conversation:
Honestly, this seems like a reasonable approach to these. All 3 answers are appropriate i think.
We often have error/warning/ext versions of the same thing, so stripping it might be problematic.
I don't think I agree with this. One of the uses of SARIF I think is that you could use it to keep track of a project that you know DOESN'T compile (like, for example, Clang-Linux, which failed for a very long time before it was successful).
This I agree on.
I disagree a bit here? See above.
I think I'd be ok making Tablegen enforce adding a new 'tag' to each diagnostic that has its
I think this approach is a little subtle. And unless we have a way of enforcing when this happened, it'll be missed. IMO, we should just bite the bullet, require StableId via tablegen, and just come up iwth a heuristic for the 'initial set' of errors+ warnings.
I'd rather tablegen enforce the existence, a CI job/etc to do it would be acceptable.
|
|
Both of you make good arguments for keeping error IDs stable, so I'm on board with that. I don't think that decision changes this PR or the tblgen PR I linked to, since the new properties are supported on any
If the initial StableId tag is just auto-generated from the existing definition of the diagnostic, is there any point in actually adding all those initial StableId tags to the
Tblgen can only enforce rules that only depend on the current version of the |
Yes, there IS. This removes an entire class of 'rename problems' (AND the most likely ones!) to the point that PR-time enforcement will catch orders-of-magnitude fewer things. OF your list above, 96 + 15 + 240 all seem like 'not a problem'. This would enforce that breaking the SARIF name is an actual, active, intentional thing, which would remove all of the 13. I would still want periodic checking for these, but it would remove our risk of breaking them by almost 100%.
Right, I'm aware. Bur enforcing that we have those initial values means that the 'implicit' names don't cause an accidental breakage, and catch errors 'earlier' (even before the person submits their patch!). IMO, this will result in a much more stable set of names with less work on our parts. |
OK, let's do that. A couple details: Second, can I add those explicit stable IDs in a follow-up PR? So the PR sequence would be something like:
I'm kind of dreading when PRs 3 and 4 hit my (or anyone else's) downstream fork, but I think having them as separate PRs will make that easier. It would be possible to merge PR 3 (which will probably have some merge conflicts), then go add stable IDs to all the downstream-only diagnostics, then merge PR 4. |
Yep, agreed. I suspect changing
I think a better order: Add support for
Yes, downstreams are going to have some pain here :) Perhaps share the script you use for the explicit stable-id values/adding them would be a friendly thing to do, but that is the risk of being a downstream :) |
|
Separate bikeshed thread on what the right name for
My preference: Either |
OK, I like |
I slightly prefer |
That's enough of a tie-breaker for me. Let's go with I'll merge my tblgen changes into this PR today, and then I'll merge once the implementation has been reviewed. |
… diagnostics This change adds tblgen support for two new properties on `Diagnostic` objects: - `StableId<id>` - Explicitly specifies the stable ID for a diagnostic. If not specified, the stable ID defaults to the name of the diagnostic's enum member as before. The stable ID is what gets emitted as the `id` property of the diagnostic's `Rule` object in SARIF. - `LegacyStableIds<[id1, id2, ...]> - Specifies a list of stable IDs by which this diagnostic was previously known. These are emitted in the `deprecatedIds` property of the diagnostic's `Rule` object in SARIF. This property is used when the diagnostic's stable ID is renamed, when a diagnostic is split into multiple diagnostics, or when multiple diagnostics are merged into a single diagnostic. Most of the code changes are in tblgen, in the code that loads the new properties from their static data tables, and in the SARIF emitter code. There were some further mechanical changes wherever the `DIAG` macro was invoked, since it now has two new parameters. To test these changes, I found a warning that was recently renamed, and added an `LegacyStableIds` list with the previous name, which is what we would have to do for any renamed warning in the future. I then added a test that ensures that when this warning is emitted, the SARIF has the right `id` and `deprecatedIds` properties.
steakhal
left a comment
There was a problem hiding this comment.
Hey, I made it! I left a couple of tiny comments inline, but overall the PR looks very solid and well thought through! I was pleased to review this.
This PR adds support to `clang-tblgen` to enforce that all warnings and extensions have explicitly-specified stable IDs, based on [discussion](llvm#168153 (comment)) in a previous PR. The clang-tblgen option to control this is `-enforce-stable-ids=<error|warning|none>`. To control this as part of the Clang build, specify `CLANG_ENFORCE_STABLE_IDS=<error|warning>` to CMake. The default, for now, is to not enforce explicit stable IDs at all. I'll ramp this up to a warning in an upcoming PR, where I will also add stable IDs for all existing upstream warnings and extensions. We should give downstream forks some time to fix up their private diagnostics before we start enforcing explicit stable IDs with an error. Clang-tblgen will emit fix suggestions for missing stable IDs, both as fixits to the console and as a clang-tidy-style YAML file to a path specified by `-export-fixes=`. The fixes from the entire Clang build can be bulk-applied via clang-apply-replacements; clang-tblgen will emit an additional warning specifying the exact command to run. Re-using the existing YAML diagnostic/fixit support required a bit of minor refactoring. The `clang::tooling` library that contains that support also depends on parts of Clang itself, which would introduce a circular dependency if used in clang-tblgen. The cleanest way I could find to make this work was to ensure that the subset of the tooling library that is now needed in clang-tblgen does not `#include` any Clang headers. Those tooling headers _do_ still depend on some Clang types, but not for the subset of functionality used by clang-tblgen. So, I just used forward declarations of those Clang types, and as long as clang-tblgen doesn't call the affected functions, clang-tblgen will still compile. To make the consumption from clang-tblgen header-only, I had to move a few simple functions to be `inline` in the header as well. The actual changes to clang-tblgen are pretty straightforward, although I did have to plumb the support for YAML fixits through from the command line. Most of the remaining changes are adding explicit `llvm::` qualifications that are now needed because one of the Clang headers that is no longer being included must have had a stray `using namespace llvm;`.
c839c40 to
909afc3
Compare
steakhal
left a comment
There was a problem hiding this comment.
This looks good to me. Thanks for addressing my feedback.
@AaronBallman how to you feel about this?
(I am also affiliated by the same company as Dave; I think we usually wait for external approvals around here)
Aaron is away until the new year, but I think it is good idea to wait for him here. I did a quick review and don't see anything I'm concerned about, but he's much more familiar with Sarif/diagnostics. |
|
Sneak preview of the PR to add enforcement of explicit stable IDs, including auto-fixes. |
|
@AaronBallman, could you take a look at this, please? If I can land this PR, I've got several less controversial SARIF changes stacked up behind it. |
Whoops, this completely fell off my radar, thank you for the ping! I'll review shortly. |
| std::string DiagnosticIDs::getStableID(unsigned DiagID) const { | ||
| if (const StaticDiagInfoRec *Info = GetDiagInfo(DiagID)) | ||
| return Info->getStableID().str(); | ||
| assert(CustomDiagInfo && "Invalid CustomDiagInfo"); |
There was a problem hiding this comment.
Isn't this assert going to trigger on all custom diagnostics? (Same below.)
There was a problem hiding this comment.
For a custom diagnostic, CustomDiagInfo will always be non-null, so the assert should not trigger. See DiagnosticIDs::getDescription() immediately above (not changed in this PR) for an existing example handling the exact same case.
There was a problem hiding this comment.
I had a colleague double-check my logic here, and he concurred with my reasoning. I'm going to merge based on your conditional approval, but do ping me if there's something you want me to follow up on.
There was a problem hiding this comment.
Thanks for digging into that, I think I had my internal logic backwards. LG to land!
|
The only concern I had was regarding the behavior with custom diagnostics; so long as we don't trip the assert on every custom diagnostic, I think this is in good shape. |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/32553 Here is the relevant piece of the build log for the reference |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/45066 Here is the relevant piece of the build log for the reference |
This PR adds support to `clang-tblgen` to enforce that all warnings and extensions have explicitly-specified stable IDs, based on [discussion](llvm#168153 (comment)) in a previous PR. The clang-tblgen option to control this is `-enforce-stable-ids=<error|warning|none>`. To control this as part of the Clang build, specify `CLANG_ENFORCE_STABLE_IDS=<error|warning>` to CMake. The default, for now, is to not enforce explicit stable IDs at all. I'll ramp this up to a warning in an upcoming PR, where I will also add stable IDs for all existing upstream warnings and extensions. We should give downstream forks some time to fix up their private diagnostics before we start enforcing explicit stable IDs with an error. Clang-tblgen will emit fix suggestions for missing stable IDs, both as fixits to the console and as a clang-tidy-style YAML file to a path specified by `-export-fixes=`. The fixes from the entire Clang build can be bulk-applied via clang-apply-replacements; clang-tblgen will emit an additional warning specifying the exact command to run. Re-using the existing YAML diagnostic/fixit support required a bit of minor refactoring. The `clang::tooling` library that contains that support also depends on parts of Clang itself, which would introduce a circular dependency if used in clang-tblgen. The cleanest way I could find to make this work was to ensure that the subset of the tooling library that is now needed in clang-tblgen does not `#include` any Clang headers. Those tooling headers _do_ still depend on some Clang types, but not for the subset of functionality used by clang-tblgen. So, I just used forward declarations of those Clang types, and as long as clang-tblgen doesn't call the affected functions, clang-tblgen will still compile. To make the consumption from clang-tblgen header-only, I had to move a few simple functions to be `inline` in the header as well. The actual changes to clang-tblgen are pretty straightforward, although I did have to plumb the support for YAML fixits through from the command line. Most of the remaining changes are adding explicit `llvm::` qualifications that are now needed because one of the Clang headers that is no longer being included must have had a stray `using namespace llvm;`.

Part of the implementation of [RFC] Emitting Auditable SARIF Logs from Clang
SARIF diagnostics require that each rule have a stable
idproperty to identify that rule across runs, even when the compiler or analysis tool has changed. We were previously setting theidproperty to the numeric value of the enum value for that diagnostic within the Clang implementation; this value changes whenever an unrelated diagnostic is inserted or removed earlier in the list.This change sets the
idproperty to the text of that same enum value. This value would only change if someone renames the enum value for that diagnostic, which should happen much less frequently than renumbering.For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for
deprecatedIds, which let a rule specify the IDs by which it was previously known. This would let us rename, split, or combine diagnostics while still being able to correlate the new diagnostic IDs with older SARIF logs and/or suppressions.Nothing in this change affects how warnings are configured on the command line or in
#pragma clang diagnostic. Those still use warning groups, not the stable IDs.Potential discussion topics
From @AaronBallman on the RFC:
As a starting point, this PR proposes the following answers to those open questions:
diagtoolin the future if users find it relevant.