NIFI-15448 - Add option for using predefined schemas in GenerateRecord #10752
exceptionfactory
left a comment
Thanks for proposing this improvement @pvillard31.
The concept of a predefined schema sounds helpful in some scenarios, but I'm somewhat concerned about the choice of various values. Particularly for certain things like stock symbols or cloud provider regions, there appear to be a number of specific choices in this implementation.
As an alternative, it seems like maintaining example schemas somewhere else, such as the Confluence Wiki pages, would avoid putting a lot of these particular choices in project code.
If there are other established generic types, that could be another approach.
exceptionfactory
left a comment
This also raises questions about field names themselves. Schema.org is one general place for community-based definition of many things, so that could be one potential pattern.
a8a9f49 to 517871b
Thanks for the review @exceptionfactory. I made some changes to:
I would definitely not go with the suggestion of keeping Avro schemas somewhere else (even on an additionalDetails page for the processor), because an Avro schema does not rely on Faker's providers. It only looks at the field types and would generate completely random values, which is really not great. Besides, the goal is to make it dead easy for users to generate some data. I find myself wasting a lot of time configuring this processor when I want to generate basic data for quick demos or tests. I think this option would be very helpful for users.
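To illustrate the difference described above, here is a minimal sketch in plain Java. It does not use NiFi or the Faker library; the class, the method names, and the small name list are hypothetical stand-ins. One method fills a string field knowing only its type, while the other binds the field to a provider-like source of realistic values.

```java
import java.util.List;
import java.util.Random;

public class GenerationSketch {
    // Hypothetical stand-in for a Faker-style provider of realistic values.
    static final List<String> GIVEN_NAMES = List.of("Mercedez", "Elouise", "Cinthia");

    // Type-driven generation: only the field type (string) is known,
    // so the value is an arbitrary run of lowercase letters.
    static String randomString(Random random, int length) {
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append((char) ('a' + random.nextInt(26)));
        }
        return builder.toString();
    }

    // Provider-driven generation: the field is tied to a semantic source,
    // so the value looks like a real given name.
    static String givenName(Random random) {
        return GIVEN_NAMES.get(random.nextInt(GIVEN_NAMES.size()));
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println("type-driven givenName:     " + randomString(random, 8));
        System.out.println("provider-driven givenName: " + givenName(random));
    }
}
```

Under these assumptions, the first line printed is unreadable filler, while the second is a plausible name, which is the gap a predefined schema is meant to close.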
exceptionfactory
left a comment
Thanks for going with the schema.org approach @pvillard31, it looks like a reasonable way forward. In addition to rebasing, I noted a couple minor recommendations for the test class, otherwise I think this is close to completion.
public void testPredefinedSchemaPerson() throws Exception {
    // Field names aligned with schema.org/Person and schema.org/PostalAddress
    testPredefinedSchema(PredefinedRecordSchema.PERSON, 5,
            "identifier", "givenName", "familyName", "email", "telephone", "birthDate", "age", "active", "address");
Maintaining this list of expected field names seems cumbersome. I think it is sufficient to avoid expecting any particular field names and do a simpler check.
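A simpler check along those lines might look like the following sketch. It is plain Java rather than the actual test class; `assertRecordsGenerated` and the in-memory records are hypothetical, and the helper only verifies the record count and that each record has at least one field.

```java
import java.util.List;
import java.util.Map;

public class SimplerCheckSketch {
    // Hypothetical helper: checks the number of records and that each record
    // carries some fields, without listing any expected field names.
    static void assertRecordsGenerated(List<Map<String, Object>> records, int expectedCount) {
        if (records.size() != expectedCount) {
            throw new AssertionError(
                    "Expected " + expectedCount + " records but found " + records.size());
        }
        for (Map<String, Object> record : records) {
            if (record.isEmpty()) {
                throw new AssertionError("Generated record has no fields");
            }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins for records read back from the processor output.
        Map<String, Object> first = Map.of("givenName", "Mercedez", "age", 70);
        Map<String, Object> second = Map.of("givenName", "Elouise", "age", 41);
        assertRecordsGenerated(List.of(first, second), 2);
        System.out.println("records verified");
    }
}
```

The trade-off is that a renamed field no longer breaks the test, at the cost of not catching a schema that silently loses a field.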
assertNotNull(content);
assertTrue(content.startsWith("["), "Content should be a JSON array");
This check for JSON array seems unnecessary since it is specific to the expected test behavior, and is effectively confirmed using the JsonTreeReader.
import org.apache.nifi.avro.AvroRecordSetWriter;
import org.apache.nifi.avro.AvroTypeUtil;
import org.apache.nifi.components.AllowableValue;
import org.apache.nifi.json.JsonRecordSetWriter;
I recommend avoiding the use of specific format types for testing, since GenerateRecord is agnostic to the particular format.
517871b to 344d665
Thanks @exceptionfactory - I pushed a commit to address your comments.
exceptionfactory
left a comment
Thanks @pvillard31! +1 merging
Summary
NIFI-15448 - Add option for using predefined schemas in GenerateRecord
Using the JSON Record Writer with specified patterns for the date, time, and timestamp formats:
{ "id" : "e86c234e-133d-48d7-a267-54c3a632d077", "firstName" : "Mercedez", "lastName" : "Rodriguez", "email" : "sheldon.dickens@gmail.com", "phoneNumber" : "(447) 857-5497", "dateOfBirth" : "11/01/2002", "age" : 70, "active" : false, "address" : { "street" : "28368 Cartwright Landing", "city" : "West Sunshine", "state" : "Vermont", "zipCode" : "18616", "country" : "Denmark" } }
{ "orderId" : "a19b9fbc-486c-4445-8c20-6fa0f45b88a0", "customerId" : "23105625-d197-4ad7-82ae-2b2e010e978c", "customerName" : "Elouise Nikolaus", "customerEmail" : "myron.schroeder@yahoo.com", "orderDate" : "04/17/2025", "orderTime" : "18:44:47", "orderTimestamp" : "04/17/2025 18:44:47", "totalAmount" : 4496.86, "currency" : "CAD", "status" : "SHIPPED", "shipped" : true, "itemCount" : 3, "lineItems" : [ { "productId" : "PRD-71258615", "productName" : "Heavy Duty Wooden Car", "quantity" : 7, "unitPrice" : 164.17 }, { "productId" : "PRD-63121751", "productName" : "Fantastic Paper Shoes", "quantity" : 9, "unitPrice" : 327.25 }, { "productId" : "PRD-42145137", "productName" : "Enormous Paper Table", "quantity" : 2, "unitPrice" : 201.21 } ] }
{ "eventId" : "dc8a0971-bce7-40b8-8a9c-af9a1d8ef18d", "eventType" : "WARNING", "eventDate" : "12/10/2025", "eventTime" : "17:38:01", "eventTimestamp" : "12/10/2025 17:38:01", "source" : "api-gateway", "severity" : "CRITICAL", "message" : "Illo quibusdam eligendi fugiat a quaerat eos laborum.", "processed" : true, "retryCount" : 0, "durationMs" : 4472, "tags" : [ "automated", "pending" ], "metadata" : { "environment" : "staging", "correlationId" : "85e7317e-6550-4d1b-a14d-646bcf4809c4", "region" : "eu-west-1", "version" : "1.5" } }
{ "sensorId" : "SNS-3385333161", "deviceType" : "MULTI", "manufacturer" : "EnviroMonitor", "readingTimestamp" : "01/09/2026 12:39:43", "temperature" : 39.71, "humidity" : 61.94, "pressure" : 1017.96, "batteryLevel" : 38, "signalStrength" : -46, "online" : false, "location" : { "latitude" : 55.63841, "longitude" : -134.873745, "altitude" : 336.13 } }
{ "productId" : "48083f6d-7fc7-4a90-8eca-3303d527e4d5", "sku" : "SKU-43443749", "name" : "Gorgeous Copper Car", "description" : "Voluptatibus nesciunt possimus totam nobis. Eum dicta deserunt. Expedita accusantium quisquam. Reiciendis porro modi officiis numquam necessitatibus. Repellat asperiores distinctio est velit.", "category" : "Jewelry & Tools", "brand" : "Zemlak and Sons", "price" : 1441.33, "currency" : "USD", "inStock" : true, "quantity" : 166, "rating" : 4.8, "reviewCount" : 3081, "createdDate" : "01/12/2025", "lastUpdated" : "01/03/2026 17:03:25", "tags" : [ "bestseller", "exclusive" ], "dimensions" : { "length" : 9.24, "width" : 68.33, "height" : 77.48, "weight" : 48.9 } }
{ "tradeId" : "d3400be3-e1b9-4dcb-ab2a-ef025750f49e", "symbol" : "GOOGL", "companyName" : "Alphabet Inc.", "exchange" : "NYSE", "tradeType" : "BUY", "tradeTimestamp" : "01/08/2026 18:57:43", "price" : 2341.3815, "quantity" : 6428, "totalValue" : 1.505040028E7, "currency" : "USD", "bidPrice" : 2339.0401, "askPrice" : 2343.7229, "high52Week" : 3512.0723, "low52Week" : 1404.8289, "marketCap" : 1227598881693, "settled" : false }
{ "id" : "4043aad4-1160-4aa2-a60e-49171c9a4d54", "active" : true, "score" : 44, "count" : 186350, "rating" : 4.99, "price" : 743.37, "balance" : 44918.92, "initial" : "M", "flags" : 77, "rank" : 824, "createdDate" : "03/31/2025", "lastLoginTime" : "21:26:28", "lastModified" : "01/05/2026 01:02:56", "tags" : [ "automated", "important", "verified", "pending" ], "scores" : [ 91, 93, 96 ], "metadata" : { "environment" : "staging", "source" : "web", "region" : "us-east-1", "version" : "1.5" }, "profile" : { "firstName" : "Cinthia", "lastName" : "Windler", "email" : "marcela.mante@yahoo.com", "age" : 79, "verified" : false, "address" : { "street" : "8130 Jenee Ford", "city" : "West Lewisport", "state" : "Indiana", "zipCode" : "15860", "country" : "French Southern Territories", "coordinates" : { "latitude" : -73.280023, "longitude" : -98.408473 } } }, "orders" : [ { "orderId" : "ORD-55940080", "amount" : 268.44, "currency" : "USD", "placed" : "12/14/2025", "shipped" : true } ] }
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Verified status
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation