Version: 1.1
Date: 12.04.2022
Authors: Mario Scrocca (@marioscrock), Milan Straka (@bioticek)
The trias-extractor offer parser is a module of the Ride2Rail Offer Categorizer responsible for parsing offers from Trias files and for converting them to the offer cache schema that enables the categorization.
The parsed input format is mainly based on the Trias specification for a TripResponse message, but it also takes into account custom extensions (available in the extensions folder) developed by Shift2Rail IP4 projects, i.e., the Coactive extensions and the extensions defined for Ride2Rail.
The procedure implemented by the trias-extractor is composed of two main phases.
Parsing of the data required from the provided Trias file into an intermediate representation using in-memory objects. The procedures to parse the data are implemented in the extractor.py module. The intermediate object model used to represent the parsed data is defined in the model.py module.
The defined model reflects the offer cache schema:
- Request: id, start_time, end_time, start_point, end_point, cycling_dist_to_stop, walking_dist_to_stop, walking_speed, cycling_speed, driving_speed, max_transfers, expected_duration, via_locations, offers (dictionary of associated Offer objects)
- Offer: id, trip, bookable_total, complete_total, offer_items (dictionary of associated OfferItem objects)
- Trip: id, duration, start_time, end_time, num_interchanges, length, legs (dictionary of associated TripLeg objects)
- OfferItem: id, name, fares_authority_ref, fares_authority_text, price, leg_ids (list of ids of TripLeg objects covered by the OfferItem object)
- TripLeg: id, start_time, end_time, duration, leg_track, length, leg_stops, transportation_mode, travel_expert, attributes (dictionary of key-value pairs)
- TimedLeg(TripLeg): line, journey
- ContinuousLeg(TripLeg)
- RideSharingLeg(ContinuousLeg): driver, vehicle
Location and its subclasses (StopPoint, Address) are used to support the processing but are not serialized in the offer cache.
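The intermediate model can be pictured as plain Python classes. The sketch below is illustrative only: attribute names are taken from the list above, but the class bodies, defaults, and use of dataclasses are assumptions, not the actual model.py definitions (most attributes are omitted for brevity).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch of the model described above; the real model.py
# defines many more attributes and may be structured differently.

@dataclass
class TripLeg:
    id: str
    transportation_mode: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class TimedLeg(TripLeg):
    # TimedLeg extends TripLeg with line and journey
    line: str = ""
    journey: str = ""

@dataclass
class OfferItem:
    id: str
    price: int = 0
    leg_ids: List[str] = field(default_factory=list)

@dataclass
class Offer:
    id: str
    offer_items: Dict[str, OfferItem] = field(default_factory=dict)

@dataclass
class Request:
    id: str
    offers: Dict[str, Offer] = field(default_factory=dict)
```

Note how the containment mirrors the offer cache schema: a Request holds a dictionary of Offers, which hold OfferItems, which reference TripLegs by id.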
The parsing procedure is implemented through the following steps:
1. Parse the TripRequest data associated with the offers described in the Trias TripResponse, obtaining a model.Request object
2. Parse the TripResponseContext associated with the offers described in the Trias TripResponse, obtaining a list of model.Location objects
3. Parse all the Trias Trips and the associated TripLegs, obtaining a set of model.Trip objects referencing an ordered list of model.TripLegs
4. Parse the Trias Meta-Tickets associated with the different Trias Trips, obtaining a list of model.Offer objects referencing the associated model.Trip and bound to the model.Request object
5. Parse the Trias Tickets associated with each Meta-Ticket, obtaining a list of model.OfferItem objects associated with a model.Offer and with the model.TripLegs covered by the offer item
6. Parse the OfferItemContext for each Trias Ticket, obtaining a dictionary of key-value pairs bound to specific model.TripLegs associated with the model.OfferItem
Notes:
- Step 1: if not provided as a parameter, a UUID is automatically assigned to each request received by the trias-extractor and used as the id for the model.Request object
- Step 5: a model.Offer can be associated with no model.OfferItem if a purchase is not needed to perform the trip
- Step 6: if the OfferItemContext contains a composite key, the assumption is that it is composed as oic_key:leg_id and the parsed value should be associated only with the model.TripLeg having the provided leg_id. In all other cases, the parsed value is associated with all the model.TripLegs associated with the model.OfferItem. The information extracted from the OfferItemContext is merged with the Attributes parsed for each model.TripLeg.
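The composite-key rule from the Step 6 note can be sketched as a small helper. The function name and signature below are illustrative, not the actual extractor.py code:

```python
def assign_context_value(oic_key, value, leg_attributes, offer_item_leg_ids):
    """Attach an OfferItemContext value to TripLeg attribute dictionaries.

    leg_attributes maps leg_id -> dict of parsed attributes;
    offer_item_leg_ids lists the legs covered by the OfferItem.
    """
    key, sep, leg_id = oic_key.partition(":")
    if sep and leg_id in offer_item_leg_ids:
        # Composite key oic_key:leg_id -> only the matching TripLeg
        leg_attributes.setdefault(leg_id, {})[key] = value
    else:
        # Plain key -> every TripLeg covered by the OfferItem
        for lid in offer_item_leg_ids:
            leg_attributes.setdefault(lid, {})[oic_key] = value
    return leg_attributes
```

The returned dictionaries would then be merged with the Attributes already parsed for each leg.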
Storing of the data parsed by the trias-extractor to the offer cache. A dedicated procedure is defined in the writer.py module. The complete serialization is composed of queued commands in a pipeline that is executed as a single write to the offer cache.
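The pipelined write can be pictured as first building a queue of commands and then executing it once. The helper below is an illustrative sketch: the key layout and function name are assumptions, not the actual offer cache schema used by writer.py.

```python
def build_cache_commands(request_id, offers):
    """Build the queued commands for one pipelined write to the offer cache.

    offers maps offer_id -> dict of serialized fields. Returns a list of
    (command, key, payload) tuples; a writer would queue each tuple on a
    Redis pipeline and call execute() once, so the whole serialization
    reaches the cache as a single write.
    """
    commands = [("SADD", f"{request_id}:offers", sorted(offers))]
    for offer_id, fields in sorted(offers.items()):
        commands.append(("HSET", f"{request_id}:{offer_id}", fields))
    return commands
```

With redis-py, the same pattern is `pipe = r.pipeline()`, one queued call per tuple, then a single `pipe.execute()`.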
The trias-extractor component is implemented as a Python application using the Flask framework to expose the described procedure as a service. Each Trias file processed by the trias-extractor component is mapped to a Request object and then serialized in the offer cache.
Example request running the trias-extractor locally:

```shell
$ curl --header 'Content-Type: application/xml' \
       --request POST \
       --data-binary '@trias/$FILE_NAME' \
       http://localhost:5000/extract/?request_id=example_1_1
```

The request_id parameter in the URL serves testing purposes: it sets the request_id to an exact value. If omitted, a random request_id is generated.
Adding Trias requests to a trias folder in the repository root, the load.sh script can be used to automatically launch the trias-extractor service and the offer cache, and to process the files. The offer cache data are persisted in the ./data folder.
The request_id (the key to access the parsed data from the offer cache) is returned in the response as a field in a JSON body, together with the number of offers parsed. Example output:

```json
{
    "request_id": "581ec560-251e-4dbe-9e52-8f824bda5eb0",
    "num_offers": "15"
}
```

Error code 400 is returned if there is an error in the parsing procedure, code 500 if the request fails for any other reason.
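On the client side, the documented response contract can be handled with a small helper. The function below is hypothetical, shown only to illustrate the status codes and JSON fields described above:

```python
import json

def parse_extract_response(status_code, body):
    """Interpret a trias-extractor response per the documented contract.

    Returns (request_id, num_offers); raises on the documented error codes.
    """
    if status_code == 400:
        raise ValueError("error in the parsing procedure")
    if status_code == 500:
        raise RuntimeError("request failed for another reason")
    payload = json.loads(body)
    # request_id is the key used to read the parsed data from the offer cache
    return payload["request_id"], int(payload["num_offers"])
```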
The following parameters can be defined in the configuration file trias_extractor_service.conf.
Section cache:
- host - host address of the cache service that should be accessed
- port - port number of the cache service that should be accessed
The trias_extractor/config/codes.csv file can be modified to configure the parsing of the Attributes associated with the different TripLeg nodes and of the offer item context associated with the different Ticket nodes (offer items). The file defines the admissible keys (key column), the expected range of values (value_min and value_max columns, for numeric datatypes) and the datatype (type column; admissible values are string, int, float, date) used to execute a preliminary validation of each parsed value.
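The preliminary validation driven by codes.csv can be sketched as follows. The column names match the description above, but the loading and validation logic is an assumption, not the shipped code:

```python
import csv
import io
from datetime import date

def load_codes(csv_text):
    """Load admissible keys and their validation rules from codes.csv content."""
    return {row["key"]: row for row in csv.DictReader(io.StringIO(csv_text))}

def validate(codes, key, raw_value):
    """Return True if raw_value is admissible for key according to the rules."""
    rule = codes.get(key)
    if rule is None:
        return False  # key not among the admissible keys
    datatype = rule["type"]
    try:
        if datatype == "int":
            value = int(raw_value)
        elif datatype == "float":
            value = float(raw_value)
        elif datatype == "date":
            date.fromisoformat(raw_value)
            return True
        else:  # string: no range check
            return True
    except ValueError:
        return False
    # Range check applies only to numeric datatypes
    if rule.get("value_min") and value < float(rule["value_min"]):
        return False
    if rule.get("value_max") and value > float(rule["value_max"]):
        return False
    return True
```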
Different alternatives are provided to deploy the trias-extractor service.
Running it locally (assuming Redis is running at localhost:6379):

```shell
$ python3 trias_extractor_service.py
 * Serving Flask app "trias_extractor_service" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 441-842-797
```

Running on Docker (executes both the trias-extractor service and a Redis container):
```shell
$ docker-compose build
$ docker-compose up
```

Change the build section in the docker-compose file to use the Dockerfile.production configuration, which runs the Flask app on gunicorn, and remove the environment section.
```shell
$ docker-compose build
$ docker-compose up
```

Edit the Dockerfile.production file to set a different gunicorn configuration.
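gunicorn reads its configuration from a Python file. A hypothetical gunicorn.conf.py with illustrative values (not the shipped configuration) could look like:

```python
# Hypothetical gunicorn.conf.py; adjust values to the deployment environment.
bind = "0.0.0.0:5000"   # same port as the Flask development server
workers = 4             # number of worker processes
timeout = 120           # seconds before an unresponsive worker is restarted
```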