# EMLab and Data Analysis (in R)
This page is structured in three parts: how to define the query files for analysis, how to do a single-run analysis using R, and how to do a batch-run analysis. Before starting, the file rConfig.R should be created in the rscript folder. For this, TEMPLATE_rConfig.R should be filled out (at least with the username, and possibly adjusted to the system configuration). It should be noted that the variable resultFolder should be set to the same folder as the variable LOCALRESULTFOLDER in scriptConfigurationsFile.cfg of the headless scripts.
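As a minimal sketch, a filled-out rConfig.R could look as follows. Only username and resultFolder are mentioned above; all values shown are placeholders, and the full set of variables should be taken from TEMPLATE_rConfig.R:

```r
# rConfig.R -- sketch based on the description above; all values are placeholders.
username <- "yourUserName"

# Must point to the same folder as LOCALRESULTFOLDER in
# scriptConfigurationsFile.cfg of the headless scripts:
resultFolder <- "/home/yourUserName/emlabResults/"
```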
The standard query file used for analysis lies in the emlab-generation subfolder and is named queries.properties; other query files can, however, be defined in the configuration scripts (scriptConfigurations.cfg and rConfig.R). In principle, two types of queries can be formulated (strictly speaking more, but the analysis scripts are only prepared for these two types): simple queries, which have a single result per tick (time value) but may nonetheless be composed of several values (e.g. the installed capacities of all technologies), and table queries, which have multiple results per tick (e.g. a table with one power plant per line and the values of these power plants in the columns). The basic layout of the configuration file is as follows (the double quotes " are mandatory):
```
"Query_Name1", "Query_Object1", "Query1",
"Query_Name2", "Query_Object2", "Query2",
```
Table queries need to be signified by a preceding TABLE_ in the Query_Name field. The Query_Object field is optional (it is explained in detail for the simple queries below), and the queries need to be written in Gremlin.
There are two types of simple queries: those with multiple values per query, and those with a single value. Those with multiple values have a result of the following format (the query needs to be formulated without the double square brackets!):
```
[[Key1, Value1], [Key2, Value2], [Key3, Value3], ...]
```
These can easily be formulated with the help of the Query_Object field. An example of a query of power plant capacities is given to demonstrate the principle:
```
"CapacityinMW", "PowerGeneratingTechnology", "capacity = v.in().filter{(it.dismantleTime > tick)
  && ((it.constructionStartTime + it.actualPermittime + it.actualLeadtime) <= tick)}.sum{it.actualNominalCapacity}
if(capacity == null) capacity = 0
[v.name, capacity]",
```
However, more complicated queries can be formulated without the Query_Object field, by manually initializing an empty list and adding the sub-results to the final result:

```
"FuelPricesPerGJ", "DecarbonizationModel", "fuels = g.idx('__types__')[[className:'emlab.gen.domain.technology.Substance']].filter{it.name != 'Electricity' && it.name != 'CO2'}
result = []
for(v in fuels){
  price = v.in('SUBSTANCE_MARKET').in('MARKET_POINT').filter{it.time == tick}.collect{it.price};
  density = v.energyDensity;
  inGJ = price[0] / density;
  result.add([v.name, inGJ]);}
return result",
```
Finally, simple queries with only one return value can be used by not returning a [[Key1, Value1], [Key2, Value2], [Key3, Value3], ...] list, but simply a single value:
```
"CO2Emissions_inTonpA", "", "plants = g.idx('__types__')[[className:'emlab.gen.domain.technology.PowerPlant']].filter{((it.constructionStartTime + it.actualPermittime + it.actualLeadtime) <= tick) && (it.dismantleTime > tick)};
emissions = plants.collect{f.calculateCO2Emissions(it, tick)}.sum();
return emissions;",
```
Table queries should give, in each time step, a table as result, which has the column names in the first line and the corresponding values for the queried objects in all following lines. The result should thus look like:
```
[[C1, C2, C3, ...], [Object1ValueForC1, Object1ValueForC2, Object1ValueForC3], [Object2ValueForC1, Object2ValueForC2, Object2ValueForC3], ...]
```
An example query, which lists the segment clearing points of the load duration curve, can thus be formulated as:

```
"TABLE_SegmentClearingPoints", "
points = g.idx('__types__')[[className:'emlab.gen.domain.market.electricity.SegmentClearingPoint']].propertyFilter('time', FilterPipe.Filter.EQUAL, tick)
finalResult = []
for(v in points){
  finalResult.add([v.tick, v.volume, v.price, v.out('MARKET_POINT').collect{it.name}[0],
    v.out('SEGMENT_POINT').collect{it.segmentID}[0], v.out('SEGMENT_POINT').collect{it.lengthInHours}[0]])
}
return finalResult;
",
```
For the analysis of single EMLab runs, two main scripts exist: singleRunAnalysis.R, which contains functions to create figures (and possibly statistics) for single runs of a model, and an example Sweave file (emlab-Report.Rnw in the subfolder exampleSweaveReport), which can be used to create compiled PDF documents of several analysis figures, either while running and controlling the model live, or using earlier saved run data to create new reports.
emlab-Report.Rnw contains three parts:
- Function definitions of LaTeX functions (defining how LaTeX elements, such as figures, should be created based on the function input).
- Function definitions of R functions that create new plots based on the data frame created during the simulation (in the example file createReportPdf) or after the simulation is finished. The cat function is used to call the LaTeX functions.
- The execution of the runSimulation function. When it is called with two parameters, it starts running the model, creates a report, and finally saves the simulation results to a file RUNNAME.RData (given that the model and AgentSpring are already waiting to be executed and the correct scenario is selected). When the runSimulation function is given the absolute path of an earlier RUNNAME.RData, a model report is created based on the earlier simulation results.
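The two modes of runSimulation described above could be called roughly as follows; the argument names are assumptions for illustration, and the actual signature should be taken from emlab-Report.Rnw:

```r
# Sketch only: argument names are illustrative, not the actual signature.

# Mode 1: run the model live (AgentSpring must already be waiting to execute
# the correct scenario), create the report, and save RUNNAME.RData:
# runSimulation("myRunName", secondParameter)

# Mode 2: recreate a report from an earlier, saved run:
# runSimulation("/absolute/path/to/RUNNAME.RData")
```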
To read in the simulation results of batch runs, functions from the script AgentSpringHeadlessReader.R can be used (it relies on Python scripts to read the data, or on already generated CSV tables; in case no Python is installed, the slower AgentSpringHeadlessReaderPureR.R can be used instead). Using these functions ensures that the data frame is correctly formatted for the subsequent scripts.
Functions that can be used to run a batch run analysis by generating statistics and figures are in the file batchRunAnalysis.R.
Examples of how to use these files together can be found in the file exampleBatchAnalysis.R. It is recommended to create dedicated analysis files for individual experiments.
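Put together, a batch analysis along the lines of exampleBatchAnalysis.R could be structured as in the following sketch; apart from the script names mentioned above, the paths and the commented-out function call are assumptions:

```r
# Sketch of a batch-run analysis workflow using the scripts named above.
source("rConfig.R")
source("AgentSpringHeadlessReader.R")  # or AgentSpringHeadlessReaderPureR.R without Python
source("batchRunAnalysis.R")

# Read the batch results into a correctly formatted data frame; the function
# name below is a placeholder -- see AgentSpringHeadlessReader.R for the real one.
# runData <- readBatchResults(resultFolder)

# Statistics and figures are then generated with the functions from
# batchRunAnalysis.R, as demonstrated in exampleBatchAnalysis.R.
```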
# Further reading
Good pages to start with are:
- http://www.statmethods.net/
- http://docs.ggplot2.org/current/
- If your organization has access: http://www.springer.com/statistics/computational+statistics/book/978-0-387-98140-6
- Searching http://www.rseek.org/