SBML4Humans GSoC Project Preparation (Week 2 Progress Update)
This project focuses on developing an interactive report interface for presenting SBML model information. This requires a solid understanding of the current SBML report and of the code that generates it. This week’s exercises were centred around understanding the reports module and its tests.
The second week of preparation taught me many new things related to SBML report generation and rendering of the report. I learnt about reading (parsing) SBML model documents, creating structured and formatted document trees from the parsed models, and presenting them in pretty-printed and HTML formats. During this period, I have studied the following resources:
Overview and test examples of the SBML report generation process from the documentation (https://sbmlutils.readthedocs.io/en/latest/notebooks/sbml_report.html)
Code of the reports module from the repository (https://github.com/matthiaskoenig/sbmlutils/tree/develop/src/sbmlutils/report)
Code of related modules and helper functions which assist in reading SBML documents and rendering of SBML reports (https://github.com/matthiaskoenig/sbmlutils/tree/develop/src/sbmlutils)
HTML and matplotlib-rendered reports for the examples given in the same repository
The prior study of the SBML specification and object definitions in the Week 1 exercises was very helpful. I was able to connect the code for creating and formatting different model objects to their definitions as given in the specification document (http://sbml.org/Special/specifications/sbml-level-3/version-2/core/release-2/sbml-level-3-version-2-release-2-core.pdf).
Studying the examples in the documentation gave me a good understanding of the report generation process by introducing the relevant functions (e.g. create_report) and their required arguments (e.g. SBML document paths, validation and promotion flags). Running the tests showed the workflow of report generation. Changing the code at different points (on a local branch of the repository) and re-running the tests revealed the breakpoints, warnings and errors caused by those changes. This exercise gave me an in-depth understanding of the function of each line in the code.
Next, I studied sample HTML reports of the example models given in the repository, using Google Chrome to render the HTML files. The pages showed me how different parts of the model (e.g. species, compartments, parameters, unit definitions) are rendered and presented in a well-structured, tabulated format. The report also provides the history of the model, its creator’s details and a textual overview of the model. The feature I personally found most useful is that the XML specification of each component is available in the report: clicking the button under a component’s name opens a pop-up dialog that displays the corresponding XML fragment.
(Model information and overview as rendered in the HTML report for BIOMD0000000012 SBML model)
(Snapshot of the tabulated reactions of the model and their attributes)
This was followed by a detailed study of the code of the reports module. It mainly consists of three interlinked files (formatting.py, sbmlfilters.py and sbmlreport.py). As its name suggests, formatting.py defines the display format of the contents of the different components. It defines functions that create HTML tags for annotations (e.g. span and anchor tags) with their attributes and values, and that merge those tags into a related block of HTML code. It also defines helper functions for date format strings, for converting MathML to string formats, and for displaying reaction equations according to the attributes of the reaction (e.g. reactants ⇒ products for an irreversible reaction, reactants ⇔ products for a reversible reaction), along with many other formatting helpers.
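To make the reaction equation formatting concrete, here is a minimal sketch in Python. It is not the actual code from formatting.py: the function name format_equation and the plain dictionaries used as input are my own assumptions, chosen only to show how the arrow can depend on the reversible attribute.

```python
from typing import Dict


def format_equation(
    reactants: Dict[str, float], products: Dict[str, float], reversible: bool
) -> str:
    """Build a display string for a reaction (hypothetical helper)."""

    def side(species: Dict[str, float]) -> str:
        terms = []
        for sid, stoichiometry in species.items():
            if stoichiometry == 1:
                # A stoichiometry of 1 is left implicit
                terms.append(sid)
            else:
                terms.append(f"{stoichiometry:g} {sid}")
        return " + ".join(terms)

    # Reversible reactions get a double-headed arrow
    arrow = "⇔" if reversible else "⇒"
    return f"{side(reactants)} {arrow} {side(products)}"


print(format_equation({"A": 1, "B": 2}, {"C": 1}, reversible=False))
# prints: A + 2 B ⇒ C
```

The real helper works on SBML reaction objects rather than dictionaries, but the idea of choosing the arrow from the reaction's attributes is the same.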
(A snapshot of the XML code of a species component shown in a pop-up dialog)
These formatting functions act as helpers at various stages in the main report creation file, sbmlreport.py. All report creation functions, such as create_report, _create_html and _create_index_html, are defined here. The file also defines functions that iterate over the components of the SBML model for which the report is generated and return them as dictionaries, with the attributes of each component as keys. Many of these functions use the formatting helpers to return results in the format required for the final HTML report. The compiled dictionaries are then passed to the HTML rendering engine, which in our case is Jinja. The engine uses the dictionaries as its context and places each component’s values at the appropriate positions in the HTML template by means of template tags. Essential arguments such as the template documents, SBML model files and report output location are provided to the create_report function as paths to these resources. The code checks that the specified paths exist and casts them to Path objects. Similar type checks are applied to other model components, and appropriate warnings and exceptions are raised in case of class mismatches.
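The flow described above (check the paths, collect the components as dictionaries, render a template from that context) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in string.Template as a stand-in for Jinja, and ensure_path, render_report and the example context are hypothetical names of my own, not functions from sbmlreport.py.

```python
from pathlib import Path
from string import Template
from typing import Any, Dict, Union


def ensure_path(path: Union[str, Path]) -> Path:
    """Cast the argument to a Path and check that it exists (hypothetical helper)."""
    if not isinstance(path, Path):
        path = Path(path)
    if not path.exists():
        raise IOError(f"Path does not exist: {path}")
    return path


# Stand-ins for the HTML templates; the real report uses Jinja templates
ROW = Template("<tr><td>$id</td><td>$name</td><td>$compartment</td></tr>")
PAGE = Template("<h1>$model_id</h1>\n<table>\n$rows\n</table>")


def render_report(context: Dict[str, Any]) -> str:
    """Fill the templates from a context of component dictionaries."""
    rows = "\n".join(ROW.substitute(item) for item in context["species"])
    return PAGE.substitute(model_id=context["model_id"], rows=rows)


# Example context: one dictionary per component, attributes as keys
context = {
    "model_id": "example_model",
    "species": [
        {"id": "glc", "name": "glucose", "compartment": "cyto"},
        {"id": "atp", "name": "ATP", "compartment": "cyto"},
    ],
}

output_dir = ensure_path(".")  # the current directory always exists
print(render_report(context))
```

Jinja offers much more than this stand-in (loops, filters and template inheritance inside the template itself), but the division of work is the same: Python code prepares the context, and the template decides where each value goes.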
I have also gone through the other modules and studied the function definitions in each. The factory module defines the general workflow for creating objects and their fragments from scratch; its functions create these fragments iteratively and assemble them into complete objects. The creation of these objects is triggered by the corresponding object creator functions defined in the modelcreator module. These interrelated modules illustrate the use of design patterns in the project and helped me understand the workflow of model and component creation. The equations, units, history and other related files helped me study the definitions of these components in detail, and I could relate the functionality of the code in these files to the corresponding definitions in the SBML specification.
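As a rough picture of this pattern (small object descriptions assembled into a complete model by a creator function), here is a simplified sketch. The class names CompartmentSpec and SpeciesSpec and the create_model function are hypothetical and written only for this post; they are not the actual API of the factory or modelcreator modules.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CompartmentSpec:
    """Hypothetical description of a compartment."""
    sid: str
    size: float


@dataclass
class SpeciesSpec:
    """Hypothetical description of a species."""
    sid: str
    compartment: str
    initial_concentration: float


def create_model(model_id: str, objects: List[Any]) -> Dict[str, Any]:
    """Assemble a model from a list of object descriptions."""
    model: Dict[str, Any] = {"id": model_id, "compartments": [], "species": []}
    for obj in objects:
        # Dispatch on the type of each description
        if isinstance(obj, CompartmentSpec):
            model["compartments"].append(obj)
        elif isinstance(obj, SpeciesSpec):
            model["species"].append(obj)
        else:
            raise TypeError(f"Unsupported object type: {type(obj).__name__}")
    return model


model = create_model(
    "example_model",
    [CompartmentSpec("cyto", 1.0), SpeciesSpec("glc", "cyto", 5.0)],
)
print(len(model["compartments"]), len(model["species"]))
```

In the real project the objects end up in an SBML document rather than a dictionary, but the separation between describing components and assembling them is what I wanted to illustrate.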
Running tests using pytest from the command line gave me more insight into the unit-testing features in this project. The different steps in the workflow of certain modules in the project were further highlighted by studying the tests and running them. Changing the code at different points in these tests and observing the corresponding outputs and errors, following the traceback calls and debugging them, was especially useful because it helped me to understand the functionality of each fragment of the code. This was a really helpful exercise towards understanding the codebase better.
I used the PyCharm Professional v20 IDE to browse the code repository on my local computer and to run tests and examples using pytest. The IDE also supports running Jupyter notebook files (.ipynb) and visualizing plots generated by matplotlib (e.g. src/sbmlutils/converters/xpp_example/results/112836_HH-ext.png was generated by an example run of the xppexamples module). This made browsing the repository and studying the generated reports very easy.
I will continue to revisit these modules to deepen my understanding of each of them. Detailed analysis of the codebase and unit tests in the coming weeks will give me an in-depth understanding of the project, which is about developing an interactive report interface.