Programming/Style
Programming Standards
Introduction
Coding conventions are a set of guidelines for a specific programming language that recommend programming styles, practices, and methods for each aspect of a program written in that language.
Purpose
Coding standards are an often overlooked but critical part of software development. Following a consistent coding standard helps improve the quality of the overall software system. This page lays out the basic programming style set by the Viper HPC Team.
Production software should have the following attributes:
- Maintainability
- Dependability
- Efficiency
- Usability
Life Cycle
As with all types of design, there are several phases from inception to production (followed by maintenance through software updates).
Standard methodologies include waterfall, prototyping, iterative and incremental development, spiral development, agile software development, rapid application development, and extreme programming. Agile is the preferred style within the Viper team; see the [Agile wikipedia].
- The default software repository host is GitHub or GitLab (the two are essentially the same in use). How to use them is described here: using GitHub.
Requirements
This stage sets out the requirements of the final production software: what you require of it, from the development process through to the production stage.
Architecture
Software architecture is concerned with deciding what has to be done, and which program component is going to do it (how something is done is left to the detailed design phase, below). This is particularly important when a software system contains more than one program since it effectively defines the interface between these various programs. It should include some consideration of any user interfaces.
Design
This is easily the most important stage of the full life cycle, and decisions made here will have ramifications for the whole software. The main purpose of design is to fill in the details within the architectural design.
Design requires a good conceptual model of the specification requirement. Without a good model, we risk missing the goal of the software and vastly increasing the time taken to bring the project to fruition.
Programming Language Choice
Viper provides a large number of different programming languages and can be extended to include software not covered within this wiki.
The usual rule is to use the programming language best suited to the task in hand.
Coding Standards
Code Layout
You should indent your code, and you should keep your indentation style consistent.
Which style you choose is only a matter of preference. There is no "best" style that everyone should follow; the best style is a consistent style. If you are part of a team or if you are contributing code to a project, you should follow the existing style that is being used in that project.
The indentation styles are not always completely distinct from one another. Sometimes, they mix different rules. For example, in PEAR Coding Standards, the opening brace "{" goes on the same line as control structures, but on the next line after function definitions.
Here is an example of some research code that compiles and works, but is very hard for anyone else to read, modify, or scale [1].
    #include <stdio.h>
    int main(void) {
    int seg[10] = {6,2,5,5,4,5,6,3,7,6};
    int d1, d2, d3, d4, m=0, td, ts;
    for (d1=0; d1<2; d1++)
    for (d2=0; d2<10; d2++)
    for (d3=0; d3<6; d3++)
    for (d4=0; d4<10; d4++)
    if ((!((d1==0)&&(d2==0))) && (!((d1==1)&&(d2>2)))) {
    if (d1==0) {
    ts = seg[d2] + seg[d3] + seg[d4];
    td = d2 + d3 + d4;
    if (ts == td) { m++; printf(" %1d:%1d%1d\n",d2,d3,d4); }
    } else {
    ts = seg[d1] + seg[d2] + seg[d3] + seg[d4];
    td = d1 + d2 + d3 + d4;
    if (ts == td) { m++; printf("%1d%1d:%1d%1d\n",d1,d2,d3,d4); }
    }
    }
    return 0;
    }
Rationale: This is why a set of programming style recommendations like these is essential; it significantly cuts down the time spent on developing and maintaining software.
Source Code Templates
Here is a source code skeleton in Perl; the same layout can be adapted for many other languages:
    #!/usr/bin/env perl
    #
    # VIPER - HPC
    # IT Department
    # Hull University
    #
    # Written by DBird (email d.bird@hull.ac.uk)
    #
    # Initial DBird 1.0 2018-02-23 - Initial release
    #
    # ---------------- Modules ------------------------------------------

    use strict;      # perform error checking
    use warnings;    # report questionable constructs

    # ---------------- Variables ----------------------------------------

    # ---------------- Constants ----------------------------------------

    # -------------------------------------------------------------------
    # ---------------- Subroutines --------------------------------------
    # -------------------------------------------------------------------

    # ---------------- errorTrap ----------------------------------------

    sub errorTrap
    {
        my ($errMess) = @_;

        print "An error occurred\n";
        print "Reason : $errMess\n\n";

        exit 1;      # non-zero status signals failure to the caller
    }

    # -------------------------------------------------------------------
    # ---------------- Main Program -------------------------------------
    # -------------------------------------------------------------------

    # ---------------- Initialisations ----------------------------------

    # ---------------- finish up and exit -------------------------------

    exit;
Versioning
The method of versioning software is as follows:
<Major version number>.<Minor version number>.<Patch number>, for example 2.3.1.
- Major version numbers change whenever there is some significant change being introduced.
- Minor version numbers change when a new, minor feature is introduced or when a set of smaller features is rolled out.
- Patch numbers change when a new build of the software is released. This is normally for small bug fixes or the like.
- The major version number always starts at 1; a version starting with 0 indicates a pre-release that is not of production quality.
- Version numbers allow people providing support to ascertain exactly which code a user is running.
- For large projects which involve more than one source file, Git is recommended as the versioning tool.
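The numbering scheme above can be sketched in a few lines of Python. This is only an illustration: the names Version, parse_version and bump are invented for this example, and resetting the lower-order numbers to zero when a higher one changes is a common convention assumed here rather than something this page specifies.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A <major>.<minor>.<patch> version number."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

    def is_prerelease(self) -> bool:
        # A major version of 0 marks a pre-release build.
        return self.major == 0


def parse_version(text: str) -> Version:
    """Parse a string such as "2.3.1" into a Version."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected <major>.<minor>.<patch>, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def bump(version: Version, part: str) -> Version:
    """Return the next version (assumption: lower-order numbers reset to 0)."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    current = parse_version("2.3.1")
    print(bump(current, "patch"))   # 2.3.2
    print(bump(current, "minor"))   # 2.4.0
    print(bump(current, "major"))   # 3.0.0
```

Keeping the version as three integers rather than a string means versions compare correctly (2.10.0 sorts after 2.9.0), which helps support staff work out which build is newer.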
Indentation
Indenting code allows us to see the structure within code blocks and gives an obvious pattern to that structure. An example is shown below:
    if(true){
    doThis();
    } else {
    doThat();
    }
is far less readable than
    if(true){
        doThis();
    } else {
        doThat();
    }
- Note: in Python, indentation is essential and forms part of the source code structure.
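To illustrate the note about Python, here is a small sketch in which the two functions contain exactly the same statements and differ only in the indentation of one line. The function names are made up for this example.

```python
def sum_positive(values):
    """Add up the positive numbers in values."""
    total = 0
    for value in values:
        if value > 0:
            total += value   # runs only for positive values
    return total             # runs once, after the loop has finished


def sum_positive_misindented(values):
    """Same statements, but the return is indented one level too far."""
    total = 0
    for value in values:
        if value > 0:
            total += value
        return total         # now inside the loop: exits after the first value


if __name__ == "__main__":
    print(sum_positive([3, -1, 4]))               # 7
    print(sum_positive_misindented([3, -1, 4]))   # 3
```

Both versions run without any error message, which is why a consistent indentation style matters even more in Python than in brace-delimited languages.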
Commenting
Comments are essential not only to the original author but to maintainers of the code too, and should clearly explain what the code does.
Comments should also be uncomplicated. The purpose of commenting is to help the person presently reading the source code.
- Avoid obvious comments - commenting on your code is essential in terms of maintainability; however, it can be overdone or just plain redundant.
- Code grouping - more often than not, certain tasks require a few lines of code. It is a good idea to keep these tasks within separate blocks of code, with some spaces between them.
- Keep comments short and simple.
- Comments should not be on every line.
- Loops (e.g. while, for, ...) and variable assignments do not require commenting.
- Comments should be written in English, which is the most commonly used language among developers today.
Where to Comment:
- Reference software tickets from our internal system and any others from suppliers as well.
- The top of any program file - This is called the "Header Comment". It should include all the defining information about who wrote the code, why, when, and what it should do.
- Above every function - This is called the function header and provides information about the purpose of this "sub-component" of the program.
- Inline - within the code where a description is necessary and non-trivial.
- According to the Ada coding standard (2005), "Programmers should include comments whenever it is difficult to understand the code without the comments"
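The three comment locations listed above (header comment, function header, inline) might look like this in Python. The file name, the function, and the bracketed placeholders are hypothetical and are there only to show where each kind of comment goes.

```python
# ---------------------------------------------------------------
# mean_runtime.py  (hypothetical file name)
#
# Purpose : Report the mean runtime of a set of jobs.
# Author  : <author name>
# Created : <date>
# Ticket  : <internal ticket reference>
# ---------------------------------------------------------------


def mean_runtime(runtimes_seconds):
    """Return the mean of the given runtimes, in seconds.

    Jobs that never started are recorded with a negative runtime
    and are left out of the mean. Returns 0.0 if nothing remains.
    """
    # Negative values are sentinel entries, not real measurements.
    valid = [r for r in runtimes_seconds if r >= 0]

    if not valid:
        return 0.0

    return sum(valid) / len(valid)


if __name__ == "__main__":
    print(mean_runtime([10.0, 20.0, -1.0]))   # 15.0
```

The only inline comment explains something a reader could not work out from the code alone (why negative values are dropped); the loop and the arithmetic are left uncommented.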
Where to Not Comment:
Where the code is obvious! For example, if you make a comment like the following, it will not be of much help:
    ! loop from 1 to 10
    do i=1,10
Naming Conventions
All constants within your program should be uppercase throughout:
    my $VERSION = "1.1.0";
All variables should start with a lowercase letter:
    my $counter = 0;
There are two popular options with more descriptive (and therefore longer variable names):
- camelCase: The first letter of each word is capitalized, except the first word.
- under_scores: Underscores between words, like: pgsql_connect_dbase().
The option selected here must be consistent across the program or suite of programs.
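The same conventions carry over to other languages. The sketch below applies them in Python, using the under_scores style throughout; MAX_RETRIES, connect_with_retries and attempt_connection are names invented for this illustration.

```python
# Constant: uppercase throughout.
MAX_RETRIES = 3

# Simple variable: starts with a lowercase letter.
counter = 0


# Longer, descriptive names: one style (under_scores) used consistently.
def connect_with_retries(attempt_connection):
    """Call attempt_connection until it succeeds or MAX_RETRIES is reached."""
    failed_attempts = 0
    while failed_attempts < MAX_RETRIES:
        if attempt_connection():
            return True
        failed_attempts += 1
    return False
```

Mixing failedAttempts and failed_attempts in the same program would still run, but it forces every reader to remember which spelling applies where.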
Code Units
The code that a programmer writes should be simple. This can be achieved by:
- Using subroutines that describe primitive actions (e.g. calculateMean() ).
- Avoiding complex logic.
- Commenting non-trivial areas.
Complicated logic for achieving a simple thing should be kept to a minimum since the code might be modified by another programmer in the future.
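As a rough illustration of building a program from primitive subroutines, the sketch below composes two small functions into a third. The function names follow Python's usual under_scores spelling of the calculateMean() example; calculate_variance and summarise are additions made up for this example.

```python
def calculate_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def calculate_variance(values):
    """Return the population variance of a non-empty sequence of numbers."""
    mean = calculate_mean(values)
    return calculate_mean([(v - mean) ** 2 for v in values])


def summarise(values):
    """Combine the primitive steps into one readable top-level routine."""
    return {
        "mean": calculate_mean(values),
        "variance": calculate_variance(values),
    }


if __name__ == "__main__":
    print(summarise([2, 4, 4, 4, 5, 5, 7, 9]))
```

Each function does one thing and can be read, tested, and replaced on its own, so a future maintainer does not have to untangle one long block of logic.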
Error Checking
It is very easy to fall into the trap of assuming that your code will follow the exact 'golden path' you expect it to; in practice this does not happen with real users and real data.
- Try to write tests that cover as much of your code as possible, particularly at the module level.
Special areas of interest in error catching are the following:
- User Input which doesn't correspond to expected types or values.
- Data files where the data is outside the expected range of values.
- Out-of-bounds array accesses; some languages will not detect these automatically and the program may fail completely.
- Division by zero, square root of a negative number, or logarithm of zero or a negative real number.
Ways of mitigating this would be:
- Use a common errorTrap subroutine to capture errors from a number of different program error events.
Do not put checks where they are unnecessary, since they slow down execution; this matters particularly in HPC, where speed is a high priority.
For example [2]:
    real :: x
    x = sin(y) + 1.0
    if (x >= 0.0) then
        z = sqrt(x)
    end if
Note: in this example x can never be negative (sin(y) is never less than -1), so there is no reason to test for it.
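Where a check is genuinely needed, the areas listed earlier (unexpected user input, division by zero, square roots of negative numbers) can be guarded and routed through one common error routine. The following is a minimal Python sketch of that idea; error_trap mirrors the errorTrap subroutine in the Perl template, while parse_core_count, safe_ratio and safe_sqrt are hypothetical helpers written for this example.

```python
import math
import sys


def error_trap(message):
    """Report an error in one consistent format and stop the program."""
    print("An error occurred", file=sys.stderr)
    print(f"Reason : {message}", file=sys.stderr)
    sys.exit(1)


def parse_core_count(text):
    """Convert user input to a positive integer, or raise ValueError."""
    try:
        cores = int(text)
    except ValueError:
        raise ValueError(f"core count is not a whole number: {text!r}") from None
    if cores < 1:
        raise ValueError(f"core count must be at least 1, got {cores}")
    return cores


def safe_ratio(numerator, denominator):
    """Divide, rejecting a zero denominator with a clear message."""
    if denominator == 0:
        raise ValueError("division by zero")
    return numerator / denominator


def safe_sqrt(x):
    """Take a square root, rejecting negative input."""
    if x < 0:
        raise ValueError(f"square root of a negative number: {x}")
    return math.sqrt(x)


if __name__ == "__main__":
    try:
        cores = parse_core_count(sys.argv[1] if len(sys.argv) > 1 else "4")
        print(safe_ratio(64, cores))
    except ValueError as error:
        # Every failure ends up in the same place.
        error_trap(str(error))
```

The checks sit at the boundary where untrusted data enters the program, so inner numerical loops can stay free of redundant tests.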
Portability
The software should be capable of being compiled or interpreted on a number of different systems, not just Viper's architecture. If this cannot be achieved (e.g. because of some reliance on a certain architecture), then the program should be configurable so that different systems can be used easily.
An example is shown below:
    my $ARCHITECTURE = "ARM";
    my $NUMBER_CORES = 64;
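One possible refinement, sketched here in Python, is to detect these values at run time and allow them to be overridden, rather than editing the source for each system. The environment variable names VIPER_ARCHITECTURE and VIPER_NUMBER_CORES and the function detect_system are assumptions made for this example, not existing Viper settings.

```python
import os
import platform


def detect_system():
    """Return the architecture and core count, allowing overrides.

    VIPER_ARCHITECTURE and VIPER_NUMBER_CORES are hypothetical
    environment variable names chosen for this sketch.
    """
    # Use the override if one is set, otherwise ask the operating system.
    architecture = os.environ.get("VIPER_ARCHITECTURE") or platform.machine()

    cores_text = os.environ.get("VIPER_NUMBER_CORES")
    if cores_text:
        number_cores = int(cores_text)
    else:
        # os.cpu_count() can return None, so fall back to 1.
        number_cores = os.cpu_count() or 1

    return {"architecture": architecture, "number_cores": number_cores}


if __name__ == "__main__":
    print(detect_system())
```

The same program can then run unchanged on a laptop and on a cluster node, with the override available when detection gives the wrong answer (for example inside a scheduler allocation that grants fewer cores than the node has).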
Code Development
Code Building
A best practice for building code involves daily builds and testing, or better still continuous integration, or even continuous delivery.
Testing
Testing is an integral part of software development that needs to be planned. It is also important that testing is done proactively; meaning that test cases are planned before coding starts, and test cases are developed while the application is being designed and coded.
If the work is a modification of existing code, then appropriate regression testing, covering all the parts affected by the change, must be carried out before re-deployment.
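A minimal sketch of what planned test cases can look like, using Python's standard unittest module. The clamp function is a stand-in invented for this example; the point is that each expected behaviour is written down as a separate, named test that can be re-run as a regression check after every change.

```python
import unittest


def clamp(value, lower, upper):
    """Limit value to the range [lower, upper]."""
    return max(lower, min(value, upper))


class ClampTests(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_lower_bound(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_upper_bound(self):
        self.assertEqual(clamp(42, 0, 10), 10)


if __name__ == "__main__":
    unittest.main()
```

Running the file executes all three tests; the same tests can be collected automatically by a continuous integration job.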
Debugging
Programmers tend to write the complete code and only then begin debugging and checking for errors. Though this approach can save time in smaller projects, bigger and more complex ones tend to have too many variables and functions that need attention. It is therefore better to debug each module as you finish it, rather than the entire program at the end. This saves time in the long run, because less time is wasted working out where a fault lies. Unit tests for individual modules, and/or functional tests for services and applications, can help with this.
In the absence of a debugger, print statements can be useful, although using logging or sys.stdout (Python) is a better approach. These can be enabled with a DEBUG flag, which prevents orphaned print statements within your program and also allows the same output to be used during maintenance.
In Python, the built-in debugger can be started at a chosen point in the program with:
    import pdb; pdb.set_trace()
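The DEBUG flag approach can be sketched with Python's standard logging module. Reading the flag from an environment variable called DEBUG is an assumption made for this example, and running_total is a placeholder function.

```python
import logging
import os

# Assumption for this sketch: the flag comes from an environment
# variable named DEBUG, e.g. `DEBUG=1 python script.py`.
DEBUG = os.environ.get("DEBUG", "0") == "1"

logging.basicConfig(
    level=logging.DEBUG if DEBUG else logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)


def running_total(values):
    """Sum the values, logging each step only when DEBUG is enabled."""
    total = 0
    for value in values:
        total += value
        logger.debug("added %s, total is now %s", value, total)
    logger.info("processed %d values", len(values))
    return total


if __name__ == "__main__":
    print(running_total([1, 2, 3]))
```

The debug lines stay in the source permanently but produce no output unless the flag is set, so there is nothing to strip out before release and the same messages are available again during maintenance.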
Documentation
A very useful tool here is Doxygen, which can create HTML- or LaTeX-based documentation. It relies on code commenting to work fully, so the earlier points on commenting are very important here. Another possible choice, for Fortran code, is FORD.
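As a rough sketch of the kind of comment Doxygen works from: for Python sources, Doxygen recognises comment blocks that begin with `##` and supports special commands such as @brief, @param and @return inside them. The function below is a made-up example, and the Doxygen configuration needed to process it is not shown.

```python
## @brief Convert a temperature from degrees Celsius to kelvin.
#
#  @param celsius  Temperature in degrees Celsius.
#  @return         The same temperature in kelvin.
def celsius_to_kelvin(celsius):
    return celsius + 273.15


if __name__ == "__main__":
    print(celsius_to_kelvin(25.0))
```

Because the documentation is generated from these blocks, a function header that is missing or out of date in the source is also missing or wrong in the published documentation.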
Deployment
This involves deployment to Viper's cluster and possibly to other HPC systems (see Portability above).
Development do nots
- Do not develop code that neither you nor anyone else can maintain.
- Do not make your software difficult to build and install - this comes back to maintainability again.
- Do not keep the source code to yourself - GitHub is an excellent, de facto standard platform for hosting and sharing version-controlled code.
- Do not forget documentation - You know how it works; make this knowledge available to others who see your software for the first time.
- Do not overlook testing.
References
- [1] - C code from https://www2.cs.arizona.edu/~mccann/style_c.html
- [2] - Fortran code example from https://www.tutorialspoint.com/fortran/index.htm
Next Steps
- https://en.wikipedia.org/wiki/Comment_(computer_programming)
- https://en.wikipedia.org/wiki/Indent_style
- https://en.wikipedia.org/wiki/Characters_per_line
- https://en.wikipedia.org/wiki/Naming_convention_(programming)
- https://en.wikipedia.org/wiki/Programming_style
- python debugger
- Git versioning tool
- www.software.ac.uk
- Doxygen tool