IDEMS Response to the R-Instat review

An impactful review

R-Instat is the 9th point-and-click front end to R to have been reviewed by Bob Muenchen. His R-Instat review and the comparison of the 9 packages are now live. We are extremely grateful to Bob for the time he has taken to produce this review, for the resulting report and for the constructive feedback it gives. The review helped us recognise that R-Instat may already be of value to a wider audience than we had realised, and it will help us improve the software going forward.

For IDEMS specifically, the implications of this review are substantial. Previously, we focused our efforts on building the underlying structures in general, and the Climatic application area in particular. We also supported our African Data Initiative (ADI) colleagues with improvements for a general audience as their needs required, without pushing R-Instat towards what we would consider ‘ready’ for general release. David Stern has told colleagues that this “changes everything”, and we all agree that the review is “not too bad, all things considered”.

This response largely describes our resulting plans and change of focus for R-Instat. The most notable change is in our definition of what we would now consider a first production-ready release. Until now we had a clear plan for our 1.0 release, which we have been gradually working towards. This review has made us realise that this plan should instead be our 2.0 release, and that there is a simpler version which we would consider ‘ready’ for general production use.

The review was of R-Instat version 0.7.3.1. We will release an update soon (version 0.7.4), then work towards version 0.8 by mid-2022 and version 0.9 by the end of 2022. Version 0.9 represents a beta version of our new vision for a 1.0 release.

This is the most impactful change in direction in the R-Instat project since work started in 2015, which is saying something, since furthering R-Instat development was a motivating factor in the creation of IDEMS International in 2018. We attribute this change to the depth of insight Bob has contributed through his thorough, thoughtful and thought-provoking review.

Building on our strengths

The review highlights some of the strengths of R-Instat, some of which we were aware of, while others took us by surprise.

Climatic analysis

This was only mentioned in passing, but it is a considerable strength of the current version, which is already used for climatic analyses by staff in many National Meteorological Services (NMSs). Resources include a 280-page user guide and a series of 20 short videos covering data tidying, quality control and other topics.

This remains a key area of focus and has provided a consistent source of R-Instat funding through workshops and funded development. This is already set to continue and expand through 2022.

Import and Export

R-Instat’s strength in this area was noted. It is based on the excellent rio R package, which describes itself as a “swiss-army knife” for the task. Our import (and export) of spreadsheets, and of files from the three main statistics packages (SAS, SPSS and Stata), fits our view that users of other packages may wish to add R-Instat to their collection, rather than needing to choose a single package.

Links to databases are currently only available for specific applications. The review highlighted the importance of this area, so expanding database linkages is now planned for 2022.

Data Wrangling

R-Instat’s Prepare menu was noted as having one of the most comprehensive sets of data management tools of any GUI. We have focused on these features since R-Instat began, partly because of our experience working with people who often need the most support with the tidying and preparation steps of their data analysis. In particular, climatic data often requires extensive tidying, as noted in the Tidy Data paper, where the complex example was a climatic dataset.

Nevertheless, the review pinpointed topics for improvement. It highlighted how we could make handling and converting column types more natural from a user’s perspective. Similarly, repeated operations for large datasets have been on our agenda for a long time, and the review has given us direction on steps to add efficiency here.

Graphical Data Visualisation

We have put considerable effort into the system for graphics in R-Instat. Even so, we were pleasantly surprised that it was already rated so strongly in the review, given that we still have substantial work planned in this area. R-Instat uses ggplot2 graphics wherever possible, partly because we recognised the value of the Grammar of Graphics even before its R implementation. Hence, since the start, we have laboured to develop a comprehensive system around ggplot2 within R-Instat.

Our next steps include making map plotting available beyond the Climatic menu for general use, and developing a simple system for including summary and model results on graphs.

Modelling

The review suggests this area is potentially good for existing R users who do not model frequently enough to have the package::function information at their fingertips. We were surprised that this was seen as a positive, as we felt that modelling was an area of real weakness needing a large amount of work. We can now see that substantial improvements can be made with relatively little effort, and that larger tasks can be left for future releases.

Accessing complexity

The review points out that R-Instat is making some complex R tasks easier and more accessible. This includes producing complex ggplot2 graphs, and the often difficult data wrangling tasks.

This will remain a positive design choice of R-Instat, and we will continue to make complex R tasks accessible to more users. We accept that some of these tasks may remain out of reach for some users.

Supporting R users

The review emphasised that R-Instat may be better suited to users with some familiarity with, or interest in, R. We aim for R-Instat to be a stepping stone to R for those who would appreciate a gentler introduction to R that also enables them to be productive in their data analysis.

We plan to continue developing facilities to support this user group, including improvements to the output, script and log windows.

Owning our weaknesses

The review also highlights areas where R-Instat is currently weak. Again, we were aware of some of these areas, while others took us by surprise. Now that they have been pointed out, we can address them where possible, or accept them and mitigate their effects.

Welcoming new users

In R-Instat, all changes to the data are made in R and are recorded as R commands. This is an important feature of R-Instat: it enables a comprehensive log file to be kept, providing a complete record of everything that was done.

By default we show the R commands in the output (results) window. We have felt this does no harm, and it shows that R-Instat is not a “black box”: there is a logical reason for any result, or change in the data. However, we now recognise that, for some (many?) potential users, these commands are exactly why they don’t use R.

This is not at all welcoming for some new users. They plan to use just the “point-and-click” software, and these commands can be daunting from the beginning.

Currently, you can right-click in the results window to turn them off each time (or use the Tools > Options dialog to turn them off for the future). However, we will now change the default so that the commands are hidden, and encourage more experienced users to turn them on.

Furthermore, according to the reviewer, many users start learning a new package by typing in a small set of data. This uses the first dialogue in the first menu, namely File > New Data Frame. The default option in this dialogue demonstrates the power of R-Instat for those who are happy with R commands. For everyone else, however, it is pretty daunting.

Thanks to this review, we will now, through the course of 2022, be making a number of changes to the default options of R-Instat to make it more welcoming for such an audience.

Ease of use

We strongly believe in “making it easy”; however, the review has helped us accept that we are not currently going to be able to make R-Instat easy to use for some audiences. This would require us to embrace simplicity from the perspective of specific user audiences, which we had not fully considered before.

Some simplicity would inevitably come at the expense of the deeper functionality we have prioritised. We have made the conscious choice, on many occasions, to expose users to the incredible power of R. We admit that there would be great value in hiding some of this power for some audiences, and we aim eventually to offer multiple skins, each tailored to the needs of a specific audience. This is a large task which we intend to pursue once we are happy with the general release, or upon special request.

However, there are many smaller changes in behaviour that could make the user experience simpler. We can see how we have been neglecting some of these opportunities to embrace simplicity in our design, and we intend to work on this through the course of 2022.

In short, given we will not yet be hiding the complexity of the software, we accept that R-Instat will still potentially be too daunting for some, but hopefully the upcoming changes will increase the accessibility for many.

Handling of multiple data frames

The review has helped us recognise that users may find the handling of multiple data frames within dialogs confusing, particularly when the data frames have similar column names. Although this functionality ensures that dialogs re-open in the state in which they were closed, in some cases this is not what users expect and could lead to frustration.

There are no simple solutions to this problem, but it is now firmly on our radar as a feature to improve. We will keep it at the front of our minds as we design the improvements to the output window, the log window, access to the list of previous dialogs, and upcoming features for repopulating dialogs from a script, all of which could contribute to more natural navigation back to dialogs.

Group by

The review notes that R-Instat lacks “group-by” facilities. This is the facility to repeat analyses, or other tasks, for different levels of a factor, or for different variables. For example, with data from 3 sites you may wish to report on the results separately for each site. With a large number of groups, repeat use of a dialog is impractical and a “group by” facility is needed. 

This feature has been on our roadmap for some time, but the review highlighted its importance, and we will therefore prioritise implementing it in 2022.

Data selection

We were surprised that our data selection features in dialogs were seen as a weakness. This appears to be because our approach differs considerably from the way other GUI software works, and hence may feel unnatural for some users.

We appreciate these comments and suggestions for improvement. We recognise that we have relied too much on right-click functionality, and that we could add features more in line with other GUI software to make data selection more user-friendly.

Halfway dialogs

R-Instat includes a number of “halfway” dialogs, which allow users to enter an R command, e.g. to fit a model, with help from special keyboards that give access to common R commands.

The review noted that these keyboard groups and keys are named with the corresponding R packages and functions. Hence, for users who are not familiar with any R terminology, these dialogs will be difficult to use.

We aim to accompany these complex dialogs with simpler dialogs. However, the review has helped us realise that simply improving these halfway dialogs could already be impactful, and this is planned as a priority for 2022.

Modelling

R-Instat was noted as lacking Bayesian and machine learning capabilities. These are on our roadmap, with work starting this year. An even higher priority, as mentioned above, is to improve the main dialogues to simplify the model fitting that is currently only available through the halfway dialogues, which require some knowledge of R. The graphical facilities will also be enhanced to make it easier to display model results and other numerical summaries on the corresponding visualisations.

Managing projects

The review noted that we lack a single “save project” option, which saves all parts of the environment. In R-Instat it is possible to save each component of the work, namely data, results, log and script, as well as to export in a multitude of ways ready for other software. We prioritised this approach because we expected the need to share work beyond R-Instat to outweigh the need to reopen it within R-Instat. However, we now recognise that the lack of a single project file is a serious oversight, and that such a file would be a valuable feature for all users.

This will be a priority for 2022 which we intend to combine with the potential to author and manage multiple scripts simultaneously.

Publishable output

The lack of publishable output was a major criticism in the review. We recognise that we have neglected this important priority. In particular, the review highlighted the lack of well-formatted tables. There are many small changes that can be made in R-Instat to quickly improve the publishability of the output.

Steps to improve this are already underway, including a new output window that will be available in version 0.7.4, and embedded HTML tables shortly after. In fact, this will now be a top priority for improvement through the course of 2022, and we are quite excited by some of the changes we have planned.

Help

The review noted that a help system is in place in principle, rather than in practice. There is a help button on each dialogue, but currently much of the content is absent. This will be rectified in the updates this year to provide more comprehensive help for the entire software.

Tutorial documents are available on the R-Instat website for beginners getting started with R-Instat. Climatic users have more help, including the series of 20 short videos on climatic analyses. We plan to create corresponding videos for general use.

Final remarks

R-Instat has been a “labour of love” since 2015. It has been adopted by a number of climatic data users and has found a small niche user group involved in statistics capacity-building in Africa. We have always been working towards a version with a much wider appeal. This review has shown that we may already have a usable product for some, and that we are closer than we thought to being able to appeal to many.

Part of the motivation behind the creation of IDEMS and its partner, INNODEMS, was to support the development of R-Instat indefinitely and to create service-based support. This is already the case for the climatic area, where R-Instat generates modest funding through training and occasional support for specific products. As well as contributing to the development itself, IDEMS also supports INNODEMS, a Kenyan collaborator, enabling it to contribute substantially to the development of both the software and the support materials.

We now look forward to a hard-working year to produce a more polished product that could serve a much wider audience. We expect this work to be supported primarily as an IDEMS investment towards future R-Instat services, but we may investigate other avenues to support this development.

We thank Bob again for the spur to our forthcoming activities, through his most useful evaluation of the current product.