Showing posts with label Unit Testing. Show all posts

Saturday, 8 April 2023

Safe refactoring using Scientist



Refactoring is a critical yet often overlooked activity in the product development lifecycle. Despite its importance, teams tend to neglect it until they run into significant showstoppers during development. Several factors contribute to this neglect, including:


  • Pressure to meet product release dates
  • Concerns about production stability
  • Difficulty in writing high-quality unit and functional tests


When these roadblocks finally turn into showstoppers, teams often make a business case for refactoring, reengineering, or adopting new technology. There may be resistance, but the product team typically agrees, and the work is scheduled into the development sprints. A dedicated tech debt item in the sprint backlog is usually a clear sign that code health was sacrificed to business priorities and that it is now time to pay it back.

To put it in financial terms, failing to maintain code health is like missing an installment payment and incurring additional interest from your banker.

Integrated Development Environments (IDEs) have evolved to provide safe and efficient refactoring options such as renaming variables, extracting methods, simplifying branch conditions, inlining methods, and moving code. While these options are generally beneficial, some refactoring tasks are more complex and riskier. These tasks may involve changing the implementation of core components, such as modifying persistence, altering core algorithms, or adjusting underlying data structures to improve performance.

Common ways to validate such refactoring include feature flags and A/B testing.

While browsing GitHub, I came across Scientist, a library for verifying critical refactoring. It offers an intuitive approach to code verification built on three steps: experiment, observation and verification.




Let's take a look at some code snippets.


Experiment<Integer, Integer> experiment = new Experiment<>("Next Experiment");

experiment
.withControl("BitCount Using binary string", x ->
(int) Integer.toBinaryString(x)
.chars()
.filter(y -> y == '1')
.count()
);

experiment
.withCandidate("BitCount using native", x -> Integer.bitCount(x));

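experiment
.withParamGenerator(() -> 100)
// compare by value: == on boxed Integers only works while results stay in the small-value cache
.compareResult("bit length", (control, candidate) -> java.util.Objects.equals(control, candidate));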

experiment
.run()
.publish();

This library has several components, including:

  • Control function
  • Candidate function
  • Experiment parameters
  • Result comparator function
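To make the role of each component concrete, here is a minimal hand-rolled sketch in plain Java. It is not the Scientist library's API: MiniExperiment and its methods are hypothetical names chosen for illustration. It shows the core contract: run both implementations on the same generated input, compare the results with the comparator, record any mismatch (including a candidate that throws), and always hand back the control result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, hand-rolled sketch of an experiment; NOT the Scientist library's API.
final class MiniExperiment<P, R> {
    private final String name;
    private Function<P, R> control;                          // existing, trusted implementation
    private Function<P, R> candidate;                        // new implementation under test
    private Supplier<P> params;                              // produces the input for a run
    private BiPredicate<R, R> comparator = Objects::equals;  // default: value equality
    private final List<String> mismatches = new ArrayList<>();

    MiniExperiment(String name) { this.name = name; }

    MiniExperiment<P, R> control(Function<P, R> f) { this.control = f; return this; }
    MiniExperiment<P, R> candidate(Function<P, R> f) { this.candidate = f; return this; }
    MiniExperiment<P, R> params(Supplier<P> s) { this.params = s; return this; }
    MiniExperiment<P, R> compare(BiPredicate<R, R> c) { this.comparator = c; return this; }

    // Runs both implementations on the same input, records a mismatch when the
    // comparator rejects the pair, and always returns the control result.
    R run() {
        P input = params.get();
        R expected = control.apply(input);
        R actual;
        try {
            actual = candidate.apply(input);
        } catch (RuntimeException e) {
            mismatches.add(name + ": candidate threw " + e);
            return expected;
        }
        if (!comparator.test(expected, actual)) {
            mismatches.add(name + ": input=" + input + ", control=" + expected + ", candidate=" + actual);
        }
        return expected;
    }

    List<String> mismatches() { return mismatches; }
}
```

Returning the control result is the key design choice: the experiment can observe the candidate without ever changing what callers see.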

Once you specify these parameters, you can run the experiment. As you use the library for more complex problems, additional considerations arise, such as how many times the experiment should run, whether the runs should execute in parallel, and what timeouts to set.


experiment
.withControl("BitCount Using binary string", x ->
(int) Integer.toBinaryString(x)
.chars()
.filter(y -> y == '1')
.count()
);

experiment
.withCandidate("BitCount using native", x -> Integer.bitCount(x));

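experiment
.withParamGenerator(() -> 100)
// compare by value: == on boxed Integers only works while results stay in the small-value cache
.compareResult("bit length", (control, candidate) -> java.util.Objects.equals(control, candidate));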

experiment
.times(100)
.parallel()
.run()
.publish();
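Outside the library, the effect of repeating runs in parallel with a timeout can be approximated with plain java.util.concurrent. The sketch below is hypothetical (ParallelTrials and its parameters are illustrative, not how the library implements times() or parallel()): it submits each control/candidate comparison to a fixed thread pool, waits at most timeoutMillis for each result in turn, and counts matches, mismatches and timeouts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.IntFunction;

// Hypothetical sketch, not the library's times()/parallel(): repeat a comparison
// `times` times on a thread pool and bound how long we wait for each run.
final class ParallelTrials {
    static final class Outcome {
        final int matches, mismatches, timeouts;
        Outcome(int matches, int mismatches, int timeouts) {
            this.matches = matches; this.mismatches = mismatches; this.timeouts = timeouts;
        }
    }

    static <P, R> Outcome run(IntFunction<P> inputForRun, Function<P, R> control,
                              Function<P, R> candidate, int times, int threads, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> trials = new ArrayList<>();
            for (int i = 0; i < times; i++) {
                P input = inputForRun.apply(i);
                // each trial compares control and candidate on the same input
                trials.add(pool.submit(() -> Objects.equals(control.apply(input), candidate.apply(input))));
            }
            int matches = 0, mismatches = 0, timeouts = 0;
            for (Future<Boolean> trial : trials) {
                try {
                    if (trial.get(timeoutMillis, TimeUnit.MILLISECONDS)) matches++; else mismatches++;
                } catch (TimeoutException e) {
                    trial.cancel(true); // give up on this trial and interrupt it
                    timeouts++;
                } catch (ExecutionException e) {
                    mismatches++; // an exception from either side counts as a failed comparison
                }
            }
            return new Outcome(matches, mismatches, timeouts);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for trials", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Running trials in parallel only makes sense when both implementations are thread-safe and free of side effects.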


Other uses of Scientist

So far, we have discussed this library's potential for safe refactoring, but it can also run experiments alongside real production code. The candidate then runs under the same constraints as the current code (real inputs, real data, real load), which produces far more useful feedback.
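A minimal sketch of that idea, assuming a hypothetical ShadowCall wrapper: the caller always receives the control result, while the candidate runs on a separate Executor, and any mismatch or candidate exception is only reported through a BiConsumer that stands in for a logger or metrics sink.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of running a candidate in the shadow of production code.
final class ShadowCall<P, R> {
    private final Function<P, R> control;
    private final Function<P, R> candidate;
    private final BiConsumer<P, String> onMismatch; // e.g. a logger or metrics counter
    private final Executor shadowExecutor;          // keeps the candidate off the request thread

    ShadowCall(Function<P, R> control, Function<P, R> candidate,
               BiConsumer<P, String> onMismatch, Executor shadowExecutor) {
        this.control = control;
        this.candidate = candidate;
        this.onMismatch = onMismatch;
        this.shadowExecutor = shadowExecutor;
    }

    R call(P input) {
        R result = control.apply(input); // the caller only ever sees the control result
        CompletableFuture.runAsync(() -> {
            try {
                R shadow = candidate.apply(input);
                if (!Objects.equals(result, shadow)) {
                    onMismatch.accept(input, "control=" + result + ", candidate=" + shadow);
                }
            } catch (RuntimeException e) {
                // a failing candidate is reported, never propagated to the caller
                onMismatch.accept(input, "candidate threw " + e);
            }
        }, shadowExecutor);
        return result;
    }
}
```

This is only safe for candidates without side effects: a candidate that writes to a database or calls external services would duplicate those effects for every production request.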

Since we are discussing experiments, it would be wise to store the results in a database or another system that can keep a log of the experiments.
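As a sketch of what such a log might capture, the hypothetical ObservationLog below appends one tab-separated line per observation (timestamp, experiment name, input, both results, whether they matched, and both timings in nanoseconds) to any java.io.Writer. Replacing the Writer with a database insert is an assumption about your own setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.time.Instant;
import java.util.Objects;

// Hypothetical sketch: one tab-separated line per observation, written to any Writer.
final class ObservationLog {
    private final Writer out;

    ObservationLog(Writer out) { this.out = out; }

    // synchronized so observations from parallel runs don't interleave mid-line
    synchronized void record(String experiment, Object input, Object control, Object candidate,
                             long controlNanos, long candidateNanos) {
        boolean match = Objects.equals(control, candidate);
        String line = String.join("\t",
                Instant.now().toString(), experiment, String.valueOf(input),
                String.valueOf(control), String.valueOf(candidate), String.valueOf(match),
                Long.toString(controlNanos), Long.toString(candidateNanos));
        try {
            out.write(line);
            out.write('\n');
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```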

Because running these experiments is often costly, results from previous runs can be reused to verify new code.
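One way to reuse previous results, sketched under the assumption that control outputs were recorded as input/output pairs: replay the recorded inputs through the new candidate and compare against the stored outputs, so the expensive control never runs again. Replay.verify is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper: check a candidate against previously recorded control outputs.
final class Replay {
    // Returns a description of every input where the candidate disagrees with the
    // recorded control output; an empty map means the candidate reproduced them all.
    static <P, R> Map<P, String> verify(Map<P, R> recordedControl, Function<P, R> candidate) {
        Map<P, String> failures = new LinkedHashMap<>();
        recordedControl.forEach((input, expected) -> {
            R actual = candidate.apply(input);
            if (!Objects.equals(expected, actual)) {
                failures.put(input, "expected=" + expected + ", actual=" + actual);
            }
        });
        return failures;
    }
}
```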

Furthermore, this library can be used to test multiple variations of new logic and select the best one.
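A rough sketch of choosing between several variations: a candidate is disqualified as soon as it disagrees with the control on any input, and the fastest of the remaining ones wins. The names are hypothetical, and the System.nanoTime timing is deliberately crude (no warm-up, no protection against JIT effects); for real numbers a benchmarking harness such as JMH is a better fit.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: pick the fastest candidate among those that always agree with control.
final class CandidateRace {
    static <P, R> Optional<String> fastestCorrect(Function<P, R> control,
                                                  Map<String, Function<P, R>> candidates,
                                                  List<P> inputs, int rounds) {
        String best = null;
        long bestNanos = Long.MAX_VALUE;
        for (Map.Entry<String, Function<P, R>> entry : candidates.entrySet()) {
            Function<P, R> candidate = entry.getValue();
            // correctness first: one disagreement with control disqualifies the candidate
            boolean correct = inputs.stream()
                    .allMatch(p -> Objects.equals(control.apply(p), candidate.apply(p)));
            if (!correct) {
                continue;
            }
            // crude wall-clock timing over several rounds
            long start = System.nanoTime();
            for (int r = 0; r < rounds; r++) {
                for (P p : inputs) {
                    candidate.apply(p);
                }
            }
            long elapsed = System.nanoTime() - start;
            if (elapsed < bestNanos) {
                bestNanos = elapsed;
                best = entry.getKey();
            }
        }
        return Optional.ofNullable(best);
    }
}
```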


The code used in this post is available on GitHub.







Saturday, 7 November 2020

Private method or state testing in JVM

Unit testing private methods is not recommended, but sometimes a unit test needs to inspect the private state of an object. As a guideline we should avoid designs that require this, but sometimes, especially when using a framework or library, we are left with no other option.




I ran into one such case recently while writing unit tests around Spark DataFrames. One feature required caching a DataFrame/Dataset, and there was no easy way to verify whether the caching had actually happened. I didn't want to add another layer of abstraction over the Spark API just for this.

Here is the code snippet that needs to be tested:

sparkSession.read
.parquet("....")
.cache // Cache this DF/DS
.createOrReplaceGlobalTempView("mysupertable")

Java has long had strong support for metaprogramming, and reflection plays a big role in it. Reflection can help in this scenario.

sparkSession.sharedState.cacheManager maintains the details of cached tables.

CacheManager is an internal Spark class and may change in a new Spark version, so a test that relies on its internal details risks breaking. On the other hand, writing one gives a good view of the framework's internals.

Let's try to access the private state of CacheManager via Java reflection.

The code snippet below searches for a field whose name matches a specific pattern and makes it accessible.

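import java.lang.reflect.Field

// Reads a (possibly private) field of obj whose name ends with fieldName.
// Only fields declared on obj's own runtime class are searched.
def fieldValue[T](fieldName: String, obj: Any, cls: Class[T]): T = {
  val field: Field = obj.getClass.getDeclaredFields
    .find(_.getName.endsWith(fieldName)) // Scala may prefix field names, so match on the suffix
    .getOrElse(throw new NoSuchFieldException(fieldName))

  field.setAccessible(true) // lift the access check for private/protected members
  cls.cast(field.get(obj))  // checked cast using the class passed in
}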

setAccessible(true) is the key part: it is required to read private or protected members.
Note also that endsWith is used rather than an exact match, because the class under test is written in Scala and the Scala compiler can generate part of a field's name, so an exact match is not always possible. For Java classes, Java reflection's getDeclaredField accepts the exact field name instead.
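For Java code, the same helper can try the exact getDeclaredField lookup first and only fall back to the suffix match. The Java sketch below is a hypothetical counterpart (the Fields class is not from the original code); unlike the Scala version it also walks up the superclass chain, because getDeclaredField and getDeclaredFields only see fields declared on that exact class.

```java
import java.lang.reflect.Field;

// Hypothetical Java counterpart of the Scala fieldValue helper.
final class Fields {
    static <T> T fieldValue(String fieldName, Object obj, Class<T> type) {
        // declared-field lookups only see one class, so walk up the hierarchy
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            Field match = null;
            try {
                match = c.getDeclaredField(fieldName); // exact name: the normal case for Java classes
            } catch (NoSuchFieldException e) {
                for (Field f : c.getDeclaredFields()) {
                    if (f.getName().endsWith(fieldName)) { // compiler-generated prefixes, e.g. Scala
                        match = f;
                        break;
                    }
                }
            }
            if (match != null) {
                match.setAccessible(true); // required for private/protected members
                try {
                    return type.cast(match.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No field matching '" + fieldName + "' in " + obj.getClass().getName());
    }
}
```

A suffix match can pick the wrong field when two names share an ending, so keep the pattern specific. For primitive fields pass the wrapper type (for example Integer.class), because Field.get returns a boxed value.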

Once a member is marked accessible, the field can be both read and written.
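Reading and writing look like this in a minimal Java sketch; HitCounter is a hypothetical class under test whose hits field has no accessor.

```java
import java.lang.reflect.Field;

// Hypothetical class under test: `hits` is private and has no getter or setter.
final class HitCounter {
    private int hits;

    void hit() { hits++; }
}

final class HitCounterProbe {
    // getDeclaredField also finds private fields; setAccessible lifts the access check
    private static Field hitsField() {
        try {
            Field field = HitCounter.class.getDeclaredField("hits");
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    static int read(HitCounter counter) {
        try {
            return hitsField().getInt(counter);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    static void write(HitCounter counter, int value) {
        try {
            hitsField().setInt(counter, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```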

A sample call to this helper looks something like this:

fieldValue("cachedData", sparkSession().sharedState().cacheManager(), LinkedList.class)
A unit test that checks the caching could look like this:

LinkedList<?> cachedData = fieldValue("cachedData", sparkSession().sharedState().cacheManager(), LinkedList.class);
int before = cachedData.size();
loadVehicleTable();
int after = cachedData.size();
// JUnit's org.junit.Assert: exactly one new entry in the cache
Assert.assertEquals(before + 1, after);

Reflection is very powerful and comes in handy in such scenarios, but it trades away some type safety: methods and fields are referred to by String names, so a rename is only caught at run time. Still, it is better than not testing at all.
 
One more thing I discovered during this exercise: the JVM spends time checking field access on reflective calls, which adds overhead.
Turning off the access check with setAccessible(true) can also make the code faster.
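In practice that means resolving the Field and calling setAccessible(true) once, then reusing it for every read instead of repeating the lookup. The CachedFieldReader below is a hypothetical helper showing that pattern; how much it saves depends on the JVM version, so it is worth measuring before relying on it.

```java
import java.lang.reflect.Field;

// Hypothetical helper: resolve a private field and lift its access check once,
// then reuse the same Field for every read.
final class CachedFieldReader<T> {
    private final Field field;
    private final Class<T> type;

    CachedFieldReader(Class<?> owner, String fieldName, Class<T> type) {
        try {
            this.field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
        this.field.setAccessible(true); // access checks are suppressed for later get() calls
        this.type = type;
    }

    T read(Object target) {
        try {
            return type.cast(field.get(target));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```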

With this trick, you can now start testing private functions that you wished were public, or that were left public only so they could be tested.
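Private methods work the same way through getDeclaredMethod. In this sketch PriceCalculator and PrivateMethods are hypothetical. Method.invoke wraps any exception thrown by the target method in an InvocationTargetException, so the helper unwraps it and the test sees the original failure.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical class under test with a private helper we want to test directly.
final class PriceCalculator {
    private long applyDiscount(long cents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be 0..100: " + percent);
        }
        return cents - cents * percent / 100;
    }
}

final class PrivateMethods {
    static Object invoke(Object target, String name, Class<?>[] paramTypes, Object... args) {
        try {
            Method method = target.getClass().getDeclaredMethod(name, paramTypes);
            method.setAccessible(true); // required for private methods
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // unwrap so the test sees the method's own exception
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            throw new IllegalStateException(cause);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```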