Saturday 7 November 2020

Private method or state testing in JVM

Unit testing private methods is not recommended, but sometimes a unit test needs to inspect the private state of an object. As a guideline we should avoid designs that require this, but sometimes, especially when using a framework or library, we are left with no other option.




I ran into one such case recently while writing unit tests around Spark DataFrames. One of the features required caching a DataFrame/Dataset, and there was no easy way to verify whether the caching had actually been done. I didn't want to add another layer of abstraction over the Spark API just for this.

Code snippet that needs to be tested. 

sparkSession.read
  .parquet("....")
  .cache // Cache this DataFrame/Dataset
  .createOrReplaceGlobalTempView("mysupertable")

Java has long had strong support for metaprogramming, and reflection plays a big role in it. Reflection can be used in a scenario like this.
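As a minimal illustration of what reflection exposes (plain Java, no Spark involved; the Account and ListFields classes are hypothetical), the sketch below lists every field a class declares, including private ones that ordinary code cannot reference:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class with one private and one public field.
class Account {
    private long balance = 100;
    public String owner = "alice";
}

class ListFields {
    // Returns "modifiers type name" for every field declared by cls,
    // including private fields that normal code cannot access.
    static List<String> describe(Class<?> cls) {
        List<String> out = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            out.add(Modifier.toString(f.getModifiers()) + " "
                    + f.getType().getSimpleName() + " " + f.getName());
        }
        return out;
    }

    public static void main(String[] args) {
        describe(Account.class).forEach(System.out::println);
    }
}
```

The Javadoc for getDeclaredFields does not promise any particular order, so code should look fields up by name rather than by position.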

sparkSession.sharedState.cacheManager maintains the details of cached tables.

CacheManager is an internal Spark class and may change in a new Spark version, so a test based on internal details risks breaking. On the other hand, it also gives a good idea of the framework's internals.

Let's try to access the private state of CacheManager via Java reflection.

The snippet below searches the declared fields for one whose name matches a given suffix, makes it accessible and returns its value.

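import java.lang.reflect.Field

// Finds the field declared by obj's class whose name ends with fieldName,
// makes it accessible and returns its value checked against cls.
def fieldValue[T](fieldName: String, obj: Any, cls: Class[T]): T = {
  val field: Field = obj.getClass.getDeclaredFields
    .find(_.getName.endsWith(fieldName)) // suffix match: Scala may generate part of private field names
    .getOrElse(throw new NoSuchFieldException(fieldName))
  field.setAccessible(true) // required for private or protected members
  cls.cast(field.get(obj)) // fails fast if the value is not a cls
}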
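For readers who want to experiment outside Spark, here is a rough plain-Java counterpart of fieldValue. It is a sketch, not the author's code: Holder, ReflectionHelper and the field name some$prefix$$secret are hypothetical, and the $ characters only imitate a compiler-generated name prefix. Checked reflection exceptions are wrapped so callers need no try/catch.

```java
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical class whose private field name carries a generated-looking prefix.
class Holder {
    private String some$prefix$$secret = "hidden";
}

class ReflectionHelper {
    // Finds a declared field whose name ends with fieldName, makes it
    // accessible and returns its value cast to cls.
    static <T> T fieldValue(String fieldName, Object obj, Class<T> cls) {
        Field field = Arrays.stream(obj.getClass().getDeclaredFields())
                .filter(f -> f.getName().endsWith(fieldName))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No field ending with " + fieldName));
        field.setAccessible(true); // lift the private access check
        try {
            return cls.cast(field.get(obj));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldValue("secret", new Holder(), String.class)); // prints hidden
    }
}
```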

setAccessible(true) is the key step; it is required to access private or protected members.
Also note that "endsWith" is used for matching instead of an exact match. The class under test is written in Scala, and the Scala compiler can generate part of a field's name, so an exact match is not always possible. When the exact name is known, Java reflection's getDeclaredField method, which accepts the field name, can be used instead.
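A small experiment makes the role of setAccessible(true) concrete. In this sketch (Vault and AccessCheck are hypothetical), the field is looked up by its exact name with getDeclaredField, and reading it fails with IllegalAccessException unless the access check is switched off first. Vault is deliberately a separate top-level class: since Java 11, a nested class is a nestmate of its enclosing class, and reflection would then allow the private read anyway.

```java
import java.lang.reflect.Field;

// Hypothetical top-level class (not nested, so no nestmate access).
class Vault {
    private int code = 1234;
}

class AccessCheck {
    // Reads Vault.code by exact name; returns null if the access check rejects it.
    static Integer readCode(boolean makeAccessible) {
        try {
            Field f = Vault.class.getDeclaredField("code"); // exact-name lookup
            if (makeAccessible) {
                f.setAccessible(true);
            }
            return (Integer) f.get(new Vault()); // int value is boxed to Integer
        } catch (IllegalAccessException e) {
            return null; // private member and access check still active
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readCode(false)); // null
        System.out.println(readCode(true));  // 1234
    }
}
```

On Java 9 and later, setAccessible(true) can itself be refused with InaccessibleObjectException when the target class lives in a named module that does not open its package; classes loaded from the classpath belong to the unnamed module and are not affected.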

Once a member is marked as accessible, the field can be both read and written.
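Writing works the same way. A minimal sketch, assuming a hypothetical Counter class with private state and no setter:

```java
import java.lang.reflect.Field;

// Hypothetical class with private state and no setter.
class Counter {
    private int count = 0;
    int current() { return count; }
}

class WriteField {
    // Overwrites the private field "count" on the given Counter.
    static void forceCount(Counter c, int value) {
        try {
            Field f = Counter.class.getDeclaredField("count");
            f.setAccessible(true);
            f.setInt(c, value); // writing is permitted once the check is off
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        forceCount(c, 42);
        System.out.println(c.current()); // 42
    }
}
```

This is handy for putting an object into a state that is hard to reach through its public API, but it bypasses whatever invariants the class maintains, so it is best used sparingly in tests.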

A sample call to this API looks something like this:

fieldValue("cachedData", sparkSession().sharedState().cacheManager(), LinkedList.class);

Unit test code that checks the cache can look something like this:

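LinkedList<?> cachedData = fieldValue("cachedData", sparkSession().sharedState().cacheManager(), LinkedList.class);
int before = cachedData.size();
loadVehicleTable(); // runs the read/cache/createOrReplaceGlobalTempView code under test
int after = cachedData.size(); // same LinkedList instance, mutated in place, so the new entry is visible
Assert.assertEquals(before + 1, after); // assumes JUnit 4's org.junit.Assert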
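To see the before/after pattern run end to end without a Spark session, here is a self-contained sketch in which a hypothetical ToyCacheManager stands in for CacheManager and cache(...) plays the role of loadVehicleTable(). It only models the idea; the real CacheManager's fields and behaviour are Spark internals.

```java
import java.lang.reflect.Field;
import java.util.LinkedList;

// Hypothetical stand-in for CacheManager: keeps entries in a private list
// and offers no public way to inspect them.
class ToyCacheManager {
    private final LinkedList<String> cachedData = new LinkedList<>();
    void cache(String name) { cachedData.add(name); }
}

class CacheCheck {
    // Reads the private cachedData list via reflection.
    static LinkedList<?> cachedData(ToyCacheManager m) {
        try {
            Field f = ToyCacheManager.class.getDeclaredField("cachedData");
            f.setAccessible(true);
            return (LinkedList<?>) f.get(m);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Records the size, runs the code under test, then compares.
    static boolean cachingAddsOneEntry() {
        ToyCacheManager manager = new ToyCacheManager();
        LinkedList<?> data = cachedData(manager);
        int before = data.size();
        manager.cache("mysupertable"); // plays the role of loadVehicleTable()
        int after = data.size(); // same list instance, mutated in place
        return after == before + 1;
    }

    public static void main(String[] args) {
        System.out.println(cachingAddsOneEntry()); // true
    }
}
```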

Reflection is very powerful and comes in handy in scenarios like this, but it comes with a tradeoff: type safety is relaxed because methods and fields are referred to by name as plain strings, so mistakes only surface at runtime. Even so, it is still better than not testing at all.
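The type-safety tradeoff is easy to demonstrate: a misspelled field name compiles without complaint and only fails when the lookup runs. In this sketch (Config and StringlyTyped are hypothetical), the same failure would appear if a field were renamed in a new library version, which is the risk a test faces when an internal class such as CacheManager changes.

```java
// Hypothetical class used only to show the failure mode.
class Config {
    private String url = "jdbc:h2:mem:test";
}

class StringlyTyped {
    // Returns true if Config declares a field with exactly this name.
    static boolean hasField(String name) {
        try {
            Config.class.getDeclaredField(name); // looked up by string at runtime
            return true;
        } catch (NoSuchFieldException e) {
            return false; // the compiler had no chance to catch the bad name
        }
    }

    public static void main(String[] args) {
        System.out.println(hasField("url")); // true
        System.out.println(hasField("uri")); // false: the typo compiles, fails at runtime
    }
}
```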
 
One more thing I discovered during this exercise is that the JVM spends some time checking access permissions when reflection is used, and this adds some overhead.
Turning off the access check with setAccessible(true) can therefore also make reflective code faster.

With this trick, you can start testing private functions that you wished were public, or that you left public only for testing.
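Private methods can be reached the same way, using getDeclaredMethod with the exact parameter types followed by invoke. A minimal sketch, with a hypothetical PriceCalculator class:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical class with a private helper we would like to test directly.
class PriceCalculator {
    private int applyDiscount(int price, int percent) {
        return price - price * percent / 100;
    }
}

class PrivateMethodTest {
    // Calls the private applyDiscount method via reflection.
    static int callApplyDiscount(int price, int percent) {
        try {
            // Parameter types must match the declared signature exactly.
            Method m = PriceCalculator.class.getDeclaredMethod("applyDiscount", int.class, int.class);
            m.setAccessible(true);
            return (Integer) m.invoke(new PriceCalculator(), price, percent);
        } catch (InvocationTargetException e) {
            // An exception thrown by the method itself arrives wrapped.
            throw new RuntimeException(e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callApplyDiscount(200, 10)); // 180
    }
}
```

Because exceptions thrown inside the invoked method are wrapped in InvocationTargetException, the sketch unwraps getCause() so a failing test reports the original error.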
