Tuesday, October 15, 2013

Using Clang to analyze legacy C++ source code

Legacy Code

At IAR Systems where I work, we have a fair amount of C++ source code, some of which is very old; the oldest parts probably date back to the mid 90s. We've done a fairly good job of maintaining the code, so many parts are still in active use.

There is one particular code base which has been of interest to me for some time. The code base is (in code years) very old; I believe many parts of it have roots in the mid 90s. The code is used to describe configuration data, together with functional parts which describe relationships between the objects in the configuration. There are two main problems with the code:

  1. The only way to reliably change properties in the model is to interact with a UI: pressing buttons, entering text, etc. The model can be serialized to XML, but the XML contents cannot be correctly understood without access to the C++ code.
  2. The only output possible is a sequence of strings in a certain format to be passed on to other tools.
In other words, you can edit the model using a UI, and you can execute the model by passing the results to other tools. Little else is possible. For example, it is not possible to:
  • Enumerate all properties which can be modified.
  • Describe the properties (their type, the possible values)
  • Describe how properties interact with other properties. The set of valid values of a property may depend on the values of other properties, but this is hidden in the C++ code.
A typical piece of code might look like this:

class Person
{
    ...
};

class Manager : public Person
{
    ...
};

class Assassin : public Person
{
    ...
};

class SecretAgentBehavior : public DefaultSecretAgentBehavior
{
public:
    SecretAgentBehavior(const Person &boss) : boss(boss) { ... }
    std::string getArguments()
    {
        Assignment a = boss.getAssignment();
        return "-dostuff " + a.getName() + " -o output.txt";
    }
private:
    const Person &boss;
};

// Bob The Boss is a secret agent manager
Manager boss("Bob The Boss", new BossBehavior /* defined elsewhere */);

// John Smith is an assassin, and Bob The Boss is his manager
Assassin john("John Smith", new SecretAgentBehavior(boss));

I would like to be able to extract as much useful information from this code base as possible, to facilitate a future migration of the configuration data to a system which makes it easier to analyze the data and generate different outputs based on the data. Enter Clang.

Clang

My first approach was to modify the code itself to produce some useful information, but this did not work well. The necessary metadata was simply missing. For example, the only place where the type of the model elements (the secret agents in the example above) was stored was in the C++ types themselves. The very limited support for reflection in C++ made this approach impossible.

I then got the "crazy idea" to try to use Clang to parse the C++ source code. The source code itself is fairly structured and follows a handful of different patterns, so we are not talking about analyzing arbitrary C++ source code. Also, we are only interested in parsing the source code, not generating executable code. This is fortunate, since the code in question has never been compiled by anything other than Visual Studio, and uses lots and lots of Visual Studio-specific things (MFC, for example). Would Clang at least be able to build an AST of the code?

It turned out to be, if not trivial, at least fairly easy. The main problem was that Clang was unable to parse the Windows-specific header files defining things like CString, CWinApp, etc. I solved this by placing dummy definitions in a special header file. To make sure that all source files which expect these definitions actually get them, I created a set of replacement header files (afx.h, windows.h, etc.) which all simply include the header file with the dummy definitions. For example, the dummy definition of CWinApp looks like this:

class CWinApp
{
public:
  HICON LoadIcon(LPCTSTR name);
  HICON LoadIcon(UINT resid);
};

That's it. Since Clang does not need to compile and link, this kind of dummy definition is enough.
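
To give an idea of the shape of these files, here is a minimal sketch of what the replacement headers could look like. The file name stubs.h and the exact typedefs are made-up examples; the real contents depend on what the code base actually uses.

// afx.h (replacement) -- simply forwards to the dummy definitions
#include "stubs.h"

// stubs.h -- minimal stand-ins, just enough for Clang to parse the code
typedef unsigned int UINT;
typedef const char *LPCTSTR;  // parsing only; the real LPCTSTR may be a wide string
typedef void *HICON;

class CString { /* ... */ };
class CWinApp { /* as shown above */ };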

Ok, so once the code base passed successfully through clang-check, then what? How do we get any useful information out?

AST Matchers

There are good tutorials on how to write a Clang tool, so I will skip over that here.

The Clang AST is a complex beast (run clang-check -ast-dump on any non-trivial program), and to make it easier to navigate and make sense of the AST, the AST Matchers API allows you to write "simple" rules describing the parts of the AST that you are interested in.

For the example above, a rule which matches the persons may look like this:

varDecl(
    hasDescendant(
        constructExpr(
            hasType(recordDecl(isSameOrDerivedFrom("Person"))),
            hasArgument(0, expr().bind("name")),
            hasArgument(1, expr().bind("behavior")))));

The bind() calls are placed so that the corresponding AST node can be extracted in the callback function which is invoked each time the rule matches.

The rule will be invoked twice: once for the boss and once for the assassin. The callback function looks like this:

virtual void run(const MatchFinder::MatchResult &result)
{
    const Expr *name = result.Nodes.getNodeAs<Expr>("name");
    const Expr *behavior = result.Nodes.getNodeAs<Expr>("behavior");

    // Now we can do interesting things
    name->dumpColor();
    behavior->dumpColor();
}
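
For completeness, here is a rough sketch of how the matcher and the callback could be wired together into a stand-alone tool. PersonCallback and personMatcher are made-up names for the callback class and the matcher expression above, and the exact tooling API differs a bit between Clang versions:

#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"

using namespace clang::ast_matchers;
using namespace clang::tooling;

int main(int argc, const char **argv)
{
    // Parses -p <build-path> and the source file arguments
    CommonOptionsParser options(argc, argv);
    ClangTool tool(options.getCompilations(), options.getSourcePathList());

    MatchFinder finder;
    PersonCallback callback;                      // the class with run() above
    finder.addMatcher(personMatcher, &callback);  // the varDecl(...) rule

    return tool.run(newFrontendActionFactory(&finder));
}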

Since Clang gives us access to the entire C++ AST (including information from the preprocessor and the source code itself), we can extract all sorts of useful information. For example, we can generate output which contains the configuration data in a structured format, together with the source code implementing the functional parts.

person:
   name: Bob The Boss
   type: Manager
   source: { 
Manager boss("Bob The Boss", new BossBehavior /* defined elsewhere */);
   }

Of course, we still would need to implement the getArguments function somewhere.
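
The source: field above can be produced directly from the matched AST node, since Clang keeps track of the exact source range of every node. A minimal sketch, assuming the ASTContext available through result.Context in the callback (the helper name sourceText is made up):

#include "clang/Lex/Lexer.h"

// Returns the original source text covered by an AST node.
std::string sourceText(const clang::Stmt *node, clang::ASTContext &context)
{
    const clang::SourceManager &sm = context.getSourceManager();
    clang::CharSourceRange range =
        clang::CharSourceRange::getTokenRange(node->getSourceRange());
    return clang::Lexer::getSourceText(range, sm, context.getLangOpts()).str();
}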

Conclusion

Clang is the perfect tool (or rather, platform) for analyzing C/C++ source code. It gives full access to the entire AST, and since this is the same AST used by the actual Clang compiler, you get the complete AST and not some approximation. The AST matchers framework is also a major time-saver, since it allows you to pick out the parts of the AST you are interested in without having to write large state-machine-like code to keep track of where in the AST you are.


Tuesday, May 21, 2013

Garmin 600 initial impressions

My previous GPS went AWOL a few months ago, and I've been waiting for Garmin to release their new Oregon 6xx series so that I could get a new one. Last week I saw that they were in stock at http://www.cyberphoto.se/, so I put in an order and got it delivered the next day. Excellent service!

The new Oregon 6xx comes in four different variants (as with the 5xx series). The models vary primarily in whether or not they have a camera (600 vs 650) and whether or not they include a topographical map (6xx vs 6xxt). I opted for the cheapest one (600); I don't use the camera much, and OSM maps work well enough. The 650 models also sport a rechargeable battery pack which can be charged while the device is connected via USB.

The overall impression is very good. I was a little worried about the ruggedness, but this definitely feels like a device made for outdoor activities, even though the real test will come when I actually start using it for some real work. The new capacitive touch screen is really nice; pinch zoom and scrolling are smooth and snappy. I haven't tried it with gloves yet, though.

GSAK still does not seem to support the 6xx series, so I used the 5xx setting which limits the size of a cache file to 5000 caches. Hopefully there is a GSAK update soon so I can take advantage of the possibility of exporting my entire cache database.

Since I did not opt for the model with the topo map, I had to download an OSM map separately. This is super easy to do from here: http://garmin.openstreetmap.nl/. I selected "Sweden", downloaded the map, renamed the "gmapsupp.img" file to "sweden.img" and copied the map file to the Garmin directory on the device.

Now, let's take the little thing out for some fun.

Monday, April 22, 2013

Using gnome-keyring to avoid storing unencrypted passwords (Ubuntu 12.10)

I finally got around to figuring out how to avoid the pesky

-----------------------------------------------------------------------
ATTENTION!  Your password for authentication realm:

   <http://xxx:80> My Subversion server

can only be stored to disk unencrypted!  You are advised to configure
your system so that Subversion can store passwords encrypted, if
possible.  See the documentation for details.

You can avoid future appearances of this warning by setting the value
of the 'store-plaintext-passwords' option to either 'yes' or 'no' in
'/home/myself/.subversion/servers'.
-----------------------------------------------------------------------

As usual, stackoverflow.com and superuser.com provided very useful information, and in my specific case (Ubuntu 12.10) it turned out to be really easy, since we can use the GNOME keyring more or less out of the box. First step, modify ~/.subversion/config like this:

store-passwords = yes
store-plaintext-passwords = no
password-stores = gnome-keyring

Second step, export the GNOME keyring settings in your shell init script.

export `gnome-keyring-daemon`

Third step, delete the ~/.subversion/auth directory to remove any passwords stored in plaintext.

rm -r ~/.subversion/auth

Now, when you try to do an svn update, you should be prompted to login to Subversion and (probably) to your GNOME keyring:

$ svn update
Updating '.':
Authentication realm:  My Subversion server
Password for 'myself': 
Password for 'login' GNOME keyring: 
At revision 1234.

To convince yourself that there is no locally stored password, you can open the files (there is probably only one) in ~/.subversion/auth. The filename is a hash, so you will have to look up the actual filename in your directory. You should see something like this:

~/.subversion $ cat auth/svn.simple/XXX
K 8
passtype
V 13
gnome-keyring
K 15
...

Note that the passtype is now gnome-keyring.

Tuesday, April 2, 2013

EclipseCon 2013

EclipseCon 2013 took place in Boston at the Seaport World Trade Center. It was the 12th EclipseCon, counting the meetings at JavaOne in 2002 and 2003. It was my fifth EclipseCon, having previously attended in 2008, 2010, 2011, and 2012.

Like the last few years, EclipseCon was co-located with OSGi DevCon and ALM Connect. OSGi is the underlying application platform on which Eclipse is built. ALM (Application Lifecycle Management) concerns topics around Agile development, distributed source control, social coding, and continuous integration.

I got the opportunity to meet lots of people, and the atmosphere at EclipseCon was as usual very welcoming. In the light of the recent debacle at PyCon, I must say that I'm very proud of EclipseCon and the absence of sexist jokes or other offending remarks.

There are lots of photos on Flickr. I think I managed to stay out of the photos this year.


Themes

Xtend: Following the trend from last year, there were several talks and tutorials focused on text-based modeling and language frameworks, mainly centered around Xtext and Xtend.

M2M: The M2M acronym stands for Machine-to-Machine, and is about communication within "the internet of things". Unfortunately I managed to miss all the cool action around the Arduino and Raspberry Pi devices. :(

Eclipse in Space: Following the incredibly cool keynote from three years ago, where Jeff Norris did a live demo of a Mars robot, this year we got to see how NASA uses Eclipse to control experiments on the ISS (more about that talk below).


Keynotes

This year, the keynote slot was split into two 30-minute talks. The first keynote on Monday was by Stephen O'Grady on the topic of Developers are the New Kingmakers (I had to look up what a kingmaker is). The key point of the keynote was that technology choices used to be made by the people who had the money (and thus purchased the software), while open source software has shifted that choice towards the developers, who thereby determine in which direction technology develops.

The second keynote was titled Moving Towards ALM 3.0 by Jeffrey Hammond, and almost felt like a continuation of the previous keynote. Modern ALM practices and new technologies and principles are changing how we build software, and developers are more in command today than before. This also means that many of the architectures we've used during the last 20 years are crumbling, which puts pressure on developing new architectures and frameworks which are better adapted to the new way we develop software. (If this sounds a little fuzzy, it's because I had a hard time taking useful notes.)

The Wednesday keynotes were How GitHub Works by Zach Holman, and The Future of Mobile Development by Jeff Seibert. Holman described his view of how open source works, and the importance of doing fun things. A good motivational talk about getting into open source development. Seibert talked about the difficulties of doing agile mobile development under the limitations of modern app stores (publishing times on iTunes, for example).

The last keynote was Nashorn, Javascript for the JVM by Jim Laskey from Oracle. He described Nashorn, a Javascript implementation for the JVM which is supposed to be faster and better than Rhino. A somewhat slide- and bullet-heavy keynote which was kind of boring to me, since I don't have much experience with Javascript. A final keynote, Vert.x - Polyglot Asynchronous Applications for the JVM by Tim Fox, had to be cancelled due to illness.


Tutorials

The first tutorial on Monday was "What every Eclipse developer should know about Eclipse 4". Not having done any real work with Eclipse 4, it seemed like (and turned out to be) a good choice. I got to play around with a real Eclipse 4 app, and got some in-depth practice in how to integrate with the application model. More about Eclipse 4 later.

The second tutorial was on Xtend, a language which compiles to idiomatic Java code. Unlike Scala and Groovy, which compile to Java byte code, Xtend generates Java source code which is visible inside your project. This has a number of benefits, especially when it comes to integrating with other Java libraries. Apart from giving an introduction to Xtend, the tutorial also described one of the really cool features in the new Xtend 2.4 release: Active Annotations. They provide a way to control in detail how Xtend generates Java code, and allow the programmer to customize the generated code, for example by adding methods.


Talks

The Art of Java Performance Tuning by Ed Merks (the architect behind EMF) was an interesting talk about how to make your Java programs go faster (or become smaller, or both). Ed Merks is always fun to listen to, and some of his points were "measure, measure, measure" (something I've talked about in another blog post), "precision is not the same as accuracy", "there is no excuse for bad code", and "micro benchmarks are fraught with problems" (plus a whole lot of other experiences). He also described a benchmarking harness he wrote to measure small pieces of Java code so that the measurements become useful.

Google Analytics for Eclipse Plugins by Max Rydahl Andersen and Andre Dietisheim from Red Hat was about how to use Google Analytics to collect information about your plugin users. The trick is to fetch a URL from your website when the plugin starts, and encode information about your user that you want to collect in the User-Agent field of the request. For example, the JVM version used is encoded in the field for the flash version, and the Eclipse product being used is encoded as the user agent name. Cool little trick which allows you to know where your users are from, what OS they run, etc.

Eclipse 4 Goes Formal: API You Can Rely On by Eric Moffatt from IBM was another Eclipse 4 talk, but more focused on the details of the API, and how it allows users to do things which were previously impossible. For example, since the application model is just another EMF model, it is easy to programmatically add views, commands, etc. Dependency injection also makes it easy to write code in a more declarative way, where dependencies are declared (and later injected).

This Is Not Your Father's CDT by Doug Schaefer and Marc Khouzam was another cover-the-new-features CDT talk. It covered the features which have been added to the CDT over the last two years: dynamic printf breakpoints (logging breakpoints with printf support), step into selection, expression wild cards and groups, and multi-core visualization. Working with the IAR Systems Eclipse/CDT debugger integration, I would like to know how easy it is to integrate this support into debuggers other than gdb.

No Stone Unturned - the journey of getting from one year release cycle to continuous delivery by Kai-Uwe Maetzel was a very interesting talk about how IBM went from one-year release cycles on their Rational Team Concert product to continuous delivery. Much of it is about development culture, communicating processes, having everyone on board, and looking at every step of the process.

Shake that FUD: How to migrate Eclipse 3 code to Eclipse 4 was a very enlightening talk by Lars Vogel and Wim Jongman about how to migrate existing 3.x RCP apps to 4.x. This can be done using one of three different approaches: (1) do nothing and use the existing compatibility layers, (2) embrace the new programming model and rewrite your code, or (3) mixed mode, where you use the compatibility layers but rewrite code piece by piece. This was pretty straightforward, but I will get back to this talk later.

Are you user-friendly? Improve your designs and delight your users with fast and easy user feedback by April de Vries. A non-tech talk about how to perform usability tests and how to design user surveys. It finished up with a fun demo of usability testing on the EclipseCon website (which didn't go too well).

Building an in-house Eclipse support team by Emilio Palmiero and Pascal Rapicault from Ericsson. They talked about how Ericsson supports their own Eclipse users (which are in the thousands), with a custom support site, custom-built Eclipse installations and update-sites. They used "shared installations" where users can simply select an Eclipse installation and run it from a network drive. Each user can then install plugins locally.

Injection in Eclipse 4: All you need to know about it by Olivier Prouvost. The talk covered many aspects of Dependency Injection (DI) in Eclipse 4, for example how and in which order fields are injected, how the context hierarchy works, and how "re-injection" is used to replace the classical "listener" pattern. It was very nice to get some details about how this works, since it is such a central part of Eclipse 4.

Hallo, Bonjour, 今日は, Hello! Babel tools make internationalization easier described how to use the Babel tools to simplify translating Eclipse plugins into other languages. The tools can help with finding untranslated strings, partially translated strings, etc. The talk also covered the Babel translation server, which is able to "crowdsource" plugin translation, and how to set up your own Babel translation server.

A tale about a big SVN to Git migration. A fun talk by Max Rydahl Andersen where he described the process of migrating the JBoss Subversion repos to Git: lots of repositories, lots of data, and lots of history. The workflow is to first use "svnsync" to create a Subversion mirror which can be kept up-to-date incrementally; this way you do not need to disturb the live server while testing different ways to convert the repos. He also included some tips about how to proceed after the conversion: "train your team", "learn the cli before using the UI", "ban git push --force".


Now that I got a model, where's my application? by Eike Stepper. Some parts of this talk were stuff I already knew (how to generate an EMF editor, etc.), but the parts on how to create pluggable storage models were interesting. By creating a custom storage implementation, it is possible to control exactly how EMF objects are persisted, without affecting any of the EMF code itself. The talk also covered CDO, a database storage layer for EMF which lets EMF objects be stored in a relational database with support for concurrent editing in real time. This means that a model in one application instance can be kept automatically in sync with a model in another application.

NASA uses Eclipse RCP applications for experiments on the International Space Station by Tamar Cohen. NASA talks always attract many attendees, and by a planning mistake this one had been scheduled in a small room, so it was packed to the last seat, with people sitting on the floor and standing along the walls. Tamar Cohen described a few cases where NASA uses an Eclipse RCP app to implement experiments which are performed on the ISS. It was interesting to see how strict NASA's software standards are: all applications which run on the ISS (or communicate with it) must conform to certain rules. For example, any button which causes a command to be sent to or from the ISS has to look a certain way, to make it visually clear to the user what the command does.

Bling 3D by Tony McCrary was about a game-designer application based on Eclipse, where the entire UI runs on 3D hardware. They also talked a bit about the different requirements when writing software for artists and designers. A nice example of how powerful Eclipse 4 is.

In You're not failing fast enough, Sarah Goff-Dupont from Atlassian talked about agile best practices, and how to get fast test feedback. Nothing revolutionary, but good to see how other companies are facing the same issues we are.

EMF Community, time for moving to Eclipse 4 thanks to the Extended Editing Framework 2.0 by Goulwen Le Fur. This was my last session of the conference, and I was having definite problems concentrating. Anyway, this talk was about how EEF can be used to describe a UI model using EMF and automatically bind values to the application model. Useful technology to keep on the radar.


CDT Summit

The CDT summit took place on Thursday, parallel to the other talks, and I was able to attend parts of it. There were several interesting discussions: pros and cons of using Eclipse 4 APIs, the possibility of supporting platform version -1, Java 7, web integration in Eclipse, advertising new features, CDT launch improvements, and multi-process debug launches.


Conclusion

EclipseCon is absolutely necessary in order to keep up to date on what's happening in the Eclipse world. It offers unique opportunities to talk to the people behind the stuff that you work with every day. Something which may take three weeks to figure out by reading the source on your own can be solved in five minutes by talking to the right person at EclipseCon.

Alex Blewitt has also done an excellent write-up of EclipseCon 2013.