How open is your source?
August 14, 2006
My name is Juan, and I’d like to talk a bit about some of the things I learned while porting BixAgent to FreeBSD and Mac OS. Since I learned so much I’ll break it up into a few posts. This one is about FreeBSD and how the free and open philosophy of its community makes it a great OS to work with.
When most people hear the words “open source operating system”, the first thing that comes to mind is usually Linux. It’s kind of interesting when you think about it, because Linux and most of the GNU stuff that usually comes with it aren’t really what I would call open. The license that the code is released under is typically the GPL. It basically says that if you use the code in some new software then your new code must also be released with the same restrictions. The idea behind this is to ensure that no one can compete against an open source project by using its own source code against it.
This is all well and good if you consider some class of developers and organizations to be worthy recipients of the benefits of openness and others to be the sworn enemies of freedom. If you want to develop software without making the source code available, you can’t use anything licensed under the GPL. That’s fine, though maybe the name open source is really a misnomer since to these people the source is most definitely closed. If you’re a regular software company that doesn’t want to release the source to your product you shouldn’t even be looking at GPL’d source code that’s similar to your product if you can help it. What’s worse is that if you wanted to release your code under a license that’s less restrictive (i.e. more free) you wouldn’t be allowed! There are actually degrees of openness, and the GPL is pretty low on that scale.
So what does all this have to do with anything? Most of FreeBSD is licensed under the BSD license. It pretty much allows you to do whatever you like with the code. That means there’s an amazing resource available to you if you’re working on low level tools such as BixAgent.
When I first started working on the FreeBSD port of BixAgent, I was totally lost. I was directed to look at the man page for sysctl to get started. That’s probably the most useful bit of real documentation that I found, but it’s far from being a complete reference. I’ll share the details of how to get a lot of good stuff from sysctl in a later post.
So about that source code. Let’s say you want to find out how to get the free disk space on a partition. The command that is most commonly used to report this information is df. Assuming you installed the source code with the base system, you can find the source for df very easily. First type which df to find the location of the executable. If your installation is typical, you’ll see that when you run df it uses the binary at /bin/df. The directory tree under /usr/src mirrors that of the base system. For example, the source to the tools in /bin such as df can be found in /usr/src/bin. The source for df can be found at /usr/src/bin/df.
Most of these tools have fairly simple source code, so it is often more useful to look at the source for a tool than it is to try to find out what you want by searching the web. Just by scanning through df.c I see a few things that might be useful for further research. The first thing in the file is the license. You should really read through that just to make sure. The one for this file pretty much says I can do whatever I want as long as I reproduce the copyright notice. Sounds good to me. Shortly after that there’s the list of #includes. Since this is the only source file for the program, every API function it needs should be declared in the headers in this list. If you look at the list, besides the usual stuff there’s <sys/sysctl.h> and <sys/mount.h>. Finding out what those headers contain could provide big clues. If you continue looking at the source, you’ll notice that it uses a function called getmntinfo. The man page for this says that it returns a statfs structure for each mounted file system. The man page for statfs contains the definition for that structure.
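To make the getmntinfo discovery a bit more concrete, here is a minimal sketch of walking the list of mounted file systems. It is not the code from df.c or from BixAgent. The function name list_mounts and the HAVE_GETMNTINFO macro are made up for illustration, and the stub branch is only there so the file still compiles on systems that don't provide getmntinfo.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/param.h>
#include <sys/ucred.h>
#include <sys/mount.h>
#define HAVE_GETMNTINFO 1   /* hypothetical macro, used only in this sketch */
#endif

/*
 * Print one line per mounted file system to `out`.
 * Returns the number of mounts, or -1 if getmntinfo failed
 * or is not available on this platform.
 */
static int list_mounts(FILE *out)
{
    assert(out != NULL);
#ifdef HAVE_GETMNTINFO
    struct statfs *mounts = NULL;

    /* MNT_NOWAIT returns cached data instead of waiting on each file system.
     * The array is owned by the library, so we must not free it. */
    int count = getmntinfo(&mounts, MNT_NOWAIT);
    if (count == 0)
        return -1;          /* getmntinfo reports failure as 0 */

    for (int i = 0; i < count; i++) {
        fprintf(out, "%s on %s (%s)\n",
                mounts[i].f_mntfromname,   /* device or source */
                mounts[i].f_mntonname,     /* mount point */
                mounts[i].f_fstypename);   /* file system type */
    }
    return count;
#else
    (void)out;
    return -1;              /* stub: no getmntinfo on this platform */
#endif
}
```

Calling list_mounts(stdout) on FreeBSD should print something close to the first columns of the mount command's output. Each entry in the array is a full statfs structure, so the block counts discussed next are available from the same call.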
The most useful fields in the structure for what we’re looking to do are:
uint64_t f_blocks; /* total data blocks in filesystem */
uint64_t f_bfree; /* free blocks in filesystem */
With these data we can find out the number of free blocks and the total number of blocks, and with simple arithmetic we can figure out the number of used blocks. Great! Almost. This is pretty useful, but you might want that to be in bytes rather than blocks. How many bytes are there per block? The man page doesn’t say, but df knows the conversion and we know everything df knows. We are very close. Looking at the source again and searching for f_bfree a few times leads us to some code that uses the f_bsize member of the statfs struct as the block size. I’m pretty sure that I read through the entire man page before and didn’t notice anything like that. If I look at the man page again, I find this:
uint64_t f_bsize; /* filesystem fragment size */
Oh. Fragment size. The field is named f_bsize which does imply that it could be the block size, but when I read documentation I tend to expect things to be clear. Now, it’s entirely possible and maybe even likely that a fragment and a block are different things. However, if it’s good enough for df, it’s good enough for me. I’m sure that with some amount of research I could find out, but I think it’s pretty safe to assume that code is correct as a starting point.
I believe that reading code is the best way to learn about programming. Using the code that’s available is a great way to learn how things work on your new OS. Plus you have the added benefit of being able to customize the tools that come with your system. If you get good enough at it you might even want to improve the code and give something back. I’d say that this is what real open source is about.
How hard is it to write cross-platform software?
February 14, 2006
- Not that hard
- It depends
- Very hard
I would have picked ‘Not that hard’ two years ago when I started this project. However, that quickly skipped over ‘It depends’ and went straight to ‘Very hard’.
These days everything is hard.
Each project has a start and numerous decisions to be made. Knowing I wanted to build distributed software that runs on more than one machine and on Linux and Windows, I thought I basically had three choices:
- C++
- Java
- Scripting language, Perl or Python
And this is where the hard part comes in. You start to realize that these days there are many options available and it is quite a challenge to make good decisions up front. It takes a lot of effort to explore all your options.
It is not as simple as choosing C++ or Java or a scripting language, or even a mix of the three. Each language and platform has its own pros and cons that can make or break your schedule. This one choice can quickly balloon into many choices, and each decision affects many aspects of the software.
The basic design for BixData called for an agent for each machine, a central server component and some user interface. Each of these components has to communicate with each other and integration has to be as simple as possible. The software needs to run on Linux and Windows. It has to be stable with a small footprint and very little system impact.
This is where my blog post got out of hand; initially I wanted to go into detail about the choices for inter-application communication, user interface and data exchange. But I suspect most programmers are already painfully aware of all those issues.
So instead I decided to list the things that I think would have made cross-platform development easier.
Writing cross-platform software would have been much easier if…
- C++ had better memory management and a better standard library
You can’t possibly write a useful application without using XML, HTTP or some graphics format such as JPEG or PNG. Scripting languages have had these features for a long time.
There are a few useful libraries that provide XML, HTTP, etc. but if you want them to actually work on Linux and Windows (not just pretend makefiles), the list gets short very fast.
I was also surprised at how unnecessarily complicated and bloated some of these libraries are. And it’s quite a challenge to get some of the libraries to compile and run inside the same application.
If you are wondering what is on my short list for extending the functionality of C++, after many hours of testing and use, try
- libxml, http://xmlsoft.org/
- libcurl, http://curl.haxx.se/
- OpenSSL, http://www.openssl.org/
- Boost, http://www.boost.org/
- GD Graphics Library, http://www.boutell.com/gd/
Writing cross-platform software would have been much easier if…
- Java JFC/Swing was more elegant, less bloated and ran faster
It took me some time at IBM to realize the benefits of programming in Java. You really can develop stable, production-quality, large-scale applications that are also much easier to maintain than their C++ equivalents and less error-prone at runtime. Java is quite a bit slower than C++, but better and faster hardware can make up for that at a relatively low cost compared to development time and effort. Better hardware cannot make up for bug-ridden C++ code.
Java also does not truly run on all platforms. Yes, there is a Java runtime for Linux and Windows, but the Linux version for example is not as stable as the Sun version. Also, about two years ago, the Mac OS X version of the Java runtime was far behind the Linux version.
The deal breaker though is writing an interface in Java. I think that interfaces written in JFC/Swing are unresponsive and unattractive. Java would have had much better adoption if its user interface library were as attractive and responsive as the Windows 2000 user interface or the KDE interface. Also, unlike scripting languages, Java does not have good hooks into user interface libraries.
Writing cross-platform software would have been much easier if…
- the Mono project was more mature and a viable choice
C# and the .NET system library (or framework) have all the features that a large-scale application requires, provide a coding infrastructure about as stable as Java’s, and have a user interface that is responsive and attractive. But they only run on Windows.
I’m still intrigued as to how long the Mono project is going to last. The breadth of functionality that is in the .NET framework will be very hard to replicate in Mono. Much harder than, say, replicating the Win32 API was for Wine. If the Mono project were more mature and feature complete, it would be a very viable choice.
Writing cross-platform software would have been much easier if…
- Scripting languages could be compiled into binaries
Scripting languages such as Perl and Python have system libraries that are very feature complete, offer some platform independence and have hooks into viable user interface libraries such as GTK and wxWindows (now wxWidgets).
Although scripting languages don’t scale very well when implementing large-scale server applications, this is not as big a deficiency as the fact that you cannot compile them. When you distribute your application you have to consider whether you want to open source your code, or just have it open, and whether you want that code to be in a format that could easily be modified or corrupted. If a binary is corrupted it simply won’t run, but with hundreds of script files the risk of your application corrupting data or exhibiting weird behavior is much higher.
Cross-platform development is hard
I have tried to highlight some of the issues around choosing an initial direction when starting a cross-platform project.
There are many other issues that range from development tools and code editors, to platform specific differences in important areas such as threading and memory management.
We will go into more detail in future posts on these topics.