Internal DSLs live only in external DSLs
This post is about domain-specific languages (DSLs) and how they can be classified by implementation technique. Martin Fowler’s work-in-progress DSL book distinguishes two major kinds of DSL: internal and external.
Internal DSLs need no parser of their own: the language is implemented inside a host language, using the features that the host language provides. Examples of internal DSLs are UML profiles and many Ruby-based APIs such as Rails and RSpec.
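To make this concrete, here is a minimal sketch of an internal DSL, written in Python rather than Ruby. The `Order` class and its `item` method are hypothetical names invented for this illustration. The point is only that the "language" is ordinary method chaining in the host language, so no parser is involved.

```python
class Order:
    """Hypothetical internal DSL for describing an order."""

    def __init__(self):
        self.lines = []

    def item(self, name, quantity=1):
        self.lines.append((name, quantity))
        return self  # returning self is what allows the chained "sentence"

    def describe(self):
        return ", ".join(f"{quantity} x {name}" for name, quantity in self.lines)


# The DSL "sentence" is plain host-language code.
order = Order().item("coffee", 2).item("bagel")
print(order.describe())  # 2 x coffee, 1 x bagel
```

Everything the DSL can express is limited by what the host language syntax allows, which is the trade-off for not having to write a parser.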
External textual DSLs require a meta-model, a grammar and an external parser, all specially crafted for the language. The parser reads a text written in the language and constructs a model (an instance of the meta-model) of the concepts that the text describes. The model can later be used to generate code or, before that, be transformed into intermediate models. Graphical external DSLs are usually already stored or serialised as models, so they need no parsers or grammars.
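The pipeline of an external textual DSL (grammar, parser, model, code generation) could be sketched roughly like this. The tiny `entity` language, the `Entity` and `Field` meta-model classes and the `parse` and `generate` functions are all made up for this illustration. A real external DSL would normally use a parser generator or a language workbench instead of a hand-written parser.

```python
from dataclasses import dataclass


# Meta-model (hypothetical): the concepts the language can express.
@dataclass
class Field:
    name: str
    type: str


@dataclass
class Entity:
    name: str
    fields: list


# Informal grammar:  entity := "entity" NAME "{" (NAME ":" NAME)* "}"
def parse(source):
    """Parse the text and build a model, i.e. an instance of the meta-model."""
    for symbol in "{}:":
        source = source.replace(symbol, f" {symbol} ")
    tokens = source.split()
    pos = 0

    def expect(value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if value is not None and token != value:
            raise SyntaxError(f"expected {value!r}, got {token!r}")
        pos += 1
        return token

    expect("entity")
    name = expect()
    expect("{")
    fields = []
    while pos < len(tokens) and tokens[pos] != "}":
        field_name = expect()
        expect(":")
        fields.append(Field(field_name, expect()))
    expect("}")
    return Entity(name, fields)


def generate(entity):
    """Generate code (here: a Python class skeleton) from the model."""
    body = [f"    {f.name}: {f.type}" for f in entity.fields] or ["    pass"]
    return "\n".join([f"class {entity.name}:"] + body)


model = parse("entity Product { name: str price: float }")
print(generate(model))
```

Note that the parser, the meta-model and the generator are themselves written in a general-purpose language.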
All general-purpose programming languages can also be considered external textual DSLs. The concept space they cover is the space of computing, mostly the concepts of a Turing machine (reading and storing values, comparisons). In that sense, external DSLs are actually the hosts for internal DSLs: an internal DSL can be implemented only if an external host DSL is available.
Team size and making changes
There have been many discussions about the most effective team size for a software project. The most common understanding among agilists is that a team should have no more than 7 members. The reasons usually given for that size are low communication overhead and higher motivation and morale among the members.
Lately, I have noticed another important reason why teams should be small. The project I’m currently working on has about 10-15 developers. Communication used to be a problem, but not anymore: all the team members work in the same room and almost at the same table. There are problems with motivation, since developing a big complex system easily becomes frustrating, especially in highly formal government projects, but this isn’t the main problem. The main problem is the inability to make a change that somehow affects every part of the system.
Although systems should be designed so that separate areas are isolated from each other, changes that leak across boundaries and break dependencies system-wide are unfortunately real. They might come from the kind of bad design that happens in real-world projects, or the dependencies might simply be inevitable for some reason. For example, refactoring towards Deeper Insight, driven by a better understanding of the domain, affects all other parts of the domain model and the code base. This is especially true if the change deals with the higher-level view of the domain: restructuring Bounded Contexts, adjusting Conceptual Contours, distilling Generic Subdomains or the Core Domain, and so on.
If the need for this kind of deep refactoring or restructuring appears on a small team, there is not much of a problem. Two developers can pair-program, create a separate branch (or work on their local copy) for the change, and merge the result into the main branch afterwards. If the domain-restructuring task can be divided into subtasks, the other pair (or the remaining two pairs) can take on a smaller restructuring task, or just implement their normal stories or use cases as usual, commit the work to the main branch and let the restructurers fix the conflicts later. This works nicely as long as the other pairs (or worse, individual programmers) are not changing the main branch too often.
Things get nasty when more than two pairs need to continue their normal development. The other pairs will push so many updates to the main branch that by the time the restructuring pair has finished resolving the conflicts from their last update, the main branch will be in serious conflict with their branch again. The restructuring pair will never finish the merge because of the changes the other developers keep making to the main branch.
There are several ways to solve this situation, and each of them looks bad.
- Make the other developers stop their work until the change-branch has been merged into the main branch. This would not please the project managers. Having more than ten people doing nothing for a couple of days is mostly unacceptable because of the cost and the project schedule. It might be wise to schedule a bigger change like that for holidays or the usual vacation periods, which naturally shrink the team. Unfortunately, such time frames come only a couple of times a year, and the developers responsible for the change could not take a vacation then, which might be frustrating for them. Who would want to fix a monster during the winter solstice holiday?
- Let the restructuring pair inform the other pairs about the areas that are fragile and ask the pairs who implement new features to be more careful. This might work if the other developers can avoid touching the parts of the system that are being changed. It also depends on the dependencies (:-) or the lack of them. However, this approach might leave technical or functional debt that must not be forgotten and has to be addressed later. There is also a danger of missing the iteration deadline.
- Commit the conflicted change, break the build and start a collective conflict-resolving and unit-test-fixing frenzy. This will put the team into panic mode and push the iteration deadline into the future. It will also break the sacred green build, and there will be no new releases for a couple of days. That in turn will affect the customer feedback cycle, the testing team and all the other parties who depend on frequent builds.
I prefer the last solution. It improves communication and team morale. In an emergency the team wakes from its usual lazy mode and works effectively to end the crisis. Although there is no green build while the team is trying to make it green again, a couple of days without a build are acceptable for everyone. Of course, there is a way not to break the build: the developers can do the merging inside the change-branch. Unfortunately, it makes little difference, because there will be no new builds anyway while the main branch is locked. Having a red build makes the team take the merging task more seriously.
To sum it up: working with a big team makes it hard to implement far-reaching changes in the code base. The amount of work committed to the main branch makes the merge with the change-branch hard or almost impossible, leaving the change-branch developers in an infinite “update local copy”, “fix the conflicts”, “fix the tests” loop. This inability to make a change will rot the system design over time and make things worse in the future. To avoid this situation it’s wise to keep teams small.
Time and models
Lately I have been working on projects whose domains have a subtle view of time. Many concepts modelled in such a domain need to have a history, or some groups of concepts evolve constantly over time while other, dependent concepts need to reference older “versions” of the evolving group. This “history” isn’t just some sort of timestamping, audit trail or versioning inside the database; it is something that belongs to the domain and that the domain user is aware of.
This has led to the idea that time should be part of the metamodel used for describing the real model. If the model is represented using object-oriented techniques, the metamodel of object-orientation should have a dimension of time in it. The result would be some sort of time-based object-orientation, where the state of an object isn’t just the state it happens to be in at the current moment. The framework, language or virtual machine that manages the objects should also remember changes in the states of the objects and in their connections. Maybe even the history of messages? And that time dimension could be part of the language.
A good example would be an online shop. Every shop has a product catalogue, and the products have their current prices. When a customer submits an order, the underlying software creates a new order and adds order line items that reference products in the current product catalogue. I guess this is quite a common way for small online shops to do their order management. The problem arises when product prices change and the change also affects the totals of past orders that include those products. Of course, smart designers have a solution: they just add a price field to the order line item and store the purchase price there, so that changing product prices doesn’t affect past orders. This works well, but it could be solved differently. The order line items could still reference the product catalogue, but that reference (or association, or relation) would also carry a time attribute that remembers the state of the product catalogue at the moment of the purchase. If product prices or specifications change, the order line item still points (thanks to the stored time) to the version of the product catalogue that was valid at purchase time. The benefits are that the order line item has more information about the product as it was sold, and there is less uncontrolled explicit duplication of information (no need for extra fields in the order line item).
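A rough sketch of that idea, emulated in plain Python because no mainstream language has the time dimension built in. The `Product` and `OrderLineItem` classes and the `price_at` method are hypothetical names for this illustration. The product keeps its whole price history, and the order line item stores only a reference plus the purchase time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


class Product:
    """Keeps every price together with the moment it became valid."""

    def __init__(self, name):
        self.name = name
        self._valid_from = []  # timestamps, kept in ascending order
        self._prices = []

    def set_price(self, price, at):
        # Assumption of this sketch: prices are recorded in chronological order.
        self._valid_from.append(at)
        self._prices.append(price)

    def price_at(self, moment):
        # Find the last price whose validity started at or before `moment`.
        index = bisect_right(self._valid_from, moment) - 1
        if index < 0:
            raise LookupError(f"{self.name} had no price at {moment}")
        return self._prices[index]


@dataclass
class OrderLineItem:
    product: Product
    quantity: int
    purchased_at: datetime  # the time attribute on the reference

    def total(self):
        return self.quantity * self.product.price_at(self.purchased_at)


mug = Product("mug")
mug.set_price(5, datetime(2009, 1, 1))
line = OrderLineItem(mug, 3, datetime(2009, 3, 15))
mug.set_price(8, datetime(2009, 6, 1))  # a later price change

print(line.total())                        # 15
print(mug.price_at(datetime(2009, 7, 1)))  # 8
```

The past order keeps its total of 15 even though the current price is 8, and no price was copied into the order line item. With time as part of the language or framework, the bookkeeping inside `Product` would not have to be written by hand.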
Adding a time dimension to the infrastructure might also be helpful in relational databases. It’s common to add a lot of timestamp and time fields to tables, plus triggers or other mechanisms that store the changes made to table rows. There are many well-known patterns for solving time-based issues (row versioning, logging, record history, backup etc.) in relational databases, and it would be reasonable to build them into the table management infrastructure. That would help a lot, I guess.
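For comparison, this is roughly what the hand-made version of record history looks like today, using SQLite through Python's `sqlite3` module. The table, column and trigger names are invented for this sketch. The trigger copies the old price into a history table on every price update, which is exactly the kind of plumbing that could live in the infrastructure instead.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);

-- Hypothetical history table: one row per superseded price.
CREATE TABLE product_history (
    product_id INTEGER,
    price      INTEGER,
    changed_at TEXT
);

-- Record the old price whenever a product price is updated.
CREATE TRIGGER product_price_history
AFTER UPDATE OF price ON product
BEGIN
    INSERT INTO product_history (product_id, price, changed_at)
    VALUES (OLD.id, OLD.price, datetime('now'));
END;
""")

db.execute("INSERT INTO product (id, name, price) VALUES (1, 'mug', 5)")
db.execute("UPDATE product SET price = 8 WHERE id = 1")

current = db.execute("SELECT price FROM product WHERE id = 1").fetchone()[0]
history = db.execute("SELECT product_id, price FROM product_history").fetchall()
print(current, history)  # 8 [(1, 5)]
```

Every table that needs a history requires its own history table and trigger, which is why having this in the table management infrastructure would be attractive.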
Of course this adds some overhead to the underlying frameworks, generates more underlying data and might make things slower, but that (like most performance issues) can probably be solved with more hardware.
To sum it up: having a concept of time embedded in the language used for defining the software domain model would make the domain model more natural (if the domain models some real-world concepts) and would probably introduce some nice possibilities. And, of course, some still unknown problems too.