Internal DSLs live only in external DSLs
This post is about domain-specific languages (DSLs) and their classification by implementation technique. Martin Fowler’s work-in-progress DSL book describes two major kinds of DSL: internal and external.
Internal DSLs require no parser: the language is implemented inside the host language, using the language features available. Examples of internal DSLs are UML profiles and many Ruby-based APIs such as Rails and RSpec.
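To make the "no parser" point concrete, here is a minimal sketch of an internal DSL written in Python rather than Ruby. The `Order` class and its `item` method are hypothetical names invented for this illustration; the only host-language feature used is method chaining.

```python
class Order:
    """Hypothetical fluent builder: the 'language' is ordinary method chaining."""

    def __init__(self, customer):
        self.customer = customer
        self.lines = []

    def item(self, name, quantity=1):
        # Returning self is what lets the calls read like a small language.
        self.lines.append((name, quantity))
        return self

    def total_quantity(self):
        return sum(quantity for _, quantity in self.lines)


# A "sentence" in the DSL is just a host-language expression.
order = Order("alice").item("book", 2).item("pen")
print(order.lines)
```

Nothing here is parsed by code we wrote: the host language's own parser does all the work, which is why the DSL cannot exist without its host.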
External textual DSLs require a meta-model, a grammar and an external parser, all specially crafted for the language. The parser parses text written in the language and constructs a model (an instance of the meta-model) of the concepts described. The model can later be used to generate code or, before that, be transformed into intermediate models. Graphical external DSLs are usually already stored or serialised as models, so they need no parsers or grammars.
All general-purpose programming languages can also be considered external textual DSLs. The concept space they cover is the space of computing, mostly the concepts of a Turing machine (reading and storing values, comparisons). In that sense, external DSLs are actually the hosts for internal DSLs: an internal DSL can be implemented only if an external host DSL is available.
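The meta-model, grammar and parser pipeline can be sketched in a few lines. Everything below is an assumption made for illustration: the two-statement grammar, the `OrderModel` and `Item` meta-model classes, and the `parse` and `generate` functions are all hypothetical, and a real external DSL would normally use a proper parser generator.

```python
from dataclasses import dataclass, field


# Meta-model: the concepts the language can describe.
@dataclass
class Item:
    name: str
    quantity: int


@dataclass
class OrderModel:
    customer: str
    items: list = field(default_factory=list)


def parse(text):
    """Parse a hypothetical grammar:
        order := 'order' NAME
        item  := 'item' NAME NUMBER
    and return a model (an instance of the meta-model).
    """
    model = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "order" and len(tokens) == 2 and model is None:
            model = OrderModel(customer=tokens[1])
        elif tokens[0] == "item" and len(tokens) == 3 and model is not None:
            model.items.append(Item(tokens[1], int(tokens[2])))
        else:
            raise SyntaxError(f"line {line_no}: cannot parse {raw!r}")
    if model is None:
        raise SyntaxError("missing 'order' line")
    return model


def generate(model):
    """Turn the model into output text (a stand-in for code generation)."""
    lines = [f"order for {model.customer}:"]
    lines += [f"  {item.quantity} x {item.name}" for item in model.items]
    return "\n".join(lines)


source = "order alice\nitem book 2\nitem pen 1\n"
model = parse(source)
print(generate(model))
```

The parser and the generator only meet through the model, so either side could be swapped out without touching the other.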
How to detect enterprise software?
Here is the list of quality attributes that software must have before it can be called “enterprise software”. Real enterprise software must be:
- slow – if the software doesn’t do heavy data processing, using all the “industry standard” enterprise technologies (that are also slow, complex and expensive, so they can be “enterprise”), the software can’t be considered enterprise-grade.
- complex – the software must have a huge load of features that are accessible through an obscure user interface with many data entry fields, weird navigation and a huge load of buttons for clicking. The “business processes” are “complex” and must be supported!
- expensive – if it’s cheap, it can’t be serious enough.
So, if you happen to use (or develop) slow and complex software that has cost a fortune, then you should know you are dealing with real enterprise software.
Big teams and “superior leads”
I started writing a response to Sven’s comment on the previous post, but the comment got too big, so I’ll publish it as a separate post.
The problems of a big team can’t be solved with “superior leads”. Large teams need to make their important irreversible decisions at the beginning of the project, before starting work on the larger scale. Smaller teams make decisions on the same things, but their point of irreversibility is pushed further into the future. Small teams can re-design and carry out ground-breaking changes throughout the code-base without halting the whole team.
Having to make the decisions upfront leads to the well-documented big-design-upfront antipattern. There will be no information about the emerging important details and the context to support the decisions, there will be no good way to respond to changing requirements later in the project, and the initial naive view of the problem domain will make the domain model too awkward to work with. And there are probably more problems to follow. Although having experienced “superior leads” “in charge” will lower the risk of making terrible up-front decisions, those leads still can’t see all the problems in advance, especially if the project itself is large and complicated and requires a lot of manpower to complete.
This leads to one thing – the most effective way to finish a large project is to hire or assemble a small number of smart and productive programmers for the team(s), who can make the right decisions (and re-decisions) at the right time. Splitting a project into smaller sub-projects (so that each team can work on its own subcomponent) might help, but this introduces another extremely irreversible decision: choosing a wise system decomposition. And that, of course, is another long story.
Team size and making changes
There have been many discussions about the most effective team size for a software project. The most common understanding among agilists is that teams should have no more than 7 members. The main reasons usually given for that team size are low communication overhead and higher motivation and morale among the members.
Lately, I have noticed another important reason why teams should be small. The project I’m currently working on has about 10-15 developers. Communication used to be a problem, but not anymore – all the team members work in the same room and almost at the same table. There are problems with motivation – developing a big complex system easily becomes frustrating, especially in high-formalism government projects – but this isn’t the main problem. The problem is the inability to make a change that somehow affects every part of the system.
Although systems should be designed so that separate areas are isolated from each other, changes that leak outside boundaries and break dependencies system-wide are unfortunately real. It might be bad design, which does happen in real-world projects, or those dependencies may just be inevitable for some strange reason. For example, refactoring towards Deeper Insight of the domain model, prompted by a better understanding of the domain, affects all other parts of the domain and the code-base. This is especially true if the change deals with the higher-level view of the domain – restructures Bounded Contexts, adjusts Conceptual Contours, distills Generic Subdomains or Core Domains, and so on.
If the need for this kind of deep refactoring or restructuring appears on a small team, there is not much of a problem: two guys can pair-program, create a separate branch (or work on their local copy) for the change and merge the result into the main branch afterwards. If the domain-restructuring task can be divided into subtasks, the other pair (or the remaining two pairs) can carry out some smaller restructuring task, or just implement their normal stories or use cases as usual, commit the work to the main branch and let the restructurers fix the conflicts later. This works nicely when the other pairs (or worse – individual programmers) are not changing the main branch too often.
Things get nasty when there are more than two pairs who need to continue their normal development. The other pairs will push so many updates to the main branch that by the time the restructuring pair finishes resolving the conflicts from their last update, the main branch will be in serious conflict with their branch again. The restructuring pair will never finish the merge because of the changes the other developers keep making to the main branch.
There are many ways to solve this situation and each one of those looks bad.
- Make the other developers stop their work until the change-branch has been merged with the main branch. This would not please the project managers. Having more than ten people doing nothing for a couple of days is mostly unacceptable because of the cost and the project schedule. It might be wise to schedule a bigger change like that to coincide with holidays or normal vacation times, which naturally shrink the team. Unfortunately, those timeframes appear only a couple of times a year, and the developers responsible for the change could not take a vacation, which might be frustrating for them. Who would want to fix a monster during the winter solstice holiday?
- Let the restructuring-branch pair inform the other pairs about the areas that are fragile and ask the pairs who implement new features to be more careful. This might work if the other developers can avoid touching the parts of the system that are changing. It also depends on the dependencies :-) or the lack of them. However, this approach might leave technical/functional debt that must not be forgotten and must be addressed later. There is also a danger of missing the iteration deadline.
- Commit the conflicted change, break the build and start a collective conflict-resolving and unit-test-fixing frenzy. This will put the team into panic mode and will push the iteration deadline into the future. It will also break the sacred green build, and there will not be any new releases for a couple of days. That in turn will affect the customer feedback cycle, the testing team and all other parties who depend on frequent builds.
I prefer the last solution. It improves communication and team morale. In an emergency the team wakes from its usual lazy mode and works effectively to end the crisis. Although there is no green build for a couple of days while the team is trying to make it green again, those couple of days without a build are acceptable for all. Of course, there is a way not to break the build: the developers can do the merging inside the change-branch. Unfortunately, that doesn’t make much difference, because there will not be any new builds anyway while the main branch is locked. Having a red build will make the team take the merging task more seriously.
To sum it up – working with a big team makes it hard to implement far-reaching changes inside the codebase. The amount of work committed to the main branch makes the merge with the change-branch hard or almost impossible, leaving the change-branch developers in an infinite “update local copy”, “fix the conflicts”, “fix the tests” loop. This inability to make a change will rot the system design over time and will make things worse in the future. To avoid this situation it’s wise to keep the teams small.
Prank calls from Oracle
More than a week ago I received a phone call. The audio quality was low. The caller’s voice was so overamplified that it sounded extremely distorted, just like a small cheap Chinese radio with the volume pumped up to the maximum. Besides that, the caller was a native Russian speaker trying to speak Estonian, which didn’t go very well. Although I couldn’t understand everything the caller wanted to say (because of the distorted pidgin Estonian), it appeared that she was from Oracle’s London office and wanted me to become their product reseller. She asked me some questions about my company (size, services, customers) and asked if I wanted to be contacted by someone who could explain the offer to me. Since I was interested and was hoping to discuss the offer in normal English over a normal-sounding phone line, I agreed.
Another lady called from Oracle a week later. This time the sound quality was the opposite of the first call, but not better. It was so quiet. I had to crank my cellphone volume up to eleven and I still couldn’t hear everything. But still, the lady spoke pure English. We actually had some conversation, although I had to ask her to repeat almost everything she said. But this conversation was short. She somehow thought that I had contacted Oracle with the intention of becoming their reseller, and I explained that it was the other way around: the offer had come from Oracle and I needed more information about the whole thing. Then she asked some questions about my company size. I answered that our company is a small consulting company with roughly 1-3 full-time employees and many friendly companies that we can team up with for more serious software development. After that she said goodbye and hung up.
So what had happened? First they call me and want me to be their product reseller; the second time they call and just hang up after I explain that I need more information about the whole thing. It might also be that my company is too small to satisfy Oracle’s reseller requirements. But, since my job is to help my customers make better software, I also have some influence on choosing database vendors.
Corporate stupidity at its finest?
Time and models
Lately I have been working on projects in domains that have a subtle view of time. This means that many concepts modelled in that domain need to have a history, or that some groups of concepts are constantly evolving over time and other dependent concepts need to reference older “versions” of the evolving group. This “history” isn’t just some sort of timestamping, audit-trailing or versioning inside the database; this history belongs to the domain, and the domain user is aware of it.
This has led to the idea that time should be a part of the metamodel that is used for describing the real model. If the model is represented using object-oriented techniques, the metamodel of object-orientation should have a dimension of time in it. This would result in some sort of time-based object-orientation, where the state of an object isn’t determined only by the state it happens to be in at the current time. The framework or language or virtual machine that manages the objects should also remember changes in the states of the objects and in their connections. Maybe even the history of messages? And that time-dimension could be part of the language.
A good example would be an online shop. Every shop has a product catalogue, and the products have their current prices. When the customer submits an order, the underlying software creates a new order and adds the orderlineitems that reference the products inside the current product catalogue. I guess this is quite a common way for most small online shops to do their order management. The problem arises when the product prices are changed and this change also affects the total sum of the past orders that include those products. Of course, smart designers have a solution – they just add a price field to the orderlineitem and store the purchase price there. Changing the product prices then doesn’t affect the past orders. This works well, but it could be solved differently. The orderlineitems could still reference the product catalogue, but that reference (or association or relation) should also have a time attribute that helps to remember the state of the product catalogue at the moment of the purchase. If the product prices or specifications are changed, the orderlineitem still points (thanks to the stored time) to the version of the product catalogue that was valid at the purchase time. The benefits are that the orderlineitem has more information about the product that was sold at that time and there is less uncontrolled explicit information duplication (no need for extra fields in orderlineitem).
Adding the time-dimension to the infrastructure might also be helpful in relational databases. It’s common to add a lot of timestamp and time fields to the tables and add some triggers or other mechanisms to store the changes in the table rows. There are many well-known patterns for solving time-based issues (row versioning, logging, record history, backup etc.) in relational databases, and it would be reasonable to include them in the table management infrastructure. That would help a lot, I guess.
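The time-stamped reference can be sketched without any language support, just to show the shape of the idea. The `Catalogue` and `OrderLineItem` classes, the integer timestamps and the prices in cents are all assumptions invented for this illustration; a time-aware language or framework would hide this bookkeeping.

```python
import bisect
from dataclasses import dataclass


class Catalogue:
    """Hypothetical catalogue that remembers every price a product has had."""

    def __init__(self):
        # product name -> (sorted change times, prices set at those times)
        self._history = {}

    def set_price(self, product, price, at):
        times, prices = self._history.setdefault(product, ([], []))
        # Assumption: price changes are recorded in time order.
        if times and at < times[-1]:
            raise ValueError("price changes must be recorded in time order")
        times.append(at)
        prices.append(price)

    def price_as_of(self, product, at):
        times, prices = self._history[product]
        # Find the last change made at or before the requested time.
        index = bisect.bisect_right(times, at)
        if index == 0:
            raise LookupError("product had no price at that time")
        return prices[index - 1]


@dataclass(frozen=True)
class OrderLineItem:
    product: str
    quantity: int
    purchased_at: int  # the time attribute on the catalogue reference

    def total(self, catalogue):
        price = catalogue.price_as_of(self.product, self.purchased_at)
        return self.quantity * price


catalogue = Catalogue()
catalogue.set_price("book", 1000, at=1)
line = OrderLineItem("book", 2, purchased_at=3)
catalogue.set_price("book", 1200, at=5)  # later price change
print(line.total(catalogue))  # 2000: the past order keeps the old price
```

Note that the orderlineitem stores no price of its own; it stores only the time, and the catalogue answers the question "what was true then?".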
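One of those row-versioning patterns, written by hand, might look like the sketch below. It uses Python's standard `sqlite3` module with an in-memory database; the `product_price` table, its `valid_from`/`valid_to` columns, the integer timestamps and the two helper functions are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_price (
        product    TEXT    NOT NULL,
        price      INTEGER NOT NULL,  -- price in cents
        valid_from INTEGER NOT NULL,  -- hypothetical integer timestamp
        valid_to   INTEGER            -- NULL marks the current version
    )
""")


def set_price(product, price, at):
    # Close the current version of the row, then add the new version.
    conn.execute(
        "UPDATE product_price SET valid_to = ? "
        "WHERE product = ? AND valid_to IS NULL",
        (at, product),
    )
    conn.execute(
        "INSERT INTO product_price (product, price, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (product, price, at),
    )


def price_as_of(product, at):
    # Pick the version whose validity interval [valid_from, valid_to) covers 'at'.
    row = conn.execute(
        "SELECT price FROM product_price "
        "WHERE product = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR ? < valid_to)",
        (product, at, at),
    ).fetchone()
    return None if row is None else row[0]


set_price("book", 1000, at=1)
set_price("book", 1200, at=5)
print(price_as_of("book", 3))
```

Every table that needs history has to repeat this bookkeeping, which is the argument for moving it into the table management infrastructure.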
Of course, this adds some overhead to the underlying frameworks, generates more underlying data and might make things slower, but that (like most performance issues) can probably be solved with more hardware.
To sum it up: having a concept of time embedded inside the language that is being used in defining the software domain model would make the domain model more natural (if the domain models some real-world concepts) and probably introduce some nice possibilities. And of course, still unknown problems too.
Welcome!
Welcome, reader!
I’m Mart Karu, a software developer/architect from Tallinn, Estonia, and this site will contain articles on various sub-areas of software development.