datreant downsizing
25 Jun 2018The first commit of what became datreant
occurred over four years ago, and in the course of its development it’s seen many iterations on what it is and how it works.
The library didn’t even start life as datreant
: MDSynthesis was born first, with datreant
becoming a more general library for handling datasets dispersed across a filesystem.
As is often the case with building software, you only really understand what works well and what doesn’t by trying something first.
With so much history, datreant
is finally arriving at a place of stability.
This post is meant to summarize what’s coming, and why.
What will datreant
look like now?
The biggest change came in Issue #145.
A directory is now marked as a Treant if it contains a .datreant
directory.
There are no UUIDs, which were trouble when copying Treants around.
Tags and Categories are stored individually in their own files in this directory, improving performance when either becomes large.
This change also makes the discover
machinery more easily able to find Treants,
improving its performance greatly after this was identified as a bottleneck for larger Treant collections.
datreant
is also no longer a namespace package, and “limbs” have been eliminated as a concept.
The original idea we had when we began was that custom limbs, such as those provided by datreant.data
, could be written and attached to Treants for new functionality.
These could be provided by other packages within the datreant
namespace, but developed and packaged separately so the dependencies of core wouldn’t become bloated.
This functionality, however, was difficult to maintain, and writing new limbs wasn’t actually easy to do.
What’s more, building a CLI interface for Treants was confounded by this complexity.
Packages such as datreant.data
are now deprecated, and datreant.core
is simply known as datreant
.
We believe this change will be a bit painful to users upfront, but long-term allow datreant
to continue to improve in predictable ways.
Apologies in advance
Open-source software is hard to produce, and it can be even harder to maintain.
In an ideal world, we could roll out gradual changes to datreant
in a controlled way, according to a clear project plan.
None of our changes would be a huge disruption to users, and we’d do everything technically possible to ease the transition.
But this project is small, with ~3 developers spending a few extra minutes on it every month.
We don’t have the bandwidth to do everything the way we want to.
With that, we’re releasing datreant
1.0 in the next few days, with a package on PyPI shortly after.
Please uninstall all datreant.*
packages from your environment before installing this, as the reversion from a namespace package will lead to unexpected behavior otherwise.
If you are running existing projects using datreant
and don’t want to be affected by these changes,
we recommend that you pin your dependency to version 0.7.1.
For MDSynthesis users
We are also releasing MDSynthesis 1.0 to coincide with the datreant
1.0 release.
Changes to MDSynthesis include use of the new Treant structure, improving overall performance.
We’ve also directly merged in the datreant.data
functionality, so this will still be available to MDSynthesis users by way of the Sim
object.
Migrating existing Treants
If you have loads of existing Treants in the old format, you can convert them to the 1.0 style with the datreant_07to1.py
script provided with the installation.
If you’ve been using MDSynthesis, and likewise have many Sims lying about, there is a similar script, mds_06to1.py
, which will ship with the 1.0 release of that package.
As for adapting your code to the new changes, please have a look at the CHANGELOG for the list of API changes accompanying this release.
One major item: UUIDs no longer exist, so you’ll need to develop a workaround for any code that relies on these.
We recommend using the absolute path of a Treant as a unique identifier, or otherwise giving your Treants their own UUIDs as a category (key: value)
pair.
We don’t anticipate as many breaking changes in the future. Thanks for bearing with us.
Enjoy
We’ve put literally a couple years’ worth of thought into these changes, and we hope they can make datreant
a better and more sustainable package (and project) into the future.
We value all our users, so please let us know of things that don’t work well, and we’ll get to fixing them.
– @dotsdl