When creating a new system that works with an established project, an important
consideration is whether to make it a new project or a subproject of the established
project. The considerations include:

  • How fast is the release cycle for the established project?
  • How tight is the integration with the established project?
  • Will the excitement of a new project bring more visibility?
  • Is the community separate from the established project?
  • Is the new project large enough to be self-sustaining?
  • Will tighter integration with the established project be helpful, or will it make integrating with external projects more difficult?

We'll cover some of the cases where we made this decision and how they turned out:

  • Hadoop and RecordIO
  • Hadoop and Avro
  • Hive and Tez
  • Hive and ORC

As the projects evolve after the decision is made, the tradeoffs may change and
need to be revisited. Unfortunately, we also have experience with this: we have
pulled ORC out of Hive and, now, the metastore as well. We'll cover both the
technical and the governance challenges of splitting an existing project into
two projects.