Mapping between a Darwino DB and a relational database

A Darwino database is mapped to a set of relational tables. These tables store all of the documents for all of the stores in all of the instances of the database.

To optimize the performance of the database, you add indexes based on the nature of the queries being performed. There is an art to this optimization; besides application-specific factors, there may be considerations related to the underlying database engine: Postgres, DB2, and MySQL can each call for different optimizations.

There is one set of relational tables per Darwino database. While the names of the tables depend on the database name, the definition of the tables is static for a given version of Darwino. This allows the tables to be created once by a DBA, and then used as-is, even when the application evolves.

For performance reasons, indexes on columns can be added. As the platform doesn't know most of the queries that will be executed by the application, it predefines a minimal set of indexes to speed up generic access to the database (getting a document by ID, index lookups by key, synchronization...). But it is up to the application developer to track the requests the application emits against the database and then add additional indexes as required.

On the database systems that support native JSON access, JSON access indexes can also be added. Please refer to your database system documentation for best practices.

Darwino application tables

The prefix of Darwino's table names is the name of the Darwino database, which restricts database names to those compatible with the naming rules of the relational database system. The suffix is always an underscore followed by three characters. Let’s take a look:

  • _dsg: The Design table. This includes the lists of fields, stores, indexes, etc… There is at least one record in this table, with the value “DATABASE” in its type column. The name column is empty, and the json column contains the definition of the database. When you initialize a Darwino database, it will create this set of tables and store the database definition in that single record in the _dsg table.

    In the database definition is a field called “version”. When you initialize the database, the version will be “1”. Every time the database definition changes, you increment this number. This matters in Darwino because the application is decoupled from the database, so it is possible for the database design to be at a different level than the application expects. When your code opens the database, it reads this record from the _dsg table, extracts the version value, and compares it with the version the application expects.

    If it’s a match, it will open the database, return a handle, and off you go.

    Alternatively, the application may be expecting to work with a higher version of the database than the one stored. When opening, you specify how to handle the upgrade. One choice is to upgrade the database by running a function that you provide, which returns the new database definition.

    If, on the other hand, the application is expecting a lower version of the database than the one stored, opening will fail. The application will not be able to update the database because it won’t know how.

    In the DatabaseDef class, the loadDatabase method does the job of checking the version number and returning either null or a handle to the desired database (referenced by its name). A minimal sketch of this flow appears after the table descriptions below.

    If the runtime itself has been updated, the “tableVersion” in the database definition comes into play. It is not managed by the application; it is managed by the runtime. If Darwino has been upgraded and needs to upgrade the design of its default tables, it will do so transparently to the user, as long as the RDBMS user has the rights to run DDL statements.

    There is also versioning associated with the DatabaseCustomizer, which is where the developer, using a set of DDL statements, defines additional indexes, stored procedures, and triggers, typically for the purpose of optimizing the performance of the application. See Optimizing the database for details.

  • _doc: The document table. There is one row per document. This row contains the JSON value as well as some metadata.

    • docid – autogenerated key value. This is dependent on the database and on the instance in the database.
    • instanceid – identifies the instance within the database that “contains” this document
    • storeid – together with the docid and instanceid, this defines the unique primary key of a document.
    • unid – The document's unique identifier
    • repid – ID of the server where this document was last created/modified. This helps track where the document comes from and optimizes replication by not sending a document back to the server where it changed.
    • pstoreid – a pointer to the store of the parent document
    • parent – a pointer to the parent document
    • smstoreid – pointer to the syncmaster store
    • smunid – pointer to the syncmaster document
    • seqid – sequence number used internally in replication
    • updid – internal replication version ID
    • udate – the last replication time of the document. This is updated automatically.
    • sftdel – soft delete flag, not currently used
    • cdate – creation date of the document
    • cuser – the user that created the document
    • mdate – the date of last modification
    • muser – the user that last modified the document
    • rsec – used internally to support reader fields
    • rsed – used internally to support writer fields
    • json – the JSON data of the document
    • sig – a signature for the document, not used
    • changes – used internally to support replication
    • cdatets – an easily readable and queryable copy of the creation date in timestamp format
    • mdatets – an easily readable and queryable copy of the modification date in timestamp format

      Note: the date fields (udate, cdate, and mdate) are stored as integers. They are the Java date converted to a long, representing the number of milliseconds since January 1, 1970 (the Unix epoch). This accommodates the precision required for replication, as well as the requirement that the dates be fully compatible with every supported relational database system.

  • _bin table: Used to store binary data associated with documents, but outside of the documents. Here, data is stored under a computed key based on the hash of the file’s contents, so a single bin record can be shared by multiple documents pointing to it.

  • _dov table: This stores the list of fields extracted from documents. The extracted data is stored in one of four columns, one for each possible data type; they are named ftxt, fnum, fbol, and fdat.

  • _idx: This is where indexes are stored. There are entries for each document, and each entry stores keys and values.

  • _idv: Like the _dov table, but at the index level, since fields can also be stored per index.

  • _lck: This is for document locking; not currently used.

  • _rep: Stores the replication information: the last replication date for each replication profile, kept on the target side of a replication. When pushing replication changes, the first step is to ask the target for its last replication time; the target checks this table and returns the value. The source then composes the list of changes since that time and sends it. This flow exists because you want to base the replication on the target’s clock.

  • _sec: Stores the reader and writer information. For every document, it stores the entry name and whether it’s read-only or read/write.

  • _sed: Like the security table, but for ereaders and ewriters.

  • _stu: This is the deletion stub table, used during replication to convey that a document has been deleted.

  • _tag: Stores the social data tags. It is indexed by the docid, and there is one row for every tag.

  • _usr: The user-related social data, such as the rating and sharing information, as well as whether the document has been read and when it was last read. It also stores the replication time information for this data.
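
The version-check flow described for the _dsg table above can be summarized in code. The sketch below is a minimal, self-contained illustration only: the names DesignRecord, readDesignRecord, openHandle, and runUpgrade are hypothetical stand-ins, since in real code DatabaseDef.loadDatabase performs this check for you and returns null when the stored design does not match what the application expects.

    // Minimal sketch of the _dsg version check; all helper names are hypothetical stand-ins.
    public final class VersionCheckSketch {

        /** Version of the database design that this application code expects. */
        static final int EXPECTED_VERSION = 2;

        /** Hypothetical view of the single "DATABASE" record in the _dsg table. */
        record DesignRecord(int version, String json) { }

        static Object open(String databaseName) {
            DesignRecord design = readDesignRecord(databaseName); // row with type = "DATABASE"

            if (design.version() == EXPECTED_VERSION) {
                return openHandle(databaseName);            // match: open and return a handle
            }
            if (design.version() < EXPECTED_VERSION) {
                runUpgrade(databaseName, EXPECTED_VERSION); // apply the new definition you provide
                return openHandle(databaseName);
            }
            // The stored design is newer than this application understands: fail.
            throw new IllegalStateException("Database design version " + design.version()
                    + " is newer than the expected version " + EXPECTED_VERSION);
        }

        // Stubs standing in for the Darwino runtime.
        static DesignRecord readDesignRecord(String name) { return new DesignRecord(1, "{}"); }
        static Object openHandle(String name) { return new Object(); }
        static void runUpgrade(String name, int toVersion) { /* supply the new database definition here */ }
    }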

Database definition class

This class defines the JSON database, including the stores, the fields being extracted, the indexes, the security, and any other database options.

  • setACL() is used to set the access levels of people, groups, and roles (who can read, edit, create documents, etc…). These access rights can be resolved dynamically. In particular, they can be resolved for an Instance.
  • setDocumentSecurity(int documentSecurity) determines how readers and writers fields will be handled. Choices include no reader/writer security, reader only, writer only, etc...
  • setInstanceEnabled(boolean instanceEnabled) – determines whether Instances are allowed.
  • setPreventRestAccess(Boolean preventRestAccess) – Darwino provides a set of REST services so that data can be read and written over REST. Setting this option prevents raw REST access to document data, ensuring that all access goes through the appropriate business logic. When set, even if the REST services are deployed, Darwino will deny REST access to the data.
  • setReplicaID - internal
  • setReplicationEnabled(boolean replicationEnabled) – If replication is enabled, Darwino records more data to support replication, including deletion stubs. This overhead can be avoided by disabling replication.
  • setSoftDeleteEnabled – not implemented
  • setTableVersion – internal
  • setTimeZone – Darwino stores dates in ISO 8601 format, by default using GMT as the time zone. Setting this value overrides that default, and dates will be stored using the specified time zone instead. There is never any ambiguity; Darwino always stores the values with a time zone; this setting merely determines which zone is used by default.
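
Putting these options together, here is a hedged sketch of configuring a database definition. The setter names are the ones listed above; how the DatabaseDef instance is created, the numeric documentSecurity value, and the setTimeZone parameter type are assumptions made for illustration only.

    // Sketch only: construction, imports, and parameter types are assumptions.
    DatabaseDef def = new DatabaseDef();
    def.setInstanceEnabled(true);        // allow Instances
    def.setReplicationEnabled(true);     // keep deletion stubs and other replication data
    def.setDocumentSecurity(0);          // e.g. no reader/writer security; the real constants are not shown here
    def.setPreventRestAccess(true);      // deny raw REST access even when REST services are deployed
    def.setTimeZone("America/New_York"); // store dates using this zone instead of the GMT default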

Stores

Physically, a store is nothing more than a concept: a logical collection of documents in a database. There is not one table per store; instead, the store is recorded as a column value on each document row. Every document in a database is identified by its UNID and its storeID; together, these two fields define the document’s key. This way of implementing stores limits the DDL changes needed to maintain the database, and it allows cross-store queries within the same database.

Stores have several options:

  • setAnonymousSocial(boolean anonymousSocial) – Enabling this allows tracking of the social activities of the anonymous user; for example, tracking when an anonymous user reads a document. This defaults to false, since there are few cases where it would be desired.
  • setFTSearch(_FtSearch ftSearch) – Specifies what data should be extracted from the JSON to a table so it can be full-text indexed. For example, you might specify "$" to index the entire document, or just the JSON path of the field "Title" if you want that field to be the only indexed data.

  • setFtSearchEnabled(boolean fulltextEnabled) – Enables full text search of the documents in the store. Darwino uses the full text search engine provided by the host database, which results in maximum performance and low overhead. It is possible to test at run time, using the Store-level method isFtSearchEnabled(), whether the database supports full text search, so the UI can be adjusted accordingly. Currently, none of the supported databases can full-text index the JSON documents directly; the _fts table therefore contains the names and values of the fields that you wish to full-text index.

  • setLabel(String label) - User-friendly label displayed to the user.

  • setPreventRestAccess(Boolean preventRestAccess) – If REST access is enabled at the database level, it can be prevented at the store level.

  • setReadMarkEnabled(Boolean readMarkEnabled) – This applies to the social data read marks, and is set to false by default. If enabled, when a document is read by a user who is not anonymous (unless setAnonymousSocial is enabled), that document is marked as read by that user. This option exists because supporting read marks requires a write operation on every read, which you may want to avoid.

  • setTaggingEnabled(boolean taggingEnabled) – Enabling this allows Darwino to maintain an array of tags for each document. You can search documents by a tag or a combination of tags. There is also a well-optimized function at the store level that returns a tag cloud.

  • setFields() has two forms. The simple form takes the name of a field as its parameter and indexes that field. The other form takes an array of FieldNodes.

  • addQueryField() has five forms. The first takes a single parameter, the name of the field, and uses that value both as the field name and as the path to the data. The second takes three parameters: the field name, the data type, and a boolean determining whether the field is multi-valued. The third adds a specification of the path in the JSON. The fourth takes a single parameter, a fieldFunction callback, which itself has several parameters: the field name, the data type, the multi-valued boolean, the name of a registered callback function, and a JSON path to the data in the JSON document that acts as the parameter to the referenced callback function. The fifth form is like the fourth, but uses a Darwino query language statement in place of the function name and parameter.

    By allowing a callback function or a query result, these forms let sophisticated processing be performed when computing the field value, which can then be used in a query. A sketch combining the store options and addQueryField appears after this list.
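
The store options and query fields described above can be combined as in the sketch below. The StoreDef type name and the addStore call are assumptions (def is the DatabaseDef from the earlier sketch); the option setters and the one-parameter form of addQueryField are taken from this page, and the richer addQueryField overloads are only hinted at in comments.

    // Sketch only: StoreDef and addStore are assumed names.
    StoreDef orders = def.addStore("orders");
    orders.setLabel("Customer Orders");
    orders.setTaggingEnabled(true);      // maintain an array of tags per document
    orders.setReadMarkEnabled(false);    // skip the extra write on every read
    orders.setFtSearchEnabled(true);     // delegate full text search to the host database

    // Simple form: the field name is also the JSON path to the data.
    orders.addQueryField("customer");
    // The other overloads add the data type, the multi-valued flag, an explicit
    // JSON path, or a callback/query-language expression, as described above.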

Indexes

In Darwino, an index is the MAP step of MAP/REDUCE. It allows fast access to data, as well as pre-computing of some data (e.g., number of children, social data...) and then querying that data. It associates a key with a value for a selected set of documents. The value can be computed from the actual JSON document.

The store.addIndex() method creates an index based on a subset of the data in the JSON documents. Once you add the index, you define the keys and the values to extract from the JSON document.

When you execute a query on the index using the Darwino API, you can choose to return either the values in the index, or the JSON value in the document itself.

For example, index.keys("_unid") will set the unid as the key, and index.valuesExtract("\"$\"") will specify the root of the JSON value (the entire document) as the value to extract. These are JSON Path expressions.

When specifying the keys, you can specify whether the keys are unique or not. This is done by calling the setUniqueKey(Boolean uniqueKey) method.
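
Putting the pieces together, the index described above could be defined as in the sketch below. The IndexDef type name and the index name "byUnid" are assumptions, while addIndex, keys, valuesExtract, and setUniqueKey are the methods named on this page.

    // Sketch only: IndexDef and the index name are assumed;
    // orders is the StoreDef handle from the previous sketch.
    IndexDef byUnid = orders.addIndex("byUnid");
    byUnid.keys("_unid");            // the document UNID is the index key
    byUnid.valuesExtract("\"$\"");   // JSON Path: extract the whole document as the value
    byUnid.setUniqueKey(true);       // declare the keys unique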
