Apache CouchDB Interview Questions and Answers

Apache CouchDB is open source database software that focuses on ease of use and having a scalable architecture. It has a document-oriented NoSQL database architecture and is implemented in the concurrency-oriented language Erlang; it uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API.


1)What Language Is CouchDB Written In?


Answer)Erlang, a concurrent, functional programming language with an emphasis on fault tolerance.

Early work on CouchDB began in C++, but the code was later moved to the Erlang/OTP platform. Erlang has so far proven an excellent match for this project.

CouchDB’s default view server uses Mozilla’s SpiderMonkey JavaScript library, which is written in C. It also supports easy integration of view servers written in any language.


2)Why Does CouchDB Not Use Mnesia?


Answer)There are several reasons. The first is a storage limitation of 2 gigabytes per file.

The second is that Mnesia requires a validation and fix-up cycle after a crash or power failure, so even if the size limitation were lifted, the fix-up time on large files would be prohibitive.

Mnesia replication is suitable for clustering, but not disconnected, distributed edits. Most of the cool features of Mnesia aren’t really useful for CouchDB.

Also, Mnesia isn’t really a general-purpose, large-scale database. It works best as a configuration-type database, where the data isn’t central to the function of the application but is necessary for its normal operation. Think of network routers, HTTP proxies and LDAP directories: things that need to be updated, configured and reconfigured often, but whose configuration data is rarely very large.


3)How Do I Use Transactions With CouchDB?


Answer)CouchDB uses an optimistic concurrency model. In the simplest terms, this means that you send the document version (its _rev value) along with your update, and CouchDB rejects the change if the current document version doesn’t match the one you sent.

You can re-frame many normal transaction based scenarios for CouchDB. You do need to sort of throw out your RDBMS domain knowledge when learning CouchDB, though.

It’s helpful to approach problems from a higher level, rather than attempting to mold Couch to a SQL based world.
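The revision check described above can be illustrated with a small in-memory model. This is only a toy sketch, not CouchDB's implementation: the ToyStore class and its "N-toy" revision strings are made up for illustration. A real CouchDB server answers with HTTP 409 Conflict when the _rev you send is stale.

```javascript
// Toy model of optimistic concurrency (hypothetical; not CouchDB's real code).
class ToyStore {
  constructor() {
    this.docs = new Map(); // _id -> { _id, _rev, ...fields }
  }

  // Save a document. The caller must send the _rev it last read.
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      // A real CouchDB server would answer with HTTP 409 Conflict here.
      return { ok: false, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const saved = { ...doc, _rev: `${generation}-toy` };
    this.docs.set(doc._id, saved);
    return { ok: true, id: saved._id, rev: saved._rev };
  }
}

const store = new ToyStore();
const created = store.put({ _id: "invoice-1", total: 10 });

// Two clients both read revision "1-toy"; only the first write is accepted.
const first = store.put({ _id: "invoice-1", _rev: created.rev, total: 12 });
const second = store.put({ _id: "invoice-1", _rev: created.rev, total: 15 });
console.log(first.ok, second.error); // true conflict
```

The losing client would normally re-read the document, reapply its change to the newer revision, and try again.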


4)How Do You Compare MongoDB, CouchDB And Couchbase?


Answer)MongoDB and CouchDB are both document-oriented databases.

They are among the most typical representatives of open source NoSQL databases.

Beyond storing data as documents, they have little in common.

MongoDB and CouchDB differ in many respects, including the data model, interface, object storage and replication methods.


5)How Is PouchDB Different From CouchDB?


Answer)PouchDB is also a CouchDB client, and you should be able to switch between a local database or an online CouchDB instance without changing any of your application’s code.

However, there are some minor differences to note:

View Collation – CouchDB uses ICU to order keys in a view query; in PouchDB they are ASCII ordered.

View Offset – CouchDB returns an offset property in the view results. In PouchDB, offset just mirrors the skip parameter rather than returning a true offset.
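The collation difference can be seen by sorting the same keys two ways. This is a rough sketch only: Intl.Collator is used here as a stand-in for ICU-style ordering and is not claimed to reproduce CouchDB's exact view collation rules. The sample keys are made up.

```javascript
const keys = ["b", "A", "a", "B"];

// Plain code-unit ordering (ASCII order for these keys): uppercase sorts first.
const asciiOrder = [...keys].sort();

// Locale-aware ordering via Intl.Collator, used only as an approximation of ICU rules.
const collator = new Intl.Collator("en");
const icuLikeOrder = [...keys].sort((x, y) => collator.compare(x, y));

console.log(asciiOrder);
console.log(icuLikeOrder);
```

If an application depends on the order of string keys, it is worth checking the view results against both backends.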


6)So Is CouchDB Now Going To Be Written In Java?


Answer)Erlang is a great fit for CouchDB and I have absolutely no plans to move the project off its Erlang base. IBM/Apache’s only concern is that we remove license-incompatible third-party source code bundled with the project, a fundamental requirement for any Apache project. So some things may have to be replaced in the source code (possibly Mozilla SpiderMonkey), but the core Erlang code stays.

An important goal is to keep interfaces in CouchDB simple enough that creating compatible implementations on other platforms is feasible. CouchDB has already inspired the database projects RDDB and Basura. Like SQL databases, I think CouchDB needs competition and an ecosystem to be viable long term. So Java or C++ versions might be created and I would be delighted to see them, but it likely won’t be me who does it.


7)What Does IBM’s Involvement Mean For CouchDB And The Community?


Answer)The main consequences of IBM’s involvement are:

The code is now being Apache licensed, instead of GPL.

Damien is going to be contributing much more time.


8)What Are The Main Features Of CouchDB?


Answer)JSON Documents – Everything stored in CouchDB boils down to a JSON document.

RESTful Interface – From creation to replication to data insertion, every management and data task in CouchDB can be done via HTTP.

N-Master Replication – You can make use of an unlimited number of ‘masters’, making for some very interesting replication topologies.

Built for Offline – CouchDB can replicate to devices (like Android phones) that can go offline and handle data sync for you when the device is back online.

Replication Filters – You can filter precisely the data you wish to replicate to different nodes.
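A replication filter is an ordinary JavaScript function stored in a design document. It receives each document together with the request, and returns true for documents that should be replicated. The sketch below is hypothetical: the owner field, the by_owner filter name, the app design document and the database names are all assumed for illustration.

```javascript
// Hypothetical filter: replicate only documents owned by the requested user.
const byOwner = function (doc, req) {
  return doc.owner === req.query.owner;
};

// Filters are stored in a design document as source strings.
const designDoc = {
  _id: "_design/app",
  filters: { by_owner: byOwner.toString() },
};

// Body for POST /_replicate that applies the filter (database names are assumed).
const replicationRequest = {
  source: "invoices",
  target: "invoices_alice",
  filter: "app/by_owner",
  query_params: { owner: "alice" },
};

// Local check of the filter logic, without a server.
const sampleReq = { query: replicationRequest.query_params };
console.log(byOwner({ _id: "a", owner: "alice" }, sampleReq)); // true
console.log(byOwner({ _id: "b", owner: "bob" }, sampleReq)); // false
```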


9)What Is The Use Of CouchDB?


Answer)CouchDB allows you to write a client-side application that talks directly to CouchDB without the need for a server-side middle layer, significantly reducing development time. With CouchDB, you can handle growing demand by adding more replication nodes. CouchDB also allows you to replicate the database to your client, and with filters you can even replicate only that specific user’s data.

Having the database stored locally means your client-side application can run with almost no latency. CouchDB will handle the replication to the cloud for you. Your users could access their invoices on their mobile phone and make changes with no noticeable latency, all whilst being offline. When a connection is present and usable, CouchDB will automatically replicate those changes to your cloud CouchDB.

CouchDB is a database designed to run on the internet of today, for today’s desktop-like applications and the connected devices through which we access the internet.


10)How Much Stuff Can Be Stored In CouchDB?


Answer)With node partitioning, basically unlimited. The practical scaling limits for a single database instance are not yet known.


11)What Is CouchDB Kit?


Answer)CouchDB Kit (couchdbkit) provides a framework for Python applications to access and manage CouchDB. It offers a full-featured yet simple client for managing the CouchDB server, databases, documents and views. For convenience, CouchDB objects are mostly mirrored as Python objects, and the Server and Database objects can be used much like a dict.


12)Can Views Update Documents Or Databases?


Answer)No. Views are always read-only to databases and their documents.


13)Where Are The CouchDB Logfiles Located?


Answer)For a default Linux/Unix installation the logfiles are located here:

/usr/local/var/log/couchdb/couch.log

This is set in the default.ini file located here:

/etc/couchdb/default.ini

If you've installed from source and are running CouchDB in dev mode the logfiles are located here:

YOUR-COUCHDB-SOURCE-DIRECTORY/tmp/log/couch.log


14)What Does Couch Mean?


Answer)It's an acronym: Cluster Of Unreliable Commodity Hardware. This is a statement of Couch's long-term goals of massive scalability and high reliability on fault-prone hardware. The distributed nature and flat address space of the database will enable node partitioning for storage scalability (with a map/reduce style query facility) and clustering for reliability and fault tolerance.


15)Is CouchDB Ready For Production?


Answer)Yes. There are many companies using CouchDB.


16)What Platforms Are Supported?


Answer)Most POSIX systems, including GNU/Linux and OS X.

Windows is not officially supported but it should work.


17)How Do I Do Sequences?


Answer)With replication, sequences are hard to realize. Sequences are often used to ensure unique identifiers for each row in a database table. CouchDB generates unique ids on its own, and you can specify your own as well, so you don't really need a sequence here. If you use a sequence for something else, you might find another way to express it in CouchDB.
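Both ways of getting an id can be sketched briefly. The snippet below only builds values and makes no network call. CouchDB exposes server-generated ids through its /_uuids endpoint; the server URL and the "invoice:" naming scheme are assumptions for illustration.

```javascript
// Build the URL for CouchDB's /_uuids endpoint, which returns server-generated ids.
function uuidsUrl(baseUrl, count = 1) {
  const url = new URL("/_uuids", baseUrl);
  url.searchParams.set("count", String(count));
  return url.toString();
}

// Alternatively, derive your own meaningful id (this naming scheme is hypothetical).
function invoiceId(customer, number) {
  return `invoice:${customer}:${number}`;
}

console.log(uuidsUrl("http://localhost:5984", 3)); // http://localhost:5984/_uuids?count=3
console.log(invoiceId("acme", 42)); // invoice:acme:42
```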


18)How Do I Use Replication?


Answer)POST /_replicate with a post body of

{"source":"$source_database", "target":"$target_database"}

Where $source_database and $target_database can be the names of local databases or full URIs of remote databases. Both databases need to be created before they can be replicated from or to.
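As a minimal sketch, the request above could be assembled like this. The helper name, server URL and database names are hypothetical, and authentication is left out. The code only builds the request and does not send it.

```javascript
// Build the HTTP request for POST /_replicate (no network call is made here).
function buildReplicationRequest(couchUrl, source, target) {
  return {
    url: new URL("/_replicate", couchUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source, target }),
    },
  };
}

// Hypothetical server URL and database names.
const request = buildReplicationRequest(
  "http://localhost:5984",
  "invoices",
  "http://backup.example.com:5984/invoices"
);
console.log(request.url);
console.log(request.options.body);

// To actually send it (Node 18+ or a browser):
//   await fetch(request.url, request.options);
```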


19)How Do I Review Conflicts That Occurred During Replication?


Answer)Use a view like this:

map: function(doc) {if(doc._conflicts){emit(null,null);}}
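To see what this view would return, the map function can be run locally against sample documents. The emit stub below stands in for CouchDB's view server, and the design document name and sample documents are made up for illustration.

```javascript
// Collect rows emitted by a map function; a stand-in for CouchDB's view server.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// The map function from the answer above.
const conflictsMap = function (doc) {
  if (doc._conflicts) {
    emit(null, null);
  }
};

// How it might be stored in a design document (the name is hypothetical).
const designDoc = {
  _id: "_design/maintenance",
  views: { conflicts: { map: conflictsMap.toString() } },
};

const sampleDocs = [
  { _id: "a", _rev: "2-x" },
  { _id: "b", _rev: "3-y", _conflicts: ["3-z"] },
];
for (const doc of sampleDocs) {
  currentId = doc._id;
  conflictsMap(doc);
}
console.log(rows.map((row) => row.id)); // [ 'b' ]
```

Each row id in the view result identifies a document that still has conflicting revisions to resolve.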


20)How Can I Spread Load Across Multiple Nodes?


Answer)Using an HTTP proxy like nginx, you can load balance GETs across nodes and direct all POSTs, PUTs and DELETEs to a master node. CouchDB's triggered replication facility can keep multiple read-only servers in sync with a single master server, so by replicating from master -> slaves on a regular basis, you can keep your content up to date.
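The routing rule a proxy would apply can be sketched as a small function. This is an illustration of the idea, not an nginx configuration. The host names are hypothetical, and treating HEAD as a read is an added assumption.

```javascript
// Pick a backend per request: reads rotate over the replicas, writes go to the master.
function createRouter(master, replicas) {
  let next = 0;
  return function route(method) {
    // Assumption: HEAD is treated as a read alongside GET.
    const isRead = method === "GET" || method === "HEAD";
    if (!isRead || replicas.length === 0) {
      return master;
    }
    const target = replicas[next % replicas.length];
    next += 1;
    return target;
  };
}

// Hypothetical node addresses.
const route = createRouter("http://master:5984", [
  "http://replica1:5984",
  "http://replica2:5984",
]);
console.log(route("GET")); // http://replica1:5984
console.log(route("GET")); // http://replica2:5984
console.log(route("PUT")); // http://master:5984
```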


21)Can I Talk To CouchDB Without Going Through The HTTP API?


Answer)CouchDB's data model and internal API map the REST/HTTP model so well that any other API would basically reinvent some flavor of HTTP. However, there is a plan to refactor CouchDB's internals so as to provide a documented Erlang API.


22)Erlang Has Been Slow To Adopt Unicode. Is Unicode Or UTF-8 A Problem With CouchDB?


Answer)CouchDB uses Erlang binaries internally. All data coming to CouchDB must be UTF-8 encoded.


23)How Fast Are CouchDB Views?


Answer)It would be quite hard to give out any numbers that make much sense. From the architecture point of view, a view is much like a (multi-column) index on a table in an RDBMS that just performs a quick look-up. So this theoretically should be pretty quick. The major advantage of the architecture is, however, that it is designed for high traffic. No locking occurs in the storage module (MVCC and all that), allowing any number of parallel readers as well as serialized writes. With replication, you can even set up multiple machines for a horizontal scale-out, and data partitioning (in the future) will let you cope with huge volumes of data.


24)Is it possible to communicate with CouchDB without going through the HTTP API?


Answer)CouchDB's data model and internal API map the REST/HTTP model so directly that any other API would basically inherit some features of HTTP. However, there is a plan to refactor CouchDB's internals so as to provide a documented Erlang API.


25)What Platforms are Supported?


Answer)Most POSIX systems, including GNU/Linux and OS X.

Windows is not officially supported but it should work.


26)My database will require an unbounded number of deletes, what can I do?


Answer)If there's a strong correlation between time (or some other regular monotonically increasing event) and document deletion, a DB setup can be used like the following:

Assume that the past 30 days of logs are needed, anything older can be deleted.

Set up DB logs_2011_08.

Replicate logs_2011_08 to logs_2011_09, filtered on logs from 2011_08 only.

During August, read/write to logs_2011_08.

When September starts, create logs_2011_10.

Replicate logs_2011_09 to logs_2011_10, filtered on logs from 2011_09 only.

During September, read/write to logs_2011_09.

Logs from August will be present in logs_2011_09 due to the replication, but not in logs_2011_10.

The entire logs_2011_08 DB can be removed.
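The monthly naming scheme above can be computed from a date. This is a minimal sketch assuming UTC month boundaries; the helper names are hypothetical, and creating, replicating and deleting the databases is left to the application.

```javascript
// Name of the log database for a given date, e.g. logs_2011_08.
function logDbName(date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `logs_${year}_${month}`;
}

// Rotation plan for the month containing `date`.
function rotationPlan(date) {
  // First day of the month `months` away; Date.UTC handles year rollover.
  const shift = (months) =>
    new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + months, 1));
  return {
    readWrite: logDbName(date), // current month: reads and writes
    replicateTo: logDbName(shift(1)), // next month: filtered replication target
    drop: logDbName(shift(-1)), // previous month: can be removed
  };
}

console.log(rotationPlan(new Date(Date.UTC(2011, 8, 15))));
// { readWrite: 'logs_2011_09', replicateTo: 'logs_2011_10', drop: 'logs_2011_08' }
```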


27)How do I backup CouchDB? What data recovery strategies exist?


Answer)While CouchDB is a very reliable database, a careful engineer will always ask "What happens when something goes wrong?". Let's say your server has an unrecoverable crash and you lose all data... or maybe a hacker finds your top secret credentials and deletes your data... or maybe an undiscovered bug causes data corruption after an event... or maybe there is a logic error in your application code that accesses your database. Ideally we try to avoid these situations by preparing for the worst and hoping they never occur, but bad things do happen and we should be ready to react when they do.

There are a few traditional data backup strategies for CouchDB: replication, database file backup and filesystem snapshots.

Replication Based Backup – CouchDB is well known for its push and pull replication functionality. Any CouchDB database can replicate to any other if it has HTTP access and the proper credentials.

Database File Backup – Under the hood, CouchDB stores databases and indexes as files in the underlying filesystem. Using a common command line backup tool, like rsync, we can perform incremental backups triggered by cron.

Filesystem/VM Snapshots – Most VMs and newer filesystems have snapshot capabilities that allow rollbacks to preserve data.


28)How Do I Configure SSL (HTTPS) in CouchDB?


Answer)Secure Socket Layer (SSL) is used in conjunction with HTTP to secure web traffic. The resulting protocol is known as HTTPS. In order to utilize SSL, you must generate a key and cert. Additionally, if you want your web traffic to be safely accepted by most web browsers, you will need the cert to be signed by a CA (Certificate Authority). Otherwise, if you bypass the CA, you have the option of self-signing your certificate.

Production Security – Apache CouchDB leverages Erlang/OTP's SSL, which is usually linked against a system-provided OpenSSL installation. The security, performance and compatibility with other browsers and operating systems therefore vary heavily depending on how the underlying OpenSSL library was set up. It is strongly recommended that for production deployments, a dedicated well-known SSL/TLS terminator is used instead. There is nothing fundamentally wrong with Erlang's crypto libraries; however, a dedicated TLS application is generally a better choice, and allows tuning and configuring your TLS settings directly rather than relying on whatever Erlang/OTP release is provided by your operating system.

Key & CSR Procedure using OpenSSL – OpenSSL is an open source SSL utility and library. It comes standard with many UNIX/Linux distributions. We will use OpenSSL to generate our private key and generate our certificate signing request (CSR).


28)What are the consequences of having a high ratio of 'deleted' to 'active' documents?


Answer)Every document that is deleted is replaced with a small amount of metadata called a tombstone, which is used for conflict resolution during replication (a tombstone is also created for each document in a batch delete operation). Although tombstone documents contain only a small amount of metadata, having lots of them will have an impact on the size of used storage. Tombstone documents still show up in _changes, so they require processing for replication and when building views. Compaction time is proportional to the ratio of deleted documents to the total document count.
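One way to keep an eye on this ratio is to compute it from the database information that GET /{db} returns. The sketch below assumes that doc_count covers only live documents and doc_del_count covers tombstones; the sample numbers are made up.

```javascript
// Ratio of deleted documents to all documents (live plus deleted).
// Assumption: doc_count excludes deleted documents; doc_del_count counts tombstones.
function deletedRatio(dbInfo) {
  const total = dbInfo.doc_count + dbInfo.doc_del_count;
  return total === 0 ? 0 : dbInfo.doc_del_count / total;
}

// Made-up sample of the relevant fields from a database info response.
const info = { db_name: "logs", doc_count: 2500, doc_del_count: 7500 };
console.log(deletedRatio(info)); // 0.75
```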


29)Deleted documents have an overhead in CouchDB because a tombstone document exists for each deleted document. One consequence of tombstone documents is that compaction gets slower over time. Three options for purging tombstone documents from a CouchDB database are:

Create a new database for every N time period (and delete that database when the period expires)

Filtered replication

Do nothing

How can I choose which option is the most suitable?


Answer)Each approach is described below. Note that you may need to use a combination of both approaches in your application. Alternatively, you may find through testing that your tombstone documents don't add significant overhead and can just be left as is.

Create a new database for every N time period

When to use this approach? This approach works best when you know the expiry date of a document at the time when the document is first saved.

How does it work? Each document to be saved that has a known expiry date will be stored in a database that will get dropped when its expiry date has been reached. When the document is being saved, if the database doesn't already exist then a new database must be created. The rationale of this approach is that dropping a database is an inexpensive operation and does not leave tombstone documents on disk.

Gotchas: It is not possible to query across databases in Cloudant/CouchDB. Cross-database queries will need to be performed in the application itself. This will be an issue if the cross-database queries require aggregating lots of data.

Filtered replication

When to use it? This approach works best when you don't know the expiry date of a document at the time when the document is first saved, or if you would have to perform cross-database queries that would involve moving lots of data to the application so that it can be aggregated.

How does it work? This approach relies on creating a new database at an opportune time (NOTE 1) and replicating all documents to it except for the tombstone documents. A validate_doc_update (VDU) function is used so that deleted documents with no existing entry in the target database are rejected. When replication is complete (or acceptably up-to-date if using continuous replication), switch your application to use the new database and delete the old one. There is currently no way to rename databases, but you could use a virtual host which points to the "current" database.

An example of such a VDU function is below:

function (newDoc, oldDoc, userCtx) {
  // any update to an existing doc is OK
  if (oldDoc) {
    return;
  }
  // reject tombstones for docs we don't know about
  if (newDoc["_deleted"]) {
    throw({forbidden : "We're rejecting tombstones for unknown docs"});
  }
}


30)My filtered replication takes forever. Why is my filtered replication so slow?


Answer)Filtered replication is slow because the filter function has to run for every fetched document to decide whether to replicate it or not.




