14. Appendix C: Distributed Indexing with centipede

centipede is the LDAP centroid index generation and maintenance program. You can use it to extract centroid or other index information from one LDAP server and install it in another. Although index information can be extracted from any LDAP server, only a slapd LDAP server will understand the information and thus be capable of making use of it as indexing information (i.e., you should only attempt to install index information in a slapd LDAP server). centipede is very experimental at the moment, so use it at your own risk.

Why would you want to do this? If you want to support searches whose scope cannot be easily restricted using the LDAP namespace, centipede can make these searches efficient. For example, suppose you are looking for Babs Jensen, but you don't know what company she works for, or even what state she's in. All you know is that she is a US resident. A search of the entire c=US subtree may be what you want to do, but that's potentially very expensive since it involves contacting every server in the US. With centipede, an indexing slapd can use the index information centipede provides to prune the set of servers to search, referring the client only to servers likely to have information on Babs. Or, you might want to create a special index area in your LDAP tree that collects centipede information from other servers based on some entirely different criteria not related to the hierarchy of the LDAP namespace.
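
The pruning idea can be sketched in a few lines of shell. This is only an illustration: the one-file-per-server layout, the "attribute=value" line format, and the candidate_servers function are all assumptions made for the example, not the way slapd or centipede actually stores or consults index information.

```shell
#!/bin/sh
# Hypothetical layout: one file per server in a directory, named after the
# host, holding one "attribute=value" line per distinct value on that server.
# This is NOT centipede's real storage format.

# candidate_servers DIR ATTR VALUE: print the servers whose index lists VALUE.
candidate_servers() {
    index_dir=$1
    attr=$2
    value=$3
    for f in "$index_dir"/*; do
        [ -f "$f" ] || continue
        # -F fixed string, -x whole line, -i ignore case, -q no output
        if grep -Fxiq -- "$attr=$value" "$f"; then
            basename "$f"
        fi
    done
}

# Demo with made-up index data for two servers.
demo_dir=$(mktemp -d)
printf 'cn=Babs Jensen\nsn=Jensen\n' > "$demo_dir/babs.com"
printf 'cn=Bjorn Jensen\nsn=Jensen\n' > "$demo_dir/example.org"

# Only servers that might hold the entry are worth a referral.
candidate_servers "$demo_dir" cn "Babs Jensen"

rm -rf "$demo_dir"
```

In the demo, a search for cn=Babs Jensen yields only babs.com, while a search for sn=Jensen would yield both servers, so the index narrows the search without guaranteeing a match.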

The general form of a centipede command is as follows.

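$(ETCDIR)/centipede [-v] [-n] [-f filter] [-F] [-R]
[-t directory] [-m authmethod] [-b binddn]
[-p passwd] [-M authmethod] [-B binddn]
[-P passwd] [-c cachesize]
-s sourceurl
-d desturl
attributes

Here sourceurl and desturl are LDAP URLs naming the server and base DN to extract index information from and to install it under, and attributes is the list of attributes to index. The options have the following meanings.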

-v
Turn on verbose mode. This option can be given multiple times to increase the level of verbosity.
-n
Do not actually install index information. Useful in conjunction with -v for seeing what centipede is up to.
-f ldapfilter
Specify a filter used to select the entries for which to generate indexing information. ldapfilter should be a string LDAP filter as described by RFC 1558.
-F
Generate full, as opposed to relative, index information.
-R
Generate relative, as opposed to full, index information. Full information is still generated if there is no previous information available from which to generate the relative information. This is the default.
-t directory
Specify the directory in which to create temporary files, find existing index information, and put new index information. The default is whatever is used by tempnam(3).
-b binddn
Specify the DN to authenticate with when extracting index information.
-p passwd
Specify the password to use for simple authentication when extracting index information.
-m authmethod
Specify the authentication method to use when extracting index information. authmethod should be either "simple" or "kerberos".
-B binddn
Specify the DN to authenticate with when installing index information.
-P passwd
Specify the password to use for simple authentication when installing index information.
-M authmethod
Specify the authentication method to use when installing index information. authmethod should be either "simple" or "kerberos".
-c cachesize
Specify the size in bytes of the cache used when building the new index information. Increasing this number can improve performance considerably, provided you have the memory for it.
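
To make the difference between -F and -R concrete, here is a small sketch of full versus relative index generation over a plain list of attribute values. It assumes that "relative" means the changes against the previous full index; the "+" and "-" prefixes and both function names are invented for the illustration and do not reflect centipede's actual output.

```shell
#!/bin/sh
# full_index FILE: print the distinct values in FILE, sorted.
full_index() {
    LC_ALL=C sort -u "$1"
}

# relative_index OLD NEW: print "+value" for values added since OLD and
# "-value" for values removed. If OLD does not exist, fall back to the
# full index of NEW, mirroring the fallback described for -R.
relative_index() {
    old=$1
    new=$2
    if [ ! -f "$old" ]; then
        full_index "$new"
        return
    fi
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    full_index "$old" > "$old_sorted"
    full_index "$new" > "$new_sorted"
    # comm -13: lines only in the second file (additions)
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+/'
    # comm -23: lines only in the first file (removals)
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/-/'
    rm -f "$old_sorted" "$new_sorted"
}

# Demo with made-up sn values from two successive runs.
work=$(mktemp -d)
printf 'Jensen\nSmith\n' > "$work/previous"
printf 'Jensen\nJones\nJensen\n' > "$work/current"
relative_index "$work/previous" "$work/current"
rm -rf "$work"
```

When few values change between runs, the relative form is much smaller than the full one, which is presumably why it is the default.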

14.1 An Example

Suppose you are running an LDAP server on the host babs.com for an organization called "BabsCo" based in the US, and you want to participate in the c=US indexing scheme described above by generating index information for the cn, sn and objectclass attributes in all the people entries in your subtree. You want to install the index information in the indexing slapd running on the host vertigo.rs.itd.umich.edu under the c=US entry. This way, when an LDAP client connects to the slapd on vertigo and does a subtree search of c=US, slapd can consult the index information to tell whether it should refer the client to your server or not. You could accomplish this with a command like this:

$(ETCDIR)/centipede -f '(objectclass=person)' \
    -m simple -b "<your-rootdn>" -p "<your-rootdnpw>" \
    -s "ldap://babs.com/o=BabsCo, c=US" \
    -d "ldap://vertigo.rs.itd.umich.edu/c=US" \
    cn sn objectclass

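Note that the -b and -p options authenticate to the source server as an entity able to read all the information you want indexed. If the destination server requires authentication to install index information, use -B and -P (and -M) as well.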
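
Before installing anything, it can be useful to preview a run with the -v and -n options described earlier. The sketch below only assembles and prints such a command line, so it is safe to try on a machine where centipede is not installed. The preview_cmd function, the CENTIPEDE and FILTER variables, and the /usr/local/etc fallback for ETCDIR are assumptions made for the example; substitute your own installation path.

```shell
#!/bin/sh
# Assumption: ETCDIR is wherever centipede was installed on your system.
ETCDIR=${ETCDIR:-/usr/local/etc}
CENTIPEDE=${CENTIPEDE:-$ETCDIR/centipede}
FILTER="(objectclass=person)"

# preview_cmd SOURCEURL DESTURL ATTR...
# Print (do not run) a verbose dry-run command: -v for verbose output,
# -n so that no index information is actually installed.
preview_cmd() {
    src=$1
    dst=$2
    shift 2
    printf '%s -v -n -f %s -s "%s" -d "%s" %s\n' \
        "$CENTIPEDE" "'$FILTER'" "$src" "$dst" "$*"
}

preview_cmd "ldap://babs.com/o=BabsCo, c=US" \
    "ldap://vertigo.rs.itd.umich.edu/c=US" \
    cn sn objectclass
```

Once the printed command looks right, run it by hand; dropping -n then performs the real installation.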

14.2 Limitations

This is all very experimental at the moment, and is subject to change. The scheme is very promising, but many details still need to be worked out, such as how clients discover indexing servers, how indexing servers discover index sources, and how best to maintain the information.

Currently, centipede only handles value-based index information. A future version of centipede will allow other types of index information to be manipulated (e.g., word-based indexes, substring indexes, phonetic indexes, hash indexes, etc.). A future version may also allow weights to be generated for the index values.
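
The distinction between value-based and word-based indexes can be illustrated as follows. The value_index function reflects the idea of indexing whole attribute values; word_index is only a guess at what a word-based index might contain, since the source does not define one. Both names are invented for the example.

```shell
#!/bin/sh
# value_index: one entry per distinct whole value read from standard input.
value_index() {
    LC_ALL=C sort -u
}

# word_index: one entry per distinct whitespace-separated word
# (a hypothetical reading of "word-based").
word_index() {
    tr -s ' \t' '\n\n' | LC_ALL=C sort -u
}

# Made-up cn values.
printf 'Babs Jensen\nBjorn Jensen\n' | value_index
printf 'Babs Jensen\nBjorn Jensen\n' | word_index
```

With a value-based index, only a search for the complete value "Babs Jensen" can be matched against the index; a word-based index would also let a search for just "Jensen" be pruned.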

Finally, centipede works strictly over LDAP at the moment. If and when the Common Indexing Protocol develops, centipede may change to use CIP instead.



Send comments about this page to: ldap-support@umich.edu