The University of Michigan's
Online Directory Service Strategy Report:
A Working Draft

By Chuck Lever
cel@umich.edu

Last updated on Sat Mar 21 11:44:36 EST 1998

Abstract

We document challenges faced by the University of Michigan's Online Directory Service, enumerate a set of solutions that the Directory Team plans to take to address many of these issues, and explore the risks associated with our suggested solutions.

Special note: This report is designed to be printed, or viewed with a web browser.


Contents

  1. Executive Summary

  2. Introduction
    1. Service Description
    2. Data Delivery Issues
    3. Data Quality Issues
    4. Cost Issues

  3. Tactical Directions
    1. Reinforce current service
    2. Improve Data Quality
    3. Migrate to CGI-based LDAP client
    4. Select and adopt a vendor-supported standard server
    5. Deploy new directory-related services

  4. Server Software Evaluation Criteria

  5. Deliverables Timeline

The contents of this report are Copyright © 1998 by The Regents of the University of Michigan. All rights reserved.
Any trademarks or service marks appearing in this report are the property of their respective owners.


Executive Summary

The University of Michigan's X.500 directory service, as it was originally known, was not a "designed" service: it was created in the earliest days of online directories, when not much was known about how directory services could be used. Directory services were also born before the advent of the World Wide Web, and some of their functions have since been subsumed by the Web. We now find ourselves having to reverse engineer our directory service, because very little is known or was recorded about its purposes or mission. This process will require some effort as we formally analyze the directory data model and the applications that use directory data, such as Targeted E-mail. Because the directory service has been around for a while, we also need to determine what is most valued about it, so we can maintain those parts while trimming away what is less useful.

The current service is, admittedly, a quagmire of unknown dependencies, ancient software that no longer builds, inappropriate operational procedures, and a damaged reputation. Our task is to take this chaos and focus it into a relevant, reliable, and scalable service that meets and exceeds the expectations of our customers, in steps that clearly demonstrate progress and improvement. Improving the quality of an existing service upon which so many depend requires careful study of the existing implementation, and careful management of changes.

Our eventual goal is to run the directory service on a standards-based, vendor-provided solution that requires much less programming or documentation support from ITD staff. While ITD was the birthplace for much of this technology, it is no longer capable of sustaining a serious LDAP software development effort. Our strategy, therefore, is to choose a product that will scale to support our lifetime customers, that integrates with our chosen security infrastructure, and that requires less support effort from us. We know that our security integration requirements often fly in the face of using unaltered off-the-shelf software. Under this software base, we want to engineer a secure, high-performance, high-reliability hardware foundation and data store. And using this software base, we want to provide accurate and relevant data to enable our customers to build new applications and trust that our service will be there to support them.

In the last four months, the Directory Services team has spent considerable effort attempting to reconstruct a vision of how this service should work. This effort has included:

All of these things have helped us to improve substantially the base of technology and information from where we start today. From here, we plan to strengthen further the current server deployment, develop new services and improve data quality and diversity, and within the next year, replace the current service with a high-performance, standards-compliant, vendor-provided solution.


Introduction

Service Description

The University of Michigan's Online Directory Service provides a computerized directory of persons associated with the University of Michigan. The Directory is a phonebook-like database of faculty, staff, students, and alumni of the University. Of course, like all computerizations of paper-based technologies, the Online Directory Service promises to add value to the concept of a phone book. For example, with the Online Directory, you can:

Currently, the same process that provides the contents of the campus phone book also provides the contents of the Online Directory. There are additional data feeds that add transient populations not included in the phone book, such as the University's temporary workers, to the directory. Anyone who can use a standard Lightweight Directory Access Protocol [LDAP] client on the Internet can search our Online Directory. Directory information is made available via a directory master server which tracks changes and feeds three shadow servers, which handle the bulk of directory lookup requests.

Who are customers of the Online Directory Service?

Most of our customers want one of three services: first, to search for contact information for people at the University; second, to provide a simple e-mail address that others can use to contact them; and third, to create mailing lists. Almost everyone who has used a computer here at the University has had some need for one of these three basic services. There are also departmental customers who want a feed of HR data with which to populate their own directories. Finally, we have identified a class of customers who use the directory to determine University affiliation information. This is of interest to units that, for example, deliver services based on eligibility criteria that include University affiliation.

A brief overview of some common directory terms

Directory data is stored in entries in a fashion similar to how data is stored in relational databases. A directory entry is equivalent to a row in a database table. Each directory entry contains a set of attributes, which are similar to fields in a database row. The number, names, and data types of the attributes in an entry are determined by that entry's object class. Directories use object class definitions as templates when creating a new entry. Every attribute in an entry has a data type, which determines what kind of data may be stored in the attribute, and how it is compared to (or matched with) other data. A single attribute can have one or more values; the values comprise the actual data in the directory.

That's a lot to consider, so let's look at an example: the entry in our directory containing information about me. My entry is in the object class umichPerson. That means, among other things, it has an attribute for my title, for my business phone and address, and so on. It does not have attributes such as the member attribute, since that contains values that are the names of each member in a group, and I'm not a group, I'm a person. The values of the attributes in my entry are string data describing these things about me. The attribute containing my business address, in fact, contains more than one value -- each value is one line of my street address here at Argus.
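The relationship between object classes, attributes, and values can be sketched in a few lines of Python. This is only an illustration, not how any directory server stores data: the attribute lists given for each object class are assumptions made up for the example (they are not the real umichPerson definition), and the address lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical object class templates: class name -> attribute names allowed.
# These sets are illustrative only, not the actual schema.
OBJECT_CLASSES = {
    "umichPerson": {"cn", "title", "telephoneNumber", "postalAddress"},
    "group": {"cn", "member"},
}


@dataclass
class Entry:
    """A directory entry: one object class plus multi-valued attributes."""
    object_class: str
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def add_value(self, name: str, value: str) -> None:
        # The object class acts as a template: reject attributes it lacks.
        if name not in OBJECT_CLASSES[self.object_class]:
            raise ValueError(f"{name!r} is not allowed in {self.object_class}")
        # An attribute may hold several values, so each one is appended.
        self.attributes.setdefault(name, []).append(value)


person = Entry("umichPerson")
person.add_value("cn", "Charles E Lever")
person.add_value("postalAddress", "Example Building")    # placeholder line 1
person.add_value("postalAddress", "123 Example Street")  # placeholder line 2
```

Here the postalAddress attribute ends up holding two values, one per address line, while an attempt to add a member value to a person entry is refused because the (assumed) person template has no such attribute.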

Overall directory organization appears more like the Unix file system than a relational database. Directory servers maintain entries in a more or less tree-like organization called the directory information tree, or DIT for short. Each entry has a distinguished name, or DN, which is a unique path through the DIT to that entry. No two entries share the same DN. An example of a DN might look like this:

cn=Charles E Lever, ou=Information Technology Division, ou=Faculty and Staff, ou=People, o=University of Michigan, c=US

As you move to the right in a DN, you move towards the "root" of the tree, and the components become more general: c=US stands for the United States. Moving to the left in a DN means moving towards the leaves of the tree, and the components become more specific: ou=Information Technology Division is one unit out of all the units under ou=Faculty and Staff. The leftmost component of the DN is the actual entry to which the DN refers. cn=Charles E Lever is the entry that contains information about me.
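To make the left-to-right reading of a DN concrete, here is a small Python sketch that splits the example DN into its components. It is deliberately simplified: it assumes no component value contains an escaped or quoted comma, which real DNs may. The helper names split_dn and parent_dn are invented for this example.

```python
def split_dn(dn):
    """Split a DN into (type, value) pairs, most specific (leaf) first.

    Simplified: assumes values contain no escaped or quoted commas.
    """
    parts = []
    for component in dn.split(","):
        attr_type, _, value = component.strip().partition("=")
        parts.append((attr_type, value))
    return parts


def parent_dn(dn):
    """Drop the leftmost component, moving one step toward the root."""
    return ", ".join(f"{t}={v}" for t, v in split_dn(dn)[1:])


DN = ("cn=Charles E Lever, ou=Information Technology Division, "
      "ou=Faculty and Staff, ou=People, o=University of Michigan, c=US")

components = split_dn(DN)
leaf = components[0]        # the entry itself, leftmost
root_most = components[-1]  # the most general component, rightmost
```

Calling parent_dn repeatedly walks up the tree one level at a time, much as stripping the last element from a file path moves toward the top of a file system.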

A directory search operation is accomplished when a client specifies a search filter. This is similar to using SQL to search a relational database. A simple search filter names an attribute to search, data values to search for, and what matching process to use. For example, you can search for all entries that have a homePostalAddress attribute that contains the string "Westwood" (substring match), or you could search for anyone whose last name (surName attribute) is "Smith" (exact match). The server can return zero or more results, depending on its data contents, the type of matching requested, the access you have to the directory, and how many entries it had to look through to satisfy your request. More complicated filters can be formed by using logical operators to combine simple filters. You can also limit searches to a particular subtree of the DIT, or even to only one level of the DIT (in other words, all of the entries at a particular level of the DIT, but none of the children of those entries).
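The following Python sketch imitates these ideas against a tiny in-memory list of entries. It is a toy, not an LDAP client: the entries, the base DN, and the helper names (exact, substring, all_of, search) are all made up for illustration, and matching is assumed to ignore case. In the string form LDAP clients commonly use for filters, the three filters built below would be written roughly as (surName=Smith), (homePostalAddress=*Westwood*), and (&(surName=Smith)(homePostalAddress=*Westwood*)).

```python
# Made-up sample data: (DN, attributes) pairs.
ENTRIES = [
    ("cn=A Smith, ou=People, o=Example, c=US",
     {"surName": ["Smith"], "homePostalAddress": ["12 Westwood Ave"]}),
    ("cn=B Jones, ou=People, o=Example, c=US",
     {"surName": ["Jones"], "homePostalAddress": ["9 Westwood Ct"]}),
    ("cn=C Smith, ou=Staff, ou=People, o=Example, c=US",
     {"surName": ["Smith"], "homePostalAddress": ["4 Main St"]}),
]


def exact(attr, wanted):
    """Simple filter: some value of attr equals wanted (case ignored)."""
    return lambda attrs: any(v.lower() == wanted.lower()
                             for v in attrs.get(attr, []))


def substring(attr, piece):
    """Simple filter: some value of attr contains piece (case ignored)."""
    return lambda attrs: any(piece.lower() in v.lower()
                             for v in attrs.get(attr, []))


def all_of(*filters):
    """Logical AND: combine simple filters into a more complicated one."""
    return lambda attrs: all(f(attrs) for f in filters)


def search(base, matches, one_level=False):
    """Return DNs below base that satisfy the filter.

    one_level=True keeps only the immediate children of base.
    """
    suffix = ", " + base
    results = []
    for dn, attrs in ENTRIES:
        if not dn.endswith(suffix):
            continue                      # outside the requested subtree
        remainder = dn[:-len(suffix)]
        if one_level and "," in remainder:
            continue                      # deeper than one level below base
        if matches(attrs):
            results.append(dn)
    return results


BASE = "ou=People, o=Example, c=US"
smiths = search(BASE, exact("surName", "Smith"))
smiths_one_level = search(BASE, exact("surName", "Smith"), one_level=True)
westwood_smiths = search(BASE, all_of(exact("surName", "Smith"),
                                      substring("homePostalAddress", "Westwood")))
```

The subtree search finds both Smith entries, while the one-level search skips the entry that sits under ou=Staff because it is a grandchild of the base rather than a child.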

On with the show

Delivering accurate directory information to our customers on a timely basis is the fundamental mission of a good directory service. It is useful to separate this mission into two major categories: delivering directory contents quickly and reliably, and ensuring the contents of the directory are accurate and up-to-date. As it turns out, issues involved with delivering directory information focus mainly on the technical implementation of the directory server software, while issues involved with the accuracy of the directory information are more concerned with how the data itself is managed. In the next section of our strategy report, we spell out the technical and business process challenges that face our service. There are also important questions about a directory service's funding model, since it is not possible to charge our users directly for the service.

Data Delivery Issues

Data delivery issues directly affect how easy it is to get to the contents of the directory. User perception of the directory service results from how quickly information can be obtained from it: how quickly and easily one can connect (bind) to the directory, how fast search operations occur, how often trouble occurs when updating an entry, and so on. Also related to delivery issues are factors that determine how quickly problem recovery can occur (that is, problem tracking, capacity planning, back-up efficiency and disaster recovery), how interconnected the directory service is with other parts of the University's computing infrastructure, and how efficiently operational support can be carried out.

Data Quality Issues

Data quality issues have to do with how usable our directory information is. For instance, it is difficult to answer questions about a population of students or faculty from information stored in the Online Directory because entries for people have not been expired on a regular basis for several years. So we must create new business processes (or revise existing ones) that sustain or improve the quality of the contents of our Online Directory.

The quality of data is often difficult to quantify. Data quality professionals often measure it along four orthogonal dimensions:

  1. Accuracy - correctly recording the facts
  2. Completeness - having all relevant information recorded
  3. Consistency - having a uniform format for recording data
  4. Timeliness - having data as up-to-date as possible

Since we currently don't have direct control of the data we are fed, we will need to collaborate with our providers to improve the quality of the data. However, directory technology allows new mechanisms for improving data quality. For example, allowing every user to have both read and write access to their directory information means they have the opportunity to keep it up to date themselves. This means that

This information model is known as self-reporting. Of course, self-reporting assumes every user will make an identical and best effort to keep his or her directory entry up to date. And of course, we must understand what data can be self-reported, and what data we must maintain.

This section of our report examines some of the processes that are already in place, and offers some choices about how we can improve them, or change them to more effectively use the new paradigms offered by directory technology.

Cost Issues

In this final section of my introduction, I'd like to discuss funding models and expenses briefly. This is important because it defends the concept of a vendor-supported rather than a home-grown solution. We hope to integrate an off-the-shelf solution (at least as far as that can take us), rather than construct our own.


Tactical Directions

I've organized the contents of this section to separate threads of effort. Each thread will accomplish one major part of our directory improvement process, and most of the work can be accomplished concurrently with other work going on in other threads. Each subsection describes milestones for a single thread, and provides a brief risk analysis and a customer impact statement.

Thread A: Reinforce current service

Thread B: Improve Data Quality

Thread C: Migrate to CGI-based LDAP client

Thread D: Select and adopt a vendor-supported standard server

Thread E: Deploy new directory-related services


Server Software Evaluation Criteria

As you can tell, this section is incomplete. However, I include it here as a reminder that this information needs to be gathered and summarized, and it is appropriate to add it here when it becomes available.

Evaluation Criteria

Consideration of Specific Technologies


Deliverables Timeline

We plan to deliver the items in the right column before the dates in the left column. As with any deliverables schedule, this is meant to serve as a starting point for discussion and negotiation.

Upgrade windows for production services: Thanksgiving holiday (3 days), Winter holiday (12 days), Spring break (6 days), Memorial Day (1 day), Spring-Summer hiatus - July 4 (4 days), Labor Day (1 day)

Due date Deliverables
April 1, 1998
  • [A] Performance and reliability enhancements for current X.500 service
  • [A] New operational procedures for handling replication problems
  • [A] Regular usage reports available via the web
  • [D] Deliver Kerberos 4 server support code to Netscape
  • [A] ISODE technical support engaged for troubleshooting
May 15, 1998
  • [C] Working beta of LDAP client
  • [C] URL referring to new LDAP client to include in Internet kits
  • [A] Monitoring enhancements
  • [A] Documentation of existing service first draft
  • [E] Mailing list server prototypes available for testing
  • [A] Completed migration to instrumented buildable R2.0v3 servers
July 1, 1998
  • [A] Operational procedures documented
  • [C] Completed LDAP client with documentation
  • [B] Data extraction jobs revised
  • [B] Purge and account lifetime policies agreed upon
  • [B] Continuous online update in test
  • [D] Selection of new directory server technology
  • [E] Selection of new mailing list server technology
August 15, 1998
  • [B] Purge policy implemented
  • [E] All affiliation information appears in Person entries
  • [B] Entity IDs replace University IDs in directory
  • [A] Migration to Solaris complete
  • [B] Continuous online update complete
  • [E] Pilot mailing list service available
October 1, 1998
  • [D] Pilot version of new directory server technology
  • [C] Pilot version of new LDAP client for use with new servers
  • [E] New account creation systems designed and in test
  • [E] Class lists on mailing list service
  • [C] Announce decommission of old LDAP clients
November 15, 1998
  • [D] Test harness constructed for new directory server technology
  • [B] New HR data feeds in test
  • [E] New account creation systems in use
  • [E] Test public key data and administration in directory
  • [E] Draft process and policy for delivering data in bulk to depts
January 1, 1999
  • [D] New directory server technology deployed
  • [B] New flat organization under ou=People
  • [B] New HR data feeds in use
  • [C] Old LDAP clients decommissioned
  • [E] Mailing list service roll-out
  • [E] Bulk data delivery processes in place