The University of Michigan's
Online Directory Service Strategy Report:
A Working Draft

By Chuck Lever

Last updated on Sat Mar 21 11:44:36 EST 1998

Abstract

We document challenges faced by the University of Michigan's Online Directory Service, enumerate a set of solutions that the Directory Team plans to take to address many of these issues, and explore the risks associated with our suggested solutions.

Special note: This report is designed to be printed, or viewed with a web browser.

Executive Summary
Introduction
Tactical Directions
Server Software Evaluation Criteria
Deliverables Timeline

The contents of this report are Copyright © 1998 by The Regents of the University of Michigan. All rights reserved.
Any trademarks or service marks appearing in this report are the property of their respective owners.

Executive Summary

The University of Michigan's X.500 directory service, as it was originally known, was not a "designed" service, because it was created in the earliest days of online directories when not much was known about how directory services could be used. As well, directory services were born before the advent of the World Wide Web, which has since subsumed some of their function. We now find ourselves in the position of having to reverse engineer our directory service, because not very much is known or was recorded about the purposes or mission of the directory service. This process will require some effort as we formally analyze the directory data model and applications which use directory data, such as Targeted E-mail. Because the directory service has been around for a while, we also need to determine what is most valued about this service so we can maintain those parts of it while trimming away that which is not as useful.

The current service is a quagmire, admittedly, of unknown dependencies, ancient software that no longer builds, inappropriate operational procedures, and damaged reputation. Our task is to take this chaos and focus it into a relevant, reliable, and scalable service that meets and exceeds the expectations of our customers in steps that clearly demonstrate progress and improvement. Improving the quality of an existing service upon which so many depend requires careful study of the existing implementation, and careful management of changes.

Our eventual goal is to run the directory service on a standards-based, vendor-provided solution that requires much less programming or documentation support from ITD staff. While ITD was the birthplace for much of this technology, it is no longer capable of sustaining a serious LDAP software development effort. Our strategy, therefore, is to choose a product that will scale to support our lifetime customers, that integrates with our chosen security infrastructure, and that requires less support effort from us. We know that our security integration requirements often fly in the face of using unaltered off-the-shelf software. Under this software base, we want to engineer a secure, high-performance, high-reliability hardware foundation and data store. And using this software base, we want to provide accurate and relevant data to enable our customers to build new applications and trust that our service will be there to support them.

In the last four months, the Directory Services team has spent considerable effort attempting to reconstruct a vision of how this service should work. This effort has included:

assembling a team of developers and operations staff
radically improving the performance of the existing service
finding ways to eliminate replication problems
generating automated reports describing directory usage
researching modern directory technologies, including the Netscape Directory Server, U-M's LDAP reference implementation, and X.500 enterprise directory server products
assembling documentation describing directory data feeds
beginning work on a CGI-based LDAP client

All of these things have helped us to improve substantially the base of technology and information from where we start today. From here, we plan to strengthen further the current server deployment, develop new services and improve data quality and diversity, and within the next year, replace the current service with a high-performance, standards-compliant, vendor-provided solution.

Introduction

Service Description

The University of Michigan's Online Directory Service provides a computerized directory of persons associated with the University of Michigan. The Directory is a phonebook-like database of faculty, staff, students, and alumni of the University. Of course, like all computerizations of paper-based technologies, the Online Directory Service promises to add value to the concept of a phone book. For example, with the Online Directory, you can:

find a person whose name you can't spell accurately
find information for members of transient affiliates, like temporary employees
look up a person's name from their phone number
retrieve additional information about a person, such as the URL of their home page, or their University affiliation
modify your own directory information and have the changes made publicly available almost instantaneously
maintain an e-mail address for yourself @umich.edu which forwards to any mailbox you choose
create and maintain public electronic mailing lists

Currently, the same process that provides the contents of the campus phone book also provides the contents of the Online Directory. There are additional data feeds that add transient populations not included in the phone book, such as the University's temporary workers, to the directory. Anyone who can use a standard Lightweight Directory Access Protocol [LDAP] client on the Internet can search our Online Directory. Directory information is made available via a directory master server which tracks changes and feeds three shadow servers, which handle the bulk of directory lookup requests.

Who are customers of the Online Directory Service?

Most of our customers want one of three services: first, to search for contact information for people at the University; second, to provide a simple e-mail address that others can use to contact them; and third, to create mailing lists. Almost everyone who has used a computer here at the University has had some need for one of these three basic services. There are also departmental customers who want a feed of HR data with which to populate their own directories. Finally, we have identified a class of customers who use the directory to determine University affiliation information. This is interesting for units who, for example, deliver services based on eligibility criteria that include University affiliation.

A brief overview of some common directory terms

Directory data is stored in entries in a fashion similar to how data is stored in relational databases. A directory entry is equivalent to a row in a database table. Each directory entry contains a set of attributes, which are similar to fields in a database row. The number, name, and data type of the attributes in an entry are determined by that entry's object class. Directories use object class definitions as templates when creating a new entry. Every attribute in an entry has a data type, which determines what kind of data may be stored in the attribute, and how it is compared to (or matched with) other data. A single attribute can have one or more values, values comprising the actual data in the directory.

That's a lot to consider, so let's look at an example: the entry in our directory containing information about me. My entry is in the object class umichPerson. That means, among other things, it has an attribute for my title, for my business phone and address, and so on. It does not have attributes such as the member attribute, since that contains values that are the names of each member in a group, and I'm not a group, I'm a person. The values of the attributes in my entry are string data describing these things about me. The attribute containing my business address, in fact, contains more than one value -- each value is one line of my street address here at Argus.
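The relationship between object classes, attributes, and values can be sketched in a few lines of Python. This is only an illustration of the concepts above, not how any directory server stores its data. The attribute lists, the placeholder values, and the helper name make_entry are assumptions invented for the example.

```python
# Hypothetical, much simplified object class templates:
# object class name -> the attribute names an entry of that class may carry.
OBJECT_CLASSES = {
    "umichPerson": {"cn", "title", "telephoneNumber", "postalAddress"},
    "group": {"cn", "member"},
}


def make_entry(object_class, **attributes):
    """Build an entry, rejecting attributes the object class does not define."""
    allowed = OBJECT_CLASSES[object_class]
    unknown = set(attributes) - allowed
    if unknown:
        raise ValueError(f"{object_class} has no attribute(s): {sorted(unknown)}")
    # Each attribute holds a list, because an attribute may have several values.
    entry = {name: list(values) for name, values in attributes.items()}
    entry["objectClass"] = [object_class]
    return entry


# Placeholder values only; the real entry contents are not reproduced here.
person = make_entry(
    "umichPerson",
    cn=["Charles E Lever"],
    title=["(placeholder title)"],
    postalAddress=["(address line 1)", "(address line 2)"],
)
print(person["postalAddress"])  # one value per line of the street address
```

Asking for make_entry("umichPerson", member=[...]) raises an error, which mirrors the point that a person entry has no member attribute.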

Overall directory organization appears more like the Unix file system than a relational database. Directory servers maintain entries in more or less a tree organization called the directory information tree, or DIT, for short. Each entry has a distinguished name, or DN, which is a unique path through the DIT to that entry. No two entries share the same DN. An example of a DN might look like this:

cn=Charles E Lever, ou=Information Technology Division, ou=Faculty and Staff, ou=People, o=University of Michigan, c=US

As you move to the right in a DN, you move towards the "root" of the tree, and the components become more general: c=US stands for the United States. Moving to the left in a DN means moving towards the leaves of the tree, and the components become more specific: ou=Information Technology Division is one unit out of all the units under ou=Faculty and Staff. The leftmost component of the DN is the actual entry to which the DN refers. cn=Charles E Lever is the entry that contains information about me.
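To make the left-to-right reading of a DN concrete, here is a minimal Python sketch that splits the example DN into its components and walks one step towards the root. The function names split_dn and parent_dn are invented for this illustration, and the sketch assumes that no component value contains an escaped or quoted comma, which real DNs may.

```python
def split_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific component first.

    Simplification: assumes no escaped or quoted commas inside values.
    """
    components = []
    for part in dn.split(","):
        attribute, _, value = part.strip().partition("=")
        components.append((attribute, value))
    return components


def parent_dn(dn):
    """Return the DN one level closer to the root, or "" at the top of the tree."""
    _, _, rest = dn.partition(",")
    return rest.strip()


dn = ("cn=Charles E Lever, ou=Information Technology Division, "
      "ou=Faculty and Staff, ou=People, o=University of Michigan, c=US")

components = split_dn(dn)
print(components[0])   # leftmost component: the entry itself
print(components[-1])  # rightmost component: nearest the root
print(parent_dn(dn))   # the unit that contains the entry
```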

A directory search operation is accomplished when a client specifies a search filter. This is similar to using SQL to search a relational database. A simple search filter names an attribute to search, data values to search for, and what matching process to use. For example, you can search for all entries that have a homePostalAddress attribute that contains the string "Westwood" (substring match), or you could search for anyone whose last name (surName attribute) is "Smith" (exact match). The server can return zero or more results, depending on its data contents, the type of matching requested, the access you have to the directory, and how many search entries it tried to look through to satisfy your request. More complicated filters can be formed by using logical operators to combine simple filters. You can also limit searches to a particular subtree of the DIT, or even to only one level of the DIT (in other words, all of the entries at a particular level of the DIT, but none of the children of those entries).
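The ideas in this paragraph (simple filters, logical combination, and search scope) can be sketched with a toy in-memory directory in Python. This is an assumption-laden illustration, not an LDAP client or server: the sample entries, the o=Example tree, and all function names are hypothetical, and matching is assumed to be case-insensitive. In LDAP's string filter notation the two simple filters would be written roughly as (homePostalAddress=*Westwood*) and (surName=Smith).

```python
def substring(attribute, text):
    """Simple filter: some value of the attribute contains text."""
    return lambda entry: any(text.lower() in v.lower() for v in entry.get(attribute, []))


def exact(attribute, text):
    """Simple filter: some value of the attribute equals text."""
    return lambda entry: any(text.lower() == v.lower() for v in entry.get(attribute, []))


def all_of(*filters):
    """Logical AND of several filters."""
    return lambda entry: all(f(entry) for f in filters)


def search(directory, base, scope, matches):
    """Return DNs under base that match. scope is "subtree" or "onelevel"."""
    results = []
    for dn, entry in directory.items():
        if scope == "onelevel":
            in_scope = dn.partition(",")[2] == base  # direct children only
        else:
            in_scope = dn == base or dn.endswith("," + base)
        if in_scope and matches(entry):
            results.append(dn)
    return results


# Hypothetical sample data, keyed by DN (written without spaces after commas).
directory = {
    "ou=People,o=Example": {"ou": ["People"]},
    "ou=Staff,ou=People,o=Example": {"ou": ["Staff"]},
    "cn=Pat Smith,ou=Staff,ou=People,o=Example": {
        "surName": ["Smith"], "homePostalAddress": ["12 Westwood Ave"]},
    "cn=Lee Jones,ou=Staff,ou=People,o=Example": {
        "surName": ["Jones"], "homePostalAddress": ["9 Westwood Ct"]},
}

westwood = substring("homePostalAddress", "Westwood")
everyone = search(directory, "o=Example", "subtree", westwood)
smiths = search(directory, "o=Example", "subtree", all_of(westwood, exact("surName", "Smith")))
shallow = search(directory, "ou=People,o=Example", "onelevel", westwood)
print(everyone, smiths, shallow)
```

The one-level search finds nothing here because the matching people sit two levels below ou=People, which is exactly the distinction between the two scopes.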

On with the show

Delivering accurate directory information to our customers on a timely basis is the fundamental mission of a good directory service. It is useful to separate this mission into two major categories: delivering directory contents quickly and reliably, and ensuring the contents of the directory are accurate and up-to-date. As it turns out, issues involved with delivering directory information focus mainly on the technical implementation of the directory server software, while issues involved with the accuracy of the directory information are more concerned with how the data itself is managed. In the next section of our strategy report, we spell out the technical and business process challenges that face our service. There are also important questions about a directory service's funding model, since it is not possible to charge our users directly for the service.

Data Delivery Issues

Data delivery issues directly affect how easy it is to get to the contents of the directory. User perception of the directory service results from how quickly information can be obtained from it: how quickly and easily one can connect (bind) to the directory, how fast search operations occur, how often trouble occurs when updating an entry, and so on. Also related to delivery issues are factors that determine how quickly problem recovery can occur (that is, problem tracking, capacity planning, back-up efficiency and disaster recovery), how interconnected the directory service is with other parts of the University's computing infrastructure, and how efficiently operational support can be carried out.

Scalability and reliability

Scalability is often affected by more than just how powerful a server engine might be. For example, we might be able to stuff half a million entries into our directory server, but if we can't run our "daily" backup within 24 hours, or can't quickly restore the directory in case of a disaster, then the directory service doesn't scale well. Likewise, even if we can get a large number of users into the directory, can it support an increasing number of search requests and modifications to directory entries?

In particular, the Online Directory Service faces all of these scalability challenges:

Total number of directory entries
Search rate (caused by our e-mail forwarding system) and complexity
Modification rate
Backup and restore time
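The backup-window concern raised above comes down to simple arithmetic, sketched here in Python. The half-million entry count comes from the example in the text; the throughput figures are hypothetical, chosen only to show how the calculation behaves, and are not measurements of any server.

```python
def backup_hours(entry_count, entries_per_second):
    """Hours needed to back up entry_count entries at a steady rate."""
    return entry_count / entries_per_second / 3600


def fits_window(entry_count, entries_per_second, window_hours=24):
    """True if the backup completes within the window (default: one day)."""
    return backup_hours(entry_count, entries_per_second) <= window_hours


# Hypothetical backup rates, for illustration only.
for rate in (5, 50):
    hours = backup_hours(500_000, rate)
    print(f"{rate} entries/s: {hours:.1f} hours, fits daily window: "
          f"{fits_window(500_000, rate)}")
```

The same calculation applies to restore time, which is usually the more pressing number after a disaster.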

A challenge related to scalability is reliability, because often reliability suffers as a service scales up. The Online Directory is not useful if its users cannot quickly find what they are looking for, discover out-dated or inconsistent information, or cannot modify their entry or create mail groups when they need to. And, as the size of any directory grows and its user base becomes more dependent on it, the effect of unreliability is greatly amplified.

Replication is a basic part of any distributed database, and directory service is no exception. Replication of the directory provides solid guarantees against problems that erode service reliability and scalability by keeping several separate copies of the directory on different hardware and in different locations. Disk media failure, power and network outages, and server overload conditions have greatly reduced effects on a properly replicated service. The underlying replication mechanism should be robust, taking advantage, for example, of transaction processing to help the mechanism tolerate faults.

There are at least four different mechanisms for maintaining data in separate databases and directories:

LDAP or DISP replication - standard directory replication protocol
metadirectories - bridging information across different directory technologies
multi-master replication - directory server peers provide mastering for different parts of the same entries
bulk transfer of flat files - the recipient directory is built up from scratch using a flat file version of the source directory

Each may be appropriate depending on the server technology running on each end of the connection.

As it happens, the University of Michigan has one of the largest user bases in the world for several of its infrastructural directory-style services (Kerberos authentication, online directory services, and others). What this means is that this University is constantly pushing the performance and scalability envelope for these types of services. As well, we are early adopters of many technologies because our technological needs are often ahead of the state-of-the-art requirements. Both of these factors work against our efforts to provide a reliable and scalable directory service.

Compliance with contemporary standards

LDAP was "invented here." In fact, the University of Michigan still provides public access to a reference implementation that is free, and that many around the Internet still use. However, our reference implementation is for version 2 of LDAP; Netscape, many would agree, now provides the reference implementation for version 3. Our reference implementation is incomplete; client referrals, for example, are not supported in the clients or libraries distributed with the reference implementation. As well, patches are made available, but are never integrated into the distribution we provide. The last full version release of our reference implementation was a few years ago. It is likely that this release will be the last.

Since Netscape, Microsoft, and Novell recently adopted LDAP as their system-wide distributed directory access protocol, LDAP has seen significant face-lifts. New protocol versions mean older clients don't support many necessary and important features of newer servers, such as:

SASL multi-protocol authentication
LDAP client-side referrals
Schema information embedded in the directory
Virtual attributes that flag entries with children
Standardization of access control and replication
True object-orientation (real inheritance capabilities)
RDN components that exist as searchable attributes

And as LDAP becomes more wide-spread, our directory servers will be required to support access by new clients. These new clients will appear in the form of embedded applications of LDAP. For example, e-mail clients will want to store their configuration and address book information in directories, and access that information via LDAP.

Unfortunately, the LDAP standards only apply to the protocol by which clients access a directory server; i.e. how to contact a server, how to specify directory searches, what kind of server responses there will be, and so on. The LDAP standards do not specify how the data is organized in a directory. A good way to think of this is that LDAP is, in some ways, equivalent to SQL(tm) -- and it certainly would be silly for an SQL standard to specify the data contents, function, and organization of a database. So eventually we must confront the data compatibility issues that exist between, say, the Microsoft Active Directory and any more-or-less X.500-compliant directory. An Active Directory not only stores the same data in different attributes, it also organizes and controls access to the data very differently. It is clear that some translation or adapter software will have to exist to move data back and forth between Active Directories and our Online Directory, should we choose to maintain compliance with the X.500 standards.

LDAP-compliant clients can also make dangerous assumptions when accessing an X.500-compliant directory. For example, the ISODE X.500 server implementation supports the ability for newly created entries to inherit certain attributes from the parent. This feature does not exist in any LDAP server we know of. This is important for us, however, because our current directory clients rely on newly created entries inheriting the proper object class from their parents. Without this object class, the new entries are all but invisible to searches.

We should also consider how much DAP-compatibility we need to retain. Are we interested in continuing to service DAP requests from around the world, as we do with our current service? If we throw DAP out the window, are we abandoning potential customers of the University, and disconnecting ourselves from the rest of the world, who still use and depend on DAP for their directory access?

Authentication

Certainly the implementation of a directory service is made much simpler when only a limited few can update the contents of the directory. However, as part of the democratized environment at the University, Online Directory users can modify their own directory entries to reflect changes in their personal information. As well, it is easy to implement a flexible e-mail forwarding service by adding a mail forwarding address attribute to each person's entry. A person can advertise a single convenient e-mail address to their friends and colleagues. Moving to a different e-mail server then becomes as easy as updating their mail forwarding address in their Online Directory entry -- their advertised e-mail address doesn't change.
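The forwarding idea can be sketched as a lookup in Python. This is a simplified illustration, not a description of how our mail servers actually consult the directory: the attribute name mailForwardingAddress, the sample user, the host names, and the resolve function are all assumptions made for the example.

```python
# Hypothetical directory contents: advertised local part -> entry.
entries = {
    "pat.smith": {"mailForwardingAddress": ["psmith@mailhost-a.example.edu"]},
}


def resolve(advertised, entries, domain="umich.edu"):
    """Return the mailbox addresses that an advertised address forwards to."""
    local, _, host = advertised.partition("@")
    if host.lower() != domain:
        return [advertised]  # not a directory address: deliver as written
    entry = entries.get(local.lower())
    if entry is None:
        return []  # no such person in the directory
    return list(entry.get("mailForwardingAddress", []))


before_move = resolve("pat.smith@umich.edu", entries)

# The user moves to another mail server and updates only their own entry.
entries["pat.smith"]["mailForwardingAddress"] = ["psmith@mailhost-b.example.edu"]
after_move = resolve("pat.smith@umich.edu", entries)

print(before_move, after_move)
```

The advertised address is the same before and after the move; only the value stored in the entry changes.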

In an effort to pursue the ideal of "single signon" for all infrastructural services on the University of Michigan campus, Kerberos was adopted by most service providers in the University of Michigan computing environment as the network authentication service of choice. This selection of Kerberos requires that it be supported by any online directory service deployed here that allows its users to bind to the directory and make modifications to it.

There are two varieties of Kerberos, past and future, that we need to consider as we evaluate directory server software:

Kerberos 4 -- MIT and AFS: Infrastructure services provided under the auspices of the University of Michigan computing environment currently use the AFS flavor of Kerberos 4 for authentication service. Kerberos 4 actually comes in several flavors; one from Transarc, one from MIT, and several flavors from Unix vendors such as Sun Microsystems.
Kerberos 5 -- MIT and Microsoft: Microsoft's NT operating system will become increasingly important to our computing infrastructure in the coming years. As Microsoft has adopted Kerberos 5 as its network authentication service, we must recognize the requirement for our infrastructure services, and thus our online directory service, to support Kerberos 5 as well. Our best guess is that Kerberos 5 support will be required late in 1998 or early in 1999.

Interoperability

Server-server -- a vendor collaboration challenge

There is a growing market for products that bridge the gap between directories that don't use a common replication protocol. These products, known as metadirectories, can help us to provide directory data to customers such as the Medical Center, who run GroupWise(tm), a product that doesn't speak LDAP. Departmental e-mail server products often have their own built-in directory component, which should be populated appropriately with the same data that populates our online directory.

The LDAP standards are still evolving in the area of backup and replication. This continues to be a concern as more vendors, more standards, and more departments enter this part of the equation.

Client-server -- a software distribution challenge

We can't control the distribution and use of clients in the same strict ways that we can control the selection and use of server software. When changes are made to server software that are incompatible with clients or that will break features of the service if the clients aren't changed as well, we will be responsible in no small part for helping our users obtain and install new versions of the client software. As the size of the University of Michigan computing environment increases, this task approaches unmanageability.

Therefore, any strategy we have for the directory service must address the constraints placed on it by the client-server architecture. In other words, we must design the service for greater client compatibility, or we must limit the amount of version and protocol deviation that can occur between a client and server implementation.

Operational support

Training operational personnel

By using a common hardware and OS platform, and by taking advantage of a vendor-created solution that includes full documentation and training resources, we can economically maintain a team of several operations staff who are trained to handle problems with the directory service.

Platform (hardware/OS) support

A common hardware and OS platform means we can maintain a high level of expertise to provide hardware and OS problem resolution, and can maintain the OS and hardware at a late revision level. As well, we should depend on a standard administrative service to support our systems, provide security patches and a standard application base, and performance and security monitoring.

Backup/Restore

Our backup mechanism will need to scale as well as the directory server software itself. Directory backup must be accomplished "on-line"; that is, the directory must be available for searches and modification during the backup process. Our backup mechanism must also maintain the same security and access control that is set up in the directory itself. Regular restoration drills will provide us with training and data to help us analyze capacity and scaling issues.

We get a steady stream of requests for day-old data. One reason for this is that current LDAP client software is clumsy when it comes to maintaining e-mail lists. However, like AFS, we can provide a repository containing yesterday's directory contents that our customers themselves can use to recover from their own mistakes.

Disaster Recovery

Disaster planning should include splitting the existing hardware among several independent sites on campus. We've identified at least three sites that could house a directory replica, including the Argus machine room, the School of Education, and the first floor of the North Campus Computing Center. Currently all three replicas are co-located at Argus with the umich.edu mail500 servers, which are in turn co-located with the IMAP servers. The master server resides in the North Campus Computing Center machine room.

As well, we need to gather data about how quickly we need to, and can afford to, recover from a large-scale "act of God" disaster. The directory service itself depends on few other services, namely having connectivity, and access to DNS and Kerberos services. Issues include:

Maintaining recent backup media off-site
Making spare hardware and hot site environments available
Building fault-tolerant hardware/OS solutions

The "people" data in the directory comes mostly from HR databases, although some nontrivial number of entries are set in "non-batch-update" mode, meaning the data is not necessarily a copy of what is in the HR database. The master copy of the "group" data, however, is the copy stored in the Online Directory, so the cost of losing this data is much greater than is the cost of losing the "people" data.

Service Monitoring

Most directory server products have monitoring facilities built right in, including SNMPv2 agents and standard MIBs. We will also need to make sure other pieces of the service are monitored, including the success of the backup process, replication status, and so on.

Capacity planning and performance monitoring are key to the success of a large-scale service. We can purchase or easily create automated report generators that can digest log information generated by the server software. Enabling the creation of regular performance and capacity reports is a requirement for the server software.

Data Quality Issues

Data quality issues have to do with how usable our directory information is. For instance, it is difficult to answer questions about a population of students or faculty from information stored in the Online Directory because entries for people have not been expired on a regular basis for several years. So we must create new business processes (or revise existing ones) that sustain or improve the quality of the contents of our Online Directory.

Data quality professionals often use four orthogonal dimensions to measure the quality of data, which is often difficult to quantify:

Accuracy - correctly recording the facts
Completeness - having all relevant information recorded
Consistency - having a uniform format for recording data
Timeliness - having data as up-to-date as possible

Since we currently don't have direct control of the data we are fed, we will need to collaborate with our providers to improve the quality of the data. However, directory technology allows new mechanisms for improving data quality. For example, allowing every user to have both read and write access to their directory information means they have the opportunity to keep it up to date themselves. This means that:

the information can come directly from its source, potentially improving the accuracy of the data
keeping the information up to date is scaled across all users, rather than bottle-necked through a few data-entry personnel, improving the timeliness of the directory data
directory information is forms-based, improving the overall consistency of the directory data

This information model is known as self-reporting. Of course, self-reporting assumes every user will make an identical and best effort to keep his or her directory entry up to date. And of course, we must understand what data can be self-reported, and what data we must maintain.

This section of our report examines some of the processes that are already in place, and offers some choices about how we can improve them, or change them to more effectively use the new paradigms offered by directory technology.

Directory data management

Updates from the master personnel database

Currently the Online Directory contents are updated on a scheduled basis from three administrative data sources. This update process requires that the Online Directory be in read-only mode while the update runs to prevent directory users from losing changes they may make during the process. It would be an improvement if this process happened more often, and less disruptively.

We have explored ways of making this happen. We could split the directory into several databases per server, keeping one in read-only mode while leaving the others in read-write mode during an update. Access control in modern LDAP implementations makes it possible to effect an "entry lock" which prevents users from changing their own entries while they are being updated by this process, instead of putting an entire directory into read-only mode. As we test and deploy LDAP servers, we will evaluate the potential of using this mechanism to provide a continuous update process.

Of course, shifting away from X.500 server technology means that the current update software will have to be re-implemented, since it depends on the directory existing on disk in EDB format. LDAP servers don't use EDB, but can export their directory data into LDAP Directory Interchange Format [LDIF] files. LDIF is a flat file ASCII representation of a directory, useful for transporting a directory from one hardware architecture to another, or for backing up a directory. The update software can be changed to use LDIF instead of EDB, or it could function entirely on-line by issuing LDAP commands while the directory is left in read-write mode.
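For readers who have not seen LDIF, the following Python sketch writes entries in its basic text layout: a "dn:" line, then one "attribute: value" line per value, with a blank line between entries. It is a deliberately minimal illustration. It assumes plain ASCII values and omits the encoding and line-folding rules that real LDIF tools apply, and the sample entries and function names are hypothetical.

```python
def to_ldif(dn, entry):
    """Render one entry as LDIF text (plain ASCII values only)."""
    lines = [f"dn: {dn}"]
    for attribute in sorted(entry):
        for value in entry[attribute]:  # one line per value
            lines.append(f"{attribute}: {value}")
    return "\n".join(lines) + "\n"


def dump(directory):
    """Render a whole directory; entries are separated by a blank line."""
    return "\n".join(to_ldif(dn, entry) for dn, entry in directory.items())


# Hypothetical sample data.
directory = {
    "cn=Pat Smith,o=Example": {"cn": ["Pat Smith"], "objectClass": ["person"]},
    "cn=Lee Jones,o=Example": {"cn": ["Lee Jones"], "objectClass": ["person"]},
}
print(dump(directory))
```

An update process built on files like these could compare a freshly generated file against an export of the live directory, rather than depending on the server's on-disk format.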

We would also like to hone the data extraction process that provides our data feed from the personnel databases. These jobs haven't been changed in several years, although University business processes and policies have changed, and our experience with the data feed has shown that the original jobs aren't quite what we need. We might get better quality data if we used a feed from the appropriate Data Access datasets, for example, instead of the myriad of jobs now in place to provide our feed.

Our current update process is also error-prone. And when an error occurs during this process, it is usually a big error. Parts of the DIT are deleted, or thousands of entries disappear. The reason for this is that the update process is complicated and operates on the directory as a whole using heuristic rules. As the directory has grown, catching errors in the update process becomes very difficult because it has to process more than a hundred thousand entries. This is an important factor in the quality of our service -- preventing and recovering from errors in the monthly update. We must find ways of more thoroughly checking the results before they become a permanent part of the directory data.
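One way to check results before they become permanent is to compare the directory before and after a trial run of the update and refuse to commit if too much has disappeared. The Python sketch below shows the idea under stated assumptions: both snapshots fit in memory as DN-to-entry mappings, and the 2% deletion threshold is an arbitrary, hypothetical figure rather than a recommended value.

```python
def review_update(before, after, max_deleted_fraction=0.02):
    """Compare two snapshots (DN -> entry) and decide whether to commit.

    Returns (ok, report). ok is False when the share of deleted entries
    exceeds max_deleted_fraction.
    """
    deleted = before.keys() - after.keys()
    added = after.keys() - before.keys()
    changed = {dn for dn in before.keys() & after.keys() if before[dn] != after[dn]}
    fraction = len(deleted) / len(before) if before else 0.0
    report = {
        "added": len(added),
        "deleted": len(deleted),
        "changed": len(changed),
        "deleted_fraction": fraction,
    }
    return fraction <= max_deleted_fraction, report


# Hypothetical snapshots: 100 entries, of which a faulty run drops 10.
before = {f"cn=user{i},o=Example": {"cn": [f"user{i}"]} for i in range(100)}
faulty = {dn: entry for dn, entry in before.items() if not dn.startswith("cn=user9")}

ok, report = review_update(before, faulty)
print(ok, report)
```

A report like this also gives operations staff a short summary to review by hand when the check fails.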

Expiry of Group and Person entries

The accuracy of the information in the Online Directory not only hinges on whether a person's entry is correct, but also whether directory users can trust that if information about a person appears in the directory, then he or she must still be associated with the University of Michigan. Part of the monthly directory update process attempts to discover and flag entries that should be expired and eventually removed from the Online Directory.

Infrastructure services in the University of Michigan information environment have an arguably unique problem: it is very difficult to tell when a person is no longer associated with the University. For instance, if a student graduates, her association may continue if she chooses the University as an employer, or if she wishes to continue her education. Otherwise, she becomes an alumna. New services like U-M Online serve the alumni of the University of Michigan using many pieces of information technology infrastructure that are shared with those currently affiliated with U-M. Also, an individual can be faculty, staff, or a student, or any combination of the three.

The procedures and data sources that determine when a person is no longer affiliated with the University are not timely or complete, and as a result, numbers of users and amounts of data have been building up in the UMCE. As well, service providers do not have a clear method of removing old and expired data and users. Services such as the Online Directory Service and the Institutional File System [IFS] also must handle data for expired accounts. What happens to a user's entry data when he is removed, and then returns to the University? Or what if he is removed by accident? Expiry policies must be applied by all services consistently; otherwise a user can end up with, for example, an IFS home directory but no Online Directory entry. Further confusing matters, a person may be affiliated with the University, but not eligible to access computing services that are provided here. Is the Online Directory a repository for all who are affiliated with the University, or only those affiliates who are also eligible to access and use computing resources that are provided here?
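
The questions about returning users and accidental removal suggest a two-stage expiry: flag an entry first, and remove it only after a grace period. The Python sketch below shows one possible shape for such a rule. The 180-day grace period, the expired_on field, and the state names are assumptions for illustration only; the real values would have to come from an agreed account lifetime policy.

```python
from datetime import date, timedelta

GRACE = timedelta(days=180)   # assumed grace period, not a stated policy


def next_state(entry, affiliated, today):
    """Return the entry's new state: "active", "expired", or "purge".

    entry is a dict with an optional "expired_on" date.  affiliated says
    whether the latest data feed still lists the person.
    """
    if affiliated:
        entry.pop("expired_on", None)      # person returned: revive the entry
        return "active"
    if "expired_on" not in entry:
        entry["expired_on"] = today        # first miss: flag, do not delete
        return "expired"
    if today - entry["expired_on"] >= GRACE:
        return "purge"                     # grace period has run out
    return "expired"
```

Because the entry's data is kept while it is flagged, a user who returns, or who was flagged by mistake, gets the same entry back.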

Finally, purging directory entries often leaves "dangling DNs." In other words, if my entry is purged, there are still references to my entry left in groups of which I may have been a member, or in proxy attributes for entries for which I may have had administrative authority. We'll need to create processes by which these dangling references are discovered and removed from the directory.
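
A process for discovering dangling references could be sketched as follows. This Python example assumes the directory contents have been exported into a dictionary keyed by DN. The attribute names "member" and "owner" are placeholders for whatever DN-valued attributes we actually use, and the DN comparison is deliberately simplified.

```python
def find_dangling(entries, ref_attrs=("member", "owner")):
    """Yield (dn, attribute, target) for references to missing entries.

    entries maps each DN to a dict of attribute name -> list of values.
    ref_attrs names the attributes whose values are DNs; "member" and
    "owner" are placeholders, not our actual schema.
    DNs are compared after lowercasing and trimming spaces around commas,
    a simplification that ignores escaped commas and other matching rules.
    """
    def norm(dn):
        return ",".join(part.strip() for part in dn.split(",")).lower()

    known = {norm(dn) for dn in entries}
    for dn, attrs in entries.items():
        for attr in ref_attrs:
            for target in attrs.get(attr, []):
                if norm(target) not in known:
                    yield dn, attr, target
```

The output is a list of candidates for removal. Whether to delete each reference automatically or report it to the group's owner is a policy decision this sketch does not make.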

Data privacy

One of the biggest issues on-line these days is unwanted electronic mail, nick-named "spam." Because the Online Directory Service provides an e-mail list service (X.500 groups), and publishes the e-mail addresses of every person in the directory, it becomes an easy information source for those wanting to abuse electronic mail services. Unfortunately, it is very difficult to tell the difference between a directory user who is legitimately sending electronic mail and a user who might be attempting to use the directory to send spam. Any effort to hide e-mail information makes the directory significantly less useful to legitimate users. We can, however, make it more difficult for automated processes, like web robots, to scan our Online Directory for e-mail addresses.

Any directory of this type contains not only useful information about each of its users, but also "metainformation" about the population who maintain information in the directory. We should expect that applications for this metainformation will be created and used, especially since U-M is a research university, but also because the population in the Online Directory Service is large, and more-or-less complete. However, making the directory available in bulk can be an invasion of our users' privacy, and should be embarked upon carefully. The presence of Social Security Numbers in Online Directory entries bears some mention at this point, although the use of SSNs as unique identifiers is being phased out by the University. The Social Security information that currently exists in each user's directory entry is an example of data that should not be made available in bulk exports to researchers or departments. Data guidelines in the U-M Standards Practices Guide can help us resolve these issues.

Are we to continue providing bulk feeds for departmental e-mail services that include built-in directories, such as GroupWise or Lotus Notes(tm)? Acquiring the data is expensive for us, and providing it to departments may cause unwanted exposures of some of the data. If we continue to provide this service, we should understand its economic implications, and how to filter sensitive information, if necessary. As well, it may be easy to provide such a feed once a month, or even once a week, but what if the department wants updates more often than that? When is it appropriate to use a flat-file data feed, versus something more complicated, like a metadirectory?
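
For the flat-file case, filtering sensitive information can be as simple as exporting only an approved list of attributes. The Python sketch below writes a tab-separated feed this way. The attribute list and the function name are assumptions for illustration; the real list would be set by University data policy.

```python
import csv
import io

# Attributes allowed in a departmental feed.  This list is an assumption
# for illustration; the real list would come from University data policy.
EXPORT_ATTRS = ["uid", "cn", "mail", "telephoneNumber"]


def write_feed(entries, out, attrs=EXPORT_ATTRS):
    """Write entries (a list of dicts) to out as tab-separated text.

    Only the listed attributes are written, so anything else in an
    entry, such as a Social Security Number, never reaches the export.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(attrs)                 # header row
    for entry in entries:
        writer.writerow([entry.get(a, "") for a in attrs])
```

Listing what may be exported, rather than what must be hidden, means that a new sensitive attribute added to the directory later stays out of the feed by default.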

As mentioned before, LDAP access control mechanisms are evolving as the LDAP standards evolve. This evolution may make it easier to store all information about users, and use Access Control Instructions [ACIs] to hide information that they may not want to make publicly available. Users or administrators can carefully tailor the information that anonymous binders can view using ACIs. Currently, "Don't Publish" information in the phone database is not included in the Online Directory. Having more complete information about each user can greatly increase the value of the Online Directory.

For example, an administrator can create a new attribute for Person entries called hiddenHome. Then, she creates an ACI or ACL that operates on a filter such as "hiddenHome=FALSE" that makes the homePhone and homePostalAddress attributes readable by anyone. Otherwise, if hiddenHome in a Person entry contains TRUE, these two attributes in that entry become invisible to everyone. Since users can set the hiddenHome attribute in their own directory entry, they themselves can choose to hide or expose their home information, even though it can still be contained in the Online Directory.
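
The effect of such a rule on what an anonymous binder sees can be modelled in a few lines of Python. This is only a simulation of the access decision, not server configuration, and real ACI syntax differs from product to product. Treating an entry with no hiddenHome value as hidden is an assumption, chosen because the filter hiddenHome=FALSE would not match such an entry.

```python
HOME_ATTRS = {"homePhone", "homePostalAddress"}


def anonymous_view(entry):
    """Return the attributes of entry that an anonymous binder may read.

    Home attributes are shown only when hiddenHome is explicitly "FALSE",
    mirroring an ACI keyed on the filter hiddenHome=FALSE.  The flag
    itself is left visible here, which is also an assumption.
    """
    show_home = entry.get("hiddenHome", "TRUE").upper() == "FALSE"
    return {
        name: value
        for name, value in entry.items()
        if show_home or name not in HOME_ATTRS
    }
```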

DIT reorganization

There are a couple of good reasons to consider reorganizing our directory information tree. First, the current organization only allows the representation of one University affiliation per person. For example, my entry is under the ou=Information Technology Division, ou=Faculty and Staff subtree, implying that I'm an employee of the University who works for ITD. What if I were to enroll as a student? Or if I split my University appointment with a part-time job at the School of Social Work? That information wouldn't appear, since my first affiliation already places me at a certain point in the directory hierarchy.

We think that the affiliation information really belongs in an attribute in every person's entry, instead of determining the location of a person's entry in the directory tree. That way, a person's entry can reflect every affiliation, not just one. Of course, this will require some reorganization of the directory tree, and our DNs will have to be rewritten. My DN, for example, might be rewritten cn=Charles E Lever+uid=cel, ou=People, o=University of Michigan, c=US. Remember that DNs are all unique, so I've added on +uid=cel in order to distinguish my entry from other "Charles E Lever" entries in the directory.
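
The rewriting itself is mechanical, as this Python sketch suggests. It builds the flat DN shown above and recovers affiliation values from the ou= components of an old DN. Both function names are hypothetical, the escaping covers only the common special characters, and the old DN is split on commas without regard to escaped commas.

```python
def flatten_dn(cn, uid, base="ou=People, o=University of Michigan, c=US"):
    """Build a flat DN with a multi-valued RDN of the form cn=...+uid=...

    Characters that are special in DN string syntax are backslash-escaped.
    Only the common cases are handled; leading or trailing spaces and a
    leading "#" would need further escaping.
    """
    def escape(value):
        return "".join("\\" + ch if ch in ',+"\\<>;=' else ch for ch in value)

    return "cn=%s+uid=%s, %s" % (escape(cn), escape(uid), base)


def affiliations_from_old_dn(old_dn):
    """Collect the ou= components of an old DN as affiliation values."""
    parts = [p.strip() for p in old_dn.split(",")]
    return [p[3:] for p in parts if p.lower().startswith("ou=")]
```

The values returned by affiliations_from_old_dn would seed the new affiliation attribute, so that the information carried by the old tree position is not lost in the move.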

The second reason for considering DIT reorganization is that the main campus directory, ours, must be compatible to some extent with PC-based departmental directories. One of the major incompatibilities between an X.500 compliant directory and Microsoft's Active Directory is the DIT organization (and entry contents, which are covered elsewhere). We'll need to explore how our DIT organization must change in order to interoperate with Microsoft's AD.

Storing directory backups and FOIA

It bears mentioning that the University, and all of its data, are subject to FOIA laws. As managers of human resources data, we must decide whether we want to be prepared to answer a FOIA request. We usually don't have any use for directory data any older than a few months; keeping it less than six months (the FOIA limit) shouldn't be a problem.

It is also important to mention again that backups must be subject to the same access control and privacy requirements as the directory data had when it was online.

Account creation mechanisms

As well, we can easily see using an LDAP-accessible directory as a scalable distributed authorization server. Such a server might replace or augment AFS pts groups, providing authority checking for network services not related to AFS. Another use might be for maintaining subscription information for UMCE services. We can add a serviceSubscription attribute to each person's entry, and populate it with values reflecting a person's subscription to each UMCE service. Using LDAP access control, we can prevent updates to this attribute by anyone other than the subscription authority.
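
As a rough model of the subscription idea, the Python sketch below keeps serviceSubscription values in an entry and lets only one authority change them. In a real deployment the server's access control would enforce this, not application code. The authority's DN and both function names are hypothetical.

```python
# Hypothetical DN for the subscription authority; not an existing entry.
SUBSCRIPTION_AUTHORITY = "cn=Subscription Authority, o=University of Michigan, c=US"


def is_subscribed(entry, service):
    """Report whether the entry lists service in serviceSubscription."""
    return service in entry.get("serviceSubscription", [])


def subscribe(entry, service, bound_dn):
    """Add service to the entry's serviceSubscription values.

    Only the subscription authority may make the change; any other
    bound DN, including the entry's own owner, is refused.
    """
    if bound_dn != SUBSCRIPTION_AUTHORITY:
        raise PermissionError("only the subscription authority may update")
    values = entry.setdefault("serviceSubscription", [])
    if service not in values:
        values.append(service)
```

A network service would then call something like is_subscribed before granting access, in the same way it might consult an AFS pts group today.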

Currently, such authorities are lumped into one category: cn=Manager. It would make more sense if we had many administrative entities, and made better use of X.500 or LDAP access control to allow administrative authorities fine-grained access to parts of entries, or parts of the directory tree.

Year 2000

Evolving administrative infrastructure

We will also need to explore getting data from new sources. For example, the ID card database may be an appropriate source of information for the directory, in addition to our current feeds, because it may have a more complete listing of employees and students than do the databases from which we are currently fed. The unique identifier assigned to each student and employee is changing from an SSN-based ID number to something called an Entity ID, produced by the ID card database. It may also be the case that many of the populations that are currently missing from our data feeds, such as bargained-for employees, are present in the ID card database or some other database to which we can readily get a data feed. We should establish formal categories for inclusion in the directory; we believe this has never been done.

Eventually it may even be appropriate for the directory database to be the master source for some parts of the University's personnel data. As described earlier, the directory provides an authenticated means for users to update their own personal information on-line. We might be able to take advantage of this in order to feed these changes back to the HR databases, or become the master of this data, feeding our data to HR. This is appealing because it could potentially reduce the complexity of the processes that maintain this data for the University.

Auxiliary Services

Alumni Directory

The Office of Development wishes to maintain an online directory of alumni of the University of Michigan. This directory would provide a networking (in the traditional sense) service for those trying to locate information on University alums. We feel LDAP technology is appropriate for providing public accessibility to this data with Kerberos authentication, and we have experience deploying this technology.

Targeted E-Mail

The Targeted E-mail Service needs to convert University IDs (basically, SSNs) into uniqnames. Currently, it does this by searching the Online Directory. Manager privileges are required by the search application since University IDs are not publicly accessible. A second directory lookup is required to convert uniqnames to each user's mailboxes. We think there is room for improvement in this process.

Mailing List Service

The Online Directory is currently the University's mailing list service. However, directory technology falls short of several critical requirements, such as:

keeping the distribution list private
document and list archival, search, and retrieval
providing list content in digest form on a regular basis
authentication of moderators
ability for non-University users to subscribe and unsubscribe themselves

We feel that a real mailing list service would add value and important features to the existing X.500 group framework.

Phonebook-like listing for University business units

The directory contains phone listings for most people at the University. However, there are no listings for business units and departments, as there are in the paper phonebook. It would add value to the Online Directory to make available business unit and department contact information as well as people contact information. We could form a collaborative agreement with Marketing Communications, which provides the paper phonebook, by adding electronic means for submitting departmental contact information.

User certificate management and scaled-up PK administration

Directories will become a crucial component of public key infrastructure as public key technology increases its presence here at the University. We need to understand what performance, reliability, and storage requirements the directory will have to meet in order to support public key technology. We should also study existing standards and implementations, including PGP, X.509, the Netscape Certificate Authority, and products marketed by Entrust.

Feeding data to new directory services, cross-replication, and bulk data feeds

In previous sections I outlined a requirement to provide directory data feeds to departments. We should analyze this need and create processes for setting up feeds to departments to provide directory data for departmental e-mail services. This might even become a charged-for service.

Cost Issues

In this final section of my introduction, I'd like to discuss funding models and expenses briefly. This is important because it defends the concept of a vendor-supported rather than a home-grown solution. We hope to integrate an off-the-shelf solution (at least as far as that can take us), rather than construct our own.

Realities of a general-fund service -- no cost recovery

In addition to all of that, vendors are charging database level license fees for directory software, since it is an enterprise-wide application built on database technology. And, we're still calculating the monthly cost of our data extraction jobs. There are two separate black-borders, and part of a third, running into the hundreds of dollars a month per job, that specify our extraction requirements.

Finally, we must support specialized authentication (Kerberos), the specialized update processing to convert HR data to directory data, and we must plan for capacity and model our data organization. This requires several full-time development staff, each with unique fields of expertise.

There is no charge-back mechanism in the directory, so we don't recover our costs for providing this service. We feel the directory service is infrastructural, and should be covered without a per-use or per-user fee. This will require a significant commitment from ITD to cover the expenses for this service. I think we'll have to make a better case for this.

Software Integration v. Development

Let it be noted that this approach has some limitations. First, maintaining a corporate relationship can sometimes be as much work as developing software. As well, vendors have been known to take work done here at U-M and sell it back to us at a high premium. Some would argue that having access to a copy of buildable source for the server software enhances our ability to troubleshoot and resolve problems; most vendors are not interested in providing such a luxury. Finally, reliance on a vendor means we are at the whim of their support service and their development staff.

Tactical Directions

I've organized the contents of this section into separate threads of effort. Each thread will accomplish one major part of our directory improvement process, and most of the work can be accomplished concurrently with work going on in other threads. Each subsection describes milestones for a single thread, and provides a brief risk analysis and a customer impact statement.

Thread A: Reinforce current service

Buildable source: In order to service and modify the existing servers, we must be able to build from available source code. At the moment, the running servers do not match the source code we have available. We will reconstruct our relationship with the ISODE Consortium, recover source and documentation, apply the latest patches, test, then migrate to new servers built from our source code. We will also attempt to acquire and deploy the latest possible version of this software that is compatible with our current deployment.
Run on Solaris: Because our operational services will be subsumed by ITD OM during the summer, we must migrate to an operating system that ITD OM can support. We also want to take advantage of the administrative support of the PowerAdmin service, which can offer performance and security monitoring, automated patch distribution and software upgrades, and other important services. Our current server software will not run on Solaris, only on SunOS, so we will need to migrate to a new version of the server software before migrating to Solaris.
Instrument the server software: This effort involves creating automated report generation, crafting specialized network monitoring tools, and modifying parts of the server software in order to understand our performance and reliability problems. Gathering this information will allow us to correct these problems, greatly improving the perception of the service, as well as enabling new applications that require high reliability and performance.
Improve recovery from replication problems: We have new "surgical" techniques that allow us to correct replication problems in just a few minutes, often without putting the master into read-only mode. We will use these techniques instead of fully replicating the directory data when replication problems occur. Also, by referring all of the account creation tools to the master directory server, the replication mechanism is removed from the account creation process. This will mean that if replication is stuck or broken, account creation can continue normally (as long as the service is in read-write mode).
Construct data reports for the directory contents to understand how the directory is being used and abused: These data reports will help us project resource requirements for our services, and are an important part of the capacity planning process. We will also use these reports to identify misuse of the directory, and to help reorganize the directory to make it more useful.
Review, optimize, and document the current configuration: Most documentation for the server software is missing, and little documentation was produced about design issues or about basic care-and-feeding. Some investigation of the current server configuration has already paid off by helping us resolve performance and perceived reliability issues. We will continue reviewing the configuration of our servers and continue studying how they interact with services that are closely related (e.g. the e-mail forwarding service provided by umich.edu).

Customer impact:

Required resources: We can use 2.5 FTEs with software development skills to handle the server upgrades, software research, and documentation. These tasks will probably take the developers about 8 months (elapsed) to complete. We will need access to test hardware, and monies to upgrade current server hardware and provide additional spare components.

Risk analysis: There may be a limit to how much we can improve the current service. SunOS debugging tools are somewhat limited compared to what is available on Solaris. Also, thread support on SunOS is older, less standard, no longer supported, and harder to use than thread support on Solaris. Instrumenting the server software may introduce timing or performance problems that break the current service. We may encounter existing service dependencies that prevent some of these changes. Finally, there may be little useful information in the reports we generate.

Thread B: Improve Data Quality

Refine our data extraction programs: Our current data extraction programs haven't been reviewed or modified for several years. We will review the current programs, refining them to improve the quality of data in the Online Directory. We will focus on removing century dependencies, and getting ready for the deployment of PeopleSoft HR applications, as well as specifying more clearly what data and data sources we need.
Review and improve the current directory organization: We don't believe the current directory organization serves our customers well. We will reduce the complexity of the directory tree, and increase the amount and usability of the data that is contained in each entry. We will also study the organization and data format of PC-based directories for which we will be responsible for providing data feeds, and modify the current organization of the Online Directory to better suit exporting its data.
Assemble an account lifetime policy with other UMCE service providers: Creating such a policy will allow us to expire dead entries to improve the accuracy of directory contents. This is an important but complicated issue because it is difficult to identify the status of individuals associated with the University. We will pursue policy revisions and clarifications with our customers who have applications that use the data in the directory, such as the Medical School and the Library, with other teams in the UMCE, and with the maintainers of our data sources.
Improve the mechanism that updates the directory from HR data: We will create new requirements for the program that updates the directory from personnel databases, then implement the improvements. We plan to make the update process less disruptive and more frequent, and to have it provide better expiry information. We will improve the reliability and recovery characteristics of the process to prevent major outages or large numbers of missing entries.

Customer impact:

Required resources: 2 FTEs with data modelling and software development skills can accomplish these tasks in about 10 months (elapsed).

Risk analysis: There may be a limit to the amount of reorganization we can accomplish with the current server software, since it is less flexible and more fragile than more current directory products. We may encounter political complications while attempting to clarify our account lifetime policies. Altering the data extraction programs may prove to be expensive or technically difficult; the HR databases may not contain the data we need, or may not maintain the data at a level of quality that is acceptable to our customers.

Thread C: Migrate to CGI-based LDAP client

Develop client prototypes to understand technical implementation issues: We will use rapid application development techniques to create one or more prototypes of a CGI-based LDAP client that will allow searches and modifications to LDAP-accessible directories. The prototype will help us design a scalable, secure, and easy-to-use CGI-based LDAP client.
Study and document client requirements: We will run some focus groups and brainstorming sessions to discuss how the new client will look and feel, its interface paradigm, and what our users may need or want in a new client. We will document our research and use it as design input for the real client.
Contract with DIS to implement a Kerberized CGI-based LDAP client that uses "protected web space" for authentication: We will contract with Departmental Information Systems (formerly Web Services) to design, implement, and deploy a CGI-based LDAP client that uses the Kerberos-protected web space with SSL.
Create end-user documentation: We will write end-user documentation that we will include with the LDAP client, and make available in the ITD documentation database. We will also spend effort training consultants to use the new client.
Deploy the new client: We will make the new client available alongside the other GUI clients during the summer. During this time we will evaluate and improve the client's GUI, feature set, and documentation.
Decommission support for old GUI clients: We will announce as widely as possible that the old GUIs are no longer supported for modification of directory entries, and advertise the URL of the new client.

Customer impact:

Required resources: 2.5 FTEs with project management, software development, and user interface design skills can accomplish this project in about 5 months. We will also need ongoing hardware resources on which to deploy the CGI scripts once the client is done.

Risk analysis: This solution may not be as scalable or secure as we hoped. Differences in browser implementations may make the GUI or even the feature set quite different for different browsers. We may not produce a client that is significantly better than the old clients. We must at least deliver a URL to the "Internet Access Kit" team before they finalize their '98 Kit in May, for delivery to incoming students during Summer Orientation.

Thread D: Select and adopt a vendor-supported standard server

Document directory server requirements and regression tests: We will carefully study and reverse engineer the current implementation in order to gather requirements for the new service, and to provide a set of regression tests with which we can evaluate the new service implementation. We will analyze and document our operational requirements.
Form corporate relationship with vendors to understand their intentions, directions, and level of support: We will contact a short list of vendors who market enterprise-class directory server products, and acquire candidates for evaluation.
Construct test harnesses: We will construct a test harness from spare hardware to simulate the load a server may experience during production use. We will also test basic functionality, such as Kerberos authentication, replication, access control, and backup.
Select a product and vendor: We will choose a product based on our server evaluation criteria and the results of our tests.
Build on corporate relationships: After choosing the product, we will build our relationship with the product's vendor. For example, we can offer to evaluate early software releases and provide a large-scale testbed for vendor products in exchange for improved technical support, a channel for providing customer requirements directly, and access to nondisclosed information to help us make long-term decisions.
Migrate from current software base to vendor's software: We will develop a detailed plan for migrating to the new server software, including how we will back out. We will execute this plan during an extended school holiday such as Winter or Spring Break.

Customer impact:

Required resources: It will take 3 FTEs with directory administration, performance modelling, and software development skills about 6 months (elapsed) to accomplish this effort. This effort will also require temporary hardware resources while testing and migrating to the new server software.

Risk analysis: It will be impossible to completely reverse engineer the current service -- some pieces will be missing when we eventually cut over. Constructing a test harness can be difficult and expensive, and may not always reveal all the problems. Some vendors are uncooperative. Given the age of our current system, it is likely we will have to use a drastic cut-over instead of a gradual migration.

Thread E: Deploy new directory-related services

Strengthen support for mailing lists

We will work specifically with the Mailing List, Targeted E-mail, and Class List services to migrate mailing lists with special requirements to more appropriate technology.

Integrate UMCE account creation with IAA

We will address account lifetime issues with other parts of the UMCE which face similar challenges, especially ABS, IAA, IFS, and IMAP.

Collaborate with the Office of Development to create and manage an LDAP-accessible alumni database

The Office of Development wants to provide a database of alumni that is publicly accessible, and allows alumni to self-report changes. We believe that LDAP is the right technology to help them achieve this goal. We will work with them to provide this service to alumni.

Store much more information in the directory

We will explore maintaining new information in the directory that can enable new applications. Some examples of this include:

e-mail client configuration
UMCE computing environment information, such as the UID integer, IFS home directory, and more
all University affiliation information
a format-independent personal e-mail address book
Public Key information (X.509 or PGP)
all personal information, but kept hidden from the public

Evaluate replacements for mail500

New technologies are becoming available that might replace our current mail500 e-mail forwarding system with something more efficient and manageable. We will explore these technologies and help migrate our e-mail service to them.

Bulk data delivery to departments

It is clear there is a need to provide HR data to departments that maintain their own directory databases. Most departments would prefer that the contents of their directory match the contents of ours. We will develop mechanisms to provide bulk data feeds to departmental customers, either via flat file data extract, or using a metadirectory product, or both.

Customer impact:

Required resources: This effort will require the allocation or migration of ongoing resources with which to evaluate and provide new services. We will also need to identify hardware and operational resources on which to base new services.

Risk analysis: There may be some limits to what even the new directory server software will allow us to accomplish. We still need to study metadirectory and public key technology carefully to understand how they work and what it will mean to deploy them. Account lifetime issues may be too complex to address.

Server Software Evaluation Criteria

As you can tell, this section is incomplete. However, I include it here as a reminder that this information needs to be gathered and summarized, and it is appropriate to add it here when it becomes available.

Evaluation Criteria

Performance and Scalability

Reliability and Availability

Backup and Recoverability

Service Development Costs

Directory Standards Compliance

Support for Kerberos Authentication

Consideration of Specific Technologies

U-M's LDAP Reference Implementation

Netscape's Directory Server 3.0

Solstice Directory Server from Sun

ISODE's Enterprise Directory Server R4.0

Microsoft's Active Directory

Novell's Directory Server

A home-grown SQL-based directory service

Via metadirectory from Zoom-It

Deliverables Timeline

We plan to deliver the items in the right column before the dates in the left column. As with any deliverables schedule, this is meant to serve as a starting point for discussion and negotiation.
Upgrade windows for production services: Thanksgiving holiday (3 days), Winter holiday (12 days), Spring break (6 days), Memorial Day (1 day), Spring-Summer hiatus - July 4 (4 days), Labor Day (1 day)

Due date	Deliverables
April 1, 1998	[A] Performance and reliability enhancements for current X.500 service
	[A] New operational procedures for handling replication problems
	[A] Regular usage reports available via the web
	[D] Deliver Kerberos 4 server support code to Netscape
	[A] ISODE technical support engaged for troubleshooting
May 15, 1998	[C] Working beta of LDAP client
	[C] URL referring to new LDAP client to include in Internet kits
	[A] Monitoring enhancements
	[A] Documentation of existing service first draft
	[E] Mailing list server prototypes available for testing
	[A] Completed migration to instrumented buildable R2.0v3 servers
July 1, 1998	[A] Operational procedures documented
	[C] Completed LDAP client with documentation
	[B] Data extraction jobs revised
	[B] Purge and account lifetime policies agreed upon
	[B] Continuous online update in test
	[D] Selection of new directory server technology
	[E] Selection of new mailing list server technology
August 15, 1998	[B] Purge policy implemented
	[E] All affiliation information appears in Person entries
	[B] Entity IDs replace University IDs in directory
	[A] Migration to Solaris complete
	[B] Continuous online update complete
	[E] Pilot mailing list service available
October 1, 1998	[D] Pilot version of new directory server technology
	[C] Pilot version of new LDAP client for use with new servers
	[E] New account creation systems designed and in test
	[E] Class lists on mailing list service
	[C] Announce decommission of old LDAP clients
November 15, 1998	[D] Test harness constructed for new directory server technology
	[B] New HR data feeds in test
	[E] New account creation systems in use
	[E] Test public key data and administration in directory
	[E] Draft process and policy for delivering data in bulk to depts
January 1, 1999	[D] New directory server technology deployed
	[B] New flat organization under ou=People
	[B] New HR data feeds in use
	[C] Old LDAP clients decommissioned
	[E] Mailing list service roll-out
	[E] Bulk data delivery processes in place
