Re: [netatalk-admins] Problems with sherlock searching netatalk


Subject: Re: [netatalk-admins] Problems with sherlock searching netatalk
From: Rob Newberry (rob@eats.com)
Date: Wed Mar 17 1999 - 12:17:31 EST


> I'm pretty certain that I know why Sherlock doesn't work. I believe that
> Sherlock index files for full text searches (and possibly how it tracks
> "found" files) is done via file IDs. If you think about it, that makes
> sense, because using file IDs would keep index files much smaller.
>
> i have two comments here:
> 1) sherlock doesn't use index files for find file searches. in fact,
> you can't create text search index files right now. as i don't
> have the sherlock find content db specs, i can't implement things
> in afpd to create them either. in the future, i believe that
> sherlock will be able to create index files on the server. in
> that situation, of course, whoever created the file has the
> rights to change it.

Just to be clear, I never said that Sherlock used index files for "found
files". I meant that it might somehow use "file IDs" for found files.
For "Find File", I know that Sherlock uses FPCatSearch if available, and
if not, does a recursive enumeration. However, WHEN it finds a matching
file, it is still POSSIBLE that it resolves the file ID (using
FPCreateID), and stores that somewhere to reference when the user clicks
the item in the found list -- that's what I meant by saying "(and possibly
how it tracks "found" files)". If your experience shows that it doesn't
use file IDs in this situation, that's groovy.
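
To make the "catalog search if available, otherwise recursive enumeration"
behaviour concrete, here is a minimal sketch in Python. It is an illustration
only: find_files and the cat_search callback are hypothetical names standing in
for the client logic and for a server-side catalog search, and none of this is
Sherlock's or afpd's actual code.

```python
import os


def find_files(root, name_substring, cat_search=None):
    """Return sorted paths under root whose file name contains name_substring.

    cat_search is a hypothetical stand-in for a server-side catalog search.
    When it is None, we fall back to walking the whole directory tree.
    """
    needle = name_substring.lower()
    if cat_search is not None:
        # Fast path: hand the whole query to the "server" in one call.
        return sorted(cat_search(root, needle))
    # Slow path: enumerate every directory and test each file name.
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if needle in filename.lower():
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Either path only produces a list of matches. How each match is remembered for
a later click is a separate question.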

Most likely, "Find File" in Sherlock uses the Alias Manager to create a
temporary alias record in memory to the found item, and stores that. If
your volume supports file IDs, the Alias Manager will most likely use that
file ID to create the alias. If not, it will create an alias using DID +
name, which SHOULD still work.
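
Here is a toy model of that fallback in Python, assuming the behaviour I
guessed at above: prefer the file ID when the volume has one, otherwise use
DID + name. Volume, AliasRecord, make_alias and resolve_alias are made-up
names for illustration and are not the real Alias Manager data structures
or calls.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AliasRecord:
    file_id: Optional[int]  # None when the volume has no file ID support
    parent_did: int
    name: str


class Volume:
    """In-memory stand-in for a mounted volume (hypothetical)."""

    def __init__(self, supports_file_ids: bool):
        self.supports_file_ids = supports_file_ids
        self._ids = {}  # (did, name) -> file ID, or None without ID support
        self._next_id = 1

    def add_file(self, did: int, name: str) -> None:
        file_id = None
        if self.supports_file_ids:
            file_id = self._next_id
            self._next_id += 1
        self._ids[(did, name)] = file_id

    def rename(self, did: int, old_name: str, new_name: str) -> None:
        # The file keeps its ID; only its (did, name) location changes.
        self._ids[(did, new_name)] = self._ids.pop((did, old_name))

    def file_id_of(self, did: int, name: str) -> Optional[int]:
        return self._ids[(did, name)]

    def locate_by_id(self, file_id: int) -> Optional[Tuple[int, str]]:
        for location, known_id in self._ids.items():
            if known_id is not None and known_id == file_id:
                return location
        return None

    def exists(self, did: int, name: str) -> bool:
        return (did, name) in self._ids


def make_alias(volume: Volume, did: int, name: str) -> AliasRecord:
    return AliasRecord(volume.file_id_of(did, name), did, name)


def resolve_alias(volume: Volume, alias: AliasRecord) -> Optional[Tuple[int, str]]:
    # First choice: the file ID, which survives renames.
    if alias.file_id is not None:
        location = volume.locate_by_id(alias.file_id)
        if location is not None:
            return location
    # Fallback: DID + name, which works only while neither has changed.
    if volume.exists(alias.parent_did, alias.name):
        return (alias.parent_did, alias.name)
    return None
```

In this model an alias made on a volume with file IDs still resolves after a
rename, while a DID + name alias resolves only as long as the item stays put,
which is all a freshly found item needs.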

I am aware that you can't index things on the server because you don't
have the find content db specs. I suspect that it's not an easy task
either -- otherwise, AIAT (which is what Sherlock uses) wouldn't have
taken Apple so long to produce. It is conceivable that Apple will NEVER
document the format of these files, and consequently, we may never be able
to produce them independently. This is doubtful, though -- someone could
probably reverse engineer the format of those files. Still, until either
Apple documents it or someone figures it out, "afpd" obviously can't do
the indexing -- that's why I said it would be cool, but currently only
Apple possesses that knowledge.

Since the feature to index remote volumes has been disabled on the Mac,
we're hosed for content searches of afpd volumes until this information is
available. I forgot that, so even users' home volumes are off limits for
Sherlock indexing right now.

I still think it's likely that file IDs are used in the index files for
purposes of reduced file sizes. Otherwise, it would have to store entire
alias records. Taking a quick look at a copy of TheFindByContentIndex on
my machine, I don't see things that look like alias records. I'm still
betting on file IDs being necessary for full-text indexes.

But the fact that you can't index remote files certainly means we won't
know for a while :-).

> 2) let's just say that your "certainty" is misplaced. one of the
> reasons why i added persistent did support was to see if it would
> fix the sherlock problem. it doesn't. all it does is turn up
> another problem which is most likely the underlying problem
> causing sherlock to fail. actually, it turned out that all i
> really had to do was pretend to add persistent dids to really
> test things out.

I don't exactly understand this. Netatalk has always "pretended" to have
persistent DIDs. If I go do a "Find" and then click on an item, surely
the DID would have been persistent that long -- I haven't unmounted the
volume! What is this "underlying problem"? Is this the "incorrect
ordering of CNIDs" that you're talking about?

> hmm. i did just have a thought. it may be that sherlock is incorrectly
> making some assumptions about the order the cnid's should follow. i'll
> go fiddle a little to see if that's it.

That seems pretty odd. My guess would be that Sherlock creates an alias
record for each file that it finds, and stores a reference to that for
when the user clicks the item. An alias created that way should still
resolve as long as the item's DID has not changed. Or, if you support file
IDs, the Alias Manager should use the file ID, in which case the alias
should also resolve.

Rob


