Question on Rebuild Pictures References

Fred Slota · January 23, 2022

So, I'm watching the process rebuild Comic Book references for my personal scans, marveling at how quickly and smoothly it is running through 974,129 individual Comic Book issues in the database, and I have a question...

Why is it running so smoothly?

I have 23,382 scans, from a fraction of the total available titles and a fraction of the total available publishers. If this process were organized by Publisher and Title, a check could be performed on the directory listings to determine if there is a folder for a given Publisher or Title, skipping over large swathes of non-existent pictures and speeding up the process.
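A minimal sketch of that pre-check, assuming one folder per publisher directly under the Pictures root. The function name and the `publisher_folders` mapping are hypothetical, not anything ComicBase exposes. A single directory listing decides which publishers are worth visiting at all:

```python
from pathlib import Path

def publishers_with_scans(pictures_root, publisher_folders):
    """Return only the publishers whose folder exists under pictures_root.

    publisher_folders maps a database publisher name to the folder name
    it is assumed to use on disk (hypothetical input).
    """
    root = Path(pictures_root)
    if not root.is_dir():
        return []
    # One directory listing; casefold because Windows names are case-insensitive.
    existing = {p.name.casefold() for p in root.iterdir() if p.is_dir()}
    return [pub for pub, folder in publisher_folders.items()
            if folder.casefold() in existing]
```

Every publisher filtered out here takes all of its titles and issues with it, which is where the savings would come from.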

Fred Slota · January 23, 2022

Further thinking suggests that the process might work better if the whole thing was reversed. I'm suggesting that the process find pictures and attempt to match them to database items, as opposed to what appears to be finding database items and attempting to match them to scans.

Currently, my official Pictures directory has about 780k pictures and the database has nearly 1.1 M individual entries. Ideally, every picture goes with a matching database entry, meaning that the process is accessing every picture, and is making 320k attempts to access pictures that don't exist, or about 29% that don't exist. For my personal scans, that ratio rises to about 98% that don't exist.

If the process ran by walking through the Pictures directory structure instead, you would effectively remove the attempts to access pictures that don't exist.

Plus, by walking through the Pictures directory, a report could be generated for orphaned pictures.
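The orphan report could look roughly like this. It is only a sketch: `find_orphans` and the `referenced` set (relative paths the database already points at) are assumptions made for illustration.

```python
import os

def find_orphans(pictures_root, referenced):
    """Return .jpg paths (relative to pictures_root) that nothing references.

    referenced: set of relative picture paths known to the database
    (hypothetical input for this sketch).
    """
    known = {os.path.normcase(os.path.normpath(p)) for p in referenced}
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(pictures_root):
        for name in filenames:
            if not name.lower().endswith(".jpg"):
                continue  # only pictures are of interest
            rel = os.path.relpath(os.path.join(dirpath, name), pictures_root)
            if os.path.normcase(os.path.normpath(rel)) not in known:
                orphans.append(rel)
    return sorted(orphans)
```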

Peter R. Bickford · January 24, 2022

We have to start from the database entries and find scans that match, since punctuation etc. makes the reverse of the process problematic. E.g., a title like "Bob: The Living Superman" has a picture folder of "Bob- the Living Superman" [since ":" is an illegal character in Windows--there are hundreds more like this]-- you can't start from the picture folder and reliably find which character was replaced (by the "-") to get the database entry.
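To illustrate why the mapping only runs one way, here is a sketch that assumes every Windows-illegal character is simply replaced with "-". That rule is an assumption for the example; the actual ComicBase rules may differ.

```python
ILLEGAL = '<>:"/\\|?*'  # characters Windows forbids in file and folder names

def to_folder_name(title):
    """Replace each illegal character with '-' (assumed rule, for illustration)."""
    return "".join("-" if ch in ILLEGAL else ch for ch in title)

# Two different titles collapse to the same folder name, so the folder
# name alone cannot say which character the "-" originally was.
same = to_folder_name("Bob: The Living Superman") == to_folder_name("Bob? The Living Superman")
```

Going from title to folder is deterministic; going back means guessing among every character that could have produced the "-", including a literal hyphen.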

Fred Slota · January 25, 2022

Okay, so the entire process can't work backwards. But what about a process that uses my first post in this thread, and once we've determined that the title directory exists, then run through the jpgs to match to the database items in that title?

Clear all database picture data.

For Each (database Publisher)
- If (Publisher folder exists)
  - For Each (database Title of Publisher)
    - If (Title of Publisher folder exists)
      - For Each (jpg in Title of Publisher folder)
        - If (database Item of Title of Publisher exists)
          - Update database from jpg info.
        - Endif ; database Item of Title of Publisher
      - Next ; jpg in Title of Publisher folder
    - Endif ; Title of Publisher folder
  - Next ; database Title of Publisher
- Endif ; Publisher folder
Next ; database Publisher
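The skeleton above can be sketched in Python. Everything about the data layout here is an assumption: the database is modeled as a nested dict, `folder_name` stands in for whatever title-to-folder translation ComicBase really uses, and a jpg is taken to match an item when its file name (minus extension) equals the item key.

```python
from pathlib import Path

def rebuild_picture_refs(db, pictures_root, folder_name=lambda s: s):
    """db: {publisher: {title: {item_key: picture_path_or_None}}} (hypothetical).

    Folders are still located starting from database names, so no
    reverse translation from folder name to title is needed.
    """
    root = Path(pictures_root)
    # Clear all database picture data.
    for titles in db.values():
        for items in titles.values():
            for key in items:
                items[key] = None
    for publisher, titles in db.items():
        pub_dir = root / folder_name(publisher)
        if not pub_dir.is_dir():
            continue  # skips every title and item of this publisher
        for title, items in titles.items():
            title_dir = pub_dir / folder_name(title)
            if not title_dir.is_dir():
                continue  # skips every item of this title
            for jpg in title_dir.glob("*.jpg"):
                if jpg.stem in items:  # database item exists for this jpg
                    items[jpg.stem] = str(jpg.relative_to(root))
    return db
```

With this shape, the only per-item work happens inside title folders that actually exist on disk.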

Or possibly more efficient, use an internal flag on database items. Clear all flags at the start, set the flag for an item when it is updated from jpg info, and then at the end clear the database info for all items where the flag is still unset. This way, every database item's picture info is touched once and only once.
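A sketch of the flag variant, again with made-up names: `items` stands for one title's database items and `found` for the pictures discovered on disk for that title. The `updated` set plays the role of the internal flag.

```python
def refresh_with_flags(items, found):
    """items: {item_key: picture_or_None}; found: {item_key: picture}.

    Both layouts are hypothetical. Each item's picture info is written
    exactly once: either updated from a jpg or cleared at the end.
    """
    updated = set()  # the "flag": keys updated during this run
    for key, picture in found.items():
        if key in items:
            items[key] = picture
            updated.add(key)
    for key in items:
        if key not in updated:
            items[key] = None  # flag still unset: no picture was found
    return items
```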

Fred Slota · January 25, 2022

22 hours ago, Peter R. Bickford said:

We have to start from the database entries and find scans that match, since punctuation etc. makes the reverse of the process problematic. E.g., a title like "Bob: The Living Superman" has a picture folder of "Bob- the Living Superman" [since ":" is an illegal character in Windows--there are hundreds more like this]-- you can't start from the picture folder and reliably find which character was replaced (by the "-") to get the database entry.

Actually, I believe you can. With the understanding that there should be one and only one title from a given publisher that would match.

Wildcards.

I.[Title] LIKE "Catwoman_ Lonely City" successfully found items with the Title "Catwoman: Lonely City". AND that with a Publisher condition, and Bob (the Living Superman)'s your uncle.
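Here is a sketch of the wildcard idea using Python's built-in `sqlite3` as a stand-in for the ComicBase database. The `titles` table and its columns are hypothetical. Each "-" in the folder name becomes the single-character wildcard `_`, and any literal `%` or `_` is escaped first:

```python
import sqlite3

def find_titles(conn, publisher, folder_name):
    """Return titles of `publisher` matching folder_name, treating each
    '-' as a one-character wildcard (table layout is hypothetical)."""
    pattern = (folder_name.replace("\\", "\\\\")
                          .replace("%", "\\%")
                          .replace("_", "\\_")
                          .replace("-", "_"))
    rows = conn.execute(
        "SELECT title FROM titles WHERE publisher = ? AND title LIKE ? ESCAPE '\\'",
        (publisher, pattern),
    )
    return [row[0] for row in rows]
```

A caller would still need to check that exactly one title comes back, since a literal hyphen in a folder name is turned into a wildcard too.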

Incidentally, what would happen in ComicBase if there were two titles from the same publisher that both produced the same pictures folder? Human sacrifice, dogs and cats living together... MASS HYSTERIA?

Steven L. Dasinger · January 25, 2022

Quote

Incidentally, what would happen in ComicBase if there were two titles from the same publisher that both produced the same pictures folder?

Couldn't happen. In such a case one would be 1st series and the other 2nd Series (as an example).

If you are talking about the same Publisher, same Title, same Year, Month and Day (not very likely), they could randomly name one 1st and the other 2nd (or some other change so they don't have the same Publisher/Title name).

If you are referring to the DOS name on disk, the same applies. Something would be changed to make it unique.

Steven L. Dasinger · January 25, 2022

Quote

Actually, I believe you can. With the understanding that there should be one and only one title from a given publisher that would match.

Wildcards.

It isn't quite as simple as a substitution.
CB has Unicode characters. DOS doesn't.

A CB Title like:

"Art of Nausicaä of the Valley of the Wind, The: Watercolor Impressions"

Would have a DOS name of:

"Art of Nausicaa of the Valley of the Wind, The- Watercolor Impressions"
Note the last 'a' in Nausicaa.

If you do a Find in CB for Title Like (or Contains) 'Nausicaa', it won't find anything.
(There is some programming in place so that if you type 'Nausicaa' in the Find search box at the top of the CB window, it will find the Title. But not in a Find/Advanced Find window.)

A program can usually do anything (within limits).

One simple method to get what you are looking for is to just store the translated DOS name in a CB Table. Then it would be an easy match from Windows File name to CB Title (at the expense of storing all that information).
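A rough sketch of the stored-name idea, with a plain dict standing in for the CB Table. The translation rules in `to_dos_name` (strip diacritics, replace illegal characters with "-") are assumptions for illustration, not the actual ComicBase rules.

```python
import unicodedata

ILLEGAL = '<>:"/\\|?*'  # characters Windows forbids in file and folder names

def to_dos_name(title):
    """Strip diacritics and replace illegal characters with '-' (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join("-" if ch in ILLEGAL else ch for ch in stripped)

def build_lookup(titles):
    """Map each translated folder name back to its database title.

    Computed once and stored, so a folder name found on disk becomes a
    direct lookup instead of a guess.
    """
    return {to_dos_name(t): t for t in titles}
```

In practice the key would probably need to include the publisher as well, since the same title can exist under more than one publisher.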

Fred Slota · January 25, 2022

You were assuming that they were originally identical titles. I was pondering if they were punctuated differently in a way that would end up clashing? Suppose a Publisher decided to introduce a pair of cliffhanger anthology series like DC Challenge, with two separate titles, say "And Now?" and "And Now!"? But I guess the answer is that ComicBase would append some parenthetical to at least one of the titles so they would be functionally separate, as in all those "(Series xxx)" addendums.

Ah well....

Still leaves the skeleton program I listed above as functional, I think.
