Invelos Forums->General: Website Discussion |
goodguy's Credit Lookup Plus |
Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Quoting GSyren:
Quote: I don't think we need to go into per-profile. Let's keep it simple, at least for now.

Cool. It can always be added later.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
Sorry, I was offline for a few days...

Quoting mediadogg:
Quote: Man I hope you guys are not totally sick of me, but I am determined to get this damned thing right. If I were smarter, it would be faster. Sorry.
Anyways, is there anybody that can confirm how many profiles in this XML file have credits for "ziyi zhang" spelled exactly that way? (I know there are 366 profiles in the file. But do I really have two profiles with no valid credits???? If so, which two?)
I appreciate the help in advance. I am so dizzy with code variations and watching progress bars ...

As far as I can see, the two profiles are the German versions of 4011976827085 and 4011976829584.

Both Swiss profiles are complete crap:
- the EAN prefix 401x belongs to Germany
- there is no Swiss cover justifying a Swiss locality profile for these titles
- the credits are either from the cover or from IMDb
- in one case the credited-as name is entered as the role name
- ...

But by the CLT's rules they are valid "Ziyi Zhang" profiles: both have FN="Ziyi", LN="Zhang" and CreditedAs="".

The German profiles, which the CLT does not list, have the entry CreditedAs="Zhang Ziyi". That makes them profiles for "Zhang Ziyi", but they do NOT count for "Ziyi Zhang"! For this example, 364 is the correct number of profiles.

Quote: There are multiple cases:
(1) - credited only in the creditedAs field
(2) - credited in both creditedAs and F/M/L
(3) - credited only in F/M/L
(4) - neither

F/M/L is only valid if there is no CreditedAs entry... Let me use some (syntactically wrong) meta program code:

    if (CreditedAs != "") then
        SearchString := CreditedAs
    else
        SearchString := FirstName + " " + MiddleName + " " + LastName
    endIf
    // Remove unneeded spaces
    SearchString := Regex.Replace(SearchString, "\s+", " ")
    SearchString := Trim(SearchString)

Complete list of Common Names • A good point for starting with Headshots (and v11.1)
Last edited: by AiAustria
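For readers who want to try the rule from the meta code above, here is a minimal Python sketch. It only illustrates the proposal in this post (CreditedAs wins over F/M/L, runs of whitespace are collapsed); it is not the CLT's actual logic, and the function and parameter names are made up for the example.

```python
import re


def search_string(first: str, middle: str, last: str, credited_as: str) -> str:
    """Return the name a credit would count for under the proposed rule."""
    # A CreditedAs entry, when present, takes precedence over F/M/L.
    if credited_as != "":
        name = credited_as
    else:
        # Naive concatenation; an empty middle name leaves a double space.
        name = first + " " + middle + " " + last
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", name).strip()
```

With the values from the example, a credit with FN="Ziyi", LN="Zhang" and an empty CreditedAs yields "Ziyi Zhang", while the same credit with CreditedAs="Zhang Ziyi" yields "Zhang Ziyi".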
| Registered: March 14, 2007 | Reputation: | Posts: 4,678 |
AiAustria,
I think I understand what you want, but let's take one thing at a time. First you and mediadogg agree on what profiles should be included in the output.
When I see and can test the finished CLTBoss I will make sure that the extracted data and profile count(s) that CLTinfo delivers meet with your approval.

My freeware tools for DVD Profiler users. Gunnar
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
I have found a way to analyze the results using the CookTop XPath tool. I am not sure what your point is about the meta code. We have had that discussion before, unless you think I somehow didn't understand. Aside from the fact that your code will often crash when run with the plugin API, it would not return the same results as the CLT tool, which is what I am trying to match. The CLT results DO NOT ignore multiple spaces. That is part of the problem.
99% of the problems I am having have nothing to do with the search code. They come mostly from the tricks I am inventing to get around the timing dependencies of screen scraping sequential web pages with no notification of when the pages are completely downloaded.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
Just out of curiosity: do you have an example where the CLT respects double white space?
Remark: in my code snippet I need the replacement to remove the double spacing that I inserted a few lines above with the dumb concatenation of F+M+L.
Why the snippet: to clarify that it is neither necessary nor allowed to compare the F/M/L fields if a CreditedAs entry is given.
My opinion on taking care of double spacing: given the purpose of the CLT, it is a bug to differentiate between single and double spaces. I can't reference a rule forbidding double spaces, but I can't imagine any reason why they should be allowed in any of the name fields (F/M/L/CreditedAs).

Complete list of Common Names • A good point for starting with Headshots (and v11.1)
| Registered: March 14, 2007 | Reputation: | Posts: 4,678 |
Pardon me for butting in here, but it seems to me that since there are two programs involved, it is important that we make it clear which program is responsible for what.
My understanding is that CLTBoss should extract all profiles that contain the searched name, regardless of where the name was found, and it would be up to CLTinfo to count the relevant credits.
If that is the case, then AiAustria’s code snippet would be directed to me, not to mediadogg. If we’re not 100% in agreement on who is responsible for what, then things are going to be very confusing.

My freeware tools for DVD Profiler users. Gunnar
| Registered: March 14, 2007 | Reputation: | Posts: 4,678 |
Here's a radical idea:
Skip the current output from CLTBoss and only output full profiles with the same syntax as the Profiler export, but with an additional <Variants> node to show what was searched for. I know that Jim considered something like this as an option, but not quite as "brutal" as this.
Then the client(s) - like CLTinfo, or whatever - can sort out the details. The upside of this would be that one could use other programs, like ProfilerQuery, to extract other information that one might desire.
This would mean more or less rewriting the whole CLTinfo, but I could live with that.
Any thoughts?
Edit: Or produce both types of output, but I don't quite see the use for the more limited output.

My freeware tools for DVD Profiler users. Gunnar
Last edited: by GSyren
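To make the "full profiles plus a <Variants> node" idea concrete, here is a rough Python sketch. The thread does not define the layout, so everything about the structure is an assumption: a `Collection` root with `DVD` profile elements stands in for the Profiler export, and the `Variant` child element and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET


def build_output(variants, profile_xml_strings):
    """Wrap full profile elements in one document that also lists the searched names."""
    root = ET.Element("Collection")  # assumed root element name
    # Hypothetical <Variants> node recording what was searched for.
    variants_node = ET.SubElement(root, "Variants")
    for name in variants:
        ET.SubElement(variants_node, "Variant").text = name
    # Append every profile unchanged, so other tools can still read them.
    for xml_text in profile_xml_strings:
        root.append(ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")
```

A client such as CLTinfo could then read the searched names from the extra node and do its own counting over the untouched profile elements.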
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
The (only) purpose of the CLT was (and is) to find the name an artist most often uses, the name we consider his common name.
This is and will always be the main issue to be solved by any CLT tool. Therefore, 364 is the correct count for Ziyi Zhang.
That said, I'm open to any information provided additionally... But the 366 example does not seem to be of any value, because it lists 364 Ziyi Zhang profiles together with two Zhang Ziyi profiles, which are rather randomly selected out of 317 existing Zhang Ziyi profiles.
And yes, an idea popped up a while ago to scan for all known name variants at once and leave the counting to the presentation part of a solution. But this would mean a lot of extra work:
- an array would be needed to collect all valid name variants
- some help for the user entering them would be nice (e.g. automatic flipping, removing middle initials, ...)
- name variants that come up while scanning should be added automatically
- ...
- then there would have to be a single CLT call and scraping process for each and every name variant
- then the presentation tool would have to separate the whole bunch into the different variants and count them
I did not pursue this idea since, from my point of view, it is more important to get a stable and future-proof CLT tool than highly sophisticated new features. But it would save a lot of time before setting up common name threads if all the information for one person could be gathered in one single step.

Complete list of Common Names • A good point for starting with Headshots (and v11.1)
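The "automatic flipping, removing middle initials" helper mentioned in the list above could look roughly like this in Python. It is only a sketch of that one bullet; the function name and the choice of which variants to generate are assumptions, not anything CLTBoss is known to do.

```python
def name_variants(first: str, middle: str, last: str) -> list:
    """Suggest name variants for a search, in a stable order without duplicates."""

    def join(*parts):
        # Skip empty parts so no double spaces appear.
        return " ".join(p for p in parts if p)

    candidates = [
        join(first, middle, last),  # as entered
        join(first, last),          # middle name or initial removed
        join(last, first),          # flipped, for family-name-first credits
    ]
    # dict.fromkeys keeps the first occurrence of each variant, in order.
    return list(dict.fromkeys(candidates))
```

For FN="Ziyi", LN="Zhang" with no middle name this suggests "Ziyi Zhang" and "Zhang Ziyi", the two spellings discussed in this thread.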
| Registered: March 14, 2007 | Reputation: | Posts: 4,678 |
| | Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Quoting GSyren:
Quote: Ok, that's not how I initially understood that it should work, but if that's what you guys agree on, then that's fine with me.

We are actually all on the same page. Remember I considered the 366 a bug and asked for help figuring it out. I did finally discover a bug in my duplicate detection code, which allowed the extras to slip through and somehow get allocated as having a match when they didn't. And I also stated up front that my goal was to match the CLT exactly. So, guess what, we are all saying the same thing. The biggest limitations for me are my poor programming skills and frustration with the plague that has been cast upon us all.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
That being said, please remember that the CLT tool is code, written by a human, many years ago. If anybody thinks that code cannot have bugs, then they are mistaken. Not saying it does, just making the point that the only way I could guarantee identical results would be if I duplicated their program logic identically, including bugs.
So, it is entirely possible that I could have code that matches the CLT 999 times, and then for some weird case have a difference for some reason I could not predict (for example, people have talked about CLTPlus crashing on corrupt profiles, or profiles that won't even download into DVD Profiler). So, please don't try to hold CLTBoss to a higher standard than it is possible to reach.
Once it is (ever?) released, if you detect an error, all I need is the profile ID(s) that are included in the result but do not match the search, or the profile ID(s) that should have been included but were not. Period. I can figure out the rest. A lecture on why the profile was poorly created, or the messy database, or tutoring on how to write code, does not matter in terms of the goals of CLTBoss. Match the search. Period. And BTW, if the results are different from the CLT results, but the matches are accurate, then as far as I am concerned, there is no bug.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg
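An error report of the kind asked for above boils down to two lists of profile IDs. Here is a small Python sketch of how a tester might produce them, assuming they already hold a list of IDs they believe should match and the list CLTBoss returned; the function name and the dictionary keys are invented for the example.

```python
def compare_results(expected_ids, actual_ids):
    """Return the profile IDs that differ between an expected and an actual result."""
    expected, actual = set(expected_ids), set(actual_ids)
    return {
        # Should have been included, but were not.
        "missing": sorted(expected - actual),
        # Included in the result, but not expected to match the search.
        "unexpected": sorted(actual - expected),
    }
```

If both lists come back empty, there is nothing to report.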
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Still waiting for Tom Cruise to finish, so I will post a bit of a preview of what I am trying to perfect:
I have given up trying to choose a set of "delays" that always work with an unpredictable network.
So, here is what I will provide, three ways to scrape:
(1) Click-scrape-next: this is very fast and very accurate if you have fewer than 10 to 20 pages. Takes 10 min.
(2) An auto-scrape that uses AutoIt to press a "Scrape Displayed Page" button, in combination with javascript to click on the page, in a loop. I scrape the start and end pages of the loop off the CLT screen. After each page, if I get fewer than 25 profiles, I scrape again, and if there are still fewer than 25, I add the page to an error list. Once all pages have been scraped, the error report is presented and the user can manually click on the few pages that had errors before running the XML scan.
(3) A pop-up scrape that interrupts the user after each page, which somehow triggers the complete download of the data. This is annoying, but you can be "sure" to get all the profiles scraped before running the XML scan.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg
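The retry logic described for method (2) can be sketched in Python as follows. This is not the plugin's code: the real thing drives a browser through AutoIt and javascript, whereas here `scrape_page` is a hypothetical callable that returns the profile IDs found on one CLT page, and the page size of 25 is taken from the post.

```python
from typing import Callable, List, Tuple

PAGE_SIZE = 25  # profiles expected on a full page, per the post


def scrape_pages(
    scrape_page: Callable[[int], List[str]], first_page: int, last_page: int
) -> Tuple[List[str], List[int]]:
    """Scrape a page range, retrying short pages once and listing the failures."""
    profile_ids: List[str] = []
    error_pages: List[int] = []
    for page in range(first_page, last_page + 1):
        ids = scrape_page(page)
        if len(ids) < PAGE_SIZE:
            # The page may not have finished loading, so scrape it once more.
            ids = scrape_page(page)
        if len(ids) < PAGE_SIZE:
            # Still short: remember the page for manual review afterwards.
            error_pages.append(page)
        profile_ids.extend(ids)
    return profile_ids, error_pages
```

Read literally like this, a final page that legitimately holds fewer than 25 profiles also lands in the error list, so it would simply be one of the pages the user checks by hand.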
| Registered: March 14, 2007 | Reputation: | Posts: 4,678 |
Quoting AiAustria:
Quote: And yes, it was an idea popped up a while ago, to scan for all known name variants at once

Sorry if there is something I have missed, but does this mean that CLTBoss will now only scan one single name at a time? Regardless, it would still be advantageous to have the actual search argument as part of the output.

My freeware tools for DVD Profiler users. Gunnar
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Quoting GSyren:
Quote: Quoting AiAustria:
Quote: And yes, it was an idea popped up a while ago, to scan for all known name variants at once
Sorry if there is something I have missed, but does this mean that CLTBoss will now only scan one single name at a time?
Regardless, it would still be advantageous to have the actual search argument as part of the output.

The current design and implementation of CLTBoss is to allow for scanning as many variants as it finds in the variants table. The resulting set of profile IDs collected and the resulting XML scan will be a collection that includes the results for all variants. The user has the option to use this feature or not.

That being said, given the issues I am having stabilizing the scraping operation, I obviously am focusing on getting 1 right. Then I will worry about more than 1. But the design remains the same. As we all are (or have been) programmers, I am sure you understand that it is very difficult to rip out features and change a program design after the fact. I thought the idea of multiple variants was a good one, so I took a shot at it. In hindsight, it might have been better to go with the idea of an automatic generation of variants from a single search field.

If I live long enough, and if people continue to care, I have a list of things to try in "son / daughter of CLTBoss" which includes:
- separate plugin
- use of Chrome instead of IE
- single search field with auto variants

And BTW, I am not going to spend time proving anything about the Profiler database. If you have a point to make, YOU prove it with a profile example. You show me, and I will write code to handle it.

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Actually, what I am leaning towards is a plugin that has NO search smarts. It would be a solid, reliable, as-fast-as-possible scrape of the CLT: get the list of profile IDs, grab the Invelos XML and sayonara. Then all the search smarts, multiple views, etc., "SuperCLTPlus" or whatever, would be in an external tool(s). Actually CLTBoss will deliver that, along with its extra baggage. Since you can now scrape the profiles and immediately dump an Invelos Collection based on the profiles, you can completely ignore the CLTBoss XML scan and not worry about my interpretation of the search. I scrape, you search. Already possible! (Yes, I was thinking ahead.)

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
Sorry, one more. But you will love it. Something you might not have noticed is that CLTBoss will load the CLTPlus XML. If one trusts CLTPlus scraping (when it doesn't crash on a corrupt profile), then go ahead and load it into CLTBoss. You will have a set of credited profiles that you trust. Simply ask CLTBoss to download the Invelos XML into a collection and then search / sort / dance to your heart's content. Don't worry about how many spaces I squeeze out! Oh yes, it might not be your Birthday, but you can go ahead and celebrate anyway.
(Edit: If you want the current CLTBoss just for the purpose of using the XML download, I can post you a link in your PM. But it will still be unreleased for testing of the scraping and XML scan.)

Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog.
Last edited: by mediadogg