MONDAY, MONDAY Bumped
Update: Though we are not quite at the threshold where Blogger takes exception to the number of comments on any one thread, seemingly around the two hundred mark, I think it prudent to re-up the post and carry on afresh here. Thank You.
The Mamas and the Papas had plenty to say on the subject. The McCanns, on the other hand, had nothing to say on the subject, either when asked by police in 2007 or since (in Kate McCann’s ‘Account of the Truth’).
And now it appears they are silent once more – deaf to the question of why a computer file generated by CEOP and archived against a date of 30 April 2007 should have appealed for help in finding Madeleine McCann, who was not due to go missing until 3 May! The man who genuinely should know the answer, former CEOP supremo Jim Gamble, has also ‘assumed the foetal position’.
One cannot help but wonder whether Robert Murat booked his urgent early morning flight to Praia da Luz having read the CEOP announcement the night before. Or whether Kate really did take her famous ‘tennis photo’ on the morning of Tuesday 1 May, when Murat was heading home to Portugal.
You see, if Madeleine’s disappearance was known about on the Monday, it would have been when the child was still perfectly well and able to scamper around a tennis court the following morning. Should she then have been extricated from the family’s holiday apartment on account of some incapacity, this might suggest that CEOP also knew about that incapacity in advance.
You can hear the chorus from wherever you sit: “Oh no they didn’t! Kate McCann was confused. The ‘photo was already available to CEOP’s ‘mccann.html’ file (at 11.58.03)! The link was only broken temporarily - until the McCanns managed to communicate the image!” That very day - Monday 30 April; the morning when Madeleine’s group of infant crèche captives actually had an hour’s mini-tennis planned for 10.00 a.m.
A ‘pic’ prepared within the hour then. Unless of course it was taken on the Sunday evening, following that impromptu social tennis session for newly-arrived adult guests (another truth accounted for by Kate McCann in her book). It does seem rather strange that a moment in time captured immediately following a group tennis session, be it a group of adults or a group of children, should show not a semblance of anyone’s presence save that of the subject and her photographer.
And what of those CEOP internet ‘home pages’ that appear suddenly to have gone ‘tits up’ in October 2007? You know, the 10 October edition that cites the latest news to the 8th of the month and the 13 October edition that forgets all about it, but instead seeks to rival Reuters with a reference to what happened no later than the 2nd. Surely that and other strange perturbations can have nothing to do with the McCanns’ return to the UK, having been declared arguidos on 7 September, nor Jim Gamble’s protestations of their innocence a month to the day thereafter, and which were quoted in the Daily Mirror of the same date (7 October):
"We absolutely support the McCann family, they are to be applauded for their tireless work to keep the campaign to find their daughter in the public consciousness."
No, of course not. Pure coincidence, nothing more.
The current ‘hot topic’ though is that ‘30 April 2007’ archival date attributed by the Wayback Machine to certain CEOP internet files; files that make explicit reference to Madeleine McCann, the little girl who was not destined to leave the Ocean Club, Praia da Luz, until 3 May.
Whilst interpretation of the information they contain, both visually and in terms of their source code, suggests very strongly that the incriminating date (30 April) is in fact correct, there is a rump of detractors who remain adamant that neither of the two files, which feature heavily in the dispute, was composed, ‘crawled’ (archived), or whatever on 30 April, but that they were legitimately configured on some indeterminate later date and simply ‘misfiled’ by the Wayback Machine, which dropped a stitch somewhere along the line. As a staunch proponent of the WBM’s inadequacies has put it quite recently:
“The same process that archived with an erroneous date will have updated the index with the same erroneous date.”
Note the involvement of a single process, an (as in one) erroneous date, and the inclusion of the latter within the (solitary) index.
Since the keepers of the Wayback Machine have been alerted to these specific shortcomings, they are no doubt busily preparing an announcement to the effect that, having identified the process in question and corrected the system error responsible for appending that one false date (in nearly twenty years of operation) they have ‘fixed the problem’, and we can all now go back to work.
Unfortunately no.
The whole being the sum of its parts in this matter, archive.org will have to do rather better than that. Considerably better in fact. They will have to examine the architecture of their entire system if they are to convince anyone other than themselves that the ‘error’ which has been brought to their attention is confined to the archiving of but two files in 485 billion, since there is now further evidence that it just might have been a tad more widespread. Either that or CEOP have even more explaining to do.
The Wayback Machine is something of a technological wonder of the modern world. Its database is unimaginably large and its retrieval systems concomitantly complex. Nevertheless, at the touch of a button almost, it is possible to establish just how many files associated with a specific URL it has actually recorded over time, even those files set up and administered by CEOP – all 8779 of them according to recent estimates (see following):
If one takes the trouble to review this inventory, it very quickly retraces events back to….30 April 2007. And what should we find listed among all those separately identified files with their unique URL terminations? Why, two image files labelled ‘madeleine’, recognizable as ’madeleine_01.jpg’ and ‘madeleine_02.jpg’:
There can be no question that the ‘madeleine’ referred to here is Madeleine McCann, as these terminators are exactly those employed within the structure of the CEOP home page as visible (and archived) on 13 May 2007, a construct which, incidentally, features several references to ‘mccann.html’, another data structure that according to WBM detractors was not created until later that year. (Why on earth would anyone program a computer to access a non-existent file? I ask myself):
To judge from the foregoing, either The Wayback Machine could be off-line for a considerable period, while their ‘techies’ rebuild almost their entire indexing and retrieval systems, or J. Gamble Esq. had better come up with some convincing explanation as to what CEOP would have been doing with photographs of Madeleine McCann barely two days into the McCann family’s fatal 2007 vacation.
Martin Roberts
78 comments:
@Seahorse
I shouldn't be opening the batting here, but someone appears to have raised the drawbridge!
Having just now repeated your 'Google' exercise, it seems Google is demonstrating that 'Nelson touch' and cannot now see that link on p.2 to which you referred earlier.
And there's more: Searching the www.codexgeo address directly yields a solitary page on Payday loans, the design of which was copyright protected - in 2013!
Maybe it's my eyesight or some form of digital dyslexia that's failing me here, but would you mind terribly repeating those initial steps of yours, just to see if they lead to quite the same place?
If the 'twin towers' can be made to disappear, then anything's possible at ground level!
Kind regards
Martin R.
Dr. Roberts
I found it by including the excluded search results and whaddya know? Our trusty old manual uploaders have thoughtfully preserved what seems to be the entirety of the Dictionary of Scottish Architects [you can't make this stuff up] at archive.is
http://tinypic.com/view.php?pic=2nh40aw&s=8#.VZgAV_nWzhU
Googling Dictionary of Scottish Architects leads us to this rather pointed url:
http://www.scottisharchitects.org.uk/
And yes indeed googling the codexgeo.co.uk url brings us to payday loans page. Maybe they should have reserved that domain before attempting this trick?
I'm not actually pointing the finger at you Seahorse. This seems to have been a cute little game of 'let's see who will follow these bread crumbs'. Someone was bound to..
Cheers
whodunnit
@Seahorse
Hold your seahorses!
Bizarre though it might seem, is there not a slight chance that the codex data you turned up is in fact accurate?
Look at the page to which I think it relates (following the links you provided). The 'menu' extends a very long way beyond the scope of the text to which it relates (lots of linking options for the inquisitive).
Looking across then to the densely coded page that is displayed courtesy of archive.org (again following the command string you provided), each line appears to describe two versions of the same file, each subtly different one from the other. There are clearly a lot of them, but with data transfer rates being extraordinarily high these days, is there not the chance at least that the whole bag of tricks was picked up at exactly the same time (just as the entire menu appears on the page to the viewer almost instantaneously)?
I'm asking the question here, not arguing the point.
What do you think?
Kind regards
Martin R.
@Whodunnit 16.54
Hello there
So those reams of coded files have more of a connection to architecture than debt (makes sense since the word 'building' occurs in every instance, 'money' in none).
And the relationship between said code and the money lending site is not as I had imagined. No matter, even if the comparison was but a fortuitous one, I think my question to seahorse a little earlier may yet have some merit.
That said, this could prove an interesting test case of its own - a legitimate block capture or a lure for Hansel and Gretel.
How fascinating.
Kind regards
Martin R.
Dr. Roberts
There isn't much doubt that the specific codex.co.uk url-address belongs to the payday loan page. I'm not sure why the Scottish Architects Dictionary was preserved under this address at archive.is, where, I must emphasize, anybody can upload and preserve web pages. It is a mystery.
Randomly checking codes on individual pages of the dictionary preserved at archive.is so far yields the date and time stamp under the heading 'you are here' only in the page for Gilbert Mackenzie Trench d. 1979. Poor old Frederick Thomas Pilkington d. 1898 does not even rate a date and time stamp in his page code. Not sure yet if this pattern will hold up.
I'm not sure but I think the uploader can manipulate the code in pages manually uploaded to archive.is
Cheers
whodunnit
Uh oh, another of my replies has flown the coop.
So to recap until it finds its way home again, I pointed out that:
1. Anybody can upload web pages to archive.is
2. Randomly checking codes on individual pages of the dictionary so far only one page has yielded the time and date stamp. Other pages appear not to possess any time/date stamp at all.
Cheers
whodunnit
Oops also, the codex.co.uk url-address does seem to belong to the payday loan site. I'm not sure why the Dictionary of Scottish Architects was preserved at archive.is under that url.
whodunnit
@Whodunnit 17.44
Are you implying that this is a 'tailored' outcome by any chance?
What's the (experimental) relevance here of archive.is to archive.org (the trail identified by Seahorse). Is it that by and large they provide the same info. and their 30.4 representations in this case are the same?
Sorry to appear slow here but I'm not lately accustomed to such levels of exploration.
Kind regards
Martin R.
@Dr Roberts,
I just wrote a reply (luckily in Word), but noticed previous blog closed for comments, so will post it here, after which I will read your new blog.
--------------------------------------------------------------
I have no grounds for suspecting there are any more as I assume the google search would have thrown others up as well.
Further on I can’t find any link between CodexGeo and Ceop.
It seems the original owners of Codexgeo let the domain registration lapse and some payday loan company has now taken it on. Most likely not of any relevance.
Whilst searching for any history about Wayback crawls to see how many were run in 2007 I drew a blank, but I came across some information that may or may not be relevant.
Most data collected by Internet Archive was donated by Alexa who used their own proprietary crawler (ia_archiver) for their own purposes (e.g. web ranking). https://en.wikipedia.org/wiki/Heritrix
Internet Archive indexed/archived this donated data which could then be accessed through the classic Wayback Machine.
In 2003 Internet Archive started to develop their own open source Heritrix crawler.
In December 2006 Internet Archive received a grant from the Mellon Foundation. With this grant they decided to do a 2 billion page web crawl using Heritrix . This crawl ran from 4th June 2007 to 15th December 2007. http://web.archive.org/web/20071112175931/http://wa.archive.org/aroundtheworld/index.html
They asked libraries/memory institutions from around the world to nominate URLS to be part of this crawl. 572 .uk URLs were nominated amongst which ceop.gov.uk, but not codexgeo.co.uk. In total 18000 URLs were nominated worldwide.
http://web.archive.org/web/20100705230141/http://wa.archive.org/aroundtheworld/countriesoz.html (scroll down to UK and select ‘see 572 sites’)
There is surprisingly little information about this epic crawl since its completion, but I found this little snippet by Gojomo:
“Later this year, the ‘Around the World’ collection will be merged into the classic worldwide collection, and it will all be available via the new Wayback interface. But for now, this alternate entry point is the chance to try the new crawl and Wayback.”
https://iawebarchiving.wordpress.com/2008/01/02/access-to-around-the-world-in-2-billion-pages/
From the 572 nominated .uk domains I randomly picked a handful to check if there were any with an unusual amount of captures on 30th April 2007. Another blank. Realising I couldn’t possibly check all of them, I googled the ‘20070430115803’ timestamp and found the codexgeo information.
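For anyone wishing to check what those digits actually encode: the 14-digit string follows the YYYYMMDDhhmmss pattern that appears in Wayback Machine archive links (understood to be recorded in UTC). The following is only a minimal Python sketch; the function names are hypothetical and chosen purely for illustration. It reads such a stamp and separates it from the original address in an archive link.

```python
import re
from datetime import datetime

# Pattern for links of the form http://web.archive.org/web/<14 digits>/<original URL>
WAYBACK_URL = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_timestamp(stamp):
    """Turn a 14-digit YYYYMMDDhhmmss stamp into a datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def split_wayback_url(url):
    """Return (capture time, original URL) for a Wayback archive link."""
    match = WAYBACK_URL.match(url)
    if match is None:
        raise ValueError("not a recognisable Wayback archive link: " + url)
    return parse_wayback_timestamp(match.group(1)), match.group(2)


if __name__ == "__main__":
    # The stamp under discussion
    print(parse_wayback_timestamp("20070430115803"))
    # One of the archive links quoted above
    print(split_wayback_url(
        "http://web.archive.org/web/20071112175931/"
        "http://wa.archive.org/aroundtheworld/index.html"
    ))
```

The stamp only says what date and time the archive has recorded against a capture; it cannot by itself say whether that record is right.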
Of course the 2 billion page crawl took place more than a month after the 30th April, but maybe something went wrong with such a momentous crawl or with the consequent indexing/archiving process of it.
Just a little calculation: if 2 billion pages were crawled between 4th of June and 15th December 2007 (195 days = 16,848,000 seconds) that would mean 2,000,000,000 / 16848000 seconds = 118.7 pages per second.
So 3786 pages for ceop and 16033 pages for codexgeo all captured on 30.04.07 at 11:58:03 seems extraordinarily high.
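The little calculation above can be reproduced in a few lines of Python. This is a sketch only: it assumes, as the figures above do, that both the first and last day of the crawl are counted and that the crawl ran at a steady rate throughout, which is certainly a simplification. The variable and function names are illustrative.

```python
from datetime import date

CRAWL_START = date(2007, 6, 4)
CRAWL_END = date(2007, 12, 15)
TOTAL_PAGES = 2_000_000_000
SECONDS_PER_DAY = 86_400


def crawl_days(start, end):
    """Days in the crawl, counting both the first and the last day."""
    return (end - start).days + 1


def pages_per_second(total_pages, days):
    """Average capture rate, assuming a steady crawl over the whole period."""
    return total_pages / (days * SECONDS_PER_DAY)


days = crawl_days(CRAWL_START, CRAWL_END)
seconds = days * SECONDS_PER_DAY
rate = pages_per_second(TOTAL_PAGES, days)

print(days, seconds, round(rate, 1))
```

At that average rate, 3786 pages would take roughly half a minute to collect and 16033 pages a little over two minutes, so a single shared second for all of them is what stands out.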
I don’t know what crawls Alexa or Internet Archive did prior to the 4th of June, but probably nothing like the ‘2 billion page’ scale. I also don’t know if ceop.gov.uk and codexgeo.co.uk were part of the same crawl.
As always this case throws up more questions than answers.
I am thoroughly disappointed that Wayback who pride themselves on their OPEN source Heritrix crawler, are not so OPEN about this situation.
Kind regards,
Seahorse
Dr. Roberts
Are you implying that this is a 'tailored' outcome by any chance?
I'm not sure how but I think it COULD have been. Pages preserved at archive.is are all manually uploaded. It does not 'crawl' pages on its own. Syn or Nuala, can't remember which, kept trying to use archive.is to prove her muddled 'points' about WBM but I wasn't having it because 1. The pages are manually uploaded copies of WBM captures--archive.is codes and pages are not subject of the inquiry!! and 2. I have no idea and lack the expertise to figure out how the coding might be corrupted and/or manipulated by uploading an already crawled page to archive.is but corruption and manipulation both seem feasible.
Cheers
whodunnit
@ Dr Roberts
I just tried the google exercise in google.com and it's the 8th entry on page 1 for me.
In google.co.uk it's the 1st entry on page 2 for me.
Hope that helps.
Seahorse
Anon @18:24
Further on I can’t find any link between CodexGeo and Ceop.
It seems the original owners of Codexgeo let the domain registration lapse and some payday loan company has now taken it on. Most likely not of any relevance
1. I wouldn't expect a link between CodexGeo and Ceop and nobody has argued that there is.
2. Browsing the url at WBM we find assorted IT/software related sites parked at the codex dot co dot uk domain dating back to 1997, none of them having a thing to do with Scottish Architects, not even in 2007.
So to recap, we still don't know why someone manually uploaded the Dictionary of Scottish Architects to archive.is under the codex.co.uk url and we still don't know how at least one of them contains a date/time stamp it couldn't possibly have given the history of the domain.
Cheers
whodunnit
I see there are "spam" problems again. I shall check the spam folder frequently, but as you might appreciate I cannot tie myself to the PC.
A Google account, bogus or otherwise?
Himself, I see my missing comment has reappeared up there, along with multiple copies of Seahorse.
Cheers
whodunnit
@ Dr Roberts,
Sorry for all the duplicate posts. I thought it didn't want to upload.
Hope you can delete the duplicates.
Thanks,
Seahorse
Another little Blogger foible, it don't like multiple URLs. They go straight to the spam folder.
@ whodunit 18:24
A 9 November 2007 forum post (9th post) refers to http://www.codexgeo.co.uk/dsa/building_full.php?id=M014206
See here:
http://www.hiddenglasgow.com/forums/viewtopic.php?f=3&t=714&start=135
Is it not possible that the original owner let the domain registration lapse and the payday loan person registered it at a later date?
Seahorse
@Whodunnit/Seahorse
I am acknowledging - promise.
It looks like 'H's 'comments coach' driver's had a stroke at the wheel!
For myself, if the archive.is site is derivative I'd be tempted to cut it loose and focus on the WBM, since that's where the problems sit in the first instance.
Regards both
Martin R.
Is it not possible that the original owner let the domain registration lapse and the payday loan person registered it at a later date?
Seahorse
Clearly it is possible but it hardly seems likely that Scottish Architects would sit on a domain for such a short length of time as to be undetectable by actual web crawlers. In addition, I punched into the WBM the exact url provided by the poster to that forum in December '07 and got:
eh? Sorry, that doesn't look like an valid URL.
Captures of the url in 2007 only occur in January, which shows only a placeholder for the domain so it is possible the Dictionary existed at that url in April, but by November of 2008, the only capture that year, it was back to the same placeholder.
By April 17, 2009, the only capture of the year, we find this text: "hello welcome to codex a lot of new changes are happening and it's exciting times. we will be launching our new site shortly."
In the background is a pic of a computer monitor. The text at the top reads: Codex--Business IT, Internet, IP Telephony, Data Cabling
The only capture in 2010: Same.
2011: Jan--same. Sept--slightly different placeholder--"Delivering powerful user friendly technology", Nov: same as Sept
2012: All captures same as 2011
2013: Single capture, October 19, same text slightly different graphic.
So if the domain ever belonged to Scottish Architects, Dec. '07 poster to 'Hidden Glasgow Forum' notwithstanding, there is no evidence of it at WBM--the most reliable source on this particular subject I can think of.
Cheers
whodunnit
Dr. Roberts
For myself, if the archive.is site is derivative I'd be tempted to cut it loose and focus on the WBM, since that's where the problems sit in the first instance.
Oh absolutely. There is no point at all taking uploads and codes at archive.is into account in this discussion. codex.co.uk Scottish Architects Version is just a distraction but the fact that it exists at all on a manual upload site is interesting in itself.
Cheers
whodunnit
@Whodunnit 19.48/Seahorse 18.14 (and any number of occasions before that!)
I'm sorry if my way of thinking about these things offends those with a penchant for absolute reference to technical terminology, but might we have been overlooking something rather simple here?
To borrow from Seahorse:
“Of course the 2 billion page crawl took place more than a month after the 30th April, but maybe something went wrong with such a momentous crawl or with the consequent indexing/archiving process of it.”
(This is the retrospective date argument, which does not stand up to scrutiny at all. It could only have occurred if someone programmed it to happen. Who would do such a thing, and why?)*
“Just a little calculation: if 2 billion pages were crawled between 4th of June and 15th December 2007 (195 days = 16,848,000 seconds) that would mean 2,000,000,000 / 16848000 seconds = 118.7 pages per second.
“So 3786 pages for ceop and 16033 pages for codexgeo all captured on 30.04.07 at 11:58:03 seems extraordinarily high.
“I don’t know what crawls Alexa or Internet Archive did prior to the 4th of June, but probably nothing like the ‘2 billion page’ scale. I also don’t know if ceop.gov.uk and codexgeo.co.uk were part of the same crawl.”
3786 pages for CEOP then and 16033 pages for codexgeo, all captured on 30.04.07 at 11:58:03
And all captured prior to the ‘big crawl’, therefore neither a product of it (see * above), nor influenced by it. Spontaneous retrospection by a machine is as real as spontaneous human combustion. It simply doesn’t happen.
Supposing no dramatic technological leap therefore between 30.4 and the ‘big crawl’, let’s further suppose a page capture rate of 118.7 URLs per second for the earlier date. Better yet, let’s err on the side of caution and suppose that, to accompany the extraordinarily HIGH number of CEOP pages captured on 30.4, there was (by the then current standards) an extraordinarily SLOW rate of capture – one page/URL every 10 seconds, say.
3786 pages @ 10 secs each = 37860 seconds or 631 minutes or 10.5 hrs!
Whether these URLs were captured concurrently or consecutively, the first of them would have been correct come what may. One of them at least may be correctly attributed to 11.58.03 on 30 April, just as a broken clock tells the correct time twice every day.
Of course we do not know which one that might have been. But we do know that if, in extremis, it took 10.5 hours to gather them all, then the last one would have been entered at around 22.28, still on 30 April – never mind the fact that CEOP should not be expected to have had sight of any pictures or prepared any McCann related material for another three days!
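The same worst-case arithmetic, set out as a short Python sketch. The assumptions are loud ones: the 10 seconds per page is the deliberately pessimistic figure supposed above, the first capture is taken to fall at 11.58.03, the pages are taken one after another, and the names used are illustrative only.

```python
from datetime import datetime, timedelta

FIRST_CAPTURE = datetime(2007, 4, 30, 11, 58, 3)
PAGES = 3786
SECONDS_PER_PAGE = 10  # hypothetical, deliberately slow rate


def total_capture_seconds(pages, seconds_per_page):
    """Time needed to capture every page, one after another."""
    return pages * seconds_per_page


def last_capture_start(first, pages, seconds_per_page):
    """Moment the final page begins, if the first page starts at 'first'."""
    return first + timedelta(seconds=(pages - 1) * seconds_per_page)


total = total_capture_seconds(PAGES, SECONDS_PER_PAGE)
last = last_capture_start(FIRST_CAPTURE, PAGES, SECONDS_PER_PAGE)

print(total, total // 60, round(total / 3600, 1))
print(last, last.date() == FIRST_CAPTURE.date())
```

Even on these unfavourable assumptions the final capture still falls on 30 April, which is the only point the sum is meant to make.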
“Ah, but the codex site suffered a similar ‘glitch’ with even larger numbers etc., etc.”
So what? The above (entirely hypothetical) scenario acknowledges the near-impossible likelihood of that very same class of error affecting the CEOP domain, and in as extreme a fashion as is likely ever to arise. The numbers are simply not big enough for the ‘error’ to have made any meaningful difference to the issue - that of premature data collation on the part of CEOP.
According to the WBM's own FAQs:
“There is a 3-10 hour lag time between the time a site is crawled and when it appears in the Wayback Machine.”
“Please note that there is a 6 - 14 month lag time between the date a site is crawled and the date it appears in the Wayback Machine.”
I think an eight year wait is long enough don’t you?
Kind regards
Martin R.
@ whodunnit. 19:21
The domain did not belong to the Dictionary of Scottish architects, but to Codex Geodata (based in Stirling) who designed databases and offered archiving services etc.
First Wayback capture I can find of www.codexgeo.co.uk is on 23rd December 2004.
It seems they designed and stored the Dictionary of Scottish Architects under the DSA directory.
Later the Dictionary of Scottish Architects moved to its own domain.http://www.scottisharchitects.org.uk/ --> first Wayback capture of this in July 2006.
On their website under acknowledgements they say "We are also indebted to Codex Geodata for its support in the form of data design and Web database development".
Codex Geodata did the same for Scottish Church Heritage Research (SCHR) which was filed under the SCHR directory on codexgeo until they moved to their own domain at http://www.scottishchurchheritage.org.uk/ --> first capture of this site in February 2007
Last capture of Codex Geodata on codexgeo.co.uk was on 18th May 2013
Then first capture of Payday loans on codexgeo.co.uk is on 30th September 2013
Seahorse
@Seahorse--You are quite right. Will re-group and get back to you. Scratch off an afternoon for me!
Cheers
whodunnit
Dr Roberts @ 21.39
Nailed once again. We should not lose sight of the issue statement, which is the McCann html being found on 30/04; whether the other 3875 URLs are wrong doesn't matter. There are certainly some factors which point to only a certain number being true, and I'm still of the opinion that the jpegs stand out. All jpegs suffered no repetitions (there are 3 in total: the 2 Madeleine and 1 for Ceop awards).
Skyrocket1 has pored on comm another professors opinion, it doesn't help any as far as I can see but is worth a read (the thread is closed to guests now and they have lost some inputs)
Regards
https://github.com/internetarchive
The numbers are simply not big enough for the ‘error’ to have made any meaningful difference...
That has always been the case. And will always be the case, whichever type of "error" is postulated, whichever scenario is probed with jpegs etc. The numbers will always be on our side. Overwhelmingly so. Crushingly so.
But the license to hold to account upon that basis isn't ours. And the ability to obtain proof is beyond us. Tragically.
I could weep with frustration.
Hence. Just my opinion BTW. To those still trying. Huge respect
@HKP 00.22/Anonymous 08.05
Things are always a little clearer in the morning:
We now have two pages – two (assumed) errors, which must have occurred for the same reason (ref. Eddie and Keela). The probability of their not having done so would be inter-galactically small.
We may submit either page to definitive examination therefore (it is unnecessary to analyse both), since the pages are unquestionably independent of each other in every respect, save being indices of the same, solitary, (assumed) error.
Looking at the one of greater interest, the DATE of relevant record will have been unaffected by whatever the (assumed) error is supposed to have been (see foregoing, Seahorse-inspired, calculation) and remains to be explained, since it cannot be attributed to the (assumed) error in question.
“But we’ve said all along that THE DATE is the error”
1. Computers are never deliberately programmed to return a positive outcome for a null result. That is the essence of a conditional search (ask Google). Hence what is witnessed on date X is recorded on date X.
“We’ve said all along that THE DATE is the error”, since (and the following quotes ARE genuine)…
“What we DO know is that an October page ended up in the April folder”
Because…
“Something may have indexed into 20070430 115803 with a wrong date and time - trust me, computers do this because their stupid programmers don't think of everything for all circumstances”
1. Immediate (churlish) response: ‘May have’? – then you prove it did and we can reject the null hypothesis (the conventional baseline position for any genuine experiment).
2. Nor do database programmers set parameters which demand the random assignment of data in retrospect. The purpose of their overall objective would be completely compromised if that were the case.
Inadvertent non-random assignment, however, would be evidenced wherever the appropriate conditions were met – and that would imply its recurrence many, many times (twice only would reflect a near infinitely small probability), given an archive of 485 billion instances.
3. If 30 April 2007 capture is a false reading, then archive.org should shut up shop. Immediately.
They haven’t. So what might that tell us?
Case closed!
Kind regards all
Martin R.
@ all,
I didn't mean to muddy the waters.
As HKP says madeleine_01 and madeleine_02 do stand out.
So does mcCann.html.
Almost hidden amongst the numerous duplicates of the other pages.
There is a duplicate for madeleine_02.jpg though:
(S(beokrn453z22tm55hjfuox45))/madeleine_02.jpg
http://www.ceop.gov.uk/madeleine_02.jpg
I'm not technically minded enough to make sense of it all.
Best regards,
Seahorse
Martin,
But let's not diminish the achievement of what you have guided here. Screw "reputation", and screw people who are so bound to competitive theories that they are not prepared to see others make an attempt without damning them first.
Himself is too modest (oh yesssss he is!) He has said more than once that the only hope may be a "black door". We were presented with a door left ajar, and brave souls such as HKP, Seahorse and Mrs whodunnit (xx) have had the damned good sense to try and give it a push.
Inadvertent non-random assignment, however, would be evidenced wherever the appropriate conditions were met – and that would imply its recurrence many, many times (twice only would reflect a near infinitely small probability), given an archive of 485 billion instances.
ACROSS 8 YEARS OF NEAR CONTINUOUS OPERATION, SCRUTINY, CHECKING AND UPDATES.
(GitHub)
Thank you Dr Martin Roberts for highlighting how egos will never solve this sad and tragic affair!
@Seahorse 11:41
Muddy?! On the contrary my good man.
Had you not provided the data you did, I should never have entertained the possibility that 3786 URLs could be captured in a fraction over half-a-minute (based on your estimates). I simply defended your contention by extending the allocated time. You found the gem - all I did was polish it.
So now we have a situation in which the 'error' argument, whether general or specific, is simply untenable.
Furthermore, if we take into account Mr Chris Butler's last known admonishment ('Ask the police.' cf. 'Ask the dogs, Sandra.') we may but wonder why on earth the police should be interested in the 'housekeeping' of archive.org at all.
Making a genuine mistake, even a computational one, is not a Federal offence, is it?
Kind regards
Martin R.
(GitHub)
5 July 2015 at 11:53
Black door? Back door.
http://1.bp.blogspot.com/_w-8JKaTohe4/TQ4hRiqq36I/AAAAAAAAeaE/nWy3_UQmnFw/s1600/in_through_the_back_door.jpg
@ Dr Roberts, 12:13
Well thank you kind Sir. Though I'd prefer 'good woman' :-)
I am glad to have been of help. Fact finding is my thing but I am usually slow on the uptake of what those facts mean. So it's good to see that the brilliant minds of yourself, HKP and whodunnit can turn these facts into something far more comprehensible.
It's all becoming quite clear now!
Best regards,
Seahorse
@Seahorse 31.21
I do apologize. If you look back through comments past you will notice that it's not the first time I have been mistaken in that regard.
You are one of a valuable (and valued) trinity, together with Whodunnit and Resistor. One more and we'd have another 'Bletchley Circle'.
Many thanks for your contributions to the debate
Kind regards
Martin R.
@Anonymous (GitHub) 11:53
'Himself' has something of an Engineering background I believe. He will know without further explanation that the hands of a clock do not travel backwards unless coerced by those of a human.
He can 'use a spoon', you see. He ought also to patent his own 'bullshit detector' - as near infallible as the WBM!
btw: I recognise a number of analogies to this CEOP affair in the film, All The President's Men. Do watch it if ever you have the opportunity. (25 yrs. later and we witness the collapse of the twin towers under Republican president George Bush - Leopards? Spots?)
Kind regards
Martin R.
Himself,
Ha ha, yes! Back door!
Brain fried.
Very best wishes to you.
Himself 13:02
Good to know your finger is on the pulse
Respect
Martin R.
5 July 2015 at 14:06
'The arrow of time dictates that as each moment passes things change, and once these changes have happened, they are never undone.'
Scroll down for two short clips: the Arrow of Time and the Second Law of Thermodynamics.
The Arrow of Time
http://onlyinamericablogging.blogspot.com/2011/08/arrow-of-time.html
Kudos to all of you! Dr. Martin, HKP, Agnos, Seahorse, Resistor..you're all such an inspiration!
I told my sister the other day I wish I hadn't gotten involved with this case. That old familiar feeling of depression and hopelessness I get whenever I delve into these issues was beginning to overwhelm me. But here, thankfully, I am not alone in what I 'see'.
'If I could turn back time..' I would save all the little children who have been abused and 'disappeared'.
Cheers
whodunnit
@Whodunnit 17.23
You must include yourself when writing out the place settings.
I have in fact prepared a little something for Monday’s breakfast tray. Goudi would no doubt have designed it somewhat differently, but then he never had the pleasure of listening to James Taylor. (I had something of his in my mind this morning, but can’t quite remember how it goes now - South, at a guess).
Do let me know if you get it won’t you? Hopefully we may then continue our congress in this matter.
Kind regards
Martin R.
@Seahorse 11:41
You are of course correct: madeleine 02 appears twice, once in its own URL and once with the designation you state. Interestingly, madeleine 01 appears with this designation only. The sheer randomness can also be demonstrated through these designations, which repeat anything from a single occurrence to 123 times (based on the general search, not the specific one). You would have thought that some sort of trend would be visible, but apparently not. Mccann.html still stands to be disproven imo.
@Whodunnit
'Agnos' commented just three days ago:
"...the "issue", it transpires, was not the issue at all."
Sadly I am led to concur.
'Breakfast' has therefore been postponed until further notice.
Kind regards
Martin R.
Just to be clear:
Although there may have been no intent to mislead, it was logical to assume that there is something wrong when an "archive's" records are in contradiction of such widely held accounts of particular events. This caused me to come to erroneous conclusions having not appreciated that not only does the "archive" not (necessarily) represent actual previous web content; nor, in this case, did it represent the appropriate time and date of capture.
For this reason it is my understanding that a person will never sustain a burden of proof in a court of law merely on archive.org - but that it might demonstrate circumstantial evidence.
This implies no intent whatsoever of creating false images.
Agnos
Good morning!
I haven't had enough time these last 24 hours to keep up--what has happened?
Cheers
Whodunnit
@Agnos 14.01
It is neither my wish nor my intention to drag you back into this debate, which I have to say has left me somewhat exhausted. However, I am not entirely clear whether the following represents a personal disclaimer of yours or a proxy excuse that might be advanced by others (parentheses mine):
"Although there may have been no intent to mislead, it was logical to assume that there is something wrong when an "archive's" records are in contradiction of such widely held accounts of particular events.
(this is the 'archive' contradicting what we have previously been told scenario)
"This caused me to come to erroneous conclusions having not appreciated that not only does the "archive" not (necessarily) represent actual previous web content; nor, in this case, did it represent the appropriate time and date of capture."
(appearances would suggest that not only is the retrieval of visual and other elements 'fluid', so too are their schedules of ingress).
Ironically I have just bemoaned to 'Himself' how opinions elsewhere appear to be governed by 'what looks right' (or wrong), with no appeal whatsoever to the functional requirements of the WBM as determined by its overseers.
I am not a database programmer myself (no more than many other voices in this case I suspect). Hence I am inclined to pay attention to others far more expert in such matters when they debate, albeit academically, aspects of database management of particular interest, e.g. the time-stamping of records and the prohibition, for purposes of data integrity/security, of data transfer.
It would be pompous of me to go on at length as I am not equipped to do so. I shall simply append a few links I found to be of interest when attempting recently to come to terms with the possibility even of backdating entries:
http://serverfault.com/questions/119614/what-possible-events-could-cause-a-mysql-database-to-revert-to-a-previous-state
https://en.wikipedia.org/wiki/Trusted_timestamping
https://www.anf.es/pdf/Haber_Stornetta.pdf
Finally, wait for it (and yes that is the link!):
https://books.google.co.uk/books?id=2cGPtTQ_eJMC&pg=PA27&lpg=PA27&dq=Does+a+timestamp+system+ever+backdate+records&source=bl&ots=pkyluFmyoq&sig=AkEqVK8HTozAdGLybBuhL0y81IM&hl=en&sa=X&ei=B3aaVcvDFca4-QHf6bHAAg&ved=0CEYQ6AEwBw#v=onepage&q=Does%20a%20timestamp%20system%20ever%20backdate%20records&f=false
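For readers who want the gist of the trusted-timestamping links above, here is a minimal Python sketch of the linking idea: each record's hash is computed over the previous record's hash, so altering the date on an earlier entry changes every hash that follows. It is a simplification for illustration only. The function names, the record layout and the timestamps are all hypothetical, and nothing here is claimed to describe how archive.org stores its data.

```python
import hashlib

def link_hash(prev_hash, timestamp, payload):
    """Hash one record together with the hash of the record before it."""
    data = "{}|{}|{}".format(prev_hash, timestamp, payload).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def build_chain(records):
    """Return the list of linked hashes for (timestamp, payload) records."""
    chain = []
    prev = "0" * 64  # fixed starting value for the first link
    for timestamp, payload in records:
        prev = link_hash(prev, timestamp, payload)
        chain.append(prev)
    return chain

# Hypothetical records, invented purely for this example.
records = [("20150101000000", "capture A"), ("20150102000000", "capture B")]
original = build_chain(records)

# Backdate the first record and rebuild: every later hash changes too.
backdated = build_chain([("20141201000000", "capture A"), records[1]])
print(original[1] == backdated[1])
```

In a scheme of this kind a backdated entry cannot be slipped in quietly, because anyone holding the later hashes would see the mismatch.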
Kind regards
Martin R.
@whodunnit
Hi, no mystery from me! I just thought it timely to clear the decks. We might yet meet for breakfast though!!
All the best,
Agnos!
Hi Martin,
You have read my mind! Don't worry about dragging me back. If there is any issue at all that you feel I might be able to help with, then be assured I am reading. The above disclaimer protects both sides I think. Should circumstantial evidence ever be called for...
I very nearly posted earlier today on this very issue of "timers". I'll regroup and post a little later. It is a truly fascinating topic. Kids today have it easy as you will see!
Kind regards
Agnos
@Whodunnit 16.23
"What has happened?"
A: Not a lot. I set you an (inconsequential) puzzle earlier. Now Agnos has set me one, although he perhaps did not intend to do so.
I'm open to correction always, but until anyone can explain why certain of its programming should invoke a file that didn't exist, or other aspects attempt to assign data to a previously non-existent location (e.g. date and time), then reliability of the WBM remains the safer bet in my view.
That said, for reasons I would rather not elaborate, I suspect a full explanation of their 'error' will not be forthcoming from IA (if that's the correct abbreviation/acronym).
Kind regards
Martin R.
My "erroneous conclusions" were to have considered this to be an "archive" as I understand a proper archive to be. This appears to be archive.org's estimation too, since data can migrate in time. But we shall see! No greater mystery than that. Sorry if I confused things!
Agnos
@Agnos 16.33
Tiffany's perhaps?
Whilst I could manage the air fare I'd first have to renew my passport!
Truth being stranger than fiction, the possibility of our being unwitting spectators to something along the lines of 'The Pelican Brief' may at least be entertained. It only requires the substitution of protected land for protected sea-birds.
I'm strangely reminded of that long-standing group, Freeport Convention. Didn't they record 'Tell Me Lies?' (No, sorry, that was Fleetwood Mac). Silly me.
Kind regards
Martin R.
Dr. Roberts
"A: Not a lot. I set you an (inconsequential) puzzle earlier. Now Agnos has set me one, although he perhaps did not intend to do so."
And I have failed to comprehend! My only excuse is my time is severely divided into manageable segments dictated by the health of others. So while it is indeed a hot, sunny day where I am, I sit at my computer, cool as a cucumber, only while I may.
"I'm open to correction always, but until anyone can explain why certain of its programming should invoke a file that didn't exist, or other aspects attempt to assign data to a previously non-existent location (e.g. date and time), then reliability of the WBM remains the safer bet in my view."
And mine as well.
Cheers
whodunnit
First another disclaimer. It is many years since my life was dominated by all things binary, and even then it was very much the theoretical ground that grabbed me (still to my bemusement). I still write code, but purely for amusement now. I use Python and some C. And like yourself, I am a little exhausted by this! And frustrated beyond belief.
Perhaps if I can contribute anything, it might be to simplify (?), though always risking that what I say might either be banal, or totally lacking point. The programmers at archive.org are obviously top drawer. The link I included yesterday (GitHub) is the repository of their open-source code for the entire archive.org operation. It is immense and it is best practice.
Here, for example, is an internal link to the repository that documents their server:
https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#advanced-usage
Database management is a rarefied field that would require inside knowledge. Movements and personnel access will be logged with an audit trail that only those with license to ask the pertinent questions can demand. Hence, the same old, same old frustration: why do they sit on their hands? As you say, it would be impossible to be anything other than pompous in trying to speculate about this. We don't know what gravity archive.org even place upon their web data. Do they consider it to be an archive as an academic might demand, or is it a "digital playground" (a suggestion I read on one digital forum!)? Database systems for many large institutions (I believe) are very much "black boxed" affairs. They might be tailored for specific applications, with "in house" programmers having to learn how to address the "black box", whilst nevertheless remaining independent of it. Simple answer: we cannot know the relationship between timestamps and possible error.
The code that we see at GitHub is not database management per se. It is code addressing a database for the purpose of crawl, storage and retrieval, with a tailored system of indexing. That would be my understanding.
What we can know is what we have always known. What are the chances of so few data being dragged into the error? It is not just a case of migrating content. Time and date are the basis of all WBM indexing and URL retrieval. How could it not imply the possibility of the entire system migrating? We come back to the same arguments.
Part 2 ahead:
Warning: Banality!
But what is a "timer error"? To me, it says little more than a "glitch". There is too much labyrinthine speculation perhaps. "Time" is easy today! Is there any suggestion that it might have been the WBM clock that missed a beat, or that it glitched during the primary "capture"/crawl? Was the clock set wrong? Was its data corrupted in transfer?
Here is the most banal piece of Python code imaginable (Python and Java are both object-oriented languages):
from time import localtime  # the standard-library "time" module
x = localtime()  # the current local date and time as a struct_time
print(x)
No surprises!! I get the Year, Month, Day, Hour, Minute, Second....at the very moment I hit enter! time is a module that is native to the distribution of Python and common to ALL computers running its code. Java has its own variation. Here, for example, is the documentation for Python's handling of "time events":
https://docs.python.org/2/library/time.html
The WBM and its database will be handling some very complex tasks of timing: complex, yes; but "novel", no. As soon as I import the "time" module into some code (line 1 above) I have at my disposal a set of "black boxed" tools that can measure everything from processor speeds to the passage of millennia! There is nothing at all esoteric about it! It is industry standard and will return values of accuracy that are limited only by the power of the computer hosting the code. "Timers" do not go wrong. "Time" can be reset or overridden but this is exceptional and systemic. It would require intervention, and an archive would surely want to avoid such interventions!
This is just good old Python. Other systems and language distributions will have their own sets of tools. It is very complex, perhaps. But it is not esoteric.
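By way of illustration, here is a slightly fuller sketch of the same standard "time" module, assuming Python 3. It reads the wall clock, breaks it into calendar fields in UTC, and uses a separate high-resolution counter to time a short piece of work. The helper name utc_fields is invented for this example, and none of this is taken from archive.org's own code.

```python
import time

def utc_fields(epoch_seconds):
    """Break seconds-since-epoch into (year, month, day, hour, minute, second) in UTC."""
    t = time.gmtime(epoch_seconds)
    return (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)

now = time.time()             # wall-clock seconds since 1 January 1970 UTC
print(utc_fields(now))        # calendar fields for this instant, in UTC

start = time.perf_counter()   # high-resolution counter for short intervals
sum(range(100000))            # some arbitrary work to time
elapsed = time.perf_counter() - start
print("elapsed seconds:", elapsed)
```

The wall clock read by time.time() follows the system clock, which an operator can reset, whereas time.perf_counter() is only meaningful for measuring intervals. That distinction is the one drawn above between a timer and "time" being overridden.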
If we are led to believe that having returned a date/time value, this value might have subsequently become corrupted, then I ask the same question: why in such rare instances? Every capture and every retrieval passes through the same code! The same margins of probable "error" must apply throughout.
I don't think I've helped! All that I can come back to is the very unsatisfactory "explanation" which appears to me to defy probability. However hard we try to constellate some kind of meaningful diagnosis of this "error", we remain on the outside, looking in.
Sorry I'm not able to help more. We probably all need some rest!!
Kind regards
Agnos
The above is part 2! Trying again!
A
Anonymous
7 July 2015 at 20:55
Send it the once only, for the most part I now have the spam folder permanently open, and shall do my best to rectify in the shortest possible time. Thanks.
Himself,
Ok thanks!
@whodunnit,
The legalese was just to be mindful. Of course we still know the odds.
For now...until Tiffanys!
Agnos
@Agnos 20.53/21.28
I've been away for a while, but just to say I got it the first time - as well as the second (with examples). In the real world litigants are allowed, through their legal representatives of course, to call 'expert witnesses' to examine such issues. I wonder what category of genuine 'expert' would dare disagree with you? An unemployed one perhaps.
Many thanks for the encapsulation.
The silence from 'Frisco is deafening. Have they heard of/from the Carlyle Group I wonder? (Just musing).
Kind regards
Martin R.
@Whodunnit 17.23
"And I have failed to comprehend! My only excuse is my time is severely divided into manageable segments dictated by the health of others. So while it is indeed a hot, sunny day where I am, I sit at my computer, cool as a cucumber, only while I may."
Don't be concerned in the slightest, except of course for the health of others. Mine was just a cryptic declaration of intent which, in the light of later realisations I have decided to modify for the time being.
I need to gauge the wind a little more precisely, that's all.
Kind regards
Martin R.
Dr. Roberts
Now I am very sorry if I somehow caused the modification of your intent, if for no other reason than the burning pain of my curiosity might overwhelm me. No matter, I'm certain your good sense and wisdom will govern whatever you do whenever you do it.
@Agnos---sometimes we have no choice but to be mindful of legalese.
Cheers
Whodunit
Curiouser and curiouser! Indeed. Lucky I am a mouse.
Glad to be singular.
“So much oppression, Can’t keep track of it no more.” R. Z.
Respect
Martin,
To quote you from above: Things are always a little clearer in the mornings.
We mustn't exhaust ourselves with this. @whodunnit has put a lot into perspective. I think that between us, we each know where our "flags are placed".
1. Archive.org are not under any obligation to provide any further explanation. Not to us.
2. It may well be that I/we are simply unable to grasp the nuance of the proposed "errors". But it appears to me that we are left with the old chestnut of a card with PTO on both sides.
The computer appears wrong / There must be a "glitch".
It is the date that is wrong/There is an error with the time!
3. The fora debates will, I believe, only exhaust us further. They are not going to prove anything. I, for one, am going to consign them to history on this matter. The very fact that people are still struggling so contentiously with this is testament only to how little archive.org have really said!
4. Is this a dead end?
5. If I read your nuance correctly, then perhaps not. But there is no hurry on that score. Whilst some people might still want to bury themselves in the technicalities, there might also be some merit in simply collating a little "portfolio" of concerns. If I read you correctly. It might lead nowhere, but who yet knows? (At least not quite yet)
Anyway, I will certainly keep reading, as per above.
Regards again, and as always to whodunnit!
Agnos
@Whodunnit 00.56
"Now I am very sorry if I somehow caused the modification of your intent"
Relax! You did no such thing. It was a self-inflicted postponement. I was a button press away from bringing our paradox to the attention of a significant other before something prompted me to check my stride - Nothing at all to do with the details of our discussion to date.
I'm with Agnos in 'holding' at this stage. We've enough in hand I believe.
Kind regards
Martin R.
@Agnos 9.53
Sage as ever. A 'time out' is called for and no mistake.
I should like just to watch the smoke drift for a time and observe where it gathers.
As 'Himself' has noted, and not infrequently, it's not so much the questions (nor the answers, even though they might point in unexpected directions). It's the overarching 'Why?'
Knit that together and the elements will become clear in an instant.
Kind regards
Martin R.
@Anonymous 18:13
There are a good few captures for other sites on 30/04/07: within 3 minutes either side for Amazon.com, and within minutes and hours before and after for Yahoo, CNN, Apple, YouTube etc., proving that Wayback was working fine before and after the critical timestamp.
@Anonymous 18.13/HKP 19.32
Are we to suppose that this fandango will mark the end of the Wayback Machine's 'archive' being called upon for evidential support in various legal disputes from this point forward?
Personally, I suspect their reliability will progress unchallenged, thereby emphasising over time just how improbable was the once-in-a-lifetime 'error' occasioned by interaction with the CEOP web-site on 30 April 2007.
A couple of ostensibly unrelated remarks appear germane:
Kate McCann's "Police don't want a murder in Portugal" uttered early in September 2007 (when she failed to specify which police she had in mind).
And the more recent comment of John Tully's, when speaking on behalf of the Police Federation:
"...it's not surprising there is resentment of significant resources diverted to a case that has no apparent connection to London."
Appearances are so often deceptive, aren't they?
Draw enough tangents and the circle will eventually reveal itself.
Kind regards both
Martin R.
HKP 14 July 2015 at 19:32
Greetings
I take it that in the absence of indications to the contrary we are entitled to rely on a (crucial) proposition that both the auditable Wayback Machine timestamps and history bar tell us within explicit error margins the time of every capture. Therefore it is my view that the question as to the state of the Machine at the time of capture does not arise since every capture = ‘online’. I hope to be corrected if I am wrong.
Have you by any chance been using any of the tools at https://archive.org/help/wayback_api.php in your investigations? I will of course understand if you wish to keep your ‘toolbox’ private.
The following may be relevant to our little exchange.
With regard to the alleged datestamp ‘error’… It is my recollection that archive.org, having initially said little, suggested in their last response to a concerned user that he should ask the “police” and they (archive.org) have been silent since.
The “police” reference sounds a bit confrontational, doesn’t it? Discourteous? Perhaps, but not something to discourage the curious: general considerations of what is known often give rise to new discoveries and proofs. Let’s bear in mind that archive.org may be feeling rather uncomfortable with this CEOP ‘glitch’.
It so happens that the two forgoing paragraphs implicitly refer in part to a very interesting recent exchange on this thread between Agnos and Martin R. in which Agnos mentions a comment some software guys made to him.
Most grateful for your post.
Good wishes
If one thinks one has something to contribute, contribute one must (courteously)!
Having posted the above, could not help but notice on the screen that with minor manipulation 16 July 2015 at 16:45 contracts to 1635 1645.
Respect and good wishes to Agnos and Martin.
Martin’s 14 July 2015 at 20:05 post acknowledged and I intend to comment shortly.
Thank you, archive.org. Now we all can sit quietly, doing nothing.
“Sitting quietly, doing nothing,
Spring comes, and the grass grows by itself.”
Matsuo Basho
Dr. Roberts
"Personally, I suspect their reliability will progress unchallenged, thereby emphasising over time just how improbable was the once-in-a-lifetime 'error' occasioned by interaction with the CEOP web-site on 30 April 2007."
Of course it will be ignored, thus rendering the entire episode non-existent. This is bolstered by the fact that not only have all external indicators/links to the page capture been removed, but the internal coding on the remaining pages has been manually manipulated to erase all traces of April 30, 2007.
"Appearances are so often deceptive, aren't they?"
Depressingly often, I'm afraid. My 'default setting' if you will is to distrust any pronouncements that are amplified in the media or via official channels unless or until they can be verified.
Cheers
whodunnit
@Anonymous 17.31
"Thank you, archive.org. Now we all can sit quietly, doing nothing."
My wife accused me of that only last week. I thought I was ahead of the game!
A 'quicky' to HKP:
In the days of yore, when execution of a computer program was a largely linear affair, a common 'bug' was that of an unintentional/unresolved 'loop', i.e. an unwarranted/ceaseless iteration.
I wonder if such a procedural flaw could possibly be behind those repeated URL archivals you've been looking at?
Just a thought.
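Purely to illustrate the kind of 'loop' meant here, this is a minimal Python sketch of a crawler that follows links between two pages pointing at each other. Without a record of pages already visited it captures the same URLs over and over until a step limit stops it; with such a record it captures each page once. The function, the page names and the step limit are all hypothetical, and this makes no claim about how the Wayback Machine's crawler actually works.

```python
def crawl(start, links, track_visited, max_steps=10):
    """Follow links from start; return the URLs 'captured', in order."""
    captured = []
    seen = set()
    queue = [start]
    while queue and len(captured) < max_steps:
        url = queue.pop(0)
        if track_visited and url in seen:
            continue  # already captured, so skip it
        seen.add(url)
        captured.append(url)
        queue.extend(links.get(url, []))  # queue every page this one links to
    return captured

# Two hypothetical pages that link to each other.
links = {"/home": ["/news"], "/news": ["/home"]}

looping = crawl("/home", links, track_visited=False)
guarded = crawl("/home", links, track_visited=True)
print(len(looping), len(guarded))
```

A flaw of this sort would repeat every page in the cycle, not just some of them, which is one way to test the idea against the counts HKP has reported.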
Kind Regards
Martin R.
@ Anonymous 16:45
http://web.archive.org/cdx/search/cdx?url=archive.org&from=2010&to=2011
This is the example from the Wayback CDX server API documentation, which is linked further down the page you specified, so no mystery data. Of course all 30/04 data for CEOP has been whooshed by now.
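For anyone wishing to repeat such a query, here is a minimal Python sketch that rebuilds the example URL above from its parts and reads the capture time out of one line of results. It assumes the default space-separated CDX output, in which the second field is the 14-digit capture timestamp (YYYYMMDDhhmmss). The sample line, including its digest and length, is invented for illustration and is not real archive data. No network request is made.

```python
from datetime import datetime
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def build_cdx_query(url, start, end):
    """Build a CDX query URL for captures of url between start and end."""
    params = {"url": url, "from": start, "to": end}
    return CDX_ENDPOINT + "?" + urlencode(params)

def parse_capture_time(cdx_line):
    """Read the 14-digit timestamp (second field) from one CDX result line."""
    fields = cdx_line.split(" ")
    return datetime.strptime(fields[1], "%Y%m%d%H%M%S")

query = build_cdx_query("archive.org", "2010", "2011")

# Hypothetical result line; the digest and length are placeholders.
sample = "org,archive)/ 20100101000000 http://archive.org/ text/html 200 EXAMPLEDIGEST 1234"
captured = parse_capture_time(sample)

print(query)
print(captured.isoformat())
```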
@ Martin R 19:46
That was one of my (many) thoughts: that somehow it repeated itself when finding a URL. But it's not as simple as that. The designations within the URL were of a 'set', some of which contained 124 different URLs, i.e. training, vacancies, news articles etc. There again, mccann.html only featured once! We're not going to get to the bottom of this without IA and it's clear this incident has been confined to the bowels of history, never to be mentioned again.
@HKP 16.7.15, 20.04
Thank you for those observations, and for all your diligent analyses.
Regards
Martin R.
HKP 16 July 2015 at 20:04
Most grateful for your post.
I take it that “mystery data” in your post refers to “your toolbox” in mine.
“We're not going to get to the bottom of this without IA and it's clear this incident has been confined to the bowels of history, never to be mentioned again.”
Let’s hope time will tell.
I can see you’ve done a lot of good work.
Good wishes
Hi, HKP.
I have just read the 79 (67+12) pages of posts at http://maddiemccannmystery.forumotion.co.uk/ ‘Re: CEOP show Maddie is missing on 30th April 2007’ to familiarise myself with the discussion in which you and ‘Resistor’ participated. The penultimate post seems to be yours (Hongkong Phooey on Sun Jul 12, 2015 9:43 pm). Did the discussion stop there or continue elsewhere?
Now I see even better that you’ve done a lot of good work, and I salute you.
Many thanks and good wishes.
Peace to the Little One