The thought struck me about a year ago – and then I promptly filed it in my mental recycle bin – but I have resurrected some thoughts about the next killer app: a tool that can do for photos what tools such as Word, Perl & Python do for text strings.
Think about it – Google has a nice image search, but it is incredibly weak. All picture searches are really text searches: they look at the image name, a description (where available) or the page context where the image is embedded.
Image search has not progressed beyond associated text searches (and, therefore, storage and/or organization of same is done by associated text means).
What I’m envisioning – and no, I’ve no idea of how to do this – is a tool that can catalog/identify a picture by its contents: it will be able to determine that a photo/graphic is an image of a vase, even if the image is on a page devoted to auto parts and is named doggie.jpg.
The implications for such a tool are enormous, not the least of which are the following:
- Image searches can be supplemented by textual tools – as today – but the primary search should be of the image itself. So a search for “vase” will find the doggie.jpg image.
- This means that images can be cataloged in some manner, and stored with this image-specific information. With this type of information – much like text information in a database – an image can, for example, reside anywhere on a user’s hard drive but be part of a “Family Pictures” album, which contains pictures from all over the hard drive. The organization will be handled by the image metadata (potentially databased) and not by the system file structure (i.e., no /my_pictures/family_pictures).
- This metadata will allow many-to-many relationships between images and grouping hierarchies. For example, there is no need to have a second copy of “mydog.jpg” for it to be included in two galleries (mypets and family_pictures, for example). While this can be handled with a database and some logic today – by assigning this image to two different buckets – the metadata will create the associations seamlessly.
- As the software improves, it can be made to learn, much like a Bayesian spam filter. It could learn that I don’t want non-jpg images in certain galleries and so on.
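To make the album idea a bit more concrete, here is a minimal sketch in Python using the standard sqlite3 module. It only covers the cataloging half: the content labels (“vase”, “dog”) are typed in by hand, because recognizing them automatically is exactly the part I don’t know how to do. The table names, function names and file paths are all made up for illustration.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog. The schema is hypothetical, not a standard."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
        CREATE TABLE labels (image_id INTEGER REFERENCES images(id), label TEXT);
        CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        -- Link table: one image can sit in many albums, and vice versa.
        CREATE TABLE album_images (
            album_id INTEGER REFERENCES albums(id),
            image_id INTEGER REFERENCES images(id),
            PRIMARY KEY (album_id, image_id)
        );
    """)
    return db


def add_image(db, path, labels):
    """Register an image wherever it lives on disk, with its content labels.

    Assumption: the labels are supplied by a person (or by some future
    recognizer); nothing here looks at the pixels.
    """
    image_id = db.execute(
        "INSERT INTO images (path) VALUES (?)", (path,)
    ).lastrowid
    db.executemany(
        "INSERT INTO labels (image_id, label) VALUES (?, ?)",
        [(image_id, label) for label in labels],
    )
    return image_id


def add_to_album(db, album_name, image_id):
    """Put an image in an album without copying or moving the file."""
    db.execute("INSERT OR IGNORE INTO albums (name) VALUES (?)", (album_name,))
    album_id = db.execute(
        "SELECT id FROM albums WHERE name = ?", (album_name,)
    ).fetchone()[0]
    db.execute(
        "INSERT OR IGNORE INTO album_images (album_id, image_id) VALUES (?, ?)",
        (album_id, image_id),
    )


def search_by_content(db, label):
    """Find images by what they show, ignoring the file name entirely."""
    rows = db.execute(
        "SELECT i.path FROM images i JOIN labels l ON l.image_id = i.id "
        "WHERE l.label = ? ORDER BY i.path",
        (label,),
    )
    return [row[0] for row in rows]


def albums_for(db, path):
    """List every album a given image belongs to."""
    rows = db.execute(
        "SELECT a.name FROM albums a "
        "JOIN album_images ai ON ai.album_id = a.id "
        "JOIN images i ON i.id = ai.image_id "
        "WHERE i.path = ? ORDER BY a.name",
        (path,),
    )
    return [row[0] for row in rows]
```

With this in place, registering a hypothetical /downloads/auto_parts/doggie.jpg with the label “vase” means a search for “vase” finds it despite the name, and a single mydog.jpg can be added to both mypets and family_pictures with no second copy on disk.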
OK, even beyond the tough-nut-to-crack of how to figure out what an image is by the image itself there are other obstacles.
- Text is text – the only weirdness is different alphabets. With Unicode, that adds some overhead, but it is doable. But there is no single Unicode for images: a JPEG is a vastly different beast than a vector graphic. And what of a 2-D image vs. a 3-D one?
- If you treat an animated GIF or Flash graphic as a single graphic (do you? good question), we’ve added another dimension to the image: time. Again, how to capture and represent that?
- Following from above, MPEG, MOV and so on are essentially images with a time element. Will these be included in the tools to analyze graphics?
- How is a picture analyzed to get its data and metadata? Is the image’s text or binary code analyzed, or is the image somehow scanned to get its properties?
- How does this handle ASCII art? While a vestige of the text-only Web/Internet, it’s art that is actually text. And it makes no sense as text; how about as a text-based graphic?
- Will there emerge some standard representational container for images? I.e., an image is cataloged by color depth, type, image subject, dimensions (2-4) and so on? Probably, but this will probably only emerge after a couple of methods of gathering and storing the data have been in use for some time, and will lead to some sort of standards war, like the one over DVD read/write formats.
- What will the atomicity of the data be? For example, a picture of a football player. Will the data be able to see just the football player (potentially the location), or will it also recognize and store that this image has two arms, two legs, a football, and the jersey number “6”? These would be powerful filters, but tough to do, I assume.
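As a rough illustration of what such a container and its atomicity might look like, here is a small Python sketch. None of this is an existing standard: the field names, the `Part` and `ImageRecord` classes and the football example values are all my own assumptions, and the records are filled in by hand rather than extracted from a real image.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """One recognizable thing inside an image (hypothetical structure)."""
    name: str                   # e.g. "arm", "football", "jersey number"
    count: int = 1              # how many of them appear
    text: Optional[str] = None  # any text the part carries, e.g. "6"


@dataclass
class ImageRecord:
    """A guess at a catalog entry; every field name here is an assumption."""
    path: str
    media_type: str   # e.g. "jpeg", "svg", "gif"
    color_depth: int  # bits per pixel
    dimensions: int   # 2 to 4, counting time as a dimension
    subject: str      # the coarse answer: what is this a picture of?
    parts: List[Part] = field(default_factory=list)  # the fine-grained answer

    def __post_init__(self):
        if not 2 <= self.dimensions <= 4:
            raise ValueError("dimensions must be between 2 and 4")


def has_part(record, name, text=None):
    """True if the record contains a part with this name (and text, if given)."""
    return any(
        part.name == name and (text is None or part.text == text)
        for part in record.parts
    )


def filter_records(records, subject=None, part=None, part_text=None):
    """Filter by coarse subject, by a fine-grained part, or by both."""
    matches = []
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if part is not None and not has_part(record, part, part_text):
            continue
        matches.append(record)
    return matches
```

A coarse catalog would stop at `subject="football player"`; a fine-grained one would also fill in `parts` with two arms, two legs, a football and a jersey number with the text “6”, which is what would make a filter like “jersey number 6” possible.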
I know little about processing of images – even the basics – so I could be barking up a dangerous tree, or I could be wishing for something that already, in some form, exists.
I know steganography is a well-studied field, and I would expect a lot of what I’m envisioning to be related to work in this field. However – again – I know little about it, so I may be preaching to the choir.
I guess that I’m just getting better and better at search/text manipulation and all that, and I’m getting to better understand what is possible and what (currently) is not. This all just lights up my mental bulb, saying that the next avenue to work on in a similar manner is non-text data: graphics.
Just wait ’til it comes. It’ll rock your world.
And you read it here first….