A good way to solve the software monopoly problem

stormbind said:
You would start by downloading the contents of DMOZ.org, which is publicly available (whole, or in part) as an XML file. This is what Google and most other search engines do.

What is done after that varies from one search engine to another. There are some XML-based RDBMSs available, but you would probably parse it into some proprietary format.

You could write a program to crawl the sites referenced, supplementing the content, but if your net connection were fast enough that could be done in real time during searches to get genuinely current results (none of that two-weeks-out-of-date nonsense that Google returns ;)).

Current computers do not have the spare capacity to make light work of this (mine certainly doesn't), but at the rate HDDs and CPUs are improving, I doubt it will be long until they do.


DMOZ only has 5,112,742 entries, while Google has 8,168,684,336 and Yahoo has 20 billion (or so they say)
 
MarineCorps said:
DMOZ only has 5,112,742 entries, while Google has 8,168,684,336 and Yahoo has 20 billion (or so they say)
stormbind said:
You could write a program to crawl the sites referenced, supplementing the content
The above is what Google does.

Yahoo is slightly different because it owns its own directory, similar to DMOZ.
 
Aphex_Twin said:
@stormbind
Storage is not the only problem. If everyone crawled the net to build their own Google, the Internet would grind to a standstill. A better idea would be hashed, distributed indexes spread over different servers (how Kazaa/eMule works). This eliminates the problem of having one central place to hold data and is potentially more robust. Also, since there's no central space and no centralized authority running the show, there would be no legally binding way to force information on or off. I think the know-how and the capabilities are already available for GNU; all it takes is some initiative...
That is a good idea :goodjob:

There will always be the robots.txt file for forcing data on or off, as you put it.
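The hashed, distributed index idea above boils down to this: every peer agrees on a hash function, and the hash of a URL decides which peer is responsible for that URL's index entry, so no central server is needed. A toy sketch (the peer list is made up, and real networks like Kad use far smarter routing than a simple modulo):

```python
import hashlib

PEERS = ["peer-a", "peer-b", "peer-c", "peer-d"]  # hypothetical index servers

def peer_for(url: str) -> str:
    """Map a URL to the peer responsible for its index entry."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return PEERS[int(digest, 16) % len(PEERS)]
```

Anyone can compute where an entry lives without asking a central authority, which is exactly what removes the single point of legal pressure.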
 
MarineCorps said:
Still, that's quite a leap.
Well, it is not technically difficult to do; it just requires hefty resources and a measurable incentive.

Besides, DMOZ counts the number of web sites in its index, while Google counts the number of web pages in its index. On average, how many pages does a web site have?
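Taking the figures quoted earlier in the thread at face value, the answer works out to roughly 1,600 pages per DMOZ-listed site, so the two counts are closer than they first look:

```python
google_pages = 8_168_684_336  # pages, figure quoted earlier in the thread
dmoz_sites = 5_112_742        # sites, figure quoted earlier in the thread

pages_per_site = google_pages / dmoz_sites
print(round(pages_per_site))  # roughly 1,600 pages per site
```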
 
stormbind said:
Outlaw software patents. There are plenty of alternatives for the PC, for those who want them. Examples include:


Hurd

and many, many more...

With add-ons, most can run common Microsoft and Macintosh applications with varying degrees of compatibility.


Though I have a LiveCD of the Hurd, I don't expect it to come out before Duke Nukem Forever. ;)
 
aneeshm said:
Though I have a LiveCD of the Hurd, I don't expect it to come out before Duke Nukem Forever. ;)

LOL. I stopped waiting for Duke Nukem Forever when I found out that it apparently uses a Steam-type compulsory product activation scheme ... Wikipedia says it is Steam-like, so I'm assuming activation is going to be required.
 
Don't they call it "Duke Nukem NeverEver"? ;)

The Hurd is an interesting concept (what OOP is to programming, the Hurd promises to be for operating systems), though it is attracting too little attention right now.
 
aneeshm said:
Though I have a LiveCD of the Hurd, I don't expect it to come out before Duke Nukem Forever. ;)
Still, it exists for those who want to use it. AFAIK it uses X11, so I'll scream and hide under a rock if it's ever a success.
 
stormbind said:
You would start by downloading the contents of DMOZ.org, which is publicly available (whole, or in part) as an XML file. This is what Google and most other search engines do.

What is done after that varies from one search engine to another. There are some XML-based RDBMSs available, but you would probably parse it into some proprietary format.

You could write a program to crawl the sites referenced, supplementing the content, but if your net connection were fast enough that could be done in real time during searches to get genuinely current results (none of that two-weeks-out-of-date nonsense that Google returns ;)).
Firstly, someone still has to write the software that searches a database; Google could switch to providing that software (similar to how they do their desktop search now).

Secondly, as has been pointed out, DMOZ's index is nothing compared to what Google has, and Google is hardly going to give its database away for free. It's possible that people could build up an open-source database over time, but I don't see it happening for a while, and it would need money to keep it reasonably up to date. It's also going to be a long while before connections are fast enough for the average user to search the entire web in real time.

Having said that, I'm sure Google are well aware that they need to keep moving to stay in business. In the future, there'll be vastly more information on the Internet, such as videos, which will be a lot harder to categorise and search, both in terms of writing the software, and the processing power required. Imagine an image search that actually worked on recognising what was in the images, rather than the text in the websites they are on.
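On the first point: the core of that search software is an inverted index, a map from each word to the documents containing it, so a query never has to scan the raw pages. A toy sketch of the idea (real engines add ranking, stemming, and compression on top):

```python
from collections import defaultdict

def build_index(docs: dict) -> dict:
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index: dict, query: str) -> set:
    """Return ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

The expensive part is not the lookup but building and refreshing the index, which is why the "real-time crawl during searches" idea doesn't scale.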
 
EzInKy said:
There is an easier way to solve the problem, don't use their products. Civ3 can be played on Linux, Mozilla is an awesome browser, MPlayer is a decent multimedia app, and OpenOffice can handle most Word files.

This is true, but Microjunk doesn't want you to know it. There is little publicity for the whole open-source movement, and that's why Microsoft's products are still the standard. Most people don't see any alternative to them.

The major problem for open source and GNU is that only a small "elite" knows about them. If everyone knew there is a free product on the web that does everything Word does and is more stable, nobody would be using Word. Microsoft is trying to hide the open-source community from the rest of the world.
 