A few days ago I decided to create a new dictionary application. I searched the internet for how to implement database files and search algorithms, but I found nothing.

If you have any idea about how dictionaries are created and how their database files are structured, please help!

Note: I don't want to use a DBMS (database management system) like Microsoft SQL Server or MySQL; I just want to use text files or a custom file format.
Posted
Updated 25-Sep-11 9:19am

Your question is far too vague; what do you mean by "a new dictionary application"? What type of dictionary, and what functions do you expect it to provide?
You then ask how to structure a database for a dictionary, and immediately follow that up by saying you don't want to use a database but a text file. I would suggest that a full language dictionary will not be very efficient unless it is using some form of database for quick lookup. Take a look at SQL Server Compact Edition as a starting point.
 
 
Comments
Sergey Alexandrovich Kryukov 25-Sep-11 12:32pm    
Efficiency is a bottleneck; please see my solution.
--SA
Yasser Sobhy 25-Sep-11 16:43pm    
Thanks a lot.

When I said "how to implement database files" I didn't mean database management systems; I meant a database file (a file that contains data; any file can contain data and can serve as a database for your application).

"I would suggest that a full language dictionary will not be very efficient unless it is using some form of database for quick lookup."

No, there are a lot of dictionaries out there that don't use databases (here I mean a DBMS); they use their own special file formats.
With all of these approaches, the problem is the performance of the application when accessing the storage system and searching for the word requested by the user. A good dictionary should show the closest entries as soon as the user starts to type a word letter by letter, and point to a more and more precise position in the list of entries (or build such a list of entries on the fly) as the user adds letters.
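A minimal sketch of the letter-by-letter lookup described above, assuming the headwords are already held in memory as a sorted list. The function name `entries_with_prefix` and its `limit` parameter are hypothetical; this is a plain binary search on a sorted list, not necessarily how the poster's implementation works:

```python
import bisect


def entries_with_prefix(headwords, prefix, limit=10):
    """Return up to `limit` headwords that start with `prefix`.

    `headwords` must be sorted. bisect_left finds the first candidate
    position in O(log n); we then scan forward while the prefix matches.
    """
    start = bisect.bisect_left(headwords, prefix)
    result = []
    for word in headwords[start:start + limit]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        result.append(word)
    return result


words = sorted(["apple", "apply", "apricot", "banana", "band", "bandage"])
print(entries_with_prefix(words, "app"))           # ['apple', 'apply']
print(entries_with_prefix(words, "ban", limit=2))  # ['banana', 'band']
```

Each new letter lengthens the prefix, so `bisect_left` lands on a more precise position in the list. A real English <> Arabic dictionary would also need proper collation and text normalization instead of the raw code-point ordering used here.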

I have implemented such a system and know that the bottleneck is the search in the storage system. To achieve the required performance, the storage should preferably be local, stored in a binary form, and indexed by the first letters in a special way. So the storage should consist of two or more separate parts: one for the index (loaded into memory) and another for the complete dictionary. The index should point to the position of each dictionary entry in the stream.
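A minimal sketch of the two-part storage described above: a binary dictionary stream plus an in-memory index that maps each headword to the byte offset of its record. The record layout, the function names, and the use of a plain sorted index (instead of the unspecified "first letters" scheme) are all assumptions for illustration; `io.BytesIO` stands in for a local file:

```python
import bisect
import io
import struct

# Hypothetical record layout (an assumption, not from the thread):
#   uint32 length | UTF-8 headword | uint32 length | UTF-8 definition
_LEN = struct.Struct("<I")


def _write_str(stream, text):
    """Write one length-prefixed UTF-8 string."""
    data = text.encode("utf-8")
    stream.write(_LEN.pack(len(data)))
    stream.write(data)


def _read_str(stream):
    """Read one length-prefixed UTF-8 string at the current position."""
    (size,) = _LEN.unpack(stream.read(_LEN.size))
    return stream.read(size).decode("utf-8")


def build_storage(entries, stream):
    """Write entries sorted by headword; return the (headwords, offsets) index."""
    headwords, offsets = [], []
    for headword, definition in sorted(entries.items()):
        headwords.append(headword)
        offsets.append(stream.tell())  # byte position of this record
        _write_str(stream, headword)
        _write_str(stream, definition)
    return headwords, offsets


def lookup(stream, index, headword):
    """Binary-search the in-memory index, then read a single record from the stream."""
    headwords, offsets = index
    i = bisect.bisect_left(headwords, headword)
    if i == len(headwords) or headwords[i] != headword:
        return None
    stream.seek(offsets[i])
    _read_str(stream)         # skip the stored headword
    return _read_str(stream)  # the definition


storage = io.BytesIO()  # stands in for a local binary file opened in "w+b" mode
index = build_storage({"book": "كتاب", "house": "بيت", "water": "ماء"}, storage)
definition = lookup(storage, index, "house")  # the Arabic string "بيت"
missing = lookup(storage, index, "tree")      # None: not in the dictionary
```

Only the small index lives in memory; each lookup reads one record from the file. The index itself could be saved to its own file, giving the separate index and dictionary parts mentioned above.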

So, my comment on the XML storage suggested by DominicZA: there is a ready-to-use standard for it, called XDXF; see http://xdxf.sourceforge.net/. You can find a number of dictionaries in this format. (The real problem is not the software. The real barrier is obtaining the actual data, the dictionary itself; there are too many words :-).)
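A minimal sketch of reading XDXF with Python's standard `xml.etree.ElementTree`, assuming the basic layout in which each article is an `<ar>` element whose headword is in a `<k>` child. The full XDXF format has more elements than this handles, so check the specification at the site above; the function name `read_xdxf` and the tiny sample are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def read_xdxf(source):
    """Yield (headword, article_text) pairs from an XDXF file path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "ar":
            continue
        key = elem.find("k")
        if key is not None and key.text:
            # Collect the article text that is not part of the <k> headword.
            parts = [elem.text or ""]
            for child in elem:
                if child is not key:
                    parts.append("".join(child.itertext()))
                parts.append(child.tail or "")
            yield key.text.strip(), "".join(parts).strip()
        elem.clear()  # release the finished article to keep memory use low


# Simplified sample; real XDXF files carry more metadata and markup.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<xdxf>
  <ar><k>book</k>كتاب</ar>
  <ar><k>water</k>ماء</ar>
</xdxf>"""

articles = dict(read_xdxf(io.BytesIO(sample.encode("utf-8"))))
```

`iterparse` streams the file instead of building the whole tree at once, which matters for large dictionaries. Only the first `<k>` of each article is used as the headword here.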

As performance is an issue, my implementation also accepts the original XDXF format, but reworks it into an indexed binary format, so that on second and later uses the indexed binary version is used. Some implementations that use XDXF directly exist, but I never tried them (you may want to find them and try). Perhaps my performance requirements are too high :-), but I really enjoy responses with a delay that no one can possibly notice.
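The "convert once, reuse the binary version afterwards" idea can be sketched as a cache check on file modification times. This is an assumption about one way to do it, not the poster's code; `ensure_binary_cache` is hypothetical, and the `convert` callback stands for whatever XDXF-to-binary converter you write (the demo uses a stand-in that just copies bytes):

```python
import os
import tempfile


def ensure_binary_cache(source_path, cache_path, convert):
    """Return cache_path, calling convert() only if the cache is missing or older than the source."""
    stale = (not os.path.exists(cache_path)
             or os.path.getmtime(cache_path) < os.path.getmtime(source_path))
    if stale:
        tmp_path = cache_path + ".tmp"
        convert(source_path, tmp_path)
        # Swap in the finished file so an interrupted conversion never leaves a truncated cache.
        os.replace(tmp_path, cache_path)
    return cache_path


# Demo with a stand-in converter that only copies bytes and counts its calls.
calls = []


def copy_convert(src, dst):
    calls.append(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "eng-ara.xdxf")
    cache = os.path.join(folder, "eng-ara.bin")
    with open(source, "w", encoding="utf-8") as f:
        f.write("<xdxf/>")
    ensure_binary_cache(source, cache, copy_convert)  # first use: converts
    ensure_binary_cache(source, cache, copy_convert)  # second use: reuses the cache

print(len(calls))  # 1
```

Comparing modification times means an updated XDXF source is picked up automatically, while unchanged sources skip the slow conversion step.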

—SA
 
 
Comments
Yasser Sobhy 25-Sep-11 16:54pm    
Thanks a lot, this is what I meant: performance.

"the storage should preferably be local, stored in a binary form, and indexed by the first letters in a special way"
This is exactly what I want.

"I have implemented such a system": can you provide me with any detailed information? Or, if that solution isn't commercial, can you give us a link to the source code or implementation details?

Thanks.
Sergey Alexandrovich Kryukov 25-Sep-11 21:38pm    
You are welcome. Unfortunately, I cannot provide this material in the near future. It was done for my personal use and is not prepared for publication. I would not mind publishing it, but it may need too much time, taking into account that I have some other works waiting for publication.

Good luck, call again.
--SA
 
 
Comments
BillWoodruff 25-Sep-11 16:43pm    
I was just about to insert a comment referring to your articles, Mehdi :)
"I searched the internet for how to implement database files and search algorithms, but I found nothing."

I can't believe that no one has tried anything similar before. There are two possibilities: either you didn't search properly, or you didn't search hard enough.

What approaches have you tried so far? Are you having a particular problem with one of these approaches?

Your question is too generic; it is like asking, "Guys, I need to build a house, where should I start?!"

Ask a concrete question and spend some time describing your limitations, requirements, etc.

Cheers!
 
 
Comments
Yasser Sobhy 25-Sep-11 16:32pm    
What I mean is that I want to create a dictionary (English <> Arabic).

"I can't believe that no one has tried anything similar before. There are two possibilities: either you didn't search properly, or you didn't search hard enough."

Actually, I searched and found some results, but they were very poor and gave me nothing: no one explained how the data files are structured or how to search for a word in these files when a user types it.

"Your question is too generic" "Ask a concrete question and spend some time describing your limitations, requirements, etc."

I will describe these later on this page.

"Cheers!"
Thanks.
I would recommend starting with XML to store your words. Whether you are using C# or C++ will determine your next step!
 
 
Comments
Sergey Alexandrovich Kryukov 25-Sep-11 12:31pm    
First, there is already a standard for this; secondly, I would not be happy with the performance. I have implemented a really fast dictionary (it stays fast on huge volumes, and its speed does not depend on the dictionary size); please see my solution.
--SA
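SA does not say how his lookup avoids depending on the dictionary size. One common structure with that property for prefix search is a trie; the sketch below is a swapped-in illustration, not his implementation, and the `Trie` class and its `complete` method are hypothetical names. Walking down to the node for the typed prefix costs one step per letter, regardless of how many words are stored:

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends here


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored words starting with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:  # one step per typed letter, independent of the word count
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie(["apple", "band", "banana", "bandage"])
print(trie.complete("ban"))  # ['banana', 'band', 'bandage']
```

Holding a trie of a full dictionary in memory can be expensive, so one option is to keep only the headwords in the trie and point into separate on-disk storage for the articles.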
What SAKryukov said is what I really want:

1. A good dictionary should show the closest entries as soon as the user starts to type a word letter by letter, and point to a more and more precise position in the list of entries (or build such a list of entries on the fly) as the user adds letters.

2. The bottleneck is the search in the storage system. To achieve the required performance,

3. the storage should preferably be local, stored in a binary form, and indexed by the first letters in a special way.

And regarding XML files and SQL Compact, the right answer is:

(The real problem is not the software. The real barrier is obtaining the actual data, the dictionary itself; there are too many words :-).)

Thanks SAKryukov, and well done :)
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900