|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionI've wanted to write an application that would allow me to execute queries against my music collection for some time now. Having been burning my CDs to MP3 and WMA for some years now, it contains thousands of songs, and I've been looking for something a bit more robust than what most music players I've used (like the Windows Media Player) allow. With the introduction of LINQ in .NET 3.5, this seemed like a good opportunity to create an app I've been thinking about for a while, and at the same time dig into a new technology. This set of code is the result. My ultimate goal is to create a set of libraries that can be used to allow for executing SQL queries against any data source that can represent itself in a relational, semi-relational, and tabular format. A future article will explore UpdateSince the initial posting of this article I've continued to toy around with this set code and have made some modifications and additions.
BackgroundThe first thing I came up against with LINQ and the vast majority of examples for its usage was they all assume implicit knowledge of the structure of the underlying data and compile time knowledge of the structure of the desired results. I'm sure this holds true for the vast majority of "line of business" applications where LINQ will really pay dividends. Creating a LINQ query to aggregate and manipulate a sales report or a product catalog page entails detailed knowledge of the data and the requirements of the resulting aggregates. Looking at LINQ to SQL, it generates SQL statements to match the runtime LINQ expression execution, and provides a very nice Object Relational Mapping tool that, in the past, required extensive infrastructure coding or third party tools. The compiled code wouldn't know the structure of the underlying data (other than that it consists of name value pairs) nor would it know what fields or aggregations were needed, as the exact data manipulation and result structure is defined at runtime, not compile time. Building up a LINQ statement like: EnumerableRowCollection<DataRow> query =
from order in orders.AsEnumerable()
where order.Field<bool>("OnlineOrderFlag") == true
orderby order.Field<decimal>("TotalDue")
select order;
requires knowing, at a minimum, the order and number of clauses in the Using the CodeThe structure of the code consists of a number of layers:
The UI interacts with the data provider layer much like it would with any other provider: DataSet data = new DataSet();
MusDbConnection connection = new MusDbConnection(
"root='C:\\Music';recurse=true;FileExtensions=wma,mp3");
connection.Open();
IDataAdapter adapter = new MusDataAdapter("SELECT * FROM media", connection);
adapter.Fill(data);
The A Note about the Unit TestsMost of the unit tests will fail when you run them from the attached code because they are written against a set of media files that, for obvious reasons, I haven't included with the download. They do provide basic examples of the SQL syntax that works, and should be pretty easy to point at some other set of media and made to pass again. FileDbProvider LayerThis assembly ties together execution with the file property readers with SQL parsing and execution. It operates much like other SQL data providers. The connection string is of the format name=value, with a semicolon separator like most other database connection formats. Supported connection string parameters are:
Aside from using the SqlLinq layer to process SQL statements, the other thing that is done in this assembly is providing an implementation of File.PropertyProviderThis assembly contains the base functionality for reading the properties of a file. The interface public interface IFilePropertyReader
{
string Namespace { get; }
bool Match(string filePath);
IEnumerable<KeyValuePair<string, object>> Read(string path,
IEnumerable<string> selectProperties);
IEnumerable<KeyValuePair<string, object>> Read(string path);
IEnumerable<KeyValuePair<string, Type>> GetFields();
}
In addition to implementing By way of example the class that basic file properties is declared like: [FilePropertyReader("*")]
[PropertyTableName("File")]
[TypePropertyReader(typeof(FileReflectedPropertyReader))]
class FilePropertyReader : PropertyReader
The Finally this assembly dynamically loads other file property readers at runtime. It does this by reflecting over and assembly that ends with .PropertyProvider.dll looking for classes that implement Media.PropertyProviderThis assembly contains an implementation of [FilePropertyReader("mp3")]
[FilePropertyReader("wma")]
[FilePropertyReader("asf")]
[FilePropertyReader("wmv")]
[PropertyTableName("Media")]
class MediaFilePropertyReader : IFilePropertyReader
I started with some simple code that could read ID3 v1.1 tags, but wanted something a bit more robust (ASF container formats, other ID3 tag versions etc.). A bit of searching lead to the Windows Media Format SDK, and now the retrieval of the actual tag values is all done using WMVCore.dll and the interfaces The meat of retrieving the actual tags has a basic Win32 flavor to it; calling methods twice to get buffer sizes, and then passing them back in to be filled, null terminated strings etc. If you've worked with Windows APIs at all, nothing too surprising with this piece. private static IWMHeaderInfo3 GetHeaderInfo(string path)
{
IWMMetadataEditor editor;
WMFSDKFunctions.WMCreateEditor(out editor);
IWMMetadataEditor2 editor2 = (IWMMetadataEditor2)editor;
editor2.OpenEx(path, FILE_ACCESS.GENERIC_READ, FILE_SHARE.FILE_SHARE_READ);
return (IWMHeaderInfo3)editor2;
}
private static void GetAllMediaTags(IDictionary<string, object> tags, string path)
{
IWMHeaderInfo3 headerInfo3 = GetHeaderInfo(path);
try
{
ushort wAttributeCount;
headerInfo3.GetAttributeCountEx(0, out wAttributeCount);
for (ushort wIndex = 0; wIndex < wAttributeCount; wIndex++)
{
WMT_ATTR_DATATYPE wAttribType;
string pwszName = null;
ushort wAttribNameLen = 0;
ushort pwLangIndex = 0;
uint pdwDataLength = 0;
// get the length of this attribute name
// and value in order to alloc the buffers
headerInfo3.GetAttributeByIndexEx(0,
wIndex,
pwszName,
ref wAttribNameLen,
out wAttribType,
out pwLangIndex,
null,
ref pdwDataLength);
pwszName = new String('\0', wAttribNameLen);
ReadAndAddAttribue(tags, headerInfo3, wIndex, pwszName, pdwDataLength);
}
}
finally
{
((IWMMetadataEditor)headerInfo3).Close();
Marshal.FinalReleaseComObject(headerInfo3);
headerInfo3 = null;
}
}
The tags from each media file are returned from the loader in an The SqlLinq LayerThis is where all the good LINQ-y stuff happens! (Well actually there is more SQL ParsingSQL parsing started with a very simplistic approach using what was little more than a rudimentary tokenizer. That got me as far as the simple What it does is take in the rules for an arbitrary language in BNF format and spit out a set of parse tables to be used later by a language specific state machine parse engine. There are even a number of engines available in multiple languages including C#. This code uses a C# engine by Vladimir Morozova, which is included in the download. There was even a specification for a simple SQL syntax available (extended slightly to add column aliases, boolean literals, and to allow Parsing proceeds from the leaves to the trunk, building up the parse tree along the way. As the parser traverses tokens and rules, it yields up a Reduction rule which corresponds to the various parts of the language. A custom attribute is used to map C# classes to SQL rules and instantiate them as the parser does its work. [SyntaxNode(RuleConstants.RULE_WHERECLAUSE_WHERE)]
public class WhereClause : NonTerminalNode
{
public WhereClause()
{
}
public IEnumerable<IDictionary<string, T>> Evaluate<T>(
IEnumerable<IDictionary<string, T>> source)
{
return source.Where(CreateEvaluationFunction<T>());
}
private Func<IDictionary<string, T>, bool> CreateEvaluationFunction<T>()
{
PredicateNode predicate = FindChild<PredicateNode>();
if (predicate != null)
return predicate.CreateEvaluationFunction<T>();
LiteralNode node = FindChild<LiteralNode>();
if (node != null)
return node.CreateEvaluationFunction<T>();
Debug.Assert(false);
return null;
}
}
static class SyntaxRuleFactory
{
private static TypeLoader<NonTerminalNode, int> _nodeImplTypeMap = LoadImplTypes();
public static NonTerminalNode CreateNode(Rule rule)
{
Debug.Assert(rule != null);
NonTerminalNode node = null;
if (_nodeImplTypeMap.Contains(rule.Index))
node = _nodeImplTypeMap.CreateInstance(rule.Index);
else
node = new NonTerminalNode();// if no type is bound to the rule then
// just create a base non-terminal node
node.Rule = rule;
return node;
}
private static IEnumerable<int> GetRuleIds(Type t)
{
return t.GetCustomAttributes(typeof(SyntaxNodeAttribute),
false).Select(attr => (int)((SyntaxNodeAttribute)attr).RuleConstant);
}
private static TypeLoader<NonTerminalNode, int> LoadImplTypes()
{
TypeLoader<NonTerminalNode, int> loader = new TypeLoader<NonTerminalNode, int>();
loader.SearchDirectories = false;
loader.LoadMany(GetRuleIds);
return loader;
}
}
A SQL statement that looks like <Select Stm> [Rule Id=RULE_SELECTSTM_SELECT Class=SelectStatement]
SELECT
<Columns> [Rule Id=RULE_COLUMNS_TIMES Class=Columns]
<Restriction> [Rule Id=RULE_RESTRICTION Class=NonTerminalNode]
*
<Into Clause> [Rule Id=RULE_INTOCLAUSE Class=NonTerminalNode]
<From Clause> [Rule Id=RULE_FROMCLAUSE_FROM Class=FromClause]
FROM
<Id Member> [Rule Id=RULE_IDMEMBER_ID Class=NodeWithId]
soundtrack
<Join Chain> [Rule Id=RULE_JOINCHAIN2 Class=NonTerminalNode]
<Where Clause> [Rule Id=RULE_WHERECLAUSE_WHERE Class=WhereClause]
WHERE
<Pred Exp> [Rule Id=RULE_PREDEXP_EQ Class=EqualityNode]
<Value> [Rule Id=RULE_VALUE_ID Class=NodeWithId]
bitrate
=
<Value> [Rule Id=RULE_VALUE_INTEGERLITERAL Class=IntegerLiteral]
128000
<Group Clause> [Rule Id=RULE_GROUPCLAUSE Class=NonTerminalNode]
<Having Clause> [Rule Id=RULE_HAVINGCLAUSE Class=NonTerminalNode]
<Order Clause> [Rule Id=RULE_ORDERCLAUSE Class=NonTerminalNode]
Once the parse tree is built, then it is just a matter of traversing it in order to generate the appropriate LINQ constructs or expressions appropriate to each node. For instance, in the public override IEnumerable<IDictionary<string, object>> Evaluate(
IEnumerable<IDictionary<string, object>> data)
{
// constrain results by where clause conditions
if (WhereClause != null)
data = WhereClause.Evaluate(data);
// calculate any aggregated values
data = EvaluateAggregates(data);
// constrain aggrated values by having conditions
if (HavingClause != null)
data = HavingClause.Evaluate(data);
// order the results
if (OrderByClause != null)
data = OrderByClause.Evaluate(data);
// and post process (remove any intermediate columns not specified in the
// select clause)
return PostProcessResults(data);
}
The public IEnumerable<IDictionary<string, object>>
Evaluate(IEnumerable<IDictionary<string, object>> source,
IEnumerable<AggregateNode> aggregates)
{
IList<IDictionary<string, object>> list = new List<IDictionary<string, object>>();
foreach (NodeWithId item in GroupByItems)
{
var groupBy = from d in source
group d by d.ContainsKey(item.LookupId) ?
d[item.LookupId] : DBNull.Value
into g
select g;
foreach (var g in groupBy)
{
IDictionary<string, object> dict =
new Dictionary<string, object>(StringComparer.InvariantCultureIgnoreCase);
dict.Add(item.LookupId, g.Key.ToString());
foreach (AggregateNode aggregate in aggregates)
dict.Add(aggregate.LookupId,
aggregate.Evaluate<object>(g.AsEnumerable()));
list.Add(dict);
}
}
return list;
}
public IEnumerable<NodeWithId> GroupByItems
{
get
{
return FindDescendants<NodeWithId>();
}
}
Building ExpressionsIn order to implement the public Func<IDictionary<string, T>, bool> CreateEvaluationFunction<T>()
{
ParameterExpression param = Expression.Parameter(typeof(IDictionary<string, object>),
"arg");
//traverse the tree to generate a lambda expression and then compile into a function
return Expression.Lambda<Func<IDictionary<string, object>, bool>>
(CreateOperatorExpression(param, GetLeftExpression(param)), param).Compile();
}
protected abstract Expression CreateOperatorExpression(ParameterExpression param,
Expression left);
Type CoercionBecause the data type of each field in the input data is not known when the expression is built, type coercion is somewhat lax. If an expression is being compared to an integer literal, for example, a private LiteralNode FindCoercionType(int index)
{
if (index != 0 && index != 2)
return null;
LiteralNode node = FindChild<LiteralNode>(OppositeSide(index));
// look at what the child operand is being compared to
if (node == null && (this.Index == 0 || this.Index == 2))
node = Parent.FindChild<LiteralNode>(OppositeSide(this.Index));
// look at what the whole expression is being compared to
// if we don't find any literals in the area, look for a predicate
// expression that can drive the type coercion
if (node == null)
{
PredicateNode predicate = FindChild<PredicateNode>(OppositeSide(index));
// look at what the child operand is being compared to
if (predicate == null &&
(this.Index == 0 || this.Index == 2))
predicate = Parent.FindChild<PredicateNode>(OppositeSide(this.Index));
// look at what the whole expression is being compared to
if (predicate != null)
node = predicate.GetExpressionType();
}
return node;
}
Indexing the DictionaryIn this example usage of dynamic expression building (media file tag collections), not every row in the input data will have the same set of fields. This is because not every media file has the same set of tags embedded within it. For this reason, an extra step is taken when indexing each dictionary, first checking private ConditionalExpression CreateDereferenceExpression(ParameterExpression param,
int index)
{
NodeWithId idNode = FindChild<NodeWithId>(index);
if (idNode != null)
{
// in order to avoid KeyNotFoundExceptions this will create an expression
// of the general form:
// if(dictionary.ContainsKey(key))
// return dictionary[key];
// else
// return NULL;
MethodInfo indexerMethod = typeof(IDictionary<string, object>).GetMethod(
"get_Item");
MethodInfo containsKeyMethod = typeof(IDictionary<string, object>).GetMethod(
"ContainsKey");
LiteralNode node = GetTypeCoercionNode(index); // this is used to coerce
// the value in the dictionary to
// the correct type for comparison
Expression key = Expression.Constant(idNode.EvaluationId);
Expression containsKey = Expression.Call(param, containsKeyMethod, key);
Expression returnValue = Expression.Convert(Expression.Call(param,
indexerMethod, key), node.ValueType, node.CoercionDelegate.Method);
Expression returnNull = Expression.Constant(node.NullRepresentation,
node.ValueType);
return Expression.Condition(containsKey, returnValue, returnNull);
}
return null;
}
Putting Them All TogetherWith all of the heavy lifting done in the [SyntaxNode(RuleConstants.RULE_PREDEXP_EQ)]
public class EqualityNode : PredicateNode
{
public EqualityNode()
{
}
protected override Expression CreateOperatorExpression(ParameterExpression param,
Expression left)
{
return Expression.Equal(left, GetRightExpression(param));
}
}
Retrieving and Evaluating DataWith the current implementation, evaluating a set of data is a two step process. First, the client code has to retrieve the larger set of data to be evaluated. It can use the parse tree to guide the retrieval, but the actual de-serialization out of the source data store exists entirely outside of the SqlLinq assembly. Second, it uses the SqlLinq code to evaluate the data which applies the logic for everything but the Points of InterestI learned a lot in writing this article; from how to get tags out of a media file, to the details parsing and parse trees, and then on to LINQ and Linq Expressions. For a procedural/OO guy like myself, LINQ took some time to wrap my head around. I hope to keep extending this as time allows and see what other sorts of data stores can be evaluated with textual SQL statements... Another point of interest is that the pluggable architecture makes it straight forward to add additional types of file queries. For instance the downloadable code includes an Assembly file property reader that can be used to find assemblies that match custom search criteria.
Without too much more work this could be turned into a very generically reusable file search and analysis application. Maybe as I continue to tinker with it that's what it will become. Things to Do
History
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||