2013-04-07, 21:24
Hi There,
I think this is a great initiative, and it will be helpful in the final future state. I'm not a SQLlite expert by any means, but do have some experience in DB/query tuning - hopefully these ORCL/MSS concepts also apply here.
Couple of comments suggestions that may trigger some thoughts:
1) The number of joins on some of the views could be problematic, as previously stated. We would need to have indexes on the 'join' fields, and not necessarily the primary/surrogate keys.
2) In general we want to avoid upper functions in the 'where' clauses; this type of SQL will automatically result in a table scan (reducing performance). Better to do the upper on the actual variable before the SQL is fired; stuff like function-based indexes is an option also.
3) Beware of the 'monolithic view' or the 'one view fits all' scenario. Big fat views make it easier to code for the SQL writer (he/she only needs to know one view definition); however, as the view gets bigger over time, the performance cost of the joins outweighs the benefits.
4) Since we're in a 'read many write once' type of database, we can be agressive with the indexing, or even the denormalization for the data model for core user cases. My opinion: performance is way more important than disk space.
5) Assuming design changes go through, would existing data be migrated or recreated/rescraped?
It would be a good idea to get a few core test cases documented, so they can be traced to determine whether tuning changes help/hurt. What I would want to see is the actual SQL that is being fired (and not necessarily the view definitions), because the values in the bind variables are important.
Unsure about next steps, but I hope this helps. Excellent work!
Donger
I think this is a great initiative, and it will be helpful in the final future state. I'm not a SQLlite expert by any means, but do have some experience in DB/query tuning - hopefully these ORCL/MSS concepts also apply here.
Couple of comments suggestions that may trigger some thoughts:
1) The number of joins on some of the views could be problematic, as previously stated. We would need to have indexes on the 'join' fields, and not necessarily the primary/surrogate keys.
2) In general we want to avoid upper functions in the 'where' clauses; this type of SQL will automatically result in a table scan (reducing performance). Better to do the upper on the actual variable before the SQL is fired; stuff like function-based indexes is an option also.
3) Beware of the 'monolithic view' or the 'one view fits all' scenario. Big fat views make it easier to code for the SQL writer (he/she only needs to know one view definition); however, as the view gets bigger over time, the performance cost of the joins outweighs the benefits.
4) Since we're in a 'read many write once' type of database, we can be agressive with the indexing, or even the denormalization for the data model for core user cases. My opinion: performance is way more important than disk space.
5) Assuming design changes go through, would existing data be migrated or recreated/rescraped?
It would be a good idea to get a few core test cases documented, so they can be traced to determine whether tuning changes help/hurt. What I would want to see is the actual SQL that is being fired (and not necessarily the view definitions), because the values in the bind variables are important.
Unsure about next steps, but I hope this helps. Excellent work!
Donger