Introduction

With the release of SQL Server 2016 Service Pack 1, the in-memory columnstore technology is now also available in the Standard, Web and even the Express and LocalDB editions. Besides the advantage of having just one code base to maintain, this change in policy also makes columnstore a clear disk-space saver thanks to its de-duplication and high data compression ratios, and, last but not least, it is also a serious ad-hoc query performance booster. The main difference between the SQL Server flavors is how much CPU power and memory is allocated to tasks such as (re)building a clustered columnstore index. For example: with Standard Edition a single core is in use (the sqlservr process maxes out at 100% of one processor), and querying a CCI happens with a maximum of 2 CPUs (MAXDOP = 2), versus leveraging all available CPUs in Enterprise Edition.

Building a clustered columnstore index (CCI) with SQL Server 2016 Standard Edition versus building a CCI with all 4 available cores with SQL Server 2016 Enterprise Edition: the baseline timings for loading 7.2 GB / 60 million rows from a single TPC-H LineItem file do not show much of a difference between the flavors when bulk inserting the data directly into either a heap table or a table with a CCI. The difference becomes clear when we compare the time needed to build a CCI on a heap table, or to rebuild a CCI. To summarize, the absolute fastest way to have the data available in a table with a clustered columnstore index is to: load into a heap and build the CCI afterwards with SQL Server 2016 Enterprise Edition, or load directly into the CCI. For tables with a clustered columnstore index already created, make sure you stream directly into the compressed rowgroups to maximize throughput.
To do so, the insert batch size has to be equal to or larger than 100K rows (102,400 rows to be precise). Smaller batches are first written into compressed delta store tables before being tuple-moved into their final compressed rowgroup segments, which means SQL Server has to touch the data twice. There are various options for loading data, and we will go over the most frequently used ones: the BULK INSERT command, BCP and SSIS. Let's see what is needed to get the best performance, and how to monitor it.

1) T-SQL BULK INSERT

Let's start with a BULK INSERT command. To check the data-load progress and verify how many rows have already been loaded into the CCI, even when the TABLOCK option is used, query a new DMV called sys.dm_db_column_store_row_group_physical_stats. This DMV also reveals the possible rowgroup states in more detail during loading. These are the possible rowgroup states while loading data; when you see the INVISIBLE state, as in the picture below, it means data is being compressed into a rowgroup:

0: INVISIBLE (rowgroup is in the process of being built from delta store data)
1: OPEN (rowgroup is accepting new records)
2: CLOSED (rowgroup is full but not yet compressed by the tuple mover process)
3: COMPRESSED (rowgroup is filled and compressed)
4: TOMBSTONE (rowgroup is ready to be garbage collected and removed)

Specifying a batch size with a value of 102,400 or higher will get you maximum performance: the data will be streamed and compressed directly into its final rowgroup, and this behavior will show up as COMPRESSED. You can also check a DMV that was introduced with SQL Server 2014 to monitor the rowgroup state: the sys.column_store_row_groups DMV.

Test results: bulk inserting data into a table with a CCI via the BULK INSERT command can be improved slightly by adding the BATCHSIZE = 102400 and TABLOCK options. This brings an 8% throughput improvement.

2) The BCP.exe utility

BCP is still used quite heavily in many production environments, so it is worth checking quickly: by default, BCP sends 1,000 rows at a time to SQL Server. The time needed to load 7.2 GB of data via BCP: 530 seconds, or 113K rows/sec. The rowgroup state shows INVISIBLE, which means that with the default settings the delta store is in use. To make sure the BCP command streams the data directly into the compressed rowgroups, you have to add the batch size option (-b) with a value of at least 102400. I ran various tests with larger batch sizes, up to 1,048,576, but 102,400 gave me the best result.

bcp DB.dbo.LINEITEM_CCI in F:\TPCH\lineitem.tbl -S . -c -T -t"" -b 102400 -h tablock

The rowgroup state now shows COMPRESSED, which means we bypass the delta store and stream the data into the compressed rowgroups. Result: the BCP completed in 457 seconds, or 133K rows per second.

3) SSIS

While testing I noticed that the SSIS 2016 defaults use memory buffer sizes that can also potentially limit the batch size to fewer than 100K rows. In the example below you see that the data landed in the delta stores: the rowgroup states are CLOSED and the delta_store_hobt_id fields are populated, which means the delta stores are being leveraged. This was the moment to reach out and check with my colleagues, who fortunately had noticed this already, and a solution is already there (see: the Data Flow Buffer Auto Sizing capability benefits data loading into a CCI).
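Putting the BULK INSERT guidance together with the monitoring DMV, the pattern might look like this (the table name, file path and field terminator are assumptions from my test setup):

```sql
-- A batch size >= 102400 rows streams data directly into COMPRESSED rowgroups
BULK INSERT dbo.LINEITEM_CCI
FROM 'F:\TPCH\lineitem.tbl'
WITH (FIELDTERMINATOR = '|', BATCHSIZE = 102400, TABLOCK);

-- Check how many rows landed per rowgroup, and in which state
SELECT row_group_id, state_desc, total_rows, deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.LINEITEM_CCI')
ORDER BY row_group_id;
```

If the state_desc column shows COMPRESSED for the newly loaded rowgroups, the delta store was bypassed as intended.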
To fully leverage the CCI streaming capabilities you have to increase the default MaxBufferRows and BufferSize memory settings. Change these to 10x larger values:

- DefaultBufferMaxRows: from 10,000 to 1,024,000
- and, most importantly, DefaultBufferSize: from 10,485,760 to 104,857,600

Note: the new AutoAdjustBufferSize setting should be set to True when loading very wide data rows. Also change the values on the destination adapter:

- Rows per batch: from none to 102400
- Maximum insert commit size: from 2147483647 to 102400

The feature parity introduced with SQL Server 2016 SP1 opens up a whole new range of possibilities to benefit from. Hopefully the walkthroughs above will help you max out BULK INSERT, BCP and SSIS performance when loading data into a clustered columnstore index.

What would be the absolute fastest way to load data from a flat file into a table within SQL Server 2016? A lot has changed since my initial post on this topic many years ago, like the introduction of memory-optimized tables and updateable columnstore table indexes. Also, the list of data transport vehicles to choose from is growing: besides BCP, the T-SQL BULK INSERT command, SSIS as an ETL tool and PowerShell, there are some new additions, such as PolyBase, external R scripts or ADF. In this post I will start by checking how much faster the new durable and non-durable memory-optimized tables are. To set the baseline for these tests I'm using an Azure DS4_v2 Standard VM with 8 cores / 28 GB of RAM and 2 HDD volumes with host RW caching enabled. (Both LUNs provide 275 MB/sec RW throughput, even though the GUI states a limit of 60 MB/sec.) I generated a single 60-million-row / 7.2 GB TPC-H LineItem flat file as the data to load.
As a baseline to use for comparison, we will take the time needed to load the file into a heap table: this plain BULK INSERT command completes in 7 minutes, averaging 143K rows/sec.

Enabling the test database for memory-optimized tables

The memory-optimized tables introduced in SQL Server 2014/2016 (Enterprise & Developer Edition) are designed for very fast OLTP with many small transactions and high concurrency, which is a completely different type of workload than bulk inserting. But, just out of curiosity, let's give it a try! There are 2 types of in-memory tables: durable and non-durable tables. Durable ones persist data to disk, non-durable ones won't. To enable this option, we have to do some housekeeping and assign a fast disk volume for hosting these files. First, alter the database to enable the MEMORY_OPTIMIZED_DATA option, followed by adding a file location and a filegroup that will contain the memory-optimized tables. The third thing to do is to add a separate memory pool to the SQL Server instance so it can keep all the data we will load into in-memory tables separated from its default memory pool.

Binding a database to a memory pool

The steps to define a separate memory pool and to bind a database to it are listed below; extra memory pools are managed via the SQL Resource Governor. The 4th and final step is to bind the test database to the new memory pool with the sys.sp_xtp_bind_db_resource_pool command. In order for the binding to become effective, we have to take the database offline and bring it back online. Once bound, we can dynamically change the amount of memory assigned to the pool via the ALTER RESOURCE POOL PoolHk command WITH (MAX_MEMORY_PERCENT = 80).
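The four steps above might be sketched in T-SQL like this (the database name, file path and the pool name PoolHk are assumptions from my test setup):

```sql
-- 1) Enable the database for memory-optimized tables
ALTER DATABASE TestDB
ADD FILEGROUP TestDB_InMem CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE TestDB
ADD FILE (NAME = 'TestDB_InMem', FILENAME = 'F:\Data\TestDB_InMem')
TO FILEGROUP TestDB_InMem;

-- 2) Create a separate memory pool via the Resource Governor
CREATE RESOURCE POOL PoolHk WITH (MAX_MEMORY_PERCENT = 80);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- 3) Bind the database to the pool
EXEC sys.sp_xtp_bind_db_resource_pool 'TestDB', 'PoolHk';

-- 4) Take the database offline and online to make the binding effective
ALTER DATABASE TestDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE TestDB SET ONLINE;
```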
Bulk insert into a durable in-memory table

Now that we are all set with the In-Memory option enabled, we can create an in-memory table. Every memory-optimized table must have at least one index (either a range or a hash index), which is completely (re)composed in memory and is never stored on disk. A durable table must have a declared primary key, which can then be supported by the required index. To support a primary key I added an extra ROWID1 row-number column to the table. Specifying a batch size of 1 (up to 5) million rows on the bulk insert command helps to persist data to disk while the bulk insert is ongoing (instead of saving it all at the end); doing so minimizes memory pressure on the PoolHk memory pool we created.

The data load into the durable in-memory table completes in 5 minutes 28 seconds, or 183K rows/sec. That is an okay time, but not that much faster than our baseline. Looking at sys.dm_os_wait_stats shows that the number 1 wait stat is IMPPROV_IOWAIT, which occurs when SQL Server waits for a bulk load I/O to finish. Looking at the Bulk Copy Rows/sec and Disk Write Bytes/sec performance counters shows the flushing to disk spiking at 275 MB/sec once a batch comes in (the green spikes). That is the maximum of what the disk can deliver, but it doesn't explain everything. Given the minor gain, we will park this for future investigation.
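A sketch of the durable table and load command as described (the table name, bucket count and abbreviated column list are assumptions; only the ROWID1 column and the DURABILITY setting come from the text):

```sql
-- Durable in-memory table: requires a primary key backed by an index
CREATE TABLE dbo.LINEITEM_InMem
(
    ROWID1      BIGINT IDENTITY(1,1) NOT NULL,
    L_ORDERKEY  BIGINT      NOT NULL,
    L_COMMENT   VARCHAR(44) NOT NULL,
    -- ... remaining TPC-H LineItem columns omitted for brevity ...
    CONSTRAINT PK_LINEITEM_InMem PRIMARY KEY
        NONCLUSTERED HASH (ROWID1) WITH (BUCKET_COUNT = 67108864)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- A 1-5 million row batch size persists data while the load is still running,
-- easing memory pressure on the PoolHk pool. Note: since the flat file lacks
-- the ROWID1 identity column, the column mapping may require a format file
-- (not shown here).
BULK INSERT dbo.LINEITEM_InMem
FROM 'F:\TPCH\lineitem.tbl'
WITH (FIELDTERMINATOR = '|', BATCHSIZE = 1000000);
```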
Monitoring the memory pool

Via the sys.dm_resource_governor_resource_pools DMV we can check whether our in-memory table leverages the newly created PoolHk memory pool. The output shows that this is the case: the 7.2 GB (plus some extra for the row IDs) got loaded uncompressed into the PoolHk memory pool. If you try to load more data than you have memory available for the pool, you will get a proper message like this one:

The statement has been terminated. Msg 701, Level 17, State 103, Line 5: There is insufficient system memory in resource pool 'PoolHk' to run this query.

To look a level deeper at the memory space allocation on a per-table basis for in-memory tables, you can run the following query (taken from the 'SQL Server In-Memory OLTP Internals for SQL Server 2016' whitepaper). The data we just loaded is stored as a varheap structure with a hash index. So far so good! Now let's move on and check how staging into a non-durable table performs.

Bulk insert into a non-durable in-memory table

For non-durable (IMND) tables we do not need a primary key, so we just add a non-clustered hash index and set DURABILITY = SCHEMA_ONLY. The bulk insert data load into the non-durable table completes within 3 minutes, with a throughput of 335K rows/sec (vs 7 minutes): that is 2.3x faster than inserting into a heap table. For staging data this is definitely a quick win!

SSIS single bulk insert into a non-durable table

Traditionally, SSIS is the fastest way to load a file quickly into SQL Server, because SSIS handles all the pre-processing of the data so that the SQL Server engine can spend its CPU ticks on persisting the data to disk.
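The non-durable staging table described above might look like this (table name, bucket count and abbreviated column list are assumptions; the hash index and SCHEMA_ONLY durability come from the text):

```sql
-- Non-durable table: no primary key required, data is never persisted to disk
CREATE TABLE dbo.LINEITEM_IMND
(
    L_ORDERKEY  BIGINT        NOT NULL,
    L_QUANTITY  DECIMAL(15,2) NOT NULL,
    L_COMMENT   VARCHAR(44)   NOT NULL,
    -- ... remaining TPC-H LineItem columns omitted for brevity ...
    INDEX IX_Hash NONCLUSTERED HASH (L_ORDERKEY) WITH (BUCKET_COUNT = 67108864)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```

With SCHEMA_ONLY durability, the table definition survives a restart but its contents do not, which is exactly the trade-off you want for a staging table.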
Will this still be the case when inserting the data into a non-durable table? Below is a summary of the tests I ran with SSIS for this post: the SSIS FastParse option and the DefaultBufferMaxRows and DefaultBufferSize settings are the main performance boosters. Also, the native OLE DB provider (SQLOLEDB.1) performs slightly better than the SQL Native Client (SQLNCLI11.1). When running SSIS and SQL Server side by side, increasing the network packet size isn't needed. Net result: a basic SSIS package that reads a flat file source and writes the data directly to the non-durable table via an OLE DB destination performs similarly to the BULK INSERT command into an IMND table: the 60 million rows are loaded in 2 minutes 59 seconds, or 335K rows/sec, identical to the BULK INSERT command.

SSIS with the Balanced Data Distributor

But wait... in-memory tables are designed to work lock- and latch-free, so this means we can also load the data via multiple streams! That is easy to achieve with SSIS: the Balanced Data Distributor will bring just that (BDD is listed in the Common section of the SSIS Toolbox). Adding the BDD component and inserting the data into the same non-durable table with 3 streams provides the best throughput: we are now up to 526K rows/sec! Looking at this very flat line, with only a fraction of the CPU time used by SQL Server, it seems we are hitting some bottleneck. I quickly tried to get creative by leveraging the modulo function and added 2 more data flows within the package (each processing 1/3 of the data), but that did not improve things much (1 min 52 sec), so a great topic to investigate in a future post!

The non-durable in-memory table option brings some serious performance improvements for data staging: 1.5x faster data loading with a regular bulk insert, and up to 3.6x faster with SSIS. This option, primarily designed to speed up OLTP, can also make a big difference in shrinking your batch window quickly. (To be continued.)

Most people are familiar with the phrase, "this will kill two birds with one stone". If you're not, the phrase refers to an approach that addresses two goals in one action. (Unfortunately, the expression itself is rather unpleasant, as most of us don't want to throw stones at innocent animals.) Today I'm going to cover some basics on two great features in SQL Server: the columnstore index (available only in SQL Server Enterprise) and the SQL Query Store. Microsoft actually implemented the columnstore index in SQL 2012 Enterprise, though they've enhanced it in the last two releases of SQL Server. Microsoft introduced Query Store in SQL Server 2016. So, what are these features and why are they important? Well, I have a demo that will introduce both features and show how they can help us. Before I go any further, I also cover this feature (and other SQL 2016 features) in my CODE Magazine article on new features in SQL 2016. As a basic introduction, the columnstore index can help speed up queries that scan/aggregate over large amounts of data, and the Query Store tracks query executions, execution plans and runtime statistics that you'd normally need to collect manually. Trust me when I say, these are great features. For this demo, I'll be using the Microsoft Contoso Retail Data Warehouse demo database. Loosely speaking, Contoso DW is like "a really big AdventureWorks", with tables containing millions of rows. (The largest AdventureWorks table contains roughly 100,000 rows at most.)
You can download the Contoso DW database here: microsoft.com/en-us/download/details.aspx?id=18279. Contoso DW works very well when you want to test performance on queries against large tables. Contoso DW contains a standard data warehouse fact table called FactOnlineSales, with 12.6 million rows. That's certainly not the largest data warehouse table in the world, but it's not child's play either. Suppose I want to summarize product sales amounts for 2009, and rank the products. I could query the fact table, join to the product dimension table, and use a RANK function, like this. Here's a partial result set of the top 10 rows, by total sales. On my laptop (i7, 16 GB of RAM), the query takes anywhere from 3 to 4 seconds to run. That might not seem like the end of the world, but some users might expect near-instant results (the way you can see near-instant results when using Excel against an OLAP cube). The only index I currently have on this table is a clustered index on a sales key. If I look at the execution plan, SQL Server offers a suggestion to add a covering index to the table. Now, just because SQL Server suggests an index doesn't mean you should blindly create indexes for every "missing index" message. However, in this instance, SQL Server detects that we are filtering based on year, and using the product key and sales amount. So SQL Server suggests a covering index, with the DateKey as the index key field. The reason we call this a "covering" index is because SQL Server will "bring along the non-key fields" we used in the query, "for the ride".
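A sketch of such a ranking query (the column names SalesAmount, ProductName and the date filter form are assumptions about the Contoso DW schema):

```sql
-- Rank products by total 2009 sales (schema details assumed)
SELECT p.ProductName,
       SUM(f.SalesAmount) AS TotalSales,
       RANK() OVER (ORDER BY SUM(f.SalesAmount) DESC) AS SalesRank
FROM dbo.FactOnlineSales f
JOIN dbo.DimProduct p ON p.ProductKey = f.ProductKey
WHERE f.DateKey BETWEEN '2009-01-01' AND '2009-12-31'
GROUP BY p.ProductName
ORDER BY SalesRank;
```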
This way, SQL Server doesn't need to use the table or the clustered index at all; the database engine can simply use the covering index for the query. Covering indexes are popular in certain data warehousing and reporting database scenarios, though they do come at a cost of the database engine maintaining them. (Note: covering indexes have been around for a long time, so at this point I haven't yet covered the columnstore index and the Query Store.) So, I'll add the covering index. If I re-run the same query I ran a moment ago (the one that aggregated the sales amount for each product), the query sometimes seems to run about a second faster, and I get a different execution plan, one that uses an Index Seek instead of an Index Scan (using the date key on the covering index to retrieve the sales for 2009). So, prior to the columnstore index, this could be one way to optimize this query in older versions of SQL Server. It runs a little faster than the first one, and I get an execution plan with an Index Seek instead of an Index Scan. However, there are some issues: the two execution operators, "Index Seek" and "Hash Match (Aggregate)", both essentially operate "row by row". Imagine this in a table with hundreds of millions of rows. Relatedly, think about the contents of a fact table: in this case, a single date key value and/or a single product key value might be repeated across hundreds of thousands of rows (remember, the fact table also has keys for geography, promotion, salesperson, etc.). So, when the "Index Seek" and "Hash Match" work row by row, they are doing so over values that might be repeated across many other rows.
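The suggested covering index might look like this (the index name and exact column list are assumptions based on the description above):

```sql
-- DateKey is the seek key; ProductKey and SalesAmount are carried along as
-- included (non-key) columns, so the query never has to touch the base table
CREATE NONCLUSTERED INDEX IX_FactOnlineSales_DateKey
ON dbo.FactOnlineSales (DateKey)
INCLUDE (ProductKey, SalesAmount);
```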
This is normally where I'd introduce the SQL Server columnstore index, which offers a scenario to improve the performance of this query in amazing ways. But before I do that, let's go back in time. Let's go back to the year 2010, when Microsoft introduced an add-in for Excel known as PowerPivot. Many people probably remember seeing demos of PowerPivot for Excel, where a user could read millions of rows from an external data source into Excel. PowerPivot would compress the data, and provide an engine to create pivot tables and pivot charts that performed at amazing speeds against the compressed data. PowerPivot used an in-memory technology that Microsoft called "VertiPaq". This in-memory technology in PowerPivot would essentially take duplicate business key/foreign key values and compress them down to a single vector. The in-memory technology would also scan/aggregate these values in parallel, in blocks of several hundred at a time. The bottom line is that Microsoft baked a large amount of performance enhancements into the VertiPaq in-memory feature for us to use, right out of the proverbial box. Why am I taking this little stroll down memory lane? Because in SQL Server 2012, Microsoft implemented one of the most important features in the history of their database engine: the columnstore index. The index is really an index in name only: it is a way to take a SQL Server table and create a compressed, in-memory columnstore that compresses duplicate foreign key values down to single vector values. Microsoft also created a new buffer pool to read these compressed vector values in parallel, creating the potential for huge performance gains.
So, I'm going to create a columnstore index on the table, and I'll see how much better (and more efficiently) the query runs, versus the query that runs against the covering index. I'll create a duplicate copy of FactOnlineSales (I'll call it FactOnlineSalesDetailNCCS), and I'll create a columnstore index on the duplicated table; that way I won't interfere with the original table and the covering index in any way. Next, I'll create a columnstore index on the new table. Note several things: I've specified several foreign key columns, as well as the sales amount. Remember that a columnstore index is not like a traditional row-store index. There is no "key". We are simply indicating which columns SQL Server should compress and place in an in-memory columnstore. To use the analogy of PowerPivot for Excel: when we create a columnstore index, we're telling SQL Server to do essentially the same thing that PowerPivot did when we imported 20 million rows into Excel using PowerPivot. So, I'll re-run the query, this time using the duplicated FactOnlineSalesDetailNCCS table that contains the columnstore index. This query runs instantly, in less than a second. And I can also say that even if the table had hundreds of millions of rows, it would still run at the proverbial "bat of an eyelash". We could look at the execution plan (and in a few moments, we will), but now it's time to cover the Query Store feature. Imagine for a moment that we ran both queries overnight: the query that used the regular FactOnlineSales table (with the covering index) and then the query that used the duplicated table with the columnstore index.
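The index creation might be sketched like this (the index name and the exact set of foreign key columns are assumptions; the table name FactOnlineSalesDetailNCCS comes from the text):

```sql
-- No "key" here: just the set of columns SQL Server should compress
-- and place into the in-memory columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCS_FactOnlineSalesDetail
ON dbo.FactOnlineSalesDetailNCCS
   (ProductKey, DateKey, StoreKey, PromotionKey, SalesAmount);
```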
When we log in the following morning, we'd like to see the execution plans for both queries as they took place, as well as the runtime statistics. In other words, we'd like to see the same statistics we'd be able to see if we ran both queries interactively in SQL Management Studio, turned on TIME and IO statistics, and viewed the execution plan right after executing each query. Well, that's what the Query Store allows us to do: we can turn on (enable) Query Store for a database, which will trigger SQL Server to store query execution and plan statistics so we can view them later. So, I'm going to enable the Query Store on the Contoso database with the following command (and I'll also clear out any caching). Then I'll run the two queries (and "pretend" that I ran them hours ago). Now let's pretend they ran hours ago. As I said, the Query Store will capture the execution statistics. So how do I view them? Fortunately, that's quite easy. If I expand the Contoso DW database, I'll see a Query Store folder. The Query Store has tremendous functionality and I'll try to cover much of it in subsequent blog posts. But for now, I want to view the execution statistics on the two queries, and specifically examine the execution operators for the columnstore index. So I'll right-click on Top Resource Consuming Queries and run that option. That gives me a chart like the one below, where I can see the execution duration time (in milliseconds) for all queries that were executed. In this instance, Query 1 was the query against the original table with the covering index, and Query 2 was against the table with the columnstore index.
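Enabling the Query Store and clearing out caching might look like this (the database name is an assumption about the Contoso DW install):

```sql
-- Enable the Query Store on the demo database
ALTER DATABASE ContosoRetailDW SET QUERY_STORE = ON;

-- Start from a clean slate: clear captured Query Store data and the plan cache
ALTER DATABASE ContosoRetailDW SET QUERY_STORE CLEAR;
DBCC FREEPROCCACHE;
```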
The numbers don't lie: the columnstore index outperformed the original covering index by a factor of almost 7 to 1. I can change the metric to look at memory consumption instead. In this case, note that Query 2 (the columnstore index query) used far more memory. This clearly demonstrates why the columnstore index represents "in-memory" technology: SQL Server loads the entire columnstore index into memory, and uses a completely different buffer pool with enhanced execution operators to process the index. OK, so we have some charts to view execution statistics; can we see the execution plan (and execution operators) associated with each execution? Yes, we can! If you click on the vertical bar for the query that used the columnstore index, you'll see the execution plan below. The first thing we see is that SQL Server performed a columnstore index scan, and that it represented nearly 100% of the cost of the query. You might say, "Wait a minute, the first query used a covering index and performed an index seek, so how can a columnstore index scan be faster?" That's a legitimate question, and fortunately there's an answer. Even when the first query performed an index seek, it still executed "row by row". If I hover the mouse over the columnstore index scan operator, I see a tooltip (like the one below) with an important setting: the Execution Mode is BATCH (as opposed to ROW, which is what we had with the first query using the covering index). That BATCH mode tells us that SQL Server is processing the compressed vectors (for any foreign key values that are duplicated, such as the product key and the date key) in batches of almost 1,000, in parallel.
So SQL Server is still able to process the columnstore index far more efficiently. Additionally, if I hover the mouse over the Hash Match (Aggregate) task, I also see that SQL Server is aggregating the columnstore index using batch mode (though the operator itself represents such a small percentage of the cost of the query). Finally, you might be asking, "OK, so SQL Server compresses the values in the data, treats the values as vectors, and reads them in blocks of almost a thousand values in parallel, but my query only wanted data for 2009. So is SQL Server scanning over the entire set of data?" Again, a good question. The answer is, "Not really". Fortunately for us, the new columnstore index buffer pool performs another function called "segment elimination". Essentially, SQL Server will examine the vector values for the date key column in the columnstore index, and eliminate segments that are outside the scope of the year 2009. I'll stop here. In subsequent blog posts I'll cover both the columnstore index and the Query Store in more detail. Essentially, what we've seen here today is that the columnstore index can significantly speed up queries that scan/aggregate over large amounts of data, and the Query Store will capture query executions and allow us to examine execution and performance statistics later.

In the end, we'd like to produce a result set that shows the following.
Notice three things: the columns essentially pivot all of the possible Return Reasons, after showing the sales amount; the result set contains subtotals by the week ending (Sunday) date across all customers (where the Customer is NULL); and the result set contains a grand total row (where the Customer and the Date are both NULL). First, before getting into the SQL end, we could use the dynamic pivot/matrix capability in SSRS. We would simply need to combine the two result sets by one column and then we could feed the results to the SSRS matrix control, which will spread the return reasons across the columns axis of the report. However, not everyone uses SSRS (though most people should!). But even then, sometimes developers need to consume result sets in something other than a reporting tool. So, for this example, let's assume we want to generate the result set for a web grid page, and possibly the developer wants to "strip out" the subtotal rows (where I have a ResultSetNum value of 2 and 3) and place them in a summary grid. So, bottom line, we need to generate the output above directly from a stored procedure. And as an added twist: next week there might be Return Reason X and Y and Z. So we don't know how many return reasons there could be. We simply want the query to pivot on the possible distinct values for Return Reason. Here is where the T-SQL PIVOT has a restriction: we need to provide it the possible values. Since we won't know that until run-time, we need to generate the query string dynamically using the dynamic SQL pattern. The dynamic SQL pattern involves generating the syntax, piece by piece, storing it in a string, and then executing the string at the end.
Dynamic SQL can be tricky, as we have to embed syntax inside a string. But in this case, it's our only true option if we want to handle a variable number of return reasons. I've always found that the best way to create a dynamic SQL solution is to figure out what the "ideal" generated query would be at the end (in this case, given the return reasons we know about), and then reverse-engineer it by piecing it together one part at a time. And so, here is the SQL we'd need if we knew those return reasons (A through D) were static and wouldn't change. The query does the following: it combines the data from SalesData with the data from ReturnData, where we "hard-wire" the word Sales as the action type for the rows from the sales table, and then use the Return Reason from the return data in the same ActionType column. That gives us a clean ActionType column on which to pivot. We are combining the two SELECT statements into a common table expression (CTE), which is basically a derived-table subquery that we then use in the next statement (the PIVOT). Then there is a PIVOT statement against the CTE, which sums the dollars for the Action Type being one of the possible Action Type values. Note that this isn't the final result set. We are placing this in a CTE that reads from the first CTE. The reason for this is because we want to do multiple groupings at the end. The final SELECT statement reads from the PIVOTCTE, and combines it with a subsequent query against the same PIVOTCTE, but where we also implement two groupings in the GROUPING SETS feature in SQL 2008: grouping by the Week Ending Date (dbo.WeekEndingDate) and grouping for all rows (). So, if we knew with certainty that we'd never have more return reason codes, that would then be the solution.
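A sketch of that static "model" query (all table and column names here, such as SalesData, ReturnData, Customer and SalesAmount, are assumptions based on the description above):

```sql
;WITH ActionCTE AS
(   -- Hard-wire 'Sales' as the action type for sales rows,
    -- and use the Return Reason itself as the action type for return rows
    SELECT WeekEndingDate, Customer, SalesAmount, ActionType = 'Sales'
      FROM dbo.SalesData
    UNION ALL
    SELECT WeekEndingDate, Customer, ReturnAmount, ActionType = ReturnReason
      FROM dbo.ReturnData
),
PIVOTCTE AS
(   -- Pivot the dollars across the known action type values
    SELECT WeekEndingDate, Customer,
           [Sales], [Reason A], [Reason B], [Reason C], [Reason D]
      FROM ActionCTE
     PIVOT (SUM(SalesAmount)
            FOR ActionType IN ([Sales], [Reason A], [Reason B],
                               [Reason C], [Reason D])) p
)
SELECT WeekEndingDate, Customer,
       SUM([Sales])    AS [Sales],
       SUM([Reason A]) AS [Reason A],
       SUM([Reason B]) AS [Reason B],
       SUM([Reason C]) AS [Reason C],
       SUM([Reason D]) AS [Reason D]
  FROM PIVOTCTE
 GROUP BY GROUPING SETS ((WeekEndingDate, Customer),  -- detail rows
                         (WeekEndingDate),            -- weekly subtotals
                         ());                         -- grand total row
```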
However, we need to account for other reason codes. So we need to generate that entire query above as one big string, where we build the possible return reasons as a comma-separated list. I'm going to show the entire T-SQL code to generate (and execute) the desired query, and then I'll break it out into parts and explain each step. So first, here's the entire code to dynamically generate what I've got above. There are basically five steps to cover. Step 1: we know that somewhere in the mix, we need to generate a string for this part of the query: SalesAmount, [Reason A], [Reason B], [Reason C], [Reason D]. What we can do is build a temporary common table expression that combines the hard-wired "Sales Amount" column with the unique list of possible reason codes. Once we have that in a CTE, we can use the nice little trick of FOR XML PATH('') to collapse those rows into a single string, put a comma in front of each row that the query reads, and then use STUFF to replace the first instance of a comma with an empty space. This is a trick you can find in hundreds of SQL blogs. So this first part builds a string called @ActionString that we can use further down. Step 2: we also know that we'll want to SUM the generated/pivoted reason columns, along with the standard sales column. So we'll need a separate string for that, which I'll call @SUMSTRING. I'll simply use the original @ActionString, and then REPLACE the outer brackets with SUM syntax, plus the original brackets. Step 3: now the real work begins. Using that original query as a model, we want to generate the original query (starting with the UNION of the two tables), but replacing any references to pivoted columns with the strings we dynamically generated above.
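Step 1 can be sketched like this. Again, ReturnData and ReturnReason are assumed names from the example, not verified schema:

```sql
DECLARE @ActionString nvarchar(4000);

;WITH ActionList AS
(
    SELECT 'Sales Amount' AS ActionType       -- the hard-wired sales column
    UNION
    SELECT DISTINCT ReturnReason              -- whatever reason codes exist today
    FROM ReturnData
)
SELECT @ActionString =
    STUFF( ( SELECT ',[' + ActionType + ']'
             FROM ActionList
             FOR XML PATH('') ),              -- collapse the rows into one string
           1, 1, '');                         -- STUFF strips the leading comma

-- @ActionString now holds something like:
-- [Reason A],[Reason B],[Reason C],[Reason D],[Sales Amount]
```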
Also, while not absolutely required, I've also created a variable to simplify any carriage return/line feed combinations that we want to embed into the generated query (for readability). So we'll construct the entire query into a variable called @SQLPivotQuery. Step 4: we continue constructing the query, concatenating the syntax we can "hard-wire" with @ActionSelectString (which we generated dynamically to hold all the possible return reason values). Step 5: finally, we generate the final part of the pivot query, which reads from the 2nd common table expression (PIVOTCTE, from the model above) and generates the final SELECT to read from the PIVOTCTE and combine it with a 2nd read against PIVOTCTE to implement the grouping sets. Finally, we can "execute" the string using the SQL system stored procedure sp_executesql. So hopefully you can see that the process to follow for this type of effort is: determine what the final query would be, based on your current set of data and values (i.e. build a query model), then write the necessary T-SQL code to generate that query model as a string. Arguably the most important part is determining the unique set of values on which you'll PIVOT, and then collapsing them into one string using the STUFF function and the FOR XML PATH('') trick.

So what's on my mind today? Well, at least 13 items. Two summers ago, I wrote a draft BDR that focused (in part) on the role of education and the value of a good liberal arts background, not just for the software industry but for other industries as well. One of the themes of this particular BDR emphasized a pivotal and enlightened viewpoint from renowned software architect Allen Holub regarding liberal arts.
I'll (faithfully) paraphrase his message: he highlighted the parallels between programming and studying history, by reminding everyone that history is reading and writing (and I'll add, identifying patterns), and software development is also reading and writing (and again, identifying patterns). And so I wrote an opinion piece that focused on this and other related topics. But until today, I never got around to either publishing/posting it. Every so often I'd think of revising it, and I'd even sit down for a few minutes and make some adjustments to it. But then life in general would get in the way and I'd never finish it. So what changed? A few weeks ago, fellow CoDe Magazine columnist and industry leader Ted Neward wrote a piece in his regular column, Managed Coder, that caught my attention. The title of the article is "On Liberal Arts," and I highly recommend that everyone read it. Ted discusses the value of a liberal arts background, the false dichotomy between a liberal arts background and success in software development, and the need to write/communicate well. He talks about some of his own past encounters with HR/personnel management regarding his educational background. He also emphasizes the need to accept and adapt to changes in our industry, as well as the hallmarks of a successful software professional (being reliable, planning ahead, and learning to get past initial conflict with other team members). So it's a great read, as are Ted's other CoDe articles and blog entries. It also got me back to thinking about my views on this (and other topics) as well, and finally motivated me to finish my own editorial. So, better late than never, here are my current Baker's Dozen of Reflections: I have a saying: "Water freezes at 32 degrees." If you're in a training/mentoring role, you might think you're doing everything in the world to help someone, when in fact they're only feeling a temperature of 34 degrees, and therefore things aren't solidifying for them.
Sometimes it takes just a little bit more effort, or another idea/chemical catalyst, or a new perspective - which means those with prior education can draw on different sources. Water freezes at 32 degrees. Some people can maintain high levels of concentration even with a room full of noisy people. I'm not one of them: occasionally I need some privacy to think through a critical issue. Some people describe this as "you gotta learn to walk away from it." Stated another way, it's a search for the rarefied air. This past week I spent hours in a half-lit, quiet room with a whiteboard, until I fully understood a problem. It was only then that I could go talk with other developers about a solution. The message here isn't to preach how you should go about your business of solving problems, but rather for everyone to know their strengths and what works, and to use them to your advantage as much as possible. Some phrases are like fingernails on a chalkboard for me. "Use it as a teaching moment" is one. (Why is it like fingernails on a chalkboard? Because if you're in a mentoring role, you should usually be in "teaching moment" mode anyway, however subtly.) Here's another: "I can't really explain it in words, but I understand it." This might sound a bit cold, but if a person truly can't explain something in words, maybe they don't understand. Sure, a person can have a fuzzy sense of how something works - I can bluff my way through describing how a digital camera works - but the truth is that I don't really understand it all that well. There is a field of study known as epistemology (the study of knowledge). One of the fundamental bases of understanding - whether it's a camera or a design pattern - is the ability to establish context, to identify the chain of related events, the attributes of any components along the way, etc. Yes, understanding is sometimes very hard work, but diving into a topic and breaking it apart is worth the effort.
Even those who eschew certification will acknowledge that the process of studying for certification tests helps to fill gaps in knowledge. A database manager is more likely to hire a database developer who can speak extemporaneously (and effortlessly) about transaction isolation levels and triggers, as opposed to someone who sort of knows about it but struggles to describe their usage. There's another corollary here. Ted Neward recommends that developers take up public speaking, blogging, etc. I agree 100%. The process of public speaking and blogging will practically force you to start thinking about topics and breaking down definitions that you might have otherwise taken for granted. A few years ago I thought I understood the T-SQL MERGE statement pretty well. But it was only after writing about it, speaking about it, and fielding questions from others who had perspectives that never occurred to me that my level of understanding increased exponentially. I know a story of a hiring manager who once interviewed an author/developer for a contract position. The hiring manager was contemptuous of publications in general, and barked at the applicant, "So, if you're going to work here, would you rather be writing books or writing code?" Yes, I'll grant that in any industry there will be a few pure academics. But what the hiring manager missed was the opportunity for strengthening and sharpening skill sets. While cleaning out an old box of books, I came across a treasure from the 1980s: Programmers at Work, which contains interviews with a very young Bill Gates, Ray Ozzie, and other well-known names. Every interview and every insight is worth the price of the book. In my view, the most interesting interview was with Butler Lampson, who gave some powerful advice: To hell with computer literacy. It's absolutely ridiculous. Study mathematics. Learn to think. Read. Write. These things are of more enduring value.
Learn how to prove theorems: a lot of evidence has accumulated over the centuries that suggests this skill is transferable to many other things. Butler speaks the truth. I'll add to that point: learn how to play devil's advocate against yourself. The more you can reality-check your own processes and work, the better off you'll be. The great computer scientist/author Allen Holub made the connection between software development and the liberal arts - specifically, the subject of history. Here was his point: what is history? Reading and writing. What is software development? Among other things, reading and writing. I used to give my students T-SQL essay questions as practice tests. One student joked that I acted more like a law professor. Well, just like Coach Don Haskins said in the movie Glory Road, "my way is hard." I firmly believe in a strong intellectual foundation for any profession. Just like applications can benefit from frameworks, individuals and their thought processes can benefit from human frameworks as well. That's the fundamental basis of scholarship. There is a story that back in the 1970s, IBM expanded their recruiting efforts at the major universities by focusing on the best and brightest liberal arts graduates. Even then they recognized that the best readers and writers might someday become strong programmer/systems analysts. (Feel free to use that story with any HR type who insists that a candidate must have a computer science degree.) And speaking of history: if for no other reason, it's important to remember the history of product releases. If I'm doing work at a client site that's still using SQL Server 2008 or even (gasp) SQL Server 2005, I have to remember which features were implemented in which versions over time. Ever have a favorite doctor whom you liked because he/she explained things in plain English, gave you the straight truth, and earned your trust to operate on you? Those are mad skills...
and are the result of experience and HARD WORK that take years and even decades to cultivate. There are no guarantees of job success: focus on the facts, take a few calculated risks when you're sure you can see your way to the finish line, let the chips fall where they may, and never lose sight of being just like that doctor who earned your trust. Even though some days I fall short, I try to treat my clients and their data as a doctor would treat patients. (Even though a doctor makes more money.) There are many clichés I detest, but here's one I don't hate: "There is no such thing as a bad question." As a former instructor, one thing that drew my ire was hearing someone criticize another person for asking a supposedly stupid question. A question indicates a person acknowledges they have some gap in knowledge they're looking to fill. Yes, some questions are better worded than others, and some questions require additional framing before they can be answered. But the journey from forming a question to an answer is likely to generate an active mental process in others. These are all GOOD things. Many good and fruitful discussions originate with a stupid question. I work across the board in SSIS, SSAS, SSRS, MDX, PPS, SharePoint, Power BI, DAX - all the tools in the Microsoft BI stack. I still write some code from time to time. But guess what I still spend so much time doing? Writing T-SQL code to profile data as part of the discovery process. All application developers should have good T-SQL chops. Ted Neward writes (correctly) about the need to adapt to technology changes. I'll add to that the need to adapt to client/employer changes. Companies change business rules. Companies acquire other companies (or become the target of an acquisition). Companies make mistakes in communicating business requirements and specifications. Yes, we can sometimes play a role in helping to manage those changes, and sometimes we're the fly, not the windshield.
These changes sometimes cause great pain for everyone, especially the I.T. people. This is why the term "fact of life" exists: we have to deal with it. Just like no developer writes bug-free code every time, no I.T. person deals well with change every single time. One of the biggest struggles I've had in my 28 years in this industry is showing patience and restraint when changes are flying from many different directions. Here is where my prior suggestion about searching for the rarefied air can help. If you can manage to assimilate changes into your thought process, without feeling overwhelmed, odds are you'll be a significant asset. In the last 15 months I've had to deal with a huge amount of professional change. It's been very difficult at times, but I've resolved that change will be the norm, and I've tried to tweak my own habits as best I can to cope with frequent (and uncertain) change. It's hard, very hard. But as coach Jimmy Dugan said in the movie A League of Their Own: "Of course it's hard. If it wasn't hard, everyone would do it. The hard is what makes it great." A powerful message. There's been talk in the industry over the last few years about conduct at professional conferences (and conduct in the industry as a whole). Many respected writers have written very good editorials on the topic. Here's my input, for what it's worth. It's a message to those individuals who have chosen to behave badly: Dude, it shouldn't be that hard to behave like an adult. A few years ago, CoDe Magazine Chief Editor Rod Paddock made some great points in an editorial about Codes of Conduct at conferences. It's definitely unfortunate to have to remind people of what they should expect out of themselves. But the problems go deeper. A few years ago I sat on a five-person panel (3 women, 2 men) at a community event on Women in Technology. The other male on the panel stated that men succeed in this industry because the Y chromosome gives men an advantage in areas of performance.
The individual who made these remarks is a highly respected technology expert, not some bozo making dongle remarks at a conference or sponsoring a programming contest where first prize is a date with a bikini model. Our world is becoming increasingly polarized (just watch the news for five minutes), sadly with emotion often winning over reason. Even in our industry, I recently heard someone in a position of responsibility bash software tool XYZ based on a ridiculous premise, and then give false praise to a competing tool. So many opinions, so many arguments, but here's the key: before taking a stand, do your homework and get the facts. Sometimes both sides are partly right, or wrong. There's only one way to determine which: get the facts. As Robert Heinlein wrote, "Facts are your single clue. Get the facts!" Of course, once you get the facts, the next step is to express them in a meaningful and even compelling way. There's nothing wrong with using some emotion in an intellectual debate, but it IS wrong to replace an intellectual debate with emotion and a false agenda. A while back I faced resistance to SQL Server Analysis Services from someone who claimed the tool couldn't do feature XYZ. The specifics of XYZ don't matter here. I spent about two hours that evening working up a demo to cogently demonstrate that the original claim was false. In that example, it worked. I can't swear it will always work, but to me that's the only way. I'm old enough to remember life as a teen in the 1970s. Back then, when a person lost his/her job, it was (often) because the person just wasn't cutting the mustard. Fast-forward to today: a sad fact of life is that even talented people are now losing their jobs because of changing economic conditions. There's never a fool-proof method for immunity, but now more than ever it's critical to provide a high level of what I call the Three Vs (value, versatility, and velocity) for your employer/clients.
I might not always like working weekends or very late at night to do the proverbial work of two people, but then I remember there are folks out there who would give anything to be working at 1 AM to feed their families and pay their bills. Always be yourself - your BEST self. Some people need inspiration from time to time. Here's mine: the great sports movie Glory Road. If you've never watched it - and even if you're not a sports fan - I can almost guarantee you'll be moved like never before. And I'll close with this. If you need some major motivation, I'll refer to a story from 2006. Jason McElwain, a high school student with autism, came off the bench to score twenty points in a high school basketball game in Rochester, New York. Here's a great YouTube video. His mother said it all: "This is the first moment Jason has ever succeeded and is proud of himself. I look at autism as the Berlin Wall. He cracked it."

To anyone who wanted to attend my session at today's SQL Saturday event in DC: I apologize that the session had to be cancelled. I hate to make excuses, but a combination of getting back late from Detroit (client trip), a car that's dead (blown head gasket), and some sudden health issues with my wife have made it impossible for me to attend. Back in August, I did the same session (ColumnStore Index) for PASS as a webinar. You can go to this link to access the video (it'll be streamed, as all PASS videos are streamed). The link does require that you fill out your name and email address, but that's it. And then you can watch the video. Feel free to contact me if you have questions, at kgoffkevinsgoff.

November 15, 2013. Getting started with Windows Azure and creating SQL Databases in the cloud can be a bit daunting, especially if you've never tried out any of Microsoft's cloud offerings. Fortunately, I've created a webcast to help people get started. This is an absolute beginner's guide to creating SQL Databases under Windows Azure. It assumes zero prior knowledge of Azure.
You can go to the BDBI Webcasts section of this website and check out my webcast (dated 11/10/2013). Or you can just download the webcast videos right here: here is part 1 and here is part 2. You can also download the slide deck here.

November 03, 2013. Topic this week: SQL Server Snapshot Isolation Levels, added in SQL Server 2005. To this day, there are still many SQL developers - many good SQL developers - who either aren't aware of this feature or haven't had time to look at it. Hopefully this information will help. A companion webcast will be uploaded in the next day; look for it in the BDBI Webcasts section of this blog.

October 26, 2013. I'm going to start a weekly post of T-SQL tips, covering many different versions of SQL Server over the years. Here's a challenge many developers face. I'll whittle it down to a very simple example, but one where the pattern applies to many situations. Suppose you have a stored procedure that receives a single vendor ID and updates the freight for all orders with that vendor ID (the operators here are reconstructed from the original): create procedure dbo.UpdateVendorOrders @VendorID int as update Purchasing.PurchaseOrderHeader set Freight = Freight + 1 where VendorID = @VendorID. Now, suppose we need to run this for a set of vendor IDs. Today we might run it for three vendors, tomorrow for five vendors, the next day for 100 vendors. We want to pass in the vendor IDs. If you've worked with SQL Server, you can probably guess where I'm going with this. The big question is: how do we pass a variable number of vendor IDs? Or, stated more generally, how do we pass an array, or a table of keys, to a procedure? Something along the lines of: exec dbo.UpdateVendorOrders @SomeListOfVendors. Over the years, developers have come up with different methods. Going all the way back to SQL Server 2000, developers might create a comma-separated list of vendor keys, and pass the CSV list as a varchar to the procedure.
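A minimal sketch of that CSV-shredding pattern might look like this. The procedure name, the 8000-character limit, and the Freight arithmetic are illustrative assumptions, and on SQL 2000 the split was typically done exactly this way, with CHARINDEX in a loop:

```sql
CREATE PROCEDURE dbo.UpdateVendorOrdersCsv
    @VendorList varchar(8000)        -- e.g. '1492,1500,1516'
AS
BEGIN
    DECLARE @Vendors TABLE (VendorID int);
    DECLARE @pos int;

    -- shred the CSV into a table variable, one key at a time
    WHILE LEN(@VendorList) > 0
    BEGIN
        SET @pos = CHARINDEX(',', @VendorList);
        IF @pos = 0
            SET @pos = LEN(@VendorList) + 1;   -- last (or only) item
        INSERT INTO @Vendors (VendorID)
            VALUES (CAST(LEFT(@VendorList, @pos - 1) AS int));
        SET @VendorList = SUBSTRING(@VendorList, @pos + 1, 8000);
    END

    -- then join the shredded keys to the orders table
    UPDATE poh
       SET Freight = Freight + 1
      FROM Purchasing.PurchaseOrderHeader AS poh
      JOIN @Vendors AS v ON poh.VendorID = v.VendorID;
END
```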
The procedure would shred the CSV varchar variable into a table variable and then join the PurchaseOrderHeader table to that table variable (to update the Freight for just those vendors in the table). I wrote about this in CoDe Magazine back in early 2005 (code-magazinearticleprint. aspxquickid0503071ampprintmodetrue, Tip 3). In SQL Server 2005, you could actually create an XML string of the vendor IDs, pass the XML string to the procedure, and then use XQuery to shred the XML into a table variable. I also wrote about this in CoDe Magazine back in 2007 (code-magazinearticleprint. aspxquickid0703041ampprintmodetrue, Tip 12). Also, some developers will populate a temp table ahead of time, and then reference the temp table inside the procedure. All of these certainly work, and developers have had to use these techniques, because for years there was NO WAY to directly pass a table to a SQL Server stored procedure - until SQL Server 2008, when Microsoft implemented the table type. This FINALLY allowed developers to pass an actual table of rows to a stored procedure. Now, it does require a few steps. We can't just pass any old table to a procedure; it has to be a pre-defined type (a template). So let's suppose we always want to pass a set of integer keys to different procedures. One day it might be a list of vendor keys; the next day it might be a list of customer keys. So we can create a generic table type of keys, one that can be instantiated for customer keys, vendor keys, etc.: CREATE TYPE IntKeysTT AS TABLE ( IntKey int NOT NULL ). So I've created a table type called IntKeysTT. It's defined to have one column: an IntKey. Now suppose I want to load it with the vendors who have a CreditRating of 1, and then take that list of vendor keys and pass it to a procedure: DECLARE @VendorList IntKeysTT INSERT INTO @VendorList SELECT BusinessEntityID from Purchasing.
Vendor WHERE CreditRating = 1. So, I now have a table type variable - not just any table variable, but a table type variable (which I populated the same way I would populate a normal table variable). It's in server memory (unless it needs to spill to tempdb) and is therefore private to the connection/process. OK, can I pass it to the stored procedure now? Well, not yet: we need to modify the procedure to receive a table type. Here's the code: create procedure dbo.UpdateVendorOrdersFromTT @IntKeysTT IntKeysTT READONLY as update Purchasing.PurchaseOrderHeader set Freight = Freight + 1 FROM Purchasing.PurchaseOrderHeader JOIN @IntKeysTT TempVendorList ON PurchaseOrderHeader.VendorID = TempVendorList.IntKey. Notice how the procedure receives IntKeysTT as a table type (again, not just a regular table, but a table type). It also receives it as a READONLY parameter: you CANNOT modify the contents of this table type inside the procedure. Usually you won't want to; you simply want to read from it. Well, now you can reference the table type as a parameter and then utilize it in the JOIN statement, as you would any other table variable. So there you have it. A bit of work to set up the table type, but in my view, definitely worth it. Additionally, if you pass values from .NET, you're in luck: you can pass an ADO.NET data table (with the same TableName property as the name of the table type) to the procedure. For developers who have had to pass CSV lists, XML strings, etc. to a procedure in the past, this is a huge benefit. Finally, I want to talk about another approach people have used over the years: SQL Server cursors. At the risk of sounding dogmatic, I strongly advise against cursors, unless there is just no other way. Cursors are expensive operations on the server. For instance, someone might use a cursor approach and implement the solution this way: DECLARE @VendorID int DECLARE dbcursor CURSOR FAST_FORWARD FOR SELECT BusinessEntityID from Purchasing.
Vendor where CreditRating = 1; FETCH NEXT FROM dbcursor INTO @VendorID; WHILE @@FETCH_STATUS = 0 BEGIN EXEC dbo.UpdateVendorOrders @VendorID; FETCH NEXT FROM dbcursor INTO @VendorID; END. The best thing I'll say about this is that it works. And yes, getting something to work is a milestone. But getting something to work and getting something to work acceptably are two different things. Even if this process only takes 5-10 seconds to run, in those 5-10 seconds the cursor utilizes SQL Server resources quite heavily. That's not a good idea in a large production environment. Additionally, the more rows in the cursor to fetch, and the more executions of the procedure, the slower it will be. When I ran both processes (the cursor approach and then the table type approach) against a small sampling of vendors (5 vendors), the processing times were 260 ms and 60 ms, respectively. So the table type approach was roughly 4 times faster. But then when I ran the two scenarios against a much larger number of vendors (84 vendors), the difference was staggering: 6701 ms versus 207 ms, respectively. So the table type approach was roughly 32 times faster. Again, the CURSOR approach is definitely the least attractive approach. Even in SQL Server 2005, it would have been better to create a CSV list or an XML string (provided the number of keys could be stored in a scalar variable). But now that there is a table type feature in SQL Server 2008, you can achieve the objective with a feature that's more closely modeled to the way developers are thinking - specifically, how do we pass a table to a procedure? Now we have an answer. Hope you find this feature helpful. Feel free to post a comment.

SQL Server IO Performance: Everything You Need To Consider. SQL Server IO performance is crucial to overall performance. Access to data on disk is much slower than access to data in memory, so getting the most out of local disk and SAN is essential.
There is a lot of advice on the web and in books about SQL Server IO performance, but I haven't found a single source listing everything to consider. This is my attempt to bring all the information together in one place. So here is a list of everything I can think of that can impact IO performance. I have ordered it starting at the physical disks and moving up the wire to the server, and finally to the code and database schema.

Failed Disk: When a drive fails in a disk array it will need to be replaced. The impact on performance before replacement depends on the storage array and RAID configuration used. RAID 5 and RAID 6 use distributed parity, and this parity is used to calculate the reads when a disk fails, so read performance loses the advantage of reading from multiple disks. This is also true, although to a lesser degree, on RAID 1 (mirrored) arrays: reads lose the advantage of being spread across both copies for data on the failed disk, and writes may be slightly slower due to the increase in average seek time.

Write Cache: When a transaction is committed, the write to the transaction log has to complete before the transaction is marked as committed. This is essential to ensure transactional integrity. It used to be that write cache was not recommended, but a lot of the latest storage arrays have battery-backed caches that are fully certified for use with SQL Server. If you have the option to vary the distribution of memory between read and write cache, try to allocate as much as possible to the write cache. This is because SQL Server performs its own read caching via the buffer pool, so any additional read cache on the disk controller has no benefit.

Thin Provisioning: Thin provisioning is a technology provided by some SANs whereby the actual disk storage used is just enough for the data, while appearing to the server to be full sized, with loads of free space.
Where the total disk allocated to all servers exceeds the amount of physical storage, this is known as over-provisioning. Some SAN vendors try to claim that performance is not affected, but that's not always true. I saw this issue recently on a 3PAR array: sequential reads were significantly slower on thin provisioned LUNs, and switching to thick provisioned LUNs more than doubled the sequential read throughput.

Where Are The Disks? Are they where you think they are? It is perfectly possible to be connected to a storage array, but for the IO requests to pass through that array to another. This is sometimes done as a cheap way to increase disk space - using existing hardware that is being underutilized is less costly than purchasing more disks. The trouble is that this introduces yet another component into the path and is detrimental to performance - and the DBA may not even be aware of it. Make sure you know how the SAN is configured.

Smart Tiering: This is called different things by different vendors. The storage array will consist of two or more types of disk, of varying performance and cost. There are the slower 10K disks - these are the cheapest. Then you have the 15K disks - faster but more expensive. And then there may be some super-fast SSDs - even more expensive, although the price is coming down. Smart tiering migrates data between tiers so that more commonly accessed data sits on the faster storage while less commonly used data drops down to the slower storage. This is OK in principle, but you are the DBA: you should already know which data needs to be accessed quickly and which can be slower. Do you really want an algorithm making this decision for you? And regular maintenance tasks can confuse the whole thing anyway. Consider a load of index rebuilds running overnight.
Let's suppose the last database to be processed is an archive database - do you want this to be hogging the SSDs when the users log in first thing in the morning, while the mission-critical database is languishing down in the bottom tier? This is an oversimplification, of course; the tiering algorithms are more sophisticated than that, but my point stands. You should decide the priorities for your SQL Server data. Don't let the SAN vendors (or storage admins) persuade you otherwise.

Storage Level Replication: Storage level replication is a disaster recovery feature that copies block-level data from the primary SAN to another - often located in a separate data center. The SAN vendors claim no impact on performance, and this is true if correctly configured. But I have seen poorly configured replication have a serious impact on performance. One client suffered a couple of years of poor IO performance. When I joined them I questioned whether the storage replication was responsible. I was told not to be so silly - the vendor had checked and it was not the problem - it must be SQL Server itself! A few months later I was contacted again: they had turned off the replication while in the process of moving to a new data center, and guess what? Write latency improved by an order of magnitude. Let me repeat that this was caused by poor configuration, and most storage replication does not noticeably affect performance. But it's another thing to consider if you're struggling with SQL Server IO performance.

Host Bus Adapters: Check that the SAN and HBA firmware are compatible. Sometimes when a SAN is upgraded, the HBAs on the servers are overlooked. This can result in irregular errors, or even make the storage inaccessible. Have a look at the HBA queue depth. A common default is 32, which may not be optimal. Some studies have shown that increasing this to 64 or higher can improve performance. It could also make things worse, depending on workload, SAN make and model, disk layout, etc.
So test thoroughly if you can. Some storage admins discourage modifying HBA queue depth as they think everyone will want the same on their servers and the storage array will be swamped. And they're right, too! Persuade them that it is just for you. Promise not to tell anyone else. Whatever. Just get your extra queue depth if you think it will benefit performance.

Too Many Servers

When a company forks out a small fortune on a storage area network, they want to get value for money. So naturally, every new server that comes along gets hooked up so it can make use of all that lovely disk space. This is fine until a couple of servers start issuing a lot of IO requests and other users complain of a performance slowdown. This is something I see repeatedly at so many clients, and there is no easy solution. The company doesn't want, or can't afford, to purchase another SAN. If you think this is a problem for you, put a schedule together of all jobs - across all servers - and try to reschedule some so that workload is distributed more evenly.

Partition Alignment and Formatting

I will briefly mention partition alignment, although Windows 2008 uses a default offset of 1MB so this is less of an issue than it used to be. I am also not convinced that a lot of modern SANs benefit much from the practice. I performed a test on an EVA a few years ago and found just a 2% improvement. Nevertheless, a few percent is still worth striving for. Unfortunately you will have to tear down your volumes and recreate your partitions if this is to be fixed on an existing system. This is probably not worth the hassle unless you are striving for every last inch of performance. Formatting is something else that should be performed correctly. SQL Server stores data in 8KB pages, but these are retrieved in blocks of 8, called extents. If the disks are formatted with 64KB allocation units, this can have a significant performance benefit.
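As a sketch, you can check the current allocation unit size of a volume, or format a new one with 64KB units, from an elevated command prompt (the drive letter and label here are placeholders - and format destroys the volume's contents, so only ever run it on a new, empty volume):

```
rem Check "Bytes Per Cluster" on an existing volume (65536 = 64KB)
fsutil fsinfo ntfsinfo F:

rem Format a new volume with a 64KB allocation unit size
format F: /FS:NTFS /A:64K /V:SQLData /Q
```

If `fsutil` reports 4096 bytes per cluster, the volume was formatted with the Windows default rather than the 64KB units that suit SQL Server's extent-sized reads.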
Multipathing

If you are not using local disk then you should have some redundancy built into your storage subsystem. If you have a SAN you have a complicated network of HBAs, fabric, switches and controllers between SQL Server and the disks. There should be at least two HBAs, switches, etc. and these should all be connected together in such a way that there are multiple paths to the disks. This redundancy is primarily for high availability, but if the multipathing has been configured as active/active you may see performance benefits as well.

Network Attached Storage

Since SQL Server 2008 R2 it has been possible to create, restore or attach a database on a file share. This has a number of possible uses, and particularly for dev/test environments it can make capacity management easier, and make moving databases between servers much quicker. The question to be asked, though, is "Do you really want this in production?" Performance will not be as good as local or SAN drives. There are additional components in the chain, so reliability may not be as good. And by using the network, your data uses the same infrastructure as all the other TCP/IP traffic, which again could impact performance. But there's good news! While availability is still a worry, improvements in SMB on Windows Server 2012 (and via an update to Windows Server 2008 R2) have made it significantly faster. I saw a quote from a Microsoft employee somewhere that claimed 97% of the performance of local storage. I can't find the quote now, and I don't remember if he was measuring latency or throughput.

Disk Fragmentation

How often do you use the Disk Defragmenter tool on your PC to analyze and defragment your C: drive? How often do you check fragmentation on the disks on your SQL Servers? For most people that is nowhere near as often, I'll bet. Yet volume fragmentation is just as detrimental to SQL Server performance as it is to your PC.
You can reduce the likelihood of disk fragmentation in a number of ways:

Pre-size data and log files, rather than rely on auto-growth
Set auto-growth increments to sensible values instead of the default 10%
Avoid shrinking data and log files
Never, ever use the autoshrink database option
Ensure disks are dedicated to SQL Server and not shared with other applications

You can check fragmentation using the same tool as on your PC. Disk Defragmenter is available on all server versions of Windows. Another way to check is via the Win32_Volume class in WMI. This bit of PowerShell reports the file percent fragmentation for all volumes on a given server. If you have significant fragmentation there are a couple of ways to fix it. My preferred option is as follows, but requires some downtime:

Stop the SQL services
Backup the files on the disk (especially mdf, ndf and ldf files - better safe than sorry)
Run the Windows Disk Defragmenter tool
Start the SQL services
Check the error log to ensure no errors during startup
Run CHECKDB against all databases (except tempdb). I've never seen the defrag tool cause corruption, but you can't be too careful

Another option that doesn't require downtime is to use a third party tool such as Diskeeper. This can be very effective at fixing and preventing disk fragmentation, but it costs money and uses a filter driver - see my comments below.

Filter Drivers

A filter driver is a piece of software that sits between an IO request and the write to disk. It allows the write to be examined and rejected, modified or audited. The most common type of filter driver is installed by anti-virus software. You do not want anti-virus software checking every single write to your database files. You also don't want it checking your backups either, or writes to the error log, or default trace. If you have AV software installed, you can specify exclusions.
Exclude all folders used by SQL Server, plus the drives used by data and log files, plus the folders used for backups. Even better is to turn off online AV checking, and schedule a scan at a quiet time.

OLTP and BI on the Same Server

It is rare to find a system that is purely OLTP. Most will have some sort of reporting element as well. Unfortunately, the two types of workload do not always coexist happily. I've been reading a lot of articles by Joe Chang, and in one article he explains why this is the case. Essentially, OLTP query plans retrieve rows in small batches (less than a threshold of 25 rows) and these IO requests are handled synchronously by the database engine, meaning that they wait for the data to be retrieved before continuing. Large BI workloads and reporting queries, often with parallel plans, issue asynchronous IO requests and take full advantage of the HBA's ability to queue requests. As a result, the OLTP requests have to queue up behind the BI requests, causing OLTP performance to degrade significantly.

Auto-grow and Instant File Initialization

It is good to have auto-grow enabled, just as a precaution, although you should also pre-size data and log files so that it is rarely needed. However, what happens if a data file grows and you don't have instant file initialization enabled? Especially if the auto-grow is set too big. All IO against the file has to wait for the file growth to complete, and this may be reported in the infamous "I/O requests taking longer than 15 seconds to complete" message in the error log. Instant initialization won't help with log growth, so make sure log auto-growth increments are not too high. For more information about instant file initialization and how to enable it, see this link: Database File Initialization. And while on the subject of auto-grow, see the section on proportional fill, below.
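As a quick check, something along these lines lists the size and growth setting of every database file on the instance, making it easy to spot files still on percentage growth (including the 10% default) or with an oversized increment - what counts as "too big" is a judgment call for your environment:

```sql
-- size and growth are stored in 8KB pages; growth is a percentage
-- when is_percent_growth = 1
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_file_name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.master_files
ORDER BY database_name, logical_file_name;
```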
Transaction Log Performance

How long do your transaction log writes take? Less than 1ms? More than 5ms? Look at virtual file stats, performance counters, or the WRITELOG wait time to see if log write latency is an issue for you. Writes to the transaction log are sequential, and so the write head on the disk should ideally be where it was from the last log write. This means no seek time, and blazingly fast write times. And since a transaction cannot commit until the log has hardened to disk, you rely on these fast writes for a performant system. Advice for years has been for the transaction log for each database to be on its own disk. And this advice is still good for local disk, and for some storage arrays. But now that a lot of SANs have their own battery-backed write cache, this advice is not as critical as it used to be. Provided the cache is big enough to cope with peak bursts of write activity (and see my earlier comments about allocating more cache to writes than to reads) you will get very low latency. So what if you don't have the luxury of a mega-bucks SAN and loads of write cache? Then the advice that's been around since the 1990s is still valid:

One transaction log file per database, on its own drive
RAID 1, RAID 10 or RAID 0+1

So assuming you are happy with your log file layout, what else could be slowing down your log writes?

Virtual Log Files

Although a transaction log is written to sequentially, the file itself can become fragmented internally. When it is first created it consists of several chunks called virtual log files. Every time it is grown, whether manually or automatically, several more virtual log files are added. A transaction log that grows multiple times can end up with thousands of virtual log files. Having too many VLFs can slow down logging and may also slow down log backups. You also need to be careful to avoid VLFs that are too big. An inactive virtual log file is not cleared until the end is reached and the next one starts to be used.
For the full recovery model, this doesn't happen until the next log backup. So a log backup will suddenly have a lot more work to do, and may cause performance problems while it takes place. The answer for a big transaction log is to set an initial size of at most 8000MB, and then manually grow in chunks of 8000MB up to the target size. This results in a maximum VLF size of 512MB, without creating an excessively large number of VLFs. Note: this advice is for manual growth only. Do not auto-grow by 8000MB! All transactions in the database will stop while the extra space is initialised. Auto-grow should be much smaller - but try to manually size the file so that auto-grow is unlikely to be needed.

Log Manager Limits

The database engine sets limits on the amount of log that can be in flight at any one time. This is a per-database limit, and depends on the version of SQL Server being used. SQL Server limits the number of outstanding IOs and MB per second. The limits vary with version and whether 32 bit or 64 bit. See Diagnosing Transaction Log Performance Issues and Limits of the Log Manager for more details. This is why the write latency should be as low as possible. If it takes 20ms to write to the transaction log, and you are limited to 32 IOs in flight at a time, that means a maximum of 1600 transactions per second, well below what a lot of high volume OLTP databases require. This also emphasises the importance of keeping transaction sizes small, as one very large transaction could conceivably hold up other transactions while it commits. If you think these limits are affecting log write performance in your databases there are several ways to tackle the problem:

Work on increasing log write performance
If you have minimally logged operations you can switch the database to use the BULK_LOGGED recovery model. Careful though - a log backup containing a minimally logged operation has to be restored in full. Point in time restore is not possible.
Split a high volume database into two or more databases, as the log limits apply per database

Non-Sequential Log Activity

There are actions performed by the database engine that move the write head away from the end of the log file. If transactions are still being committed while this happens, you have a seek overhead and log performance gets worse. Operations that read from the log files include rollback of large transactions, log backups and replication (the log reader agent). There is little you can do about most of these, but avoiding large rollbacks is something that should be tackled at the design and development stage of an application.

Proportional Fill

Very active tables can be placed in a file group that has multiple data files. This can improve read performance if they are on different physical disks, and it can improve write performance by limiting contention in the allocation pages (especially true for tempdb). You lose some of the benefit, though, if you don't take advantage of the proportional fill algorithm. Proportional fill is the process by which the database engine tries to allocate new pages in proportion to the amount of free space in each data file in the file group. To get the maximum benefit make sure that each file is the same size, and is always grown by the same increment. This applies to both manual and auto growth. One thing to be aware of is how the auto growth works. SQL Server does its best to fill the files at the same rate, but one will always fill up just before the others, and this file will then auto grow on its own. This then gets more new page allocations than the others and becomes a temporary hotspot until the others also auto grow and catch up. This is unlikely to cause problems for most databases, although for tempdb it may be more noticeable. Trace flag 1117 causes all data files in a file group to grow together, so is worth considering if this is an issue for you.
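A sketch of a file group set up to get the most from proportional fill - equal sizes and equal growth increments on every file (the database, file names, paths and sizes here are all hypothetical):

```sql
ALTER DATABASE MyDB ADD FILEGROUP HotData;

-- Same SIZE and FILEGROWTH on every file, ideally on different physical disks
ALTER DATABASE MyDB ADD FILE
    (NAME = HotData1, FILENAME = 'E:\Data\HotData1.ndf',
     SIZE = 4096MB, FILEGROWTH = 512MB)
TO FILEGROUP HotData;

ALTER DATABASE MyDB ADD FILE
    (NAME = HotData2, FILENAME = 'F:\Data\HotData2.ndf',
     SIZE = 4096MB, FILEGROWTH = 512MB)
TO FILEGROUP HotData;
```

Because both files start the same size and grow by the same amount, the proportional fill algorithm spreads new allocations evenly across them.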
Personally I would rather manually size the files so that auto growth isn't necessary.

tempdb Configuration

Let's start with a few things that everybody agrees on:

tempdb files should be placed on the fastest storage available. Local SSD is ideal, and from SQL Server 2012 this is even possible on a cluster
Pre-size the data and log files, as auto growth may cause performance issues while it occurs
New temporary objects are created all the time, so contention in the GAM, SGAM and PFS pages may be an issue in some environments

And now some differences of opinion:

There is loads of advice all over the web to create one tempdb data file per core to reduce allocation contention. Paul Randal disagrees (A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core). He says that too many files can actually make things worse. His solution is to create fewer files and to increase only if necessary
There is more advice, often repeated, to separate tempdb files from other databases and put them on their own physical spindles. Joe Chang disagrees and has a very good argument for using the common pool of disks (Data, Log and Temp file placement)

I'll leave you to decide what to do.

AutoShrink

The AutoShrink database option has been around ever since I started using SQL Server, causing lots of performance problems for people who have enabled it without fully realising what it does. Often a third party application will install a database with this option enabled, and the DBA may not notice it until later. So why is it bad? Two reasons:

It is always used in conjunction with auto grow, and the continuous cycle of grow-shrink-grow causes a huge amount of physical disk fragmentation. I've already covered that topic earlier in this article
While it performs the shrink there is a lot of additional IO, which slows down the system for everything else

Disable it. Allocate enough space for the data and log files, and size them accordingly.
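Disabling the option is a one-line ALTER DATABASE per affected database (MyDB is a placeholder), and sys.databases will tell you where it is still switched on:

```sql
ALTER DATABASE MyDB SET AUTO_SHRINK OFF;

-- Find any databases that still have autoshrink enabled
SELECT name
FROM sys.databases
WHERE is_auto_shrink_on = 1;
```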
And don't forget to fix all that fragmentation while you're at it.

Insufficient Memory

This is an article about SQL Server IO performance, not memory. So I don't want to cover it in any detail here - that is a subject for a different article. I just want to remind you that SQL Server loves memory - the more the better. If your entire database(s) fits into memory you'll have a much faster system, bypassing all that slow IO. Lack of memory can lead to dirty pages being flushed to disk more often to make space for more pages being read. Lack of memory can also lead to increased tempdb IO, as more worktables for sort and hash operations have to spool to disk. Anyway, the point of this section is really to make one statement: fill your servers with as much memory as you can afford, and as much as the edition of SQL Server and Windows can address. SQL Server 2014 has a new feature allowing some tables to be retained in memory, and accessed via natively compiled stored procedures. Some redesign of some of your existing code may be needed to take advantage of this, but it looks like a great performance boost for those OLTP systems that start to use it.

High Use of tempdb

tempdb can be a major consumer of IO and may affect overall performance if used excessively. It is worth looking at the various reasons for its use, and examining your system to ensure you have minimized these as far as possible.

User-created temporary objects

The most common of these are temporary tables, table variables and cursors. If there is a high rate of creation this can lead to allocation page contention, although increasing the number of tempdb data files may partially alleviate this. Processes creating very large temporary tables or table variables are a big no-no, as these can cause a lot of IO.

Internal Objects

The database engine creates worktables in tempdb for handling hash joins, sorting and spooling of intermediate result sets.
When sort operations or hash joins need more memory than has been granted they spill to disk (using tempdb) and you will see Hash Warnings and Sort Warnings in the default trace. I originally wrote a couple of paragraphs about how and why this happens and what you can do to prevent it, but then I found this post that explains it much better - Understanding Hash, Sort and Exchange Spill Events.

Version Store

The third use of tempdb is for the version store. This is used for row versioning. Row versions are created when snapshot isolation or the read committed snapshot option is used. They are also created during online index rebuilds for updates and deletes made during the rebuild, and for handling data modifications to multiple active result sets (MARS). A poorly written application (or rogue user) performing a large update that affects many thousands of rows when a row versioning based isolation level is in use may cause rapid growth in tempdb and adversely impact IO performance for other users.

Table and Index Scans

A table scan is a scan of a heap. An index scan is a scan of a clustered or non-clustered index. Both may be the best option if a covering index does not exist and a lot of rows are likely to be retrieved. A clustered index scan performs better than a table scan - yet another reason for avoiding heaps! But what causes a scan to be used in the first place, and how can you make a seek more likely?

Out of date statistics

Before checking indexes and code, make sure that statistics are up to date. Enable "auto create statistics". If "auto update statistics" is not enabled make sure you run a manual statistics update regularly. This is a good idea even if "auto update statistics" is enabled, as the threshold of approximately 20% of changed rows before the auto update kicks in is often not enough, especially where new rows are added with an ascending key.

Index Choice

Sometimes an existing index is not used.
Have a look at improving its selectivity, possibly by adding additional columns, or modifying the column order. Consider whether a covering index could be created. A seek is more likely to be performed if no bookmark lookups will be needed. See these posts on the "tipping point" by Kimberly Tripp: The Tipping Point.

Inefficient TSQL

The way a query is written can also result in a scan, even if a useful index exists. Some of the reasons for this are: Non-sargable expressions in the WHERE clause. "Sarg" means Search ARGument. So move calculations away from the columns and onto the constants instead. For example, this will not use the index on OrderDate:

WHERE DATEADD(DAY, 1, OrderDate) > GETDATE()

Whereas this will use an index if it exists (and it is selective enough):

WHERE OrderDate > DATEADD(DAY, -1, GETDATE())

Implicit conversions in a query may also result in a scan. See this post by Jonathan Kehayias: Implicit Conversions that cause Index Scans.

Bad Parameter Sniffing

Parameter sniffing is a good thing. It allows plan re-use and improves performance. But sometimes it results in a less efficient execution plan for some parameters.

Index Maintenance

Every index has to be maintained. I'm not talking about maintenance plans, but about the fact that when rows are inserted, deleted and updated, the non-clustered indexes also have to be changed. This means additional IO for each index on a table. So it is a mistake to have more indexes than you need. Check that all indexes are being used. Check for duplicates and redundant indexes (where the columns in one are a subset of the columns in another). Check for indexes where the first column is identical but the rest are not - sometimes these can be merged. And of course, test, test, test.

Index Fragmentation

Index fragmentation affects IO performance in several ways.
Range scans are less efficient, and less able to make use of read-ahead reads
Empty space created in the pages reduces the density of the data, meaning more read IO is necessary
The fragmentation itself is caused by page splits, which means more write IO

There are a number of things that can be done to reduce the impact of fragmentation, or to reduce the amount of fragmentation:

Rebuild or reorganize indexes regularly
Specify a lower fill factor so that page splits occur less often (though not too low, see below)
Change the clustered index to use an ascending key so that new rows are appended to the end, rather than inserted in a random place in the middle

Forwarded Records

When a row in a heap is updated and requires more space, it is copied to a new page. But non-clustered indexes are not updated to point to the new page. Instead, a pointer is added to the original page to show where the row has moved to. This is called a forwarding pointer, and there could potentially be a long chain of these pointers to traverse to find the eventual data. Naturally, this means more IO. A heap cannot be defragmented by rebuilding the index (there isn't one). One way to do this is to create a clustered index on the heap, and then drop it afterwards (from SQL Server 2008 onwards, ALTER TABLE ... REBUILD can also rebuild a heap). Be aware that creating and dropping a clustered index will cause all non-clustered indexes to be rebuilt twice - once for the new clustered index, and again when it is dropped. If there are a lot of these it is a good idea to drop the non-clustered indexes first, and recreate them afterwards. Better still is to avoid heaps where possible. I accept there may be cases where they are the more efficient choice (inserting into archive tables, for example), but always consider whether a clustered index would be a better option - it usually is.

Wasted Space

In an ideal world every data page on disk (and in memory) would be 100% full. This would mean the minimum of IO is needed to read and write the data.
In practice, there is wasted space in nearly all pages - sometimes a very high percentage - and there are a lot of reasons why this occurs.

Low fill factor

I've mentioned fill factor already. If it is too high, and page splits are occurring when rows are inserted or updated, it is sensible to rebuild the index with a lower fill factor. However, if the fill factor is too low you may have a lot of wasted space in the database pages, resulting in more IO and memory use. This is one of those "suck it and see" scenarios. Sometimes a compromise is needed.

Page splits

This is also discussed above. But as well as fragmentation, page splits can also result in wasted space if the empty space is not reused. The solution is to defragment by rebuilding or reorganizing indexes regularly.

Wasteful Choice of Data Types

Use the smallest data types you can. And try to avoid the fixed length data types, like CHAR(255), unless you regularly update to the longest length and want to avoid page splits. The reasoning is simple. If you only use 20 characters out of 200, that is 90% wasted space, and more IO as a result. The higher the density of data per page the better. Lazy thinking might make developers create AddressLine1, AddressLine2, etc. as CHAR(255), because they don't actually know what the longest should be. In this case, either do some research, find out that the longest is 50 characters (for example) and reduce them to CHAR(50), or use a variable length data type.

Schema Design

I've already mentioned choice of data types above, but there are other schema design decisions that can affect the amount of IO generated by an application database. The most common one is designing tables that are too wide. I sometimes see a table with 20, 30, 50, even 100 columns. This means fewer rows fit on a page, and for some extreme cases there is room for just one row per page - and often a lot of wasted space as well (if the row is just slightly wider than half a page, that's 50% wasted).
If you really do need 50 columns for your Customer table, ask yourself how many of these are regularly accessed. An alternative is to split into two tables: Customer, with just a few of the commonly used columns, and CustomerDetail with the rest. Of course, the choice of which columns to move is important. You don't want to start joining the tables for every query as that defeats the object of the exercise.

Page or Row Compression

Compression is another way of compacting the data onto a page to reduce disk space and IO. Use of row or page compression can dramatically improve IO performance, but CPU usage does increase. As long as you are not already seeing CPU bottlenecks, compression may be an option to consider. Be aware that compression is an Enterprise edition feature only.

Backup Compression

Since SQL Server 2008 R2, backup compression has been available on Standard edition as well as Enterprise. This is a major benefit and I recommend that it be enabled on all instances. As well as creating smaller backups, it is also quicker and means less write IO. The small increase in CPU usage is well worth it. Enable it by default so that if someone sets off an ad hoc backup it will have minimal IO impact.

Synchronous Mirroring / AlwaysOn

High safety mode in database mirroring, or synchronous commit mode in AlwaysOn, both emphasise availability over performance. A transaction on the mirroring principal server or primary replica does not commit until it receives a message back from the mirror or secondary replica that the transaction has been hardened to the transaction log. This increases transactional latency, particularly when the servers are in different physical locations.

Resource Governor in 2014

Up until and including SQL Server 2012, Resource Governor has only been able to throttle CPU and memory usage. Finally the ability to include IO in a resource pool has been added to SQL Server 2014.
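As a sketch of the SQL Server 2014 syntax (the pool and group names are hypothetical, and the IOPS figure is an arbitrary example to be tuned for your storage):

```sql
-- Cap a reporting workload at 200 physical IOPS per volume (SQL Server 2014+)
CREATE RESOURCE POOL ReportingPool WITH (MAX_IOPS_PER_VOLUME = 200);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;
```

A classifier function is still needed to route the relevant logins or applications into the workload group.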
This has obvious use as a way of limiting the impact of reports on the system from a particular user, department or application.

Gathering The Evidence

There are a lot of ways you can measure SQL Server IO performance and identify which areas need looking at. Most of what follows is available in SQL CoPilot in graphical and tabular form, both as averages since last service start and as snapshots of current activity.

Wait Types

Use sys.dm_os_wait_stats to check the number of waits and wait times for IO_COMPLETION, LOGBUFFER, WRITELOG and PAGEIOLATCH. Use this script to focus on the IO wait types:

SELECT wait_type, waiting_tasks_count,
       wait_time_ms - signal_wait_time_ms AS total_wait_time_ms,
       1. * (wait_time_ms - signal_wait_time_ms) /
           CASE WHEN waiting_tasks_count = 0 THEN 1 ELSE waiting_tasks_count END AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('IO_COMPLETION', 'LOGBUFFER', 'WRITELOG',
                    'PAGEIOLATCH_SH', 'PAGEIOLATCH_UP', 'PAGEIOLATCH_EX',
                    'PAGEIOLATCH_DT', 'PAGEIOLATCH_KP')

This shows averages since the last service restart, or since the wait stats were last cleared. To clear the wait stats, use:

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)

You can also check sys.dm_os_waiting_tasks to see what is currently being waited for.

Virtual File Stats

Query sys.dm_io_virtual_file_stats to find out which data and log files get the most read and write IO, and the latency for each file calculated using the stall in ms:

SELECT d.name AS database_name, mf.name AS logical_file_name,
       num_of_bytes_read, num_of_bytes_written,
       num_of_reads, num_of_writes,
       1. * io_stall_read_ms / (num_of_reads + 1) AS avg_read_stall_ms,
       1. * io_stall_write_ms / (num_of_writes + 1) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf
  ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
JOIN sys.databases d
  ON mf.database_id = d.database_id

Performance Counters

There are two ways of looking at performance counters. Select from sys.dm_os_performance_counters, which shows all the SQL Server counters, or use Windows Performance Monitor (perfmon) to see the other OS counters as well. Some counters to look at are:

SQL Server:Buffer Manager
Lazy writes/sec - The number of times per second that dirty pages are flushed to disk by the Lazy Writer process. An indication of low memory, but listed here as it causes more IO.
Checkpoint pages/sec - The number of dirty pages flushed to disk per second by the checkpoint process.
Page reads/sec - Number of physical pages read from disk per second.
Page writes/sec - Number of physical pages written to disk per second.
Readahead pages/sec - Pages read from disk in advance of them being needed. Expect to see high values in BI workloads, but not for OLTP.

SQL Server:Access Methods
Forwarded records/sec - Should be as low as possible. See above for an explanation of forwarded records.
Full scans/sec - The number of unrestricted full scans. Use of UDFs and table variables can contribute to this, but concentrating on seeks will help to keep the value down.
Page splits/sec - The number of page splits per second - combining splits due to pages being added to the end of a clustered index as well as "genuine" splits when a row is moved to a new page. Use the technique from the link in the section on index fragmentation, above, to get a more accurate breakdown.
Skipped ghosted records/sec - For information about ghosted records see An In-depth Look at Ghost Records in SQL Server.
Workfiles created/sec - A measure of tempdb activity.
Worktables created/sec - A measure of tempdb activity.

SQL Server:Databases
Log bytes flushed/sec - The rate at which log records are written to disk.
Log flush wait time - The duration of the last log flush for each database.
Log flush waits/sec - The number of commits per second waiting for a log flush.

Logical Disk
Avg Disk sec/Read
Avg Disk sec/Write
Avg Disk Read bytes/sec
Avg Disk Write bytes/sec

Using the sys.
dm_os_performance_counters DMV, a lot of counters display a raw value, which has to be monitored over time to see values per second. Others have to be divided by a base value to get a percentage. This makes the DMV less useful unless you perform these calculations and either monitor over time or take an average since the last server restart. The following script uses the tempdb creation date to get the number of seconds since the service started and calculates the averages for these counters. It also retrieves all other counters and calculates those that are derived from a base value.

```sql
USE master
SET NOCOUNT ON

DECLARE @upsecs bigint
SELECT @upsecs = DATEDIFF(second, create_date, GETDATE())
FROM sys.databases
WHERE name = 'tempdb'

SELECT RTRIM(object_name) AS object_name,
       RTRIM(instance_name) AS instance_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE cntr_type = 65792
UNION ALL
SELECT RTRIM(object_name), RTRIM(instance_name), RTRIM(counter_name),
       1. * CAST(cntr_value AS bigint) / @upsecs
FROM sys.dm_os_performance_counters
WHERE cntr_type = 272696576
UNION ALL
SELECT RTRIM(v.object_name), RTRIM(v.instance_name), RTRIM(v.counter_name),
       100. * v.cntr_value / CASE WHEN b.cntr_value = 0 THEN 1 ELSE b.cntr_value END
FROM (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 537003264) v
JOIN (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073939712) b
  ON v.object_name = b.object_name
 AND v.instance_name = b.instance_name
 AND RTRIM(v.counter_name) + ' base' = RTRIM(b.counter_name)
UNION ALL
SELECT RTRIM(v.object_name), RTRIM(v.instance_name), RTRIM(v.counter_name),
       1. * v.cntr_value / CASE WHEN b.cntr_value = 0 THEN 1 ELSE b.cntr_value END
FROM (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073874176) v
JOIN (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073939712) b
  ON v.object_name = b.object_name
 AND v.instance_name = b.instance_name
 AND REPLACE(RTRIM(v.counter_name), ' (ms)', '') + ' Base' = RTRIM(b.counter_name)
ORDER BY object_name, instance_name, counter_name
```

Dynamic Management Views and Functions

As well as the DMVs in the above scripts, there are a number of others that are useful for diagnosing SQL Server IO performance problems. Here are all the ones I use. I'll add some sample scripts when I get the time:

- sys.dm_os_wait_stats
- sys.dm_io_virtual_file_stats
- sys.dm_os_performance_counters
- sys.dm_io_pending_io_requests
- sys.dm_db_index_operational_stats
- sys.dm_db_index_usage_stats
- sys.dm_db_index_physical_stats
- sys.dm_os_buffer_descriptors

It can also be useful to see what activity there is on the instance. Here are your options:

- Profiler - The Profiler tool is quick and easy to use - you can start tracing in a matter of seconds. However, there is some overhead and it may impact performance itself, especially when a lot of columns are selected. A server-side trace is a better option.
- Server-side trace - A server-side trace has less of an impact than running Profiler. It has to be scripted using system stored procedures, but Profiler has the ability to generate the script for you.
- Extended Event Sessions - Extended events were first introduced in SQL Server 2008, and have been considerably enhanced in SQL Server 2012. They are very lightweight, and the use of server-side traces and Profiler is now deprecated. Nevertheless, the use of extended events may impact performance of high-transaction systems if you are not careful. Use an asynchronous target and avoid complicated predicates to limit the overhead.

There are a number of tools for gathering performance data from your servers. SQLIO is a simple tool that creates a file on disk and tests latency and throughput for random/sequential IO, at various block sizes and with a variable number of threads. These are all fully configurable.
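To show one of the DMVs listed above in action, here is a hedged sketch that derives average read and write latency per database file from sys.dm_io_virtual_file_stats; the stall and count columns are cumulative since startup, so this gives an average over the instance's lifetime, not the current rate:

```sql
-- Sketch: average IO latency per database file, derived from the
-- cumulative stall times and IO counts in sys.dm_io_virtual_file_stats.
SELECT DB_NAME(vfs.database_id)           AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms /
           NULLIF(vfs.num_of_reads, 0)   AS avg_read_ms,   -- NULLIF avoids divide-by-zero
       vfs.num_of_writes,
       vfs.io_stall_write_ms /
           NULLIF(vfs.num_of_writes, 0)  AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Taking two snapshots of this query some minutes apart and differencing the columns gives the latency for just that interval, which is usually more informative than the lifetime average.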
SQLIO is a great way of getting a baseline on a new server or storage, for future comparison. Third-party tools are another option for viewing performance metrics. Some show you what is happening on the server right now. Others are built into more complex (and expensive) monitoring solutions. Performance metrics obtained on virtual servers are unreliable. Performance counters and wait stats may give the impression that everything is OK when it is not. I recommend the use of the performance monitoring tools provided by the VM vendor. In the case of VMware, this is very easy to use and is built into Virtual Center. This turned into a much bigger article than I expected - SQL Server IO performance is a big subject. I started with everything I knew, and double-checked my facts by searching the web and checking books. In the process I learnt a whole lot of new stuff and found a lot of useful links. It has been a useful exercise. Hopefully it has been useful for you too.