Python & Spreadsheets
Formal Metadata
Title 
Python & Spreadsheets

Subtitle 
State of the Union

Alternative Title 
Python & Spreadsheets: 2017 Edition

Title of Series  
Part Number 
5

Number of Parts 
48

Author 

Contributors 

License 
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers 

Publisher 

Release Date 
2017

Language 
English

Content Metadata
Subject Area  
Abstract 
Spreadsheets are often terrible. They're also everywhere! As one of the default forms of data exchange, learning to work with spreadsheets directly via Python can save time and effort. We'll look at openpyxl, a library that lets you do just that. We'll look at at least two different (beginner-friendly) example cases: transforming one spreadsheet into another spreadsheet and converting a spreadsheet into JSON. I'll also use my experience as a former accountant to highlight some of the issues around reading from and writing to a spreadsheet file and how you might deal with them. You may even learn to make new friends and grow the Python community! True story!

00:12
So, yeah, here I am. As you can see, my slides are very fancy, very elaborate slides, and I'm talking about spreadsheets, so of course I'm all about fancy things. This is Python & Spreadsheets: State of the Union, 2017. Just for information purposes, my name is Kojo, and my handle is also on every slide, so if you have questions, or you want to tweet at me later, that sort of thing, you can do that. I call it the 2017 edition because I gave this talk initially in 2013, or a version of it, when I first started using the library I'm talking about. Right now I'm a QA specialist [employer unclear]. I used to be an accountant, I have an MBA, and I was a college instructor [details unclear]. The slides for this talk will be available at the link there. They're not available at the moment, but they will be soon. Note to self: [unclear].
01:47
So, the basic outline of the talk. I'm going to talk about how I got here, my secret origin story. Then some of the fundamental data types that openpyxl gives you, because there are a few things that are not obvious at first but make sense once they're explained, hopefully. Then I'll take you through a basic demo of some basic things you can do with Python and spreadsheets. Then we'll look at some of the problems you run into when trying to use spreadsheets with code. There are some situations where using a spreadsheet with code is straightforward and simple, but there are a lot of situations where it's not, so I want to highlight some of those. Again, the pain of my life as a recovering accountant can hopefully be of benefit to some of you.
02:37
The origin of Kojo: I was basically a professional spreadsheet fighter, which is what being an accountant amounts to. Accountants work with spreadsheets all the time; you'd be terrified by how much is run in spreadsheets. I always had an interest in code and in learning to program, but I never really needed to [unclear]. So in late 2012 I decided to get more serious about teaching myself to code and becoming a professional developer. I got my first development job in December 2015. That's sort of how I got to this point.
03:10
So, my role in the Python community. I've always had an interest in finding a way to contribute to the community. As someone who didn't come from a traditional CS or coding background, I knew I wouldn't be making contributions by writing awesome code on day one. So I thought, well, I can use my personal and interpersonal skills to maybe help grow the community. And I felt that the best way to grow the Python community was not by converting other developers, people who use other languages, but by taking people who aren't developers and bringing them into the Python community by showing them how they can benefit from it. One reason is that if you're already developing in Java or C++, you have built-in biases. At the same time, there are more people who aren't programmers than there are people who are programmers, so that's the bigger growth factor. I'm also going to look at some solutions that are not obvious to developers. If you're a developer, you think about things a certain way. When I first gave this talk, one of the suggested solutions to this problem was "just put it in a database." Well, if you already know SQL and have access to databases, you wouldn't have the problems I'm discussing in this talk. So I'm trying to come up with solutions that are useful for people who aren't already developers.
04:40
Who is the talk for? There are two sets of people this talk is geared at, and if you read the description, you may have seen that I tried to make it as beginner-friendly as possible. So we have two general categories of people. First, people who use spreadsheets on a regular basis but want to step up their game and do some different things with them. The spreadsheets are in most cases Excel, and these people want to do something outside the norm of the spreadsheet. The second group is people who are already Python developers but who keep being confronted with spreadsheets and wonder what to do with them. Hopefully this will benefit both sets of people. So, the code
05:20
here is not very advanced, partially because I want to be as beginner-friendly as possible, but also because the really interesting code is going to be based on your specific application. Pulling data out of a spreadsheet and writing it back to a spreadsheet are fairly straightforward, and that's what this library will help you do. But the really interesting things are going to be based around your specific use case, and I don't know what that is, so I can't demonstrate that code. So I'm showing a couple of basic applications to give you some idea of what can be done.
05:59
Demo time. This demo is sponsored by Jupyter notebooks. Jupyter notebooks [unclear], but still valuable.
06:14
Step one: is this readable for people in the back? Yes? OK.
06:25
So the first thing is: know your data. And for that, we'll take a look at the actual spreadsheet file.
06:42
It's almost impossible to work with a spreadsheet programmatically if you don't know what's in the spreadsheet. So here's what I've got for my simple example.
06:58
There we go. So what we have here is some data: some simulated timesheet data. We have an employee number, which identifies the employee, so the same employee always has the same employee number. Then the cost center that employee works for, which is the specific work group. Then the division within the company: say your company is broken up into divisions, and each division will have multiple cost centers in it. Then who that employee's manager is (of course these are fictitious names), and then the date that they worked. So again, it's timesheet data. What we're seeing here is that on this date, the employee with this name and this employee number worked one hour on a project. This is simplified for the purposes of the example. If you work in a professional services firm (I used to work in accounting for engineering companies and things of that nature), one of the things they look at is what's called utilization: how much time you spend working on actual billable projects versus overhead time, where you're doing things that are not work on a specific project they can bill for. For this example everybody is working on billable projects, so we're not doing that level of analysis. So this employee worked one hour on a project, and they worked three hours on other projects the next day, and so on and so forth. There are 10,000 rows here [unclear]. So this is the basic data. And again, just like anything else in programming, if you're going to be writing code to manipulate data, be familiar with it. Then we start with some basics: reading a file, and some of the data types. The library we use is called openpyxl.

openpyxl is a library that lets you read from and write to xlsx files. For that matter, the XLS file format is what Microsoft Excel used until, I believe, 2007, and after that it switched to the xlsx format. That extra "x" means that it's a flavor of XML [unclear: the speaker describes it as compatible with an open document format]. Back in 2013, when I started looking at this, openpyxl was one of the few libraries that would actually work with that file format. So, very straightforward stuff: importing openpyxl. By show of hands, how many of you would consider yourselves beginner or novice programmers, in the early stages? OK. And how many of you might know some people who would be? Not yourselves, of course, but you know some people who might be beginner or novice programmers, and who might benefit from hearing things described in that way. So you should tell a friend and share it with them. As I said, I'm trying to be as beginner-friendly as possible, so that you could explain this to someone who's not an experienced developer and they would get some benefit from it. So: import openpyxl, and from openpyxl import Workbook. We'll talk about the distinction between workbooks and worksheets and that sort of thing later. Then, in this line of code, the workbook is opened with the load_workbook function. This is the name of the file whose spreadsheet we just saw. This is
11:00
[file name unclear], an .xlsx file. The thing to pay attention to here is data_only=True. That's for situations where you have a formula in a spreadsheet, which is pretty common. Setting data_only to True gives you the result of the formula in all cases, instead of the formula itself. [unclear] Everything is behaving as it should, right. So this is fairly straightforward Python stuff, except that data_only=True is something specific to openpyxl. Next, the distinction between workbooks, worksheets, and spreadsheets. People tend to use the terms interchangeably. They'll say "send me a spreadsheet" or "no, I want that spreadsheet." In accounting you'd call it a spreadsheet, but what you have is a workbook. In Excel, the fundamental thing you work with, the actual file itself, is the workbook. What we see here is part of a workbook: the individual tab "clean data," and a second tab [name unclear], which is a whole different thing. Each of these tabs is a worksheet, and multiple worksheets make up a workbook.
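The data_only=True flag mentioned above can be tried without the talk's file. The following is a minimal sketch, assuming a tiny in-memory workbook as a stand-in for the timesheet; the cell contents are made up for illustration. One caveat worth knowing: openpyxl does not calculate formulas itself, so data_only=True returns whatever result the spreadsheet application cached the last time it saved the file.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory (a stand-in for the talk's timesheet file).
wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"  # stored as a formula

buffer = BytesIO()
wb.save(buffer)

# Default load: a formula cell comes back as the formula text.
buffer.seek(0)
with_formulas = load_workbook(buffer)
formula_text = with_formulas.active["A3"].value

# data_only=True: a formula cell comes back as the value cached by the
# spreadsheet application. This file was written by openpyxl and never
# opened in Excel, so there is no cached result to return.
buffer.seek(0)
with_values = load_workbook(buffer, data_only=True)
cached_value = with_values.active["A3"].value

print(formula_text)
print(cached_value)
```

With a file that Excel has opened and saved, the second load would return the computed number rather than an empty value.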
12:42
That matters when you have multiple tabs in a spreadsheet: you'd call them tabs, but openpyxl defines the file as a workbook, which is a collection of individual worksheets. It's important because when you access the data, you need to tell it which workbook file you want, which is the file we're opening up here in this cell, but then you also need to know which worksheet you want out of that workbook. They're different objects that have different properties. So here we have wb, the workbook we opened, which is the entire thing. For those of you who are beginners with Python, it's good to know about the dir function, because it gives you a list of the attributes that are available for a particular object. For a lot of basic Python constructs, if you're already familiar with them, maybe you don't care. But if you are using a new library like openpyxl, it has different data types which you're not going to be familiar with, because you haven't seen them before, so using the dir function on them is helpful. So we see all the different attributes, and a couple here deal with sheets: copy_worksheet; create_sheet, which we'll see a little later; get_sheet_by_name, so you can grab a specific worksheet out of a workbook; or, if you don't know which sheets are available, get_sheet_names, which will tell you what sheets there are. So here I print the workbook's sheet names. Now, you'll notice something about them. When we looked at the actual spreadsheet itself, we saw two worksheets: "clean data" and a second one [name unclear].
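The exploration described above can be sketched as follows. This is an illustration only: the workbook is built in memory and the sheet titles are hypothetical. Note also that newer openpyxl releases favor the `sheetnames` property and `wb["title"]` indexing over the older `get_sheet_names()` and `get_sheet_by_name()` methods named in the talk.

```python
from openpyxl import Workbook

# Stand-in workbook; the sheet titles here are hypothetical.
wb = Workbook()
wb.active.title = "clean data"
wb.create_sheet("summary")

# dir() lists everything an object offers; filter out the private names
# and keep the ones that have to do with sheets.
public_names = [name for name in dir(wb) if not name.startswith("_")]
sheet_related = [name for name in public_names if "sheet" in name]
print(sheet_related)

# List the worksheets, then pick one out of the workbook by its title.
print(wb.sheetnames)
ws = wb["clean data"]
print(type(ws))
```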
14:41
But when I print the workbook's sheet names, I get three: a "spending" sheet [full name unclear], "clean data," and the second visible one. So that "spending" worksheet is actually hidden. When I
14:53
was working on this talk initially, I was trying to figure out where it was coming from. So I unhide the sheet, and it shows the "spending" sheet. So that's the second worksheet.
15:06
I bring that up because you may be in a situation where someone is trying to hide something from someone, and they hide a sheet. Well, if you just look at the workbook, you won't see it, but if
15:14
you get the sheet names, it's still visible. So those are the worksheets we have available to us. The one we want is the "clean data" worksheet, so here I create this variable called demo_worksheet, and I get the sheet by name, passing in that name. So now wb is a Workbook object, and that's why this import line is important: you need the Workbook object, which is not something that natively exists in Python; it comes from openpyxl. The workbook has those different attributes that we saw before, when we called the dir function on it. And here we have demo_worksheet, which is the worksheet. We see that it is a Worksheet object with its own specific set of attributes, which we see with the dir function. Those are the different things you can do with it. There's cell, so you can take a cell from a worksheet, but the two we're going to focus on most here are the columns and the rows.
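The hidden-sheet behavior described above can be reproduced in a few lines. This is a sketch under assumptions: the workbook is built in memory, the "spending" title is a stand-in for the hidden sheet in the talk, and the check uses the worksheet's `sheet_state` attribute, which the talk itself does not mention.

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "clean data"
spending = wb.create_sheet("spending")  # hypothetical title
spending.sheet_state = "hidden"         # hidden in the spreadsheet application

# Hidden sheets are still listed by name.
print(wb.sheetnames)

# sheet_state tells the two kinds apart.
visible = [ws.title for ws in wb.worksheets if ws.sheet_state == "visible"]
not_visible = [ws.title for ws in wb.worksheets if ws.sheet_state != "visible"]
print(visible)
print(not_visible)
```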
16:39
In the worksheet, columns go down and rows go side to side. In this case each row represents a particular record, similar to a relational
16:46
database. So those are the attributes we've got for the Worksheet object. The workbook, the worksheet, and the cell object, which we'll see later, have a lot of different attributes and options. There isn't time to go through all of them, so I'll try to go over some of the highlights. So we've got this worksheet, demo_worksheet, and what I want is to grab the data by rows, so: demo_worksheet.rows. You'll notice it returns this generator object, a generator of the worksheet's cells by row. What this does is, instead of reading every row out of the spreadsheet, it creates a generator object. If you're not familiar with generators, a simple way to think about them is that a generator object has a collection of items, but instead of giving you all the items at once, it gives them to you one at a time, and this helps to save memory. So instead of having all 10,000 rows of the spreadsheet at once, the generator will give
17:55
you one row at a time, as needed, and that helps to save memory. I point that out because if you say "OK, show me the rows," you're not going to just get plain rows of plain text. You're going to get this generator, which gives you a row at a time. So when I loop over the generator, for row in demo_worksheet.rows, I print each row, and I'm also printing the types. You get a generator object, and each item it returns is a tuple of cells. So you get the cells A1, B1, C1, and so on, of this "clean data" worksheet, which correspond to the first row. Then it goes on to the
19:04
second row, and so forth. So this generator yields tuples, and each element of a tuple is a cell, which again is an openpyxl object: a cell is not a native Python type. We'll see why that's important in a moment. Now we'll take a look at cells, and I want to show you some of the differences about cells.
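A short sketch of the rows generator described above. The worksheet is built in memory with made-up values, since the talk's file is not available; the column names follow the ones mentioned in the talk.

```python
from openpyxl import Workbook

# Made-up stand-in for the first columns of the "clean data" worksheet.
wb = Workbook()
ws = wb.active
ws.title = "clean data"
ws.append(["Employee ID", "Cost Center", "Division"])
ws.append([1001, "CC-10", "North"])
ws.append([1002, "CC-20", "South"])

rows = ws.rows
print(type(rows))  # a generator, not a list of rows

row_types = []
for row in ws.rows:
    row_types.append(type(row))
    print(row)  # each row is a tuple of openpyxl Cell objects
```

Because the rows come one at a time, a loop like this never needs the whole sheet in a Python list.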
19:32
So: for cell in next(demo_worksheet.rows). Again, demo_worksheet.rows is the generator object. A generator gives you one item at a time, and to get one item at a time you use the next function, so you're just pulling one item out of that generator at a time. I'm printing out the cell and then the cell's name, its column and row, using string formatting to make it look nice. I'm printing the cell, so we see information about the cell, and the type of the cell, to demonstrate that this is a different data type: it's not native to Python, it's provided by the library. So it's a cell in "clean data," of type Cell. And the value, in the last line, shows you what's actually in that cell, what you would see in the spreadsheet.
20:29
So we see that A1's value is the employee ID, B1 is the cost center, and C1 is the division.
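Pulling just the header row with next(), as described above, might look like this. It is a sketch with made-up data; the talk prints the cell's name, and `coordinate` is used here as one attribute that holds it.

```python
from openpyxl import Workbook

# Made-up stand-in for the first columns of the "clean data" worksheet.
wb = Workbook()
ws = wb.active
ws.title = "clean data"
ws.append(["Employee ID", "Cost Center", "Division"])
ws.append([1001, "CC-10", "North"])

# next() pulls a single item (here, the first row) out of the generator.
header_row = next(ws.rows)

for cell in header_row:
    # The cell object, where it sits, its type, and the value it holds.
    print(f"{cell.coordinate}: {type(cell).__name__} holding {cell.value!r}")

header_values = [cell.value for cell in header_row]
```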
20:40
So this row 1 is the headers, and so on and so forth. I just pulled that out to demonstrate it. Now, cell values and types. openpyxl is smart enough to try to take the values in a cell and convert them to the appropriate Python data type. Here we're looking at cell E1 of the worksheet and its value. I also do this to demonstrate that besides grabbing things by row, you can go to a specific cell if you want to. In this case we're going to demo_worksheet, which is the "clean data" worksheet, grabbing cell E1 specifically, and printing its value, and we do the same thing with cell E2.
21:28
In the spreadsheet, cell E1 is "Date Worked" and E2 is that first date.
21:39
For E1 and E2, I'm printing the value and then the type. The value here is "Date Worked," a string. The value here is that first date, and it's displayed as a datetime object, so Python knows that it's a date, and that's useful. Next, cell attributes. Just as with the workbook and worksheet objects, the cell has all these different attributes, and the primary reason for that is what you see when you look at a spreadsheet.
22:22
There's a lot more going on than just what's in the cell, more than just the value. There's the styling that's going on, and other things that happen with the cell, so the cell object contains all of those attributes. The one we'll use the most here is value: cell.value gives the actual value, the actual bit we want to work with. However, there is other data in the cell, so if you need to know whether a cell is colored a certain way, or uses a certain font, or something like that, you can grab that information and do things with it as well. But for our purposes we'll be focusing mostly on the value.
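The type conversion and the extra cell attributes described above can be checked with a round trip through an in-memory file. This is a sketch with made-up values; the "Hours" column in F is an assumption added for illustration, and `number_format` and `font` are shown as two examples of the non-value attributes a cell carries.

```python
import datetime
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Write a header, a date, and a number (made-up values), then read them back.
wb = Workbook()
ws = wb.active
ws["E1"] = "Date Worked"
ws["E2"] = datetime.datetime(2017, 3, 1)
ws["F1"] = "Hours"  # hypothetical column, for illustration only
ws["F2"] = 1.5

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
loaded = load_workbook(buffer).active

# Values come back as ordinary Python types.
for ref in ("E1", "E2", "F2"):
    cell = loaded[ref]
    print(ref, repr(cell.value), type(cell.value).__name__)

# A cell also carries styling information alongside its value.
print(loaded["E2"].number_format)
print(loaded["E2"].font.name)
```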
23:11
Cell styles: there isn't really enough time for me to do a demo on cell styles, but working with styles is also a thing you can do. The openpyxl documentation is, of course, on Read the Docs [unclear]. There's a lot of style information, and while that might not seem like the most important thing when we're dealing with data, you might be in a situation where the data in a spreadsheet is being styled in a certain way. Maybe a number that's a loss is red and a number that's a profit is green, something like that. You can actually make use of that information: you can pull that out.
23:48
And if you need to write results to a spreadsheet, you can write them in a styled fashion as well. That's left as an exercise for the viewer. So, example one: aggregating info. What we want to do here is take this timesheet
24:05
and, instead of having these individual lines for each day, see how much time each employee worked in this month. This is a month in 2017, so: how much time did each employee work this month? We want to aggregate that information. A lot of the Python used here is not particularly impressive, but I want to point out the spots where we mix regular Python with things specific to openpyxl. Here I use a set comprehension, because I want a set of employee IDs; I don't want every occurrence of an employee ID. I also use list comprehensions in here. Trey Hunner, who is here at the conference, has an accepted talk on comprehensions, Comprehensible Comprehensions. So if you see Trey, ask him about comprehensions and tell him I sent you; he'll do a better job of explaining comprehensions than I will. But basically, what you can do with a for loop, you can do with a comprehension. So here I'm creating this set comprehension, employee IDs, just so I have a collection of unique employee IDs, and that's what this looks like here. Then I'm using that set of employee IDs to create a dictionary that holds the hours, and I'm using list comprehensions here: I want all the hours for that employee. I want a set comprehension of the cost center, since employees only work in one cost center; a set comprehension of the division, since employees only work in one division; and employees only have one manager. This is again the part about knowing your data [unclear]. Notice here row[6].value for row in demo_worksheet.rows. In this case demo_worksheet.rows is again the generator object that gives us a row at a time, and I'm saying I want row index 6 for the hours. If we look back at the spreadsheet, we
26:32
see 1, 2, 3, 4, 5, 6, 7, because of how spreadsheets count, and this is something you'll see again later: Python indexes from 0. So if you have a list of four items, Python numbers them 0, 1, 2, 3. Spreadsheets index from 1, and so we'll see in the code a little later where we have to adjust for that. So here the index is 0, 1, 2, 3, 4, 5, 6. So I'm getting the hours there with row[6].value: row index 6 is the cell in the row, .value is that cell's value, and that maps to my hours. I'm doing the same thing for the cost center, the division, and the manager. Given where I work now, I tend to want to test things, so there are some assertions here, because I know there should only be one cost center, one division, and one manager per employee, so I just assert that, and then I build this employee aggregates object in the end. So a lot of this is regular Python, with some specifics to openpyxl. Then I print this employee aggregates object; it's pretty-printed, and it's a dictionary, just so it's clear. What you end up with is the employee ID as the key, and then the cost center, the division, and the number of hours for that employee. And so this essentially takes the
28:07
spreadsheet and turns it into a dictionary, which can also be used as a JSON object. And I know from some filtering of the spreadsheet that I should have 49 employees, so that's what's happening here. Now, we've looked at reading data from a spreadsheet, processing it, and turning it into something in Python. The next thing becomes: what if you've already got a Python program running and you want those results written to a spreadsheet? We can do the same thing with openpyxl. So first you need to create a workbook. I'm not very creative: it's output book, a Workbook object called output book. Then you need to create a worksheet, and you can create a specific sheet. Here I'm creating a sheet called output sheet, so output book dot create_sheet, and create_sheet is a method that belongs to the Workbook object, and I'm giving it a name for the aggregate time. You'll see there's a 0 here: the 0 argument means it's going to be the item at index 0 in the workbook. Otherwise, by default, when you create a workbook you will already have a default sheet as the first object, so here I'm saying make this the first sheet that we see. Then when we look at it, we see that it's this workbook-type object. Then I decide to build a header, because I don't just want to write the raw data to the spreadsheet; I also want some headers, so the spreadsheet looks sort of organized and I can give it to someone else and they can understand it. So I'm building a header here, mostly by just taking the values from the input worksheet, so again I'm accessing the cells directly, and I'm printing the header to make sure it's what I wanted. So I've got that same header, and for our data I build this table. It's a list of lists, and I move through this employee aggregates object and I'm building these new rows. So I want a row with the employee, the cost center, the division, the manager, and the number of hours, which here is the number of hours aggregated from the earlier dictionary. And then now I am assigning
those values that are in this output data construct; I'm writing them to the output sheet. I have nested for loops here that write by row and by cell. Here we see this row index that I've got and the column number, both with plus 1, because the indexes come from Python, which starts at 0, and the spreadsheet starts at 1, so that's the adjustment. And so here is the output data construct that I built: you see the first list is the header and the next lists are the aggregate numbers for each employee, their hours for the month, and so on and so forth.
31:13
And now I can save that, so output book dot save, and then I give it a filename, and so this is the name of the file that is being written.
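The write path described above (create a workbook, add a sheet at index 0, write a list of lists with the plus-1 adjustment, then save) can be sketched roughly as follows. This is a minimal illustration rather than the speaker's actual code: the sheet title, the filename, the variable names, and the sample rows are all made up, and it assumes openpyxl is installed.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical aggregated data: the first list is the header,
# the remaining lists are one row per employee.
output_data = [
    ["Employee ID", "Cost Center", "Division", "Manager", "Hours"],
    ["E001", "CC10", "North", "Avery", 160],
    ["E002", "CC20", "South", "Blake", 152.5],
]

output_book = Workbook()
# Index 0 places the new sheet ahead of the default sheet openpyxl creates.
output_sheet = output_book.create_sheet("Aggregate Time", 0)

# enumerate() counts from 0, but worksheet rows and columns count from 1,
# so both indexes get + 1 before they are handed to cell().
for row_index, row in enumerate(output_data):
    for col_index, value in enumerate(row):
        output_sheet.cell(row=row_index + 1, column=col_index + 1, value=value)

# Save into a temporary directory so the sketch leaves no file behind,
# then read the file back to confirm what was written.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "aggregate_time.xlsx")
    output_book.save(path)
    reloaded = load_workbook(path)
    first_sheet = reloaded.worksheets[0]
    round_trip = [[cell.value for cell in row] for row in first_sheet.rows]
```

Reading the file back, as in the last few lines, is one cheap way to check that the header and the totals landed where they were expected.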
31:25
and you know in
31:40
and so this is the result: there is this sheet that represents the time sheet. Being an accountant, I like to check totals, and the total here is 5 thousand 386 hours. If I go back to the
32:08
original spreadsheet, I get the same total, 5 thousand
32:14
386 hours, which is here at the bottom, if you can see it from the front row. And again, this isn't the most complicated thing, but it gives you an idea of what you could do if you have a lot more spreadsheets to work with, 10 or 100, or you have a lot more data. The last thing we do is take that aggregate time object that I created and write it out as a JSON file, and so then what I end up with is this
32:52
JSON, and so I get this sort of
33:04
file that I can then use to configure something else, or do other processing, things like that.
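Writing the aggregate dictionary out as JSON needs only the standard library. The sketch below uses invented employee records and key names; `json.dumps` is used so the example stays in memory, and `json.dump` with an open file would write the same text to disk.

```python
import json

# Hypothetical aggregate: employee ID -> details (all values are made up).
employee_aggregates = {
    "E001": {"cost_center": "CC10", "division": "North",
             "manager": "Avery", "hours": 160},
    "E002": {"cost_center": "CC20", "division": "South",
             "manager": "Blake", "hours": 152.5},
}

# json.dumps returns the text that json.dump would write to an open file.
json_text = json.dumps(employee_aggregates, indent=2, sort_keys=True)

# Loading the text back gives an equal dictionary, ready for more processing.
restored = json.loads(json_text)
total_hours = sum(entry["hours"] for entry in restored.values())
```

One thing to watch: the `json` module cannot serialize a Python `set`, so any values collected with set comprehensions need to be converted to a list or a single value first.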
33:12
So, the problems that you run into: the reading and
33:18
writing, that sort of thing, is not terribly
33:20
complicated. The problem that you run into is that often a spreadsheet is viewed as a visual medium, so someone wants the spreadsheet to look nice, and that spreadsheet might not make sense to code. If the only spreadsheet you
33:34
ever get is one that looks like this, then you'll be fine; you've got pretty well-structured data. But the reality is that sometimes a boss wants the spreadsheet to look a certain way, and so it's
33:44
been laid out as if it were desktop publishing or whatever, and then you've got to sort of go through it. In those situations you might access individual cells, or you may be able to convince them to change some of the formatting, with the idea that we can speed this up by 100 times. So you might have visual input, and you might have a visual output requirement. So you're making
34:09
friends by helping a coworker automate some of these simple tasks with Python. The Python here wasn't terribly complicated; the complexity is in the problem being solved. So you can use openpyxl to read data in, do something with it, and write it back out. That's what I've got.
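The read-and-aggregate step described earlier in the talk (a set comprehension for unique employee IDs, list comprehensions for hours, and assertions that each employee has exactly one cost center, division, and manager) might look roughly like this. To stay runnable without a spreadsheet file, the sketch uses a small stand-in `Cell` type with a `.value` attribute in place of real worksheet cells; the column layout, apart from hours at index 6, and all sample values are assumptions.

```python
from collections import namedtuple

# Stand-in for a worksheet cell: only the .value attribute is used here.
Cell = namedtuple("Cell", "value")


def make_row(*values):
    """Build a tuple of cells, similar in shape to one worksheet row."""
    return tuple(Cell(v) for v in values)


# Hypothetical layout: ID, name, cost center, division, manager, date, hours.
rows = [
    make_row("E001", "Ann", "CC10", "North", "Avery", "2017-05-01", 8),
    make_row("E001", "Ann", "CC10", "North", "Avery", "2017-05-02", 7.5),
    make_row("E002", "Bob", "CC20", "South", "Blake", "2017-05-01", 8),
]

# Set comprehension: each employee ID once, however many rows it appears in.
employee_ids = {row[0].value for row in rows}

employee_aggregates = {}
for emp_id in employee_ids:
    own_rows = [row for row in rows if row[0].value == emp_id]
    hours = [row[6].value for row in own_rows]
    cost_centers = {row[2].value for row in own_rows}
    divisions = {row[3].value for row in own_rows}
    managers = {row[4].value for row in own_rows}
    # Knowing the data: each of these sets should hold exactly one item.
    assert len(cost_centers) == len(divisions) == len(managers) == 1
    employee_aggregates[emp_id] = {
        "cost_center": cost_centers.pop(),
        "division": divisions.pop(),
        "manager": managers.pop(),
        "hours": sum(hours),
    }
```

With real data, `rows` would come from iterating over the worksheet, and a header row would need to be skipped before aggregating.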
34:28
I'm @Transition on Twitter if you have questions, and the slides and the code will be available in my GitHub repository shortly.
With the Python csv module you can use DictReader and actually get named columns. Does openpyxl support that?
I don't know whether it supports that [unclear]. Here I'm mainly reading things by row; you can also go by columns.
You mentioned that the visual spreadsheets would not be a good candidate for this sort of approach. Are there any types of actual data structures in spreadsheets that would not be good for programmatic access?
Data structures in spreadsheets: so if you have a lot of computation happening in macros, and that's something I have not worked with very much, but if you have a lot of macro calculation going on, that might be an issue, because what you really read here is the data, not the macro or the results of the macro calculation. So that would probably need a whole different approach that involves running Python inside a spreadsheet.
[unclear] with Excel files, is there a way to make pivot tables? I know there's a tendency to do the pivot table in pandas and write it statically to an Excel sheet, but is there any way to [unclear]?
There is some; if you look at the docs, if you go to Read the Docs, I believe there is a pivot table portion in there, if I recall accurately [unclear].
[unclear] I've worked a lot with [unclear], and if you make a pivot table and then make use of a range, you can actually use something like this to
populate a table, so you sort of have a pivot table.
A question about stability: are there any stability problems when you work with large amounts of data? Is there any streaming interface you're familiar with?
A streaming interface? I haven't used this recently with huge amounts of data. I've seen large spreadsheets cause Excel itself to slow down and run slowly, and I haven't run this with those same spreadsheets [unclear]. But I think the fact that this is creating a generator object and returning small things would probably help with that, though I would have to test it [unclear].
Your Excel file was pretty nicely formatted. How do you deal with cells or rows that are merged in the middle of the sheet, for example?
openpyxl has some capacity to deal with that, but again that's sort of knowing your data. You can access the individual cells, so that might be a situation where you need to access individual cells. I'm not sure how it handles merged cells. It does handle things like hidden worksheets, and you can see the worksheet names, so I'm not sure how it accesses those [unclear].
[unclear] xlrd and xlwt, and how does it compare to those?
That is one of the more common questions I get with this talk. I've used those a little bit, but when I started looking at this library, at the time, in 2013, those two libraries, xlrd and xlwt, would not work with xlsx files; I think they may now, but at the time they would not. [unclear] converted to an
actual file [unclear].
I'll reveal my ignorance real quick, but what tool are you using to run Python in a browser? [unclear]
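On the earlier question about `csv.DictReader`-style named columns: whether or not the library offers this directly, one simple way to get the same effect is to treat the first row as the header and zip it with each later row. The sketch below works on plain lists of values (hypothetical data); with openpyxl, such lists could be produced by reading each cell's `.value` row by row.

```python
# Rows as plain values, header first (all sample data is made up).
table = [
    ["employee_id", "division", "hours"],
    ["E001", "North", 8],
    ["E002", "South", 7.5],
]

# Split the header from the data rows.
header, *data_rows = table

# Pair each column name with the value in the same position.
records = [dict(zip(header, row)) for row in data_rows]

# Columns can now be looked up by name instead of by index.
hours_by_employee = {rec["employee_id"]: rec["hours"] for rec in records}
```

This relies on the sheet having a single clean header row, which ties back to the point about knowing your data.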
39:22
C SHS and in are